Architecting, Building and Deploying Successful Commercial Websites
Gist.com case study

Paul Finster - Chief Technology Officer
Dave Ekhaus - Director, Platform Engineering
NYU, Feb 29, 2000


Agenda

- Part I - Hardware Configurations
  - Server Farms
  - Databases
  - Paradigms
- Part II - Software Technology
  - Architectural elements
  - Relational database
  - Java Server Pages (JSP)
  - Java Beans
  - Testing & Deploying
- Part III - Questions & Answers


Part I - Hardware Configurations


The Internet is here!

- Informational websites are big today!
- Even 1% of Yahoo! traffic is a lot of traffic
- Gist.com runs tv.yahoo.com on 6 servers
- Like e-Commerce, informational websites are mission-critical applications for the businesses and individuals that rely on them
- Yahoo!, Snap., MarketWatch, Gist.com
- These are enterprise-class applications!
- Denial of Service attacks proved popular need


What is the Commercial Website landscape?

- The scale and dynamic nature of the web changes everything
- Common platforms in use
- TV Listings - built in-house
- Generic Content Applications
- XML, WAP, HTTP
- XSL, CSS
- Custom Applications
- Possibly 100’s of thousands of hits per day
- Dynamic customized content
- Huge peaks during certain times of day
- Newsfeeds


What is the right Hardware Architecture?
Two major hardware philosophies/paradigms

- Many cheap redundant machines
  - Example: Yahoo.com
  - 100’s of Intel BSD machines with a specially modified Apache web server
  - Content stored in huge memory caches
  - Cost Estimate: $2,000 per server
- Few expensive highly-reliable machines
  - Example: IWON.com
  - 12 high-end Sun Solaris web servers
  - Content stored in 2 parallel Oracle databases running on Sun E10000 servers
  - Cost Estimate: $20,000-$100,000 per server


Common Hardware Requirements

- Co-location at data centers
  - Exodus
  - GlobalCenter
  - Level3
  - AboveNet
- Hardware
  - Load Balancing: Cisco, F5, Radware
  - Application level switches
  - Hi-speed virtual networks
  - Firewalls
  - Network Monitoring software
  - Enterprise Storage devices


Typical N-tier Hardware Architecture

[Diagram. Components shown: Firewall; Load Balancer; Web Server (x5); Database Server (x2); Enterprise Storage; Legacy Applications (if any)]


Gist’s Hardware Architecture

[Diagram. Components shown: Cisco PIX Firewall; RadWare: WSD Load Balancer; NT Web Server (x5); ISAPI DLL; SQL Server 7.0 Database Server (x2); Enterprise Storage EMC]


Part II - Software Technology


Gist’s Application Framework

[Diagram. Labels shown: Website Production; TV Listings GRID; Gist Dev Tools; Java Beans; Other Applications; Bulletin Boards; Process Step; Custom App; Scripting Tool: JSP Templates; Cookie-based Server-side Sessions; Services & APIs; Data Sources; System Independence; JSP Adapter; OS Drivers (NT, Solaris); Web Drivers (IIS, NSAPI); SQL Abstraction; Content Interface; JDBC Drivers (Oracle & Sybase, SQL Server & Informix); Article Drivers (File system)]


Java Code Samples

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <%@ page import="gist.external.gistcom.*,gist.*" %>
    <jsp:useBean id="adf" scope="session" class="gist.internal.publishing.ADFObject"></jsp:useBean>
    <%
        // Look up the current user from the request
        UserObject u = UserObject.getUser(request, response);
        // Default (presumably not-logged-in) users are redirected to the login page
        if (u.isDefault()) {
            response.sendRedirect("/tv/login.jsp?nexturl=/tv/channels.jsp");
            return;
        }
        // Fall back to the channels page when no nexturl parameter is supplied
        String nexturl = request.getParameter("nexturl");
        if (nexturl == null)
            nexturl = "/tv/channels.jsp";
    %>
    <%@ include file="/tv/templates/global.jsp" %>
    .
    .
    Channel channelsDisplay[] = u.getSortedChannels();
    String c_name = null;
    // chanchecked is not declared in this excerpt; presumably it is declared in the elided portion
    for (int i = 0; i < channelsDisplay.length; i++) {
        if (channelsDisplay[i].isVisible()) {
            chanchecked = " CHECKED ";
        } else {
            chanchecked = "";
        }
    }


Technology: Architecting


Architecture Requirements

- Scalability - performance, growth
- Security - authentication, access control, privacy
- Management - monitoring, dynamic configuration
- Availability - fault tolerance
- Stability - data integrity
- Portability - OS, DB, WS independence
- Extensibility - ability to adapt to changes in technology
- Integration - integration of RDBMS and legacy systems


Scalability

- Performance
  - 1000’s of “transactions” per minute
  - Sub-second response time
  - Database connection pooling
- Concurrency - multithreading
  - Inherent support in Java
- Growth
  - Load balancing
  - Want to be able to “throw” hardware at the problem


Security

- Authentication
- Access Control
  - What is the user permitted to do?
  - Ordering PPV over the web; credit card numbers
  - Attributes of our UserObject
- Privacy (if required)
- Identifying the user via Cookies
  - Architecture supports “cookie-less” mode with URL re-writing of session parameters
- Encryption - RC4, MD5
- SSL
- URL rewriting


Management

- Monitoring
  - Consistent logging and reporting of system activity
  - One way: Extensive use of site-wide email diagnostics
  - Enterprise Integration - SNMP
  - Integration with load balancer
  - Rebooting crashed servers (NT primarily)
    - Automated
    - Manual (if all else fails)
- Dynamic Configuration
  - Adding/Removing new features on-the-fly
  - Incremental updating of site content
  - Incremental Database updates


Availability

- 100% Availability
  - 24x7x365 Operations
- Fault Tolerance
  - Hardware solutions
    - Backup servers
  - Software solutions
    - Dynamic database connections


Stability

- Data Integrity
  - Support for “transactions”
  - Redundant databases
- Protection
  - Isolation of subsystems and application execution
  - Java sandbox and exception handling
- Resource Recovery
  - Database Connectivity
  - System Resources - memory
  - Java memory garbage collection


Portability

- OS independence
  - NT
  - Various flavors of UNIX
  - LINUX
- DB independence
  - SQL Server, Sybase, Oracle, Informix
- WS independence
  - Netscape Enterprise Server, Microsoft IIS, Apache


Extensibility

- Adapt to Changes in Business
  - How well does the architecture allow you to change the content or navigation of your commercial website?
  - Does the architecture support your current legacy systems?
  - Does the architecture provide for Content and/or Editorial changes?
- Adapt to Changes in Technology
  - How quickly can you leverage new standards?
    - XML
    - WAP
    - WDL


Integration

- Partner Advertising
- Statistical Processing/Analysis
- Partner cookies versus Gist.com cookies
- URL links back and forth between partners
- Billing Partners
- MarketWave statistics
- Navigation controls
- Co-branded websites with differing ad serving ratios
- Measuring Page views


What we’ve learned

- Prototype ASAP in order to discover architectural dependencies
- Database statements
  - Be specific in your SQL
  - Incremental vs. batch
- Test more: Then test again!
- Keep objects as light as possible
- Never store moving data
  - Member Age vs. Birthdate
  - Channels Change: Excluded channels
- User Migration is HARD!
- Get training


Technology: Building


Development

- Prototypes
  - Work closely with partners to determine functionality
  - Prototype Deep, not Wide
- Process
  - Small Development Teams (2-4 people)
    - Include: Designers, Technical producers
  - Develop Components in Parallel
  - Frequent Releases
    - 3-6 day development cycles
  - Scoping - Controlled Feature Set


Technology: Testing & Deploying


Testing

- Quality
  - Unit Test - thread safety, code coverage
  - Smoke Test - quick validation
  - BVT - build validation test
  - Full Functional Test
  - Regression Test - consistent functionality
  - Load Test - high availability
  - Benchmark - by Platform
  - Installation Test - by Platform
- Automated Tools
- Repeatability
  - Consistent and Isolated Environment
- Metrics
  - Measure real world scenarios
  - Load test specific subsystems


Deploying

- Capacity Planning - sizing exercise
- Beta Testing - early, well defined subset
- Focus Groups - early feedback
- Performance - simulating real world load
- Benchmarking - critical areas to measure
- Maintenance - staging environment, versioning


Capacity Planning: Sizing Exercise

- What workloads run at each node?
- What hardware is needed to maintain service as the workload grows?
- How many more users can each existing server support?
- How will server utilization be impacted if the number of transactions increases by n%?
- What are those “transactions” doing?
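
The sizing questions above can be roughed out with simple arithmetic. The following is a minimal sketch only, assuming that server utilization scales linearly with transaction volume (a simplification that ignores queuing and caching effects). The class name, method names, and all numbers are hypothetical illustrations, not figures from the Gist.com system.

```java
public class CapacitySketch {

    /**
     * Projected utilization after transaction volume grows by growthPercent.
     * Assumes utilization is proportional to transaction volume (hypothetical model).
     */
    static double projectedUtilization(double currentUtilization, double growthPercent) {
        return currentUtilization * (1.0 + growthPercent / 100.0);
    }

    /**
     * Smallest number of servers that keeps each server at or below
     * targetUtilization when the peak load is spread evenly across them.
     */
    static int serversNeeded(double peakTransactionsPerMinute,
                             double maxTransactionsPerMinutePerServer,
                             double targetUtilization) {
        double usablePerServer = maxTransactionsPerMinutePerServer * targetUtilization;
        return (int) Math.ceil(peakTransactionsPerMinute / usablePerServer);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a server at 50% utilization, 20% more transactions.
        double projected = projectedUtilization(0.50, 20.0);
        System.out.printf("Projected utilization: %.0f%%%n", projected * 100);

        // Hypothetical inputs: 3000 transactions/minute at peak, 500 per server
        // at saturation, and an 80% utilization target per server.
        System.out.println("Servers needed: " + serversNeeded(3000, 500, 0.80));
    }
}
```

With these made-up inputs, each server has 400 usable transactions per minute at an 80% target, so a 3000-per-minute peak needs 7.5 servers, rounded up to 8. Real sizing would replace the linear assumption with measured load-test data.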


Performance: Simulating Real World Load

- “Transactions” per minute
- Database access requirements
- Legacy connection requirements
- Networking requirements
- Principles of Algorithms really matter
- This is a Challenge
  - Analyze existing system (web server logs)
  - Forecast activity by looking at competitors
  - Number of registered users


Performance (continued)

- Scalability and Fail-over are required for 24x7x365 availability
- Determine appropriate hardware architecture
  - maximum acceptable response time, target server CPU utilization at 80% (leave room for growth)
- Determine # and type of transactions
  - reading web pages, executing a query, updating a database, searching, sorting


Benchmarking “Transactions”

- Home Page
- Grid Page
- Pre-compile pages if possible
- User Registration Transactions
- Remove archive links
- Soaps Updates Pages
  - Many hits, as fast as possible
- Article Pages
  - Many hits, as light as possible
  - As clear as possible
- The answer to many problems is “caching”!


Maintenance

- Staging Environment
  - Mirrored hardware/software
  - Separate Database
  - Migration strategy
- Versioning
  - Component Version Control
  - Change Management


Part III - Questions & Answers


Architecting, Building and Deploying Successful Commercial Websites

Paul Finster - Chief Technology Officer [email protected]
Dave Ekhaus - Director, Platform Engineering [email protected]
NYU, Feb 29, 2000