Web Servers
Guntis Bārzdiņš, Artūrs Lavrenovs, Normunds Grūzītis

What a basic web server does
● Implements the HTTP protocol
● Listens for HTTP requests from clients (e.g. browsers)
● Tries to fulfill them with static content from the file system
  – A web server itself serves only static files
● Receives content from clients (e.g. via HTML forms, incl. file uploads)
● Forwards dynamic content requests for external execution
● Does other useful tasks via extension modules

Web server market share
F: Apache 1.1, modules supported
H: Apache supports HTTP/1.1 virtual hosting
I: Microsoft IIS/4.0 and Active Server Pages
M: Apache 2.0
Q: Microsoft .NET framework
N, O, R: Code Red worm, Nimda worm, SQL Slammer worm
V: Google App Engine
W: Microsoft Hyper-V

Apache
● Has consistently been the most popular server
● Highly configurable and extensible (compiled modules)
● Runs on many operating systems (primarily on Unix)
● SSL / TLS support
● Supports various authentication schemes
● Flexible URL rewriting and aliasing
● Virtual hosts
● Custom log files, etc.

Apache modules
● mod_access: access control based on client hostname or IP address
● mod_alias: mapping different parts of the host filesystem into the document tree, and URL redirection
● mod_auth_xxx: various user authentication approaches (file, dbm, form, etc.)
● mod_autoindex: automatic directory listings
● mod_cgi: execution of CGI scripts

Apache modules
● mod_include: server-parsed documents (SSI)
● mod_mime: determining document types from file extensions
● mod_proxy: caching proxy abilities
● mod_rewrite: powerful URI-to-filename mapping using regular expressions
● mod_usertrack: user tracking using cookies

Apache modules
● mod_ssl: provides strong cryptography via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols, with the help of the open source SSL/TLS toolkit OpenSSL
  – Since Apache 1.3 (1998)
  – Latest version: Apache 2.4 (since 2012)
  – Private and public keys
  – Thawte (thawte.com), Verisign (verisign.com)

Apache modules
● Third-party modules for server-side scripting:
  – mod_php: executes PHP within Apache
  – mod_python: executes Python within Apache
  – mod_ruby: executes Ruby within Apache
  – mod_jk: connects Tomcat with Apache
  – etc.

Compiling and installing Apache
./configure --enable-layout=Debian --enable-suexec
● --enable-layout=Debian: use the Debian-style directory layout
● --enable-suexec: allows you to set the uid and gid for spawned processes (CGI, SSI)
● --enable-MODULE=shared: compiles, installs and adds the module as a .so
● --disable-MODULE: some modules are compiled by default (e.g. autoindex, cgi) and have to be disabled explicitly
vs. installing a packaged module, e.g.
apt-get install <module>

Apache directory layout (Debian)
/etc/init.d/apache2          Apache control script
/etc/apache2/                Apache configuration files
/usr/bin/                    htpasswd, htdigest, htdbm
/usr/lib/apache2/modules/    Apache modules
/usr/lib/apache2/suexec      CGI wrapper
/usr/lib/cgi-bin/            Default directory for scripts
/var/www/                    Default Document Root
/var/log/apache2/            Log files (access.log, error.log)

Apache access log
LogFormat "%v %h %l %u %t \"%r\" %>s %b" common
CustomLog /usr/local/apache/logs/access_log common
● %v: virtual host
● %h: remote host
● %u: user
● %t: time
● %r: HTTP request
● %>s: status code
● %b: size
www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] "GET /index.php?m=5 HTTP/1.1" 200 32257

Apache error log
ErrorLog /usr/local/apache/logs/error_log
LogLevel warn
[Sun Nov 21 09:13:42 2004] [error] PHP Fatal error: Call to undefined function PN_DBMsgError() in /home/msaule/public_html/referer.php on line 85
[Sun Nov 21 12:41:09 2004] [error] [client 81.198.145.117] File does not exist: /home/sms/public_html/favicon.ico
[Sun Nov 21 13:02:50 2004] [error] [client 66.249.66.173] File does not exist: /home/code/public_html/robots.txt
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/_vti_bin/owssvr.dll
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/MSOffice/cltreq.asp

Configuring Apache
● Edit httpd.conf
● Check configuration: apachectl configtest
● Restart Apache
● Test changes
● http://httpd.apache.org/docs/

Configuring Apache
Virtual hosts
<VirtualHost *>
    ServerName www.jrt.lv
    ServerAlias www.jrt.com
    CustomLog /usr/local/apache/logs/jrt_access_log common
    ErrorLog /usr/local/apache/logs/jrt_error_log
    DocumentRoot /home/jrt/public_html
</VirtualHost>

Configuring Apache
.htaccess (directory-level, read on every request)
    AuthType Basic
    AuthUserFile /home/someuser/passwd
    AuthName "Admin"
    require valid-user
htpasswd
    htpasswd -c <password file> <username>
    user1:Y90u499mUj6xE
    user2:DOrWgcNwzaQUQ

Dynamic content
[Diagram: Browser, Web Server, Script Engine (PHP, Python, ...), Database Server (MySQL, ...); static content: HTML, PNG, CSS, ...]

LAMP
● Linux - Apache - MySQL - PHP
● The most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security
● apt-get install apache2 mysql-server php5 libapache2-mod-php5

MySQL
● Unix distributions are moving towards MariaDB after the acquisition of MySQL by Oracle
  – A MySQL fork led by the original developers of MySQL
● Fast relational DB implementation
● Fairly easy to use (for the app developer)
● Different storage engines
  – With or without transactions, memory-based, etc.
● Query caching
● User quotas

PHP
● One of the most popular programming languages for web applications
● Easy to learn (though prone to bad coding practices)
● Interpreted language
● Functions from Unix libraries and tools
● Huge number of ready-made applications, libraries and modules

Simple web app
● Create a database
● Using the MySQL command prompt, accessed by
  – $ mysql -u root -p
  – > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
  – > CREATE TABLE `posts` (...)
  – > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '...'
  – > GRANT ...
ON `example`.* TO 'example'@'localhost';
  – > INSERT INTO `posts` (`title`,`info`) VALUES ('a','a');

Simple web app
● Or be lazy and use a web interface like phpMyAdmin or Adminer
  – Download the single file adminer.php
  – Drop it into /var/www/
  – Navigate your browser to http://localhost/adminer.php
  – Do all the tasks in the browser without really knowing SQL

Simple web app
● Create the file example.php in /var/www/
● Write your HTML with PHP code inside
  – Connect to the database
  – Select data
  – Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result

Simple web app
● From http://localhost/example.php

Dynamic content
● Web servers cannot create dynamic content by themselves
● Two options for serving dynamic content
  – [Apache] modules
  – CGI / SSI, FastCGI, SCGI, WSGI, ...
● Potentially many programming languages
  – PHP, Perl, Python, Java, ...
  – C, C++, shell scripts, ...

CGI - Common Gateway Interface
● A standard environment for web servers to interface with external executable programs
  – Any script or binary executable
  – No additional libraries required
● For each request, the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the external program in the prepared environment
  – Sends POST data as standard input (GET data is passed in QUERY_STRING)
● Waits for standard output from the executed program, and returns it to the client
  – With additional HTTP headers

CGI environment variables
● REQUEST_METHOD: name of the HTTP method
● PATH_INFO: path suffix, if appended to the URL after the program name and a slash
● PATH_TRANSLATED: corresponding full path as supposed by the server, if PATH_INFO is present
● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi
● QUERY_STRING: the part of the URL after the ?
character (GET)
● REMOTE_HOST: host name of the client
● REMOTE_ADDR: IP address of the client (dot-decimal)
● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers
● A few more

CGI example
    #!/bin/bash
    # The header block ends with an empty line; everything after it is the body.
    echo "Content-type: text/plain"
    echo ""
    echo "Hello world!"
    echo "Today is: $(date)"

SSI – Server Side Includes
• Directives in HTML pages that are evaluated by the server while the pages are being served
  – Without having to serve the entire page via a CGI program
• Configure httpd.conf or .htaccess: Options +Includes
• Two ways to tell Apache which files should be parsed:
  – Parse any file with a particular file extension:
      AddType text/html .shtml
      AddOutputFilter INCLUDES .shtml
  – Parse files if they have the execute bit set:
      XBitHack on
      For existing files: chmod instead of changing the file name

SSI – Server Side Includes
• <!--#echo var="DATE_LOCAL" -->
• <!--#flastmod file="index.html" -->
• <!--#include virtual="/footer.html" -->
• <!--#include virtual="/cgi-bin/counter.pl" -->
• <!--#exec cmd="ls" -->
• Setting variables
• Conditional expressions
• A simple but Turing-complete programming language
  – Loops can be implemented via recursive redirects

CGI issues
● Each request forks a new process: a big overhead for process creation and destruction
● All scripts must be interpreted on each request: another overhead
  – May be reduced by using compiled CGI programs
● Not scalable
● Not suitable for the needs of modern web servers
● Still widely used in embedded systems (e.g. WiFi router web management consoles) that serve only occasional requests

FastCGI
● One or more persistent processes started (pre-forked)
● Web server communicates over Unix domain sockets or TCP
● Each process serves many requests
● Performance comparable to modules
● Facilitates reuse of resources (DB connections, in-memory caching, etc.)
● Separation of web server and dynamic content system
● Scalability: deploy processes across a server farm
● apt-get install libapache2-mod-fastcgi php5-fpm

Other communication methods
● Integrate the dynamic content generation system with the web server process (Apache modules)
● CGI derivatives
  – Simple Common Gateway Interface (SCGI): similar to FastCGI but designed to be easier to implement
● *SGI (web server gateway interfaces) implement a programming-language-specific method of communication between web server and applications
  – WSGI: Python, PSGI: Perl, Rack: Ruby
● Proxy requests to applications that implement communication via HTTP

C10K problem
● Dan Kegel, 1999
● Web servers should handle 10,000 clients simultaneously (not the same as 10K requests per second)
  – Operating system kernel limitations
  – Functionality provided by the operating system
  – Web server design flaws

C10K – OS kernel
● The open source nature of Unix kernels made it possible to quickly identify C10K bottlenecks and fix them
● Networking-related algorithms and data structures in Unix kernels were originally implemented with complexities of O(n), O(n^2), ... and were fixed to O(1) or O(n)
● As a result, the networking capabilities of Unix kernels are virtually limitless (limited only by hardware resources)

C10K – OS functionality
● Implemented new scalable I/O event notification mechanisms (epoll on Linux, kqueue on *BSD)
  – Better performance than traditional poll/select, e.g.
on a large number of file descriptors
  – Can receive all pending events using one system call
● AIO: the POSIX asynchronous I/O interface allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background)
  – The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all

C10K – web server design
● Non-blocking I/O for networking and disk
  – Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
  – Use all available CPU cores to achieve maximum concurrency; avoid locking data structures
● Each thread serves many requests
  – Don't create a thread per request; reuse threads, and process other requests while a non-blocking action completes

C10M problem
● 10 million concurrent connections per server
  – Doubling the CPU speed does not double the number of open connections
● Current Unix kernels can't handle that
  – Application thread locks in the kernel
  – Hardware drivers (NIC)
  – Memory management
● Solution: a new generation of high-load Unix kernels
  – 1 main application per server
  – Minimize the number of system calls
  – Minimize kernel work

nginx
• A C10K web server
  – Apache traditionally implements a process/thread-per-connection model
  – nginx does not create a new process/thread per connection (does not use the thread scheduler as a packet scheduler)
  – Typically, one single-threaded worker process per CPU
  – Each worker can asynchronously handle thousands of concurrent connections (handles the scheduling itself)
• Event-driven: an event is, e.g., a new connection
• Asynchronous: handles interaction for more than one connection at a time
• Non-blocking: does not stop disk I/O because the CPU is busy; works on other events until the I/O is freed up

nginx
● Efficient CPU usage
  – Fewer cores needed
● Small memory footprint per request
● High performance
  – Thousands of connections/requests
per second
● Often used as a front-end to high-load websites
  – Load balancing (reverse proxy), caching, etc.

High-load web systems
● Busy dynamic web sites cannot reside on one server
● Need some strategy for splitting the load across multiple web servers
● One possible strategy
  – One entry point, the front-end, which receives all requests and splits the load (e.g. nginx, Varnish)
  – Back-ends process the requests redirected from the front-end (e.g. nginx, Apache)

Varnish
● Proxy server
  – Reverse
  – Caching
  – Programmable
● Load balancer
● Dynamic content generator
● Tools: logging, debugging, monitoring
● Users: Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway

Varnish
● Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm
  – C + good C programmers
  – Exploits the advantages of the Unix architecture
● After tuning, tens of thousands of requests per second; 100k/s has been exceeded in testing
● A request-oriented, domain-specific configuration/programming language: VCL

Caching
● Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content
  – A low-end server can generate a couple of hundred such dynamic pages per second
● Any development framework makes dynamic page generation tens or hundreds of times slower still
  – Now only a few tens of requests per second
● Rough arithmetic: 100 x 100 = 10,000 times slower than a static page

Caching
● Ideally, dynamic content would be returned with performance similar to that of static pages
● Content that does not change significantly within a given time interval can be stored temporarily for reuse
● Hard disks are slow; good practice is to keep all cached content only in RAM or on the server's SSD
● A caching strategy has to be designed for each specific case, and it can be very subjective

Varnish caching
● Based on the request address (a full address or a regular expression), you can define which requests to cache, and how long to cache a particular element or whether to cache it at all
● Advertised as speeding up page delivery by hundreds to thousands of times, i.e. down to only about 10 times slower than static content
● Fast compared with other caching approaches

DSL VCL
● Simple syntax (similar to C), which is translated to C and then compiled to machine code
  – =, ==, !=, ~, !~, !, &&, ||, +, "string"
  – if () {} else {}, set, unset, return
● 9 subroutines, which are the different stages of processing each request and where the processing can be influenced
● Only predefined objects: client, server, req, bereq, beresp, obj, resp
    sub vcl_recv {
        if (req.request == "GET" && req.url ~ "\.js$") {
            return (lookup);
        }
    }

[Figure: VCL processing architecture]

Integration
● A fixed caching time may not be optimal
  – The content may change more often than the configured time: users get stale information
  – Or less often: servers do unnecessary work
● Solution: notify the server that the content must be refreshed
    acl purge { "192.168.0.0"/24; }
    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purge) {
                error 405 "Not allowed.";
            }
            return (lookup);
        }
    }
    sub vcl_hit {
        if (req.request == "PURGE") {
            purge;
            error 200 "Purged.";
        }
    }

Dynamic content generation: ESI
● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (e.g. "Hello, [Jānis Bērziņš], you have [0] new messages")
● We can load it after the page has loaded, using JSON, or generate the content with Varnish
    <TABLE><TR><esi:include src="sveiks.html"/></TR>
    <TR><TD><esi:include src="index.html"/></TD>
    <TD><esi:include src="article.html"/></TD></TR>
    </TABLE>
● Varnish parses the <esi> tags and assembles the elements; every element is configured and cached independently

Load balancing
● One address can be served by several back-ends
● Different URLs can be served by different back-ends
● Monitoring
  – Dead servers are disconnected (restart, upgrade, repair)
  – Revived servers are reconnected (new ones too)
● In practice this means that a pile of CHEAP desktop-grade hardware can be used for dynamic content generation
● If we add one more front-end, we get high but cheap fault tolerance
● If we use NoSQL or obtain a replicated database in some other way, expensive servers are not needed at all

Varnish usage in Latvia
$ curl -I www.tvnet.lv
    HTTP/1.1 200 OK
    Server: Apache
    Content-type: text/html; charset=utf-8
    Content-Length: 159097
    Date: Wed, 07 Nov 2012 20:20:58 GMT
    X-Varnish: 734492112 734450241
    Age: 58
    Via: 1.1 varnish
    Connection: keep-alive
$ curl -I www.delfi.lv
    HTTP/1.1 200 OK
    X-Fe-Node: nuffy
    Server: lighttpd/1.4.31 (PLD Linux)
    Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
    Expires: Wed, 07 Nov 2012 20:10:08 GMT
    Cache-Control: max-age=60
    Vary: Accept-Encoding
    Content-Type: text/html; charset=UTF-8
    Content-Length: 185924
    Date: Wed, 07 Nov 2012 20:10:15 GMT
    X-Varnish: 2025605055 2025545136
    Age: 67
    Via: 1.1 varnish
    Connection: keep-alive

The current situation
● The standard web development solution is an HTTP server plus some classic dynamic content generation system (PHP, ASP, Python, etc.); it has problems with:
  – Long-running requests and persistent connections
  – The number of clients that can be served simultaneously
  – Compatibility with other technologies
  – Prospects for future development

Event-driven programming frameworks
● Neither the idea nor the implementations are new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Not widely used in web solutions
● Solve the problems of the standard technologies
● Reactor design pattern, the C10K problem
● Let web programmers build network solutions
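
The reactor idea behind these frameworks, and behind the C10K web server design described earlier, can be sketched in a few lines. The following is a minimal illustration, not the implementation of any of the frameworks named above: it uses Python's standard selectors module (which picks epoll or kqueue where available), and the names Reactor, listen and run_once are hypothetical, chosen only for this example. One thread waits for readiness events on many non-blocking sockets and dispatches each event to a callback.

```python
import selectors
import socket


class Reactor:
    """Single-threaded event loop: one selector, many non-blocking sockets.

    Hypothetical example class, not part of any library.
    """

    def __init__(self):
        # DefaultSelector uses epoll on Linux and kqueue on *BSD when available.
        self.selector = selectors.DefaultSelector()

    def listen(self, host="127.0.0.1", port=0):
        """Start listening; port=0 lets the OS pick a free port, which is returned."""
        server = socket.socket()
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        server.setblocking(False)
        # The callback to run on readiness is stored as the key's data.
        self.selector.register(server, selectors.EVENT_READ, self._accept)
        return server.getsockname()[1]

    def _accept(self, server):
        # A readable listening socket means a new connection is waiting.
        conn, _addr = server.accept()
        conn.setblocking(False)
        self.selector.register(conn, selectors.EVENT_READ, self._respond)

    def _respond(self, conn):
        # Simplification: assume the whole request arrives in one read and
        # that the small response fits into the socket send buffer.
        request = conn.recv(4096)
        if request:
            body = b"Hello from the reactor\n"
            conn.sendall(
                b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (len(body), body)
            )
        self.selector.unregister(conn)
        conn.close()

    def run_once(self, timeout=1.0):
        """Wait for pending events (one system call) and dispatch their callbacks."""
        for key, _mask in self.selector.select(timeout):
            key.data(key.fileobj)
```

A server built on this sketch would call listen() once and then run_once() in an endless loop. No thread is created per connection; each connection costs only a registered file descriptor, which is the property that lets event-driven servers hold many clients at once.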