UNIX web servers

Web Servers
Guntis Bārzdiņš
Artūrs Lavrenovs
Normunds Grūzītis
What a basic web server does
● Implements the HTTP protocol
● Listens for HTTP requests from clients (e.g. browsers)
● Tries to fulfill them with static content from the file system
  – A web server itself serves only static files
● Receives content from clients (e.g. via HTML forms, incl. uploading of files)
● Forwards dynamic content requests for external execution
● Does other useful tasks via extension modules
Web server market share
[Chart: web server market share over time, with annotated events]
F: Apache 1.1, modules supported
H: Apache supports HTTP/1.1 virtual hosting
I: Microsoft IIS/4.0 and Active Server Pages
M: Apache 2.0
Q: Microsoft .NET framework
N, O, R: Code Red worm, Nimda worm, SQL Slammer worm
V: Google App Engine
W: Microsoft Hyper-V
Apache
● Has long been the most popular web server
● Highly configurable and extensible (compiled modules)
● Runs on many operating systems (primarily on Unix)
● SSL/TLS support
● Supports various authentication schemes
● Flexible URL rewriting and aliasing
● Virtual hosts
● Custom log files, etc.
Apache modules
● mod_access
  – Access control based on client hostname or IP address
● mod_alias
  – Mapping different parts of the host filesystem into the document tree, and URL redirection
● mod_auth_xxx
  – Various user authentication approaches (file, dbm, form, etc.)
● mod_autoindex
  – Automatic directory listings
● mod_cgi
  – Execution of CGI scripts
Apache modules
● mod_include
  – Server-parsed documents (SSI)
● mod_mime
  – Determining document types using file extensions
● mod_proxy
  – Caching proxy abilities
● mod_rewrite
  – Powerful URI-to-filename mapping using regular expressions
● mod_usertrack
  – User tracking using cookies
Apache modules
● mod_ssl
  – Provides strong cryptography via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols with the help of the open-source SSL/TLS toolkit OpenSSL
  – Since Apache 1.3+ (1998)
  – Latest version: Apache 2.4 (since 2012)
  – Private and public keys
  – Certificates from certificate authorities such as Thawte (thawte.com) and Verisign (verisign.com)
Apache modules
● Third-party modules for server-side scripting:
  – mod_php: executes PHP within Apache
  – mod_python: executes Python within Apache
  – mod_ruby: executes Ruby within Apache
  – mod_jk: connects Tomcat with Apache
  – etc.
Compiling and installing Apache
● ./configure
  – --enable-layout=Debian: use the Debian-style directory layout
  – --enable-suexec: allows you to set the uid and gid for spawned processes (CGI, SSI)
  – --disable-MODULE: some modules are compiled by default (e.g. autoindex, cgi) and have to be disabled explicitly
  – --enable-MODULE=shared: compiles, installs and adds the module as a shared object (.so)
● Alternatively, install prebuilt packages, e.g. apt-get install <module>
Apache directory layout
● Debian
  – /etc/init.d/apache2: Apache control script
  – /etc/apache2/: Apache configuration files
  – /usr/lib/apache2/modules/: Apache modules
  – /usr/lib/apache2/suexec: CGI wrapper
  – /usr/bin/: htpasswd, htdigest, htdbm
  – /usr/lib/cgi-bin/: default directory for scripts
  – /var/www/: default document root
  – /var/log/apache2/: log files (access.log, error.log)
Apache access log
LogFormat "%v %h %l %u %t \"%r\" %>s %b" common
CustomLog /usr/local/apache/logs/access_log common
%v – virtual host
%h – remote host
%l – remote logname (from identd; usually "-")
%u – remote user (from authentication)
%t – time
%r – HTTP request (first line)
%>s – status code (final)
%b – response size in bytes, excluding HTTP headers
www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] "GET /index.php?m=5 HTTP/1.1" 200 32257
Apache error log
ErrorLog /usr/local/apache/logs/error_log
LogLevel warn
[Sun Nov 21 09:13:42 2004] [error] PHP Fatal error: Call to undefined function PN_DBMsgError() in /home/msaule/public_html/referer.php on line 85
[Sun Nov 21 12:41:09 2004] [error] [client 81.198.145.117] File does not exist: /home/sms/public_html/favicon.ico
[Sun Nov 21 13:02:50 2004] [error] [client 66.249.66.173] File does not exist: /home/code/public_html/robots.txt
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/_vti_bin/owssvr.dll
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/MSOffice/cltreq.asp
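Both logs are plain text, so they are easy to process with scripts. As an illustration, here is a small Python sketch that parses access-log lines written with the LogFormat shown earlier (`%v %h %l %u %t "%r" %>s %b`). The regular expression and the `AccessEntry` type are our own assumptions, not an Apache-provided parser, and the sketch deliberately ignores corner cases such as escaped quotes inside `%r`.

```python
import re
from typing import NamedTuple, Optional

# One named group per LogFormat field: %v %h %l %u %t "%r" %>s %b
LOG_LINE = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

class AccessEntry(NamedTuple):
    vhost: str
    host: str
    logname: str
    user: str
    time: str
    request: str
    status: int
    size: int  # %b logs "-" when no body was sent; stored here as 0

def parse_access_line(line: str) -> Optional[AccessEntry]:
    """Return the parsed fields, or None if the line does not match the format."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    f = m.groupdict()
    return AccessEntry(
        f["vhost"], f["host"], f["logname"], f["user"], f["time"], f["request"],
        int(f["status"]), 0 if f["size"] == "-" else int(f["size"]),
    )

entry = parse_access_line(
    'www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] '
    '"GET /index.php?m=5 HTTP/1.1" 200 32257'
)
print(entry.vhost, entry.host, entry.request, entry.status, entry.size)
# www.atlants.lv 159.148.85.46 GET /index.php?m=5 HTTP/1.1 200 32257
```

Once lines are parsed, simple statistics (requests per virtual host, 404 counts, bytes served) are a few lines of ordinary code.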
Configuring Apache
● Edit httpd.conf
● Check the configuration: apachectl configtest
● Restart Apache
● Test the changes
● Documentation: http://httpd.apache.org/docs/
Configuring Apache
● Virtual hosts
<VirtualHost *>
    ServerName www.jrt.lv
    ServerAlias www.jrt.com
    CustomLog /usr/local/apache/logs/jrt_access_log common
    ErrorLog /usr/local/apache/logs/jrt_error_log
    DocumentRoot /home/jrt/public_html
</VirtualHost>
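Configuring Apache
● .htaccess (directory-level, read on every request; requires AllowOverride AuthConfig in the server configuration)
AuthType Basic
AuthUserFile /home/someuser/passwd
AuthName "Admin"
Require valid-user
● htpasswd (-c creates a new password file, overwriting an existing one)
htpasswd -c <password file> <username>
user1:Y90u499mUj6xE
user2:DOrWgcNwzaQUQ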
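With AuthType Basic, the browser sends the user name and password in an `Authorization` header that is only Base64-encoded (RFC 7617), not encrypted, which is why Basic authentication should be combined with mod_ssl/TLS. A minimal Python sketch of both sides of that header; the function names are ours, and in Apache the check itself is done by the auth modules against the htpasswd file, not by code like this.

```python
import base64
import binascii
from typing import Optional, Tuple

def basic_auth_header(user: str, password: str) -> str:
    """Value of the Authorization header a browser sends for AuthType Basic."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Recover (user, password) from the header; None if it is not valid Basic auth."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")   # user names may not contain ':'
    return (user, password) if sep else None

print(basic_auth_header("user1", "secret"))          # Basic dXNlcjE6c2VjcmV0
print(parse_basic_auth("Basic dXNlcjE6c2VjcmV0"))    # ('user1', 'secret')
```

Anyone who can read the traffic can reverse the encoding just as easily, so the password is effectively sent in clear text without TLS.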
Dynamic content
[Diagram: Browser ↔ Web Server ↔ Script Engine (PHP, Python, ...) ↔ Database Server (MySQL, ...); content types: HTML, PNG, CSS, ...]
LAMP
● Linux - Apache - MySQL - PHP
● The most common web server stack
● Simple to install and configure
● Web applications are simple to develop
● Acceptable performance and security
● apt-get install apache2 mysql-server php5 libapache2-mod-php5
MySQL
● Unix distributions are moving towards MariaDB after the acquisition of MySQL by Oracle
  ● A MySQL fork led by the original developers of MySQL
● Fast relational DB implementation
● Fairly easy to use (for app developers)
● Different storage engines
  ● With or without transactions, memory-based, etc.
● Query caching
● User quotas
PHP
● One of the most popular programming languages for web applications
● Easy to learn (though it tends to encourage bad coding practices)
● Interpreted language
● Provides functions from Unix libraries and tools
● A huge number of ready-made applications, libraries and modules
Simple web app
● Create a database
  – Using the MySQL command prompt, accessed by:
$ mysql -u root -p
> CREATE DATABASE `example` COLLATE 'utf8_general_ci';
> CREATE TABLE `posts` (...);
> CREATE USER 'example'@'localhost' IDENTIFIED BY '...';
> GRANT ... ON `example`.* TO 'example'@'localhost';
> INSERT INTO `posts` (`title`,`info`) VALUES ('a','a');
Simple web app
● Or be lazy and use a web interface like phpMyAdmin or Adminer
  – Download the single file adminer.php
  – Drop it into /var/www/
  – Navigate your browser to http://localhost/adminer.php
  – Do all the tasks in the browser without really knowing SQL
Simple web app
● Create the file example.php in /var/www/
● Write your HTML with PHP code inside
  – Connect to the database
  – Select data
  – Show data
● Your simple website is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result
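The three steps (connect, select, show) can be sketched in Python as well. To keep the example self-contained it swaps MySQL for the standard-library `sqlite3` module with an in-memory database, so the connection details are an assumption; with MySQL you would use a driver such as mysqlclient or PyMySQL instead. Note the `html.escape` calls: data from the database should be escaped before it is put into HTML.

```python
import html
import sqlite3

def render_posts(conn: sqlite3.Connection) -> str:
    """Select all posts and show them as an HTML list, escaping the stored data."""
    rows = conn.execute("SELECT title, info FROM posts ORDER BY rowid").fetchall()
    items = "\n".join(
        f"  <li><b>{html.escape(title)}</b>: {html.escape(info)}</li>" for title, info in rows
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"

# Connect (in-memory SQLite standing in for the `example` MySQL database)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")
conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("a", "a"))
conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("<script>", "is escaped"))
page = render_posts(conn)
print(page)
```

The placeholders (`?`) in the INSERT statements also keep user-supplied values out of the SQL text itself.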
Simple web app
● From http://localhost/example.php
Dynamic content
● Web servers cannot create dynamic content by themselves
● Two options for serving dynamic content
  – [Apache] modules
  – CGI / SSI, FastCGI, SCGI, WSGI, ...
● Potentially many programming languages
  – PHP, Perl, Python, Java, ...
  – C, C++, shell scripts, ...
CGI - Common Gateway Interface
● A standard environment for web servers to interface with external executable programs
  ● Any script or binary executable
  ● No additional libraries required
● For each request, the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the external program in the prepared environment
● Sends the request body (POST data) to the program's standard input; GET parameters arrive in the QUERY_STRING variable
● Waits for the program's standard output and returns it to the client
  ● With additional HTTP headers
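The server side of these steps can be sketched in Python: build the environment, start a fresh process, feed the body on standard input, then split the program's output into headers and body at the first blank line. This is a simplification of what a module like mod_cgi does, assuming a Unix-like system; the echoing CGI program and its `/cgi-bin/echo.cgi` name are hypothetical, written in Python only so the demo is self-contained.

```python
import os
import subprocess
import sys
from typing import Dict, Tuple

# Hypothetical CGI program. Any executable (a bash script, a compiled C program, ...)
# following the same rules would do.
ECHO_CGI = r'''
import os, sys
length = int(os.environ.get("CONTENT_LENGTH") or 0)
body = sys.stdin.read(length)
print("Content-Type: text/plain")
print()
print("method=" + os.environ["REQUEST_METHOD"])
print("query=" + os.environ.get("QUERY_STRING", ""))
print("body=" + body)
'''

def run_cgi(method: str, query: str, body: bytes = b"") -> Tuple[int, Dict[str, str], bytes]:
    """Roughly what a web server does for ONE CGI request."""
    env = {                                   # variables derived from the request and server config
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "SCRIPT_NAME": "/cgi-bin/echo.cgi",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,                # GET parameters arrive here, not on stdin
        "CONTENT_LENGTH": str(len(body)),
        "PATH": os.environ.get("PATH", ""),
    }
    # A fresh process is started (and later destroyed) for every single request.
    proc = subprocess.run([sys.executable, "-c", ECHO_CGI], input=body, env=env,
                          stdout=subprocess.PIPE, check=True)
    out = proc.stdout
    sep = b"\r\n\r\n" if b"\r\n\r\n" in out else b"\n\n"   # a blank line ends the headers
    head, _, payload = out.partition(sep)
    headers: Dict[str, str] = {}
    for line in head.decode("latin-1").splitlines():
        if line:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
    status = int(headers.pop("Status", "200").split()[0])   # optional CGI "Status:" header
    return status, headers, payload

status, headers, payload = run_cgi("POST", "a=1", b"x=2")
print(status, headers, payload)
# 200 {'Content-Type': 'text/plain'} b'method=POST\nquery=a=1\nbody=x=2\n'
```

The real server would then add its own headers (Date, Server, ...) before sending the response to the client.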
CGI environment variables
● REQUEST_METHOD: name of the HTTP method
● PATH_INFO: path suffix, if appended to the URL after the program name and a slash
● PATH_TRANSLATED: the corresponding full filesystem path as computed by the server, if PATH_INFO is present
● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi
● QUERY_STRING: the part of the URL after the ? character (GET)
● REMOTE_HOST: host name of the client
● REMOTE_ADDR: IP address of the client (dot-decimal)
● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers
● A few more, e.g. CONTENT_TYPE, CONTENT_LENGTH, SERVER_NAME, SERVER_PORT, GATEWAY_INTERFACE
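On the program side, a CGI script only has to read these variables (and, for POST, CONTENT_LENGTH bytes of standard input) and print headers, a blank line and the body. A slightly fuller sketch in Python; the `name` parameter and the page it produces are hypothetical, and `handle` is kept separate from the I/O so it can be tried without a web server.

```python
import html
import os
import sys
from typing import Mapping
from urllib.parse import parse_qs

def handle(environ: Mapping[str, str], body: bytes = b"") -> str:
    """Build a CGI response from the environment variables (plus the POST body)."""
    method = environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(environ.get("QUERY_STRING", ""))      # GET parameters
    if method == "POST":
        params.update(parse_qs(body.decode("utf-8")))         # form data read from stdin
    name = params.get("name", ["world"])[0]
    client = environ.get("REMOTE_ADDR", "unknown")
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    return (
        "Content-Type: text/html; charset=utf-8\n"
        "\n"                                                   # blank line ends the headers
        f"<p>Hello, {html.escape(name)}!</p>\n"
        f"<p>{html.escape(method)} from {html.escape(client)} using {html.escape(agent)}</p>\n"
    )

if __name__ == "__main__":
    # When started by the web server: read exactly CONTENT_LENGTH bytes of the body.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length else b""
    sys.stdout.write(handle(os.environ, body))
```

Escaping every value taken from the request matters here, because the script writes it straight into HTML.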
CGI example
#!/bin/bash
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is:" `date`
SSI – Server Side Includes
• Directives in HTML pages that are evaluated by the server while the pages are being served
• Add dynamic parts without having to serve the entire page via a CGI program
• Configure httpd.conf or .htaccess: Options +Includes
• Two ways to tell Apache which files should be parsed:
  • Parse any file with a particular file extension:
    • AddType text/html .shtml
    • AddOutputFilter INCLUDES .shtml
  • Parse files if they have the execute bit set:
    • XBitHack on
    • For existing files: chmod instead of changing the file name
SSI – Server Side Includes
• <!--#echo var="DATE_LOCAL" -->
• <!--#flastmod file="index.html" -->
• <!--#include virtual="/footer.html" -->
• <!--#include virtual="/cgi-bin/counter.pl" -->
• <!--#exec cmd="ls" -->
• Setting variables
• Conditional expressions
• A simple but Turing-complete programming language
  • Loops can be implemented via recursive redirects
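To show what "evaluated by the server while the page is being served" means, here is a toy SSI processor in Python. It is a hypothetical sketch, not Apache's mod_include: it supports only `echo var` and `include virtual`, looks up included files in a dictionary instead of issuing subrequests, and caps include nesting (real recursion is what makes the loops mentioned above possible). It leaves `exec` untouched; in Apache that directive can be disabled with `Options +IncludesNOEXEC`.

```python
import re
from typing import Dict

# Matches directives of the form <!--#directive attr="value" -->
SSI_DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')

def process_ssi(page: str, variables: Dict[str, str], files: Dict[str, str], depth: int = 0) -> str:
    """Toy SSI processor supporting only 'echo var' and 'include virtual'."""
    if depth > 10:
        raise RecursionError("SSI includes nested too deeply")

    def expand(match: "re.Match[str]") -> str:
        directive, attr, value = match.groups()
        if directive == "echo" and attr == "var":
            return variables.get(value, "(none)")      # undefined variables print "(none)"
        if directive == "include" and attr == "virtual":
            return process_ssi(files[value], variables, files, depth + 1)
        return match.group(0)                          # flastmod, exec, ... left untouched here

    return SSI_DIRECTIVE.sub(expand, page)

files = {"/footer.html": '<footer>Served <!--#echo var="DATE_LOCAL" --></footer>'}
page = '<h1>News</h1>\n<!--#include virtual="/footer.html" -->'
print(process_ssi(page, {"DATE_LOCAL": "2004-11-21"}, files))
# <h1>News</h1>
# <footer>Served 2004-11-21</footer>
```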
CGI issues
● Each request forks a new process: a big overhead for process creation and destruction
● All scripts must be interpreted on each request: another overhead
  ● May be reduced by using compiled CGI programs
● Not scalable
● Does not meet the needs of modern web servers
● Still widely used in embedded systems (e.g. WiFi router web management consoles) that serve only occasional requests
FastCGI
● One or more persistent processes are started in advance (pre-forked)
● The web server communicates with them over Unix domain sockets or TCP
● Each process serves many requests
● Performance comparable to modules
● Facilitates reuse of resources (DB connections, in-memory caching, etc.)
● Separation of the web server and the dynamic content system
● Scalability: processes can be deployed across a server farm
● apt-get install libapache2-mod-fastcgi php5-fpm
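Instead of environment variables plus the stdin/stdout of a new process, FastCGI sends the same information as records over a socket to a long-running process such as php-fpm. The Python sketch below encodes the byte stream for one request, following our reading of the FastCGI 1.0 specification (8-byte record header, length-prefixed name-value pairs, empty records closing the PARAMS and STDIN streams). It only builds bytes and does not talk to a real php-fpm, and it assumes each record's content stays under 64 KiB.

```python
import struct
from typing import Dict

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1
FCGI_KEEP_CONN = 1          # flag: keep the connection open for further requests

# version, type, requestId, contentLength, paddingLength, reserved (big-endian)
HEADER = struct.Struct("!BBHHBx")

def record(rec_type: int, request_id: int, content: bytes = b"") -> bytes:
    """One FastCGI record; content is padded to a multiple of 8 bytes."""
    padding = -len(content) % 8
    return (HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding)
            + content + b"\x00" * padding)

def encode_length(n: int) -> bytes:
    # Lengths below 128 take one byte; longer ones take four bytes with the top bit set.
    return struct.pack("!B", n) if n < 128 else struct.pack("!I", n | 0x80000000)

def encode_params(params: Dict[str, str]) -> bytes:
    out = bytearray()
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return bytes(out)

def encode_request(request_id: int, env: Dict[str, str], body: bytes = b"") -> bytes:
    """Bytes a web server would write to a FastCGI process for one request."""
    begin_body = struct.pack("!HB5x", FCGI_RESPONDER, FCGI_KEEP_CONN)
    stream = record(FCGI_BEGIN_REQUEST, request_id, begin_body)
    stream += record(FCGI_PARAMS, request_id, encode_params(env))  # the CGI-style variables
    stream += record(FCGI_PARAMS, request_id)                      # empty record: end of params
    if body:
        stream += record(FCGI_STDIN, request_id, body)             # request body, like CGI stdin
    stream += record(FCGI_STDIN, request_id)                       # empty record: end of stdin
    return stream

stream = encode_request(1, {"SCRIPT_FILENAME": "/var/www/example.php", "REQUEST_METHOD": "GET"})
print(len(stream), stream[:8].hex())   # 96 0101000100080000
```

Because the process stays alive and the request ID travels in every record, one FastCGI process can serve request after request without being restarted.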
Other communication methods
● Integrate the dynamic content generation system into the web server process (Apache modules)
● CGI derivatives
  ● *SGI (web server gateway interfaces) implement a programming-language-specific method of communication between the web server and applications
    – WSGI – Python, PSGI – Perl, Rack – Ruby
  ● Simple Common Gateway Interface (SCGI): similar to FastCGI but designed to be easier to implement
● Proxy requests to applications that implement communication via HTTP
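WSGI (PEP 3333) is the Python member of this family: the server calls an `application(environ, start_response)` callable, where `environ` carries CGI-style variables such as REQUEST_METHOD and PATH_INFO. A minimal sketch; the `call_app` helper is our own stand-in that plays the server's role without any network, and its `start_response` stub omits the `write` callable a real server returns.

```python
from typing import Callable, Iterable, List, Tuple

def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """A WSGI application: CGI-like environ in, status + headers + body out."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path} via {environ.get('REQUEST_METHOD', 'GET')}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app: Callable, method: str, path: str) -> Tuple[str, List[Tuple[str, str]], bytes]:
    """Play the server's role: build environ, capture status/headers, join the body."""
    captured = {}

    def start_response(status, headers, exc_info=None):   # minimal stub
        captured["status"], captured["headers"] = status, headers

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(application, "GET", "/hello")
print(status, body)   # 200 OK b'Hello from /hello via GET\n'
```

To serve it for real, the standard library's reference server is enough for testing (`wsgiref.simple_server.make_server("", 8000, application).serve_forever()`); behind Apache, a module such as mod_wsgi plays the server role.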
C10K problem
● Dan Kegel, 1999
● Web servers should handle 10,000 clients simultaneously (not the same as 10,000 requests per second)
● Operating system kernel limitations
● Functionality provided by the operating system
● Web server design flaws
C10K – OS kernel
● The open-source nature of Unix kernels made it possible to quickly identify the C10K bottlenecks and fix them
● Networking-related algorithms and data structures in Unix kernels were originally implemented with complexities such as O(n) or O(n²), which were reduced to O(1) or O(n)
● As a result, the networking capabilities of Unix kernels are practically limited only by hardware resources
C10K – OS functionality
● New scalable I/O event notification mechanisms were implemented (epoll – Linux, kqueue – *BSD)
  – Better performance than the traditional poll/select, e.g. with a large number of file descriptors
  – All pending events can be received with one system call
● AIO (the POSIX asynchronous I/O interface) allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background)
  ● The application can choose to be notified of the completion of an I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all
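Python's `selectors` module wraps exactly these mechanisms: `DefaultSelector` picks epoll on Linux and kqueue on BSD/macOS, falling back to poll/select elsewhere. The sketch registers many idle connections, makes only a few of them readable, and shows that a single `select()` call reports all the ready ones. Socket pairs stand in for client connections, and the connection counts are arbitrary choices for the demo.

```python
import selectors
import socket
from typing import Iterable, List

def ready_after_one_call(n_conns: int, senders: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Register n_conns idle connections, make `senders` readable, and return the
    indices reported by a SINGLE select() call."""
    sel = selectors.DefaultSelector()          # epoll on Linux, kqueue on *BSD/macOS
    pairs = [socket.socketpair() for _ in range(n_conns)]
    try:
        for i, (server_end, _client_end) in enumerate(pairs):
            server_end.setblocking(False)
            sel.register(server_end, selectors.EVENT_READ, data=i)
        for i in senders:
            pairs[i][1].sendall(b"x")            # the "client" sends a request byte
        ready = []
        for key, _mask in sel.select(timeout=timeout):   # one system call for all of them
            key.fileobj.recv(1)                  # data is already there: does not block
            ready.append(key.data)
        return sorted(ready)
    finally:
        sel.close()
        for a, b in pairs:
            a.close()
            b.close()

with selectors.DefaultSelector() as s:
    print("selector in use:", type(s).__name__)
print(ready_after_one_call(200, [3, 50, 199]))   # [3, 50, 199]
```

With plain select() the kernel would have to scan all registered descriptors on every call; epoll and kqueue keep the interest list in the kernel and return only the ready ones.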
C10K – web server design
● Non-blocking I/O for networking and disk
  – Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
  – Use all available CPU cores to achieve maximum concurrency; avoid locking data structures
● Each thread serves many requests
  – Don't create a thread per request; reuse threads, and while a non-blocking action completes, process other requests
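A Python sketch of "reuse threads instead of creating one per request": a fixed pool (sized to the CPU count by default) serves many requests, and we count how many distinct threads actually did the work. Caveat: in CPython the global interpreter lock keeps pure-Python code from running on several cores at once, so this only illustrates the reuse pattern, not the CPU scaling a server written in C gets. `handle_request` is a hypothetical placeholder for real work.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Tuple

def handle_request(request_id: int) -> Tuple[int, int]:
    # Stand-in for real request work; reports which worker thread served it.
    return request_id, threading.get_ident()

def serve(n_requests: int, n_workers: Optional[int] = None) -> Tuple[int, int]:
    workers = n_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:   # threads created once, then reused
        results = list(pool.map(handle_request, range(n_requests)))
    served = len(results)
    distinct_threads = len({tid for _, tid in results})
    return served, distinct_threads

served, threads = serve(1000, n_workers=4)
print(f"{served} requests served by {threads} thread(s)")
```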
C10M problem
● 10 million concurrent connections per server
● Current Unix kernels can't handle that
  – Application thread locks in the kernel
  – Hardware drivers (NIC)
  – Memory management
● Doubling the CPU speed does not double the number of open connections
● Solution: a new generation of high-load Unix kernels
  – 1 main application per server
  – Minimize the number of system calls
  – Minimize kernel work
nginx
● A C10K web server
● Apache (with its classic prefork/worker MPMs) uses a process- or thread-per-connection model
● nginx does not create a new process/thread per connection (it does not use the OS thread scheduler as a packet scheduler)
● Typically, one single-threaded worker process per CPU core
● Each worker can asynchronously handle thousands of concurrent connections (it does the scheduling itself)
  • Event-driven: reacts to events such as a new connection or data arriving on a socket
  • Asynchronous: handles interactions with more than one connection at a time
  • Non-blocking: does not stall while waiting for disk or network I/O; works on other events until the I/O completes
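nginx itself is written in C, but the same event-driven model can be sketched with Python's asyncio: one thread, one event loop, and each connection is a lightweight coroutine rather than its own process or thread. The echo protocol and the 100 clients are assumptions chosen to keep the demo small and within default file-descriptor limits; it models the idea, not nginx's internals.

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Runs on the single event-loop thread; each await yields to other connections.
    line = await reader.readline()
    writer.write(b"echo: " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port: int, i: int) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"hello {i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main(n_clients: int = 100):
    # Port 0 lets the OS choose a free port; backlog sized for the burst of connects.
    server = await asyncio.start_server(handle, "127.0.0.1", 0, backlog=n_clients)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(client(port, i) for i in range(n_clients)))

replies = asyncio.run(main())
print(len(replies), replies[0])   # 100 b'echo: hello 0\n'
```

All 100 conversations overlap in time, yet no thread is ever blocked waiting for a particular socket: the loop simply resumes whichever coroutine has data ready.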
nginx
● Efficient CPU usage
● Fewer cores needed
● Small memory footprint per request
● High performance
● Thousands of connections/requests per second
● Often used as the front end for high-load websites
  ● Load balancing (reverse proxy), caching, etc.
High-load web systems
● Busy dynamic websites cannot reside on a single server
● A strategy is needed for splitting the load across multiple web servers
● One possible strategy
  – One entry point, the front end, which receives all requests and splits the load (e.g. nginx, Varnish)
  – Back ends process the requests forwarded from the front end (e.g. nginx, Apache)
Varnish
● Proxy server
  – Reverse
  – Caching
  – Programmable
● Load balancer
● Dynamic content generator
● Tools: logging, debugging, monitoring
● Users: Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway
Varnish
● Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm
  ● After tuning, tens of thousands of requests per second; more than 100k/s reached in testing
● Written in C, by good C programmers
● Takes advantage of the Unix architecture
● VCL: a request-oriented, domain-specific configuration/programming language
Caching
● Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content
  ● A low-end server can generate a couple of hundred such dynamic pages per second
● Any development framework makes dynamic page generation tens or hundreds of times slower still
  ● Only a few dozen requests per second
● Rough math: 100 × 100 = 10,000 times slower than a static page
Caching
● Ideally, dynamic content would be returned with performance similar to that of static pages
● Content that does not change significantly within a given time interval can be stored temporarily for reuse
● Hard disk access is slow; good practice is to keep all cached content only in RAM or on the server's SSD
● A caching strategy has to be designed for each specific case, and it can be very subjective
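A minimal in-memory (RAM) cache with a fixed time-to-live shows the idea: the expensive render function runs only when the stored copy is missing or older than the TTL. The class, the injected clock (used so the demo can fake the passage of time) and the 60-second TTL are illustrative assumptions, not how Varnish is implemented.

```python
import time
from typing import Callable, Dict, Tuple

class TTLCache:
    """Tiny in-memory cache: each stored page is reused until `ttl` seconds have passed."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock                       # injectable so the demo can fake time
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:   # still fresh
            self.hits += 1
            return entry[1]
        self.misses += 1
        page = render()                          # the slow dynamic generation
        self._store[key] = (now + self.ttl, page)
        return page

    def purge(self, key: str) -> None:
        """Drop an entry early, e.g. when the underlying content has changed."""
        self._store.pop(key, None)

fake_now = [0]
renders = []

def render_index() -> str:
    renders.append(fake_now[0])
    return f"<html>generated at t={fake_now[0]}</html>"

cache = TTLCache(ttl=60, clock=lambda: fake_now[0])
for t in (0, 10, 59, 61, 100):
    fake_now[0] = t
    cache.get_or_render("/index.php", render_index)
print(cache.hits, cache.misses, renders)   # 3 2 [0, 61]

cache.purge("/index.php")                  # content changed: drop it before the TTL runs out
cache.get_or_render("/index.php", render_index)
print(cache.misses, renders)               # 3 [0, 61, 100]
```

Five requests cost only two page generations; the TTL is exactly the trade-off the slide calls subjective, since a longer one saves more work but serves staler content.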
Varnish caching
● Based on the request address (exact match or regular expression), you can specify which requests to cache and how long to cache a particular element, or not to cache it at all
● Advertised as speeding up page delivery by hundreds to thousands of times, i.e. to only about 10 times slower than static content
  ● Fast compared with other caching approaches
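Per-URL rules of this kind can be sketched as an ordered list of regular expressions mapped to a TTL, where 0 means "do not cache". The patterns, TTL values and 120-second default below are invented for illustration; in Varnish 3 itself such decisions are written in VCL, for example by matching `req.url` with `~` and setting `beresp.ttl` in `vcl_fetch`.

```python
import re
from typing import List, Tuple

# Hypothetical rules, checked top to bottom: (regex on the URL path, TTL in seconds; 0 = do not cache)
RULES: List[Tuple[re.Pattern, int]] = [
    (re.compile(r"^/admin/"), 0),                       # never cache the admin area
    (re.compile(r"\.(css|js|png|jpe?g|gif)$"), 86400),  # static assets: one day
    (re.compile(r"^/news/"), 60),                       # fast-changing pages: one minute
]

def ttl_for(url: str, default: int = 120) -> int:
    path = url.split("?", 1)[0]        # match on the path, ignoring the query string
    for pattern, ttl in RULES:
        if pattern.search(path):
            return ttl
    return default

for url in ("/admin/login", "/static/site.css?v=3", "/news/today", "/index.php?m=5"):
    print(url, ttl_for(url))
```

Because the first matching rule wins, order matters: an image under `/news/` gets the one-day asset TTL, not the one-minute news TTL.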
VCL, a domain-specific language
● Simple syntax (similar to C) that is translated to C and then compiled to machine code
  ● =, ==, !=, ~, !~, !, &&, ||, +, "string"
  ● if () {} else {}, set, unset, return
● 9 subroutines, corresponding to the different stages of processing each request, in which you can influence the outcome
● Only predefined objects: client, server, req, bereq, beresp, obj, resp
●
sub vcl_recv {
    # Varnish 3 syntax; Varnish 4+ uses req.method and return (hash)
    if (req.request == "GET" && req.url ~ "\.js$") {
        return (lookup);
    }
}
VCL processing architecture
Integration
● A fixed caching time may not be optimal
  ● Content may change more often than the configured time: users get stale information
  ● Or less often: the servers do unnecessary work
● Solution: notify the cache server that the content must be refreshed
acl purge { "192.168.0.0"/24; }
sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) { error 405 "Not allowed."; }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") { purge; error 200 "Purged."; }
}
# Varnish 3: handle PURGE on a cache miss as well
sub vcl_miss {
    if (req.request == "PURGE") { purge; error 200 "Purged."; }
}
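On the application side, "notify the server" usually means sending a PURGE request for the changed URL, from a host that the purge ACL allows. A minimal Python sketch using `http.client`, which accepts arbitrary method names. The `FakeCache` handler and the `/article/42` path are hypothetical stand-ins so the example runs without Varnish; their responses (200 when an object was dropped, 404 otherwise) are chosen for the demo, not copied from Varnish.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def send_purge(host: str, port: int, path: str) -> int:
    """Ask the cache at host:port to drop its copy of `path`; returns the HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PURGE", path)          # http.client accepts any method name
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()

class FakeCache(BaseHTTPRequestHandler):
    """Local stand-in for the cache so the demo runs without Varnish."""
    cached = {"/article/42": b"<html>old version</html>"}

    def do_PURGE(self) -> None:                # BaseHTTPRequestHandler dispatches to do_<METHOD>
        found = self.cached.pop(self.path, None) is not None
        self.send_response(200 if found else 404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args) -> None:   # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), FakeCache)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
first = send_purge("127.0.0.1", port, "/article/42")    # object was cached
second = send_purge("127.0.0.1", port, "/article/42")   # already gone
server.shutdown()
server.server_close()
print(first, second)   # 200 404
```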
Dynamic content generation with ESI
● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (for example, "Hello, [Jānis Bērziņš], you have [0] new messages")
● We can load it after the page has loaded, using JSON, or generate the content with Varnish
<TABLE><TR><esi:include src="sveiks.html"/></TR>
<TR><TD><esi:include src="index.html"/></TD>
<TD><esi:include src="article.html"/></TD></TR>
</TABLE>
● Varnish parses the <esi> tags and assembles the elements; all elements are configured and cached independently
Load balancing
● One address can be handled by several back ends
● Different URLs can be handled by different back ends
● Monitoring
  ● Disconnecting dead servers (restart, upgrade, repair)
  ● Reconnecting revived servers (and new ones)
● In practice, this means a pile of CHEAP desktop-grade hardware can be used to generate dynamic content
● If we add one more front end, we get high yet cheap fault tolerance
● If we use NoSQL or otherwise get a replicated database, expensive servers are not needed at all
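A front end's core decision can be sketched as round-robin over the back ends currently marked healthy, with a monitor taking dead servers out and putting revived or new ones back. The back-end names and the `mark()`/`add()` calls (which a health probe would make) are hypothetical; Varnish's own directors and health probes are configured in VCL instead.

```python
import itertools
from typing import Dict, List

class RoundRobinBalancer:
    """Hypothetical front end: hands each request to the next healthy back end."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self.healthy: Dict[str, bool] = {b: True for b in self.backends}
        self._counter = itertools.count()

    def mark(self, backend: str, is_healthy: bool) -> None:
        # A health probe would call this: take dead servers out, put revived ones back.
        self.healthy[backend] = is_healthy

    def add(self, backend: str) -> None:
        # New servers join the pool as healthy.
        self.backends.append(backend)
        self.healthy[backend] = True

    def pick(self) -> str:
        alive = [b for b in self.backends if self.healthy[b]]
        if not alive:
            raise RuntimeError("no healthy back end available")
        return alive[next(self._counter) % len(alive)]

lb = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
first_round = [lb.pick() for _ in range(4)]
print(first_round)    # ['app1:8080', 'app2:8080', 'app3:8080', 'app1:8080']
lb.mark("app2:8080", False)            # probe failed: restart, upgrade, repair...
second_round = [lb.pick() for _ in range(4)]
print(second_round)   # ['app1:8080', 'app3:8080', 'app1:8080', 'app3:8080']
lb.mark("app2:8080", True)             # back online
lb.add("app4:8080")                    # extra cheap server joins
third_round = [lb.pick() for _ in range(4)]
print(third_round)    # ['app1:8080', 'app2:8080', 'app3:8080', 'app4:8080']
```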
Varnish usage in Latvia
$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Content-type: text/html; charset=utf-8
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive

$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Server: lighttpd/1.4.31 (PLD Linux)
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive
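Responses like these can be checked programmatically. A small sketch, assuming the headers have already been fetched (e.g. with `curl -I`); it relies on the Varnish convention that `X-Varnish` carries two transaction IDs on a cache hit (this request's and the one that stored the object) and one on a miss, and that `Age` shows how long, in seconds, the object has been cached. The sample values are copied from the output above.

```python
from typing import Dict, Union

def varnish_info(headers: Dict[str, str]) -> Dict[str, Union[bool, int]]:
    """Interpret cache-related response headers (names matched case-insensitively)."""
    h = {name.lower(): value for name, value in headers.items()}
    xids = h.get("x-varnish", "").split()
    return {
        "via_varnish": "varnish" in h.get("via", "").lower() or bool(xids),
        # Two transaction IDs: this request + the request that stored the object => cache hit
        "cache_hit": len(xids) == 2,
        "age_seconds": int(h.get("age", "0")),
    }

sample = {
    "Date": "Wed, 07 Nov 2012 20:10:15 GMT",
    "X-Varnish": "2025605055 2025545136",
    "Age": "67",
    "Via": "1.1 varnish",
}
print(varnish_info(sample))   # {'via_varnish': True, 'cache_hit': True, 'age_seconds': 67}
```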
Current situation
● The standard web development solution is an HTTP server plus some classic dynamic content generation system (PHP, ASP, Python, etc.), which has problems with:
  ● Long-running requests and persistent connections
  ● The number of simultaneously served clients
  ● Compatibility with other technologies
  ● Future development potential
Event-driven programming frameworks
● The idea and implementations are not new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Not yet widespread in web solutions
● Solve the problems of the standard technologies
● Reactor design pattern, C10K problem
● Let web programmers build network applications