Heat-seeking Honeypots: Design and Experience
John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy, and Martin Abadi
WWW 2011
Presented by Elias P. Papadopoulos

Compromising Web Servers
• Attackers plant phishing and malware pages, redirecting user traffic to malicious sites
• Almost 90% of Web attacks take place through legitimate sites that have been compromised
• Over 50% of popular search keywords have at least one malicious link to a compromised site
• Compromised servers let attackers communicate with clients behind NATs and firewalls

Honeypots
• A honeypot is a computer security mechanism set up to detect, deflect, or counteract attempts to gain unauthorized access to information systems
• Client-based: detect malicious servers that attack clients
• Server-based: emulate vulnerable services/software and passively wait for attackers

Heat-seeking Honeypots
1. Actively attract attackers
2. Dynamically generate and deploy honeypot pages
3. Advertise honeypot pages to attackers via search engines
4. Analyze the honeypot logs to identify attack patterns

Heat-seeking Honeypot Architecture
(architecture diagram)

Attacker Queries
• How attackers find vulnerable Web servers:
  1. Brute-force port scanning of the Internet
  2. Use of search engines
• Identify malicious queries in the Bing log, e.g. "phpizabi v0.848b c1 hfp1"

Creation of Honeypot Pages
• Deployment:
  - Issue the attacker queries to search engines (Bing and Google) and take the top three results
  - The crawler fetches the Web pages at these URLs
  - Strip all JavaScript content and rewrite all links on the page to point to the local honeypot, e.g. http://path/to/honeypot/includes/joomla.php
• Install a few common Web applications, with a different VM for each app

Advertising Honeypot Pages
• Submit the URLs of the honeypot pages to the search engines and wait for the crawlers to visit them
• To increase the visibility (PageRank) of the honeypot pages, add hidden links (not visible to regular users) pointing to them on other public Web sites

Detecting Malicious Traffic
• Process the visitor log and automatically extract attack traffic
• Identifying crawlers:
  - Well-known crawlers, e.g. Google's crawler uses Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  - Characterize the behavior of known crawlers
  - Identify unknown crawlers
• Identify malicious traffic

Identifying Crawlers (1/2)
• Known crawlers:
  - Look at the user-agent string and verify that the IP address matches the organization
  - A single search engine uses multiple IP addresses (grouped by AS) to crawl different pages
  - Most crawlers visit only static links; only one crawler visited dynamic links

Identifying Crawlers (2/2)
• Unknown crawlers:
  - Other IPs, also grouped by AS number
  - Similar behavior to the known crawlers
  - Threshold on K = |P| / |C| (P: pages visited, C: crawlable pages)

Identifying Malicious Traffic
• Attackers do not target static pages; they try to access non-existent or private files
• Whitelist: all the static and dynamic links, for each site
  - The exact set of links present in the honeypots
  - Files visited by well-behaved crawlers (respecting robots.txt)
• Visits to links not contained in the whitelist are considered malicious

Results
• Experiment duration: 3 months
• Location: a personal home page in the University of Washington CS department
• 96 automatically generated honeypot Web pages
• 4 manually installed Web application software packages
• Received 54,477 visits from 6,438 distinct IP addresses

Distinguishing Malicious Visits
• Low PageRank of the honeypot pages
• Only one crawler visited the dynamic links in the installed software
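The known-crawler check described earlier (match the user-agent string, then verify the IP really belongs to that search engine) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the crawler list, helper names, and injectable DNS resolvers are assumptions added for the example.

```python
import socket

# Illustrative crawler table: user-agent substring -> valid reverse-DNS suffixes.
# Entries are assumptions for this sketch, not taken from the paper.
KNOWN_CRAWLERS = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def is_verified_crawler(user_agent, ip,
                        reverse_dns=lambda ip: socket.gethostbyaddr(ip)[0],
                        forward_dns=lambda host: socket.gethostbyname(host)):
    """Check the user-agent string, then confirm via a reverse + forward
    DNS round trip that the IP address matches the claimed organization."""
    for name, suffixes in KNOWN_CRAWLERS.items():
        if name in user_agent:
            try:
                host = reverse_dns(ip)
            except OSError:
                return False
            if not host.endswith(suffixes):
                return False  # claims to be a crawler but resolves elsewhere
            # Forward-confirm to guard against spoofed PTR records.
            return forward_dns(host) == ip
    return False
```

The resolver arguments default to real `socket` lookups but can be stubbed out, which makes the check testable offline; the forward-DNS confirmation prevents an attacker who controls their own reverse DNS from impersonating a crawler.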
Crawler Visits
• Bi-modal distribution:
  - 16 ASes crawled more than 75% of the hosted pages
  - 18 ASes visited less than 25% of the pages

Attacker Visits (Joomla)
(figure)

Attacker Visits
(figure)

Geographic Locations & Discovery Time
(figure)

Comparing Honeypots
1. Web server: no hostname, no hyperlinks
2. Vulnerable software: pages accessible on the Internet; search engines can find them
3. Heat-seeking honeypot pages: generated as simple static HTML pages

Comparison of the Total Number of Visits and the Number of Distinct IP Addresses
(figure)

Attack Types
(figures)

Applying Whitelists to the Internet
• Random set of 100 Web servers whose HTTP access logs are indexed by search engines
• A request is defined to be from an attacker if:
  - The link is not present in the whitelist (i.e., not accessed by a crawler), or
  - The resource is not present at all (the request results in an HTTP 404 error)
• For 20% of the sites, almost 90% of the traffic came from attackers

Conclusion
• Heat-seeking honeypots:
  - Deploy honeypot pages corresponding to vulnerable pages
  - Attract attackers
• Detect malicious IP addresses only through their Web access patterns
• False-negative rate of at most 1%
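The attacker-classification rule from the "Applying whitelists to the Internet" slides (a request is an attack if it hits a resource outside the crawler-built whitelist, or a resource that does not exist at all) can be sketched as below. The function name and the sample whitelist/log are illustrative, not from the paper.

```python
def classify_request(path, status, whitelist):
    """Label a logged request: "attack" if it targets a non-existent
    resource (HTTP 404) or a link no well-behaved crawler ever visited;
    "benign" otherwise."""
    if status == 404:
        return "attack"   # probing for files the site never had
    if path not in whitelist:
        return "attack"   # existing file, but outside crawled links
    return "benign"

# Hypothetical log replay: whitelist built from crawler-visited links.
whitelist = {"/index.html", "/blog/post1.html", "/app/page.php?id=1"}
log = [
    ("/index.html", 200),
    ("/phpmyadmin/setup.php", 404),   # classic scanner probe
    ("/secret/config.php", 200),      # exists, but never crawled
]
labels = [classify_request(p, s, whitelist) for p, s in log]
# labels == ["benign", "attack", "attack"]
```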