Harnessing the Most to find the Least


Hilarie Orman
Purple Streak, Inc.
And consulting to
PnP Networks
Eitan Fenson, Rich Howard, Phil Straw
Collaboration for the Common
• People like to donate CPU cycles
– Breaking a cipher (DES)
– Factoring large numbers
– Data sifting for extraterrestrial intelligence
• People like to protect their computers
– Viruses
– Trojan Horses
• People should like to donate CPU cycles for
searching for secure application software
• Disclaimer: we haven’t started this yet!
The Search for Extraterrestrial
Security Configurations
• SETI uses thousands of
volunteer computers for data
mining astrophysical signals
• Easy to sign up and get an
• Can we use this approach to
discover how to securely
configure our computers?
Least Privilege
• No more capability than is necessary to get the job done
• Classic failures surround Unix and root privileges
• Examples:
– File permissions: read but not write
– Temporary files: readable by owner only
– Subjobs only if content and application are trusted
• A multi-dimensional min/max problem
– Too little privilege → too little functionality
– Too much privilege → too little security
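The file-permission and temporary-file examples above can be sketched in Python. This is a minimal illustration that assumes a POSIX system; the helper names `make_private_temp` and `drop_write` are hypothetical and do not come from the slides.

```python
import os
import stat
import tempfile


def make_private_temp(data: bytes) -> str:
    """Write data to a new temporary file that only its owner can access."""
    # On POSIX, mkstemp creates the file with mode 0o600 (owner read/write only).
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return path


def drop_write(path: str) -> None:
    """Reduce the file to 'read but not write' for its owner."""
    os.chmod(path, stat.S_IRUSR)


path = make_private_temp(b"scratch data")
drop_write(path)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # permission bits after dropping write access
os.remove(path)
```

The file starts with the least access that lets the job finish (owner only), and write access is removed as soon as it is no longer needed.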
How to Rank Privileges?
• Strict ordering: Administrator trumps user*, root
trumps user*
• Subsets: (read, write) trumps (read)
• Set size: (execute, *) trumps (execute, /bin)
• Visibility: writing to the network trumps writing to
hard drive
• Information flow: “create executable with A
permissions” and “A permissions allow network
server connections” leads to “proprietary data
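The "Subsets" rule can be read as a partial order on privilege sets. The sketch below covers only that one rule, under the assumption that privileges are plain strings; the function name `compare` is hypothetical, and the other ranking criteria (strict ordering, set size, visibility, information flow) are not modeled.

```python
from typing import FrozenSet, Optional

Privileges = FrozenSet[str]


def compare(a: Privileges, b: Privileges) -> Optional[int]:
    """Rank two privilege sets by the subset rule.

    Returns 1 if a trumps b, -1 if b trumps a, 0 if they are equal,
    and None if neither set contains the other (incomparable).
    """
    if a == b:
        return 0
    if a > b:  # a is a proper superset of b
        return 1
    if a < b:  # a is a proper subset of b
        return -1
    return None


read_write = frozenset({"read", "write"})
read_only = frozenset({"read"})
execute_only = frozenset({"execute"})

print(compare(read_write, read_only))
print(compare(read_only, execute_only))
```

Incomparable pairs such as (read) and (execute) are one reason the ranking is a multi-dimensional problem rather than a single scale.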
Negative Information
• If the privilege levels are too high, what goes wrong?
– Privilege escalation
– Unauthorized information use
– Resource misappropriation
• Detection methods:
– Virus scanning
– Intrusion detection software
– Environment monitoring (storage side-effects)
– Execution monitoring (writing files in system areas,
network access, etc.)
– Anything unusual
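One execution-monitoring check named above, writing files in system areas, might look like the following sketch. The directory list `SYSTEM_AREAS` and the function `writes_to_system_area` are assumptions made for illustration; a real monitor would take its list from the platform and would see the write through some hooking mechanism that is not shown here.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical list of protected directories on a Unix-like system.
SYSTEM_AREAS = (
    PurePosixPath("/bin"),
    PurePosixPath("/etc"),
    PurePosixPath("/usr"),
)


def writes_to_system_area(path: str) -> bool:
    """Return True if an absolute path falls inside a protected directory."""
    # Collapse "." and ".." so "/home/u/../../etc/passwd" is seen as "/etc/passwd".
    target = PurePosixPath(posixpath.normpath(path))
    return any(area == target or area in target.parents for area in SYSTEM_AREAS)


print(writes_to_system_area("/etc/passwd"))
print(writes_to_system_area("/home/alice/notes.txt"))
```

The check is purely lexical: it does not resolve symbolic links, so it is a first filter rather than a complete test.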
Learning from Event Records
• Collect application privilege information
– Configuration files, registry settings, observed usage
• Collect monitored data
– Watchers monitor task lists, new files, network
connections, etc.
• Anonymize and index it
• Learning
– Cluster
– Min/max
• Distribute recommended privileges for common
usage patterns
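The anonymize and min/max steps above can be sketched as follows. Everything here is an assumption made for illustration: the `EventRecord` fields, the keyed-hash anonymization, and the rule "recommend the smallest privilege set never seen with an incident" are stand-ins, not the project's design. The clustering step is omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class EventRecord:
    machine: str                 # identifies the contributing machine
    application: str             # application the record describes
    privileges: FrozenSet[str]   # privileges the application ran with
    incident: bool               # True if monitoring flagged something


def anonymize(record: EventRecord, key: bytes) -> EventRecord:
    """Replace the machine name with a keyed hash so records stay linkable."""
    digest = hmac.new(key, record.machine.encode("utf-8"), hashlib.sha256).hexdigest()
    return EventRecord(digest[:16], record.application, record.privileges, record.incident)


def recommend(records: Iterable[EventRecord], application: str) -> Optional[FrozenSet[str]]:
    """Pick the smallest observed privilege set that never had an incident."""
    relevant = [r for r in records if r.application == application]
    flagged = {r.privileges for r in relevant if r.incident}
    clean = {r.privileges for r in relevant if r.privileges not in flagged}
    if not clean:
        return None
    # Ties on size are broken alphabetically so the result is deterministic.
    return min(clean, key=lambda p: (len(p), sorted(p)))


key = b"per-deployment secret"  # hypothetical anonymization key
raw = [
    EventRecord("host-a", "editor", frozenset({"read", "write", "network"}), True),
    EventRecord("host-b", "editor", frozenset({"read", "write"}), False),
    EventRecord("host-c", "editor", frozenset({"read"}), False),
]
records = [anonymize(r, key) for r in raw]
print(sorted(recommend(records, "editor")))
```

Because every record comes from observed usage, a set that appears in the data was enough for the application to run at least once; that is the only sense in which this sketch covers the "too little functionality" side of the trade-off.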
Large-Scale Architecture
• Distributed P2P database: Volunteer machines
contribute their own, anonymized event records
• Higher tier of P2P “Planners” develop data mining
tasks and assign them to volunteers
• Volunteers retrieve required database records and
crunch the data
• Higher tier analyzes results and finds optimal
configuration sets
• Publish results on webpage or in P2P system
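The planner step, in which data mining tasks are assigned to volunteers, could be sketched like this. The function `plan_assignments`, the fixed parcel size, and the round-robin policy are assumptions for illustration; the slides do not specify how planners divide the work.

```python
from typing import Dict, List, Sequence


def plan_assignments(
    record_ids: Sequence[int],
    volunteers: Sequence[str],
    parcel_size: int,
) -> Dict[str, List[List[int]]]:
    """Split record ids into parcels and deal them out round-robin."""
    if parcel_size <= 0:
        raise ValueError("parcel_size must be positive")
    if not volunteers:
        raise ValueError("at least one volunteer is required")
    # Cut the database into consecutive parcels of at most parcel_size records.
    parcels = [
        list(record_ids[start:start + parcel_size])
        for start in range(0, len(record_ids), parcel_size)
    ]
    plan: Dict[str, List[List[int]]] = {name: [] for name in volunteers}
    for index, parcel in enumerate(parcels):
        plan[volunteers[index % len(volunteers)]].append(parcel)
    return plan


plan = plan_assignments(range(10), ["vol-a", "vol-b", "vol-c"], parcel_size=4)
for name, parcels in plan.items():
    print(name, parcels)
```

Round-robin spreads parcels evenly across volunteers but ignores where the records are stored, so it says nothing about the traffic each volunteer generates when fetching them.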
Collaborative Black-box
Execution Monitoring
Application Name
Anonymized ID
Upper Tier Analysis and
Computation Plan
Cluster Analysis of Summaries
Assignments for parcels of
machine learning from
database portions
Distributed Learning
[Diagram labels: Fetch database records; Work assignment; Database pieces; Report station; Learned quantum]
Research Questions
• Can we get enough information from
configurations and monitoring to do this?
– Fine-grained (system call) monitoring necessary?
– Is there enough “ground truth” to learn?
• Will the learning algorithms find useful optimal
• Can we distribute the learning algorithm over
thousands of machines? Will the resulting traffic
create hot spots?
• Are the learning algorithms vulnerable to
What Other Uses?
• Grass roots health: information, trends,
treatments, outcomes
• Whole world online realtime mapping project:
coordinated GPS, webcams, photos
• Genealogy through DNA matching (be careful
about what you wish for!)