Fayé A. Briggs, PhD
Adjunct Professor and Intel Fellow (Retired)
Rice University
Material derived from other sources and from “Mining of Massive Datasets” by
Jure Leskovec, Anand Rajaraman, and Jeff Ullman, Stanford University
http://www.mmds.org


Textbook: Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Review of Book Chapters

Research Materials:

Systems Architecture for Big Data and Analytics:
• A Big Data Architecture for Large Scale Security Monitoring, by Samuel Marchal, Xiuyan Jiang, Radu State, Thomas Engel
• TBD

Databases & Tools:
• Hadoop & HDFS, Hive, Spark, MapReduce
• Google Bigtable & Google File System, Google Cluster
• Experiences with MapReduce

Programming Approaches to Big Data Analytics:
• OpenMP, MPI, etc. (TBD)

Analytics Algorithms and Applications:
• GraphX: Unifying Table and Graph Analytics, by Joseph Gonzalez
• Analytics for all – Challenges in analytics applications
• Modeling and Detection Techniques for Counter-Terror Social Network Analysis and Intent Recognition, by Clifford Weinstein, William Campbell, Brian Delaney, Gerald O’Leary

Machine Learning Review:
• Machine Learning Foundation, by Jason Brownlee

Visualization Tools:
• Cell Phone Mini Challenge Award: Intuitive Social Network Graphs Visual Analytics of Cell Phone Data using MobiVis and OntoVis, by Carlos D. Correa, Tarik Crnovrsanin, Christopher Muelder, Zeqian Shen, Ryan Armstrong, James Shearer, Kwan-Liu Ma
The main topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.
9. Techniques for obtaining the important properties of a large dataset by dimensionality reduction, including singular-value decomposition and latent semantic indexing.
10. Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent.
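
As a rough illustration of topic 1, the sketch below mimics the map, shuffle, and reduce steps of a word count in plain Python on a single machine. It is not taken from the course materials or the textbook; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are assumptions made for this example, and a real framework such as Hadoop would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all counts emitted for one word into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["big data needs big tools", "data streams arrive fast"]
    print(word_count(docs))
```

Because each call to map_phase depends only on its own document, and each call to reduce_phase only on one key's values, both steps could in principle be distributed without changing the result.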
Big Data affects e-connected businesses through the efficient capture, processing, and
storage of huge amounts of data.

Instructor:
• Wish I had TAs!

Office hours:
• TBD

Course website:
• TBD
• Lecture slides (to be posted to a Rice University website, TBD)
• Readings

Readings: Mining of Massive Datasets,
by J. Leskovec, A. Rajaraman, and J. Ullman
Free online:
http://www.mmds.org
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org