Big Data Analytics
Carlos Ordonez

Big Data Analytics research
• Input? BIG DATA (large data sets, large files, many documents, many tables, fast growing)
• How? Fast external algorithms; memory-efficient data structures at two storage levels; parallel: multi-threaded or multi-node
• Efficiency goal: linear time O(n) and linear speedup
• Hardware? Single node or parallel cluster
• Infrastructure? Parallel file system; any large files
• Challenging: Theory+programming in action

Systems research today
• Transaction processing? Main memory, lock-free
• Efficient analysis? Optimal joins, compiled queries, streams, exploit ample RAM, exploit multi-core
• Compiler versus interpreter?
• Massive storage? POSIX, HDFS
• Fast external algorithms? Simple tasks.
• Parallel computation? Multi-core with threads, shared-nothing, message-passing
• Exploiting new hardware? Difficult/customized
• Analyzing: queries, cubes, statistics, machine learning
• Hot today: Information integration (database+files)

DB Systems involves Core CS research: Theory+Programming
• Theory we use:
– Time complexity (big O()) and I/O cost (disk, solid state memory)
– Data structures (trees, hash tables, linked lists)
– Relational model and information retrieval models
– Multivariate statistics, machine learning, discrete mathematics, linear algebra
– Compilers and programming languages: parsing/compiling/optimizing code; recursion
• Programming:
– Languages: mostly C++, but also R, SQL, Java
– Unix, but we have a lot of past work on MS Windows
– Systems: Threads, binary I/O, parallel file systems, code generation, code optimization, interpreter runtime

Sample of target problems
• Business Intelligence: cubes, lattices
• Bayesian statistics: MCMC, classification, regression, variable/feature selection
• Big Data summarization: vector outer products
• Graph transitive closure and linear recursion

Why join the DBMS group?
• Just came back from AT&T Labs (formerly the famous AT&T Bell Labs); my head is spinning with C++14 and Unix commands. Currently programming with my PhD students.
• Balance between theory (mathematics) and programming (C++)
• Mature and stable CS research area
• Job prospects upon graduation are excellent. Great opportunity to join industrial labs.
• Visit my web page or DBLP, Google “Ordonez SQL”, or stop by during my office hours
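
As an illustration of the “Big Data summarization: vector outer products” item under Sample of target problems, here is a minimal sketch in Python. It assumes the summary is made of the row count n, the linear sum L of the points, and the sum Q of their outer products; the slides do not spell this out, so that reading, along with the function names summarize, merge and mean_and_covariance, is an assumption made for illustration. The sketch scans the data once, which is in line with the linear time O(n) goal, and partial summaries can be added together, which is what a multi-threaded or multi-node version would rely on.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical summary type: (n, L, Q). These names are illustrative, not from the slides.
Summary = Tuple[int, List[float], List[List[float]]]


def summarize(points: Iterable[Sequence[float]], d: int) -> Summary:
    """Scan the points once, accumulating the count, the linear sum,
    and the sum of outer products x * x^T."""
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in points:
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q


def merge(s1: Summary, s2: Summary) -> Summary:
    """Add two partial summaries, e.g. one per thread or per node."""
    n1, L1, Q1 = s1
    n2, L2, Q2 = s2
    L = [u + v for u, v in zip(L1, L2)]
    Q = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(Q1, Q2)]
    return n1 + n2, L, Q


def mean_and_covariance(summary: Summary) -> Tuple[List[float], List[List[float]]]:
    """Derive the mean and the (population) covariance from the summary alone,
    without a second pass over the data."""
    n, L, Q = summary
    d = len(L)
    mean = [v / n for v in L]
    cov = [[Q[a][b] / n - mean[a] * mean[b] for b in range(d)] for a in range(d)]
    return mean, cov
```

For example, summarizing two halves of a data set separately and passing the two results to merge gives the same (n, L, Q) as summarizing the whole set in one scan, up to floating-point rounding.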