Big Data Analytics
Carlos Ordonez

Big Data Analytics research
• Input? BIG DATA (large data sets, large files, many documents, many tables, fast growing)
• How? Fast external algorithms; memory-efficient data structures at two storage levels; parallel: multi-threaded or multi-node
• Efficiency goal: linear time O(n) and linear speedup
• Hardware? Single node or parallel cluster
• Infrastructure? Parallel file system; any large files
• Challenging: Theory+programming in action

Systems research today
• Transaction processing? Main memory, lock-free
• Efficient analysis? Optimal joins, compiled queries, streams, exploit ample RAM, exploit multi-core
• Compiler versus interpreter?
• Massive storage? POSIX, HDFS
• Fast external algorithms? Simple tasks
• Parallel computation? Multi-core with threads, shared-nothing, message-passing
• Exploiting new hardware? Difficult/customized
• Analyzing: queries, cubes, statistics, machine learning
• Hot today: Information integration (database+files)

DB Systems involves Core CS research: Theory+Programming
• Theory we use:
  – Time complexity (big O()) and I/O cost (disk, solid state memory)
  – Data structures (trees, hash tables, linked lists)
  – Relational model and information retrieval models
  – Multivariate statistics, machine learning, discrete mathematics, linear algebra
  – Compilers and programming languages: parsing/compiling/optimizing code; recursion
• Programming:
  – Languages: mostly C++, but also R, SQL, Java
  – Unix, but we have a lot of past work on MS Windows
  – Systems: threads, binary I/O, parallel file systems, code generation, code optimization, interpreter runtime

Sample of target problems
• Business Intelligence: cubes, lattices
• Bayesian statistics: MCMC, classification, regression, variable/feature selection
• Big Data summarization: vector outer products
• Graph transitive closure and linear recursion

Why join the DBMS group?
• Just came back from AT&T Labs (formerly the famous AT&T Bell Labs). My head is spinning with C++14 and Unix commands. Currently programming with my PhD students.
• Balance between theory (mathematics) and programming (C++)
• Mature and stable CS research area
• Job prospects upon graduation are excellent. Great opportunity to join industrial labs.
• Visit my web page and DBLP. Google “Ordonez SQL”, or stop by during my office hours.
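
Illustration: Big Data summarization with vector outer products
The "Big Data summarization: vector outer products" item under the sample of target problems can be sketched in a few lines of C++. This is only a minimal illustration under stated assumptions, not the group's actual implementation: the `Summary` struct and its `add`, `merge`, `mean` and `covariance` members are hypothetical names chosen for this example. The idea shown is that one pass over n points of dimension d can accumulate the count n, the linear sum L = Σ xᵢ and the sum of outer products Q = Σ xᵢ xᵢᵀ, which is linear time O(n) for fixed d, and that partial summaries can be merged, which is what makes a multi-threaded or multi-node version plausible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a data set of d-dimensional points:
//   n = number of points, L = sum of x_i, Q = sum of x_i * x_i^T.
struct Summary {
    std::size_t d;           // dimensionality of each point
    std::size_t n = 0;       // number of points seen so far
    std::vector<double> L;   // linear sum, length d
    std::vector<double> Q;   // sum of outer products, d x d, row-major

    explicit Summary(std::size_t dims)
        : d(dims), L(dims, 0.0), Q(dims * dims, 0.0) {}

    // Fold one point into the summary: O(d^2) work, the point is not stored.
    void add(const std::vector<double>& x) {
        assert(x.size() == d);
        ++n;
        for (std::size_t a = 0; a < d; ++a) {
            L[a] += x[a];
            for (std::size_t b = 0; b < d; ++b)
                Q[a * d + b] += x[a] * x[b];   // outer product entry (a, b)
        }
    }

    // Merge a partial summary computed elsewhere (e.g. by another thread).
    void merge(const Summary& other) {
        assert(other.d == d);
        n += other.n;
        for (std::size_t a = 0; a < d; ++a) L[a] += other.L[a];
        for (std::size_t k = 0; k < d * d; ++k) Q[k] += other.Q[k];
    }

    // Mean of dimension a, computed from the summary alone (requires n > 0).
    double mean(std::size_t a) const {
        assert(n > 0);
        return L[a] / static_cast<double>(n);
    }

    // Population covariance of dimensions a and b: Q_ab / n - mean_a * mean_b.
    double covariance(std::size_t a, std::size_t b) const {
        assert(n > 0);
        return Q[a * d + b] / static_cast<double>(n) - mean(a) * mean(b);
    }
};
```

In use, each worker would build its own `Summary` over its partition of the data and the results would be combined with `merge`; statistics such as means and covariances are then read from the small d x d summary rather than from the original data set.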