CS 257 Database Systems
Dr. T Y Lin

Ultimate Goal
Data Science (Big Data)

Project Overview
Verification and Validation of the Core Engine of a Concept Based Semantic Search Engine

Main Idea
A set of documents is associated with:
1) A matrix, called the Latent Semantic Index (LSI), whose row vectors are treated as points in Euclidean space (point = TFIDF). This is Google's approach.
2) Topological approach: a polyhedron (combinatorially, a simplicial complex) is built to capture and structure the concepts.

An open segment is a 1-simplex, an open triangle (face) is a 2-simplex, an open tetrahedron is a 3-simplex, and so on up to an n-simplex. A collection of simplexes that satisfies the closed condition is called a simplicial complex; it is a combinatorial representation of a polyhedron, and its study led to a "new" subject called algebraic topology. The project is an algebraic topology based search engine.
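
The closed condition on a collection of simplexes can be made concrete with a small sketch. It assumes the standard reading of "closed": every face of a simplex in the collection must also be in the collection. Simplexes are modeled abstractly as sets of vertices rather than as open geometric regions, and the function names (`faces`, `closure`, `is_simplicial_complex`, `dimension`) and the keyword vertices are hypothetical, chosen for illustration only; this is not the project's actual engine.

```python
from itertools import combinations


def dimension(simplex):
    """An n-simplex has n + 1 vertices (segment -> 1, triangle -> 2, ...)."""
    return len(simplex) - 1


def faces(simplex):
    """Return all non-empty proper faces of a simplex (as frozensets)."""
    verts = sorted(simplex)
    return {
        frozenset(c)
        for k in range(1, len(verts))
        for c in combinations(verts, k)
    }


def closure(simplexes):
    """Add every missing face so the collection satisfies the closed condition."""
    result = {frozenset(s) for s in simplexes}
    for s in list(result):
        result |= faces(s)
    return result


def is_simplicial_complex(simplexes):
    """Check the closed condition: each face of each simplex is present."""
    collection = {frozenset(s) for s in simplexes}
    return all(f in collection for s in collection for f in faces(s))


# Hypothetical example: three keywords that co-occur form a 2-simplex.
triangle = frozenset({"wall", "street", "journal"})
complex_ = closure([triangle])
```

Here `closure([triangle])` holds the triangle, its three edges, and its three vertices, while a lone edge such as `[("wall", "street")]` fails `is_simplicial_complex` because its endpoint vertices are missing from the collection.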