CS 257
Database Systems
Dr. T Y Lin
Ultimate Goal
Data Science (Big Data)
Project Overview
Verification and Validation of the
Core Engine of
a Concept-Based
Semantic Search Engine
Main Idea
A set of documents is associated with a
matrix. Two approaches:
1) Latent Semantic Index (LSI): the row
vectors are treated as points in Euclidean
space (point = TFIDF)
- Google’s approach
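The "row vectors as points" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the weighting tf * log(N/df) is one common TFIDF variant, the three sample documents and the helper names `tfidf_matrix` and `euclidean_distance` are hypothetical, and the dimensionality-reduction step normally associated with LSI is not shown.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix); each row is one document's TFIDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed weighting: relative term frequency times log(N / df).
        row = [(counts[t] / len(tokens)) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


def euclidean_distance(u, v):
    """Distance between two document points in term space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical sample documents, not taken from the course material.
docs = [
    "wall street stock market",
    "stock market crash",
    "white house press",
]
vocab, matrix = tfidf_matrix(docs)
d01 = euclidean_distance(matrix[0], matrix[1])  # two finance documents
d02 = euclidean_distance(matrix[0], matrix[2])  # finance vs. politics
```

In this toy data the two documents that share terms end up closer together than the two that share none, which is the geometric intuition behind treating each document as a point.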
Main Idea
2) Topological approach: a polyhedron
(combinatorially, a simplicial complex)
is built to capture and structure the concepts
An open segment is a 1-simplex, an open triangle (face) is a 2-simplex, an
open tetrahedron is a 3-simplex, and so on up to an n-simplex.
A collection of simplexes that satisfies the closed condition is called a simplicial complex.
It is a combinatorial representation of a polyhedron, an idea that led to a “new” subject
called algebraic topology. The project is an algebraic-topology-based search engine.
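The combinatorial definition can be made concrete with a small sketch. It assumes the usual reading of the closed condition (every face of a simplex in the collection is also in the collection) and represents a simplex as a set of vertices. Using keywords as vertices is an assumption made for illustration, and the helper names are hypothetical, not part of the project's engine.

```python
from itertools import combinations


def faces(simplex):
    """All non-empty proper subsets of a simplex, i.e. its faces."""
    vertices = sorted(simplex)
    return {
        frozenset(c)
        for k in range(1, len(vertices))
        for c in combinations(vertices, k)
    }


def closure(maximal_simplexes):
    """Smallest collection containing the given simplexes and all their faces."""
    complex_ = set()
    for s in maximal_simplexes:
        s = frozenset(s)
        complex_.add(s)
        complex_ |= faces(s)
    return complex_


def is_simplicial_complex(simplexes):
    """Check the closed condition: every face of every simplex is present."""
    simplexes = {frozenset(s) for s in simplexes}
    return all(faces(s) <= simplexes for s in simplexes)


def dimension(simplex):
    """An n-simplex has n + 1 vertices."""
    return len(simplex) - 1


# Hypothetical example: vertices are keywords, simplexes are keyword sets.
triangle = {"wall", "street", "market"}   # a 2-simplex
edge = {"market", "crash"}                # a 1-simplex
K = closure([triangle, edge])
```

Here `K` holds the triangle, its three edges, its three vertices, the extra edge, and the extra vertex `crash`. The bare triangle on its own fails `is_simplicial_complex` because its edges and vertices are missing.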