Patterns not just Data

Patterns not just Data
• Information overload is escalating beyond anything we traditionally believed possible.
• “The world produces between 1 and 2 exabytes of unique information per year, which is roughly 250 megabytes for every man, woman, and child on earth.” [P. Lyman and H.R. Varian, "How Much Information", 2000. Retrieved from http://www.sims.berkeley.edu/how-much-info in January 2002]
• Still, even novel DBMS architectures are insufficient to cover the gap between the exponential growth of data and the slow growth of our understanding [Gray02], due to our methodological bottlenecks and simple human limitations.
Lowell 2003 Timos Sellis

Patterns not just Data
• To compensate for these shortcomings, we reduce the available data to knowledge artifacts (i.e., clusters, rules, etc.) through data processing methods (pattern recognition, data mining, knowledge extraction).
• This reduces the number and size of the data (so that they are manageable by humans) while preserving as much as possible of their hidden, interesting, or available information.
• These knowledge artifacts are patterns. Patterns can in general be distinguished with respect to how they are constructed and what they are used for.

Patterns not just Data – Applications
• Data Mining – Clusters, Classifications, Association Rules, Time-Series
• Signal Processing – Music, Voice, Vision
• Information Retrieval – Corpus
• Mathematical applications – Graphs, Numbers, Cryptography
• You can name more...

Patterns not just Data – The Challenge
• Can we find a universal model that allows modelling patterns in general?
• What would a query language for patterns look like?
• What would be the essential “new” system components (indexing, visualization, etc.)?
• Can such systems be built on top of an ORDBMS?

Approximate Data/Answers
• In most large, real-world applications, approximations are the only solution.
• At the same time, the user needs to know the quality of the approximations, at the information level as well as at the answer level.
• Support must be provided by the DBMS at all levels: models, query languages, indexes, physical storage, visualization of results.

Approximate Data/Answers – The Challenge
• Scalable approximation schemes (histograms, wavelets, etc.)
• Learning from the tolerance a user shows toward approximate answers deemed acceptable
• What is an approximation of an XML document? How much schema/ontology information is required?
• Approximation may change according to the context of a user query; how is this taken into account?
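
As an illustration of the points above about approximation schemes and answer quality, the following is a minimal sketch of one scheme the slides name, an equi-width histogram, used to answer a range-count query approximately. It is not taken from the talk: the function names `build_histogram` and `approx_range_count` are hypothetical, and the estimate assumes values are spread uniformly inside each bucket. Alongside the estimate, the sketch returns a lower and an upper bound, which is one simple way to tell the user the quality of an answer.

```python
def build_histogram(values, num_buckets, lo, hi):
    """Build an equi-width histogram; values are assumed to satisfy lo <= v < hi."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp the index to guard against floating-point rounding at the top edge.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return width, counts


def approx_range_count(width, counts, lo, q_lo, q_hi):
    """Estimate how many values fall in [q_lo, q_hi).

    Returns (estimate, lower, upper). The estimate assumes a uniform spread
    inside each bucket (an assumption of this sketch); the bounds hold
    regardless of how values are distributed within a bucket.
    """
    estimate = 0.0
    lower = 0
    upper = 0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = min(q_hi, b_hi) - max(q_lo, b_lo)
        if overlap <= 0:
            continue  # bucket lies entirely outside the query range
        if q_lo <= b_lo and b_hi <= q_hi:
            # Bucket fully inside the query: its count is exact.
            estimate += count
            lower += count
            upper += count
        else:
            # Partial overlap: anywhere from 0 to `count` values may qualify.
            estimate += count * overlap / width
            upper += count
    return estimate, lower, upper


if __name__ == "__main__":
    width, counts = build_histogram(list(range(100)), num_buckets=10, lo=0, hi=100)
    print(approx_range_count(width, counts, lo=0, q_lo=15, q_hi=42))
    # prints (27.0, 20, 40)
```

In this example the exact answer is 27, and the system needs only ten counters rather than the hundred original values to report both an estimate and the interval [20, 40] that is guaranteed to contain it.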
