Data Warehousing from the Web
Chris Fernandes ([email protected]) and Michael Whalen ([email protected])
Department of Computer Science, Union College, Schenectady, NY, 12308

Summary
Warehouse projects provide students with robust capstone experiences and produce interesting results.

Introduction
Data warehousing is the process of collecting information from various repositories and combining it into a single structured repository that can be queried for new information such as performance trends. Many Internet web sites contain useful but unstructured data, which makes them ideal sources for student projects on data warehouses. We describe one such project, developed from the registration web pages at Union College, which gives faculty and students on-line access to course enrollment trends, classroom availability, and other pertinent information. The project surfaced so much information that the administration became concerned about student privacy, which prompted the student to extend his work into the area of warehouse security.

Procedure
Union registrar web pages contain semistructured data. HTML data is automatically parsed nightly for content and transferred to the warehouse backend, called SCOUR (Search Contents Of Union’s Registry).

Course  Number  Name  Homepage  Term
…  …  …  …  …

Event  BeginTime  EndTime  Location
…  …  …  …  …

User  Fname  Lname  AdvisorID
…  …  …  …  …

Unlike traditional databases, the SCOUR warehouse contains historical and summarized data for use in statistical queries.

Results
Dynamic web-based front ends were created for a variety of queries:
• Meeting scheduler
• Course analysis
• Enrollment data
• Room availability
It was essential to maintain ease of use by non-technical operators. Query results can be displayed in a variety of formats, including histograms, and can be imported into a spreadsheet.

Conclusions
1. Data warehousing projects yield many pedagogical benefits. They allow students to build bridges between many areas of computer science, including
• database and data warehouse theory
• GUI design
• security and authorization
• interface usability
• privacy and ethics
2. Projects can be diverse. Raw data abounds on the Web in many fields.
3. Projects can have flexible scope to meet time constraints. One can easily extend a warehouse with security to restrict access to sensitive queries.
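The nightly parse-and-load step and the statistical queries described above can be illustrated with a minimal sketch. It is not the SCOUR implementation, which the poster does not show. The HTML fragment, the course numbers, the `enrollment_history` table, and the function names `load_snapshot` and `average_enrollment` are all hypothetical, chosen only to show how semistructured registrar pages might be turned into dated rows that accumulate history.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical registrar page fragment; the real markup is not given in the poster.
SAMPLE_HTML = """
<table>
<tr><th>Number</th><th>Name</th><th>Term</th><th>Enrolled</th></tr>
<tr><td>CSC-010</td><td>Intro to Computing</td><td>Fall</td><td>28</td></tr>
<tr><td>CSC-010</td><td>Intro to Computing</td><td>Winter</td><td>31</td></tr>
<tr><td>CSC-150</td><td>Data Structures</td><td>Fall</td><td>22</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows hold only <th> cells, so they are skipped
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def load_snapshot(conn, html, snapshot_date):
    """Parse one page and store its rows under the given snapshot date."""
    parser = RowParser()
    parser.feed(html)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS enrollment_history (
               snapshot_date TEXT, number TEXT, name TEXT, term TEXT,
               enrolled INTEGER,
               PRIMARY KEY (snapshot_date, number, term))"""
    )
    # INSERT OR REPLACE keeps a re-run of the same night's load idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO enrollment_history VALUES (?, ?, ?, ?, ?)",
        [(snapshot_date, num, name, term, int(enrolled))
         for num, name, term, enrolled in parser.rows],
    )
    conn.commit()
    return len(parser.rows)


def average_enrollment(conn):
    """Summarized query: mean enrollment per course across all stored rows."""
    cur = conn.execute(
        "SELECT number, AVG(enrolled) FROM enrollment_history "
        "GROUP BY number ORDER BY number"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_snapshot(conn, SAMPLE_HTML, "2000-01-01")  # date is illustrative
    for number, mean in average_enrollment(conn):
        print(number, mean)
```

Keying each row by snapshot date is one way to retain the historical data that distinguishes a warehouse from an operational database: earlier loads are never overwritten by later ones, so trend queries remain possible.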