Download Strategic University Management System Exploring Business

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft Jet Database Engine wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational model wikipedia , lookup

Functional Database Model wikipedia , lookup

Clusterpoint wikipedia , lookup

Healthcare Cost and Utilization Project wikipedia , lookup

Database model wikipedia , lookup

Transcript
Strategic University Management System
Exploring Business Intelligence with Pentaho
Marcella Wong
CS 491
Abstract
Business Intelligence, or BI, focuses on the consolidation, organization and reporting of
data for analytical purposes.
This project explores Pentaho, a third party open-source BI tool, through the build and
operation of a practical BI environment for a university application.
The environment consists of reporting and graphing capabilities that allows the user,
most likely a university administrator, the ability to analyze databases consisting of student
survey results and test scores. Users will have the ability to visualize, for example, where
CSULA students stand compared to the rest of the nation in terms of certain standardized tests
or how students and faculty members rate certain learning outcomes.
Administrators will be able to identify which areas students are struggling in and
excelling in. These features allow the user to hone in on the important trends and statistics
regarding his/her institution, providing the basis for making informed decisions.
Introduction
Databases exist in all types of companies and institutions. They contain a lot of valuable
information, but are often underutilized. Companies, for example, may want to know what
impact price change will have on purchasing behavior. The answer can be found in their
database. The issue is finding a smart and easy way to pull appropriate data from tables and
present it in such a way that is easy to analyze and evaluate.
For this project, we have applied the same concepts to a university environment.
Thousands of student records are stored in school databases, including class grades and GPA’s.
Analysis on student databases can give faculty members insight on, for instance, how much
students actually learn by viewing the correlation between their class grades and their scores
on an MFT exam.
Through BI, those analyses can easily be made by using BI tools to select appropriate
data, transform and operate on them and generate graphs and charts to let faculty evaluate for
themselves how well our students are doing.
For this project, the tools used for the handling of data, creation and editing of reports
will be from Pentaho. Even though Pentaho can be integrated with other non-Pentaho tools,
those provided by Pentaho are already compatible with each other so no further configuration
of different modules will be necessary.
The goal of this project is to introduce the capabilities that BI can offer. To do that, we
will create a system to effectively turn raw data into a more visually appealing form. We will be
using student information and the CSNS survey information provided by the CSULA Computer
Science department. Various reports will be built with both types of data.
Technological Background
The technology behind Pentaho is their BI Platform, which is made up of the Pentaho
server and an Eclipse-based Design Studio. In addition, the Pentaho Data Integration and
Pentaho Report Designer tools are imperative to the completion of this project.
The Solution Engine is the main component of the BI Platform. It controls the access to
data and reports by connecting the data sources with user components. It reads and executes
an Action Sequence. Then, it performs certain actions and accesses different types of data
based on the specific Action Sequence.
Pentaho BI Platform
Pentaho
Report
Solution Engine
Database
Figure 1. The Solution Engine connects to the data source, takes the necessary data and produces a
report.
An Action Sequence is an XML document that defines the task that the solution engine
performs. It contains components that the Solution Engine requires before execution, which
includes inputs (from data source or user), data types, resource types and action types.
Inputs are parameters that are taken either from users or from data sources.
<inputs>
<quarter type="string">
<sources>
<request>quarter</request>
</sources>
<default-value><![CDATA[Spring]]></default-value>
</quarter>
</inputs>
Figure 2. Declaring inputs in Action Sequences.
In Figure 2, user input is required and the value is assigned to the variable, “quarter.” If no user
input was provided, then the default value for “quarter” is “Spring.”
Resource types include the additional information that the Solution Engine needs to
know.
<resources>
<report-definition>
<solution-file>
<location>figure1.xml</location>
<mime-type>text/xml</mime-type>
</solution-file>
</report-definition>
</resources>
Figure 3. Declaring resource types in an Action Sequence.
In Figure 3, the type of resource shown is a “solution-file,” which is a file on the file system
relative to the location of the Action Sequence document. In the example above, the file
“figure1.xml” contains information that the Solution Engine needs. “Figure1.xml” is another
XML document that contains the format of the report. How the data is displayed, the types of
graphs used, etc., are all specified in the “Figure1.xml” document. It is created by using the
Pentaho Report Designer tool.
Types of action contains the steps the Solution Engine must follow in order to complete
the task.
<action-definition>
<component-name>SQLLookupRule</component-name>
<action-inputs>
<quarter type="string"/>
</action-inputs>
<action-outputs>
<prepared_component type="sql-query" mapping="chart"/>
<Quarter type="string"/>
<Percentage type="string"/>
<ID type="string"/>
</action-outputs>
<component-definition>
<query><![CDATA[SELECT Quarter, Percentage, ID FROM figure1
WHERE ID in ({quarter});]]>
</query>
<live><![CDATA[true]]></live>
<driver><![CDATA[com.mysql.jdbc.Driver]]></driver>
<user-id><![CDATA[root]]></user-id>
<password><![CDATA[abcd]]></password>
<connection>
<![CDATA[jdbc:mysql://localhost:3307/projects]]>
</connection>
</component-definition>
</action-definition>
Figure 4. Action types in an Action Sequence.
The Action Sequence in Figure 4 instructs the Solution Engine to go into the database based on
the given connections and perform the SQL statement and return the results for the variables
“Quarter,” “Percentage,” and “ID.”
Once the Solution Engine gathers all the data and performs the instructed operations, a
report is produced.
System Overview
As mentioned, several tools must be used to implement an entire BI system using
Pentaho – Pentaho Data Integration, Pentaho Report Designer and Pentaho Design Studio.
An important feature of BI is the ability to take raw data and present it in such a way
that makes it meaningful. In this project, a set of student data and survey was fed into this
system. Some fields include class grades, exam scores for student data and survey ratings for
survey data. Data sources for this particular system are Excel sheets and database. The
Pentaho Data Integration tool was used to join various Excel sheets with databases, perform
operations on selected fields and return a reportable table.
Several Excel files containing Collaborative, Written and Oral Communication ratings for
each student were provided. Using the Data Integration tool, transformations were created. A
transformation contains a set of rules that the tool must perform. In this case, the Excel files
were combined to calculate the average rating for each attribute. The calculated average was
loaded into the database as a separate table, which will be used later for reporting. Also with
the same tool, jobs were created. Jobs are similar to process chains. For example, if more than
one transformation needs to be executed, a job can execute all of them at once or in sequence.
Once the transformations were complete, Report Designer was used to build the design
for the report. From there, the table to be reported on is specified and the format and layout
of the report is produced. Data, tables and charts are organized to meet reporting needs.
When the appropriate layout is created, a .xml and .xaction file will be generated. Both files are
fed into the Pentaho server and are read and executed by the Solution Engine. The XML file
contains the layout information of the report (e.g. the type of chart and the header and footer
information). The .xaction file is the Action Sequence that contains instructions to retrieving
data. An Action Sequence contains connection information and SQL statements stating what to
do in the database.
The Action Sequence that Report Designer produces may not be enough to satisfy
reporting needs, since it will only contain connection information. For situations where user
input is required, Design Studio is needed. Some of the reports that have been created allow
users to enter which quarter he/she wants to display. Reports like that require Design Studio to
help correctly format .xaction file. Recall that an Action Sequence is an XML file as well.
Directly typing the XML tags without the help of Design Studio can be tedious and error-prone.
The completed XML and Action Sequence needs to be placed in the correct folders for
the Pentaho sever to read and execute. The report can be viewed via web browser. Log into
Pentaho and navigate to the Solutions and select the report to view.
Design and Implementation
The three tools that I have mentioned earlier that Pentaho provides will be used for
both the flat file and database data.
Flat Files
Reports regarding course evaluations, course grades and GPA’s are provided in the form
of flat files, which in this case, is in Excel format. Since there are several different types of
reports that were loaded using flat files, the following discussion will be based on the report
that displays the student rankings based on Oral and Written skills.
The data must be operated on and loaded into a database in order for Pentaho to
retrieve and run a report on it. The entire process – extracting, loading and managing of the
data – is called a transformation (See Figure 5). By using the Data Integration tool,
transformations were created for each report. For flat files, transformations include the
following steps:

Extract table/data from flat files

Perform operations to meet reporting needs

Load final table/data into a database
There will be two transformations for this particular report – one for extraction and loading of
Excel files and another for transforming the data.
Figure 5. An example of a transformation. After data has been extracted from flat
files and loaded into the database, the different types of files get joined together
and loaded back into the database.
The provided flat files came in eight different files. Four files were rankings based on
the Spring and Fall quarters of 2006 and 2007 CS 437 class, and the other four from the Spring
and Fall quarters of 2006 and 2007 CS 491 class. In total, eight classes were evaluated.
The first transformation consists of loading all the files into the database as eight
separate tables. A second transformation was created to alter and clean up the data so it can
be ready for reporting (See Figure 5). As shown in Figure 5, the first level contains database
input icons. A SQL statement can be explicitly written to tell the tool which attribute to use and
what to do with it. In this case, a SQL statement was written to calculate the average ratings
for all the tables. The next two levels are appending icons. It tells the tool to append one table
to another. Before this, CS 437 and CS 491 rankings are still separate. The fourth level is a join
icon, which joins the two classes on the “Quarter” attribute. The fifth level gets rid of any
attributes that are unnecessary for reporting. Finally, the last level outputs a table that a report
can be built on top of. See Figure 6 for the final output of the two transformations.
Figure 6. This is the final table after undergoing some transformations.
Since a reportable table now exists in the database, a report can be built using Report
Designer. Many different elements can be incorporated into reports, with the most popular
being graphs. This report includes a graph and a table displaying the graph data.
To build the report, connection information is defined to have the graph refer to the
table that was created from the transformations. Details of the reports, including type of
graph, colors, and labels, were also defined. The table displaying the graph data was added by
specifying which column the report needs to pull data out from.
Report Designer’s main purpose is to format the look and feel of the report. Design
Studio is where reports can be made a little more dynamic. The report created from Report
Designer was brought into Design Studio to add filters. Instead of displaying all four quarters,
the user can indicate which quarter(s) he/she wants to see.
Once that is finished, entire process for making a report is complete.
Figure 7. Schema for database, Surveys.
Database
CSNS survey information was gathered from a database. To demonstrate building
reports from that, the discussion below will be based on the report displaying the average
ratings of learning outcome 1 from different constituents. The schema of relative tables in the
database is shown in Figure 7.
Since the schema is more complex than the example from the flat file, the
transformations were also much more intricate. Part of the transformation is shown in Figure
8.
Data was retrieved using various types of joins. The transformation relevant to
retrieving student information is shown in Figure 8. The alumni, faculty and employer
information was also collected the same way. In the same transformation, the four types of
newly gathered information are appended to one table, so a report can be built from it.
Figure 8. Part of the transformation for retrieving data from
survey information.
Other than defining the transformations, the process for creating the report using
Report Designer was the same as using flat files.
By using transformations, data from various sources, tables and databases can be
extracted and operated on together to output into one source.
System Evaluation
This BI system consists of 12 canned reports, each with the ability to display the report
in pdf, csv, Excel, Word, and html formats. Six of them come from flat files and the other six
from databases. Due to time constraints, only a couple of reports allow the users to filter out
the type of information to be displayed.
From transformation to report creation and generation, this system is able to
demonstrate the power of BI in the reporting aspect. We were able to demonstrate the ability
of an ETL tool and show an alternate and simpler way of collecting data. Moreover, this project
demonstrates the ease and strengths of BI in that it can easily convert raw data into meaningful
diagrams.
Conclusion
This project has achieved what was planned. We were able to demonstrate a different
type of technology that was different from what we are familiar with. A BI system was
successfully implemented and the resulting reports came out as expected.
Even though we accomplished the task of learning a new technology, the experience
was much more valuable. We realized that self-teaching a tool with poor documentation is a
real challenge. Because of the lack of documentation and tutorials, performing simple
operations on any of the three tools took a lot of research. Moreover, as time progressed and
as we are more familiar with the programs, we discovered tricks and shortcuts.
We not only reached our goal, but also gained valuable lessons. This project did not
only prove that self-teaching a new technology is tough, but that experience plays a big role to
the completion time of the final project.
Appendices
a) Pentaho Open Source Business Intelligence
-
http://www.pentaho.com