Transforming the Student Administration RDS into a Historical Data Archive
Session #10591
March 9, 2004 11:50-12:50
HEUG 2004 Conference - Atlanta
Ray Helm
Research Analyst, Office of Institutional Research and Planning
University of Kansas - Lawrence
Ryan Cherland
Director, University Management Information,
Associate Director, Office of Institutional Research and Planning
University of Kansas - Lawrence
Session #10591-Ray Helm
Ray Helm is a Research Analyst with the Office of Institutional
Research and Planning. He joined the University of Kansas as a
Programmer Analyst with Decision Support Services in 2000. He has
been involved in reporting from PeopleSoft systems for over 3 years and
was the original RDS Administrator at KU.
Ryan Cherland is Director of University Management
Information and Associate Director of the Office of Institutional
Research and Planning. He has over 18 years of experience in
institutional research, with the last 11 years at the University of Kansas-Lawrence. Ryan created DEMIS and has been involved with extracting
and reporting on data from PeopleSoft systems since
1996. Ryan has a Ph.D. in Higher Education
Administration with a minor emphasis in
Educational Psychology and Research.
Synopsis
This presentation reviews the methodology used to create
a historical archive of student administration data by
customizing PeopleSoft’s RDS service.
• Reviewing Database Needs
• Setting Census Parameters
• Modifying Decision Stream Fact Builds
• Modifying Decision Stream Dimensions
• Special Jobstreams
• Reporting Examples
About the University of Kansas
• History - The University of Kansas opened its doors in 1866.
• Academics - The university offers more than 100 undergraduate
and graduate majors and programs including allied health,
architecture, business, education, engineering, fine arts, journalism,
law, liberal arts and sciences, nursing, pharmacy, and social welfare.
• Lawrence Campus Enrollment – 20,692 undergraduates and
6,122 graduate students from every state in the nation and more than
100 countries around the world.
• FTE students per FTE faculty ratio: 14.7
PeopleSoft Student Admin/RDS at KU
• PS Version 8 implemented for Fall, 2003 enrollment,
Financial Aid module implementation pending
• RDS installed October, 2002
• Census RDS created August, 2003
• All databases are Oracle 8.1.7 running on Unix servers
• Cognos Decision Stream runs on Windows 2000 server
Why Build a Census RDS?
• Daily RDS provides current snapshot, not
designed for historical reporting
• Needed capacity to report across semesters to
track historical trends and fulfill institutional
research needs.
• Running queries against transactional “live”
data was not desired.
• Census extract begun under legacy student
records system needed to be continued
Census RDS Resource Needs
• Requires no additional software purchase
• Only additional resources needed are:
• Storage Space
• Staff time to create, execute, and update
• DBA assistance in setup and occasional support
KU’s RDS Catalogs and Databases
• Decision Stream Catalog Databases
• Production Catalog for Daily RDS
• Development Catalog for RDS Admins
• Census Catalog
• PS Delivered Catalog (Static)
• RDS Databases
• Daily Production data
• Development Test data
• Production Census data
Getting Started: Create RDS Census Databases
1. Replicate Daily RDS Catalog database as Census
Catalog database. If Daily RDS Catalog cannot be
copied, restore using Decision Stream Catalog
Backup/Restore process.
2. Create output database. Size should be roughly equal
to Daily RDS database for starters. If possible make a
database copy of Daily RDS to migrate users and
roles.
Create CENS_PARAMS Table
• Same structure as ODS_PARAMS:
VARIABLENAME
RESULT
• Supplements ODS_PARAMS as source of
Decision Stream variable values
• Stores variables specific to census process
Census Variables
• CENSUS_TERM Term code as it is referenced in PS
source data.
• CENSUS Unique identifier that will distinguish census
point records within the Census RDS
• DATE_CUTOFF Official date and time when the
census data was collected
ACTIVE***SQL Variables
Variables containing SQL statements that identify
active students, instructors, recruiters, and
applicants
• ACTIVESTUDENTSQL
• ACTIVEINSTRUCTORSQL
• ACTIVERECRUITERSQL
• ACTIVEAPPLICANTSQL
CENS_PARAMS Examples
VARIABLENAME         RESULT
CENSUS_TERM          4042
CENSUS               2004201
ACTIVEINSTRUCTORSQL  (SELECT EMPLID FROM PS_CLASS_INSTR WHERE STRM='4042')
DATE_CUTOFF          01-23-2004 00:10
Decision Stream Fact Build
RDS Fact Build Modification
• Add Census variables to Fact Build Properties and
modify existing CURDATE_SOURCE
• Modify DataStream SQL against source tables
• Add CENSUS variable to Transformation
• Add CENSUS variable to Fact Delivery, with index
• Change Delivery Method from Truncate to Append
Adding Variables to
Fact Build Properties
Modify CURDATE_SOURCE
• Change from being the current date to the value of
DATE_CUTOFF
• Variable Expression changed from:
LOOKUP('ODS_CURRENT',
'SELECT RESULT FROM ODS_PARAMS WHERE VARIABLENAME=''CURDATE_SOURCE''')
To:
Concat('TO_DATE(''',
LOOKUP('ODS_CURRENT', 'SELECT RESULT FROM CENS_PARAMS WHERE VARIABLENAME=''DATE_CUTOFF'''),
''',''MM-DD-YYYY HH24:MI'')')
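The effect of the revised expression can be sketched in Python. This is an illustration only, not Decision Stream syntax: the `lookup` helper and the `cens_params` dictionary are hypothetical stand-ins for LOOKUP and the CENS_PARAMS table, and the cutoff value is the one shown in the CENS_PARAMS examples.

```python
# Hypothetical stand-in for the CENS_PARAMS table (VARIABLENAME -> RESULT).
cens_params = {"DATE_CUTOFF": "01-23-2004 00:10"}


def lookup(variable_name):
    """Mimic LOOKUP(...): return RESULT for one VARIABLENAME."""
    return cens_params[variable_name]


def curdate_source():
    """Build the Oracle TO_DATE(...) text that the Concat expression yields."""
    return "TO_DATE('" + lookup("DATE_CUTOFF") + "','MM-DD-YYYY HH24:MI')"


print(curdate_source())
```

Because the variable now resolves to a fixed TO_DATE call rather than the system date, every build that references CURDATE_SOURCE compares effective dates against the census cutoff.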
Modifying SQL WHERE Clause
• Replace
STRM BETWEEN {$START_TERM} AND {$END_TERM}
with STRM={$CENSUS_TERM}
• Leave {$CURDATE_SOURCE} unchanged; the value in
DATE_CUTOFF replaces the system date when the variable
is resolved.
• Add AND EMPLID IN {$ACTIVE***SQL} as needed
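A minimal sketch of the WHERE clause change, in Python for illustration. The sample clause, the EFFDT column, the PS_STDNT_ENRL subquery, and the `resolve` helper are assumptions made for the example; only the variable names and the term value 4042 come from the slides.

```python
def to_census_where(where_clause):
    """Swap the term-range filter for a single-term census filter."""
    return where_clause.replace(
        "STRM BETWEEN {$START_TERM} AND {$END_TERM}",
        "STRM={$CENSUS_TERM}",
    )


def resolve(sql, variables):
    """Substitute {$NAME} placeholders with their values (simplified)."""
    for name, value in variables.items():
        sql = sql.replace("{$" + name + "}", value)
    return sql


# Hypothetical DataStream WHERE clause from a daily build.
daily_where = (
    "WHERE EFFDT <= {$CURDATE_SOURCE} "
    "AND STRM BETWEEN {$START_TERM} AND {$END_TERM}"
)

# Census version: single term plus an active-population filter.
census_where = to_census_where(daily_where) + " AND EMPLID IN {$ACTIVESTUDENTSQL}"

# Assumed variable values; the subquery text is illustrative only.
variables = {
    "CURDATE_SOURCE": "TO_DATE('01-23-2004 00:10','MM-DD-YYYY HH24:MI')",
    "CENSUS_TERM": "4042",
    "ACTIVESTUDENTSQL": "(SELECT EMPLID FROM PS_STDNT_ENRL WHERE STRM='4042')",
}

print(resolve(census_where, variables))
```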
Add CENSUS Derivation to
Transformation
Add CENSUS to Fact Delivery
Change Delivery Method to Append
Result of Fact Build Modifications
• Extract against source data selects only records pertaining
to current term with effective dates before specified cutoff
date
• CENSUS column added to output table to identify census
point for each record
• New census data appended to existing data to create
historical record
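The append behavior described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module in place of Oracle, and the table ENRL_FACT, its columns, and the sample rows are hypothetical; the pairing of census identifiers with terms is also only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical output table with the added CENSUS column and its index.
conn.execute("CREATE TABLE ENRL_FACT (CENSUS TEXT, EMPLID TEXT, STRM TEXT)")
conn.execute("CREATE INDEX ENRL_FACT_CENSUS ON ENRL_FACT (CENSUS)")


def deliver_append(census, strm, emplids):
    """Append one census point; earlier census points are left in place."""
    conn.executemany(
        "INSERT INTO ENRL_FACT VALUES (?, ?, ?)",
        [(census, emplid, strm) for emplid in emplids],
    )


# Two census runs with made-up students.
deliver_append("2003920", "4039", ["1001", "1002"])
deliver_append("2004201", "4042", ["1001", "1003", "1004"])

counts = conn.execute(
    "SELECT CENSUS, COUNT(*) FROM ENRL_FACT GROUP BY CENSUS ORDER BY CENSUS"
).fetchall()
print(counts)
```

With a Truncate delivery the second run would have erased the first; with Append, both census points remain and the CENSUS column tells them apart.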
Modifications to Dimensions
• SHARED_LOOKUPS and XLATTABLE
• No changes needed if CURDATE_SOURCE
variable references DATE_CUTOFF value
• Reviewing referenced lookups as each build is modified
is recommended
• Dimensions Referencing Fact Builds
• Add “AND CENSUS={$CENSUS}” to WHERE
clause in Lookup DataStream SQL
Creating New Jobstreams
• CONFIGURE_CENSUS_RUN Jobstream
Changes values in CENS_PARAMS table to current
census point
Updates census point information table
• ROLLBACK Jobstream
Allows for removal of all data for a specific census point
Run only when errors or run failures occur.
CONFIGURE_CENSUS_RUN
Jobstream
EDIT_CENS_PARAMS SQL Node
Edit/Update Census variables
UPDATE CENS_PARAMS
SET RESULT='4042'
WHERE VARIABLENAME='CENSUS_TERM';
UPDATE CENS_PARAMS
SET RESULT='2004201'
WHERE VARIABLENAME='CENSUS';
UPDATE CENS_PARAMS
SET RESULT='01-23-2004 00:10'
WHERE VARIABLENAME='DATE_CUTOFF';
EDIT_CENS_PARAMS SQL Node
TEMP_PARAMS table
CREATE GLOBAL TEMPORARY TABLE TEMP_PARAMS
(CTERM CHAR(4),
STR1 CHAR(255),
STR2 CHAR(20)
);
INSERT INTO TEMP_PARAMS
SELECT TRIM(TRAILING FROM RESULT),'', ''
FROM CENS_PARAMS
WHERE VARIABLENAME='CENSUS_TERM';
EDIT_CENS_PARAMS SQL Node
UPDATE TEMP_PARAMS
SET STR1='(SELECT EMPLID FROM PS_CLASS_INSTR
WHERE STRM='''
WHERE CTERM IS NOT NULL;
UPDATE TEMP_PARAMS
SET STR1=TRIM(TRAILING FROM
STR1)||CTERM||TRIM(TRAILING FROM STR2)
WHERE CTERM IS NOT NULL;
UPDATE CENS_PARAMS
SET RESULT=(SELECT STR1 FROM TEMP_PARAMS
WHERE CTERM IS NOT NULL)
WHERE VARIABLENAME='ACTIVEINSTRUCTORSQL';
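The goal of these statements, rebuilding ACTIVEINSTRUCTORSQL from the current CENSUS_TERM, can be sketched as follows. This is an assumption-laden illustration in Python with sqlite3, not the Oracle node itself: it builds the string in application code instead of a temporary table, and `refresh_active_instructor_sql` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CENS_PARAMS (VARIABLENAME TEXT, RESULT TEXT)")
conn.executemany(
    "INSERT INTO CENS_PARAMS VALUES (?, ?)",
    [("CENSUS_TERM", "4042"), ("ACTIVEINSTRUCTORSQL", "")],
)


def refresh_active_instructor_sql(conn):
    """Rebuild the instructor subquery text around the current census term."""
    (term,) = conn.execute(
        "SELECT RESULT FROM CENS_PARAMS WHERE VARIABLENAME='CENSUS_TERM'"
    ).fetchone()
    # strip() plays the role of TRIM(TRAILING ...) on blank-padded CHAR data.
    subquery = (
        "(SELECT EMPLID FROM PS_CLASS_INSTR WHERE STRM='" + term.strip() + "')"
    )
    conn.execute(
        "UPDATE CENS_PARAMS SET RESULT=? "
        "WHERE VARIABLENAME='ACTIVEINSTRUCTORSQL'",
        (subquery,),
    )
    return subquery


print(refresh_active_instructor_sql(conn))
```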
ROLLBACK_CENSUS_POINT
Jobstream
Rollback Procedure Variables
SQL1: BUILD ROLLBACK TABLE
CREATE TABLE ROLLBACK1
(ROWNUMB NUMBER(8), TABLE_NAME CHAR(255));
INSERT INTO ROLLBACK1
SELECT ROW_NUMBER() OVER
(ORDER BY TABLE_NAME) , TABLE_NAME
FROM ALL_TABLES WHERE OWNER='RDSSA'
AND (SUBSTR(TABLE_NAME,1,3) IN
('ADM','CC_','FA_','REC','SF_','IR_','XSY')
OR TABLE_NAME='CENSUS_POINT_INFO_TBL');
CENSUS ROLLBACK PROCEDURE
while $COUNTER <= RB_MAX()
do
BEGIN
$CNTR:=$COUNTER;
$TABLE:=RB_GETTABLE($CNTR);
LOGMSG(CONCAT('ROLLING BACK CENSUS POINT
',$CENS,' FROM TABLE: ',$TABLE));
RB_DelRows($TABLE,$RB_CENSUS);
$COUNTER:=$COUNTER+1;
END
UDF: RB_MAX()
Returns Total Table Count
• Implementation: Internal Calculation
• Returns Integer value calculated as:
RETURN LOOKUP('ODS_CURRENT',
'SELECT MAX(ROWNUMB) FROM
ROLLBACK1');
UDF: RB_GETTABLE($CNTR)
Returns name of table to rollback
• $CNTR passed to UDF
• Internal Calculation returning table name:
RETURN LOOKUP('ODS_CURRENT',
CONCAT('SELECT TABLE_NAME FROM
ROLLBACK1 WHERE
ROWNUMB=',TOCHAR($CNTR)));
RB_DelRows($TABLE,$RB_CENSUS)
Deletes rows from table
• $TABLE (table name) and $RB_CENSUS (Census value)
are passed to UDF
• Internal Implementation executes:
SQL('ODS_CURRENT',CONCAT('DELETE FROM ',
$TABLE, ' WHERE CENSUS=''', $RB_CENSUS,''''));
• Example:
RB_DELROWS('MYTABLE','2004201') executes:
DELETE FROM MYTABLE WHERE CENSUS='2004201'
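How the rollback loop and the three UDFs fit together can be sketched in Python with sqlite3. This is a rough analogue rather than the Decision Stream procedure: the two fact tables and their rows are hypothetical, and ROLLBACK1 is filled by hand instead of from ALL_TABLES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical census tables, each holding two census points.
for table in ("ADM_DEMO_FACT", "REC_DEMO_FACT"):
    conn.execute(f"CREATE TABLE {table} (CENSUS TEXT, EMPLID TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [("2003920", "1001"), ("2004201", "1001")],
    )

# Counterpart of ROLLBACK1: a numbered list of tables to clean.
conn.execute("CREATE TABLE ROLLBACK1 (ROWNUMB INTEGER, TABLE_NAME TEXT)")
conn.executemany(
    "INSERT INTO ROLLBACK1 VALUES (?, ?)",
    [(1, "ADM_DEMO_FACT"), (2, "REC_DEMO_FACT")],
)


def rb_max():
    """Total number of tables listed in ROLLBACK1."""
    return conn.execute("SELECT MAX(ROWNUMB) FROM ROLLBACK1").fetchone()[0]


def rb_gettable(cntr):
    """Name of the table at position cntr."""
    return conn.execute(
        "SELECT TABLE_NAME FROM ROLLBACK1 WHERE ROWNUMB=?", (cntr,)
    ).fetchone()[0]


def rb_delrows(table, census):
    """Delete one census point from one table."""
    # Table names cannot be bound as parameters, so the name is interpolated;
    # it comes from ROLLBACK1, not from user input.
    conn.execute(f"DELETE FROM {table} WHERE CENSUS=?", (census,))


def rollback_census_point(census):
    """Walk ROLLBACK1 and remove the census point from every listed table."""
    counter = 1
    while counter <= rb_max():
        table = rb_gettable(counter)
        print(f"ROLLING BACK CENSUS POINT {census} FROM TABLE: {table}")
        rb_delrows(table, census)
        counter += 1


rollback_census_point("2004201")
```

Only rows tagged with the rolled-back census point are removed; earlier census points in the same tables are untouched.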
Migrating changes in Daily RDS to
Census RDS
• New data needs are being identified as analysts from
across the University begin using the Daily RDS, and as
development of the transactional system continues.
• Changes to Daily RDS are relatively simple, since tables
are truncated daily.
• Changing Census RDS tables is more difficult:
• Need to add new columns without losing existing data
• How do you get data for added columns (or new
tables) for past census points?
Adding a New Table to Census RDS
• Adding new fact build is relatively simple:
• Export Daily RDS fact build and related dimensions
using Decision Stream CATEXP
• Import into Census RDS using DS CATIMP
• Modify fact build using methods described previously
• Decide how to populate past census points:
• Restore census point backups one at a time and
execute build?
• Use current data? Leave empty?
Adding a New Column to Existing
Fact Build
• More complicated than adding a new build
• Need to modify the existing table; cannot just create a new one
• Potential for losing data exists (so be careful!)
• Backing up both Catalog and Output databases before
beginning is highly recommended
Adding a New Column to Existing
Fact Build (Cont.)
• Restoring census point backups and running fact build
• Requires effort from technical support staff
• Ensures data accuracy
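A minimal sketch of adding a column in place and backfilling one past census point, using Python with sqlite3 rather than Oracle. The table DEMO_FACT, the column NEW_ATTR, and the backfill values are hypothetical; in practice the backfill values would come from rerunning the build against a restored census point backup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEMO_FACT (CENSUS TEXT, EMPLID TEXT)")
conn.executemany(
    "INSERT INTO DEMO_FACT VALUES (?, ?)",
    [("2003920", "1001"), ("2004201", "1001")],
)

# Add the column in place: existing rows survive, the new column starts NULL.
conn.execute("ALTER TABLE DEMO_FACT ADD COLUMN NEW_ATTR TEXT")

# Hypothetical backfill for one past census point.
backfill = {"1001": "A"}
for emplid, value in backfill.items():
    conn.execute(
        "UPDATE DEMO_FACT SET NEW_ATTR=? WHERE CENSUS=? AND EMPLID=?",
        (value, "2003920", emplid),
    )

rows = conn.execute(
    "SELECT CENSUS, EMPLID, NEW_ATTR FROM DEMO_FACT ORDER BY CENSUS"
).fetchall()
print(rows)
```

Census points that are not backfilled simply keep NULL in the new column, which corresponds to the "leave empty" option.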
Database Size Comparisons
DATABASE                       SIZE (MB)
Daily RDS                      13,500
Census RDS (4 census points)   17,300
Options for Reducing Census RDS
Size
• Drop tables/fact builds for unused tables
• Add SQL procedure steps to drop Staging Tables
  • Staging tables account for 1,250 MB of storage (7.2%)
  • Alternative: leave Staging table delivery as Truncate
• Remove unused/unnecessary columns from tables
  • Can you live with just one DESCR?
• Drop unneeded indexes
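The staging-table percentage can be checked with a short calculation. The figures are the ones reported in the slides; the per-census-point average is a naive division added here for illustration and assumes all four census points are about the same size.

```python
# Sizes in MB, taken from the slides.
census_mb = 17_300
staging_mb = 1_250
census_points = 4

# Share of the Census RDS occupied by staging tables.
staging_share = staging_mb / census_mb * 100
print(f"Staging share: {staging_share:.1f}%")

# Size left if the staging tables were dropped.
without_staging_mb = census_mb - staging_mb
print(f"Without staging: {without_staging_mb} MB")

# Naive average per census point (assumes equally sized census points).
avg_per_point_mb = census_mb / census_points
print(f"Average per census point: {avg_per_point_mb:.0f} MB")
```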
Making Use of Census RDS Data
Ryan Cherland
Examples of how Institutional
Research uses the Census RDS
• Creation and storage of a Census single-record-per-student
Reporting Datamart table
• Audit checks
– A source for audit checks with production
SQRs run against the SASTATIC SA database
• Reports
– Census Class Roster
Single Record Report Table
• Contains basic bio-demographic data
• Campus location where majority of credit hours
are taken (only location for this)
• Various credit hour and FTE fields covering
campus location, special hours (ROTC,
Dissertation/Thesis), overall hours, etc.
• “Primary” school, major, and student level
selected from all active programs and plans
• Test scores (SAT, ACT, GRE)
• 104 fields in total
Creation of
REPORTING_DMART table
• Audit reports and correction efforts are made prior
to the Census day in the Live SA database, but
some errors are too complicated or new to be fixed
in time
• SQR written by SA developers runs against the
SASTATIC database and creates a flat-file and
error check report
• So this table is where corrections are applied to
the flat-file (wrong student levels, missing
program / plan stacks, etc.) before loading into
SARDSCEN using SAS
Screen Shots of Fields in Reporting Datamart…
Example of an Audit Check
Report
• Departmental Load Analysis
– A summarized report by school, department,
and course level of credit hour enrollment in
courses
– The OIRP SAS program using the SARDSCEN
data allows us to validate the data in the
SARDSCEN against the SQR version of this
report that is run against SASTATIC
Results for the School of Business
for the Fall 2003 Semester
Reporting from the SARDSCEN
database
• Advantages in having the same data
structures
– Can quickly point the report to using the
SARDSCEN database or the daily SARDS
database
• Familiarity with the daily SARDS allows
other “non-IR” staff to start using the
historical data with confidence
Using SAS macro code to “flip”
which database is being accessed
/* Uncomment these for daily access */
%LET LIB=SARDS;
%let cen=;
/* Uncomment these for CENSUS access */
%*LET LIB=SARDSCEN;
%*LET cen=CENSUS EQ '2003920' AND;
/* Set Term Code below for desired semester */
%*let trm=4039;
%let trm=4042;
SAS Macro fields in the query…
PROC SQL;
create table enrolled as
select * from &LIB..rec_enrollment_fact_ku
where &CEN ENRL_STRM="&TRM"
and enrl_status_cd='E'
and class_sid in (select distinct class_sid
from &LIB..rec_class_dim
where &CEN class_term_cd="&trm"
and class_subject_cd='TH&F')
order by class_sid;
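The same "flip" idea can be sketched outside SAS. The Python below is only an analogue of the macro variables &LIB, &CEN, and &TRM: the `settings` and `enrollment_query` helpers are hypothetical, and the query text is assembled as a string rather than executed.

```python
def settings(use_census, census="2003920"):
    """Return (lib, cen) the way the %LET statements set &LIB and &CEN."""
    if use_census:
        return "SARDSCEN", f"CENSUS = '{census}' AND "
    return "SARDS", ""


def enrollment_query(lib, cen, trm):
    """Assemble the enrollment query text for one library and term."""
    return (
        f"select * from {lib}.rec_enrollment_fact_ku "
        f"where {cen}ENRL_STRM='{trm}' "
        "and enrl_status_cd='E' "
        "and class_sid in (select distinct class_sid "
        f"from {lib}.rec_class_dim "
        f"where {cen}class_term_cd='{trm}' "
        "and class_subject_cd='TH&F') "
        "order by class_sid"
    )


# Census access for an earlier term.
lib, cen = settings(use_census=True)
census_sql = enrollment_query(lib, cen, "4039")

# Daily access for the current term: no CENSUS filter is added.
lib, cen = settings(use_census=False)
daily_sql = enrollment_query(lib, cen, "4042")

print(census_sql)
print(daily_sql)
```

Because the census filter text ends with AND, it can be dropped into each WHERE clause or left empty without any other change to the query.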
Output Results using Census
Output Results using Daily
Conclusion
Creating a historical archive version of the RDS from a copy of the daily
RDS is a bit labor intensive, but the results are well worth it!!
Ray Helm, [email protected]
Research Analyst, University of Kansas
Ryan Cherland, [email protected]
Director, University Management Information, University of Kansas - Lawrence
http://www.heug.org
Attendees may download HEUG 2004 presentations from HEUG On-Line