Transforming the Student Administration RDS into a Historical Data Archive
Session #10591
March 9, 2004, 11:50-12:50
HEUG 2004 Conference - Atlanta

Ray Helm
Research Analyst, Office of Institutional Research and Planning
University of Kansas - Lawrence

Ryan Cherland
Director, University Management Information
Associate Director, Office of Institutional Research and Planning
University of Kansas - Lawrence

Ray Helm is a Research Analyst with the Office of Institutional Research and Planning. He joined the University of Kansas as a Programmer Analyst with Decision Support Services in 2000. He has been involved in reporting from PeopleSoft systems for over 3 years and was the original RDS Administrator at KU.

Ryan Cherland is Director of University Management Information and Associate Director of the Office of Institutional Research and Planning. He has over 18 years of experience in institutional research, the last 11 of them at the University of Kansas - Lawrence. Ryan created DEMIS and has been extracting and reporting on data from PeopleSoft systems since 1996. He holds a Ph.D. in Higher Education Administration with a minor emphasis in Educational Psychology and Research.

Synopsis
This presentation reviews the methodology used to create a historical archive of student administration data by customizing PeopleSoft's RDS.
• Reviewing database needs
• Setting census parameters
• Modifying Decision Stream fact builds
• Modifying Decision Stream dimensions
• Special jobstreams
• Reporting examples

About the University of Kansas
• History - The University of Kansas opened its doors in 1866.
• Academics - The university offers more than 100 undergraduate and graduate majors and programs, including allied health, architecture, business, education, engineering, fine arts, journalism, law, liberal arts and sciences, nursing, pharmacy, and social welfare.
• Lawrence campus enrollment - 20,692 undergraduates and 6,122 graduate students from every state in the nation and more than 100 countries around the world.
• FTE students per FTE faculty: 14.7

PeopleSoft Student Admin/RDS at KU
• PS version 8 implemented for Fall 2003 enrollment; Financial Aid module implementation pending
• RDS installed October 2002
• Census RDS created August 2003
• All databases are Oracle 8.1.7 running on Unix servers
• Cognos Decision Stream runs on a Windows 2000 server

Why Build a Census RDS?
• The daily RDS provides a current snapshot and is not designed for historical reporting.
• We needed the capacity to report across semesters to track historical trends and fulfill institutional research needs.
• Running queries against transactional ("live") data was not desired.
• The census extract begun under the legacy student records system needed to be continued.

Census RDS Resource Needs
• Requires no additional software purchase
• The only additional resources needed are:
  – Storage space
  – Staff time to create, execute, and update
  – DBA assistance in setup and occasional support

KU's RDS Catalogs and Databases
• Decision Stream catalog databases
  – Production catalog for the daily RDS
  – Development catalog for RDS admins
  – Census catalog
  – PS delivered catalog (static)
• RDS databases
  – Daily production data
  – Development test data
  – Production census data

Getting Started: Create RDS Census Databases
1. Replicate the Daily RDS Catalog database as the Census Catalog database. If the Daily RDS Catalog cannot be copied, restore it using the Decision Stream Catalog Backup/Restore process.
2. Create the output database. For starters, its size should be roughly equal to that of the Daily RDS database. If possible, make a database copy of the Daily RDS so that users and roles migrate with it.
Create CENS_PARAMS Table
• Same structure as ODS_PARAMS: VARIABLENAME, RESULT
• Supplements ODS_PARAMS as a source of Decision Stream variable values
• Stores variables specific to the census process

Census Variables
• CENSUS_TERM - Term code as it is referenced in the PS source data.
• CENSUS - Unique identifier that distinguishes census point records within the Census RDS.
• DATE_CUTOFF - Official date and time at which the census data was collected.

ACTIVE***SQL Variables
Variables containing SQL statements that identify active students, instructors, recruiters, and applicants:
• ACTIVESTUDENTSQL
• ACTIVEINSTRUCTORSQL
• ACTIVERECRUITERSQL
• ACTIVEAPPLICANTSQL

CENS_PARAMS Examples
VARIABLENAME           RESULT
CENSUS_TERM            4042
CENSUS                 2004201
ACTIVEINSTRUCTORSQL    (SELECT EMPLID FROM PS_CLASS_INSTR WHERE STRM='4042')
DATE_CUTOFF            01-23-2004 00:10

Decision Stream Fact Build

RDS Fact Build Modification
• Add the census variables to the fact build properties and modify the existing CURDATE_SOURCE variable
• Modify the DataStream SQL against the source tables
• Add the CENSUS variable to the transformation
• Add the CENSUS variable to the fact delivery, with an index
• Change the delivery method from Truncate to Append

Adding Variables to Fact Build Properties

Modify CURDATE_SOURCE
• Change it from the current date to the value of DATE_CUTOFF.
• The variable expression changes from:
    LOOKUP('ODS_CURRENT', 'SELECT RESULT FROM ODS_PARAMS WHERE VARIABLENAME=''CURDATE_SOURCE''')
  to:
    Concat('TO_DATE(''', LOOKUP('ODS_CURRENT', 'SELECT RESULT FROM CENS_PARAMS WHERE VARIABLENAME=''DATE_CUTOFF'''), ''',''MM-DD-YYYY HH24:MI'')')
• With DATE_CUTOFF set to 01-23-2004 00:10, the new expression resolves to TO_DATE('01-23-2004 00:10','MM-DD-YYYY HH24:MI').

Modifying the SQL WHERE Clause
• Replace AND STRM BETWEEN {$START_TERM} AND {$END_TERM} with AND STRM={$CENSUS_TERM}.
• Leave {$CURDATE_SOURCE} unchanged; when the variable is resolved, the DATE_CUTOFF value replaces the system date.
• Add AND EMPLID IN {$ACTIVE***SQL} as needed.

Add CENSUS Derivation to Transformation

Add CENSUS to Fact Delivery

Change Delivery Method to Append

Result of Fact Build Modifications
• The extract against the source data selects only records pertaining to the current (census) term with effective dates before the specified cutoff date.
• A CENSUS column is added to the output table to identify the census point of each record.
• New census data is appended to the existing data to create a historical record.

Modifications to Dimensions
• SHARED_LOOKUPS and XLATTABLE
  – No changes are needed if the CURDATE_SOURCE variable references the DATE_CUTOFF value.
  – Reviewing the referenced lookups as each build is modified is recommended.
• Dimensions referencing fact builds
  – Add "AND CENSUS={$CENSUS}" to the WHERE clause in the lookup DataStream SQL.

Creating New Jobstreams
• CONFIGURE_CENSUS_RUN jobstream
  – Changes the values in the CENS_PARAMS table to the current census point
  – Updates the census point information table
• ROLLBACK jobstream
  – Allows removal of all data for a specific census point
  – Run only when errors or run failures occur
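The fact build changes above boil down to three rules: variables come from CENS_PARAMS, the extract is limited to the census term and to rows effective by DATE_CUTOFF, and rows are appended with a CENSUS tag instead of truncating the target. The sketch below illustrates those rules with Python's standard sqlite3 module standing in for Oracle and Decision Stream. It is an assumption-laden illustration, not the KU build: the PS_ENROLL and ENROLL_FACT tables, their columns, and the param/curdate_source/resolve/run_fact_build helpers are hypothetical, and the {$NAME} substitution is a simplified imitation of Decision Stream variable resolution.

```python
import re
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Census parameter table, same two-column shape as ODS_PARAMS.
cur.execute("CREATE TABLE CENS_PARAMS (VARIABLENAME TEXT PRIMARY KEY, RESULT TEXT)")
cur.executemany(
    "INSERT INTO CENS_PARAMS VALUES (?, ?)",
    [("CENSUS_TERM", "4042"), ("CENSUS", "2004201"), ("DATE_CUTOFF", "01-23-2004 00:10")],
)

# Hypothetical source and target tables (not the delivered RDS schema).
cur.execute("CREATE TABLE PS_ENROLL (EMPLID TEXT, STRM TEXT, EFFDT TEXT)")
cur.executemany(
    "INSERT INTO PS_ENROLL VALUES (?, ?, ?)",
    [
        ("001", "4042", "2004-01-10 09:00"),  # census term, before cutoff: extracted
        ("002", "4042", "2004-01-30 09:00"),  # census term, after cutoff: skipped
        ("003", "4039", "2003-09-01 09:00"),  # different term: skipped
    ],
)
cur.execute("CREATE TABLE ENROLL_FACT (CENSUS TEXT, EMPLID TEXT, STRM TEXT)")


def param(name):
    """Read one census variable, like LOOKUP(... FROM CENS_PARAMS ...)."""
    row = cur.execute(
        "SELECT RESULT FROM CENS_PARAMS WHERE VARIABLENAME = ?", (name,)
    ).fetchone()
    return row[0]


def curdate_source():
    """DATE_CUTOFF as a date literal, playing the role of TO_DATE(..., 'MM-DD-YYYY HH24:MI')."""
    cutoff = datetime.strptime(param("DATE_CUTOFF"), "%m-%d-%Y %H:%M")
    return "'" + cutoff.strftime("%Y-%m-%d %H:%M") + "'"


def resolve(template):
    """Replace each {$NAME} token with its value (simplified variable resolution)."""
    def lookup(match):
        name = match.group(1)
        return curdate_source() if name == "CURDATE_SOURCE" else param(name)
    return re.sub(r"\{\$(\w+)\}", lookup, template)


def run_fact_build():
    """Extract the census term as of the cutoff and APPEND it, tagged with CENSUS."""
    extract_sql = resolve(
        "SELECT EMPLID, STRM FROM PS_ENROLL "
        "WHERE STRM = '{$CENSUS_TERM}' AND EFFDT <= {$CURDATE_SOURCE}"
    )
    census = param("CENSUS")
    rows = cur.execute(extract_sql).fetchall()
    # Append, never truncate: earlier census points stay in ENROLL_FACT.
    cur.executemany(
        "INSERT INTO ENROLL_FACT (CENSUS, EMPLID, STRM) VALUES (?, ?, ?)",
        [(census, emplid, strm) for emplid, strm in rows],
    )
    conn.commit()
    return len(rows)
```

Calling run_fact_build() once appends the single row ('2004201', '001', '4042'). Calling it a second time for the same census point would append a duplicate, which is exactly the kind of failed or repeated run the ROLLBACK jobstream is meant to clean up.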
CONFIGURE_CENSUS_RUN Jobstream

EDIT_CENS_PARAMS SQL Node: Edit/Update Census Variables
    UPDATE CENS_PARAMS SET RESULT='4042' WHERE VARIABLENAME='CENSUS_TERM';
    UPDATE CENS_PARAMS SET RESULT='2004201' WHERE VARIABLENAME='CENSUS';
    UPDATE CENS_PARAMS SET RESULT='01-23-2004 00:10' WHERE VARIABLENAME='DATE_CUTOFF';

EDIT_CENS_PARAMS SQL Node: TEMP_PARAMS Table
    CREATE GLOBAL TEMPORARY TABLE TEMP_PARAMS (CTERM CHAR(4), STR1 CHAR(255), STR2 CHAR(20));
    -- CTERM holds the census term; STR2 holds the closing ') of the generated subquery
    INSERT INTO TEMP_PARAMS
      SELECT TRIM(TRAILING FROM RESULT), '', ''')'
      FROM CENS_PARAMS
      WHERE VARIABLENAME='CENSUS_TERM';

EDIT_CENS_PARAMS SQL Node (cont.)
    UPDATE TEMP_PARAMS SET STR1='(SELECT EMPLID FROM PS_CLASS_INSTR WHERE STRM='''
      WHERE CTERM IS NOT NULL;
    UPDATE TEMP_PARAMS SET STR1=TRIM(TRAILING FROM STR1)||CTERM||TRIM(TRAILING FROM STR2)
      WHERE CTERM IS NOT NULL;
    UPDATE CENS_PARAMS SET RESULT=(SELECT STR1 FROM TEMP_PARAMS WHERE CTERM IS NOT NULL)
      WHERE VARIABLENAME='ACTIVEINSTRUCTORSQL';

ROLLBACK_CENSUS_POINT Jobstream

Rollback Procedure Variables

SQL1: Build Rollback Table
    CREATE TABLE ROLLBACK1 (ROWNUMB NUMBER(8), TABLE_NAME CHAR(255));
    INSERT INTO ROLLBACK1
      SELECT ROW_NUMBER() OVER (ORDER BY TABLE_NAME), TABLE_NAME
      FROM ALL_TABLES
      WHERE OWNER='RDSSA'
        AND (SUBSTR(TABLE_NAME,1,3) IN ('ADM','CC_','FA_','REC','SF_','IR_','XSY')
             OR TABLE_NAME='CENSUS_POINT_INFO_TBL');

Census Rollback Procedure
    while $COUNTER <= RB_MAX() do
    BEGIN
      $CNTR := $COUNTER;
      $TABLE := RB_GETTABLE($CNTR);
      LOGMSG(CONCAT('ROLLING BACK CENSUS POINT ', $CENS, ' FROM TABLE: ', $TABLE));
      RB_DelRows($TABLE, $RB_CENSUS);
      $COUNTER := $COUNTER + 1;
    END

UDF: RB_MAX() Returns the Total Table Count
• Implementation: internal calculation
• Returns an integer value calculated as:
    RETURN LOOKUP('ODS_CURRENT', 'SELECT MAX(ROWNUMB) FROM ROLLBACK1');

UDF: RB_GETTABLE($CNTR) Returns the Name of the Table to Roll Back
• $CNTR is passed to the UDF
• Internal calculation returning the table name:
    RETURN LOOKUP('ODS_CURRENT', CONCAT('SELECT TABLE_NAME FROM ROLLBACK1 WHERE ROWNUMB=', TOCHAR($CNTR)));

UDF: RB_DelRows($TABLE, $RB_CENSUS) Deletes Rows from a Table
• $TABLE (table name) and $RB_CENSUS (census value) are passed to the UDF
• The internal implementation executes:
    SQL('ODS_CURRENT', CONCAT('DELETE FROM ', $TABLE, ' WHERE CENSUS=''', $CENS, ''''));
• Example: RB_DelRows('MYTABLE', '2004201') executes:
    DELETE FROM MYTABLE WHERE CENSUS='2004201'

Migrating Changes in the Daily RDS to the Census RDS
• New data needs are being identified as analysts from across the University begin using the daily RDS, and as development of the transactional system continues.
• Changes to the daily RDS are relatively simple, since its tables are truncated daily.
• Changing Census RDS tables is more difficult:
  – New columns must be added without losing existing data.
  – How do you get data for added columns (or new tables) for past census points?

Adding a New Table to the Census RDS
• Adding a new fact build is relatively simple:
  – Export the daily RDS fact build and related dimensions using Decision Stream CATEXP.
  – Import them into the Census RDS using DS CATIMP.
  – Modify the fact build using the methods described previously.
• Decide how to populate past census points:
  – Restore census point backups one at a time and execute the build?
  – Use current data? Leave empty?
Adding a New Column to an Existing Fact Build
• More complicated than adding a new build
• The existing table must be modified; you cannot simply create a new one
• Potential for losing data exists (so be careful!)
• Backing up both the Catalog and Output databases before beginning is highly recommended

Adding a New Column to an Existing Fact Build (cont.)
• Restoring census point backups and running the fact build:
  – Requires effort from technical support staff
  – Ensures data accuracy

Database Size Comparisons
DATABASE                        SIZE (MB)
Daily RDS                       13,500
Census RDS (4 census points)    17,300

Options for Reducing Census RDS Size
• Drop tables/fact builds for unused tables
• Add SQL procedure steps to drop staging tables
  – Staging tables account for 1,250 MB of storage (7.2% of the Census RDS)
  – Alternative: leave the staging table delivery method as Truncate
• Remove unused/unnecessary columns from tables
  – Can you live with just one DESCR?
• Drop unneeded indexes

Making Use of Census RDS Data
Ryan Cherland

Examples of How Institutional Research Uses the Census RDS
• Creation and storage of a census reporting datamart table with a single record per student
• Audit checks - a source for checking the output of production SQRs run against the SASTATIC SA database
• Reports - census class roster

Single Record Report Table
• Contains basic bio-demographic data
• Campus location where the majority of credit hours are taken (only location for this)
• Various credit hour and FTE fields covering campus location, special hours (ROTC, dissertation/thesis), overall hours, etc.
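The main risk when adding a column to a Census RDS fact table is rebuilding the table and losing earlier census points. A minimal sqlite3 sketch of the safer route follows; the table, the NEW_ATTR column, and the two helpers are hypothetical. ALTER TABLE ... ADD COLUMN keeps the existing rows, which get NULL in the new column until each past census point is backfilled, for instance from a restored census point backup as described above. In Oracle the corresponding statement takes the form ALTER TABLE ... ADD (column datatype).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical census fact table already holding two census points.
conn.execute("CREATE TABLE REC_ENROLLMENT_FACT (CENSUS TEXT, EMPLID TEXT)")
conn.executemany(
    "INSERT INTO REC_ENROLLMENT_FACT VALUES (?, ?)",
    [("2003920", "001"), ("2004201", "001")],
)

# Alter in place rather than drop and recreate: existing rows survive
# and receive NULL in the new column.
conn.execute("ALTER TABLE REC_ENROLLMENT_FACT ADD COLUMN NEW_ATTR TEXT")


def missing_by_census(conn):
    """Rows per census point that still have no value in the new column."""
    return dict(
        conn.execute(
            "SELECT CENSUS, COUNT(*) FROM REC_ENROLLMENT_FACT "
            "WHERE NEW_ATTR IS NULL GROUP BY CENSUS ORDER BY CENSUS"
        ).fetchall()
    )


def backfill(conn, census, values_by_emplid):
    """Fill the new column for one past census point (e.g. from a restored backup)."""
    conn.executemany(
        "UPDATE REC_ENROLLMENT_FACT SET NEW_ATTR = ? WHERE CENSUS = ? AND EMPLID = ?",
        [(value, census, emplid) for emplid, value in values_by_emplid.items()],
    )
    conn.commit()
```

Right after the ALTER, missing_by_census(conn) reports one unfilled row for each census point. This gives a simple checklist of which past census points still need a restore-and-rerun, or a decision to leave them empty.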
• "Primary" school, major, and student level selected from all active programs and plans
• Test scores (SAT, ACT, GRE)
• 104 fields in total

Creation of the REPORTING_DMART Table
• Audit reports and correction efforts are carried out in the live SA database before census day, but some errors are too complicated or too new to be fixed in time.
• An SQR written by the SA developers runs against the SASTATIC database and creates a flat file and an error-check report.
• This table is where corrections (wrong student levels, missing program/plan stacks, etc.) are applied to the flat file before it is loaded into SARDSCEN using SAS.

Screen Shots of Fields in the Reporting Datamart

Example of an Audit Check Report
• Departmental Load Analysis
  – A report summarizing credit hour enrollment in courses by school, department, and course level
  – The OIRP SAS program that uses the SARDSCEN data lets us validate the SARDSCEN data against the SQR version of this report, which is run against SASTATIC

Results for the School of Business for the Fall 2003 Semester

Reporting from the SARDSCEN Database
• Advantages of having the same data structures:
  – A report can quickly be pointed at either the SARDSCEN database or the daily SARDS database
  – Familiarity with the daily SARDS allows other "non-IR" staff to start using the historical data with confidence

Using SAS Macro Code to "Flip" Which Database Is Accessed
    /* Uncomment these for daily access */
    %LET LIB=SARDS;
    %let cen=;
    /* Uncomment these for CENSUS access */
    %*LET LIB=SARDSCEN;
    %*LET cen=CENSUS EQ '2003920' AND;
    /* Set Term Code below for desired semester */
    %*let trm=4039;
    %let trm=4042;

SAS Macro Variables in the Query
    PROC SQL;
      create table enrolled as
      select *
      from &LIB..rec_enrollment_fact_ku
      where &CEN ENRL_STRM="&TRM"
        and enrl_status_cd='E'
        and class_sid in (select distinct class_sid
                          from &LIB..rec_class_dim
                          where &CEN class_term_cd="&trm"
                            and class_subject_cd='TH&F')
      order by class_sid;
    QUIT;

Output Results Using Census

Output Results Using Daily

Conclusion
Creating a historical archive version of the RDS from a copy of the daily RDS is a bit labor intensive, but the results are well worth it!

Ray Helm, Research Analyst, University of Kansas
Ryan Cherland, Director, University Management Information, University of Kansas - Lawrence

http://www.heug.org
Attendees may download HEUG 2004 presentations from HEUG On-Line.
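Because SARDS and SARDSCEN share table structures, the same query can be aimed at either database by changing a schema prefix and an optional CENSUS predicate. The SAS macro variables &LIB and &CEN do exactly this. The Python sketch below copies the pattern. build_enrolled_query() and make_demo_conn() are hypothetical helpers, two sqlite3 in-memory databases attached as SARDS and SARDSCEN stand in for the Oracle schemas, and the rows are toy data. Unlike the SAS version, which splices the CENSUS condition into the query text, the sketch passes the term and census values as named bind parameters.

```python
import sqlite3


def build_enrolled_query(trm, census=None):
    """Return (sql, params), choosing the schema the way &LIB and &CEN do in SAS.

    census=None  -> daily SARDS, no CENSUS filter.
    census='...' -> SARDSCEN, CENSUS filter on the fact table and the dimension.
    """
    lib = "SARDS" if census is None else "SARDSCEN"
    cen = "" if census is None else "CENSUS = :census AND "
    sql = (
        f"SELECT * FROM {lib}.rec_enrollment_fact_ku "
        f"WHERE {cen}ENRL_STRM = :trm AND enrl_status_cd = 'E' "
        f"AND class_sid IN (SELECT DISTINCT class_sid FROM {lib}.rec_class_dim "
        f"WHERE {cen}class_term_cd = :trm AND class_subject_cd = 'TH&F') "
        "ORDER BY class_sid"
    )
    params = {"trm": trm}
    if census is not None:
        params["census"] = census
    return sql, params


def make_demo_conn():
    """Two attached in-memory schemas with the same tables; only SARDSCEN has CENSUS."""
    conn = sqlite3.connect(":memory:")
    for lib, cen_col in (("SARDS", ""), ("SARDSCEN", "CENSUS TEXT, ")):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {lib}")
        conn.execute(
            f"CREATE TABLE {lib}.rec_enrollment_fact_ku "
            f"({cen_col}ENRL_STRM TEXT, enrl_status_cd TEXT, class_sid INTEGER, EMPLID TEXT)"
        )
        conn.execute(
            f"CREATE TABLE {lib}.rec_class_dim "
            f"({cen_col}class_sid INTEGER, class_term_cd TEXT, class_subject_cd TEXT)"
        )
    # Daily data: two enrolled students in class 1.
    conn.executemany(
        "INSERT INTO SARDS.rec_enrollment_fact_ku VALUES (?, ?, ?, ?)",
        [("4042", "E", 1, "001"), ("4042", "E", 1, "002")],
    )
    conn.execute("INSERT INTO SARDS.rec_class_dim VALUES (1, '4042', 'TH&F')")
    # Census data: only one of them was enrolled at census point 2004201.
    conn.execute(
        "INSERT INTO SARDSCEN.rec_enrollment_fact_ku VALUES ('2004201', '4042', 'E', 1, '001')"
    )
    conn.execute("INSERT INTO SARDSCEN.rec_class_dim VALUES ('2004201', 1, '4042', 'TH&F')")
    conn.commit()
    return conn
```

The report code stays identical for both cases. Only the census argument changes, so the daily call build_enrolled_query('4042') returns two rows from the demo connection, while build_enrolled_query('4042', census='2004201') returns one.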