* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DBMS - Computer Information Systems
Entity–attribute–value model wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Data center wikipedia , lookup
Clusterpoint wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
3D optical data storage wikipedia , lookup
Data vault modeling wikipedia , lookup
Information privacy law wikipedia , lookup
ICS321 / IBM205 Management Information Systems Dr. Ken Cosh Databases: Foundations of Business Intelligence Review • Last week we looked at; – Hardware • Input, Output devices, Processors, Memory – Types of computer • Mainframe, Midrange, PC’s – The emergence of Client Server relationships • Thin Clients, Fat Clients, P2P – Networks • Centralised, Decentralised, Distributed This Weeks Topics • The ‘Data’ Resource – Organising Data – Databases The Importance of Good Data • Garbage In, Garbage Out; – Bad data will result in bad information (hence bad decisions) • An effective information system depends not only on having good data, but on storing, organising and accessing the data. • Challenges to this include; – Organisational Obstacles – Data Integration and Quality Organisational Obstacles • Implementing new data models requires reexamining the role of data within an organisation, – Who has access to what data, and when? – Changing the allocation (or sharing) of data can impact on current power relationships, and so is often met by political resistance. • Traditionally data is stored in file format, with each department having a selection of files. Modern approaches introduce databases and DBMS that can be shared across multiple departments Data Integration and Quality • When moving data from a file format to a database (or sharing data from one source across multiple systems), converting data formats can be costly, and risks losing important information. Traditional File Format • • • • Bit (binary digit) – smallest unit a computer handles. Byte – Group of bits which can represent a character Field – Group of bytes which can represent a word Record – Group of related fields (such as name, course, grade, date) • File – Group of records of the same type (such as a group of students records) • Database – Group of related files (such as grouping the student’s personal history, financial records) More terms • Entity – a person, place, thing or event for which data is stored about. • Attribute – Any characteristic or quality describing an entity, for example ‘age’. • Key Field – Every record in a file, should contain at least one key field, which uniquely identifies instances of that record, so that they can be retrieved, updated or sorted, for example ‘order number’. Key Terms Database Course File History File Finance File File NAME John Stewart Sarah Smith Ken Cosh COUR ICS321 ICS321 ICS321 DATE 2005 2005 2005 GRADE C+ B A+ Record NAME John Stewart COUR DATE ICS321 2005 GRADE C+ Field John Stewart (NAME Field) Byte 01001010 (Letter J in ASCII) Bit 0 So What’s the Problem? • Remember Systems within systems (subsystems), interfacing systems and adaptive systems? – Each system tends to grow and adapt independently. – Functional units develop systems isolated from other units. – Each functional unit develops many databases; personnel has personnel, payroll, medical insurance, pensions, mailing file…. Problems • Data Redundancy and Confusion – Duplicate Data in multiple data files. – The same data can have different names, different meanings, different related data in different places. – The same name might be used for different data in different places. – Database confusion makes implementing a SCM, CRM or Enterprise wide system difficult. Problems 2 • Program-Data Dependence – There is a tight relationship between the data in files and the programs using them. – Any changes to the data, results in necessary changes to the programs that use the data. – Maintaining data becomes costly. • Lack of Flexibility – Scheduled reports can easily be generated from the data. – Ad Hoc reports however are costly to generate. While the information is somewhere in the system getting it out is tricky. Problems 3 • Poor Security – Or poor control. – There is now a lot of data in a lot of databases throughout the organisation. It is difficult to control or manage the data – who is accessing what data? • Lack of Data Sharing & Availability – With poor control over data, its difficult to share data between functions. – Accounts might benefit from some data that manufacturing has, etc. Databases • “Group of related files (such as grouping the student’s personal history, financial records)” • Ideally a database is a group of related data which can be used to serve many different applications, by centralising data and minimising redundancy. Data Management Employees Name Address ID Position Payroll Hours Worked Pay Rate Tax Gross Pay Benefits Life Insurance Golf Membership Healthcare Retirement Personnel Applications DBMS Payroll Applications Benefits Applications DBMS • The DBMS sits between the actual data and the applications which use the data. • This saves the user from needing to understand the actual physical way the data is stored, instead presenting a logical view of it. • The user doesn’t need to know the data definition language, but instead could use a data manipulation language such as SQL. • In reality often the manipulation language is hidden within an application. DBMS Data Definition Data Manipulation Creating & Changing the logical structure of a database Querying & making changes to the information Database Application Generation Data Administration Menus, data entry screens, reports and application software Who can see what information; methods for backup and recovery Hierarchical Database ROOT FIRST CHILD Performance Ratings SECOND CHILD Employee Compensation Job Assignment Benefits Salary History Pension History Life Insurance Health Hierarchical Data • Suppose from the previous data structure, we wanted to access the salary history for all people with the job title “Assistant”, accessing that data would not be easy. • While certain scheduled reports can be generated, ad hoc reports are not as flexible. Relational Databases • Data is organised into tables, which could be visualised as a spreadsheet. In each table data is organised into rows / records (or tuples). • Any piece of data from any table can be linked to any piece of data in another table, so long as they have a common data element (field). Object-Oriented DB • Hierarchical and Relational databases assume that data is in character or numerical form. • Modern (and futuristic) databases store data which can’t easily be represented in files and tables (such as graphics, sounds, java applets or any other multimedia). • O-O databases are designed to deal with these diverse data types, however they tend to be a lot slower than relational databases. SQL • Structured Query Language – The commonest data manipulation language for relational databases. – Used to query (get information from) databases, and also to put information into databases. • SELECT Part_Number, Supplier_Code FROM Part WHERE Unit_Price < 25.00; – This would return the all the part numbers and supplier codes from the database called ‘part’ which cost less than 25.00. SQL • Suppose we want to know; – “By actual vs budgeted, how many size 8 shoes in black did we sell last month in the southeast and southwest regions, compared to the same month over the past 5 years.” • A complex request… – If we can build the query, it could effect the performance of the database system! • Especially if it is a live operating system. Data Warehouse • Logical collection of information gathered from many different operational databases. • Used to create business intelligence, assist with analysis and decision making. • Multi-dimensional ‘hypercube’ of information. Data Mining • • • • Query and Reporting Tools Intelligent Agents Multidimensional Analysis Tools Statistical Tools