Module 1 – Database Information & File Processing
Links: http://www.answers.com/Q/What_is_file_processing_system
http://ecomputernotes.com/fundamental/what-is-a-database/traditional-file-processingsystem
http://ecomputernotes.com/database-system/rdbms/types-of-file-organization
http://wps.aw.com/wps/media/objects/7095/7265991/appendices/AppendixF.pdf
http://en.wikipedia.org/wiki/Sequential_access
http://www.worldbestlearningcenter.com/index_files/csharp-file-stream-sequential.htm
http://en.wikipedia.org/wiki/Relational_database
http://computer.howstuffworks.com/question599.htm
https://www.youtube.com/watch?v=HiXoEQzf-Zg
https://www.youtube.com/watch?v=NqvduIFX_0U
https://www.youtube.com/watch?v=9cxNg9_7DDc
https://www.youtube.com/watch?v=Y-bvjtYgRVU
https://www.youtube.com/watch?v=XStKmS-Atds
https://www.youtube.com/watch?v=Qf9s2R_Olf4
https://www.youtube.com/watch?v=F9qNhcgbj3A
https://www.youtube.com/watch?v=9GidBWigKnA
https://www.youtube.com/watch?v=Tqrzo0A1F8Y
Content:
As we begin examining database concepts, we need to be familiar with certain terms, starting
with data versus information. Data refers to facts concerning objects and events that can be
recorded and stored on computer media. Information is data that have been processed in such a
way that the knowledge of the person using the data is increased. Another term to become
familiar with is metadata: data that describes the properties or characteristics of end-user
data and the context of that data.
As described in the introduction, database organizations (models) evolved after the flat file
model. File processing systems were built on groups of flat data files. Flat files had
several disadvantages, such as dependence between flat files and programs, duplication of data,
limited data sharing between programs, lengthy development times, and excessive program
maintenance whenever a file was changed.
When database models came into use, the database models emphasized integration and sharing
of data and included advantages of program-data independence, planned data redundancy
(limited redundancy), improved data consistency, improved data sharing, increased productivity
of application development, improved data quality, improved data accessibility and
responsiveness, reduced program maintenance, and improved decision making. Some of the
costs and risks include: new specialized personnel, installation and management cost and
complexity, conversion costs, the need for explicit backup and recovery and organizational
conflict with ownership of data.
The software system that controls the database is the DBMS (database management system). A
database management system is a software system used to create, maintain, and provide
controlled access to user databases; in other words, it creates, updates, stores, and retrieves
data from the database.
Databases have been subdivided into how they are used - Personal databases, workgroup
databases, departmental/divisional databases, enterprise-wide databases, and web-enabled
databases.
As mentioned in the introduction, database models evolved after flat files. Database models
began in the late 1960s with the hierarchical model, developed primarily by IBM. This
model uses a tree structure for data storage. The model still exists today, but primarily in
legacy systems.
In the 1970s the network model came into being through a consortium of companies. The
model uses the linked list as the structure for data storage. This model did not gain popularity
and did not last long.
In the 1980s the relational model was created. This model uses a two-dimensional table as the
structure for data storage. It is still the number one database model in use today.
In the 1990s the object-oriented model was created. This model uses the object as the structure
for data storage.
Also in the 1990s the object-relational model was created. This model is in reality a relational
database model, but with an object-oriented front end for the user interface.
Data warehousing came into being in the 1990s as well. This is a method for extracting
database "data," cleaning that data (making it consistent and correct), and restoring the data into
"warehouses" for use with data mining products.
Web-enabled databases started in the late 1990s, where traditional relational databases were
combined with web programs to support such things as e-commerce, customer management, and
service areas.
Random Access Files
Random access files consist of records that can be accessed in any sequence. The data is
stored exactly as it appears in memory, which saves processing time (because no translation is necessary)
both when the file is written and when it is read.
Random files are a better solution to database problems than sequential files, although there are a few
disadvantages. For one thing, random files are not especially transportable. Unlike sequential files, you
cannot peek inside them with an editor, or type them in a meaningful way to the screen. In fact, moving
a PowerBASIC random file to another computer or language will probably require that you write a
translator program to read the random file and output a text (sequential) file.
One example of the transportability problem strikes close to home. Interpretive BASIC uses Microsoft's
non-standard format for floating-point values, while PowerBASIC uses IEEE standard floating-point
conventions. This means you cannot read the floating-point fields of random files created by Interpretive
BASIC with a PowerBASIC program, or vice versa, without a bit of extra work.
The major benefit of random files is implied in their name: every record in the file is available at any
time. For example, in a database of 23,000 alumni, a program can go straight to record number 11,663
or 22,709 without reading any of the other records. This capability makes it the only reasonable choice
for large files, and probably the better choice for small ones, especially those with relatively consistent
record lengths.
However, random access files can be wasteful of disk space because space is allocated for the longest
possible field in every record. For example, a 100-byte comment field forces every record to use an
extra 100 bytes of disk space, even if only one in a thousand actually uses it.
At the other extreme, if records are consistent in length, especially if they contain mostly numbers,
random files can save space over the equivalent sequential form. In a random file, every number of the
same type (Integer, Long-integer, Quad-integer, Byte, Word, Double-word, Single-precision,
Double-precision, Extended-precision, or Currency) occupies the same amount of disk space, regardless
of the value itself. For example, the following five Single-precision values each require four bytes
(the same space they occupy in memory):
0
1.660565E-27
15000.1
641
623000000
By contrast, numbers in a sequential file require as many bytes as they have ASCII characters when
printed (plus one for the delimiting comma if WRITE# was used instead of PRINT#). For example:
WRITE #1, 0;0           ' takes 3 bytes
PRINT #1, 0;0           ' takes 5 bytes
PRINT #1, 1.660565E-27  ' takes 13 bytes
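The fixed-size-versus-text tradeoff above can be demonstrated outside PowerBASIC. This is a Python sketch (Python's struct module plays the role of PowerBASIC's binary field types; the variable names are illustrative):

```python
import struct

# Each single-precision (32-bit IEEE 754) value packs to exactly 4 bytes,
# regardless of its magnitude -- just like a Single in a random access file.
values = [0.0, 1.660565e-27, 15000.1, 641.0, 623000000.0]
for v in values:
    packed = struct.pack("<f", v)   # binary form: always 4 bytes
    as_text = repr(v)               # text form: grows with the printed value
    print(f"{v!r}: binary={len(packed)} bytes, text={len(as_text)} bytes")
```

Every binary form is 4 bytes, while the text forms vary from 3 to 12 characters, which is why fixed-length binary records make record arithmetic possible.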
You can create, write, and read random access files using the following steps:
1. First, OPEN the file and specify the length of each record:
OPEN filespec FOR RANDOM AS [#]filenum [LEN = recordsize]
The LEN parameter indicates to PowerBASIC the total size of each record in bytes. If you do not specify
a LEN parameter, PowerBASIC assumes 128. Unlike sequential files, you do not have to declare whether
you are opening for input or output because you can simultaneously read and write a random file.
2. Define a structure for records in the file using the TYPE statement.
TYPE StudentRecord
  LastName     AS STRING * 20  ' A 20-character string
  FirstName    AS STRING * 15  ' A 15-character string
  IDnum        AS LONG         ' Student ID, a Long-integer
  Contact      AS STRING * 30  ' Emergency contact person
  ContactPhone AS STRING * 14  ' Their phone number
  ContactRel   AS STRING * 8   ' Relationship to student
  AverageGrade AS SINGLE       ' Single-precision % grade
END TYPE
DIM Student AS StudentRecord
3. Fill the UDT's members with the values you want, and write records to the file using the PUT
statement.
Student.LastName = "Anderson"
Student.FirstName = "Bob"
Student.IDnum = 494425610
Student.Contact = "Ma Anderson"
Student.ContactPhone = "(800) BOBSMOM"
Student.ContactRel = "Mother"
Student.AverageGrade = 98.9
PUT #fileNumber, recordNumber, Student
4. Read records from the file using the GET statement.
GET #fileNumber, recordNumber, Student
5. When finished, CLOSE the file.
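The same fixed-length-record technique can be sketched in Python, where struct takes the place of the TYPE layout and seek-based offsets take the place of PUT/GET record numbers (the field widths mirror the StudentRecord example above; the file name and helper names are illustrative):

```python
import struct

# Fixed-length layout mirroring StudentRecord: 20s/15s = fixed strings,
# l = Long-integer student ID, f = Single-precision average grade.
RECORD = struct.Struct("<20s15sl30s14s8sf")   # 95 bytes per record

def put(f, record_number, fields):
    """Write a record at a 1-based record number, like PUT."""
    f.seek((record_number - 1) * RECORD.size)
    f.write(RECORD.pack(*fields))

def get(f, record_number):
    """Read the record at a 1-based record number, like GET."""
    f.seek((record_number - 1) * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

with open("students.dat", "w+b") as f:
    put(f, 1, (b"Anderson".ljust(20), b"Bob".ljust(15), 494425610,
               b"Ma Anderson".ljust(30), b"(800) BOBSMOM".ljust(14),
               b"Mother".ljust(8), 98.9))
    last, first, idnum, *rest = get(f, 1)
    print(last.strip().decode(), idnum)   # Anderson 494425610
```

Because every record is exactly RECORD.size bytes, record number 11,663 is found by a single multiplication, with no need to read the preceding records.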
Binary Files
PowerBASIC's binary file technique, an extension to Interpretive BASIC, allows you to treat any file as a
numbered sequence of bytes, without regard to ASCII characters, number-versus-string considerations,
record length, or carriage returns. With the binary approach to a file problem, you read and write a file
by specifying exactly which bytes to read or write. This is similar to the services provided by the
Windows API functions used for reading and writing files.
Flexibility always comes at a price. Binary files require that you do all the work to decide what goes
where. Binary may be the best option when dealing with alien files that aren't in ASCII format; for
example, a file created by a spreadsheet or database product. Of course, you will have to know the
precise structure of the file before you can even attempt to break it down into numbers and strings
agreeable to PowerBASIC.
Every file opened in binary mode has an associated position indicator that points to the place in the file
that will be read or written to next. Use the SEEK statement to set the position indicator, and the SEEK
function to read it.
Binary files are accessed in the following way:
1. First, OPEN the file in BINARY mode. You need not specify whether you are reading or writing; you
can do either, or both.
2. To read the file, use SEEK to position the file pointer at the byte you want to read. Then use GET$
to read a specified number of characters into a string variable.
3. To write to the file, load a string variable with the information to be written. Then use SEEK to
position the point in the file to which it should be written, and use PUT$ to write the data.
4. When finished, CLOSE the file.
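The four binary-mode steps above can be sketched in Python, where seek() plays the role of the SEEK statement and read()/write() stand in for GET$/PUT$ (the file name and byte contents are illustrative):

```python
# Step 1: open in binary mode; reading and writing are both allowed.
with open("alien.dat", "w+b") as f:
    f.write(b"HEADERpayload-bytes")
    # Step 2: position the file pointer, then read a fixed number of bytes.
    f.seek(6)
    chunk = f.read(7)        # GET$-style read of 7 bytes
    print(chunk)             # b'payload'
    # Step 3: position the pointer, then write raw bytes at that spot.
    f.seek(0)
    f.write(b"MAGIC!")       # PUT$-style write over the first 6 bytes
# Step 4: leaving the with-block closes the file.
```

As with PowerBASIC's binary mode, nothing here interprets the bytes; the program alone decides what goes where.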
Sequential Files
Sequential file techniques provide a straightforward way to read and write files. PowerBASIC's
sequential file commands manipulate text files: files of ANSI or WIDE characters with
carriage-return/linefeed pairs separating records.
Quite possibly, the best reason for using sequential files is their degree of portability to other programs,
programming languages, and computers. Because of this, you can often look at sequential files as the
common denominator of data processing, since they can be read by word-processing programs and
editors (such as PowerBASIC's), absorbed by other applications (such as database managers), and sent
over the Internet to other computers.
The idea behind sequential files is simplicity itself: write to them as though they were the screen and
read from them as though they were the keyboard.
Create a sequential file using the following steps:
1. Open the file in sequential output mode. To create a file in PowerBASIC, you must use the OPEN
statement. Sequential files have two options to prepare a file for output:
OUTPUT: If a file does not exist, a new file is created. If a file already exists, its contents are
erased, and the file is then treated as a new file.
APPEND: If a file does not exist, a new file is created. If a file already exists, PowerBASIC appends
(adds) data at the end of that file.
2. Output data to a file. Use WRITE# or PRINT# to write data to a sequential file.
3. Close the file. The CLOSE statement closes a file after the program has completed all I/O
operations.
To read a sequential file:
1. First, OPEN the file in sequential INPUT mode. This prepares the file for reading.
2. Read data in from the file. Use PowerBASIC's INPUT# or LINE INPUT# statements.
3. Close the file. The CLOSE statement closes a file after the program has completed all I/O
operations.
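The create-and-read steps above translate almost directly to Python, where the open() mode string plays the role of OUTPUT, APPEND, and INPUT (the file name is illustrative):

```python
# OUTPUT mode: create the file, or erase an existing one.
with open("log.txt", "w") as f:
    f.write("first line\n")

# APPEND mode: add data at the end of the existing file.
with open("log.txt", "a") as f:
    f.write("second line\n")

# INPUT mode: read the records back, one line at a time, in order.
with open("log.txt") as f:
    for line in f:
        print(line.rstrip())
```

Each with-block closes the file when it ends, which corresponds to the CLOSE step.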
The drawback to sequential files is, not surprisingly, that you only have sequential access to your
data. You access one line at a time, starting with the first line. This means if you want to get to the last
line in a sequential file of 23,000 lines, you will have to read the preceding 22,999 lines.
Sequential files, therefore, are best suited to applications that perform sequential processing (for
example, counting words, checking spelling, printing mailing labels in file order) or in which all the data
can be held in memory simultaneously. This allows you to read the entire file in one fell swoop at the
start of a program and to write it all back at the end. In between, the information can be stored in an
array (in memory) which can be accessed randomly.
Although the SEEK statement can be used to change the point in the file where the next read or write
will occur, the calculations required to determine the position of the start of each record in a sequential
file would add considerable overhead. Sequential files typically consist of records of varying
sizes. Either you would have to maintain a separate index file indicating the starting byte position of
each record, or you would have to seek randomly until you found the correct position. However, SEEK
does have its uses with sequential files. For instance, after reading an entire file, you could use SEEK to
reposition the file pointer to the start of the file, in order to process the data a second time. This is
certainly quicker than closing and re-opening the file.
Sequential files lend themselves to database situations in which the length of individual records is
variable. For example, suppose an alumni list had a comments field. Some people may have 100 bytes
or more of comments. Others, perhaps most, will have none. Sequential files handle this problem
without wasting disk space.
The OPEN statement provides an optional LEN parameter for use with sequential files. This instructs
PowerBASIC to use internal buffering to speed up reading of sequential files, using a buffer of the size
specified by the LEN parameter. A buffer of 8192 bytes is suggested for best general performance,
especially when networks are involved. However, this value can be increased in size to gain additional
performance - the best value will always be specific to a particular combination of hardware and
software, and may vary considerably from PC to PC, network to network, etc.
The OPEN statement also provides an optional character mode parameter. This specifies the character
mode for this file: ANSI or WIDE (Unicode). Since sequential files consist of text alone, the selected
mode is enforced by PowerBASIC. All data read or written to the file is automatically forced to the
selected mode, regardless of the type of variables or expressions used. With binary or random files, this
specification has no effect, but it may be included in your code for self-documentation purposes.
ANSI characters in the U.S. range of CHR$(0) to CHR$(127) are known as ASCII, and are always
represented by a single byte. International ANSI characters in the range of CHR$(128) to CHR$(255) may
be followed by one or more additional bytes in order to accurately represent non-U.S. characters. The
exact definition of these characters depends upon the character set in use. WIDE characters are always
represented by two bytes per character. If the Chr option is not specified, the default mode is ANSI.