Module 1 – Database Information & File Processing
Links: http://www.answers.com/Q/What_is_file_processing_system
http://ecomputernotes.com/fundamental/what-is-a-database/traditional-file-processingsystem
http://ecomputernotes.com/database-system/rdbms/types-of-file-organization
http://wps.aw.com/wps/media/objects/7095/7265991/appendices/AppendixF.pdf
http://en.wikipedia.org/wiki/Sequential_access
http://www.worldbestlearningcenter.com/index_files/csharp-file-stream-sequential.htm
http://en.wikipedia.org/wiki/Relational_database
http://computer.howstuffworks.com/question599.htm
https://www.youtube.com/watch?v=HiXoEQzf-Zg
https://www.youtube.com/watch?v=NqvduIFX_0U
https://www.youtube.com/watch?v=9cxNg9_7DDc
https://www.youtube.com/watch?v=Y-bvjtYgRVU
https://www.youtube.com/watch?v=XStKmS-Atds
https://www.youtube.com/watch?v=Qf9s2R_Olf4
https://www.youtube.com/watch?v=F9qNhcgbj3A
https://www.youtube.com/watch?v=9GidBWigKnA
https://www.youtube.com/watch?v=Tqrzo0A1F8Y
Content:
As we begin examining database concepts, we need to be familiar with certain terms, starting
with data versus information. Data refers to facts concerning objects and events that can be
recorded and stored on computer media. Information is data that have been processed in such a
way that the knowledge of the person using the data is increased. Another term to become
familiar with is metadata: data that describes the properties or characteristics of end-user
data and the context of that data.
As described in the introduction, database organizations (models) evolved after the flat file
model. File processing systems were built on groups of flat data files. Flat files had
several disadvantages, such as dependence between flat files and programs, duplication of data,
limited data sharing between programs, lengthy development times, and excessive program
maintenance whenever a file was changed.
When database models came into use, the database models emphasized integration and sharing
of data and included advantages of program-data independence, planned data redundancy
(limited redundancy), improved data consistency, improved data sharing, increased productivity
of application development, improved data quality, improved data accessibility and
responsiveness, reduced program maintenance, and improved decision making. Some of the
costs and risks include: new specialized personnel, installation and management cost and
complexity, conversion costs, the need for explicit backup and recovery and organizational
conflict with ownership of data.
The software system that controls the database is the DBMS (database management system). A
database management system is a software system used to create, maintain, and provide
controlled access to user databases; in other words, it creates, updates, stores, and retrieves
data from the database.
Databases have been subdivided into how they are used - Personal databases, workgroup
databases, departmental/divisional databases, enterprise-wide databases, and web-enabled
databases.
As mentioned in the introduction, database models evolved after flat files. Database models
began in the late 1960s with the hierarchical model, developed primarily by IBM. This
model uses a tree structure for data storage. The model still exists today, but primarily in
legacy systems.
In the 1970s the network model came into being through a consortium of companies. The
model uses the linked list as the structure for data storage. This model did not gain popularity
and did not last long.
In the 1980s the relational model was created. This model uses a two-dimensional table as the
structure for data storage. It is still the number one database model in use today.
In the 1990s the object-oriented model was created. This model uses the object as the structure
for data storage.
Also in the 1990s the object-relational model was created. This model is in reality a relational
database model, but with an object-oriented front end for the user interface.
Data warehousing came into being in the 1990s as well. This is a method for extracting
database "data," cleaning that data (making it consistent and correct), and restoring the data into
"warehouses" for use with data mining products.
Web-enabled databases started in the late 1990s, where traditional relational databases were
combined with web programs to support such things as e-commerce, customer management, and
service areas.
Random Access Files
Random access files consist of records that can be accessed in any sequence. The data is
stored exactly as it appears in memory, which saves processing time (because no translation is necessary)
both when the file is written and when it is read.
Random files are a better solution to database problems than sequential files, although there are a few
disadvantages. For one thing, random files are not especially transportable. Unlike sequential files, you
cannot peek inside them with an editor, or type them in a meaningful way to the screen. In fact, moving
a PowerBASIC random file to another computer or language will probably require that you write a
translator program to read the random file and output a text (sequential) file.
One example of the transportability problem strikes close to home. Interpretive BASIC uses Microsoft's
non-standard format for floating-point values, while PowerBASIC uses IEEE standard floating-point
conventions. This means you cannot read the floating-point fields of random files created by Interpretive
BASIC with a PowerBASIC program, or vice versa, without a bit of extra work.
The major benefit of random files is implied in their name: every record in the file is available at any
time. For example, in a database of 23,000 alumni, a program can go straight to record number 11,663
or 22,709 without reading any of the other records. This capability makes it the only reasonable choice
for large files, and probably the better choice for small ones, especially those with relatively consistent
record lengths.
However, random access files can be wasteful of disk space because space is allocated for the longest
possible field in every record. For example, a 100-byte comment field forces every record to use an
extra 100 bytes of disk space, even if only one in a thousand actually uses it.
At the other extreme, if records are consistent in length, especially if they contain mostly numbers,
random files can save space over the equivalent sequential form. In a random file, every number of the
same type (Integer, Long-integer, Quad-integer, Byte, Word, Double-word, Single-precision,
Double-precision, Extended-precision, or Currency) occupies the same amount of disk space, regardless
of the value itself. For example, the following five Single-precision values each require four bytes
(the same space they occupy in memory):
0
1.660565E-27
15000.1
641
623000000
By contrast, numbers in a sequential file require as many bytes as they have ASCII characters when
printed (plus one for the delimiting comma if WRITE# was used instead of PRINT#). For example:
WRITE #1, 0;0           ' takes 3 bytes
PRINT #1, 0;0           ' takes 5 bytes
PRINT #1, 1.660565E-27  ' takes 13 bytes
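The fixed-size-versus-text tradeoff above can be demonstrated outside PowerBASIC. This is a Python sketch (Python's struct module plays the role of PowerBASIC's binary field types; the variable names are illustrative):

```python
import struct

# Each single-precision (32-bit IEEE 754) value packs to exactly 4 bytes,
# regardless of its magnitude -- just like a Single in a random access file.
values = [0.0, 1.660565e-27, 15000.1, 641.0, 623000000.0]
for v in values:
    packed = struct.pack("<f", v)   # binary form: always 4 bytes
    as_text = repr(v)               # text form: grows with the printed value
    print(f"{v!r}: binary={len(packed)} bytes, text={len(as_text)} bytes")
```

Every binary form is 4 bytes, while the text forms vary from 3 to 12 characters, which is why fixed-length binary records make record arithmetic possible.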
You can create, write, and read random access files using the following steps:
1. First, OPEN the file and specify the length of each record:
OPEN filespec FOR RANDOM AS [#]filenum [LEN = recordsize]
The LEN parameter indicates to PowerBASIC the total size of each record in bytes. If you do not specify
a LEN parameter, PowerBASIC assumes 128. Unlike sequential files, you do not have to declare whether
you are opening for input or output because you can simultaneously read and write a random file.
2. Define a structure for records in the file using the TYPE statement.
TYPE StudentRecord
  LastName     AS STRING * 20  ' A 20-character string
  FirstName    AS STRING * 15  ' A 15-character string
  IDnum        AS LONG         ' Student ID, a Long-integer
  Contact      AS STRING * 30  ' Emergency contact person
  ContactPhone AS STRING * 14  ' Their phone number
  ContactRel   AS STRING * 8   ' Relationship to student
  AverageGrade AS SINGLE       ' Single-precision % grade
END TYPE
DIM Student AS StudentRecord
3. Fill the UDT's members with the values you want, and write records to the file using the PUT
statement.
Student.LastName = "Anderson"
Student.FirstName = "Bob"
Student.IDnum = 494425610
Student.Contact = "Ma Anderson"
Student.ContactPhone = "(800) BOBSMOM"
Student.ContactRel = "Mother"
Student.AverageGrade = 98.9
PUT #fileNumber, recordNumber, Student
4. Read records from the file using the GET statement.
GET #fileNumber, recordNumber, Student
5. When finished, CLOSE the file.
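The same fixed-length-record technique can be sketched in Python, where struct takes the place of the TYPE layout and seek-based offsets take the place of PUT/GET record numbers (the field widths mirror the StudentRecord example above; the file name and helper names are illustrative):

```python
import struct

# Fixed-length layout mirroring StudentRecord: 20s/15s = fixed strings,
# l = Long-integer student ID, f = Single-precision average grade.
RECORD = struct.Struct("<20s15sl30s14s8sf")   # 95 bytes per record

def put(f, record_number, fields):
    """Write a record at a 1-based record number, like PUT."""
    f.seek((record_number - 1) * RECORD.size)
    f.write(RECORD.pack(*fields))

def get(f, record_number):
    """Read the record at a 1-based record number, like GET."""
    f.seek((record_number - 1) * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

with open("students.dat", "w+b") as f:
    put(f, 1, (b"Anderson".ljust(20), b"Bob".ljust(15), 494425610,
               b"Ma Anderson".ljust(30), b"(800) BOBSMOM".ljust(14),
               b"Mother".ljust(8), 98.9))
    last, first, idnum, *rest = get(f, 1)
    print(last.strip().decode(), idnum)   # Anderson 494425610
```

Because every record is exactly RECORD.size bytes, record number 11,663 is found by a single multiplication, with no need to read the preceding records.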
Binary Files
PowerBASIC's binary file technique, an extension to Interpretive BASIC, allows you to treat any file as a
numbered sequence of bytes, without regard to ASCII characters, number-versus-string considerations,
record length, or carriage returns. With the binary approach to a file problem, you read and write a file
by specifying exactly which bytes to read or write. This is similar to the services provided by the
Windows API functions used for reading and writing files.
Flexibility always comes at a price. Binary files require that you do all the work to decide what goes
where. Binary may be the best option when dealing with alien files that aren't in ASCII format; for
example, a file created by a spreadsheet or database product. Of course, you will have to know the
precise structure of the file before you can even attempt to break it down into numbers and strings
agreeable to PowerBASIC.
Every file opened in binary mode has an associated position indicator that points to the place in the file
that will be read or written to next. Use the SEEK statement to set the position indicator, and the SEEK
function to read it.
Binary files are accessed in the following way:
1. First, OPEN the file in BINARY mode. You need not specify whether you are reading or writing; you
can do either, or both.
2. To read the file, use SEEK to position the file pointer at the byte you want to read. Then use GET$
to read a specified number of characters into a string variable.
3. To write to the file, load a string variable with the information to be written. Then use SEEK to
position the point in the file to which it should be written, and use PUT$ to write the data.
4. When finished, CLOSE the file.
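The four binary-mode steps above can be sketched in Python, where seek() plays the role of the SEEK statement and read()/write() stand in for GET$/PUT$ (the file name and byte contents are illustrative):

```python
# Step 1: open in binary mode; reading and writing are both allowed.
with open("alien.dat", "w+b") as f:
    f.write(b"HEADERpayload-bytes")
    # Step 2: position the file pointer, then read a fixed number of bytes.
    f.seek(6)
    chunk = f.read(7)        # GET$-style read of 7 bytes
    print(chunk)             # b'payload'
    # Step 3: position the pointer, then write raw bytes at that spot.
    f.seek(0)
    f.write(b"MAGIC!")       # PUT$-style write over the first 6 bytes
# Step 4: leaving the with-block closes the file.
```

As with PowerBASIC's binary mode, nothing here interprets the bytes; the program alone decides what goes where.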
Sequential Files
Sequential file techniques provide a straightforward way to read and write files. PowerBASIC's
sequential file commands manipulate text files: files of ANSI or WIDE characters with
carriage-return/linefeed pairs separating records.
Quite possibly, the best reason for using sequential files is their degree of portability to other programs,
programming languages, and computers. Because of this, you can often look at sequential files as the
common denominator of data processing, since they can be read by word-processing programs and
editors (such as PowerBASIC's), absorbed by other applications (such as database managers), and sent
over the Internet to other computers.
The idea behind sequential files is simplicity itself: write to them as though they were the screen and
read from them as though they were the keyboard.
Create a sequential file using the following steps:
1. Open the file in sequential output mode. To create a file in PowerBASIC, you must use the OPEN
statement. Sequential files have two options to prepare a file for output:
OUTPUT: If a file does not exist, a new file is created. If a file already exists, its contents are
erased, and the file is then treated as a new file.
APPEND: If a file does not exist, a new file is created. If a file already exists, PowerBASIC appends
(adds) data at the end of that file.
2. Output data to a file. Use WRITE# or PRINT# to write data to a sequential file.
3. Close the file. The CLOSE statement closes a file after the program has completed all I/O
operations.
To read a sequential file:
1. First, OPEN the file in sequential INPUT mode. This prepares the file for reading.
2. Read data in from the file. Use PowerBASIC's INPUT# or LINE INPUT# statements.
3. Close the file. The CLOSE statement closes a file after the program has completed all I/O
operations.
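The create-and-read steps above translate almost directly to Python, where the open() mode string plays the role of OUTPUT, APPEND, and INPUT (the file name is illustrative):

```python
# OUTPUT mode: create the file, or erase an existing one.
with open("log.txt", "w") as f:
    f.write("first line\n")

# APPEND mode: add data at the end of the existing file.
with open("log.txt", "a") as f:
    f.write("second line\n")

# INPUT mode: read the records back, one line at a time, in order.
with open("log.txt") as f:
    for line in f:
        print(line.rstrip())
```

Each with-block closes the file when it ends, which corresponds to the CLOSE step.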
The drawback to sequential files is, not surprisingly, that you only have sequential access to your
data. You access one line at a time, starting with the first line. This means if you want to get to the last
line in a sequential file of 23,000 lines, you will have to read the preceding 22,999 lines.
Sequential files, therefore, are best suited to applications that perform sequential processing (for
example, counting words, checking spelling, printing mailing labels in file order) or in which all the data
can be held in memory simultaneously. This allows you to read the entire file in one fell swoop at the
start of a program and to write it all back at the end. In between, the information can be stored in an
array (in memory) which can be accessed randomly.
Although the SEEK statement can be used to change the point in the file where the next read or write
will occur, the calculations required to determine the position of the start of each record in a sequential
file would add considerable overhead. Sequential files typically consist of records of varying
sizes. Either you would have to maintain a separate index file indicating the starting byte position of
each record, or you would have to seek randomly until you found the correct position. However, SEEK
does have its uses with sequential files. For instance, after reading an entire file, you could use SEEK to
reposition the file pointer to the start of the file, in order to process the data a second time. This is
certainly quicker than closing and re-opening the file.
Sequential files lend themselves to database situations in which the length of individual records is
variable. For example, suppose an alumni list had a comments field. Some people may have 100 bytes
or more of comments. Others, perhaps most, will have none. Sequential files handle this problem
without wasting disk space.
The OPEN statement provides an optional LEN parameter for use with sequential files. This instructs
PowerBASIC to use internal buffering to speed up reading of sequential files, using a buffer of the size
specified by the LEN parameter. A buffer of 8192 bytes is suggested for best general performance,
especially when networks are involved. However, this value can be increased in size to gain additional
performance - the best value will always be specific to a particular combination of hardware and
software, and may vary considerably from PC to PC, network to network, etc.
The OPEN statement also provides an optional character mode parameter. This specifies the character
mode for this file: ANSI or WIDE (Unicode). Since sequential files consist of text alone, the selected
mode is enforced by PowerBASIC. All data read or written to the file is automatically forced to the
selected mode, regardless of the type of variables or expressions used. With binary or random files, this
specification has no effect, but it may be included in your code for self-documentation purposes.
ANSI characters in the U.S. range of CHR$(0) to CHR$(127) are known as ASCII, and are always
represented by a single byte. International ANSI characters in the range of CHR$(128) to CHR$(255) may
be followed by one or more additional bytes in order to accurately represent non-U.S. characters. The
exact definition of these characters depends upon the character set in use. WIDE characters are always
represented by two bytes per character. If the Chr option is not specified, the default mode is ANSI.