UNIT-II
Preeti Deshmukh
Writing Variable-Length Records to the File

Writing variable-length records involves some problems:
• If a length indicator goes at the beginning of each record, we must know the sum of the lengths of the fields in the record before writing it to the file.
• In what form should we write the record length to the file: as a binary integer or as ASCII characters?
• For this case, we use a character array as a buffer, in which we can place the fields and field delimiters as we collect them.
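A minimal sketch of the buffer idea, assuming '|' as the field delimiter (the delimiter choice and the function name PackFields are hypothetical): each field is appended to a character buffer followed by a delimiter, so once all fields have been collected, the record length is simply the size of the buffer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collect fields into a character buffer,
// placing a delimiter after each field. After packing, the record
// length is known and can be written before the record itself.
std::string PackFields(const std::vector<std::string>& fields, char delim = '|')
{
    std::string buffer;
    for (const std::string& field : fields) {
        // A field must not contain the delimiter, or unpacking would break.
        assert(field.find(delim) == std::string::npos);
        buffer += field;
        buffer += delim;
    }
    return buffer;
}
```

For example, packing "Ames", "Mary", and "123 Maple" gives "Ames|Mary|123 Maple|", a 20-byte record.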
Representing record length

• One option: write the length as a 2-byte binary integer before each record (C).
   An integer can represent much bigger numbers than the same number of ASCII bytes.

• Another option: convert the length into a character string using formatted output.
   With C streams: fprintf()
   With C++ stream classes: the overloaded insertion operator "<<"

Example:
fprintf(file, "%d ", length); // C
stream << length << ' ';       // C++
The example above inserts the length as an integer followed by a blank as the delimiter.

Storing lengths as binary integers is easy, because each length always requires exactly 2 bytes.
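A sketch of writing and reading records with a 2-byte binary length prefix. It assumes the length is stored low byte first (an explicit byte order chosen here so the file does not depend on the machine's native integer layout); the function names WriteRecord and ReadRecord are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 2-byte binary length (low byte first), then the record bytes.
void WriteRecord(std::ostream& out, const std::string& record)
{
    assert(record.size() <= 0xFFFF); // 2 bytes can count at most 65535
    unsigned len = static_cast<unsigned>(record.size());
    out.put(static_cast<char>(len & 0xFF));
    out.put(static_cast<char>((len >> 8) & 0xFF));
    out.write(record.data(), static_cast<std::streamsize>(record.size()));
}

// Read the 2-byte length first, then exactly that many bytes.
// Returns false at end of file or on a truncated record.
bool ReadRecord(std::istream& in, std::string& record)
{
    int lo = in.get();
    int hi = in.get();
    if (!in) return false;
    unsigned len = static_cast<unsigned>(lo) | (static_cast<unsigned>(hi) << 8);
    record.assign(len, '\0');
    if (len > 0) in.read(&record[0], static_cast<std::streamsize>(len));
    return static_cast<bool>(in);
}
```

Reading mirrors writing: the reader learns each record's length before touching its data, so no scan for a record delimiter is needed.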
Reading Variable-Length Records from the File
Using classes to Manipulate Buffers

C++ classes encapsulate the pack, unpack, read, and write operations of buffer objects.
• For output: start with an empty buffer object, pack field values into the object one by one, and then write the buffer contents to an output stream.
• For input: initialize a buffer object by reading a record from an input stream, and then extract the object's field values one by one.
Classes:
1. One for delimited fields
2. One for length-based fields
3. One for fixed-length fields

Buffer Class for Delimited Text Fields
• The DelimTextBuffer class supports a variable-length buffer whose fields are represented as delimited text.
• deltext.h contains the class definition.
• Operations on buffers: constructors, read, write, and field pack and unpack.
• A Person object is packed into the buffer field by field and the buffer is written to a file; the Unpack method below performs the reverse step, extracting the value of the next field from the buffer.
int DelimTextBuffer :: Unpack(char *str)
// extract the value of the next field of the buffer
{
    int len = -1;          // length of packed string
    int start = NextByte;  // first character to be unpacked
    for (int i = start; i < BufferSize; i++)
        if (Buffer[i] == Delim) // found the end of the field
        {
            len = i - start;
            break;
        }
    if (len == -1) return FALSE; // delimiter not found
    NextByte += len + 1;
    if (NextByte > BufferSize) return FALSE;
    strncpy(str, &Buffer[start], len);
    str[len] = 0; // zero termination for string
    return TRUE;
}
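The following self-contained sketch shows how a DelimTextBuffer-style class might implement the whole pack, write, read, unpack cycle. It is not the deltext.h definition: storing the buffer in a std::string and writing it as a 2-byte length (low byte first) followed by its contents are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Simplified stand-in for DelimTextBuffer (storage details assumed).
class DelimTextBuffer {
public:
    explicit DelimTextBuffer(char delim = '|') : Delim(delim) {}

    // Start again with an empty buffer.
    void Clear() { Buffer.clear(); NextByte = 0; }

    // Append one field followed by the delimiter.
    bool Pack(const char* field)
    {
        if (std::strchr(field, Delim) != nullptr) return false; // would corrupt the record
        Buffer += field;
        Buffer += Delim;
        return true;
    }

    // Extract the next field into str (which must be large enough).
    bool Unpack(char* str)
    {
        std::size_t pos = Buffer.find(Delim, NextByte);
        if (pos == std::string::npos) return false; // delimiter not found
        std::size_t len = pos - NextByte;
        Buffer.copy(str, len, NextByte);
        str[len] = '\0'; // zero termination for string
        NextByte = pos + 1;
        return true;
    }

    // Write the buffer as a 2-byte length (low byte first) plus its contents.
    bool Write(std::ostream& out) const
    {
        assert(Buffer.size() <= 0xFFFF);
        out.put(static_cast<char>(Buffer.size() & 0xFF));
        out.put(static_cast<char>((Buffer.size() >> 8) & 0xFF));
        out.write(Buffer.data(), static_cast<std::streamsize>(Buffer.size()));
        return static_cast<bool>(out);
    }

    // Read one record written by Write and reset the unpacking position.
    bool Read(std::istream& in)
    {
        int lo = in.get();
        int hi = in.get();
        if (!in) return false;
        std::size_t len = static_cast<std::size_t>(lo) | (static_cast<std::size_t>(hi) << 8);
        Buffer.assign(len, '\0');
        if (len > 0) in.read(&Buffer[0], static_cast<std::streamsize>(len));
        NextByte = 0;
        return static_cast<bool>(in);
    }

private:
    char Delim;               // field delimiter
    std::string Buffer;       // packed fields and delimiters
    std::size_t NextByte = 0; // unpacking position in Buffer
};
```

Rejecting a field that contains the delimiter in Pack is a design choice of this sketch; without it, Unpack would later split that field in two.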

Buffer classes for Length-based and Fixed-length fields
• Only the implementation of the Pack and Unpack methods of the delimited class changes.
• The class definitions are almost the same.
Full definition available in lentext.h and lentext.cpp
class LengthTextBuffer
{ public:
LengthTextBuffer (int maxBytes = 1000);
int Read( istream &file);
int Write(ostream &file) const;
int Pack(const char *field, int size = -1);
int Unpack(char * field);
private:
char * Buffer; //character array to hold field values
int BufferSize; //size of packed fields
int MaxBytes; //maximum number of characters in Buffer
int NextByte; // packing/unpacking position in buffer
};
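A sketch of the change in Pack and Unpack for length-based fields: instead of a delimiter after each field, each field's length is stored in front of it. Storing that length in a single byte (so fields are limited to 255 characters) is an assumption for this illustration, and LengthFieldBuffer is a hypothetical, simplified stand-in for LengthTextBuffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Simplified length-based field buffer: each field is stored as
// one length byte followed by the field's characters.
class LengthFieldBuffer {
public:
    // Pack a field; size = -1 means "use strlen(field)".
    // A given size must not exceed the characters available in field.
    bool Pack(const char* field, int size = -1)
    {
        assert(field != nullptr);
        std::size_t len = (size < 0) ? std::strlen(field) : static_cast<std::size_t>(size);
        if (len > 255) return false; // does not fit in one length byte
        Buffer += static_cast<char>(len);
        Buffer.append(field, len);
        return true;
    }

    // Unpack the next field into field (which must be large enough).
    bool Unpack(char* field)
    {
        if (NextByte >= Buffer.size()) return false; // no more fields
        std::size_t len = static_cast<unsigned char>(Buffer[NextByte]);
        if (NextByte + 1 + len > Buffer.size()) return false; // truncated field
        Buffer.copy(field, len, NextByte + 1);
        field[len] = '\0';
        NextByte += 1 + len;
        return true;
    }

    std::size_t Size() const { return Buffer.size(); }

private:
    std::string Buffer;       // length bytes and field characters
    std::size_t NextByte = 0; // unpacking position in Buffer
};
```

Because no delimiter is needed, a field value may contain any character, at the cost of one extra byte per field.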
Full definition available in fixtext.h and fixtext.cpp
class FixedTextBuffer
{ public:
FixedTextBuffer (int maxBytes = 1000);
int AddField(int fieldSize);
int Read( istream &file);
int Write(ostream &file) const;
int Pack(const char *field);
int Unpack(char * field);
private:
char * Buffer; //character array to hold field values
int BufferSize; //size of packed fields
int MaxBytes; //maximum number of characters in Buffer
int NextByte; // packing/unpacking position in buffer
int * FieldSizes; //array of field sizes
};

Example:
int Person :: InitBuffer (FixedTextBuffer &Buffer)
//initialize a FixedTextBuffer to be used for Person Objects
{
Buffer.Init(6,61); // 6 fields, 61 bytes in all (10+10+15+15+2+9)
Buffer.AddField(10); // LastName[11]
Buffer.AddField(10); // FirstName[11]
Buffer.AddField(15); // Address[16]
Buffer.AddField(15); // City[16]
Buffer.AddField(2); // State[3]
Buffer.AddField(9); // ZipCode[10]
return 1;
}
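The sketch below is a simplified, hypothetical version of a fixed-length field buffer (FixedFieldBuffer is not the fixtext.h class). It assumes short field values are padded with blanks on Pack and that trailing blanks are trimmed on Unpack; with the six Person field sizes, the record size comes out to the 61 bytes passed to Init.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

class FixedFieldBuffer {
public:
    void AddField(int fieldSize)
    {
        assert(fieldSize > 0);
        FieldSizes.push_back(fieldSize);
    }

    // Total record length = sum of the field sizes.
    int RecordSize() const
    {
        int total = 0;
        for (int s : FieldSizes) total += s;
        return total;
    }

    // Pack the next field, truncating or blank-padding it to its fixed size.
    bool Pack(const char* field)
    {
        if (NextPack >= FieldSizes.size()) return false; // all fields packed
        std::size_t size = static_cast<std::size_t>(FieldSizes[NextPack++]);
        std::string value(field, std::min(std::strlen(field), size));
        value.resize(size, ' ');
        Buffer += value;
        return true;
    }

    // Unpack the next field and strip its padding blanks.
    bool Unpack(char* field)
    {
        if (NextUnpack >= FieldSizes.size()) return false;
        std::size_t size = static_cast<std::size_t>(FieldSizes[NextUnpack]);
        if (NextByte + size > Buffer.size()) return false; // field not in buffer
        ++NextUnpack;
        std::string value = Buffer.substr(NextByte, size);
        NextByte += size;
        value.erase(value.find_last_not_of(' ') + 1); // all blanks -> empty
        std::strcpy(field, value.c_str());
        return true;
    }

private:
    std::vector<int> FieldSizes; // size of each field, in order
    std::string Buffer;          // packed, padded field values
    std::size_t NextPack = 0, NextUnpack = 0, NextByte = 0;
};
```

Blank padding means a value that legitimately ends in blanks loses them on Unpack; that trade-off is inherent in this simple padding rule.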
Record Access


"A record is the quantity of information that is being read or written."
Record Keys:
• It is convenient to identify a record through a key.
• A standard form of keys must be defined along with associated rules; this form is called "canonical" (conforming to the rules).
• The canonical form is the single representation for that key.

Suppose we search for a record with the name "Ames", which may be entered in different forms: "AMES", "ames", or "Ames".
• Canonical key example: the key consists only of uppercase letters, with no blank spaces at the end.
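A minimal sketch of converting a key to the canonical form described above, assuming plain ASCII input (the function name CanonicalKey is hypothetical): letters are converted to uppercase and trailing blanks are removed, so "AMES", "ames", and "Ames " all map to the same key.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical canonicalization: uppercase every letter and strip
// trailing blanks, so every input form of a key compares equal.
std::string CanonicalKey(const std::string& key)
{
    std::string canon;
    for (unsigned char c : key)
        canon += static_cast<char>(std::toupper(c));
    std::size_t end = canon.find_last_not_of(' ');
    canon.erase(end == std::string::npos ? 0 : end + 1);
    return canon;
}
```

Both the stored keys and the search key should pass through the same function before they are compared.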

• Distinct keys: keys that uniquely identify a single record, to avoid confusion.
• A unique canonical key is used as the primary key.
• Secondary key: a key that need not be unique.
• A primary key should be dataless and unchanging.

Sequential search:
Reading a file record by record, looking for a record with a particular key.
• Evaluating the performance of sequential search:
The work required to sequentially search for a record in a file with n records is proportional to n: it takes at most n comparisons and, on average, about n/2 comparisons.
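A small sketch of sequential search that also counts comparisons, to make the n and n/2 figures concrete. Holding the keys in a std::vector rather than reading them from a file is a simplification for illustration, and SequentialSearch is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sequential search: examine the keys in order until one matches.
// Returns the index of the match (or -1) and reports the comparisons made.
int SequentialSearch(const std::vector<std::string>& keys,
                     const std::string& target, int& comparisons)
{
    comparisons = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ++comparisons;
        if (keys[i] == target) return static_cast<int>(i);
    }
    return -1; // not found after n comparisons
}
```

Finding the record at position i costs i + 1 comparisons, so averaged over all n records the cost is (1 + 2 + ... + n) / n = (n + 1) / 2, roughly n/2; a failed search always costs n.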
Improving Sequential Search Performance with Record Blocking
• Blocking does not change the logical organization within the file.
• It is done as a performance measure.
• Example:
   A file with 4000 records, with an average record length of 512 bytes.
   If the OS uses sector-sized buffers of 512 bytes, an unblocked sequential search needs on average 2000 read calls.
   With blocking, grouping 16 records per block, each read brings in 8 KB worth of records.
   An average search then needs only 125 reads.
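The blocking arithmetic can be checked with a short function. This is a sketch under the assumption that a successful search reads, on average, half of the file's blocks; AverageReads is a hypothetical name.

```cpp
#include <cassert>

// Average number of physical reads for a successful sequential search
// when each read brings in one block of recordsPerBlock records:
// about half of the ceil(records / recordsPerBlock) blocks are read.
long AverageReads(long records, long recordsPerBlock)
{
    assert(records > 0 && recordsPerBlock > 0);
    long blocks = (records + recordsPerBlock - 1) / recordsPerBlock; // ceiling
    return blocks / 2;
}
```

With 4000 records, AverageReads(4000, 1) gives 2000 (unblocked) and AverageReads(4000, 16) gives 125; a block of 16 records of 512 bytes is 8192 bytes, the 8 KB per read mentioned above. Blocking reduces the number of reads, not the number of key comparisons.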
Unix tools for sequential processing

The most common file structure in UNIX is the ASCII file:
• Newline as the record delimiter
• White space as the field delimiter
• Simple and easy to process
UNIX provides a rich array of tools for such files.
The file structure is inherently sequential, so most of the tools process files sequentially.
Examples:

cat: prints the contents of the file
% cat myfile

wc: counts the lines, words, and characters in the file
% wc myfile

grep (global regular expression print): prints the lines that match a pattern
% grep Ada myfile
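A rough C++ analogue of wc shows how these conventions make sequential processing simple: each newline ends a record and white space separates fields. It is a sketch only; CountFile is a hypothetical name, and it assumes every line, including the last, ends with a newline (real wc counts newline characters).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct Counts { long lines = 0, words = 0, chars = 0; };

// Process the stream strictly sequentially: one record (line) at a time,
// splitting each record into fields (words) on white space.
Counts CountFile(std::istream& in)
{
    Counts c;
    std::string line;
    while (std::getline(in, line)) {
        ++c.lines;
        c.chars += static_cast<long>(line.size()) + 1; // +1 for the newline
        std::istringstream fields(line);
        std::string word;
        while (fields >> word) ++c.words;
    }
    return c;
}
```

For the input "Ada Lovelace\nAlan Turing\n" this reports 2 lines, 4 words, and 25 characters.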
Direct Access




• Alternative to sequential access
• Direct access: seek directly to the beginning of the record and read it.
• Sequential searches are O(n); direct searches are O(1).
• We get the record in a single seek.

The IOBuffer class includes DRead (direct read) and DWrite (direct write) operations, which use the byte address of the record as a reference.
Example:
int IOBuffer :: DRead (istream & stream, int recref)
// reads specified record
{
    stream.seekg(recref, ios::beg);
    if (stream.tellg() != recref)
        return -1;

    return Read(stream);
}
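A self-contained sketch of the same idea outside the IOBuffer class: seek to the record's byte address, check that the seek landed there, and then read the record. It assumes records are stored as a 2-byte length (low byte first) followed by the record bytes; DirectRead is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read the variable-length record that starts at byte address recref.
// Records are assumed to be a 2-byte length (low byte first) + data.
bool DirectRead(std::istream& stream, std::streamoff recref, std::string& record)
{
    stream.clear(); // clear any earlier EOF or failure state
    stream.seekg(recref, std::ios::beg);
    if (!stream || stream.tellg() != std::streampos(recref)) return false;
    int lo = stream.get();
    int hi = stream.get();
    if (!stream) return false;
    std::size_t len = static_cast<std::size_t>(lo) | (static_cast<std::size_t>(hi) << 8);
    record.assign(len, '\0');
    if (len > 0) stream.read(&record[0], static_cast<std::streamsize>(len));
    return static_cast<bool>(stream);
}
```

The byte addresses themselves (0 and 6 in a file holding "Ames" then "Mason") must come from somewhere else, such as an index.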



The major issue is knowing where the beginning of the required record is.
• This information may be carried in a separate index file.

Relative record number (RRN)
• The RRN emerges from viewing a file as a sequence of records.
• The RRN of a record gives its position relative to the beginning of the file: the first record has RRN 0, the second has RRN 1, and so on.
• We can tie a record to its RRN, for example by assigning membership numbers that correspond to RRNs.

Supporting direct access with RRNs requires records of fixed size:
• The record's RRN is used to calculate the byte offset of the start of the record relative to the start of the file:
Byte offset = n × r, where n is the RRN and r is the record length in bytes.
Example:
Record with RRN 546
File with a fixed-length record size of 128 bytes per record
Byte offset = 546 × 128 = 69888
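A sketch of RRN-based direct access for fixed-length records: compute the byte offset n × r, seek there, and read exactly r bytes. ByteOffset and ReadByRRN are hypothetical names, and the file is assumed to contain only data records (no header).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Byte offset of a fixed-length record: RRN n times record length r.
long ByteOffset(long rrn, long recordLength)
{
    assert(rrn >= 0 && recordLength > 0);
    return rrn * recordLength;
}

// Seek directly to the record with the given RRN and read it.
bool ReadByRRN(std::istream& file, long rrn, long recordLength, std::string& record)
{
    file.clear();
    file.seekg(ByteOffset(rrn, recordLength), std::ios::beg);
    if (!file) return false;
    record.assign(static_cast<std::size_t>(recordLength), '\0');
    file.read(&record[0], recordLength);
    return file.gcount() == recordLength; // false if the record is missing or short
}
```

ByteOffset(546, 128) reproduces the 69888 computed above.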
Record Structure

Choosing a Record Structure and Record Length:
Fixed-length records
• The record length depends on the size of the fields in the record.
• Example: building a file of sales transactions containing the following information:
1. Six-digit account number of the purchaser
2. Six digits for the date field
3. Five-character stock number for the item purchased
4. Three-digit field for quantity
5. Ten-position field for total cost
The sum of the fields is 6 + 6 + 5 + 3 + 10 = 30 bytes.

Suppose we store the records on a typical sectored disk with a sector size of 512 bytes. We might then pad each record to 32 bytes so that an integral number of records (512 / 32 = 16) fits in each sector.

Two approaches:
1. Fixed-length fields in a fixed-length record: has the virtue of simplicity, since the fields can be "broken out" directly.
2. Variable-length fields in a fixed-length record: takes advantage of the averaging-out effect that usually occurs among field lengths.

A combination of both structures can also be used.
Header records







• It is necessary to keep track of some general information about a file.
• A header record placed at the beginning of the file holds this information.
• Some languages don't provide an easy way to jump to the end of a file, even with direct access.
• A simple solution is to keep the count of records somewhere else, along with the record length, the date and time of the most recent update to the file, and so on.
• A header record helps the file become a self-describing object, freeing the software from having to know everything about the file in advance.
• A header record has a different structure from the normal data records.
• It contains the header size, the number of records, and the size of each record.
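A sketch of a header record holding the three items listed above. The layout (three 32-bit unsigned fields, each stored low byte first) and the names FileHeader, WriteHeader, and ReadHeader are assumptions for illustration, not a standard format.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical header layout written at the start of the file.
struct FileHeader {
    std::uint32_t headerSize;  // bytes occupied by the header itself
    std::uint32_t recordCount; // number of data records in the file
    std::uint32_t recordSize;  // bytes per fixed-length record
};

static void PutU32(std::ostream& out, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i) out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

static bool GetU32(std::istream& in, std::uint32_t& v)
{
    v = 0;
    for (int i = 0; i < 4; ++i) {
        int c = in.get();
        if (!in) return false; // file too short to hold a header
        v |= static_cast<std::uint32_t>(c) << (8 * i);
    }
    return true;
}

void WriteHeader(std::ostream& out, const FileHeader& h)
{
    PutU32(out, h.headerSize);
    PutU32(out, h.recordCount);
    PutU32(out, h.recordSize);
}

bool ReadHeader(std::istream& in, FileHeader& h)
{
    return GetU32(in, h.headerSize) && GetU32(in, h.recordCount) &&
           GetU32(in, h.recordSize);
}
```

With such a header, a program can learn the record count without jumping to the end of the file, and the byte offset of RRN n becomes headerSize + n × recordSize.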
File Access And File Organization




File organization and file access are separate aspects of a file:
• Variable-length records and fixed-length records relate to the aspect of file organization.
• Sequential access and direct access relate to the aspect of file access.

Questions considered for the categories of file organization:
• Can the file be divided into fields?
• Is there a higher level of organization to the file that combines the fields into records?
• Do all the records have the same number of bytes or fields?
• How do we distinguish one record from another?
• How do we organize the internal structure of a fixed-length record so we can distinguish between data and extra space?

There are many possible answers; the choice of a file organization depends on many things, including:
• The file-handling facilities of the language
• The use you want to make of the file

Sequential access
• Used when developing a sequential search
• Used when the beginning of each record is not known

Direct access
• Fixed-length record access
• Allows us to calculate precisely where a record begins
• Seeks directly to the record

We can use both fixed-length and variable-length records with direct access. With variable-length records, we can simply keep a list of the byte offsets from the start of the file at which each record is placed.
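A sketch of building such a list of byte offsets. It assumes the records are written back to back, each preceded by a 2-byte length field; BuildOffsets and its prefixBytes parameter are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Given the data lengths of variable-length records written back to back,
// return the byte offset at which each record starts. Each record is
// assumed to occupy prefixBytes of length field plus its data.
std::vector<long> BuildOffsets(const std::vector<long>& recordLengths, long prefixBytes = 2)
{
    std::vector<long> offsets;
    long next = 0;
    for (long len : recordLengths) {
        assert(len >= 0);
        offsets.push_back(next);
        next += prefixBytes + len;
    }
    return offsets;
}
```

Records of 4, 5, and 10 data bytes start at offsets 0, 6, and 13; record k can then be read directly by seeking to offsets[k].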
Abstract Data Models for File Access








• Files were once stored primarily on magnetic tape and punched cards.
• Memory space and programming languages were primitive.
• Programmers were compelled to view file data exactly as it appeared on tape or card: as a sequence of fields and records.
• Data processing meant processing fields and records in the traditional sense.
• Gradually it was recognized that computers can process images, sounds, and documents, not only fields and records.
• This type of information does not fit the metaphor of data stored as sequences of records divided into fields.
• We can envision data objects as documents, images, and sounds.
"The notion that we need not view data as it appears on a particular medium is captured in the phrase abstract data models."
• This encourages an application-oriented view of data rather than a medium-oriented one.
• An abstract data model describes how an application views data rather than how the data might physically be stored.
• One way to achieve this is to keep information in the file that file-access software can use to "understand" those objects.
Example: put file structure information in a header.

Metadata






Definition: "Metadata is data that describes the primary data in a file."
• A common place to store metadata in a file is the header record.
• Typically, users of a particular kind of data agree on a standard format for holding metadata.
Example: FITS (Flexible Image Transport System), a standard used by astronomers and endorsed by the International Astronomical Union.
• FITS headers are ASCII, with each header record containing a single piece of metadata. ASCII headers are easy to read and process and, since they occur only once, take up relatively little space.
Extensibility

• Mixed object-type files
• Identifying fields and records using keywords
• Example: keyword=value format
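A sketch of reading keyword=value metadata into a map, which is what makes such a header extensible: a reader picks out the keywords it knows and can ignore the rest. It assumes one keyword=value pair per line with blanks allowed around either side; it is not a parser for any particular standard such as FITS, and ParseKeywords is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse lines of the form keyword=value into a map.
// Lines without '=' are skipped; blanks around keyword and value are trimmed.
std::map<std::string, std::string> ParseKeywords(const std::string& text)
{
    auto trim = [](const std::string& s) {
        std::size_t b = s.find_first_not_of(' ');
        if (b == std::string::npos) return std::string();
        std::size_t e = s.find_last_not_of(' ');
        return s.substr(b, e - b + 1);
    };
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        fields[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return fields;
}
```

Adding a new keyword to the file does not break older readers of this kind, since they simply never look it up.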
Portability and Standardization

Achieving Portability:
• Major problems: differences among languages, operating systems, and machine architectures.
• Achieving portability means determining how to deal with these differences.

1. Agree on a Standard Physical Record Format and Stay with It:
• Physical standard: the file is represented the same way physically, no matter which language, operating system, or machine is used. Example: FITS (header record).
• Once a standard is established, it is very tempting to improve on it.
• If the standard is sufficiently extensible, this temptation can be avoided.
• The way to make sure a standard has staying power is to make it simple enough that files can be written in the standard format from a wide range of machines, operating systems, and languages.
2. Agree on a Standard Binary Encoding for Data Elements
• The two basic data elements are text and numbers.
• Text: ASCII and EBCDIC are the most common encoding schemes.
• Numbers: the number of different encoding schemes is not large, but the likelihood of having to share data among machines that use different binary encodings can be high.
• IEEE has established standard format specifications for 32-bit, 64-bit, and 128-bit floating-point numbers, and for 8-bit, 16-bit, and 32-bit integers.

XDR (External Data Representation)
• Not only specifies a standard encoding for all files,
• but also provides a set of routines for each machine for converting from its binary encoding to XDR when writing to a file, and vice versa when reading.
• Example: when we want to store numbers in XDR, we can read and write them by replacing the read and write routines in our program with XDR routines.
• The XDR routines take care of the conversion.
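XDR encodes integers as 4-byte big-endian values, whatever the host's native byte order. The sketch below is not the XDR library API (the real routines, such as xdr_int, come with the Sun RPC library); EncodeBigEndian32 and DecodeBigEndian32 are hypothetical names that only illustrate converting to and from one standard encoding with shifts, which behave the same on every machine.

```cpp
#include <cassert>
#include <cstdint>

// Encode a 32-bit unsigned integer most significant byte first,
// independent of the machine's native representation.
void EncodeBigEndian32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Decode the same 4 bytes back into the host's native integer.
std::uint32_t DecodeBigEndian32(const unsigned char in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Encoding 0x12345678 always yields the bytes 12 34 56 78, so a file written on one machine decodes to the same value on any other.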
3. Number and Text Conversion
• Sometimes the use of a standard data encoding is not feasible.
• Then numbers or characters have to be translated from one format to another every time they are moved.
• This is time consuming, and there is a possibility of losing accuracy.

To move files between two or more different platforms:
Example: IBM and VAX, which use different native formats for numbers (and ASCII for characters).
Solution: write or borrow a program that translates between the formats.
• Converting between IBM and VAX native formats requires two conversion routines.
• For many different platforms using different encodings, writing a program to convert from each representation to every other one requires n(n-1) translators for n representations; converting directly between five different native formats requires 20 conversion routines.
• Better alternative: agree on a standard intermediate format, e.g., XDR. This reduces the number of translators from n(n-1) to 2n; converting five different native formats via an intermediate standard format requires 10 conversion routines.
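The translator counts follow from simple formulas, sketched below with hypothetical function names: direct conversion needs one routine per ordered pair of formats, while an intermediate format needs one routine into it and one out of it per format.

```cpp
#include <cassert>

// Direct conversion: one translator for every ordered pair of formats.
int DirectTranslators(int n) { assert(n >= 1); return n * (n - 1); }

// Via an intermediate standard: one translator in and one out per format.
int IntermediateTranslators(int n) { assert(n >= 1); return 2 * n; }
```

For two formats, direct conversion needs only 2 routines against 4 via an intermediate format; the two approaches tie at three formats (6 each), and from four formats on, the intermediate format needs fewer (8 against 12; for five, 10 against 20).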
4. File Structure Conversion
The conversion problems that apply to atomic data encoding also apply to the file structures of more complex objects, such as images.
Complex objects and their representations need specific applications to convert them.

5. File System Differences
Differences in physical file organization also affect portability.
Example: Unix writes files to tape in 512-byte blocks, whereas tapes from other systems may use blocks of thirty-six 80-byte records.

6. Unix and Portability
For the block-size problem, Unix provides a utility, dd.
dd is used for copying tape data, and it can be used to convert data from any physical source.