UNIT-II
Preeti Deshmukh

Writing Variable-Length Records to the File
Writing variable-length records involves some problems:
• If a length indicator is placed at the beginning of each record, we must know the sum of the lengths of the fields in the record before writing it to the file.
• In what form should the record length be written to the file: as a binary integer or as ASCII characters?
• For this case, use a character array as a buffer, in which we can place fields and field delimiters as we collect them.

Representing Record Length
• One option: write the length as a 2-byte binary integer before each record (C). An integer can represent much bigger numbers than the same number of ASCII bytes.
• Another option: convert the length into a character string using formatted output.
  With C streams: fprintf()
  With C++ stream classes: the overloaded insertion operator "<<"
• Example:

    fprintf(file, "%d ", length);   // C
    stream << length << ' ';        // C++

• Each line of the example inserts the length as a decimal string, followed by a blank that acts as a delimiter.
• Storing the length as a fixed-size 2-byte integer is just as easy.

Reading Variable-Length Records from the File

Using Classes to Manipulate Buffers
• C++ classes encapsulate the pack, unpack, read and write operations of buffer objects.
• For output: start with an empty buffer object, pack field values into the object one by one, then write the buffer contents to an output stream.
• For input: initialize a buffer object by reading a record from an input stream, then extract the object's field values one by one.

Classes:
• One for delimited fields
• One for length-based fields
• One for fixed-length fields

Buffer Class for Delimited Text Fields
• The DelimTextBuffer class supports a variable-length buffer whose fields are represented as delimited text.
• Definition: deltext.h
• Operations on buffers: constructors, read, write, field pack and unpack.
• Example use: pack a Person object into the buffer and write the buffer to a file. The Unpack method, which extracts the next field from the buffer, is shown below.

    int DelimTextBuffer :: Unpack(char *str)
    // extract the value of the next field of the buffer
    {
        int len = -1;          // length of packed string
        int start = NextByte;  // first character to be unpacked
        for (int i = start; i < BufferSize; i++)
            if (Buffer[i] == Delim)  // Delim: the delimiter character (member name assumed)
            { len = i - start; break; }
        if (len == -1) return FALSE;             // delimiter not found
        NextByte += len + 1;
        if (NextByte > BufferSize) return FALSE;
        strncpy(str, &Buffer[start], len);
        str[len] = 0;                            // zero termination for string
        return TRUE;
    }

Buffer Classes for Length-Based and Fixed-Length Fields
• Only the implementation of the Pack and Unpack methods changes relative to the delimited class.
• The class definitions are almost the same.

Full definition available in lentext.h and lentext.cpp:

    class LengthTextBuffer
    {
      public:
        LengthTextBuffer(int maxBytes = 1000);
        int Read(istream &file);
        int Write(ostream &file) const;
        int Pack(const char *field, int size = -1);
        int Unpack(char *field);
      private:
        char *Buffer;    // character array to hold field values
        int BufferSize;  // size of packed fields
        int MaxBytes;    // maximum number of characters in Buffer
        int NextByte;    // packing/unpacking position in buffer
    };

Full definition available in fixtext.h and fixtext.cpp:

    class FixedTextBuffer
    {
      public:
        FixedTextBuffer(int maxBytes = 1000);
        int AddField(int fieldSize);
        int Read(istream &file);
        int Write(ostream &file) const;
        int Pack(const char *field);
        int Unpack(char *field);
      private:
        char *Buffer;     // character array to hold field values
        int BufferSize;   // size of packed fields
        int MaxBytes;     // maximum number of characters in Buffer
        int NextByte;     // packing/unpacking position in buffer
        int *FieldSizes;  // array of field sizes
    };

Example:

    int Person :: InitBuffer(FixedTextBuffer &Buffer)
    // initialize a FixedTextBuffer to be used for Person objects
    {
        Buffer.Init(6, 61);    // 6 fields, 61 bytes in total
        Buffer.AddField(10);   // LastName[11]
        Buffer.AddField(10);   // FirstName[11]
        Buffer.AddField(15);   // Address[16]
        Buffer.AddField(15);   // City[16]
        Buffer.AddField(2);    // State[3]
        Buffer.AddField(9);    // ZipCode[10]
        return 1;
    }

Record Access
"Record is the quantity of information that is being read or written."

Record Keys
• It is convenient to identify a record through a key.
• A standard form for keys must be defined, along with associated rules. This form is called "canonical" (conforming to the rules).
• The canonical form is the single representation for a key.
• Suppose we search for a record with the name "Ames". The input may arrive in different forms: "AMES" / "ames" / "Ames".
• Canonical key example: the key consists only of uppercase letters, with no blank spaces at the end.
• Distinct keys: keys that uniquely identify a single record, which prevents confusion.
• Unique canonical key = primary key
• Secondary key
• A primary key should be dataless and unchanging.

Sequential Search
Reading a file record by record, looking for a record with a particular key.

Evaluating Performance of Sequential Search
• The work required to search sequentially for a record in a file with n records is proportional to n: it takes at most n comparisons and, on average, n/2 comparisons.

Improving Sequential Search Performance with Record Blocking
• Logical organization within the file
• Done as a performance measure
• Example:
  File with 4000 records
  Average record length is 512 bytes
  If the OS uses sector-sized buffers of 512 bytes, an unblocked sequential search needs, on average, 2000 read calls
  With blocking, in groups of 16 records per block, each read brings in 8 KB worth of records
  An average search then needs 125 reads

Unix Tools for Sequential Processing
• The most common file structure in UNIX is the ASCII file, with a new line as record delimiter and white space as field delimiter.
• It is simple and easy to process.
• UNIX provides a rich array of tools.
• The file structure is inherently sequential, so most of the tools process files sequentially.
• Examples:
  cat: % cat myfile
  wc: % wc myfile
  grep (globally search for a regular expression and print): % grep Ada myfile

Direct Access
• The alternative to sequential access.
• Direct access means we seek directly to the beginning of a record and read it.
• Sequential searches are O(n); direct searches are O(1).
• The record is obtained in a single seek.
• The IOBuffer class includes DRead (direct read) and DWrite (direct write).
• These operations use the byte address of the record as the reference.
• Example:

    int IOBuffer :: DRead(istream &stream, int recref)
    // reads the specified record
    {
        stream.seekg(recref, ios::beg);
        if (stream.tellg() != recref) return -1;
        return Read(stream);
    }

• The major issue is knowing where the beginning of the required record is.
• This information can be carried in a separate index file.
• The relative record number (RRN) emerges from viewing a file as a sequence of records.
• The RRN of a record gives its position relative to the beginning of the file: the first record has RRN 0, the second has RRN 1, and so on.
• We can tie a record to its RRN, for example by assigning membership numbers.
• Supporting direct access by RRN requires records of fixed size.
• The RRN is used to calculate the byte offset of the start of the record relative to the start of the file:
  Byte offset = n x r, where n is the RRN and r is the record length
• Example:
  Record with RRN 546
  File with a fixed-length record size of 128 bytes per record
  Byte offset = 546 x 128 = 69888

Record Structure
Choosing a Record Structure and Record Length:
• The length of a fixed-length record depends on the size of the fields in the record.
• Example: building a file of sales transactions containing the following information:
  1. Six-digit account number of the purchaser
  2. Six digits for the date field
  3. Five-character stock number for the item purchased
  4. Three-digit field for quantity
  5. Ten-position field for total cost
• The sum of the fields is 30 bytes.
• Suppose we store the records on a typical sectored disk with a sector size of 512 bytes. We might pad each record to 32 bytes so that an integral number of records (16) fits in a sector.
• Two approaches:
  1. Fixed-length fields in a fixed-length record: has the virtue of simplicity, since fields are easy to "break out".
  2. Fixed-length record with variable-length fields: benefits from an averaging-out effect that usually occurs.
• A combination of both structures can also be used.

Header Records
• It is often necessary to keep track of some general information about a file.
• A header record is placed at the beginning of the file to hold this information.
• Some languages do not provide an easy way to jump to the end of a file, even with direct access.
• A simple solution is to keep a count of the records somewhere else (along with the length of the records, the date and time of the most recent update to the file, and so on).
• A header record helps make the file a self-describing object, freeing the software from having to know all of this information about the file in advance.
• A header record has a different structure from a normal data record.
• It contains the header size, the number of records, and the size of each record.

File Access and File Organization
• File organization: variable-length records, fixed-length records
• File access: sequential access, direct access

Questions considered so far under file organization:
• Can the file be divided into fields?
• Is there a higher level of organization to the file that combines the fields into records?
• Do all the records have the same number of bytes or fields?
• How do we distinguish one record from another?
• How do we organize the internal structure of a fixed-length record so we can distinguish between data and extra space?
There are many possible answers. The choice of a file organization depends on many things, including:
• The file-handling facilities of the language
• The use you want to make of the file

Sequential access
• Developing a sequential search
• The beginning of each record is not known in advance

Direct access
• Fixed-length record access
• Allows the beginning of a record to be calculated precisely
• Seeking directly

We can use both fixed-length and variable-length records with direct access. With variable-length records we can simply keep a list of the byte offsets, from the start of the file, at which each record is placed.

Abstract Data Models for File Access
• File processing first became common on computers that stored files primarily on magnetic tape and punched cards.
• Memory space was limited and programming languages were primitive.
• Programmers were compelled to view file data exactly as it appeared on tape or cards: as a sequence of fields and records.
• Data processing meant processing fields and records in the traditional sense.
• Gradually it was recognized that computers can process images, sounds and documents, not only fields and records.
• This type of information does not fit the metaphor of data stored as sequences of records divided into fields.
• Instead, envision data objects as documents, images, sounds.
• "The notion that we need not view data as it appears on a particular medium is captured in the phrase abstract data models."
• This encourages an application-oriented view of data rather than a medium-oriented one.
• An abstract data model describes how an application views data rather than how the data might physically be stored.
• One way to support this is to keep information in the file that file-access software can use to "understand" those objects.
• Example: put file structure information in a header.

Metadata
Definition: "Metadata is data that describes the primary data in a file."
A common place to store metadata in a file is the header record.
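The header-record idea above can be combined with direct access by RRN. The following is a minimal sketch, not the IOBuffer implementation from the slides: the FileHeader layout and the function names WriteHeader, ReadHeader, RecordOffset and ReadRecord are hypothetical, and the header is assumed to store exactly the three values mentioned earlier (header size, number of records, record size).

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical header layout (an assumption, not taken from the slides).
struct FileHeader {
    std::int32_t headerSize;   // bytes occupied by the header itself
    std::int32_t recordCount;  // number of data records in the file
    std::int32_t recordSize;   // bytes per fixed-length record
};

// Write the header at the very beginning of the stream.
bool WriteHeader(std::ostream &out, const FileHeader &h) {
    out.seekp(0, std::ios::beg);
    out.write(reinterpret_cast<const char *>(&h), sizeof h);
    return out.good();
}

// Read the header back from the beginning of the stream.
bool ReadHeader(std::istream &in, FileHeader &h) {
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char *>(&h), sizeof h);
    return in.good();
}

// Byte offset of record `rrn`: skip the header, then rrn whole records.
long RecordOffset(const FileHeader &h, int rrn) {
    assert(rrn >= 0);
    return static_cast<long>(h.headerSize) +
           static_cast<long>(rrn) * h.recordSize;
}

// Seek directly to record `rrn` and read it; false if out of range.
bool ReadRecord(std::istream &in, const FileHeader &h, int rrn,
                std::string &record) {
    if (rrn < 0 || rrn >= h.recordCount) return false;
    in.seekg(RecordOffset(h, rrn), std::ios::beg);
    record.assign(h.recordSize, '\0');
    in.read(&record[0], h.recordSize);
    return in.good();
}
```

With a header size of 0 and a record size of 128, RecordOffset reduces to the n x r rule, so RRN 546 maps to byte 69888. Note that dumping the struct as raw bytes ties the file to one machine's integer format, which is exactly the portability problem a standard encoding is meant to avoid.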
• Typically, the users of a particular kind of data agree on a standard format for holding metadata.
• Example: FITS (Flexible Image Transport System), developed by the International Astronomical Union.
• A FITS header consists of ASCII records, each of which contains a single piece of metadata.
• ASCII headers are easy to read and process and, since they occur only once, take up relatively little space.

Extensibility
• Files with mixed object types
• Example: identifying fields and records using keywords, in keyword=value format

Portability and Standardization
Achieving Portability:
• Major problems: differences among languages, operating systems, and machine architectures.
• Achieving portability means determining how to deal with these differences.

1. Agree on a Standard Physical Record Format and Stay with It
• Physical standard: the data is represented the same way physically, whatever the language, OS, or machine.
• Example: FITS (header record)
• Once a standard is established, it is very tempting to improve on it.
• If the standard is sufficiently extensible, this temptation can be avoided.
• One way to make sure a standard has staying power is to make it simple enough that files can be written in the standard format from a wide range of machines, operating systems and languages.

2. Agree on a Standard Binary Encoding for Data Elements
• The two basic data elements are text and numbers.
• Text: ASCII and EBCDIC are the most common encoding schemes.
• Numbers: the number of encoding schemes is not large, but the likelihood of having to share data among machines that use different binary encodings can be high.
• IEEE has established standard format specifications for 32-bit, 64-bit and 128-bit floating point numbers, and for 8-bit, 16-bit and 32-bit integers.
• XDR (External Data Representation) not only specifies a standard encoding for all files but also provides a set of routines for each machine for converting from its binary encoding when writing to a file, and vice versa.
• Example: when we want to store numbers in XDR, we can read and write them by replacing the read and write routines in our program with XDR routines. The XDR routines take care of the conversion.

3. Number and Text Conversion
• Sometimes the use of a standard data encoding is not feasible.
• Numbers or characters then have to be translated from one format to another every time, which is time consuming and carries a possibility of loss of accuracy.
• Example: moving files between two or more different platforms, such as IBM and VAX (which uses different native formats for numbers and ASCII for characters).
• Solution: write or borrow a program that translates between the formats.
  Converting between IBM and VAX native formats requires two conversion routines.
• For many different platforms using different encodings, we could write a program to convert from each representation to every other one: n formats need n(n-1) translators.
  Converting directly between five different native formats requires 20 conversion routines.
• Better alternative: agree on a standard intermediate format, for example XDR.
  This reduces the number of translators from n(n-1) to 2n.
  Converting five different native formats via an intermediate standard format requires 10 conversion routines.

4. File Structure Conversion
• The conversion problems that apply to atomic data encoding also apply to the file structures of more complex objects, such as images.
• Complex objects and their representations need specific applications.

5. File System Differences
• Physical file organization differs between systems.
• Example: Unix writes files to tapes in 512-byte blocks, while other systems often use different block sizes, such as 2880 bytes (thirty-six 80-byte records).

6. Unix and Portability
• For the block-size problem, Unix provides a utility: dd.
• dd is intended for copying tape data, but it can be used to convert data from any physical source.
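Returning to the standard binary encoding and the intermediate-format argument above, the sketch below writes a 32-bit integer in one agreed byte order (most significant byte first, the order XDR uses for integers) regardless of the machine's native format. It does not call any real XDR library; the names WriteUInt32BE, ReadUInt32BE, DirectTranslators and IntermediateTranslators are hypothetical and chosen only for illustration. The last two functions restate the n(n-1) versus 2n translator counts.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>

// Write a 32-bit unsigned integer as 4 bytes, most significant byte first.
// The result is the same on any machine, whatever its native byte order.
bool WriteUInt32BE(std::ostream &out, std::uint32_t value) {
    char bytes[4];
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<char>((value >> (24 - 8 * i)) & 0xFF);
    out.write(bytes, 4);
    return out.good();
}

// Read 4 bytes written by WriteUInt32BE and rebuild the native integer.
bool ReadUInt32BE(std::istream &in, std::uint32_t &value) {
    char bytes[4];
    in.read(bytes, 4);
    if (!in.good()) return false;
    value = 0;
    for (int i = 0; i < 4; ++i)
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    return true;
}

// Translators needed when every format converts directly to every other.
int DirectTranslators(int n) {
    assert(n >= 0);
    return n * (n - 1);
}

// Translators needed with one intermediate format: one "to" and one
// "from" routine per native format.
int IntermediateTranslators(int n) {
    assert(n >= 0);
    return 2 * n;
}
```

For five native formats the two counting functions give 20 and 10, matching the figures quoted above. The design choice is the usual one for intermediate formats: each machine needs to know only its own encoding and the standard one.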