Data Structures: Complete reference for IP University, Lecture notes of Data Structures and Algorithms

Complete Data Structures Lectures in PPTs, best to read before exam to revise everything

Typology: Lecture notes

2020/2021

Uploaded on 02/25/2021

snbhardwaj1994
4/11/2015 MCA-102, Data and File Structures
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Shalini Singh U4.1

DATA STRUCTURE

FILES

UNIT IV

Learning Objectives

  • Files
  • Sequential File Organization
  • Buffering
  • Handling Sequential Files in C
  • External Sorting

Sequential File Organization

  • ISAM (Indexed Sequential Access Method) is the most popular indexed sequential file organization
    • A cylinder and surface index is maintained for the primary key (PK).
  • Makes search based on PK efficient
  • Search based on other attributes requires use of an alternate indexing technique
  • Insertion and deletion are time consuming
  • Batch processes and range queries are executed efficiently

Sequential Files

  • Sequential files are files where the order of the records in the file is based on the order the records are placed in the file (that is, in arrival sequence)
  • The order of the records is fixed.
  • Records in these files can only be read or written sequentially, i.e. to read the Nth record we must first access the preceding N-1 records.
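
The access rule above can be sketched in C. This is a minimal illustration, assuming one newline-terminated record per line; the function name `read_nth_record` and the line-per-record layout are assumptions made for the example, not part of the original notes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the n-th record (1-based) of a text file
 * into buf. Assumes one record per line, each shorter than size.
 * Returns 1 on success, 0 if n < 1 or the file has fewer than n records. */
int read_nth_record(FILE *fp, long n, char *buf, int size)
{
    long i;

    if (n < 1)
        return 0;
    rewind(fp);                       /* sequential access starts at record 1 */
    for (i = 1; i <= n; i++) {
        /* each call reads the next record, overwriting the previous one */
        if (fgets(buf, size, fp) == NULL)
            return 0;                 /* ran out of records */
    }
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 1;
}
```

Reading record N costs N calls to `fgets`, which is the N-1 skipped records plus the one that is wanted.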

Sequential Files

  • One common storage medium used for sequential files is Magnetic Tape.
  • Here data is recorded digitally as magnetized spots in the film coating.
  • Positive magnetization may represent a 1-bit and negative magnetization may represent a 0-bit or vice-versa.
  • The magnetized areas are not randomly located on the medium but are arranged in tracks parallel to the edge of the tape.

Data Representation

This is how .byte 11 (in MIPS) is stored in memory: 0000 0000 0000 0000 0000 0000 0000 1011

  • Big Endian (e.g. SPARC): the most significant byte is stored at the lowest address.
  • Little Endian (e.g. Intel): the least significant byte is stored at the lowest address.
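
Byte order can be observed from a C program. The sketch below is illustrative only; the function names `is_little_endian` and `to_big_endian` are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first
 * (little endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t word = 11;               /* 0x0000000B, the value used above */
    unsigned char bytes[4];

    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 11;            /* low-order byte at lowest address? */
}

/* Writes the four bytes of value into out[] in big-endian order,
 * independent of the host's own byte order. */
void to_big_endian(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);   /* most significant byte first */
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}
```

Writing multi-byte fields with a fixed byte order, as `to_big_endian` does, is one common way to keep a data file readable on machines of either kind.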

Data Representation

We need a way to map: data -> binary
Data Types:
  • Number
    • Integer (Signed/Unsigned)
    • Real
  • Char
  • Other (Picture, etc.)

Data Representation: Integer

Integers:
A decimal example: 2734 = 2 * 10^3 + 7 * 10^2 + 3 * 10^1 + 4 * 10^0
A binary example: [01011]_2 = 0 * 2^4 + 1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 1 * 2^0 = [11]_10

Data Representation: Unsigned

Unsigned
Unsigned integers are non-negative (or, to be more precise, do not have a sign).

Size     Limit (low, high)
8 bits   0, 255 (255 = 2^8 - 1)
16 bits  0, 65535 (roughly 64k; 65535 = 2^16 - 1)
…        …
N bits   0, 2^N - 1

Signed
Represented in 2’s complement form
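
Both ideas can be checked with a few lines of C. This is a small sketch; the names `unsigned_max` and `twos_complement8` are chosen for this example only.

```c
#include <stdint.h>

/* Largest value representable in n unsigned bits: 2^n - 1 (n up to 32). */
uint32_t unsigned_max(unsigned n)
{
    return (n >= 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
}

/* 8-bit two's complement pattern of -x: invert the bits of x, add 1,
 * and keep the low 8 bits. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~(unsigned)x + 1u);
}
```

For instance, `twos_complement8(5)` yields the pattern 1111 1011, which is how -5 is stored in a signed 8-bit integer.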

Data Representation: Real Nos.

  • Real Numbers: There are many standards to represent real numbers, e.g. the IEEE 754 standard.

Single-precision layout: 1-bit Sign | 8-bit Exponent | 23-bit Mantissa

Data Representation: Real

  • The first bit is the sign.
  • The next 8-bit portion is the Biased Exponent (the exponent of 2 plus 127, so that the stored value is never negative).
  • The last 23-bit portion is the Mantissa.
  • Note that there are 24 binary digits in the fraction part (including the leading 1 of the normalized form, which is not stored in memory). This gives 2^24 distinct values, which corresponds to about 6-7 decimal digits. This is the number of significant figures in a floating point number.
  • On most CPUs, a single-precision value therefore has 6-7 significant figures.
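
The three fields can be pulled out of a `float` with shifts and masks. The sketch below assumes that `float` is a 32-bit IEEE 754 single-precision value, which holds on most platforms but is not guaranteed by the C standard; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit single-precision value. */
struct float_fields {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 8 bits, biased by 127 */
    uint32_t mantissa;    /* 23 bits, leading 1 not stored */
};

/* Assumption: float is a 32-bit IEEE 754 single. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);        /* reinterpret the bytes safely */
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For -6.5, which is -1.625 * 2^2, the fields come out as sign 1, biased exponent 2 + 127 = 129, and mantissa 0x500000 (the binary fraction .101 followed by zeros).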

Parity Checking

Suppose the sender wants to send the word "world". In ASCII, each of the five characters is coded as a 7-bit pattern, and a parity bit is added to each pattern before it is sent.

[Figure: ASCII codes of the five characters and the actual bits sent]
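
A simple parity scheme can be sketched in C as follows. The sketch assumes even parity and assumes the parity bit is appended after the 7 data bits; neither choice is stated in the notes, and the function names are made up for the example.

```c
/* Returns the 8-bit unit sent for a 7-bit ASCII character: the 7 data
 * bits followed by one parity bit. Assumption: even parity, i.e. the
 * total number of 1s in the unit is even. */
unsigned add_even_parity(unsigned char c)
{
    unsigned data = c & 0x7Fu;
    unsigned ones = 0;
    unsigned b;

    for (b = data; b != 0; b >>= 1)
        ones += b & 1u;                    /* count the 1 bits */
    return (data << 1) | (ones & 1u);      /* parity bit is 1 if count is odd */
}

/* Receiver side: a unit is accepted if its number of 1s is even. */
int parity_ok(unsigned unit)
{
    unsigned ones = 0;

    for (unit &= 0xFFu; unit != 0; unit >>= 1)
        ones += unit & 1u;
    return (ones % 2u) == 0;
}
```

Under these assumptions 'w' (1110111, six 1s) is sent as 11101110 and 'd' (1100100, three 1s) is sent as 11001001. A single flipped bit makes `parity_ok` return 0, while two flipped bits go undetected.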

Parity Checking

Two-dimensional parity

In two-dimensional parity check, a block of bits is divided into rows and a redundant row of bits is added to the whole block.
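
The redundant row can be computed with a running XOR. This is a minimal sketch assuming even parity per column and rows of 8 bits; the function name `parity_row` is hypothetical.

```c
#include <stddef.h>

/* Even column parity: bit j of the result makes the number of 1s in
 * column j (over all data rows plus the parity row) even. XOR-ing the
 * rows together yields exactly that row. */
unsigned char parity_row(const unsigned char *rows, size_t count)
{
    unsigned char p = 0;
    size_t i;

    for (i = 0; i < count; i++)
        p ^= rows[i];
    return p;
}
```

The receiver can run the same function over the data rows plus the received parity row; a result of 0 means every column still has an even number of 1s.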

[Figure: 2D Parity Checking]

Other Techniques

  • Other techniques for error control include:
    • Arithmetic Checksums
    • CRC Checks

Blocks

  • Data are read from or written to a tape in groups of characters called blocks.
  • A block is the smallest amount of data that can be transferred between secondary memory and primary memory in one access.
  • A block may contain one or more records.
  • A block is sometimes referred to as a physical record.
  • Between each pair of blocks, there is a space or gap termed the interblock gap.

Buffer

  • A buffer is a region of physical memory used to temporarily hold data while it is being moved from one place to another.
  • Typically, the data is stored in a buffer as it is retrieved from an input device or just before it is sent to an output device.
  • A buffer may also be used when moving data between processes within a computer.
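
In C, every `FILE` stream already has a buffer, and the standard function `setvbuf` lets a program supply its own. The sketch below is only an illustration; the helper name `write_buffered` is hypothetical, and the buffer passed in must stay valid until the stream is closed.

```c
#include <stdio.h>

/* Attaches a caller-supplied, fully buffered area to the stream, then
 * writes count copies of ch through it. Characters collect in the
 * buffer and reach the file when it fills or is flushed.
 * Must be called before any other operation on fp.
 * Returns 0 on success, -1 on error. */
int write_buffered(FILE *fp, char *buffer, size_t size, int ch, long count)
{
    long i;

    if (setvbuf(fp, buffer, _IOFBF, size) != 0)
        return -1;
    for (i = 0; i < count; i++) {
        if (fputc(ch, fp) == EOF)
            return -1;
    }
    return (fflush(fp) == 0) ? 0 : -1;     /* push out the last partial buffer */
}
```

With a 4096-byte buffer, writing 10000 characters one at a time results in only a handful of transfers to the device rather than 10000.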

File Open Modes

Mode  Operations Allowed   Action
r     Read                 Return NULL if file doesn’t exist
w     Write                Create if file doesn’t exist. Destroy file contents if it exists
a     Write at end         Create if file doesn’t exist. Retain old contents
r+    Read, Write          Return NULL if file doesn’t exist
w+    Read, Write          Create if file doesn’t exist. Destroy file contents if it exists
a+    Read, Write-at-end   Create if file doesn’t exist. Retain old contents
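
The behaviour of the "w", "a" and "r" modes can be observed with a short experiment. This is a sketch only; the function name `demo_modes` and the idea of returning the byte count are choices made for this example.

```c
#include <stdio.h>

/* Writes first with "w", adds second with "a", then counts the bytes
 * read back through "r". Returns the count, or -1 on error. */
long demo_modes(const char *path, const char *first, const char *second)
{
    FILE *fp;
    long n = 0;

    fp = fopen(path, "w");            /* create, or destroy old contents */
    if (fp == NULL)
        return -1;
    fputs(first, fp);
    fclose(fp);

    fp = fopen(path, "a");            /* retain old contents, write at end */
    if (fp == NULL)
        return -1;
    fputs(second, fp);
    fclose(fp);

    fp = fopen(path, "r");            /* NULL if the file doesn't exist */
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Calling it twice on the same path shows that "w" discards whatever an earlier call left behind, while "a" keeps what the "w" step has just written.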

Insertion in Files

  • Writing of new records will be done in a sequential manner at the end of the file.
  • Writing anywhere else overwrites the content initially stored there.
  • If we want to insert a new record at the ith position, then we’ll have to copy records i to n to a temporary file, write the new record, and copy the saved records back after it.
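
The temporary-file approach can be sketched as below. It assumes fixed-size records of `REC_SIZE` bytes and a stream opened for both reading and writing in binary mode; `REC_SIZE` and `insert_record` are names invented for this illustration.

```c
#include <stdio.h>

#define REC_SIZE 16   /* assumed fixed record length in bytes */

/* Inserts rec as the i-th record (1-based) of a file holding n records.
 * Records i..n are saved to a temporary file, the new record is written
 * at position i, and the saved records are written back after it.
 * Returns 0 on success, -1 on error. */
int insert_record(FILE *fp, long i, long n, const char rec[REC_SIZE])
{
    char buf[REC_SIZE];
    FILE *tmp;
    long k;

    if (i < 1 || i > n + 1 || (tmp = tmpfile()) == NULL)
        return -1;

    /* Step 1: save records i..n */
    fseek(fp, (i - 1) * REC_SIZE, SEEK_SET);
    for (k = i; k <= n; k++) {
        if (fread(buf, REC_SIZE, 1, fp) != 1 ||
            fwrite(buf, REC_SIZE, 1, tmp) != 1) {
            fclose(tmp);
            return -1;
        }
    }

    /* Step 2: write the new record at position i */
    fseek(fp, (i - 1) * REC_SIZE, SEEK_SET);
    if (fwrite(rec, REC_SIZE, 1, fp) != 1) {
        fclose(tmp);
        return -1;
    }

    /* Step 3: copy the saved records back, right after the new one */
    rewind(tmp);
    for (k = i; k <= n; k++) {
        if (fread(buf, REC_SIZE, 1, tmp) != 1 ||
            fwrite(buf, REC_SIZE, 1, fp) != 1) {
            fclose(tmp);
            return -1;
        }
    }
    fclose(tmp);
    return (fflush(fp) == 0) ? 0 : -1;
}
```

The cost is proportional to n - i + 1 records read and written twice, which is why insertion into a sequential file is considered expensive.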

Functions for Reading from a File

int fgetc(FILE *stream)

char *fgets(char *s, int n, FILE *stream)

int fscanf(FILE *stream, const char *format, ...)

size_t fread(void *ptr, size_t size, size_t nobj, FILE *stream)

Functions for Writing to a File

int fputc(int c, FILE *stream)

int fputs(const char *s, FILE *stream)

int fprintf(FILE *stream, const char *format, ...)

size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *stream)
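
A short round trip shows some of the reading and writing functions working together. This is an illustrative sketch; the function name `round_trip` and the sample data are assumptions, and the stream is expected to be open for both reading and writing in binary mode (for example one returned by `tmpfile`).

```c
#include <stdio.h>

/* Writes one text line and one binary array to fp, then reads both
 * back. Returns the sum of the integers read, or -1 on error. */
int round_trip(FILE *fp, char *line, int line_size)
{
    int out[3] = {10, 20, 30};
    int in[3] = {0, 0, 0};

    if (fprintf(fp, "%s %d\n", "records", 3) < 0)    /* formatted write */
        return -1;
    if (fwrite(out, sizeof out[0], 3, fp) != 3)      /* block write */
        return -1;

    rewind(fp);                                      /* back to byte 0 */
    if (fgets(line, line_size, fp) == NULL)          /* reads up to '\n' */
        return -1;
    if (fread(in, sizeof in[0], 3, fp) != 3)         /* block read */
        return -1;
    return in[0] + in[1] + in[2];
}
```

Note that `fgets` keeps the newline in `line`, and that `fread` and `fwrite` return the number of objects transferred rather than the number of bytes.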

Repositioning the Read-Write Locator

Repositions the file pointer on a stream: int fseek (FILE *fp, long offset, int origin);
  • offset is the number of bytes to move the position indicator
  • origin says where to move from

Three options/constants are defined for origin
  • SEEK_SET: move the indicator offset bytes from the beginning
  • SEEK_CUR: move the indicator offset bytes from its current position
  • SEEK_END: move the indicator offset bytes from the end

Retrieve the Position of Read Write Locator

long ftell (FILE *fp);

  • Returns the current value of the file position indicator, i.e. the number of bytes from the start of the file (starts at zero)
  • To determine where the position indicator is, use: long pos = ftell (fp);
  • This returns a long giving the current position in bytes. The first byte of the file is byte 0. If an error occurs, ftell () returns -1L.
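
`fseek` and `ftell` are often combined, for example to find a file's size or to jump straight to a fixed-size record. The helpers below are a sketch with hypothetical names (`file_size`, `read_record`); seeking to `SEEK_END` on a binary stream works on common platforms but is not strictly guaranteed by the C standard.

```c
#include <stdio.h>

/* Returns the size of the file in bytes, or -1 on error. The current
 * position is restored before returning. */
long file_size(FILE *fp)
{
    long here = ftell(fp);
    long size;

    if (here == -1L || fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);                      /* offset of the end = size */
    if (fseek(fp, here, SEEK_SET) != 0)
        return -1L;
    return size;
}

/* Reads the k-th (0-based) fixed-size record directly.
 * Returns 1 on success, 0 if the record could not be read. */
int read_record(FILE *fp, long k, void *rec, size_t rec_size)
{
    if (fseek(fp, k * (long)rec_size, SEEK_SET) != 0)
        return 0;
    return fread(rec, rec_size, 1, fp) == 1;
}
```

Dividing `file_size(fp)` by the record size gives the number of records in a file of fixed-size records.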

Objectives

  • External Sorting Techniques
    • K-way Merge Sort
      • Balanced Merge Sort with 2*K Tapes
      • Balanced Merge Sort with K+1 Tapes
    • Poly-Phase Merge Sort

External Sorting

Need
  • Entire data to be sorted might not fit in the available internal memory

Considerations
  • When data resides in internal memory
    • Data access time << Computation time
    • Need to reduce the number of CPU operations
  • When data resides on external storage devices
    • Data access time >> Computation time
    • Need to reduce disk accesses

Algorithms

  • Merge sort
  • Multi-way / k-way merge sort
    • Balanced
    • Poly-phase

General Approach

  • Divide data into smaller segments that can fit into internal memory
  • Sort them internally
  • Write the sorted segments (called runs) to secondary storage
  • Merge the runs together to get runs of larger size
  • Continue until a single run is left

Assumptions
  • There are N records on the disk
  • It is possible to sort M records using internal sort (at a time)

External Merge Sort

Sort process
  • Create N/M sorted runs, reading M records at a time
  • Set aside 3 blocks of internal memory, each capable of holding M/3 records
    • First two blocks act as input buffers
    • Third acts as output buffer
  • Merge runs {R1, R2}; {R3, R4}; ... to get N/2M runs of size 2M each
  • Continue merging until a single run of size N is obtained

Also called 2-way merge sort
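
The process can be imitated in memory to see how runs grow from pass to pass. In the sketch below an array stands in for the disk and the parameter m plays the role of M; real external sorting would read and write blocks on secondary storage instead. All function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merges the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Sorts data[0..n) in the manner of a 2-way external merge sort.
 * Returns the number of merge passes (run creation is not counted),
 * or -1 on error. */
int two_way_merge_sort(int *data, size_t n, size_t m)
{
    int *tmp, *src = data, *dst, *swap;
    int passes = 0;
    size_t start, width;

    if (m == 0 || (tmp = malloc((n ? n : 1) * sizeof *tmp)) == NULL)
        return -1;
    dst = tmp;

    /* Run creation: sort each segment of at most m records internally. */
    for (start = 0; start < n; start += m)
        qsort(data + start, (n - start < m) ? n - start : m,
              sizeof *data, cmp_int);

    /* Merge passes: runs of size width become runs of size 2*width. */
    for (width = m; width < n; width *= 2, passes++) {
        for (start = 0; start < n; start += 2 * width) {
            size_t mid = (start + width < n) ? start + width : n;
            size_t hi = (start + 2 * width < n) ? start + 2 * width : n;
            merge_runs(src, dst, start, mid, hi);
        }
        swap = src; src = dst; dst = swap;   /* output becomes next input */
    }
    if (src != data)
        memcpy(data, src, n * sizeof *data);
    free(tmp);
    return passes;
}
```

With 16 records and m = 3, run creation produces 6 runs and three merge passes follow (run sizes 3, 6, 12, then 16), so four passes over the data are made in total.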

After Pass I
[Figure: tape contents after Pass I are not recoverable]

After Pass IV
T1: 2 3 5 6 7 11 12 15 22 23 31 40 45 50 78 90

2-Way Merge (with Tape Drives)

Assumptions

  • Available number of tape drives: 4 (2*2)
  • Say the tapes are named U, V, W, X
  • All the data is initially on tape U
  • Internal memory can sort M records at a time
  • Total number of records is N
  • Depending upon the pass number, the pair (U,V) or (W,X) can act either as a set of input tapes or output tapes

2-Way Merge (with Tape Drives)

  • Read M records from U
  • Sort them internally and write them alternately to W/X
  • Do
    • Merge the Ith run from W with the Ith run on X; write to U
    • Merge the (I+1)th run from W with the (I+1)th run on X; write to V
  • Continue until all runs are processed

Result: N/2M runs of length 2M each, placed alternately on tapes U & V
  • U & V become the input tapes
  • W & X become the output tapes
  • Repeat the merge process until you get a single run of length N

Say the total number of passes is P
M * (2 * 2 * 2 … P times) = N
M * 2^P = N
P = log_2 (N/M)

K-way Merge Sort using 2*K tapes

Motivation
  • Need for better performance

Strategy
  • Increase the number of runs that are merged at a time
  • Say, K runs are merged at a time

Requirement
  • According to the above algorithm we require K input tapes and K output tapes => 2*K tape drives

Number of passes
  • P = log_K (N/M)
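
Since N/M is rarely an exact power of K, the number of merge passes in practice is the logarithm rounded up. A small sketch, using a hypothetical function `merge_passes` that counts with integers instead of calling a floating-point logarithm:

```c
/* Number of merge passes needed to reduce the ceil(n / m) initial runs
 * to a single run when k runs are merged at a time. This equals
 * ceil(log_k(n / m)). The run-creation pass is not counted.
 * Returns -1 for invalid arguments. */
int merge_passes(unsigned long n, unsigned long m, unsigned long k)
{
    unsigned long runs;
    int passes = 0;

    if (m == 0 || k < 2)
        return -1;
    runs = n / m + (n % m != 0);          /* initial sorted runs, rounded up */
    while (runs > 1) {
        runs = runs / k + (runs % k != 0);    /* k runs merge into 1 */
        passes++;
    }
    return passes;
}
```

For N = 16 and M = 3 there are 6 initial runs: a 2-way merge needs 3 merge passes (6, 3, 2, 1 runs), while a 3-way merge needs only 2 (6, 2, 1 runs).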

Illustration: 3-Way Merge using 6 tapes

Requirement
  • 2*3 Tapes
  • Say, M = 3

Initial data
T1: 2 12 3 5 23 6 50 7 31 90 22 11 15 78 45 40

After Pass I
  • Result: Runs of size M (3) distributed on tapes T4, T5 and T6

After Pass II
  • Action: Corresponding runs from the three input tapes T4, T5 and T6 merged and placed on T1, T2 and T3 respectively.
  • Result: Runs of size 3*3 = 9

After Pass III
  • Action: Corresponding runs from the three input tapes T1, T2 and T3 merged and placed on T4, T5 and T6 respectively.
  • Result: Runs of size up to 3*9 = 27; with only 16 records, this is a single sorted run

[Figure: tape contents after each pass are not recoverable]