Big data and data processing, Papers of Data Structures and Algorithms

About systems of data science. Analysis of bog data papers

Typology: Papers

2021/2022

Uploaded on 03/23/2022

krutika-sarode
krutika-sarode 🇺🇸

1 document

1 / 7

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CMPSCI645 Database Design and Implementation Assignment 2
(100 Points)
Now please answer the following questions:
(1) [12 Points] Table sizes of paperauths
Before data cleaning, how many tuples are there in the paperauths table? Ans: 8998548
After data cleaning, how many tuples are there in the paperauths table? Ans: 8997872
(2) [8 Points] Indexes created for the DBLP database
Show the content after typing the PostgreSQL \di, \d authors and \d paperauths command.
ksarode=> \di;
List of relations
Schema | Name | Type | Owner | Table
--------+---------------+-------+---------+------------
public | authors_index | index | ksarode | authors
public | authors_pkey | index | ksarode | authors
public | pa_index | index | ksarode | paperauths
public | papers_pkey | index | ksarode | papers
public | venue_pkey | index | ksarode | venue
ksarode=> \d authors;
Table "public.authors"
Column | Type | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
id | integer | | not null |
pf3
pf4
pf5

Partial preview of the text

Download Big data and data processing and more Papers Data Structures and Algorithms in PDF only on Docsity!

CMPSCI645 Database Design and Implementation Assignment 2

(100 Points)

Now please answer the following questions: (1) [12 Points] Table sizes of paperauths Before data cleaning, how many tuples are there in the paperauths table? Ans: 8998548 After data cleaning, how many tuples are there in the paperauths table? Ans: 8997872 (2) [8 Points] Indexes created for the DBLP database Show the content after typing the PostgreSQL \di, \d authors and \d paperauths command. ksarode=> \di; List of relations Schema | Name | Type | Owner | Table --------+---------------+-------+---------+------------ public | authors_index | index | ksarode | authors public | authors_pkey | index | ksarode | authors public | pa_index | index | ksarode | paperauths public | papers_pkey | index | ksarode | papers public | venue_pkey | index | ksarode | venue ksarode=> \d authors; Table "public.authors" Column | Type | Collation | Nullable | Default --------+------------------------+-----------+----------+--------- id | integer | | not null |

name | character varying(200) | | | Indexes: "authors_pkey" PRIMARY KEY, btree (id) "authors_index" btree (name) Referenced by: TABLE "paperauths" CONSTRAINT "fk_auth" FOREIGN KEY (authid) REFERENCES authors(id) ksarode=> \d paperauths; Table "public.paperauths" Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+--------- paperid | integer | | | authid | integer | | | Indexes: "pa_index" btree (paperid, authid) Foreign-key constraints: "fk_auth" FOREIGN KEY (authid) REFERENCES authors(id) "fk_paper" FOREIGN KEY (paperid) REFERENCES papers(id)

Question 2 [18 points]: Functional Dependencies and Normalization

(1) [6 Points] Suppose that we have the following three tuples in a legal instance of a relation schema S with three attributes ABC: Which of the following dependencies does not hold over schema S?

(3) How many records fit in a page?

(4) How many records fit in a cylinder?

(5) If the file is arranged sequentially on the disk, how many sectors are needed? How many pages are needed? And how many cylinders are needed? ( ASSUMING THAT THE FILE IS ARRANGED SEQUENCIALLY SECTOR WISE ) (a) Num. of sectors: 5,00, (b) Num. of pages: 62, (c) Num. of cylinders: 397 (6) How much time is needed to read this file sequentially? (a) Seek time: Seek time is the time required to move the head to a track. We require total 500000 sectors in total. Hence number of tracks required will be 5,00,000/63=7937 tracks. So seek time will be 9 + 7936*1= 7972 msec (b) Rotational delay: Rotational delay is the time required for half a rotation on avg. The rotation time of the disk is 7200rpm. Hence the rotation time for half rotation will be 4.1 msec (c) Transfer time:

Transfer time is the time taken to read or write data in a block(page). T = b/rN where b is number of bytes to transfer, r is rotational speed in rev/sec and N is number of bytes on a track. We know that r= rev/sec and N=32256. We have 62,500 pages in total. Hence transfer time will be 62,5004096/(12032256)= 66.13 sec (d) Total: Seektime+rotational delay+ transfer time = 7972+4.1+66130= 76106. msec (7) How much time is needed to read 50% of the pages in the file randomly , that is, one random I/O per page as observed through index lookups? (a) Cost per page: Seek time= 9ms Rotational delay=4.1msec Transfer time= For a single page, b=4096. Hence transfer time will be 4096/(120*32256)= 1.05 msec Total time= 9+4.1+1.05 = 14.15msec (b) Total cost: Seek time: There are total 62,500 pages. So 50% will be 31,250 random pages. Total seek time to access random pages will be

(1) [10 points] Show the results of entering the keys 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 (in that order) to an initially empty B+ tree. Assume that every non-leaf node can hold up to 3 index entries and every leaf node can hold up to 3 data entries. In case of overflow, split the node (do not re-distribute keys to neighbors). (2) [10 points] Now demonstrate a different insertion order that leads to a tree of different depth than the one in Part (1).