



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
About systems of data science. Analysis of bog data papers
Typology: Papers
1 / 7
This page cannot be seen from the preview
Don't miss anything!




Now please answer the following questions: (1) [12 Points] Table sizes of paperauths Before data cleaning, how many tuples are there in the paperauths table? Ans: 8998548 After data cleaning, how many tuples are there in the paperauths table? Ans: 8997872 (2) [8 Points] Indexes created for the DBLP database Show the content after typing the PostgreSQL \di, \d authors and \d paperauths command. ksarode=> \di; List of relations Schema | Name | Type | Owner | Table --------+---------------+-------+---------+------------ public | authors_index | index | ksarode | authors public | authors_pkey | index | ksarode | authors public | pa_index | index | ksarode | paperauths public | papers_pkey | index | ksarode | papers public | venue_pkey | index | ksarode | venue ksarode=> \d authors; Table "public.authors" Column | Type | Collation | Nullable | Default --------+------------------------+-----------+----------+--------- id | integer | | not null |
name | character varying(200) | | | Indexes: "authors_pkey" PRIMARY KEY, btree (id) "authors_index" btree (name) Referenced by: TABLE "paperauths" CONSTRAINT "fk_auth" FOREIGN KEY (authid) REFERENCES authors(id) ksarode=> \d paperauths; Table "public.paperauths" Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+--------- paperid | integer | | | authid | integer | | | Indexes: "pa_index" btree (paperid, authid) Foreign-key constraints: "fk_auth" FOREIGN KEY (authid) REFERENCES authors(id) "fk_paper" FOREIGN KEY (paperid) REFERENCES papers(id)
(1) [6 Points] Suppose that we have the following three tuples in a legal instance of a relation schema S with three attributes ABC: Which of the following dependencies does not hold over schema S?
(3) How many records fit in a page?
(4) How many records fit in a cylinder?
(5) If the file is arranged sequentially on the disk, how many sectors are needed? How many pages are needed? And how many cylinders are needed? ( ASSUMING THAT THE FILE IS ARRANGED SEQUENCIALLY SECTOR WISE ) (a) Num. of sectors: 5,00, (b) Num. of pages: 62, (c) Num. of cylinders: 397 (6) How much time is needed to read this file sequentially? (a) Seek time: Seek time is the time required to move the head to a track. We require total 500000 sectors in total. Hence number of tracks required will be 5,00,000/63=7937 tracks. So seek time will be 9 + 7936*1= 7972 msec (b) Rotational delay: Rotational delay is the time required for half a rotation on avg. The rotation time of the disk is 7200rpm. Hence the rotation time for half rotation will be 4.1 msec (c) Transfer time:
Transfer time is the time taken to read or write data in a block(page). T = b/rN where b is number of bytes to transfer, r is rotational speed in rev/sec and N is number of bytes on a track. We know that r= rev/sec and N=32256. We have 62,500 pages in total. Hence transfer time will be 62,5004096/(12032256)= 66.13 sec (d) Total: Seektime+rotational delay+ transfer time = 7972+4.1+66130= 76106. msec (7) How much time is needed to read 50% of the pages in the file randomly , that is, one random I/O per page as observed through index lookups? (a) Cost per page: Seek time= 9ms Rotational delay=4.1msec Transfer time= For a single page, b=4096. Hence transfer time will be 4096/(120*32256)= 1.05 msec Total time= 9+4.1+1.05 = 14.15msec (b) Total cost: Seek time: There are total 62,500 pages. So 50% will be 31,250 random pages. Total seek time to access random pages will be
(1) [10 points] Show the results of entering the keys 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 (in that order) to an initially empty B+ tree. Assume that every non-leaf node can hold up to 3 index entries and every leaf node can hold up to 3 data entries. In case of overflow, split the node (do not re-distribute keys to neighbors). (2) [10 points] Now demonstrate a different insertion order that leads to a tree of different depth than the one in Part (1).