Question 1 (40 points, 4 points each part)
1: Will splitting and merging data entry pages in B+tree change record ids? Discuss this for the three alternatives for storing data entries.
2: For page format for variable length records, describe the steps for deleting a record on a page
3: repeat Q2 for inserting a new record into a given page.
4: For Heap File with a page directory, assume that a page can contain up to 100 directory entries and a page can contain 1000 records, for a file of 1,000,000 records, what is the number of pages to store these records, assuming that on average data pages are filled up 70%? What is the number of pages to store the directory, assuming that each directory page is full?
5: In Q4, assuming both the directory and the file are stored on disk, what is the I/O cost to insert one record in worst case, and in the average case?
6: After finding the data entry k* using B+tree, what is the I/O cost for retrieving the corresponding data records of the data entry? Assume each page can contain 1000 data records. Discuss the answer for the three alternatives for storing data entries.
7: How many sparse indexes can a file have? How many clustered indexes can a file have? How many Alternative (1) indexes can a file have? Is a sparse index always clustered?
8: For ISAM, the data entries in a primary page and associated overflow pages are not sorted, instead, are inserted in arbitrary order. Explain why we do not sort data entries k* by k value in overflow pages, even though sorting can make search for a data entry k* much faster?
9: For inserting a data entry into B+tree, explain why the middle key is copied up when splitting a leaf page whereas the middle key is pushed up when splitting a non-leaf page. What differences does this cause when merging two pages for deleting a data entry from B+tree?
10: A typical disk page size is 1KB, 2KB, 4KB, 8KB, etc. Discuss the impact of having a large disk page size vs a small disk page size on the time of I/O. How do you choose a proper disk page size?
Question 2: (24 points, 8 points each part). Consider a dense B+tree index with the search key on Salary for 1,000,000 records stored in 1000 disk pages. The data entries use Alternative (2) format and each data entry has the size equal to 25% of the data record size.
(1) What is the number of pages required for storing all data entries?
(2) If the B+tree is unclustered and if 1% of all records satisfy the condition 100K ≤ Salary ≤ 200K, what is the number of I/O pages required to retrieve the satisfying records using the B+tree. You can assume the height of the B-tree is h.
(3) Repeat (2) for a clustered B+tree.
Question 3 (20 points, 10 point each part) Consider the linear hash structure below.
(1) If we want to find the data records 6* (binary 110), explain the steps required, what is the I/O cost of this operation? (2) Explain the steps required for inserting the data entry 10* (binary 1010). What is the I/O cost of this operation?
(For Q3 of Assignment 1, Next=-0, you can remove =-0. It should be just Next with the arrow pointing to bucket 11