Geographic Information System
You will implement a system that indexes and provides search features for a file of GIS
records, as described above.
Your system will build and maintain several in-memory index data structures to support
•Importing new GIS records into the database file
•Retrieving data for all GIS records matching given geographic coordinates
•Retrieving data for all GIS records matching a given feature name and state
Retrieving data for all GIS records that fall within a given (rectangular) geographic
•Displaying the in-memory indices in a human-readable manner
You will implement a single software system in C++ to perform all system functions.
The program will take the names of three files from the command line, like this:
./GIS <database file name> <command script file name> <log file name>
Note that this implies your main class must be named GIS, and be able to be compiled
simply using a g++ compile command. Preferably, you are encouraged to create make files
for the project and provide the required dependency files in your submission.
Thedatabase file shouldbe createdas anempty file; note thatthe specifieddatabase filemay
already exist, in which case the existing file should be truncated or deleted and recreated.
If the command script file is not found the program should write an error message to the
console and exit. The log file should be rewritten every time the program is run, so if the file
already exists it should be truncated or deleted and recreated.
The system will create and maintain a GIS database file that contains all the records that
are imported as the program runs. The GIS database file will be empty initially. All the
indexing of records will be done relative to this file.
There is no guarantee that the GIS record file will not contain two or more distinct records
that have the same geographic coordinates. In fact, this is natural since the coordinates are
expressedintheusualDMSsystem. So,wecannottreat geographic coordinates as aprimary
TheGISrecordswillbe indexedbytheFeature NameandState(abbreviation)fields.This
name index will support finding offsets of GIS records that match a given feature name and
The GIS records will also be indexed by geographic coordinate. This coordinate index will
support finding offsets of GIS records that match a given primary latitude and primary
The system will include a buffer pool, as a front end for the GIS database file, to improve
search speed. See the discussion of the buffer pool below for detailed requirements. When
performing searches, retrieving a GIS record from the database file must be managed through
the buffer pool. During an import operation, when records are written to the database file,
Whensearches areperformed, completeGISrecordswillbe retrievedfromtheGISdatabase
file that your program maintains. The only complete GIS records that are stored in memory
at any time are those that have just been retrieved to satisfy the current search, or individual
GIS records created while importing data or GIS records stored in the buffer pool.
Aside fromwhere specificdata structures are required, youmayuse any suitableSTLlibrary
implementation you like (but not other libraries).
Each index should have the ability to write a nicely-formatted display of itself to an output
Name Index Internals
The name index will use a hash table for its physical organization2
. Each hash table entry
will store a feature name and state abbreviation (separately or concatenated, as you like)
and the file offset(s) of the matching record(s). Since each GIS record occupies one line in
the file, it is a trivial matter to locate and read a record given nothing but the file offset at
which the record begins.
Your table will use quadratic probing to resolve collisions, with the quadratic function (n
to compute the step size. The hash table must use a contiguous physical structure (array).
The initial size of the table will be 1024, and the table will resize itself automatically, by
doubling its size whenever the table becomes 70% full. Since the specified table sizes given
above are powers of 2, an empty slot will always be found unless the table is full.
You can use your desired hash function (e.g. elfhash), and apply it to the concatenation of
the feature name and state abbreviation field of the data records.
You must be able to display the contents of the hash table in a readable manner.
Coordinate Index Internals
The coordinate indexwilluse a bucket PRquadtree3
for thephysical organization. Inabucket
PRquadtree, eachleaf storesuptoKdata objects (for some fixedvalueofK).Uponinsertion,
if the added value would fall into a leaf that is already full, then the region corresponding
to the leaf will be partitioned into quadrants and the K+1 data objects will be inserted into
those quadrants as appropriate. As is the case with the regular PR quadtree, this may lead
toasequenceofpartitioningsteps, extending therelevantbranchofthequadtreebymultiple
levels. In this project, K will probably equal 4, but I reserve the right to specify a different
bucket size with little notice, so this should be easy to modify.
The index entries held in the quadtree will store a geographic coordinate and a collection of
the file offsets of the matching GIS records in the database file.
Note: do not confuse the bucket size with any limit on the number of GIS records that
may be associated with a single geographic coordinate. A quadtree node can contain index
objects for up to K different geographic coordinates. Each such index object can contain
references to an unlimited number of different GIS records.
The PR quadtree implementation should follow good design practices, and its interface should
be somewhat similar to that of the BST.
Youmust be able to display the PR quadtree in a readable manner. The display must clearly
indicate the structure of the tree, the relationships between its nodes, and the data objects
in the leaf nodes (think of this problem as an in-order traversal of tree with four-children).