HW4: It’s a small world after all
Format and Output: You should provide a Makefile. On running make,
it should create an executable \sixdegrees”. You should run the executable
with two command line arguments: the first is an input file, the second is the
output file. You must provide a README with an explanation of the usage
and a description of the files involved. Please cite any sources you used, such
as online code, code from a previous course (that you may have written), or
extensive discussions with someone.
All your files must be of the form *.c, *.cpp, *.h, *.hpp. When we grade, all
other code files will be deleted. (So do not try to script some part in another
Please read the following carefully, since it explains how the IMDB data is
There is a file cleaned movielist.txt that contains a snapshot of IMDB
Each line of this file is of the form <MOVIE> <ACTOR1> <ACTOR2> : : :.
To make parsing easy, the movie and actor names have underscores in them to
represent space. Thus, to parse a line, you simply tokenize by space. The first
token is the movie, all remaining tokens are actors. For example, here is a line
you may recognize.
Terminator 2: Judgment Day Arnold Schwarzenegger Linda Hamilton Edward Furlong Robert Patrick
This file is the \data file” and is not provided as an input to your program.
It is a fixed file.
Each of line of the input file is of the following form:
For each such line, the output file must contain a shortest path between the
actors, formatted exactly as follows.
• If either <ACTOR1> or <ACTOR2> are not present in the data (meaning there is no movie with them), output \Not present”.
• If <ACTOR1> is the same as <ACTOR2>, just print <ACTOR1>.
• (The real case) Print a shortest path between the actors, with the movie
connecting adjacent actors. For example:
Frank Sinatra Jr. -(Do It in the Dirt)- Suzan Averitt -(Rebel Dabble Babble)-
James Franco -(Love Conquers All: The Making of Tristan + Isolde)- Jim Lemley
-(Through the Eyes of Director Timur Bekmambetov)- Thomas Kretschmann
-(Prince Valiant)- Katherine Heigl
Pay attention to the space and parentheses format, and follow it exactly.
The line has: <ACTOR1> -(<MOVIE>)- <NEXT ACTOR> -(: : :. So
Frank Sinatra acted with Suzan Averitt in \Do It in the Dirt”, and Suzan
Averitt acted with James Franco in \Love Conquers All: The Making of
Your output may be different, because there could be multiple paths with
different movies. You only need to provide one movie for the link between two
successive actors. Note that your output path must have the same length as my
For more clarity, look at the test input/output files.
Data structure suggestions: In one word, graphs. In one word and one
acronym, graphs and BFS.
How to represent your graph? Do whatever you want. Think about adjacency lists, and feel free to store neighborhoods in unordered map data structures.
The test cases:
• simple-input.txt, simple-output.txt: A simple test case with an output of
\Not present” and a path with a single vertex. Your output must exactly
be the same.
• more-input.txt, more-output.txt: Ummm…more input and output.
You code should terminate within two minutes for any input file with at most
ten inputs (the BFS does take a bit of time).
1. (10 points) I’ll throw some more test inputs.
2. (5 points) If the output is correct on simple-input.txt and more-input.txt,
you get five points.