1 Assignment Overview
This assignment involves writing a command line application to process streams
of text from a file. The program will read lines of text from a given file, compute
the frequency of words of certain lengths in the file and print these frequencies to
standard output.
The overall goal of this assignment is to introduce C programming in a Unix setting,
with particular emphasis on C array and string processing. Your eventual submission
will consist of the source files for the program, accompanied by test cases. There
are three parts to this assignment and Sections 1.2, 1.3 and 1.4 below contain
Specifications for each of the three programs. Section 3 describes the Constraints
you should consider in your implementation, and Section 2 describes the Testing
component. What you should Submit is outlined in Section 4 and the Evaluation
scheme is given in Section 5.
Your code is expected to compile without warnings in Senjhalla (the VirtualBox+Vagrant virtual machine) using the -Wall, -g and -std=c99 compile flags with
the gcc 9.3.0 compiler.
Because a later assignment will test your knowledge of dynamic memory allocation,
you are asked to not use dynamic memory allocation for this assignment,
as it is useful to learn how to rely exclusively on automatic allocation.
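As an illustration of relying on automatic allocation only, the following minimal sketch reads a file line by line into a fixed-size buffer declared inside the function. The name count_lines and the limit MAX_LINE_LEN are hypothetical and not part of the provided template; the real limits should follow the Constraints section.

```c
#include <stdio.h>

/* Hypothetical upper bound on the length of one input line. */
#define MAX_LINE_LEN 256

/*
 * Counts the successful fgets() reads on the file at `path`, which equals
 * the number of lines as long as every line fits in the buffer.
 * Returns -1 if the file cannot be opened.
 */
int count_lines(const char *path)
{
    char line[MAX_LINE_LEN];   /* automatic storage: no malloc() call */
    int lines = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        return -1;             /* report the failure to the caller */
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        lines++;
    }
    fclose(fp);
    return lines;
}
```

A line longer than MAX_LINE_LEN - 1 characters would be read in several pieces, so a complete solution needs to choose the buffer size with the stated input limits in mind.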
1.1 Assignment Package
These instructions assume you have completed Lab 2 and you have cloned your git
repository to the home directory in your Senjhalla virtual machine. If you have not
yet done so, refer to the Lab 2 video and slides.
Download the assign1.zip package from Connex and unzip it into the a1 folder
of your git repository. The folder structure is shown in Fig. 1.
Add your solutions to parts A, B, and C to the source files provided in src. You
may create additional files in src but do not rename word_count.c. A suggested
template has been provided for you in word_count.h.
The cases folder holds the test files for parts A, B, and C. The input files have the format
t*.txt and the corresponding expected output files have the format c*.txt. For example, the
expected output for a solution to Part B, using input t04.txt, will be found in
Figure 1: Assignment package
The test folder contains the framework that will be used for evaluation. See Section
5 for details on how to run the tests. You may add your own tests to the existing
tests, but they will not be used in your evaluation.
1.2 Part A. Frequency of words of all lengths
The first part of the assignment is to write a C program, contained in the source
files called word_count.c and word_count.h, which counts the number of words
of all lengths. You have been provided skeleton template files in the assignment
package.
The program must compile, with no warnings, and run using the following commands:
$ gcc -Wall -std=c99 -o word_count word_count.c
$ ./word_count --infile <input_file>
After compiling, a correct implementation will take the name of a word list file as
a command line argument and output the frequency of words of all lengths in that
file, e.g. in the form of a function Count[arg] where arg is the length of the word.
For example, consider the following as input file input_file.txt:
Tomorrow, and tomorrow, and tomorrow,
To the last syllable of recorded time;
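The counting step for input like the two lines above can be sketched as follows. This is only an illustration under stated assumptions: a word is taken to be a maximal run of alphabetic characters, and the names tally_word_lengths and MAX_WORD_LEN are hypothetical rather than taken from the provided template.

```c
#include <ctype.h>

/* Hypothetical upper bound on the length of a single word. */
#define MAX_WORD_LEN 34

/*
 * Adds one to counts[len] for every word of length len found in text.
 * Assumption: a word is a maximal run of alphabetic characters, so
 * spaces, punctuation and newlines all act as separators.
 * The caller supplies a zero-initialised array of MAX_WORD_LEN + 1 ints.
 */
void tally_word_lengths(const char *text, int counts[MAX_WORD_LEN + 1])
{
    int len = 0;

    for (const char *p = text; ; p++) {
        if (isalpha((unsigned char)*p)) {
            len++;                       /* still inside a word */
        } else {
            if (len > 0 && len <= MAX_WORD_LEN) {
                counts[len]++;           /* a word just ended */
            }
            len = 0;
            if (*p == '\0') {
                break;                   /* end of the string */
            }
        }
    }
}
```

The caller can declare the array as int counts[MAX_WORD_LEN + 1] = {0}; so that it lives in automatic storage. Words longer than MAX_WORD_LEN are silently skipped in this sketch.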
In the case of a tie (as shown above), perform a secondary sort based on the
word length bucket/bin value. As in the example, Count[02] is a smaller word length
than Count[04], so it is sorted above the longer word length.
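The ordering rule can be sketched with a small insertion sort over (length, frequency) pairs. The struct bin type and the function names are hypothetical, and the sketch assumes that the primary sort is by frequency in descending order, with ties broken by the smaller word length first.

```c
#include <stddef.h>

/* Hypothetical record: one bin per word length. */
struct bin {
    int length;   /* word length of this bin */
    int count;    /* number of words with that length */
};

/* Returns nonzero when bin a should be printed before bin b. */
static int comes_before(const struct bin *a, const struct bin *b)
{
    if (a->count != b->count) {
        return a->count > b->count;   /* assumed: higher frequency first */
    }
    return a->length < b->length;     /* tie: smaller word length first */
}

/* In-place insertion sort; uses no dynamic memory. */
void sort_bins(struct bin bins[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct bin key = bins[i];
        size_t j = i;

        while (j > 0 && comes_before(&key, &bins[j - 1])) {
            bins[j] = bins[j - 1];    /* shift later-ranked bins right */
            j--;
        }
        bins[j] = key;
    }
}
```

Insertion sort is chosen here because the number of bins is small and it needs no extra storage; the same comparison could also be adapted for the standard qsort() function.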
Tests associated with this part are located in cases/B and test/test_B.h.
1.4 Part C. SORTED Frequency of words of all lengths with Words
The third part of the assignment adds in the option to display the unique words
found for each word length in alphanumeric order. Add an additional optional argument to your Part A & B code (i.e. do not create a new C source file); the program will
be run as shown. Do not assume that the arguments will be given in this order.
$ ./word_count --sort --print-words --infile <input_file>
For example, for the same input file as above, the output should be:
Count[08]=05; (words: "Tomorrow", "recorded", "syllable" and "tomorrow")
Count[03]=03; (words: "and" and "the")
Count[02]=02; (words: "To", "of" and "to")
Count[04]=02; (words: "last" and "time")
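The word list in each output line separates words with commas and joins the final pair with "and". One possible way to build that text is sketched below. The function name format_words is hypothetical, the words are assumed to be already unique and sorted, and plain double quotes are assumed around each word.

```c
#include <stdio.h>

/*
 * Writes a list such as (words: "a", "b" and "c") into out.
 * Assumes the n words are already unique and sorted.
 * Returns the number of characters written, or -1 if out is too small.
 */
int format_words(char *out, size_t size, const char *words[], int n)
{
    size_t used = 0;
    int w = snprintf(out, size, "(words: ");

    if (w < 0 || (size_t)w >= size) {
        return -1;
    }
    used = (size_t)w;
    for (int i = 0; i < n; i++) {
        const char *sep = "";

        if (i > 0) {
            /* the last word is joined with " and ", the others with ", " */
            sep = (i == n - 1) ? " and " : ", ";
        }
        w = snprintf(out + used, size - used, "%s\"%s\"", sep, words[i]);
        if (w < 0 || (size_t)w >= size - used) {
            return -1;
        }
        used += (size_t)w;
    }
    w = snprintf(out + used, size - used, ")");
    if (w < 0 || (size_t)w >= size - used) {
        return -1;
    }
    return (int)(used + (size_t)w);
}
```

Writing into a caller-supplied buffer keeps the sketch free of dynamic allocation; printing each piece directly with printf() would work equally well.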
Tests associated with this part are located in cases/C and test/test_C.h.
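Because the arguments may be given in any order, one approach is to scan argv once and record each option as it is found. The sketch below is illustrative only: struct options and parse_args are hypothetical names, the flags are assumed to be spelled with two ASCII hyphens, and the error behaviour actually expected by the tests is defined in the provided test files, not here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parsed command line. */
struct options {
    int sort;             /* 1 if --sort was given */
    int print_words;      /* 1 if --print-words was given */
    const char *infile;   /* file name following --infile, or NULL */
};

/*
 * Fills opts from argv, accepting the flags in any order.
 * Returns 0 on success and -1 on an unknown flag or a missing file name,
 * so the caller can return an error code from main() without exit().
 */
int parse_args(int argc, char *argv[], struct options *opts)
{
    opts->sort = 0;
    opts->print_words = 0;
    opts->infile = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--sort") == 0) {
            opts->sort = 1;
        } else if (strcmp(argv[i], "--print-words") == 0) {
            opts->print_words = 1;
        } else if (strcmp(argv[i], "--infile") == 0) {
            if (i + 1 >= argc) {
                return -1;            /* --infile without a file name */
            }
            opts->infile = argv[++i]; /* consume the file name as well */
        } else {
            return -1;                /* unrecognised argument */
        }
    }
    return (opts->infile != NULL) ? 0 : -1;
}
```

A main() function could call parse_args first and return a non-zero status when it fails.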
2 Test Inputs
You should test all of your programs with a variety of test inputs, covering as
many different use cases as possible, not just the test input provided. You should
ensure that your programs handle error cases (such as files which do not exist)
appropriately and do not produce errors on valid inputs. Since thorough testing is
an integral part of the software engineering process, you will be expected to submit
one test input.
You have been provided a set of 10 test input files (t01.txt to t10.txt) and 21
expected output files (c01.txt to c07.txt, 7 for each part) located under the cases
folder. Your code also needs to be able to handle basic user error. See test/test_input.h
for expected behaviour when handling incorrect command-line arguments. Do not
use exit() to exit the program.
You have also been provided a testing framework that allows you to run these tests
using a makefile.