When executed with a mode of 1, your compiler should read the specified input file and divide it into tokens;
essentially this is the lexical analysis phase of the compiler (referred to as the “lexer”). The output stream
should contain a line for each token, showing the file name, line number, and text corresponding to the
token. More details (including the precise output and error formats, and the definitions for the tokens) are
given below. The input file may be a C header file, a C source file, or an arbitrary text file. There are
opportunities for extra credit.
All implementation must be in C, C++, or Java. Instructor permission is required to use anything beyond
the standard libraries for these languages. Submissions will be graded on pyrite.cs.iastate.edu, and
therefore must build and run correctly there.
Students are strongly encouraged to encapsulate the functionality of this phase of the compiler, so that
later parts of the project can easily examine and consume tokens from an input stream.
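One way to follow this advice is sketched below in C, under several assumptions. The names `struct token`, `struct lexer`, `lexer_init`, `lexer_peek`, `lexer_next`, and `dump_tokens` are hypothetical. The input is assumed to be read into memory already. The scanner is only a stand-in that recognizes identifiers, digit strings, and single characters, using invented token codes. The point is the interface: one token of lookahead that a later phase can examine before consuming it.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEXEME 256

/* Hypothetical token record: file name, line number, and text (what mode 1
   prints) plus the integer token code. */
struct token {
    int id;
    const char *filename;
    int line;
    char text[MAX_LEXEME];
};

/* Hypothetical lexer state, kept in one struct that later phases can own. */
struct lexer {
    const char *filename;
    const char *src;            /* entire input, NUL-terminated */
    size_t pos;
    int line;
    int has_peek;               /* one token of lookahead */
    struct token peeked;
};

void lexer_init(struct lexer *lx, const char *filename, const char *src) {
    lx->filename = filename;
    lx->src = src;
    lx->pos = 0;
    lx->line = 1;
    lx->has_peek = 0;
}

/* Placeholder scanner (identifiers, digit runs, single characters only).
   Codes 0 (end of input), 301 (identifier) and 302 (integer) are invented
   here; a real lexer would use the values in tokens.h. */
static void scan(struct lexer *lx, struct token *t) {
    const char *s = lx->src;
    while (isspace((unsigned char)s[lx->pos])) {
        if (s[lx->pos] == '\n') lx->line++;
        lx->pos++;
    }
    t->filename = lx->filename;
    t->line = lx->line;
    size_t start = lx->pos;
    if (s[start] == '\0') {
        t->id = 0;
    } else if (isalpha((unsigned char)s[start]) || s[start] == '_') {
        while (isalnum((unsigned char)s[lx->pos]) || s[lx->pos] == '_') lx->pos++;
        t->id = 301;
    } else if (isdigit((unsigned char)s[start])) {
        while (isdigit((unsigned char)s[lx->pos])) lx->pos++;
        t->id = 302;
    } else {
        t->id = (unsigned char)s[lx->pos++];
    }
    size_t len = lx->pos - start;
    if (len >= MAX_LEXEME) len = MAX_LEXEME - 1;   /* truncate very long lexemes */
    memcpy(t->text, s + start, len);
    t->text[len] = '\0';
}

/* Examine the next token without consuming it. */
const struct token *lexer_peek(struct lexer *lx) {
    if (!lx->has_peek) {
        scan(lx, &lx->peeked);
        lx->has_peek = 1;
    }
    return &lx->peeked;
}

/* Consume the next token, copying it into *out. */
void lexer_next(struct lexer *lx, struct token *out) {
    lexer_peek(lx);
    *out = lx->peeked;
    lx->has_peek = 0;
}

/* Mode-1 style listing; "file line text" only approximates the required
   format, which the assignment defines separately. */
void dump_tokens(struct lexer *lx, FILE *out) {
    struct token t;
    for (lexer_next(lx, &t); t.id != 0; lexer_next(lx, &t))
        fprintf(out, "%s %d %s\n", t.filename, t.line, t.text);
}
```

A parser built on this interface would call `lexer_peek` to decide which rule applies and `lexer_next` once it commits to consuming the token. Mode 1 then reduces to `dump_tokens(&lx, stdout)`.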
2 Tokens in our subset of C
Your lexer must recognize the tokens given in Table 1. A useful list of integer constants for tokens may be
found in tokens.h, distributed with the materials for this part of the project. Note that single-character
symbols use their ASCII codes; all other tokens have integer values above 300. Whitespace (spaces, tabs,
and carriage returns) serves only to separate tokens, and should otherwise be discarded. Any characters that
are not part of a lexeme, such as $, should generate an error message.
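The numbering convention above can be captured with a pair of lookup helpers. This is only a sketch. The enum values, the keyword list, and the set of legal single-character symbols are placeholders, not the contents of tokens.h or Table 1. The function names `keyword_code`, `symbol_code`, and `is_discarded_space` are also hypothetical.

```c
#include <string.h>

/* Hypothetical codes: the real values come from tokens.h. Only the convention
   is taken from the text: one-character symbols use their ASCII code and
   every other token has a value above 300. */
enum {
    TOK_IDENT = 301, TOK_INTCONST = 302,
    TOK_KW_INT = 401, TOK_KW_IF = 402, TOK_KW_RETURN = 403,
    TOK_EQ = 451, TOK_NE = 452, TOK_LE = 453, TOK_GE = 454
};

struct entry { const char *text; int code; };

/* Illustrative subsets only, not the full Table 1. */
static const struct entry keywords[] = {
    {"int", TOK_KW_INT}, {"if", TOK_KW_IF}, {"return", TOK_KW_RETURN}
};
static const struct entry two_char_ops[] = {
    {"==", TOK_EQ}, {"!=", TOK_NE}, {"<=", TOK_LE}, {">=", TOK_GE}
};
static const char single_chars[] = "+-*/%=<>!(){}[];,";

/* Identifier text -> keyword code, or TOK_IDENT if it is not a keyword. */
int keyword_code(const char *word) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i].text) == 0) return keywords[i].code;
    return TOK_IDENT;
}

/* Match a symbol at s, trying two-character operators first. Returns the code
   and stores the lexeme length in *len; returns -1 for a character that cannot
   start any lexeme (such as '$'), which the caller reports as an error. */
int symbol_code(const char *s, int *len) {
    for (size_t i = 0; i < sizeof two_char_ops / sizeof two_char_ops[0]; i++)
        if (strncmp(s, two_char_ops[i].text, 2) == 0) {
            *len = 2;
            return two_char_ops[i].code;
        }
    *len = 1;
    if (*s != '\0' && strchr(single_chars, *s) != NULL) return (unsigned char)*s;
    return -1;
}

/* Whitespace named in the text: spaces, tabs, and carriage returns. Newlines
   are left to the caller so that it can keep counting lines. */
int is_discarded_space(char c) {
    return c == ' ' || c == '\t' || c == '\r';
}
```

Trying the two-character operators before the single characters (the longest match) makes `<=` one token rather than `<` followed by `=`. The `*s != '\0'` check is needed because `strchr` treats the string terminator as part of the string.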
You may notice that some of the C keywords and operators are missing. Students are welcome to
implement additional language features if desired, but any extra keywords or operators must be part of the
2.1 C style comments
C-style comments, beginning with /* and ending with */, should be discarded and treated the same as a
space. A comment with no matching */ is closed at the end of file, and an error message (indicating the
starting point of the comment) should be displayed for this case. The formal rule for C-style comments is:
When not currently inside a comment or string literal, the text /* indicates the start of a
comment, and all text is ignored until the next */ or the end of file.
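This rule translates almost directly into C, as in the sketch below. The function names `skip_c_comment` and `skip_comment_or_warn` and the error text are hypothetical, since the required error format is given elsewhere in the assignment. The caller is assumed to have already checked that it is not inside a string literal. The sketch counts newlines inside the comment so that later tokens keep correct line numbers, and it records the starting line for the unterminated-comment error.

```c
#include <stdio.h>
#include <string.h>

/* Called with s[*pos] == '/' and s[*pos + 1] == '*', outside any string
   literal. Advances *pos past the closing delimiter, or to the end of input
   if there is none, and increments *line for every newline inside the comment.
   Comments do not nest: the first closing delimiter after the opening one
   ends the comment. Returns 1 if the comment was closed, 0 at end of file. */
int skip_c_comment(const char *s, size_t *pos, int *line) {
    size_t i = *pos + 2;                /* step over the opening delimiter */
    for (; s[i] != '\0'; i++) {
        if (s[i] == '*' && s[i + 1] == '/') {
            *pos = i + 2;
            return 1;
        }
        if (s[i] == '\n') (*line)++;
    }
    *pos = i;                           /* closed implicitly at end of file */
    return 0;
}

/* Hypothetical caller: reports an unterminated comment at the line where it
   started. The message wording is a placeholder. */
int skip_comment_or_warn(const char *file, const char *s, size_t *pos, int *line) {
    int start_line = *line;
    if (!skip_c_comment(s, pos, line)) {
        fprintf(stderr, "Error in %s line %d: unclosed comment\n", file, start_line);
        return 0;
    }
    return 1;
}
```

Because a skipped comment produces no token, the caller simply resumes scanning. This gives the "same as a space" behaviour: `a/**/b` still yields two separate identifiers. Starting the search two characters past the opening `/*` keeps `/*/` from being mistaken for a complete comment.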
2.2 C++ style comments
C++-style comments, beginning with // and ending with a newline or end of file, should be discarded and
treated the same as a newline character. No error message is necessary if the comment is terminated by the
end of file. The formal rule for C++-style comments is: