C Programming Assignment | COMP10002 Foundations of Algorithms

2 The Story…
Text analysis and understanding are becoming prevalent in our daily lives. For example, intelligent personal
assistant apps such as Apple Siri and Google Now process user requests by first converting voice to text with
a speech recognition algorithm, and then analysing the text and finding answers to the requests. Finally,
the answers found are read out by a text-to-speech (TTS) algorithm.
To help computers understand a sentence, there are a few standard preprocessing steps. Two preprocessing
steps of interest in this assignment are called stemming and Part-Of-Speech (POS) tagging.
Stemming reduces a word to its stem (that is, the root form). See the following example:
Sentence:    After stemming:
she          she
sells        sell
seashells    seashell
Here, “sells” is the third-person present form of “sell”; “seashells” is the plural form of “seashell”.
After stemming, the two words are reduced to their respective root forms. The word “she” is already in its
root form, and hence is unchanged.
POS tagging assigns parts of speech to each word, such as noun, verb, etc. See the following example:
Sentence:    POS tag
she          pronoun
sells        verb
seashells    noun
There are various algorithms for stemming and POS tagging. At the core of those algorithms are linguistic
rules (such as “remove the ending ‘s’ of a plural noun to obtain its singular form”) and statistics (such as
how often “cook” is used as a verb instead of a noun). Dictionaries are also used to support those algorithms.
In this assignment, you will implement an algorithm for stemming and POS tagging using a dictionary.
Note that you do not need any knowledge of linguistics to complete this assignment.

3 Your Task
You are given a dictionary (with at least one and up to 100 unique words) and a sentence to be
processed in the following format.
vt vi n
n pron
she sells seashells
The input starts with a list of unique dictionary words sorted alphabetically. Each word takes three lines.
Line 1 starts with ‘$’, which indicates the start of a word and is followed by the word itself (e.g., “sell”).
There are up to 22 lower-case English letters in each word, with no upper-case letters or special characters.
Line 2 contains the possible POS tags of the word separated by spaces (e.g., “vt”, “vi”, and “n” represent
verb used with object, verb used without object, and noun, respectively). You may assume up to 5 POS
tags per word and up to 4 lower-case English letters per POS tag. (Hint: You may use either a single
string or an array of strings to store the POS tags; the former is simpler.)
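As a rough illustration of the limits stated so far, the first two lines of a dictionary entry could be stored along the following lines. This is only a sketch: the names `entry_t`, `copy_line`, and `parse_entry` are hypothetical and not part of the assignment, and the POS tags are kept as a single string, as the hint suggests.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORD_LEN 22   /* up to 22 letters per dictionary word */
#define MAX_POS_TAGS 5    /* up to 5 POS tags per word */
#define MAX_TAG_LEN  4    /* up to 4 letters per POS tag */
/* 5 tags of 4 letters each, plus 4 separating spaces */
#define MAX_POS_LEN  (MAX_POS_TAGS * MAX_TAG_LEN + MAX_POS_TAGS - 1)

/* Hypothetical record for the first two lines of one dictionary entry. */
typedef struct {
    char word[MAX_WORD_LEN + 1];  /* root form, without the leading '$' */
    char pos[MAX_POS_LEN + 1];    /* all POS tags kept as a single string */
} entry_t;

/* Copy src into dst (capacity cap, including the terminator), dropping any
   trailing newline. Returns 1 on success, 0 if the text does not fit. */
static int copy_line(char *dst, size_t cap, const char *src) {
    size_t len = strcspn(src, "\r\n");
    if (len >= cap) {
        return 0;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Fill an entry from Line 1 (e.g. "$sell") and Line 2 (e.g. "vt vi n").
   Returns 1 on success, 0 if Line 1 lacks '$' or a field is too long. */
int parse_entry(const char *line1, const char *line2, entry_t *e) {
    assert(line1 != NULL && line2 != NULL && e != NULL);
    if (line1[0] != '$') {
        return 0;
    }
    return copy_line(e->word, sizeof e->word, line1 + 1)
        && copy_line(e->pos, sizeof e->pos, line2);
}
```

With the lines “$sell” and “vt vi n”, this sketch stores “sell” as the word and “vt vi n” as the tag string. Sizing the arrays from the stated limits avoids dynamic allocation, at the cost of rejecting input that exceeds those limits.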
Line 3 starts with ‘#’, which indicates the ending line of a word in the dictionary. It is followed by the
variation forms of the word. Each word can have up to 4 forms, where each variation form may contain
up to 25 lower-case English letters:
1. Form ‘0’ is the past tense form (e.g., “sold”);
2. Form ‘1’ is the past participle form (e.g., “sold”);
3. Form ‘2’ is the present participle form (e.g., “selling”);
4. Form ‘3’ is the plural form (e.g., “sells”; note that this is not the third-person present form).
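One possible way to split Line 3 into its numbered forms is sketched below. The function name `parse_forms` and the constants are assumptions made for illustration, and the sketch assumes each form is introduced by its digit, as in “#0sold1sold2selling3sells”. It does not check that the forms appear in ascending order.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NUM_FORMS    4    /* forms '0' to '3' */
#define MAX_FORM_LEN 25   /* up to 25 letters per variation form */

/* Split a line such as "#0sold1sold2selling3sells" into forms[0..3].
   Forms that are absent are left as empty strings.
   Returns 1 on success, 0 if the line is malformed or a form is too long. */
int parse_forms(const char *line, char forms[NUM_FORMS][MAX_FORM_LEN + 1]) {
    assert(line != NULL && forms != NULL);
    for (int i = 0; i < NUM_FORMS; i++) {
        forms[i][0] = '\0';
    }
    if (line[0] != '#') {
        return 0;
    }
    const char *p = line + 1;
    /* Each form starts with a digit naming its slot, followed by letters. */
    while (*p >= '0' && *p < '0' + NUM_FORMS) {
        int idx = *p++ - '0';
        size_t len = 0;
        while (islower((unsigned char)*p)) {
            if (len == MAX_FORM_LEN) {
                return 0;
            }
            forms[idx][len++] = *p++;
        }
        forms[idx][len] = '\0';
    }
    /* Anything other than the end of the line here is unexpected input. */
    return *p == '\0' || *p == '\n' || *p == '\r';
}
```

Given “#0sold1sold2selling3sells”, the sketch stores “sold”, “sold”, “selling”, and “sells” in slots 0 to 3. Given “#3zebra”, only slot 3 is filled, and a bare “#” leaves all four slots empty.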
The variation forms follow the rules below:

• A word may have (1) no variation forms, (2) the first three forms (such as a verb), (3) only the last
form (such as a noun), or (4) all four forms (such as a word that can be either a verb or a noun). You
do not need to check the POS tags to verify the variation forms that a word has.
• The forms of a word always appear in ascending order of form number, e.g., “#0sold1sold2selling3sells”.
• The variation forms of a word can be the same (e.g., Forms 0 and 1 of “sell” are both “sold”).
• A variation form of a word can be the word itself (e.g., Form 3 of “zebra” is still “zebra”).
• You may assume that any two words in the dictionary will not share the same variation forms (e.g.,
the variation forms of “allocate” and “sell” are all different). You may also assume that any word
will not be a variation form of another word (e.g., “allocate” is not a variation form of any other
word in the dictionary).
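The uniqueness assumptions above mean that a token can match at most one dictionary entry, either as the word itself or as one of its variation forms. A minimal sketch of a lookup that relies on this is given below. The names `entry_t` and `find_stem` are hypothetical, and the linear scan is chosen for clarity, not because the assignment prescribes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_FORMS 4   /* forms '0' to '3' */

/* Hypothetical in-memory entry: a root word plus its four form slots. */
typedef struct {
    const char *word;
    const char *forms[NUM_FORMS];   /* "" marks a form the word lacks */
} entry_t;

/* Return the root word that token equals or is a variation form of,
   or NULL if no entry matches. Since no two entries share a form, the
   first match is the only possible match. */
const char *find_stem(const entry_t *dict, size_t n, const char *token) {
    assert(dict != NULL && token != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dict[i].word, token) == 0) {
            return dict[i].word;
        }
        for (int f = 0; f < NUM_FORMS; f++) {
            /* Skip empty slots so an empty token never matches. */
            if (dict[i].forms[f][0] != '\0'
                    && strcmp(dict[i].forms[f], token) == 0) {
                return dict[i].word;
            }
        }
    }
    return NULL;
}
```

With an entry for “sell” whose forms are “sold”, “sold”, “selling”, and “sells”, looking up “sells” or “sold” returns “sell”, while a token that is absent from the dictionary, such as “seashells” here, returns NULL. Because the dictionary words are sorted alphabetically, the comparison against root words could be replaced by a binary search if desired.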