University of Toronto 的CSC148 Python作业，涉及数据结构的前缀树等知识，难度中上。
By this point in CSC148, we have studied several different abstract data types and data structures,
and most recently looked at tree-based data structures that can be used to represent hierarchical
data. On this assignment, you’ll explore a new tree-based data structure known as a prefix tree,
which is a particular kind of tree used to store large quantities of data organized into a hierarchy by
You’ll be responsible first for implementing this new data structure, and then applying it to two
different problem domains, one of which is a text-based autocompletion tool similar to what online
search engines like Google use.
Task 1: Beginning with SimplePrefixTree
Open prefix_tree.py and read through the provided starter code for the Autocompleter and
SimplePrefixTree classes (you can ignore CompressedPrefixTree for now).
Your first task is to implement __init__, __len__, and insert for the SimplePrefixTree class.
The first two are quite a bit easier than the third.
SimplePrefixTree.insert should be implemented recursively, but be aware that you’ll need to
create new SimplePrefixTree objects to represent the different prefixes as you go down the tree.
Also, pay attention to representation invariants! In particular, the list of subtrees should always remain sorted in non-increasing order of weight, and your insertion algorithm should check for this.
Check your work!
We want you to start thinking about generating good test cases even at this point. Here are some
ideas for test cases to get started:
Inserting one value with an empty prefix  into a new prefix tree should result in a tree with
two nodes: an internal node with an empty prefix  , and then a leaf containing the inserted
value. Note that __len__ should return 1 in this case, since we only count inserted values for
the Autocompleter ADT.
Inserting one value with a length-one prefix [x] into a new prefix tree should result in a tree
with three node: two internal nodes with prefixes  and [x], and then a leaf containing the
Inserting one value with a length-n prefix [x_1, …, x_n] into a new prefix tree should result
in a tree with (n+2) nodes: internal nodes with prefixes , [x_1], [x_1, x_2], etc., and then
a leaf containing the inserted value.
Check that correct weights are calculated for both possible aggregate weight types. Note that
averages should not be rounded here.
Task 2: Implementing autocompletion
After you are confident that your algorithm is inserting values correctly, the next step is to
You should do this in two parts:
1. First, implement an unlimited autocomplete, where you ignore the limit parameter and
always return every value that matches the input prefix.
2. Then, implement the limited autocomplete, using the algorithm described in the Weights
You might find bugs in how your insertion algorithm updated tree weights. That’s totally
normal, but if this happens to you, make sure to add to your Task 1 tests to check that weights
are updated properly!
Note that because of the SimplePrefixTree representation invariants, each subtree list should
already be sorted in non-increasing weight order when you call autocomplete; don’t re-sort the
subtrees inside autocomplete (which is non-mutating), as this is very inefficient.
At this point, you should try inserting some values into a prefix tree, and then calling autocomplete
to obtain some results. Here are some suggestions of input properties/conditions to help you design test cases (add to this!):
How many values in the prefix tree match the autocomplete prefix? (0? 1? 10?)
What is the relationship between the number of matches and the limit argument? (less than?
equal to? greater than?)
If there are more matches than the specified limit, try different combinations of input weights to
check that you’re returning the right matches.
Again, check all of the above for both aggregate weight types.
Task 3: Implementing removal
Your task in this section is to implement the SimplePrefixTree.remove method. The recursive
structure is similar to SimplePrefixTree.autocomplete, but you’ll need to be careful about
Note: similar to the Tree class, we have a representation invariant that says that self.subtrees
cannot have any empty prefix trees (i.e., ones that do not contain any more values).
This means you need to check for this when implementing remove, since removing all values with
a given prefix might cause a much larger tree (and even the whole prefix tree) to become empty.
For example, if we have a prefix tree that contains two only values ‘cat and ‘car’, and we
remove every value with prefix [‘c’, ‘a’], then the resulting tree should become empty, even
though the prefix looks much more specific.
Check your work!
We strongly recommend starting here with small trees containing just a few values, and then
deleting one value at a time and checking at each step that the resulting tree has the right structure
After this task, you’re done with the SimplePrefixTree class. Onto some fun applications!
Task 4: Text-based autocompletion
(Updated Nov 19) Please check new starter code for autocomplete_engines.py for correction
Next, open autocomplete_engines.py. Inside are the start of two classes
LetterAutocompleteEngine and SentenceAutocompleteEngine. Both of these are relatively simple
classes that use an Autocompleter to perform some autocomplete actions, and both store strings
as the values in the Autocompleter. Where they differ is in how they generate prefix sequences for
each string when it is inserted.
The LetterAutocompleteEngine associates a string with a list of characters in that string, in
the same way as the examples we’ve been using in the assignment handout so far. This enables it to finish a string given the first few letters.
The SentenceAutocompleteEngine associates a string with a list of the words in that string,
where whitespace is used to separate words. For example, the sentence
‘how are you today’ would have the prefix sequence [‘how’, ‘are’, ‘you’, ‘today’].
Note that the LetterAutocompleteEngine can also take a string with spaces in it, and each space
is treated just like any other letter. For example, the string ‘how are’ would have the prefix
sequence [‘h’, ‘o’, ‘w’, ‘ ‘, ‘a’, ‘r’, ‘e’].
Your task is to complete the implementation of these two classes based on the public interface and
some implementation instructions we’ve given you in the starter code. The hardest part of each
one is their initializer; the other methods should be simple calls to Autocompleter methods.
Both LetterAutocompleteEngine and SentenceAutocompleteEngine will read data from a text file
to populate their Autocompleters. In the real world, this data is often messy, and we often go
through the process of sanitizing the data before using them in the rest of our program.
Task 5: Let’s make some music
Our last application will be a fun one: performing autocomplete on song melodies! First, some
A note represents a single sound. In Python, we represent a note as a Tuple[int, int],
where the first integer represents the pitch of the note,1 and the second represents its duration
A melody is a sequence of one or more notes. In melody.py, we’ve provided a Melody class
that stores a list of notes and a name for the melody.
An interval is the difference in pitch between two notes.
Finally, the inte
rval sequence of a melody is a list containing the differences between consecutive
notes in the melody. For example, if a melody contains the sequence of notes
Task 6: Compressing prefix trees
You might notice that our implementation of SimplePrefixTree is a little bit inefficient: because we
require that every internal value is a prefix that’s only one element longer than its parent, it is
possible to get very long paths down a simple prefix tree that lead to just a single value. You can
see a small example of this in the diagram of the sample prefix tree earlier in the handout with the
path leading to the word “danger”.
Let’s formalize this idea. We say that an internal value in a prefix tree is compressible if it has
exactly one child, and that child is also an internal value. Equivalently, we can say that an internal
value is incompressible if it has more than one child or it is the parent of a leaf (or both). It turns
out that it is possible to remove every compressible internal value in a prefix tree but still support
the Autocompleter operations in an efficient manner!
We say that a prefix tree with no compressible internal values is fully compressed. For example,
here is a fully compressed version of the prefix tree from the start of the handout.