University of Toronto 的CSC148 Python作业，涉及数据结构的前缀树等知识，难度中上。

By this point in CSC148, we have studied several different abstract data types and data structures,

and most recently looked at tree-based data structures that can be used to represent hierarchical

data. On this assignment, you’ll explore a new tree-based data structure known as a prefix tree,

which is a particular kind of tree used to store large quantities of data organized into a hierarchy by

common “prefixes”.

You’ll be responsible first for implementing this new data structure, and then applying it to two

different problem domains, one of which is a text-based autocompletion tool similar to what online

search engines like Google use.

Task 1: Beginning with SimplePrefixTree

Open prefix_tree.py and read through the provided starter code for the Autocompleter and

SimplePrefixTree classes (you can ignore CompressedPrefixTree for now).

Your first task is to implement __init__, __len__, and insert for the SimplePrefixTree class.

The first two are quite a bit easier than the third.

SimplePrefixTree.insert should be implemented recursively, but be aware that you’ll need to

create new SimplePrefixTree objects to represent the different prefixes as you go down the tree.

Also, pay attention to representation invariants! In particular, the list of subtrees should always remain sorted in non-increasing order of weight, and your insertion algorithm should check for this.

Check your work!

We want you to start thinking about generating good test cases even at this point. Here are some

ideas for test cases to get started:

Inserting one value with an empty prefix [] into a new prefix tree should result in a tree with

two nodes: an internal node with an empty prefix [] , and then a leaf containing the inserted

value. Note that __len__ should return 1 in this case, since we only count inserted values for

the Autocompleter ADT.

Inserting one value with a length-one prefix [x] into a new prefix tree should result in a tree

with three node: two internal nodes with prefixes [] and [x], and then a leaf containing the

inserted value.

Inserting one value with a length-n prefix [x_1, …, x_n] into a new prefix tree should result

in a tree with (n+2) nodes: internal nodes with prefixes [], [x_1], [x_1, x_2], etc., and then

a leaf containing the inserted value.

Check that correct weights are calculated for both possible aggregate weight types. Note that

averages should not be rounded here.

Task 2: Implementing autocompletion

After you are confident that your algorithm is inserting values correctly, the next step is to

implement autocompletion.

You should do this in two parts:

1. First, implement an unlimited autocomplete, where you ignore the limit parameter and

always return every value that matches the input prefix.

2. Then, implement the limited autocomplete, using the algorithm described in the Weights

section.

You might find bugs in how your insertion algorithm updated tree weights. That’s totally

normal, but if this happens to you, make sure to add to your Task 1 tests to check that weights

are updated properly!

Note that because of the SimplePrefixTree representation invariants, each subtree list should

already be sorted in non-increasing weight order when you call autocomplete; don’t re-sort the

subtrees inside autocomplete (which is non-mutating), as this is very inefficient.

At this point, you should try inserting some values into a prefix tree, and then calling autocomplete

to obtain some results. Here are some suggestions of input properties/conditions to help you design test cases (add to this!):

How many values in the prefix tree match the autocomplete prefix? (0? 1? 10?)

What is the relationship between the number of matches and the limit argument? (less than?

equal to? greater than?)

If there are more matches than the specified limit, try different combinations of input weights to

check that you’re returning the right matches.

Again, check all of the above for both aggregate weight types.

Task 3: Implementing removal

Your task in this section is to implement the SimplePrefixTree.remove method. The recursive

structure is similar to SimplePrefixTree.autocomplete, but you’ll need to be careful about

updating weights.

Note: similar to the Tree class, we have a representation invariant that says that self.subtrees

cannot have any empty prefix trees (i.e., ones that do not contain any more values).

This means you need to check for this when implementing remove, since removing all values with

a given prefix might cause a much larger tree (and even the whole prefix tree) to become empty.

For example, if we have a prefix tree that contains two only values ‘cat and ‘car’, and we

remove every value with prefix [‘c’, ‘a’], then the resulting tree should become empty, even

though the prefix looks much more specific.

Check your work!

We strongly recommend starting here with small trees containing just a few values, and then

deleting one value at a time and checking at each step that the resulting tree has the right structure

and weights.

After this task, you’re done with the SimplePrefixTree class. Onto some fun applications!

Task 4: Text-based autocompletion

(Updated Nov 19) Please check new starter code for autocomplete_engines.py for correction

to documentation.

Next, open autocomplete_engines.py. Inside are the start of two classes

LetterAutocompleteEngine and SentenceAutocompleteEngine. Both of these are relatively simple

classes that use an Autocompleter to perform some autocomplete actions, and both store strings

as the values in the Autocompleter. Where they differ is in how they generate prefix sequences for

each string when it is inserted.

The LetterAutocompleteEngine associates a string with a list of characters in that string, in

the same way as the examples we’ve been using in the assignment handout so far. This enables it to finish a string given the first few letters.

The SentenceAutocompleteEngine associates a string with a list of the words in that string,

where whitespace is used to separate words. For example, the sentence

‘how are you today’ would have the prefix sequence [‘how’, ‘are’, ‘you’, ‘today’].

Note that the LetterAutocompleteEngine can also take a string with spaces in it, and each space

is treated just like any other letter. For example, the string ‘how are’ would have the prefix

sequence [‘h’, ‘o’, ‘w’, ‘ ‘, ‘a’, ‘r’, ‘e’].

Your task is to complete the implementation of these two classes based on the public interface and

some implementation instructions we’ve given you in the starter code. The hardest part of each

one is their initializer; the other methods should be simple calls to Autocompleter methods.

Text sanitization

Both LetterAutocompleteEngine and SentenceAutocompleteEngine will read data from a text file

to populate their Autocompleters. In the real world, this data is often messy, and we often go

through the process of sanitizing the data before using them in the rest of our program.

Task 5: Let’s make some music

Our last application will be a fun one: performing autocomplete on song melodies! First, some

background definitions.

A note represents a single sound. In Python, we represent a note as a Tuple[int, int],

where the first integer represents the pitch of the note,1 and the second represents its duration

in milliseconds.

A melody is a sequence of one or more notes. In melody.py, we’ve provided a Melody class

that stores a list of notes and a name for the melody.

An interval is the difference in pitch between two notes.

Finally, the inte

rval sequence of a melody is a list containing the differences between consecutive

notes in the melody. For example, if a melody contains the sequence of notes

Task 6: Compressing prefix trees

You might notice that our implementation of SimplePrefixTree is a little bit inefficient: because we

require that every internal value is a prefix that’s only one element longer than its parent, it is

possible to get very long paths down a simple prefix tree that lead to just a single value. You can

see a small example of this in the diagram of the sample prefix tree earlier in the handout with the

path leading to the word “danger”.

Let’s formalize this idea. We say that an internal value in a prefix tree is compressible if it has

exactly one child, and that child is also an internal value. Equivalently, we can say that an internal

value is incompressible if it has more than one child or it is the parent of a leaf (or both). It turns

out that it is possible to remove every compressible internal value in a prefix tree but still support

the Autocompleter operations in an efficient manner!

We say that a prefix tree with no compressible internal values is fully compressed. For example,

here is a fully compressed version of the prefix tree from the start of the handout.