Overview: Symbolic Machine Translation
In this assignment, you will learn how to write phrase structure grammars for some different linguistic phenomena in two different languages: English and Chinese. You can use the two grammars to create an interlingual machine translation system by parsing in one, and generating in the other.
Don’t panic if you don’t speak Chinese, and don’t celebrate yet if you do: it won’t give you much of an advantage over other students. A facility with languages in general will help you, as will the ability to learn and understand the differences between the grammars of two different languages. In particular, you will start by working on agreement. Then, you will need to analyse the quantifier scoping difference between the two languages.
TRALE Instructions
The TRALE system can be run with:
(which you are welcome to alias). For this assignment, TRALE needs to start a graphical interface, Gralej. Therefore, if you don’t have access to the labs and want to run TRALE remotely, you can use one of the following:
- RDP over SSH (https://www.teach.cs.toronto.edu/using_cdf/rdp.html),
- Remote Access Server NX (https://www.teach.cs.toronto.edu/using_cdf/remote_access_server.html),
- or connect to teach.cs using ssh with either the -X or -Y flag: ssh -X email@example.com
Agreement: Determiners, Numbers and Classifiers [10 marks]
English expresses subject–verb agreement in person and number. English has two kinds of number: singular and plural. The subject of a clause must agree with its predicate: both must be singular or both must be plural. The number of a direct object, however, does not need to agree with anything.
A linguist annoys a dolphin.
Two linguists annoy a dolphin.
* Two linguists annoys two dolphins.
* A linguist annoy two dolphins.
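The pattern in these four examples can be sketched outside TRALE as a small Python check. This is only an illustration of the agreement constraint, not the grammar you are asked to write: the lexicon, the `sg`/`pl` labels, and the function names are all hypothetical, person is ignored because every subject here is third person, and the determiner–noun check is an added assumption about English NPs.

```python
# Toy lexicon: every entry and feature label here is hypothetical.
NOUNS = {"linguist": "sg", "linguists": "pl", "dolphin": "sg", "dolphins": "pl"}
VERBS = {"annoys": "sg", "annoy": "pl"}
DETS = {"a": "sg", "two": "pl"}


def np_number(det, noun):
    """Return the NP's number, or None if determiner and noun disagree."""
    det_num, noun_num = DETS.get(det), NOUNS.get(noun)
    if det_num is None or det_num != noun_num:
        return None
    return noun_num


def is_grammatical(sentence):
    """Accept 'Det N V Det N' only if the subject NP and the verb share a number."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 5:
        return False
    s_det, s_noun, verb, o_det, o_noun = words
    subj_num = np_number(s_det, s_noun)
    obj_num = np_number(o_det, o_noun)
    if subj_num is None or obj_num is None:
        return False
    # The object's number is deliberately not compared with the verb's.
    return VERBS.get(verb) == subj_num


for s in ["A linguist annoys a dolphin.", "Two linguists annoys two dolphins."]:
    print(s, is_grammatical(s))
```

In a feature-based grammar the same effect would come from the subject and the verb sharing a number value, rather than from an explicit comparison like the one above.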
Chinese, on the other hand, does not exhibit subject–verb agreement. As shown in the examples below, most nouns do not inflect at all for plurality. Chinese does, however, have a classifier (CL) part of speech that English does not. Semantically, classifiers are similar to English collective nouns (a bottle of water, a murder of crows), but English collective nouns are only used when describing collectives. With very few exceptions, classifiers are mandatory in complex Chinese noun phrases.
Different CLs agree with different classes of nouns that are sorted by mostly semantic criteria.
For example, 语言学家 (yu yan xue jia) 'linguist' is a person and an occupation, so it should be classified by either 个 (ge) or 位 (wei) and cannot be classified by the animal CL 只 (zhi). However, the rules for determining a noun’s class constitute a formal system that must be followed irrespective of semantic similarity judgements. For example, while mice and sheep are both animals and can both be classified by the animal CL 只 (zhi), 羊 (yang) 'sheep' can also take another classifier, 头 (tou), for livestock.
You should be familiar by now with the terminology in the English grammar starter code for this question. The Chinese grammar is fairly similar, but there is a new phrasal category called a classifier phrase (CLP), formed by a number and a classifier. The classifier phrase serves the same role as a determiner does in English.
The two grammars below don’t appropriately constrain the NPs generated. You need to design your own rules and features to properly enforce agreement.
Here is a list of all of the nouns in this question and their acceptable classifiers:
- 老鼠 laoshu mouse: 只 zhi;
- 羊 yang sheep: 只 zhi, 头 tou;
- 语言学家 yu yan xue jia linguist: 个 ge, 位 wei.
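The list above can be read as a lookup table from nouns to their acceptable classifiers. The following Python sketch shows that reading; it is an illustration only, not TRALE code and not the intended solution. The function name and the number words 一 (yi) 'one' and 两 (liang) 'two' are assumptions made for the example and may not match the lexicon in the starter code.

```python
# Acceptable classifiers per noun, copied from the list above.
CLASSIFIERS = {
    "老鼠": {"只"},            # laoshu 'mouse'
    "羊": {"只", "头"},        # yang 'sheep'
    "语言学家": {"个", "位"},  # yu yan xue jia 'linguist'
}

# Hypothetical number words; the starter grammar may use different ones.
NUMBERS = {"一", "两"}


def np_is_well_formed(number, classifier, noun):
    """Check a [CLP number classifier] + noun sequence against the table."""
    if number not in NUMBERS:
        return False
    # Unknown nouns have no acceptable classifiers, so they are rejected.
    return classifier in CLASSIFIERS.get(noun, set())


print(np_is_well_formed("两", "头", "羊"))        # livestock CL with 'sheep'
print(np_is_well_formed("两", "只", "语言学家"))  # animal CL with 'linguist'
```

A table keyed on individual nouns does not scale; grouping nouns into classes and letting each classifier select a class is closer to what a feature-based grammar would express.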