这个作业是介绍C语言的链接、编译等入门知识

CS211: Programming Assignment 1:

Intro to C Programming

1 Array Sorting

You have to write a program named first that will read an array from a file and sort the given array. You

will return the array sorted with all odd numbers in ascending order at the front followed by all even

numbers in descending order. You can use any sorting algorithm you know or wish to use. However, you

cannot use the library sort functions. You should write your own sort function.

1.1 Input Format

Your program will take a file name as an input. The first line in the file provides the total number of

values in the array. The second line will contain a list of numbers separated by tabs. For example a

sample input file “example_file.txt” is:

6

25 10 1 99 4 2

example_file.txt

1.2 Output Format

Your output will be a sorted list of numbers, odd numbers in ascending order and then even numbers

in descending order, each separated by tabs.

Command Line

$ ./first example_file.txt

1 25 99 10 4 2

1.3 Things to Note

• We will not give you improperly formatted files. You can assume all your input files will be in proper

format as above.

• Hint: It may be helpful to move all odd numbers to the front of an array first and then sort them.

1

2 Hash Table

In this part, you will implement a hash table containing integers. The hash table has 10,000 buckets. An

important part of a hash table is collision resolution. In this assignment, we want you to use chaining with

a linked list to handle a collision. This means that if there is a collision at a particular bucket then you

will maintain a linked list of all values stored at that bucket. For more information about chaining, see

http://research.cs.vt.edu/AVresearch/hashing/openhash.php

A hash table can be implemented in many ways in C. You must find a simple way to implement a hash

table structure where you have easy access to the buckets through the hash function. As a reminder, a hash

table is a structure that has a number of buckets for elements to ”hash” into. You will determine where

the element falls in the table using a hash function we describe below.

You must not do a linear search of the 10,000 element array. We will not award any credit for O(n) time

implementation of searches or insertions in the common case.

For this problem, you will be using the following hash function: key modulo the number of buckets.

2.1 Input Format

This program takes a file name as argument from the command line. The file contains successive lines of

input. Each line contains a character, either ‘i’ or ‘s’, followed by a tab and then an integer. For each line

that starts with ‘i’, your program should insert that number in the hash table if it is not present. If the line

starts with a ‘s’, your program should search the hash table for that value.

i 10

i 12

s 10

i 10

s 5

example_file.txt

2.2 Output Format

For each line in the input file, your program should print the status/result of that operation. For an insert,

the program should print “inserted” if the value is inserted or “duplicate” if the value is already present.

For a search, the program should print ”present” or “absent” based on the outcome of the search. You can

assume that the program inputs will have proper structure.

Command Line

$ ./second example_file.txt

inserted

inserted

present

duplicate

absent

2

3 Bit Function

In this exercise, you have to write a program that will read a number followed by a series of bit operations

from a file and perform the given operations sequentially on the number. The operations are as follows:

• set(x, n, v): sets the nth bit of the number x to v

• comp(x, n, v): sets the value of the nth bit of x to its complement (1 if 0 and 0 if 1)

• get(x, n, v): returns the value of the nth bit of the number x.

The least significant bit (LSB) is considered to be index 0.

For this exercise, you must use logical operations to implement the three functions. You are not allowed

to use any arithmetic operations or math library functions.

3.1 Input Format

Your program will take the file name as input. The first line in the file provides the value of the number x

to be manipulated. This number should be considered an unsigned short. The following lines will contain

the operations to manipulate the number. To simplify parsing, the format of the operations will always be

the command name followed by 2 numbers, separated by tabs. For the set(x, n, v) command, the value

of the second input number (v) will always be either 0 or 1. For the comp(x, n) and get(x, n) commands

the value of the second input number will always be 0 and can be ignored. Note that the changes to x are

cumulative, rather than each instruction operating independently on the original x.

5 # x =5

get 0 0 # get (x , 0) , ignoring second value (0)

comp 0 0 # comp (x , 0) , ignoring second value (0)

set 1 1 # set (x ,1 ,1)

example_file.txt

3.2 Output Format

Your output for comp and set commands will be the resulting value of the number x after each operation,

each on a new line. For get commands, the output should be the requested bit’s value.

Command Line

$ ./third example_file.txt

1

4

6

3

4 One-Shot Learning

In this exercise, you will write a C program that implements simple “one-shot” machine learning algorithm

for predicting house prices in your area.

There is significant hype and excitement around artificial intelligence (AI) and machine learning. CS 211

students will get a glimpse of AI/ML by implementing a simple machine learning algorithm to predict

house prices based on historical data.

For example, the price of the house (y) can depend on certain attributes of the house: number of bedrooms

(x1), total size of the house (x2), number of baths (x3), and the year the house was built (x4). Then, the

price of the house can be computed by the following equation:

y = w0 + w1x1 + w2x2 + w3x3 + w4x4 (1)

Given a house, we know the attributes of the house (i.e. x1 , x2 , x3 , x4 ). However, we don’t know the

weights for these attributes: w0, w1, w2, w3, and w4. The goal of the machine learning algorithm in our

context is to learn the weights for the attributes of the house from lots of training data.

Let’s say we have N examples in your training data set that provide the values of the attributes and the

price. Let’s say there are K attributes. We can represent the attributes from all the examples in the training

data as a N ∗ (K + 1) matrix as follows, which we call X:

1 x0,1 x0,2 x0,3 x0,4

1 x1,1 x1,2 x1,3 x1,4

1 x2,1 x2,2 x2,3 x2,4

1 x3,1 x3,2 x3,3 x3,4

.

.

.

1 xn,1 xn,2 xn,3 xn,4

where n is N − 1. We can represent the prices of the house from the examples in the training data as a

N × 1 matrix, which we call Y :

y0

y1

.

.

.

yn

Similarly, we can represent the weights for each attributes as a (K + 1) × 1 matrix, which we call W:

w0

w1

.

.

.

wn

The goal of our machine learning algorithm is to learn this matrix from the training data. Now, in the

matrix notation, entire learning process can be represented by the following equation, where X, Y , W

are matrices as described above.

XW = Y

Using the training data, we can learn the weights using the below equation:

W = (XT X)

−1XT Y

4

where XT

is the transpose of the matrix X and (XT X)

−1

is the inverse of the matrix XT X.

Your main task in this part of the assignment is to implement a program to read the training data

and learn the weights for each of the attributes. You have to implement functions to multiply matrices,

transpose matrices, and compute the inverse of the matrix. You will use the learned weights to predict the

house prices for the examples in the test data set.

Want to learn more about One-shot Learning? The theory behind this learning is not important for the

purposes of this assignment. The algorithm you are implementing is known as linear regression with least

square error as the error measure. The matrix (XT X)

−1XT

is also known as the pseudo-inverse of matrix

X. If you are curious, you can learn more about this algorithm at

4.1 Computering the Inverse using Gause-Jordan Elimination

To compute the weights above, your program has to compute the inverse of matrix. There are numerous

methods to compute the inverse of a matrix. We want you to implement a specific method for computing

the inverse of a matrix known as Gauss-Jordan elimination, which is described below. If you compute

inverse using any other method, you will risk losing all points for this part.

Inverse of a square matrix A is another square matrix B, since AB = BA = I, where I is the identity

matrix.

4.1.1 Gauss-Jordan Elimination for computing inverses

Below, we give a sketch of Gauss-Jordan elimination method. Given a matrix A whose inverse needs to be

computed, you create a new matrix Aaug, which is called the augmented matrix of A, by concatenating

identity matrix with A as shown below.

Let’s say matrix A, whose inverse you want to compute is shown below:

1 2 4

1 6 7

1 3 2

The augmented matrix Aaug os A is:

1 2 4 1 0 0

1 6 7 0 1 0

1 3 2 0 0 1

The augmented matrix essentially has the original matrix and the identity matrix. Next, we perform row

operations on the augmented matrix so that the original matrix part of the augmented matrix turns into

an identity matrix.

The valid row operations to compute the inverse (for this assignment) are:

• You can divide the entire row by a constant

• You can subtract a row by another row

• You can subtract a row by another row multiplied by a constant

However, you are not allowed to swap the rows. In the more complex Gauss-Jordan elimination method,

you are allowed to swap the rows. For simplicity, we do not allow you to swap the rows.

5

Let’s see this method with the above augmented matrix Aaug. Again, our goal is to transform A part of

Aaug into an identity matrix. Since Aaug[1][0] 6= 0, we will subtract the first row from the second row to

make Aaug[1][0] = 0. Hence, we perform the operation R1 = R1 − R0 where R1 and R0 represent the

second and first row of the augmented matrix:

1 2 4 1 0 0

0 4 3 −1 1 0

1 3 2 0 0 1

Now, we want to make Aaug[1][1] = 1. Hence, we perform the operation R1 = R1/4. The augmented

matrix after this operation is:

1 2 4 1 0 0

0 1 3

4 −

1

4

1

4

0

1 3 2 0 0 1

Next, we want to make Aaug[2][0] = 0. Hence, we perform the operation R2 = R2 − R0. The augmented

matrix after this operation is:

1 2 4 1 0 0

0 1 3

4 −

1

4

1

4

0

0 1 −2 −1 0 1

Next, we want to make Aaug[2][1] = 0. Hence, we perform the operation R2 = R2 − R1. The resulting

matrix is:

1 2 4 1 0 0

0 1 3

4 −

1

4

1

4

0

0 0 11

4 −

3

4 −

1

4

1

Now, we want to make Aaug[2][2] = 1. Hence, we perform the operation R3 = R3 × − 4

11 :

1 2 4 1 0 0

0 1 3

4 −

1

4

1

4

0

0 0 1 3

11

1

11 −

4

11

So far we have moved down the matrix and zero’ed the lower triangle matrix to 0’s and the diagonals of

the original matrix A to 1’s. Now, we will move up the matrix and transform the upper triangle portion of

the original matrix A to 0.

As a first step of this phase, we want to make Aaug[1][2] = 0. Hence, we perform the operation R1 =

R1 −

3

4 × R2 :

1 2 4 1 0 0

0 1 0 −

5

11

2

11

3

11

0 0 1 3

11

1

11 −

4

11

Next, we want to make Aaug[0][2] = 0. Hence, we perform the operation R0 = R0 − 4 × R2:

1 2 0 1

11 −

4

11

16

11

0 1 0 −

5

11

2

11

3

11

0 0 1 3

11

1

11 −

4

11

Next, we want to make Aaug[0][1] = 0. Hence, we perform the operation R0 = R0 − 2 × R1:

1 0 0 9

11 −

8

11

10

11

0 1 0 −

5

11

2

11

3

11

0 0 1 3

11

1

11 −

4

11

6

At this time, the A part of the augmented matrix is an identity matrix. The inverse of A matrix is the right

half of the augmented matrix, which originally was the identity matrix:

9

11 −

8

11

10

11

−

5

11

2

11

3

11

3

11

1

11 −

4

11

Your program must implement how to compute the inverse of a matrix using Gaussian-Jordan elmination

in order to perform one-shot learning.

4.2 Input Format

4.2.1 Training Data

The first line in the training file will be an integer that provides the number of attributes (K) in the training

set. The second line in the training data file will be an integer (N) providing the number of training

examples in the training data set. The next N lines represent the N training examples. Each line for the

example will be a list of comma-separated double precision floating point values. The first double precision

value in the line represents the price of the house. The remaining K double precision values represent the

values for the attributes of the house.

4

7

21900.000000 ,3.000000 ,1.000000 ,1180.000000 ,1955.000000

538000.000000 ,3.000000 ,2.250000 ,2570.000000 ,1951.000000

180000.000000 ,2.000000 ,1.000000 ,770.000000 ,1933.000000

604000.000000 ,4.000000 ,3.000000 ,1960.000000 ,1965.000000

510000.000000 ,3.000000 ,2.000000 ,1680.000000 ,1987.000000

1230000.000000 ,4.000000 ,4.500000 ,5420.000000 ,2001.000000

257500.000000 ,3.000000 ,2.250000 ,1715.000000 ,1995.000000

train1.txt

In the example above, there are 4 attributes and 7 training data examples. Each example has the price of

the house followed by the values for the attributes. To illustrate, consider the training example below:

221900.000000, 3.000000, 1.000000, 1180.000000, 1955.000000

The price of the house is 221900.000000. The first attribute has value 3.000000, the second attribute has

value 1.000000, the third attribute has value 1180.000000, and the fourth attribute has value 1955.000000.

4.2.2 Test Data File

The first line in the training file will be an integer (M) that provides the number of test data points in the

file. Each line will have K attributes. The value of K is defined in the training data file. Your goal is predict

the price of house for each line in the test data file. The next M lines represent the M test points for which

you have to predict the price of the house. Each line will be a list of comma-separated double precision

floating point values. There will be K double precision values that represent the values for the attributes

of the house.

2

3.000000 ,2.500000 ,3560.000000 ,1965.000000

2.000000 ,1.000000 ,1160.000000 ,1942.000000

test1.txt

7

It indicates that you have to predict the price of the house using your training data for 2 houses. The

attributes of each house is listed in the subsequent lines.

4.3 Output Format

Your program for this part will be executed as follows:

$./fourth <train-data-file-name> <test-data-file-name>

Your program should print the price of the house for each line in the test data file. Your program should

not produce any additional output. If the price of the house is a fractional value, then your program should

round it to the nearest integer, which you can accomplish with the following printf statement:

printf(“%0.0lf\n”, value);

where value is the price of the house and its type is double in C.

Your program should predict the price of the entry in the test data file by substituting the attributes and

the weights (learned from the training data set) using (EQ1).

A sample output of the execution when you execute your program is shown below:

Command Line

$ ./fourth train.txt test1.txt

737861

203060

Submission

Submission Folder Structure

All files must be included in the pa1 folder. The pa1 directory in your tar file must contain 9 subdirectories, one each for each of the parts. The name of the directories should be named first through fourth (in

lower case). Each directory should contain a c source file, a header file (if you use it) and a Makefile. For

example, the subdirectory first will contain first.c first.h (if you create one) and Makefile (the names are

case sensitive).

pa1

first

first.c

first.h (if used)

Makefile

second

second.c

second.h (if used)

Makefile

third

third.c

third.h (if used)

Makefile

fourth

fourth.c

fourth.h (if used)

Makefile

## 发表评论