PHAS0100: Research Computing with C++
Assignment 1: Constructing a Small Research Project
Part A: Linear Regression App (55 marks)
The first part of this coursework is to get you to set up the project. These instructions will guide you through the process.
1. Please read chapter 4 of “Hands on Machine Learning”, which covers linear regression and provides background to the equations used in this assignment. The PDF is on Moodle in the Coursework topic.
2. Data can come from many places, e.g. file, network, or randomly generated. So, we first define an interface of what we expect from our data provider. In class we learnt: “program to interfaces”, so:
a. Create a header file, containing a pure virtual interface class, with a method equivalent to:
virtual std::vector<std::pair<double, double> > GetData() = 0;
The data returned should be a vector of X, y pairs, where X is the observed feature value, and y is the target/label/predicted value.
b. Ensure the file is included in CMakeLists.txt, so that your build environment will know it exists.
c. Create a header and implementation file of a new concrete (i.e. not abstract) class that implements this interface. At first, just write an empty method with the signature above.
d. Create a unit test file that creates an instance of your new concrete class.
e. Check that you can compile and run the test.
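As a rough illustration of steps 2a and 2c, the interface and an empty concrete class might look like the sketch below. The names DataProviderI, RandomDataProvider and DataPoint are hypothetical; the assignment only fixes the GetData() signature.

```cpp
#include <utility>
#include <vector>

// Hypothetical alias: one (X, y) observation.
using DataPoint = std::pair<double, double>;

// Pure virtual interface (step 2a): anything that can supply (X, y) pairs.
class DataProviderI {
public:
  // Virtual destructor so derived objects can be deleted via a base pointer.
  virtual ~DataProviderI() = default;
  virtual std::vector<DataPoint> GetData() = 0;
};

// Concrete stub (step 2c): compiles and links, but returns no data yet.
class RandomDataProvider : public DataProviderI {
public:
  std::vector<DataPoint> GetData() override { return {}; }
};
```

A first unit test then only needs to construct RandomDataProvider and call GetData() through a DataProviderI reference.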
3. Now we implement the class to generate some data. The idea is that if we create some fake data, we know what the answer should be.
a. In the class that you created as part of 2c, implement a function that generates data that fits the linear model: $y = \theta_1 x + \theta_0 + \text{noise}$
b. In class we learnt to prefer the RAII and dependency injection patterns over setters/getters. Ensure that the parameters for your generator are passed in via the constructor.
c. Write a specific unit test that checks:
i. The number of returned items is correct
ii. The distribution of the returned items is correct.
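One possible shape for the generator in step 3 is sketched below. It assumes Gaussian noise, X drawn uniformly from [0, 10), and a seed parameter so that tests are reproducible; none of these choices is prescribed by the assignment, and the class name LinearDataGenerator is hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical generator for y = theta1 * x + theta0 + noise.
// In the project this class would derive from the interface of step 2a.
class LinearDataGenerator {
public:
  // All parameters are injected through the constructor (no setters).
  // noiseSigma must be greater than zero.
  LinearDataGenerator(double theta0, double theta1, double noiseSigma,
                      std::size_t count, unsigned seed)
      : m_Theta0(theta0), m_Theta1(theta1), m_NoiseSigma(noiseSigma),
        m_Count(count), m_Engine(seed) {}

  std::vector<std::pair<double, double>> GetData() {
    std::uniform_real_distribution<double> xDist(0.0, 10.0);    // assumed X range
    std::normal_distribution<double> noise(0.0, m_NoiseSigma);  // assumed noise model
    std::vector<std::pair<double, double>> data;
    data.reserve(m_Count);
    for (std::size_t i = 0; i < m_Count; ++i) {
      const double x = xDist(m_Engine);
      data.emplace_back(x, m_Theta1 * x + m_Theta0 + noise(m_Engine));
    }
    return data;
  }

private:
  double m_Theta0;
  double m_Theta1;
  double m_NoiseSigma;
  std::size_t m_Count;
  std::mt19937 m_Engine;
};
```

With a fixed seed, a unit test can check the item count exactly and check the distribution loosely, for example that the mean residual y - (theta1 * x + theta0) is close to zero.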
4. Similar to part 2, create a pure abstract interface for the solver, and a concrete implementation. Notice how we have separated the thing that generates or provides data from the thing that provides a solution.
a. Create a header file, containing a pure virtual method to fit data to a model. For this simple exercise, we know that the model only requires 2 parameters, $\theta_0$ and $\theta_1$, so the returned value can be a pair of doubles representing $\theta_0$ and $\theta_1$.
i.e. equivalent to:
std::pair<double, double> FitData(std::vector<std::pair<double, double> >)
b. Ensure the header file is included in your CMakeLists.txt
c. Create a header and implementation file of a new concrete (i.e. not abstract) class that implements this interface. At first, just write an empty method.
d. For simplicity, re-use the unit test file created in section 2d. (0 marks)
e. Check you can compile and run the test.
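The solver side of step 4 could be sketched in the same way. The names LinearModelSolverI and StubSolver are hypothetical, and passing the data by const reference (rather than by value, as in the signature above) is an assumption made here to avoid a copy.

```cpp
#include <utility>
#include <vector>

// Hypothetical alias: one (X, y) observation.
using DataPoint = std::pair<double, double>;

// Pure virtual interface (step 4a): anything that can fit the two-parameter model.
class LinearModelSolverI {
public:
  virtual ~LinearModelSolverI() = default;
  // Returns (theta0, theta1).
  virtual std::pair<double, double> FitData(const std::vector<DataPoint>& data) = 0;
};

// Concrete stub (step 4c): ignores the data and returns zeros for now.
class StubSolver : public LinearModelSolverI {
public:
  std::pair<double, double> FitData(const std::vector<DataPoint>&) override {
    return {0.0, 0.0};
  }
};
```

Because callers only see LinearModelSolverI, any data provider can later be paired with any solver without changing either class.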
5. Implement a first solver, using Normal Equations. See equation 4-4 in “Hands on Machine Learning” and the Notations section in Chapter 2. This would be placed in the concrete class created in part 4c.
Hints: This project includes Eigen. Copy the data from the STL containers into Eigen matrices, then solve. Don’t worry about performance.
[Implementation 5 marks, Unit tests 5 marks. Both at marker's discretion, 10 marks total]
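For reference, the Normal Equation in its standard form is written out below, followed by its expansion for the two-parameter model. The symbol $n$ (number of data points) and the layout of $\mathbf{X}$ with a leading column of ones are notational assumptions made here; check them against the book's Notations section.

```latex
\hat{\boldsymbol{\theta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},
\qquad
\hat{\boldsymbol{\theta}} = \begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}

\theta_1 = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
\theta_0 = \frac{\sum_i y_i - \theta_1 \sum_i x_i}{n}
```

The closed-form expressions for $\theta_0$ and $\theta_1$ give a handy independent check for unit tests of the matrix-based solver.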
6. Implement a second solver. As mentioned in class, the point here is to demonstrate how 2 methods can co-exist in a project; the project should be able to run both of them.
a. Create a new solver using Gradient Descent.
b. Ensure the parameters are all adjustable, by whoever is using the class (see Dependency Injection covered in lecture 4).
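A minimal sketch of step 6 follows, assuming batch gradient descent on the mean squared error with a fixed learning rate and a fixed number of iterations. The class name and the choice of injected parameters are hypothetical; other stopping criteria are equally valid.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical batch gradient descent solver for y = theta1 * x + theta0.
// In the project this class would derive from the solver interface of step 4a.
class GradientDescentSolver {
public:
  // Learning rate, iteration count and starting point are all injected.
  GradientDescentSolver(double learningRate, std::size_t iterations,
                        std::pair<double, double> initialTheta)
      : m_LearningRate(learningRate), m_Iterations(iterations),
        m_InitialTheta(initialTheta) {}

  // Returns (theta0, theta1).
  std::pair<double, double> FitData(
      const std::vector<std::pair<double, double>>& data) const {
    double theta0 = m_InitialTheta.first;
    double theta1 = m_InitialTheta.second;
    if (data.empty()) {
      return {theta0, theta1};  // nothing to fit
    }
    const double m = static_cast<double>(data.size());
    for (std::size_t it = 0; it < m_Iterations; ++it) {
      double grad0 = 0.0;
      double grad1 = 0.0;
      for (const auto& p : data) {
        const double residual = theta0 + theta1 * p.first - p.second;
        grad0 += residual;            // d(MSE)/d(theta0), up to the factor 2/m
        grad1 += residual * p.first;  // d(MSE)/d(theta1), up to the factor 2/m
      }
      theta0 -= m_LearningRate * (2.0 / m) * grad0;
      theta1 -= m_LearningRate * (2.0 / m) * grad1;
    }
    return {theta0, theta1};
  }

private:
  double m_LearningRate;
  std::size_t m_Iterations;
  std::pair<double, double> m_InitialTheta;
};
```

The learning rate has to be small enough for the iteration to converge on the given data, which is one reason to leave it adjustable by the caller rather than hard-coding it.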
7. Now, in reality, you would be processing data produced by some scientific experiment. The idea here is to now create another concrete implementation of the interface defined in section 2a, where instead of randomly generated data, data is read in from a text file.
a. Create a new header and cpp file, of a concrete class that implements your interface containing the GetData() method.
b. Implement the class, using STL functions to read data from a plain text file. Assume 2 values per line, representing X and y, separated by a space.
c. Write unit tests to ensure you can read TestData1.txt and TestData2.txt (provided).
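The file-based provider in step 7 could be sketched as follows, assuming the file name is injected via the constructor and that an unreadable file is reported with an exception. The class name FileDataProvider and the error-handling policy are assumptions, not requirements of the assignment.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical file-backed provider: one "X y" pair per line, space separated.
// In the project this class would derive from the interface of step 2a.
class FileDataProvider {
public:
  explicit FileDataProvider(std::string fileName)
      : m_FileName(std::move(fileName)) {}

  std::vector<std::pair<double, double>> GetData() {
    std::ifstream input(m_FileName);
    if (!input) {
      throw std::runtime_error("Cannot open file: " + m_FileName);
    }
    std::vector<std::pair<double, double>> data;
    double x = 0.0;
    double y = 0.0;
    // Stream extraction skips whitespace, so this reads successive X, y values
    // until the end of the file or the first value that is not a number.
    while (input >> x >> y) {
      data.emplace_back(x, y);
    }
    return data;
  }

private:
  std::string m_FileName;
};
```

A unit test can then construct the provider with the path to TestData1.txt or TestData2.txt and check the number of pairs and a few known values.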