In this assignment, you will be building a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 dataset. Please read this handout in its entirety before beginning the assignment.
Please submit the Conceptual Questions on Gradescope under hw3-cnn conceptual. You must type your submissions and upload a PDF. We recommend using LaTeX.
Getting the Stencil
Work off of the stencil code provided, but do not change the stencil except where specified. Doing so may cause incompatibility with the autograder. Don’t change any method signatures!
This assignment requires NumPy and Matplotlib. You should already have this from HW1. Also, this assignment requires TensorFlow 2.6 and tensorflow-datasets (this will make loading the data much easier). We’ve put all the environment creation commands into a script for you so you can just run ./conda_create.sh or ./conda_create_m1.sh depending on whether you’re using an Intel Mac/Windows or an M1/M2. Make sure you activate this new environment (DL3) afterwards by running conda activate dl3.
You are going to be building a CNN for CIFAR10.
- CIFAR10: 10 classes (airplane, automobile, bird, cat, deer, frog, horse, ship, and truck)
The assignment has seven parts (we suggest doing them in this order):
- Conceptual Questions: Business as usual.
- Data Augmentation and creating your model: Start with a basic data augmentation pipeline. Then, build the model.
- Conv2D: Implement the conv2d function for your CNN (this is mostly just a wrapper around tf.nn.convolution).
- Batch Normalization: Implement the batch normalization function for your CNN (the manual version of tf.nn.batch_normalization).
- Dropout: Implement the dropout function (the manual version of tf.nn.dropout).
- Manual Conv2D: Implement the manual version of the conv2d function for your CNN.
- Tweaking your model: Experiment with different data augmentations and model architectures.
You should include a brief README with your model’s accuracy and any known bugs!
Lab 3 includes many explanations about how a Model class is structured, what variables are, and how things work in TensorFlow. If you come to hours with questions about TensorFlow-related material that is covered in the lab, we will direct you to the lab.
Below is a brief outline of the functions given to you and the functions you need to implement (there are more detailed comments in the code itself). We expect you to fill in some of the missing gaps (review lecture slides, labs, previous homeworks, Keras documentation, etc).
The HW3 CNN notebook that is provided is meant to be a way for you to incrementally test functions as you write them. To test major functionality, you can also run python3 assignment.py with various command-line arguments as described below.
Step 1. Getting the data (GIVEN)
- The get_data function in assignment.py pulls in the CIFAR10 dataset and returns X0 (training images), X1 (training labels), Y0 (testing images), and Y1 (testing labels).
Step 2. Data Augmentation and creating your model
- In get_default_CNN_model, you will be creating a data augmentation pipeline and making use of a CustomSequential model.
- [TODO 1] Add data augmentation layers (resizing, flips, rotations, scaling, etc.). Experiment with these layers after implementing your model to see how they can improve your accuracy.
- [TODO 2] Write the model by taking advantage of CustomSequential.
- You must use Conv2D_manual as the first layer (see stencil comment). For any convolution layers, you MUST instantiate them as Conv2D and not tf.keras.layers.Conv2D. Similarly, use BatchNormalization and Dropout (not tf.keras.layers…). This is to allow for easy swapping of your implementation with the Keras implementation.
- Aside from convolution, batch normalization, and dropout, feel free to use any tf.keras.layers you’d like (tf.keras.layers.Dense, tf.keras.layers.MaxPool2D, etc.)
- A model that achieves a good accuracy may look as follows:
- A few activated convolution layers, each followed by batch norm.
- A flattening of features followed by some dense layers.
- This model is completely open-ended though, so feel free to customize it in any way you want! Play around with your model’s hyperparameters and intermediate latent sizes.
- Make sure your model performs decently (even if it doesn’t achieve the accuracy cutoff yet) because for the rest of the assignment, you’ll slowly be swapping the Keras convolution, batch normalization, and dropout layers with your implementations.
- [TODO 3] Compile your model using an appropriate optimizer, loss, and metrics. Refer to Lab 3 or the TensorFlow documentation for possible values for these parameters and other parameters you can set.
- [TODO 4] Choose the number of epochs and batch size you will use to train your model.
- Once you’ve filled out get_default_CNN_model with your model architecture, you can go ahead and test it!
- Within assignment.py, your model gets run by calling run_task. Notice that run_task has two main parameters: task and subtask (these are described in more detail in the run_task docstring).
- Task can be set to 1, 2, or 3 and indicates whether to train the model with Keras layers, your implementation of some/all of the layers, or your manual implementation of convolution.
- Subtask can be set to 1, 2, or 3 to replace the Keras convolution layer, Keras batch normalization layer, or Keras Dropout layer with your implementation, respectively.
- Once you’ve finished TODOs 1-4 above, you can test your model using python3 assignment.py or using the appropriate cell in the HW3 CNN notebook.
- Do NOT move on to later steps until you have an accuracy of at least 55% (note that the final requirement is 60% for 1470 and 65% for 2470). Otherwise, when you begin swapping Keras layers out with your implementations from layers_keras.py, it will be impossible to tell whether the model architecture is the problem or whether your implementation is actually incorrect.
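As a rough illustration of the kind of architecture described above, here is a sketch built entirely from standard tf.keras layers. This is an assumption-laden example, not the stencil's code: in the actual assignment you must use CustomSequential and your own Conv2D, BatchNormalization, and Dropout classes in place of the tf.keras.layers versions shown here, and all layer sizes and hyperparameters below are arbitrary illustrative choices.

```python
import tensorflow as tf

# Hypothetical architecture sketch only. In the real assignment, swap in the
# stencil's CustomSequential and your custom Conv2D / BatchNormalization /
# Dropout layers for the tf.keras.layers versions used here.
def build_example_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),  # CIFAR-10 image shape
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),                 # flatten features for dense layers
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(10, activation="softmax"),  # 10 CIFAR-10 classes
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The pattern mirrors the outline above: activated convolution layers each followed by batch norm, then a flatten and a few dense layers.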
Step 3/4/5. Building up layers_keras.py
Step 3. Creating your own convolution (Conv2D)
- In the Conv2D class, you will be writing the build and call methods. Take a look at Lecture 8 if you’re confused about how convolution works.
- [TODO 1] Inside build, initialize your kernel weights and bias vector
- [TODO 2/3] Inside call, perform convolution using tf.nn.convolution and apply bias and activation if necessary. Then, apply regularizers and call self.add_loss (for more information on how to add losses, see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer).
- Please test the model you implemented in Step 2, with the Keras convolution layer replaced by your convolution layer, to ensure that your implementation is correct.
- You can test your model with just the Keras convolution layer replaced using python3 assignment.py --task 2 --subtask 1 or using the appropriate cell in the HW3 CNN notebook.
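Conceptually, the call method here is a thin wrapper around tf.nn.convolution. The following is a minimal sketch of that forward pass, assuming NHWC inputs; the function and argument names are illustrative, not the stencil's, and it omits the regularizer/add_loss step described above:

```python
import tensorflow as tf

# Minimal sketch of a Conv2D-style forward pass built on tf.nn.convolution.
# Names and shapes are illustrative assumptions, not the stencil's API.
def conv2d_call(inputs, kernel, bias, strides=(1, 1), padding="SAME"):
    # kernel shape: [filter_height, filter_width, in_channels, out_channels]
    out = tf.nn.convolution(inputs, kernel, strides=list(strides), padding=padding)
    out = tf.nn.bias_add(out, bias)  # one bias per output channel
    return tf.nn.relu(out)           # apply the activation, if the layer has one
```

Note that the bias and activation are applied after the convolution itself, exactly as TODO 2/3 describes.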
Step 4. Creating your own batch normalization (BatchNormalization)
- In the BatchNormalization class, you will be writing the build and call methods. Take a look at Lecture 11 if you’re confused about the purpose of batch normalization or the appropriate formulas to use.
- [TODO 1] Inside build, initialize any variables you need to perform batch normalization. Make sure to pay attention to what self.axis is.
- [TODO 2] After initializing variables, you need to make them into tf.Variables and specify whether they should be trainable or not. tf.Variables are used by Keras to store and update model parameters.
- [TODO 3/4] Inside call, implement the forward pass for batch normalization (for a different explanation, take a look here). This will normalize, scale, and shift the inputs. Then, apply regularizers and call self.add_loss.
- Please test the model you implemented in Step 2, with the Keras batch normalization layer replaced by your batch normalization layer, to ensure that your implementation is correct.
- You can test your model with just the Keras batch normalization layer replaced using python3 assignment.py --task 2 --subtask 2 or using the appropriate cell in the HW3 CNN notebook.
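For intuition, the training-time forward pass can be sketched in NumPy as below: normalize over the batch, then scale and shift. This is a simplified sketch, not the assignment's implementation; your version will use tf operations, store gamma/beta as tf.Variables, respect self.axis, and track moving statistics for inference. The names here are assumptions.

```python
import numpy as np

# NumPy sketch of the batch-norm forward pass at training time:
# normalize each feature over the batch, then scale by gamma and shift by beta.
def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift
```

With gamma = 1 and beta = 0, the output simply has (approximately) zero mean and unit variance per feature.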
Step 5. Creating your own dropout (Dropout)
- In the Dropout class, you will be writing the call method.
- [TODO] Inside call, you will set a random subset of the elements in your inputs to 0. The fraction of elements set to 0 corresponds to the dropout rate. Since we’ve dropped elements from our inputs, the expected value of the inputs has changed. To maintain the same expected value (and to prevent activations from shrinking over many dropout layers), you must scale the remaining inputs up by dividing them by (1 - rate); after dropout, the expected sum of the inputs should match the sum before dropout.
- Please test the model you implemented in Step 2, with the Keras dropout layer replaced by your dropout layer, to ensure that your implementation is correct.
- You can test your model with just the Keras dropout layer replaced using python3 assignment.py --task 2 --subtask 3 or using the appropriate cell in the HW3 CNN notebook.
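The behavior described above ("inverted" dropout) can be sketched in NumPy as follows. This is an illustrative sketch, not the stencil's code; your implementation will use tf random ops and should behave like tf.nn.dropout, where rate is the fraction of elements dropped.

```python
import numpy as np

# NumPy sketch of inverted dropout: zero out elements with probability `rate`,
# then divide the survivors by (1 - rate) so the expected value is unchanged.
def dropout_forward(x, rate, rng=np.random.default_rng(0)):
    keep_mask = rng.random(x.shape) >= rate        # True for elements we keep
    return np.where(keep_mask, x / (1.0 - rate), 0.0)
```

With rate = 0.5, roughly half the elements become 0 and the survivors are doubled, so the mean of the output stays close to the mean of the input.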
Once you’ve implemented everything in layers_keras.py, you can test your model with all 3 of the Keras layers replaced with your corresponding implementations using python3 assignment.py --task 2 or using the appropriate cell in the HW3 CNN notebook.
Step 6. Manual Convolution
- The Conv2D class in layers_manual.py subclasses layers_keras.Conv2D, so it has access to all the same instance variables and functions. You’re just going to be overriding the call function so that it uses your manual implementation instead.
- [TODO] Convolve the filter with the inputs.
- This will involve padding the input if SAME padding is set.
- Then, calculate the correct output dimensions and perform the convolution operation on each input image to generate a corresponding output image.
- Note that you will want to iterate over the entire padded height and width, stopping when a filter can no longer fit over the remainder of the padded input. For convolution with multiple input channels, perform the convolution per input channel and sum those dot products together.
- PLEASE CONVERT YOUR RESULT TO A TENSOR USING tf.convert_to_tensor(your_array, dtype=tf.float32). Issues have occurred in the past without this step.
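For intuition, here is a NumPy sketch of the loop structure for a stride-1, VALID-padding convolution over a batch of multi-channel inputs. It is a simplified sketch, not the assignment's solution: your version must also handle SAME padding and strides, and must convert the result back with tf.convert_to_tensor. All names here are assumptions.

```python
import numpy as np

# NumPy sketch of manual 2D convolution (stride 1, VALID padding).
# inputs:  [batch, in_h, in_w, in_channels]
# filters: [f_h, f_w, in_channels, out_channels]
def manual_conv2d(inputs, filters):
    n, in_h, in_w, in_c = inputs.shape
    f_h, f_w, _, out_c = filters.shape
    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1  # output dimensions
    out = np.zeros((n, out_h, out_w, out_c))
    for b in range(n):
        for i in range(out_h):        # stop where the filter no longer fits
            for j in range(out_w):
                window = inputs[b, i:i + f_h, j:j + f_w, :]  # [f_h, f_w, in_c]
                for k in range(out_c):
                    # per-channel elementwise products, summed into one value
                    out[b, i, j, k] = np.sum(window * filters[:, :, :, k])
    return out
```

Summing over the channel axis inside the window is exactly the "perform the convolution per input channel and sum those dot products" step described above.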
Step 7. Tweaking your model!
Now that you have a completely working model, let’s try to optimize it! There are two main ways you can improve the accuracy of your model:
- Change your data augmentation pipeline by adding or removing scaling, cropping, rotation, reflection, etc.
- Change your model architecture by modifying the number of convolution and dense layers, parameters for these layers (kernel size, stride size, padding, etc.), the type of activation functions, and more!
CS1470 Requirements
- Complete and Submit HW3 Conceptual
- Implement CNN model in conv_model.py and custom implementations in layers_keras.py and layers_manual.py
- Get test accuracy >=60% on CIFAR10
- The “HW3 CNN” notebook is just a guide for incremental testing.
- As a bonus (up to 10 points): Feel free to do the base requirements for the Exploration notebook.
- Credit will be given during manual grading.
- The autograder will display 0/10 credit when a PDF export is found; the bonus will be re-assessed manually (so you can still achieve full marks).
CS2470 Requirements
- Same as 1470 except:
- Get testing accuracy >=65% on CIFAR10.
- You will need to add additional layers and explore hyperparameter options.
- These may include regularization, other weight initialization schemes, aggregation layers, dropout, rate scheduling, or skip connections.
- Finish 2470 components for Exploration notebook and conceptual questions.
- Feel free to use regular tf.keras.layers components however you’d like!
- As a bonus (up to 10 points): Feel free to explore a non-obvious domain that isn’t related to “regular images” (e.g. sound, some tabular structures, genomics, etc). Exploration must be motivated with reason (so there should be locality to exploit).
- Credit will be given during manual grading.
- The autograder will display 15/25 credit when a PDF export is found; the bonus will be re-assessed manually (so you can still achieve full marks).
Grading and Autograder Compatibility
Conceptual: You will be primarily graded on correctness, thoughtfulness, and clarity.
Code: You will be primarily graded on functionality. Your model should have an accuracy that is greater than or equal to the threshold on the testing data. Although you will not be graded on code style, your final submission should not have excessive print statements.
IMPORTANT! Please use vectorized operations when possible and limit the number of for loops you use. The autograder will automatically time out after 10 minutes.
Notebook: [Required for 2470] The exploration notebook will be graded manually and should be submitted as both a PDF and a .ipynb file. Feel free to use the “Notebooks to Latex PDFs.ipynb” notebook!
You should submit the assignment via Gradescope under the corresponding project assignment by zipping up your hw folder (the path on Gradescope MUST be hw3/code/filename.py) or through GitHub (recommended). To submit through GitHub, commit and push all changes to your repository to GitHub. You can do this by running the following three commands (this is a good resource for learning more about them):
- git add file1 file2 file3 (or -A)
- git commit -m "commit message"
- git push
After committing and pushing your changes to your repo (which you can check online if you’re unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.
- Please make sure all your files are in hw3/code. Otherwise, the autograder will fail!
- 2470 STUDENTS: Add a blank file named 2470student in the hw3/code directory!
The file should have no extension and is used as a flag to grade 2470-specific requirements. If you don’t do this, YOU WILL LOSE POINTS!