Please submit your project report electronically via the myCourses assignment dropbox.
The submission should include a single Jupyter notebook. All images and programs should be inserted in the Jupyter notebook, not submitted separately (except for video files, if you are not able to embed these in the notebook).
Use Python code to implement all operations. You can use OpenCV and NumPy library functions for all parts of the assignment, or you can write your own.
Students are expected to write their own code and assignments. (Academic integrity guidelines can be found at https://www.mcgill.ca/students/srr/academicrights/integrity).
Project reports received late will be penalized by 10% per day up to 5 days (after which the submission will not be graded).
This is a group project: each group member must have responsibility for one part of the project, e.g. one student creates the training/test set, another defines and trains the network, and another does testing and validation. The project report must indicate the contributions of each group member. All group members will get the same mark.
You will make use of the “Stanford Dogs” dataset, a subset of ImageNet consisting of images of dogs, each classified as one of 120 different breeds: http://vision.stanford.edu/aditya86/ImageNetDogs/ (the dataset files will also be posted on myCourses).
You can follow the tutorial on training YOLOv7 on a custom dataset at
This blog post describes the process when using “Gradient Notebooks”. Gradient Notebooks are an alternative to Google Colab and allow you to run Python code in a web browser.
https://www.paperspace.com/gradient/notebooks. You can use either Gradient Notebooks or Google Colab for the project. Or you can run it on your own home computer if you wish. Colab will probably give faster training than the free tier on Gradient.
- Create the training, validation and test sets.
a. First select 10 dog breed categories from the full Stanford Dogs dataset of 120 breeds, based on your group number G. The “Images” folder of the dataset contains 120 folders, of which you will use 10. Select folders (G+13N)%120, for N = 0,1,…,9 (% is the modulo operation, e.g. 131%120 = 11). For example, for group 37, select folders (37,50,63,76,89,102,115,8,21,34). Folder 0 is the first folder in the annotations directory (when listed in alphanumeric order). You can replace some of these with your own choice of dog breeds if you wish. We want each group to train their network on a different set of dog breeds.
b. In the “Annotations” folder, select the 10 annotation folders corresponding to your 10 selected dog breeds. The annotation files have bounding box information for each instance of the dog in the image (most, if not all, of the images contain just a single dog instance).
c. The annotation files are in what is known as the VOC format. To do the project you will need to convert these to the format required by YOLO. The tutorial has some information about the difference between VOC and YOLO format annotations and shows how to do the conversion. Following the tutorial, visualize some of the annotated images to make sure the conversion was done correctly and the annotations make sense.
d. Divide your custom dataset into Training/Validation/Test sets by splitting the dataset in the ratio 80%/10%/10%.
e. Note that the images in the Stanford Dogs dataset are not square. Most images are roughly 500×330 pixels, but some are smaller. YOLO will resize the input images to be square, with dimension equal to the img-size parameter. So set this parameter to 500.
f. Create a second version of the dataset by augmenting your first training set with a horizontally flipped copy of each image. Note that you will have to edit the annotation files to flip the bounding box values as well. You could do this manually with a text/image editor, but you should write a program to do this for you quickly. (Note: do the training set augmentation only after doing the split into the train/validate/test sets!)
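The folder selection rule in step a can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes that Python's default `sorted()` order matches the "alphanumeric order" of the folder listing, which is worth verifying against your own directory.

```python
def breed_folder_indices(group_number, count=10, step=13, total=120):
    """Return the 0-based folder indices (G + 13N) % 120 for N = 0..count-1."""
    return [(group_number + step * n) % total for n in range(count)]


def select_breed_folders(all_folders, group_number):
    """Pick folder names by index from an alphanumerically sorted listing.

    Assumption: sorted() reproduces the intended folder order.
    """
    ordered = sorted(all_folders)
    return [ordered[i] for i in breed_folder_indices(group_number)]


# Example with a hypothetical group number 5.
print(breed_folder_indices(5))
# -> [5, 18, 31, 44, 57, 70, 83, 96, 109, 2]
```

In practice `all_folders` would come from something like `os.listdir()` on the dataset directory.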
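For step c, the VOC to YOLO conversion amounts to turning corner coordinates in pixels into a normalized box centre, width and height. The sketch below is not the tutorial's code: the function names and the sample annotation are hypothetical, and it assumes each annotation is XML with `size/width`, `size/height` and one `bndbox` per `object`, and that the YOLO label line is `class x_center y_center width height`. Check both assumptions against the tutorial and your files.

```python
import xml.etree.ElementTree as ET


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert VOC corner pixels to normalized (x_center, y_center, w, h)."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h


def voc_xml_to_yolo_lines(xml_text, class_id):
    """Return one YOLO label line per <object> in a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        x_c, y_c, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical annotation, made up for illustration only.
SAMPLE = """<annotation>
  <size><width>500</width><height>400</height></size>
  <object><name>example</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""

print(voc_xml_to_yolo_lines(SAMPLE, class_id=0))
# -> ['0 0.400000 0.375000 0.400000 0.500000']
```

The `class_id` is an index from 0 to 9 into your own list of 10 breeds, so it has to be assigned per breed folder rather than read from the annotation file.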
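The 80%/10%/10% split in step d can be done by shuffling the list of image names once and slicing it. This is one possible sketch, not a required method; `split_dataset` and its parameters are hypothetical names, and a fixed seed is used only so that the split is reproducible.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Example with 100 hypothetical image names.
names = [f"img_{i:03d}" for i in range(100)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
# -> 80 10 10
```

Splitting each breed separately with the same function would keep the class proportions similar across the three sets.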
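For the augmentation in step f, flipping the annotations is simple once the labels are in YOLO format: with normalized coordinates, a horizontal flip only changes the box centre from x to 1 - x, while the y centre, width and height stay the same. The sketch below assumes the label layout `class x_center y_center width height`; the function names are hypothetical. The image itself can be mirrored with OpenCV's `cv2.flip(image, 1)`.

```python
def flip_yolo_line(line):
    """Mirror one YOLO label line horizontally (x_center becomes 1 - x_center)."""
    parts = line.split()
    class_id = parts[0]
    x_c, y_c, w, h = (float(v) for v in parts[1:5])
    return f"{class_id} {1.0 - x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def flip_yolo_labels(text):
    """Mirror every non-empty label line in the contents of a label file."""
    return "\n".join(flip_yolo_line(line)
                     for line in text.splitlines() if line.strip())


print(flip_yolo_line("0 0.400000 0.375000 0.400000 0.500000"))
# -> 0 0.600000 0.375000 0.400000 0.500000
```

Visualizing a few flipped images with their flipped boxes, as in step c, is a quick way to confirm that images and labels were mirrored consistently.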
- Implement and train the YOLO model
a. Initialize the network using a pretrained model such as yolov7.pt.
b. Set the various training and hyperparameter config options as suggested by the tutorial. You can try to change some of these to see if you can improve your network’s accuracy.
c. Use the standard network configuration: yolov7.yaml. Edit this file to change some of the settings as needed (e.g. you will need to change the number of classes, class names, and the paths to the image and annotation files).
d. Train the network using your custom dog breed dataset. Make two versions of the network, one trained using the un-augmented training set and one using the augmented training set.
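The settings mentioned in step c (number of classes, class names, dataset paths) can be generated rather than typed by hand, which helps when you maintain both an un-augmented and an augmented dataset. The sketch below assumes the key names `train`, `val`, `test`, `nc` and `names` that YOLO-style dataset YAML files commonly use; confirm the exact keys and the file they belong in against the tutorial. The function name, paths and breed names are hypothetical placeholders.

```python
def build_data_config(train_dir, val_dir, test_dir, class_names):
    """Return YAML text listing dataset paths, class count and class names.

    Assumption: class names contain no quote characters.
    """
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"test: {test_dir}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


# Hypothetical paths and class names, for illustration only.
config_text = build_data_config(
    "dogs/images/train", "dogs/images/val", "dogs/images/test",
    ["breed_a", "breed_b"],
)
print(config_text)
```

Writing `config_text` to one file per dataset version keeps the two training runs in step d from accidentally sharing paths.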
- Validate and test the trained network
a. As outlined in the tutorial, compute the mAP performance metric on the test set, and put the results in the project report. Also include some selected examples of the output detections on the test set.
b. Repeat for the network trained on the augmented training set. Does augmentation improve accuracy?
c. Select some of the other dog breed images from the Stanford Dogs dataset which were not part of your dataset. Run the network on these images. Do you get any detections of your 10 dog breeds in these images?
d. Run the network on images you find on the internet of dogs of your selected breeds.
e. Find a video with at least one of your breeds of dogs (e.g. do a video search on Google), and generate a video output of the network.
Instructions for inputting an mp4 video to YOLO are given in the tutorial.
You can try the code in https://www.kaggle.com/code/mistag/play-videoin-notebook/notebook to display a video in the Jupyter notebook. If this doesn’t work, remember to include both the source and processed video.mp4 files in your assignment submission (zip file).