Task and Mark distribution:
For this coursework, you are required to select a real-world problem of your choice and apply various
machine learning algorithms and methods to solve the selected problem.
Your first task comprises the following:
- Select a real-world classification problem
- Select suitable dataset(s) for the chosen problem
- Select more than one appropriate Machine Learning algorithm for implementing the models
- Evaluate the created models on the selected data
- Tune the models to achieve better performance
- If you are attempting this assessment as a resit, you must choose a different problem and dataset(s) from those you used for your previous attempt(s)
- You are advised to choose a dataset that allows you to demonstrate your ability to perform data analysis and pre-processing techniques such as handling missing, categorical, non-numeric and duplicate values; outliers; scaling; etc. The selected dataset must contain at least 1200 samples after pre-processing. The dataset must not be one of the built-in scikit-learn datasets or a synthetic dataset.
- If you are not sure where to start, you may find a list of suggested resources with numerous problems and datasets in the “Useful Resources” section on Aula.
- You can use existing algorithms or a combination of some of them, or even come up with a new algorithm of your own.
- The required programming language is Python 3 (others are not accepted).
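The steps listed above (pre-processing, applying more than one algorithm, evaluating, and tuning) could be organised roughly as in the following minimal sketch. It is only an illustration under stated assumptions: it assumes pandas and scikit-learn are installed, that your own real-world data is already loaded into a DataFrame, and that accuracy is a suitable metric for your problem (it often is not for imbalanced classes). The function names `make_preprocessor` and `compare_models`, the two chosen classifiers, and the parameter grids are hypothetical choices, not requirements of this coursework.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_preprocessor(numeric_cols, categorical_cols):
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


def compare_models(df, target, numeric_cols, categorical_cols, cv=5, seed=0):
    """Tune two candidate classifiers and report cross-validated and held-out accuracy."""
    df = df.drop_duplicates()  # remove exact duplicate rows before splitting
    X, y = df[numeric_cols + categorical_cols], df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Assumed candidates and grids; substitute whatever suits your problem.
    candidates = {
        "logistic_regression": (
            LogisticRegression(max_iter=1000),
            {"model__C": [0.1, 1.0, 10.0]},
        ),
        "random_forest": (
            RandomForestClassifier(random_state=seed),
            {"model__n_estimators": [100, 200], "model__max_depth": [None, 10]},
        ),
    }
    results = {}
    for name, (model, grid) in candidates.items():
        # Pre-processing lives inside the pipeline so it is fitted on training folds only.
        pipe = Pipeline([
            ("prep", make_preprocessor(numeric_cols, categorical_cols)),
            ("model", model),
        ])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)
        results[name] = {
            "best_params": search.best_params_,
            "cv_accuracy": search.best_score_,
            "test_accuracy": search.score(X_test, y_test),
        }
    return results
```

A call might look like `compare_models(df, "label", ["age", "income"], ["region"])`, where the column names are placeholders for those in your own dataset. Keeping the pre-processing inside the pipeline is one way to avoid leaking information from the test split into the fitted transformers.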
For the second task, you are required to submit a demonstration video recording the execution and performance of your implementation
- The maximum length of the demonstration video is 5 minutes
- You are NOT required to walk through every line of the source code, but it is important to demonstrate the execution of all stages and the corresponding outputs of the source code
- A voice-over should be used to describe what is happening and to explain some of the reasoning throughout the video
- Ensure that all text, tables, graphs, etc. are of an appropriate size to view, free from noise and not blurred. Also, ensure that the audio is clear.
- You are required to use either Jupyter Notebook on a browser or Visual Studio Code when recording the demonstration video.
Write a report (maximum 2000 words) based on the technical work. This should include:
- Analysing and pre-processing the data
- Applying different algorithms and methods to build learning models
- Making appropriate adjustments to improve the models’ performances
- Evaluating the models
- Comparing your approaches and results with those of existing pieces of work on the same problem
- Your report should focus on how algorithms/methods/techniques are actually applied, or on developments that are novel and specific to your work, rather than on how they work theoretically
- Your report should include appropriate outcomes such as data analysis diagrams, outcomes from the models, code snippets, etc. to support your text.
- Include all your source code as text in Appendix B at the end of the report. Do not use screenshots of your code in Appendix B. Your code must be presented as text (see coursework template).
- A coursework template is provided as a guide in the “Assessment” section on Aula
- The 2000-word limit is the absolute maximum word count for the whole report. Reports that exceed the word limit by more than 10% will have their mark reduced by 10%; e.g., a mark of 60% will be reduced by 6 percentage points to 54%. The word limit includes quotations, but excludes the URLs (GitHub, datasets, OneDrive), bibliography, reference list, and appendices (see coursework template).
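The word-limit penalty described above can be checked with a few lines of arithmetic. The sketch below is only an illustration of the rule as stated: the function name `penalised_mark` and its parameters are hypothetical, and `word_count` is assumed to already exclude the URLs, bibliography, reference list, and appendices.

```python
def penalised_mark(mark, word_count, limit=2000, tolerance_pct=10, penalty_pct=10):
    """Return the mark after the over-length penalty, if it applies.

    The penalty applies only when the report is MORE than `tolerance_pct`
    percent over `limit`; integer percentages avoid floating-point edge cases.
    """
    if word_count * 100 > limit * (100 + tolerance_pct):
        return mark * (100 - penalty_pct) / 100
    return mark


print(penalised_mark(60, 2300))  # over 2200 words, so the mark is reduced
print(penalised_mark(60, 2200))  # exactly 10% over, so no reduction
```

With the default values, a 2300-word report marked at 60 comes out at 54, matching the example in the brief.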
The submission of your coursework must be in the form of ONE Word file through the indicated Aula submission link. Formats other than MS Word will not be accepted.
The submission of the implementation and demonstration video must be in the form of:
- The URL of a Coventry GitHub repository, OR
- The URL of a Coventry OneDrive shared folder
The URL must be included at the beginning of your report.
Examiners will not check for the required URLs in other places.
The shared folder or repository must be accessible by examiners, and should include the following:
- The URL to the selected dataset(s) in README or a separate file
- The dataset(s) that are used for your problem
- The source code with appropriate comments, and
- The demonstration video
- No other platform is accepted. Please ensure that it is COVENTRY GitHub or COVENTRY OneDrive
- The submitted source code must be in the form of a (Jupyter) notebook (.ipynb)
- You must ensure that you commit your work appropriately (with the corresponding outputs of all cells – if applicable – clearly present)
- Include only the final notebook for your work (i.e., remove all draft notebooks)
- The following naming convention must be used for your repository or shared folder:
For example, a student Liz Truss whose student ID is 12345678 would name their repository or shared folder as 12345678-LT-s1
- A failure to use this naming convention may delay the release of marks and feedback for your coursework.
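Before committing, it may help to confirm that the notebook was saved with its cell outputs present. The following is a minimal sketch using only the standard library; it assumes the notebook is stored in the usual Jupyter JSON layout (a top-level `cells` list whose code cells carry an `outputs` list), and the function name `code_cells_without_output` is hypothetical.

```python
import json


def code_cells_without_output(notebook_path):
    """Return the indices of non-empty code cells that have no stored output."""
    with open(notebook_path, encoding="utf-8") as fh:
        notebook = json.load(fh)
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        # "source" may be stored as a single string or as a list of lines.
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing
```

Some cells (for example, import statements or function definitions) legitimately produce no output, so a non-empty result is a prompt to look at those cells rather than proof of a problem.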