Use genetic programming to perform symbolic regression. Your goal is to find the symbolic algebraic expression of the form y=f(x) that best fits a set of 1000 (x,y) pairs. Assume only algebraic operators (+, −, ×, ÷), sine and cosine; as terminals, assume real constants (in the range ±10) and the variable x. The program can be written in any language of your choice.
Run your program on the dataset data.txt.
Use any representation, variation operators, and selection mechanism you like. You may reuse your code from assignment #1. This assignment is individual.
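As an illustration only, one possible representation is an expression tree. The sketch below assumes a hypothetical nested-tuple encoding, a protected division that returns 1.0 when the denominator is near zero, and mean absolute error as the error metric; none of these choices is required by the assignment.

```python
import math

# Hypothetical tuple-based expression tree (an assumption, not a required format):
#   ("x",)              the variable x
#   ("const", c)        a real constant
#   (op, left, right)   a binary operator: "+", "-", "*", "/"
#   (fn, child)         a unary function: "sin" or "cos"
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division: returns 1.0 near a zero denominator (one common convention).
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
UNARY = {"sin": math.sin, "cos": math.cos}


def evaluate(node, x):
    """Recursively evaluate the expression tree at the point x."""
    kind = node[0]
    if kind == "x":
        return x
    if kind == "const":
        return node[1]
    if kind in UNARY:
        return UNARY[kind](evaluate(node[1], x))
    return BINARY[kind](evaluate(node[1], x), evaluate(node[2], x))


def mean_absolute_error(tree, points):
    """Average of |f(x) - y| over all (x, y) pairs."""
    return sum(abs(evaluate(tree, x) - y) for x, y in points) / len(points)


# Example: the tree for f(x) = 2 * sin(x) + x, scored against synthetic data
# generated from the same function (stand-in for data.txt).
tree = ("+", ("*", ("const", 2.0), ("sin", ("x",))), ("x",))
points = [(i / 10.0, 2.0 * math.sin(i / 10.0) + i / 10.0) for i in range(100)]
error = mean_absolute_error(tree, points)
```

A tuple encoding keeps trees immutable, so variation operators would build new trees rather than edit existing ones in place; a class-based tree is an equally valid choice.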
A week ahead of the deadline, submit a single-page result from random search.
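A random search baseline can be as simple as sampling expressions and keeping the best one seen. The following is a minimal sketch under stated assumptions: the nested-tuple encoding, the `random_tree` generator, the 0.3 terminal probability, the depth limit of 4, and the protected division are all hypothetical choices, and the synthetic sine data stands in for data.txt.

```python
import math
import random


def random_tree(rng, depth):
    """Grow a random expression as a nested tuple (hypothetical encoding)."""
    if depth == 0 or rng.random() < 0.3:
        # Terminal: the variable x or a constant in the range [-10, 10].
        return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-10.0, 10.0))
    op = rng.choice(["+", "-", "*", "/", "sin", "cos"])
    if op in ("sin", "cos"):
        return (op, random_tree(rng, depth - 1))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(node, x):
    """Recursively evaluate the expression tree at the point x."""
    kind = node[0]
    if kind == "x":
        return x
    if kind == "const":
        return node[1]
    if kind == "sin":
        return math.sin(evaluate(node[1], x))
    if kind == "cos":
        return math.cos(evaluate(node[1], x))
    a, b = evaluate(node[1], x), evaluate(node[2], x)
    if kind == "+":
        return a + b
    if kind == "-":
        return a - b
    if kind == "*":
        return a * b
    return a / b if abs(b) > 1e-9 else 1.0  # protected division (an assumption)


def mae(tree, points):
    """Mean absolute error of the tree over all (x, y) pairs."""
    return sum(abs(evaluate(tree, x) - y) for x, y in points) / len(points)


def random_search(points, evaluations, seed=0, max_depth=4):
    """Sample independent random trees; record the best error after each evaluation."""
    rng = random.Random(seed)
    best_tree, best_err, history = None, float("inf"), []
    for _ in range(evaluations):
        tree = random_tree(rng, max_depth)
        err = mae(tree, points)
        if err < best_err:
            best_tree, best_err = tree, err
        history.append(best_err)  # best-so-far value, one entry per evaluation
    return best_tree, best_err, history


# Synthetic stand-in data; replace with the pairs loaded from the real dataset.
points = [(i / 10.0, math.sin(i / 10.0)) for i in range(50)]
best_tree, best_err, history = random_search(points, evaluations=200)
```

The `history` list holds the best error after each evaluation, which is the quantity a learning curve plots against the number of evaluations.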
Hand in a PDF report containing:
- Cover page with your name, UNI, course name and number, instructor, date submitted, grace hours used, and grace hours remaining.
- Results page with one figure showing the dataset, the best-fit curve, and the error metric (mean absolute error), as well as the analytical solution found
- Methods (2 pages max)
- A brief description of the representation, variation operators and selection process you used for your implementation.
- Analysis of what worked and what didn’t, and why
- Performance plot:
- Learning curves (fitness vs. evaluations) averaged over at least four runs, with error bars showing the standard error of the mean, for each approach tested
- Baseline curves for comparison (hillclimber, random)
- Any other diagnostic plot you choose
- Appendix: Listing of all the code you wrote (do not include code you did not author yourself, for example external libraries or auto-generated code)
- Use Courier font, size 8, single line spacing, highlight function declarations
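For the averaged learning curves requested above, the mean and the standard error of the mean can be computed pointwise across runs. This is a minimal sketch that assumes each run is stored as an equal-length list of best-so-far fitness values; the function name `mean_and_sem` and the sample numbers are illustrative only.

```python
import math


def mean_and_sem(runs):
    """Pointwise mean and standard error of the mean across runs.

    runs: list of equal-length lists, one best-so-far fitness curve per run.
    """
    n = len(runs)
    if n < 2:
        raise ValueError("need at least two runs to estimate the standard error")
    means, sems = [], []
    for values in zip(*runs):  # values at the same evaluation index across runs
        m = sum(values) / n
        var = sum((v - m) ** 2 for v in values) / (n - 1)  # sample variance
        means.append(m)
        sems.append(math.sqrt(var / n))  # SEM = sample std / sqrt(n)
    return means, sems


# Four made-up runs, three recorded points each (illustrative numbers only).
runs = [[4.0, 2.0, 1.0], [6.0, 2.0, 1.5], [5.0, 2.0, 0.5], [5.0, 2.0, 1.0]]
means, sems = mean_and_sem(runs)
```

The two returned lists can be passed to any plotting tool as the curve and its error bars.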
This assignment has a maximum of 100 points. Grading of this assignment is incremental: you collect points for reaching various goals, and you can choose which goals you want to meet. If you accumulate more than 100 points, only 100 points will be recorded as the final grade for this assignment. You can get a maximum of five points for each of the following tentative rubric items:
- Cover page includes all information
- General quality of the report (grammar, layout)
- Result page showing information requested
- Code included (8pt courier single spacing)
- Random search submitted a week ahead of deadline
- Dot plot for any one of the methods
- Diversity plot for any one of the methods
- Convergence plot for any one of the methods
- Plot showing accuracy vs complexity (of all evaluations)
- Description of representation used
- Description of random search
- Description of hill climber
- Description of EA variation operators used
- Description of EA selection methods used
- Analysis of performance. Did it work? Why or why not?
- Learning curve of random search
- Learning curve of hill climber
- Learning curve of GP
- Learning curve of some variation of the GP
- Learning curves clearly labeled, with labeled axes
- Learning curves have error bars
- Overall correctness of the result
- Overall efficiency of the algorithm (accuracy versus number of evaluations)
- Simpler problem(s) tested for debugging
- Automatically draw tree representing best solution
- Show a video in which every frame shows the data points and the best function found so far (include a link to the online video in the PDF, along with a frame from the video)