In the final project, you will use reinforcement learning techniques to train an agent to play the
classic arcade game Bomberman.
In our setting, the game is played by four agents in discrete time steps. Your agent can move around,
drop bombs or stand still. Crates can be cleared by well-placed bombs and will sometimes drop
coins, which you can collect for points. The deciding factor for the final score, however, is to blow
up opposing agents, and to avoid getting blown up yourself. To keep things simple, special items
and power-ups are not available in this version of the game.
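The discrete action space described above can be sketched as follows. Note that the action names and the callback signature here are assumptions for illustration only; the exact interface is defined by the provided framework:

```python
import random

# Hypothetical sketch of the game's discrete action space: movement in
# four directions, dropping a bomb, or standing still. The exact action
# names and the callback signature depend on the provided framework.
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "BOMB", "WAIT"]

def act(game_state: dict) -> str:
    """Choose one discrete action per time step.

    A trained agent would replace this uniform-random policy with its
    learnt one; `game_state` is assumed to describe the current board.
    """
    return random.choice(ACTIONS)
```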
After the project deadline, we will hold a tournament between all trained agents with real prizes.
Tournament performance will be a factor in the final grade as well, although the quality of your
approach (as described in the report and code) will carry more weight.
You should develop two (or more) different models and submit your best performing model to the
tournament. Both models should be described in the report. However, do not split labor such that
every team member works on their separate model – we attach great importance to real teamwork!
This is an open project. You may use any combination of models and techniques you have learnt
during the semester. You are also free to extend your knowledge to new domains and use other/more
advanced models and techniques, although we request that at least one of your two models is limited
to techniques from the lecture. There is only one rule: Your solution must involve machine
learning or it will be rejected.
Your first submission (agent code) for the final project should consist of the following:
• A directory containing the agent code of your best performing model, including all trained
parameters. This is the subdirectory of agent_code that you are developing your agent in
(see details below).
Zip all files into a single archive final-project-agent-code.zip and upload this file to your
assigned tutor on MaMPF before the given deadline.
Note: Each team creates only a single upload, and all team members must join it as described in
the MaMPF documentation at https://mampf.blog/zettelabgaben-fur-studierende/.
Important: Make sure that your MaMPF name is the same as your name on Muesli.
We now identify submissions purely from the MaMPF name. If we are unable to
identify your submission you will not receive points for the exercise!
Your second submission (report) for the final project should consist of the following:
• A PDF report of your approach. The report should comprise at least 10 pages per team
member (and please not much more) and – for legal reasons – indicate after headings who is
responsible for each subsection.
• The URL of a public repository containing your entire code base (including all models you
developed), which must be mentioned in the report. Please do not upload your report to
this repository.
You are allowed to switch groups between the homework assignments and this final project. Please
organize yourself, e.g. via #homework-team-finding, and announce your teams (including a catchy
team name for the tournament) at https://tinyurl.com/fml-final-project-teams.
To share resources fairly between competing agents, you are not allowed to use multiprocessing in
your final agent. However, using multiprocessing (or anything else you can think of) to speed up
training is perfectly fine.
Learning about and using neural networks is permitted. Take into account that neural networks
will be executed on the CPU during official games, although you may use GPUs during training.
Also be aware that your agent may not be competitive if network training takes longer than
expected and has not converged by the deadline.
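Since GPUs may be used for training but official games run on the CPU, a minimal PyTorch sketch of saving trained weights and loading them explicitly onto the CPU might look like this. The network architecture and file name are placeholders, not prescribed by the framework:

```python
import torch
import torch.nn as nn

# Hypothetical tiny policy network; the real input and output sizes
# depend on your chosen state features and the game's action space.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 6))

# After (possibly GPU-based) training, save only the weights ...
torch.save(model.state_dict(), "my-agent-model.pt")

# ... and load them explicitly onto the CPU, since official games run
# without a GPU.
state_dict = torch.load("my-agent-model.pt", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout/batch-norm updates
```

Loading with `map_location` ensures the agent still works when the checkpoint was written on a GPU machine but is later deserialized on a CPU-only one.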
Playing around with the provided framework is allowed, and may in fact be necessary to facilitate
fast training of your agent. However, be aware that the final agent code will simply be plugged into
our original version of the framework – any changes you made to other parts will not be present.
Discussions about the final project with other teams are very much encouraged. You can share your
trained agents (without training code) on #final-project-beat-my-agent and download other teams’
agents to test your approach. Just keep in mind that in the tournament you will compete for prizes,
so you may want to keep your best ideas to yourself 🙂
The lectures from February 11th to 18th explained the necessary basics of reinforcement learning.
You may also watch last year’s recordings on MaMPF. In addition, the internet provides plenty of
material on all aspects of RL. Study some of it to learn more and get inspiration. The use of free
software libraries (e.g. pytorch) is allowed, if they can be installed from an official repository like
pip or conda, but you must not copy-paste any existing solution in whole or in part. Plagiarism
will lead to a “failed” grade.
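As a concrete starting point, one of the basic techniques from the lecture, tabular Q-learning, could be sketched as follows. The state representation, reward values, and hyperparameters here are placeholders chosen for illustration:

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch. A real agent would derive states
# from the game's board representation; hyperparameters are placeholders.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "BOMB", "WAIT"]
Q = defaultdict(float)  # maps (state, action) -> estimated return

def choose_action(state):
    """Epsilon-greedy exploration over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One temporal-difference update of the Q-table."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

During training, the agent would call `choose_action` each step and `q_update` after observing the reward and successor state.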
You can find the framework for this project on Github: