Theses

We are always happy to work with students who want to write their thesis at our chair. On this page you will find an overview of the thesis topics that are currently available, as well as a list of theses that have already been completed. If you have a topic of your own that fits our research profile, we would be happy to talk to you about a possible collaboration.

Open Theses

Robots are increasingly used in unstructured environments, such as homes and factories, where they are required to navigate reliably and efficiently. Among other tasks, mobile robots are expected to perform coverage tasks such as cleaning and inspection. Common metrics for coverage tasks are the time it takes to cover the area, the distance traveled, and the percentage of the area that has been covered. Current robots struggle to navigate particularly cluttered environments: they drive suboptimal trajectories to avoid obstacles and, in the worst case, get stuck because there is not enough space to reach the next goal, triggering recovery strategies to free themselves. This thesis aims to extend the coverage task with obstacle interaction, allowing the robot to push selected obstacles a few centimeters when they prevent it from cleaning efficiently with good coverage. Additionally, the implemented method should be integrated into a robotic platform and tested in a real-world scenario.

This thesis is a cooperation between AI-FM and Bosch. The student conducting this thesis is encouraged to take up a temporary internship at Bosch’s research department.
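
The coverage metrics named above are straightforward to compute from a logged run. The sketch below is a minimal illustration under assumed conventions (a fixed logging period, a grid map of known resolution, positions in metres); it is not the evaluation setup of the thesis.

    import numpy as np

    def coverage_metrics(positions, free_cells, cell_size=0.05, dt=0.1):
        """Compute total time, distance traveled, and percentage of area covered.

        positions  : (T, 2) array of logged robot positions in metres
        free_cells : set of (ix, iy) grid indices that should be covered
        cell_size  : edge length of one grid cell in metres (assumed map resolution)
        dt         : logging period in seconds (assumed)
        """
        positions = np.asarray(positions, dtype=float)
        # Total time: number of logged steps times the logging period.
        total_time = len(positions) * dt
        # Distance traveled: sum of step-to-step Euclidean displacements.
        distance = float(np.linalg.norm(np.diff(positions, axis=0), axis=1).sum())
        # A free cell counts as covered once the robot's position falls inside it.
        visited = {tuple(np.floor(p / cell_size).astype(int)) for p in positions}
        covered_pct = 100.0 * len(visited & set(free_cells)) / len(free_cells)
        return total_time, distance, covered_pct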

This project explores the problem of navigating a 2D maze with a blind robot. By “blind”, we mean that the robot cannot see walls, nor does it receive any feedback when it bumps into one. In the finite case, say an NxN grid, it is easy to program a robot to solve the maze:

  1. Enumerate all possible NxN mazes, where M_i is the i-th maze
  2. Execute the sequence of moves needed to solve M_0, let’s call this sequence w_0
  3. For maze M_{i+1}, determine where executing w_0, …, w_i leaves the robot in M_{i+1}, and append the word w_{i+1} that leads from that position to the goal in M_{i+1}.

It’s easy to see that executing every w_i in sequence results in reaching the goal in every NxN maze, no matter the layout. The infinite case (where the grid extends without bounds in all four directions) is more challenging. It is generally believed that there exists an infinite word that solves every such maze, but no such word has been found so far. For more details, see Solvability of Mazes by Blind Robots by Stefan David and Marius Tiba. The goal of this thesis is to extend David and Tiba’s results to the stochastic setting: do such maze-solving words exist when there is a random chance that a NO-OP is executed instead of the intended action?
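
For concreteness, the finite-case construction above can be sketched as follows. The maze representation (walls as pairs of adjacent cells), the fixed start cell, and taking the enumeration of all NxN mazes as a given input are illustrative assumptions, not part of the thesis setup.

    from collections import deque

    # Moves on an n x n grid. Illustrative maze representation: a maze is a set of
    # frozenset({cell_a, cell_b}) pairs marking walls between adjacent cells.
    MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

    def step(pos, move, maze, n):
        """A blocked move is a silent no-op: the blind robot gets no feedback."""
        (r, c), (dr, dc) = pos, MOVES[move]
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n and frozenset({(r, c), (nr, nc)}) not in maze:
            return (nr, nc)
        return pos

    def run(word, maze, start, n):
        """Position reached after executing a whole word in a given maze."""
        pos = start
        for move in word:
            pos = step(pos, move, maze, n)
        return pos

    def solving_word(maze, start, goal, prefix, n):
        """Word w_i reaching the goal in `maze`, given that w_0 ... w_{i-1} were already executed."""
        src = run(prefix, maze, start, n)   # where the prefix actually leaves the robot in this maze
        queue, seen = deque([(src, "")]), {src}
        while queue:                        # breadth-first search; the maze layout is known here
            pos, word = queue.popleft()
            if pos == goal:
                return word
            for move in MOVES:
                nxt = step(pos, move, maze, n)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + move))
        return ""                           # goal not reachable in this maze: nothing to append

    def universal_word(mazes, start, goal, n):
        """Concatenate w_0, w_1, ... so that the result reaches the goal in every enumerated maze."""
        word = ""
        for maze in mazes:                  # step 1: the enumeration of all n x n mazes is given as input
            word += solving_word(maze, start, goal, word, n)
        return word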

In standard reinforcement learning (RL), an agent learns an optimal policy by interacting with an environment and collecting experience over time. However, in many practical settings the agent cannot directly observe the true state of the environment. Instead, it relies on a perception model—such as a deep neural network—to infer the state from raw observations. These perception models are inherently imperfect and can produce uncertain or erroneous predictions due to factors such as sensor noise, adversarial perturbations, or limited and potentially corrupted training data.

This thesis aims to develop methods that enable an RL agent to learn a policy that performs well in the true underlying Markov Decision Process despite such perceptual uncertainty. The idea is to incorporate techniques from uncertainty quantification for deep neural networks into the RL framework, allowing the agent to estimate and account for the reliability of its perception when updating its policy. By guiding policy learning based on these uncertainty estimates, the learning process becomes more efficient and robust to perception errors, leading to improved performance.
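
One way such uncertainty estimates could enter the loop is sketched below: the perception network is sampled several times with dropout kept active (Monte-Carlo dropout), and the spread of its predictions is appended to the observation the policy receives. This is only one possible instantiation; the network interface and the use of classification outputs are assumptions for illustration.

    import torch

    def perceive_with_uncertainty(perception_net, image, n_samples=10):
        """Monte-Carlo-dropout state estimate: mean class probabilities plus their spread.

        perception_net : torch.nn.Module with dropout layers, mapping a (1, C, H, W) batch to class logits
        image          : a (C, H, W) tensor
        """
        perception_net.train()              # keep dropout active at inference time (MC dropout)
        with torch.no_grad():
            samples = torch.stack([
                torch.softmax(perception_net(image.unsqueeze(0)), dim=-1).squeeze(0)
                for _ in range(n_samples)
            ])
        return samples.mean(dim=0), samples.std(dim=0)

    def build_observation(perception_net, image):
        """Append the uncertainty estimate to the state estimate so the policy can condition on both."""
        mean, std = perceive_with_uncertainty(perception_net, image)
        return torch.cat([mean, std])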

Sim-to-real transfer remains a major challenge in Reinforcement Learning (RL) due to the reality gap between simulated and real environments. A wide range of methods has been proposed to address this challenge.

In parallel, Causal RL (CRL) is an active field of research that introduces knowledge of cause-and-effect relationships into the RL process. Causal knowledge tends to be robust under distributional shifts, thereby enabling RL agents to improve their generalization capabilities. Since the goals of sim-to-real and CRL align, combining these two is a promising research direction. Possible research questions include:

  • How can we identify which features are causally relevant for task or transfer success in RL?
  • Can causal knowledge be used to improve the efficiency and effectiveness of existing sim-to-real methods?
  • How do causal methods compare to traditional sim-to-real approaches?
  • Which causal assumptions help RL generalization in practice?
  • How can causal representations improve out-of-distribution generalization in RL?

Depending on the research question, this topic is suitable as either a bachelor’s or a master’s thesis.
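
For the comparison with traditional sim-to-real approaches, domain randomization is one widely used representative. A minimal sketch of the idea as a Gymnasium wrapper follows; the parameter names, their ranges, and the way they are written into the simulator are illustrative assumptions.

    import gymnasium as gym
    import numpy as np

    class DomainRandomizationWrapper(gym.Wrapper):
        """Resample selected simulator parameters at every episode reset.

        A causal variant of this idea would restrict randomization to parameters
        identified as causally relevant for transfer, instead of a hand-picked list.
        """

        def __init__(self, env, ranges):
            super().__init__(env)
            # Parameter names and ranges are purely illustrative, e.g.
            # {"gravity": (-10.5, -9.0), "friction": (0.5, 1.5)}.
            self.ranges = ranges

        def reset(self, **kwargs):
            for name, (low, high) in self.ranges.items():
                value = np.random.uniform(low, high)
                # How a parameter is written back depends on the simulator;
                # a plain attribute on the unwrapped environment is assumed here.
                setattr(self.env.unwrapped, name, value)
            return self.env.reset(**kwargs)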

With data protection regulations such as the GDPR, the ability of machine learning models to forget specific user data has become increasingly important. While machine unlearning has gained traction in supervised learning, particularly in image classification and LLMs, its application to deep reinforcement learning (DRL) remains largely unexplored.
Theses in this field investigate how to enable and motivate selective unlearning in DRL settings without requiring full retraining. The goal is to remove the influence of specific data from trained RL agents while preserving overall performance. Possible research questions include:

  • Can existing machine unlearning methods be applied to RL? Do they require adaptation?
  • At which level is unlearning most effective/relevant in RL: transitions, trajectories, or environments?
  • How does unlearning impact the stability and performance of RL agents across algorithms?

Depending on the research question, this topic is suitable as either a bachelor’s or a master’s thesis.
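
As a reference point for these questions, the exact-unlearning baseline simply discards the data to be forgotten and retrains on the remainder; unlearning methods for RL would aim to approximate its effect without the cost of full retraining. A minimal sketch, assuming transitions are tagged with the origin they came from and that the agent exposes an offline training routine (both assumptions for illustration):

    def exact_unlearning_baseline(transitions, forget_ids, make_agent, train_steps):
        """Naive reference point: drop the affected data and retrain from scratch.

        transitions : list of transition dicts, each tagged with an "origin_id"
                      identifying the user, trajectory, or environment it came from
        forget_ids  : set of identifiers whose influence should be removed
        make_agent  : factory returning a freshly initialized agent
        """
        retained = [t for t in transitions if t["origin_id"] not in forget_ids]
        agent = make_agent()
        # `train_offline` is a hypothetical interface standing in for whatever
        # off-policy or offline training routine the chosen algorithm provides.
        agent.train_offline(retained, steps=train_steps)
        return agent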

Completed Theses

Uncertainty-Aware Perception for Reinforcement Learning Agents

This work investigates post-hoc calibration as a means of providing reinforcement learning (RL) agents with lightweight uncertainty estimation that can shape decision-making behavior. Calibration techniques were applied to a YOLO11n object detection model. While global calibration techniques showed limited improvement in a 3D grid world, per-class methods reduced the Expected Calibration Error without degrading performance. The real-world utility of calibration was further evaluated on a crack-segmentation network, where calibration improved without a decrease in Dice score or IoU. Furthermore, a Proximal Policy Optimization agent was trained in a 3D Godot environment with calibrated confidence scores as uncertainty estimates extending the observation input. This agent outperformed the baseline policy with a…

Read More »

On Feature Space Reduction and State Space Discretization: A Pipeline for Vision-Based Reinforcement Learning

This thesis investigates feature space reduction and state space discretization in reinforcement learning for vision-based navigation tasks. The focus lies on comparing the performance of Slow Feature Analysis (SFA) and Convolutional Neural Networks (CNN) as feature extractors in the context of reinforcement learning. A Unity-based 3D environment is used to evaluate these two feature extraction methods, which reduce the egocentric input observations of an RL agent in both discrete and continuous action spaces. These experiments investigate whether SFA can be effectively applied in discrete action spaces using Deep Q-Networks (DQN) and in continuous action spaces with Proximal Policy Optimization (PPO). The results show that both SFA and CNNs can reach…

Read More »

Omnigym – A Versatile Framework for Reinforcement Learning Research

Reinforcement learning researchers face integration challenges when selecting tools to create experiments. The existing tooling for reinforcement learning is vast but poorly integrated across frameworks. Such gaps in the tooling can result in significant programming overhead. This thesis introduces Omnigym, a tool that bridges the gaps between prominent frameworks, including Storm, Stormvogel, Gymnasium, and MuJoCo. It brings the different tools closer together by offering functionality to use framework-specific structures in other frameworks. Researchers can load the definition of a Markov Decision Process (MDP) from a PRISM file, convert it into Stormvogel, and train agents on it in Gymnasium environments. Continuous environments can be discretized and converted to an MDP, which…

Read More »

Integrating Activation Steering into Safe Reinforcement Learning

Deep Reinforcement Learning (Deep RL) enables agents to operate in complex, high-dimensional environments, while Safe Reinforcement Learning (Safe RL) seeks to ensure policies maximize performance without violating safety constraints. This thesis investigates activation steering, a technique for influencing internal representations of trained agents, as a potential tool for promoting safer behavior in Deep RL. A review of activation steering in large language models highlights the challenge that safety-related concepts must already be represented in the agent’s activations. Empirical experiments with PPO agents, both with and without cost-based training signals, show that steering can affect reward and safety metrics, but observed changes do not reliably correspond to the intended safety concept…

Read More »

Safe Transfer of RL Agents under Shifting Dynamics via Adversarial Training

Safety is a crucial part of Reinforcement Learning (RL), yet standard RL approaches fail to ensure safe agent behavior during training, which is especially problematic in real-world, safety-critical scenarios. To address this challenge, one can train a safe agent in a controlled environment, where safety violations are allowed, and afterwards transfer it to a target environment in which safety violations may have catastrophic consequences. Prior work enables such transfer but does not account for shifting dynamics, which describe how the environment’s behavior changes over time. These real-time shifts model effects such as actuator failures, unexpected weather changes, and more. To address the problem of shifting dynamics during the transfer, there are…

Read More »

Learning to Predict Danger for Safe Transfer of Reinforcement Learning Agents

Reinforcement Learning (RL) is a promising area of machine learning in which an agent interacts with an environment and learns by receiving rewards and penalties. Despite significant advances over the past decade, applying RL in safety-critical domains such as autonomous driving or robotics remains difficult due to its inherent trial-and-error nature. A common solution is to train agents in simulation, where safety violations are permissible, and then transfer the learned policies to the real-world environment. An adaptation of this approach is Guided Safe Exploration, in which a guide is pre-trained in a controlled environment. This guide supports a learning student by, among other things, intervening during training in the target…

Read More »