Individual Thesis Ideas
You already have a thesis topic in mind?
Contact us with your idea: firstname.lastname@example.org
Resilient systems require guaranteed functionality even with an unreliable power supply. To ensure a consistent system state, systems need to detect supply outages in time and handle these accordingly.
This requirement is essential when systems are equipped with non-volatile main memory (NVRAM). With NVRAM, the memory’s content remains even if the power supply is interrupted. This allows systems with NVRAM to continue computations across system shutdowns without having to repeat them. Therefore, to ensure that the system can resume its execution after a power outage, the data has to be kept consistent.
Previous approaches to ensure data consistency utilize transactional systems or continuous checkpointing. Our research project NEON aims to utilize NVRAM’s inherent persistency to minimize the remaining volatile state (e.g., CPU registers). This allows for faster checkpointing, which in turn makes it possible to checkpoint only when the power supply drops. As a result, we reduce the overhead of maintaining a consistent system state.
To implement this, we build upon our concept of a Powerfail Interrupt (PFI). As part of this thesis, the PFI will be introduced into the Linux kernel as a new interrupt. Additionally, the interrupt service routines (ISRs) to initiate the checkpointing routine will be developed. The PFI will be issued by a newly introduced kernel module that detects power outages.
All theses are currently assigned.
Goal and Approach
Software mitigations for hardware bugs like Spectre and Meltdown often degrade the performance and energy efficiency of systems. Currently, this penalty has to be paid by all software running on a system, regardless of whether it processes sensitive data or not. Building on top of an existing Linux patch set that allows toggling mitigations at runtime, this thesis will build a framework that dynamically enables mitigations as they are required, reducing the impact of the mitigations while retaining security.
The framework will consist of two components: a system service and a client library. The client library will allow programmers to indicate which parts of their code process sensitive data and thus need to be protected by mitigations, signaling the current mitigation requirements to the system service. The system service will aggregate the information indicated by the running applications to select and enable the ideal set of mitigations for the current workload.
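The interplay between client library and system service can be illustrated with a minimal sketch. It is not the framework's actual interface: the names MitigationService, sensitive, and the mitigation labels are hypothetical, and the service is modelled as an in-process object rather than a separate daemon. The sketch only shows the idea of marking sensitive code regions and aggregating the resulting requirements.

```python
from collections import Counter
from contextlib import contextmanager


class MitigationService:
    """Hypothetical stand-in for the system service.

    It counts how many active code regions currently require each
    mitigation and reports the set that would have to be enabled.
    """

    def __init__(self):
        self._required = Counter()  # mitigation name -> active requests

    def acquire(self, mitigation):
        self._required[mitigation] += 1

    def release(self, mitigation):
        self._required[mitigation] -= 1
        if self._required[mitigation] <= 0:
            del self._required[mitigation]

    def enabled(self):
        """Mitigations that at least one client currently requires."""
        return set(self._required)


@contextmanager
def sensitive(service, *mitigations):
    """Hypothetical client-library call marking a sensitive code region."""
    for name in mitigations:
        service.acquire(name)
    try:
        yield
    finally:
        # Requirements are dropped even if the protected code raises.
        for name in mitigations:
            service.release(name)
```

A program would wrap only the code that handles secrets, for example `with sensitive(service, "spectre_v2"): ...`, so that no mitigation is requested outside such regions.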
Speculative side-channel attacks like Spectre affect a wide range of modern processors. The various mitigations introduced against them are often associated with overheads. These are usually measured solely in performance metrics. But given the climate crisis on our planet and the ongoing shift to ever more cloud-based services, the impact on energy consumption caused by such mitigations is becoming increasingly important. For this reason, this thesis aims to address the following research questions for all currently available Spectre variant 1 & 2 mitigations:
- How do the overheads differ between widely used processors from Intel and AMD?
- Which parts of the mitigations introduce the majority of observable overheads?
- How do performance and energy overheads relate to each other?
To answer these questions, two test systems, one based on an Intel processor and one on an AMD processor, are tested for both energy and performance overheads. The test procedure consists of microbenchmarks with simultaneous energy measurements, with the individual parts of each mitigation separated as far as possible.
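How the two kinds of overhead can be put side by side is sketched below. This is an assumption about the arithmetic, not the evaluation code of the thesis: the function names are hypothetical, and the overhead is taken to be the relative increase over a baseline run without the mitigation.

```python
def relative_overhead(baseline, mitigated):
    """Relative increase of a measurement over its baseline (0.2 means +20 %)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (mitigated - baseline) / baseline


def compare_overheads(baseline_run, mitigated_run):
    """Compute performance and energy overhead for one benchmark.

    Both arguments are (runtime_seconds, energy_joules) tuples, measured
    without and with the mitigation under test.
    """
    time_overhead = relative_overhead(baseline_run[0], mitigated_run[0])
    energy_overhead = relative_overhead(baseline_run[1], mitigated_run[1])
    return time_overhead, energy_overhead
```

If both values are close for a benchmark, runtime is a good proxy for energy; a noticeable gap indicates that the mitigation changes power draw and not only execution time.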
The results reveal that the tested AMD processor is less affected by Spectre-V1, though similarly affected by Spectre-V2 as the Intel processor. Depending on the benchmark, the individual mitigations show varying degrees of overhead. The illustrations in chapter four provide orientation to decide which mitigations are appropriate for a specific application, weighing overheads against security needs. Moreover, performance and energy overheads mostly correlate, but are not always completely congruent. This implies that additional testing of mitigations with regard to potential energy overheads can help to identify further optimization potential and to make informed, application-specific decisions on which mitigations to enable.
Modern CPUs utilize simultaneous multithreading (SMT) to increase computational performance. This feature reduces the idle time of physical CPU cores by allowing two threads, also called logical cores, to share a physical core. Implementations of SMT rely on microarchitectural buffers, which logical cores share. Various subtle ways were discovered to exploit this shared circuitry in order to break the confidentiality of processed data, namely the vulnerabilities known as microarchitectural data sampling (MDS). A complete mitigation of these exploits includes disabling SMT, which would in most cases decrease a system’s performance significantly and in turn increase energy consumption, making it infeasible most of the time. There is a tradeoff between a system’s security and efficiency across distinct scenarios, which differ in applied mitigations, workload and overall system configuration. Choosing a feasible tradeoff blindly among this variety is not trivial and therefore not advisable.
This choice should be based on a comprehensible investigation adapted to the specific scenario a systems administrator is facing. Therefore, this thesis presents a framework that executes such sets of scenarios, takes measurements, and allows for easy comparison of their impact on the energy efficiency of a system or application. The framework should provide guidance to a systems administrator.
The framework is used to examine micro- and macro-benchmarks, giving insight into the energy efficiency of different scheduling methods. This includes the use of the core scheduling (CS) feature, a partial mitigation for the MDS vulnerability, and the disabling of simultaneous multithreading (SMT). Furthermore, the analysis of the data generated with the framework raises questions about the soundness of the data collection method, which are discussed and ultimately contribute to a revision of that method.
The findings show that, for non-idle systems, using the core scheduling (CS) feature is generally a significantly more energy-efficient way to reduce the attack surface for MDS vulnerabilities than disabling SMT, although it does not fully mitigate the vulnerability. While the analysis of this thesis does not cover the whole spectrum of mitigation variations, the implemented framework can be extended to cover them.
In the wake of climate change and other issues affecting the energy sector, the production and consumption of energy has become a prominent topic of discussion. Solutions for both conserving energy and preserving the stability of energy supply, while restructuring the energy production, are sought after. These changes also impact the area of computing, with the range of energy consuming hardware in this area spanning from small low-power devices like smartphones to large infrastructure like data centres. The demand for inspecting and optimising the energy efficiency of the employed computer systems and the software running on them steadily increases.
Due to the increasing demand for energy-efficient computing, there is a need to optimise for energy efficiency not only on the hardware level, but also on the software side. To be able to analyse and then optimise software in terms of energy efficiency, detailed measurements of its energy usage are required. This relies on the availability of hardware that provides the energy-demand data for the parts of the system used when the software is executed. To optimise software efficiently, it is key to be able to quickly identify the parts of the software that lead to high resource usage, according to the chosen metric. One approach to this is to measure the used resources per function of the software.
The problem this thesis tries to address is enabling such optimisations with regard to the energy usage of software, by creating visualisations that display the energy usage of each function of the software. The aim is to have new options for energy demand measurement and visualisation similar to those that were already available for other metrics, like the runtime CPU usage of software.
The scope of this thesis is to provide these measurements and visualisations for software on Linux systems. To accomplish this, methods to collect energy measurements on Linux are investigated. With the chosen method, tools are created to collect the required energy data and information about the software that is analysed. In addition, tools are created to process this data and to create visualisations from it. The chosen method for the energy measurements is querying event counters provided by the Running Average Power Limit (RAPL) CPU feature using the perf_events Linux kernel subsystem.
The created tools gather the current stack-frames of the profiled program at user-specified energy usage thresholds. The chosen visualisation type is the Flame Graph, as it is suited for the task of displaying the resource usage of functions. Therefore, part of this thesis was the development of scripts to transform the gathered stack-frames into the input format for Flame Graphs. With the transformed data, Flame Graphs displaying the energy demand per function can be created with an unmodified Flame Graph implementation.
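The transformation step can be sketched as follows. This is not the script developed in the thesis: the function name fold_stacks and the shape of the input samples are assumptions. The output follows the widely used "folded stacks" text format, in which each line holds a semicolon-separated call stack (outermost function first), a space, and a numeric weight. Here the weight is the energy attributed to the sample, in whatever integer unit the collector uses.

```python
from collections import defaultdict


def fold_stacks(samples):
    """Aggregate stack samples into folded-stack lines.

    samples: iterable of (frames, energy) pairs, where frames is a list
    of function names ordered from outermost to innermost call and
    energy is an integer weight for that sample (assumed unit).
    """
    totals = defaultdict(int)
    for frames, energy in samples:
        # Identical stacks are merged by summing their energy weights.
        totals[";".join(frames)] += energy
    return [f"{stack} {value}" for stack, value in sorted(totals.items())]
```

Writing the returned lines to a file, one per line, gives input that a standard Flame Graph generator can render, with frame widths proportional to energy instead of sample counts.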
With the results of this work it is possible to profile Linux software in terms of energy demand and create Flame Graphs displaying the energy demand per function. This allows quickly identifying which functions are good targets for optimising to reduce their energy usage.
Student: Henriette Hofmeier (handed in on 24.06.2022)
Providing secure systems, for example in computing centers, is an essential task of service providers. Vulnerabilities threatening secure execution are not only located in defective software but can also be found in the hardware itself. Hardware vulnerabilities, such as Spectre, Meltdown, and Microarchitectural Data Sampling, pose significant security concerns and can leave systems vulnerable to attacks extracting privileged information. As these vulnerabilities are often unfixable for already deployed hardware, software developers in general, and operating system developers in particular, go to great lengths to mitigate these attacks. These mitigations typically come with significant performance overheads, especially if speculative execution has to be restricted. Because processes differ in the data they handle and in their security concerns, they may require varying degrees of protection. Thus, mitigations may only be required for short time spans or individual processes, if at all. The mainline Linux kernel does not offer run-time control for all mitigations. For some exploits, multiple mitigations may be available that differ in the underlying protection mechanism and in their overhead on performance and efficiency.
This thesis presents dynamic reconfiguration of hardware-vulnerability mitigations in the Linux kernel. By adapting the mitigation configurations to the current workload, the system’s performance and energy efficiency can be optimized as the overhead of unnecessarily enabled mitigations is removed. Also, if multiple protection mechanisms are available for a specific vulnerability, a reconfiguration service determines the optimal configuration depending on workload characteristics, hardware, and system state. Dynamic control of mitigations that are only configurable at boot time in the mainline kernel is provided by a kernel module and kernel extensions. By utilizing code patching at run time, mitigations are omitted from the control flow if the respective mitigation is disabled. Combined, the service, kernel module, and kernel extensions provide the dynamic reconfiguration of hardware-vulnerability mitigations. The evaluation shows that using dynamic reconfigurations and adapting mitigation configurations to the system state has the potential to improve the system’s efficiency significantly. The evaluation also shows that the design and implementation of dynamic reconfiguration of hardware-vulnerability mitigations, as presented in this thesis, can be integrated into the Linux kernel with only minimal run-time overhead. Thus, this thesis provides the means for future research into the optimization of hardware-vulnerability mitigation reconfigurations.