Explainable Deepfake Detection

With advances in deep learning and increasing computing power, deepfakes are becoming ever more realistic. They are often hard to distinguish from real data and can be generated for different media such as video, audio, or text. Given their potential for harm, for example in disinformation campaigns, online harassment, or fraud, deepfake detection is becoming an increasingly important task. State-of-the-art detection tools typically output only a confidence score and give no information about the features on which that score is based. Such an isolated score is of limited help to a human reviewer; highlighting the regions that drove the decision would support human media assessment.

The thesis aims to investigate interpretability strategies for deep learning models in order to provide feedback about a model’s decision when detecting deepfakes. First, existing deepfake detection methods are to be reviewed and implemented. In a second step, the trained models are used to build a recognizer that returns saliency maps for the model’s decision, giving human reviewers feedback on which parts of the input influenced it.
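
To illustrate what a saliency output could look like, here is a minimal sketch of occlusion-based saliency, one possible interpretability strategy among several. It is not the method prescribed for the thesis. The detector `toy_fake_score` is a hypothetical stand-in for a trained model, and the names `occlusion_saliency`, `patch`, and `fill` are assumptions made for this example only.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def toy_fake_score(image: Image) -> float:
    """Hypothetical stand-in for a trained detector (assumption):
    the 'fake' score is the mean brightness of the top-left 2x2 region."""
    region = [image[r][c] for r in range(2) for c in range(2)]
    return sum(region) / len(region)


def occlusion_saliency(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Image:
    """Mask one patch at a time and record how much the score drops.
    A large drop means the patch mattered for the decision."""
    base = score_fn(image)
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, keep the input intact
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


# 4x4 toy image: bright top-left corner, dim elsewhere.
image = [[1.0 if r < 2 and c < 2 else 0.2 for c in range(4)] for r in range(4)]
saliency = occlusion_saliency(image, toy_fake_score)
for row in saliency:
    print(row)
```

In this toy setup only the top-left patch receives a non-zero saliency value, because it is the only region the stand-in detector looks at. With a real detector, `score_fn` would wrap the trained model's forward pass, and gradient-based methods would be an alternative to occlusion.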

Required Knowledge

  • Python programming skills
  • Knowledge of TensorFlow/PyTorch or neural networks is a plus, but not required