- Thesis by Nikos Andreou
- Supervisors: Prof. Nils Jansen, Dr. Marcel Neuhausen
Leveraging Multi-Stage Reasoning for Trend Detection and Cluster Identification
Artificial intelligence is driving transformative change across all domains of society and industry. For innovation-driven companies, early identification of relevant market and research trends is essential for maintaining a competitive advantage. However, trend detection is often a manual and time-consuming process. This master's thesis, conducted in cooperation with KOSTAL Automobil Elektrik GmbH & Co. KG, addresses this challenge by designing and implementing a fully automated pipeline capable of processing news articles and scientific publications from various online sources. The project consists of four interconnected modules. News and scientific content are retrieved using dynamic extraction methods. Redundant articles are identified through cosine similarity on text embeddings and merged to increase information density and reduce noise. An LLM, specifically GPT-4o, evaluates the collected content to generate signals relevant to KOSTAL. These signals are then clustered using affinity propagation to identify emerging trends. A key innovation is the use of the LLM across multiple stages of the pipeline, including content evaluation, summarization, and signal scoring. For the clustering stage, seven algorithms (affinity propagation, agglomerative clustering, DBSCAN, HDBSCAN, KMeans, BIRCH, and OPTICS) are evaluated using the Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Score. These metrics provide empirical guidance for selecting suitable methods in dynamic, unlabeled datasets. The evaluation of each module, both individually and in combination, shows that the pipeline reliably detects trends over time. However, the LLM's signal ratings tend to be overly optimistic, and not all sources could be successfully scraped. While redundancy detection performed reliably overall, it occasionally grouped non-redundant items. Future work may explore alternative extraction methods and integrate retrieval-augmented generation (RAG) to enhance the LLM's contextual knowledge.
Additionally, feedback from domain experts is needed to validate the relevance of identified trends. Given the non-dynamic nature of affinity propagation, alternative clustering approaches may also be considered. Overall, the pipeline demonstrates the feasibility of automated trend detection and provides a solid foundation for supporting innovation departments, while human oversight remains essential for accurate interpretation and decision-making.
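
As an illustration of the redundancy detection step described in the abstract, the following is a minimal sketch of grouping articles by cosine similarity on their embeddings. It is not the thesis implementation: the function names `cosine_similarity` and `group_redundant`, the threshold of 0.9, and the use of union-find to form groups are all assumptions made for this example, and the embeddings are toy vectors rather than real text embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def group_redundant(embeddings, threshold=0.9):
    """Group article indices linked by similarity >= threshold.

    Hypothetical helper: the threshold and the union-find grouping are
    assumptions for this sketch, not values taken from the thesis.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(embeddings)), 2):
        if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: the first two vectors point in nearly the same direction.
toy_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(group_redundant(toy_embeddings))
```

Because union-find makes the grouping transitive, two articles can end up in the same group through a chain of intermediate matches even if they are not directly similar. In a sketch like this one, that is one way non-redundant items could be grouped together; whether the same mechanism applies to the thesis pipeline is not stated in the abstract.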