Thesis projects · RSTLess

If you want to request a thesis project in collaboration with the RSTLess group, fill in this Google Form!

Bachelor Thesis


Master Thesis

Chunking in Retrieval Augmented Generation

Advisor: Fabrizio Silvestri

Co-Advisor: Giovanni Trappolini

Difficulty: ●○○

NLP/RAG

Retrieval Augmented Generation (RAG) systems enhance LLM performance by providing external knowledge, but their effectiveness depends heavily on how documents are chunked. Existing work suggests that chunk size significantly affects retrieval quality, yet it lacks a systematic analysis of how chunk size influences the distracting effect of irrelevant information. This research aims to quantify how different chunking strategies affect the LLM's ability to focus on relevant information while ignoring distracting content in the retrieved context.
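To make the trade-off concrete, here is a minimal sketch (not part of the project specification). It splits a document into fixed-size word windows and then assumes an idealised retriever that returns the single chunk covering the most evidence. For that chunk it reports how much of the evidence it contains and what share of it is distractor text. The function names (`chunk_spans`, `oracle_chunk_stats`) and the 1000-word document with a 50-word evidence span are hypothetical. A real study would measure the LLM's answer quality rather than word counts.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of word positions


def chunk_spans(n_words: int, chunk_size: int, overlap: int = 0) -> List[Span]:
    """Fixed-size word windows; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    spans: List[Span] = []
    start = 0
    while start < n_words:
        end = min(start + chunk_size, n_words)
        spans.append((start, end))
        if end == n_words:
            break
        start += chunk_size - overlap
    return spans


def overlap_len(a: Span, b: Span) -> int:
    """Number of word positions shared by two half-open spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def oracle_chunk_stats(n_words: int, evidence: Span, chunk_size: int,
                       overlap: int = 0) -> Tuple[float, float]:
    """Hypothetical idealised retriever: take the chunk covering most of the
    evidence and report (evidence recall, distractor share) for that chunk."""
    best = max(chunk_spans(n_words, chunk_size, overlap),
               key=lambda span: overlap_len(span, evidence))
    hit = overlap_len(best, evidence)
    recall = hit / (evidence[1] - evidence[0])
    distractor_share = (best[1] - best[0] - hit) / (best[1] - best[0])
    return recall, distractor_share


if __name__ == "__main__":
    # Hypothetical document: 1000 words, answer evidence at words 400..449.
    for size in (32, 64, 256):
        print(size, oracle_chunk_stats(1000, (400, 450), size))
```

In this toy setting, smaller chunks keep distractors out but can split the evidence. At 32 words the chunk has recall 0.64 and a distractor share of 0.0. Larger chunks capture all of the evidence at the cost of mostly irrelevant text: at 256 words, recall is 1.0 and the distractor share is about 0.80.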

BackTracing

Advisor: Fabrizio Silvestri

Difficulty: ●○○

NLP/RAG

In many real-world applications (e.g., customer service, education, legal research), understanding the underlying reason behind a question is crucial. This thesis proposes a novel approach to a largely unexplored task in information retrieval: tracing a user's query back to its possible cause or underlying motivation, a process we refer to as "backtracing." Unlike traditional IR tasks, which focus on retrieving documents that answer or expand upon a query, backtracing inverts this perspective by asking: "Why was this query asked?"
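The sketch below illustrates how the task differs from ordinary retrieval. It treats backtracing as ranking the segments of a source (e.g., sentences of a lecture transcript) by how likely each one is to have prompted the query. It assumes a plain bag-of-words cosine similarity as the scoring function. That scorer is only a simple baseline chosen here for illustration, not the method the thesis proposes. The names `backtrace` and `bag_of_words` and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def backtrace(query: str, segments: List[str], k: int = 1) -> List[Tuple[int, float]]:
    """Return the k source segments (index, score) ranked as most likely causes
    of the query. Hypothetical baseline: lexical cosine stands in for a learned scorer."""
    q = bag_of_words(query)
    scored = [(i, cosine(q, bag_of_words(seg))) for i, seg in enumerate(segments)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    lecture = [
        "Today we introduce eigenvalues of a matrix.",
        "An eigenvector keeps its direction under the linear map.",
        "For homework, read chapter five.",
    ]
    question = "Why does the eigenvector keep its direction?"
    print(backtrace(question, lecture))  # top segment is index 1
```

Lexical overlap misses simple paraphrases ("keep" vs. "keeps" above). The cause of a question may also share few words with the question itself, which is part of what makes the task challenging.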

Chain of Thought and Reasoning as Internal Information Retrieval

Advisor: Fabrizio Silvestri

Difficulty: ●●○

NLP/RAG

Chain of Thought (CoT) prompting enhances LLM reasoning by encouraging step-by-step thinking, while Retrieval Augmented Generation (RAG) improves performance by accessing external knowledge. This research posits that CoT and information retrieval are fundamentally similar processes: both involve accessing and using relevant "memories." CoT retrieves reasoning patterns learned during training, while RAG retrieves explicit knowledge from external sources. Understanding this parallel could lead to unified frameworks that optimize internal reasoning retrieval and external knowledge retrieval simultaneously.

