Seminar Details
2026-02-26 (10:45) : Applying Generative AI and NLP to Historical Texts
At BARB94
Organized by LINFO2399 - Industrial Seminar in Computer Science
Speaker :
Xavier Gillard (UCLouvain)
Abstract :
The presentation introduces the Arkey project, a long-term applied research collaboration between UCLouvain and the Belgian State Archives. It aims to improve how users (archivists, researchers and the public) interact with archival collections through computational methods. A central component is a comparative study of Named Entity Recognition on noisy XVth-XVIIth century texts, contrasting general-purpose Large Language Models with specialized, fine-tuned encoders such as XLM-RoBERTa. Using a mixed evaluation framework that combines standard NLP metrics with a human preference study, the project shows that expert users prioritize the factual accuracy of smaller specialized models over the fluent but sometimes hallucinatory outputs of LLMs.
The talk also presents “Ask Agatha,” an agentic retrieval-augmented generation (RAG) system developed for the national archives. We demonstrate the need to move beyond simple RAG pipelines toward a stateful, graph-based agent architecture capable of complex, multi-step tool interactions. We detail experiments such as cascade patterns for efficient tool use and the development of “digita,” a fine-tuned 8B model designed to emulate the archivists’ expert communication style. The presentation concludes by synthesizing lessons learned on model specialization, agentic architecture, and evaluation strategies, illustrating the solutions developed to meet the needs of a specialized expert domain.
