Volume 488 - International Symposium on Grids & Clouds (ISGC2025) - Artificial Intelligence (AI)
Leveraging Knowledge Graph-Enhanced RAG and LLMs for Historical Archival Analysis: A Case Study of State of Maryland's Dataset Collections
R.K. Gnanasekaran*, R. Marciano and C. Haley
*: corresponding author
Published on: October 20, 2025
Abstract
Integrating Artificial Intelligence into the digital humanities has created new opportunities for analyzing historical archives. Building upon established work with the Maryland State Archives' (MSA) Legacy of Slavery (LoS) collections, this research proposes an approach that combines Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) with Large Language Models (LLMs) to analyze three dataset collections: Certificates of Freedom, Domestic Traffic Advertisements, and Manumissions. These collections are historically rich and focus on uncovering the narratives of individuals who resisted enslavement in Maryland, USA. The project introduces an architecture that enhances traditional Generative AI RAG systems by incorporating prompt-engineered reasoning over a knowledge graph instead of relying on vector-based semantic similarity. Unlike conventional RAG approaches, which embed user queries and documents into a shared vector space for retrieval, this system uses structured Cypher queries generated via prompt templates to interact directly with a Neo4j-based knowledge graph. This design allows precise symbolic reasoning over richly interconnected historical data and enables nuanced natural language exploration without approximate embedding-based matching. The system employs a three-layer architecture: a knowledge graph layer that uses Neo4j to map relationships between entities across collections, a RAG layer augmented through prompt-driven Cypher generation and contextual retrieval, and an LLM layer that synthesizes natural language from grounded graph responses. This study builds upon two earlier iterations of ChatLoS, a simple RAG chatbot and an agentic CSV-based version, by structurally transforming the retrieval method to support cross-collection linkage and entity-aware responses. Rather than stacking redundant LLM layers, each iteration addresses specific limitations of its predecessors, the simple RAG chatbot and the CSV-agent version.
By eliminating the need for specialized database knowledge or an understanding of archival organization systems, the interface significantly improves accessibility, which is the main goal of the LoS project. Additionally, this prompt-engineered KG-RAG architecture advances AI-enabled scientific workflows by leveraging specialized prompt engineering patterns for cross-collection analysis and by preserving the interpretability and provenance of historical evidence. It enhances the trustworthiness and accuracy of insights by grounding responses in verified relationships rather than probabilistic approximations. While the system may at times surface structured results such as counts or connections, these outputs are semantically rich indicators of deeper historical narratives, enabling natural language interactions that democratize access to complex archives. To assess the utility and usability of these tools from a domain expert’s perspective, a qualitative user study was conducted with the Director of the LoS project. The study revealed key themes: current tools at MSA are rigid and siloed and require users to have detailed schema knowledge, whereas ChatLoS significantly lowers access barriers by supporting natural language queries and conversational refinement. The KG-RAG version was particularly praised for its ability to trace individuals across legal and commercial records and to enhance trust via explainable connections and citations. The study concludes by listing its limitations and proposing future participatory evaluations with descendant communities and public-facing users, to ensure that the design and deployment of AI tools for archival research are culturally responsive, ethically grounded, and historically contextualized.
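The three-layer flow described in the abstract (prompt-driven Cypher generation, retrieval from the graph, then LLM synthesis) can be illustrated with a minimal sketch. This is not the authors' implementation. The graph schema, the prompt wording, the `KGRagPipeline` class, and the stub functions are all hypothetical, and the LLM and the Neo4j query runner are passed in as plain callables so the sketch runs without either service. The sample row is made-up placeholder data, not taken from the collections.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical schema: labels and relationship names are illustrative only.
SCHEMA = """\
(:Person {name})
(:Record {collection, year})
(:Person)-[:APPEARS_IN]->(:Record)
"""

# Hypothetical prompt templates for the two LLM calls.
CYPHER_PROMPT = (
    "You translate questions into Cypher for this graph schema:\n"
    "{schema}\n"
    "Return only a read-only Cypher query.\n"
    "Question: {question}\n"
)
ANSWER_PROMPT = (
    "Answer the question using only these graph rows, and cite them.\n"
    "Question: {question}\n"
    "Rows: {rows}\n"
)

# Conservative guard: any of these keywords anywhere in the query rejects it.
WRITE_KEYWORDS = {"CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP"}

Row = Dict[str, object]


@dataclass
class KGRagPipeline:
    llm: Callable[[str], str]              # prompt text -> completion text
    run_cypher: Callable[[str], List[Row]]  # Cypher text -> result rows

    def generate_cypher(self, question: str) -> str:
        """RAG layer, step 1: ask the LLM for a Cypher query via a template."""
        prompt = CYPHER_PROMPT.format(schema=SCHEMA, question=question)
        query = self.llm(prompt).strip()
        words = set(re.findall(r"[A-Za-z]+", query.upper()))
        if words & WRITE_KEYWORDS:
            raise ValueError("refusing to run a query that is not read-only")
        return query

    def answer(self, question: str) -> str:
        """Full flow: generate Cypher, retrieve rows, synthesize an answer."""
        query = self.generate_cypher(question)
        rows = self.run_cypher(query)  # knowledge graph layer
        prompt = ANSWER_PROMPT.format(question=question, rows=rows)
        return self.llm(prompt)        # LLM synthesis layer


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM so the sketch runs offline."""
    if prompt.startswith("You translate"):
        return (
            "MATCH (p:Person)-[:APPEARS_IN]->(r:Record) "
            "WHERE p.name = 'Example Person' "
            "RETURN r.collection AS collection, r.year AS year"
        )
    return "Stub answer based on: " + prompt.split("Rows: ", 1)[1].strip()


def stub_graph(query: str) -> List[Row]:
    """Stand-in for a Neo4j session; returns made-up placeholder data."""
    return [{"collection": "Certificates of Freedom", "year": 1800}]


if __name__ == "__main__":
    pipeline = KGRagPipeline(llm=stub_llm, run_cypher=stub_graph)
    print(pipeline.answer("Where does Example Person appear?"))
```

In a real deployment the `run_cypher` callable would wrap a Neo4j driver session and `llm` would wrap a model client. Keeping both behind plain callables also makes it straightforward to log the generated Cypher alongside each answer, which is one way to preserve the provenance the abstract emphasizes.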
DOI: https://doi.org/10.22323/1.488.0002
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.