From Text Piles to Knowledge Structures: Using Graphs to Improve RAG Reasoning

The limitations of current RAG systems can be overcome by structuring information into hierarchical knowledge graphs, enabling better reasoning and more reliable AI.

Conclusion

Today’s retrieval-augmented generation (RAG) artificial intelligence (AI) systems often fail at reasoning because they search through disorganised text and then attempt to assemble answers from disparate fragments of information. The solution is not bigger search windows; it is smarter organisation of the information made available to the AI.

By combining powerful long-context AI models with structured knowledge graphs, organisations can build AI systems that deliver more reliable responses. This paper details a new method that builds upon the RAG-TAG-GAG model by adding pre-defined reasoning summaries into the knowledge graph used by the AI. The new method, known as Hierarchical RAG (Hi-RAG), organises information by importance and enables the AI to work across different levels of detail, providing fact-based, more trustworthy responses.

Observations

The current popular approach to enterprise AI, RAG, is hitting a wall. While useful for simple fact-finding, its method of searching through flat documents is reaching the limits of accuracy and effectiveness. Even the most advanced AI models, capable of reading a million tokens at once, suffer from a ‘lost in the middle’ problem: they struggle to find information unless it sits at the very beginning or end of the context. This flaw makes them unreliable for complex tasks that require reasoning. The answer is not to feed the AI more unstructured text, or to chunk (cut up) the text in more nuanced ways, but to give it a structured map of the knowledge first.

In Enhancing Large Language Models: The Power Trio of Prompt Engineering, Retrieval-Augmented Generation, and Fine-Tuning, IBRS outlined a new approach that combines large language models with knowledge graphs. This creates a hybrid system, or long context RAG, where the knowledge graph provides a structured framework for the AI’s output. A knowledge graph automatically identifies key entities (like people, products, or concepts) and the relationships between them, creating an organised overlay on top of unstructured documents. When the AI generates an answer, the graph acts as a fact-checker, validating the connections it has made.

Say Hello to Hi-RAG: Going Further with Pre-Reasoned Graphs

Over the last year, an even more advanced architecture, Hi-RAG, has expanded on the original graph approach.

Hi-RAG addresses the core challenge of reasoning across different levels of complexity for AI generation. In essence, Hi-RAG builds a multi-layered knowledge graph index before any questions are asked. It starts at layer 0 with basic facts extracted from documents, then clusters related facts to create summary nodes at layer 1. These layer 1 summary nodes are, in effect, pre-reasoned chunks of information.

For example, layer 0 (basic facts) is populated by raw, granular data extracted directly from source documents. This could include technical specifications from engineering reports, specific clauses from legal contracts, individual transaction records from a financial system, or detailed methodology sections from scientific papers.
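
To make layer 0 concrete, the sketch below shows one minimal way such fact nodes might be represented in Python. The node fields, identifiers, and example facts are illustrative assumptions, not part of any specific Hi-RAG implementation.

```python
from dataclasses import dataclass

@dataclass
class FactNode:
    """A layer 0 node: one granular fact lifted directly from a source document."""
    node_id: str
    text: str        # the raw extracted statement, clause, or record
    source_doc: str  # provenance, so every answer remains traceable
    layer: int = 0

# Hypothetical layer 0 facts extracted from engineering reports.
layer0 = [
    FactNode("f1", "Bracket B-17 failed fatigue testing at 80% of rated load.", "eng_report_042.pdf"),
    FactNode("f2", "Weld seam W-3 showed micro-cracking after thermal cycling.", "eng_report_051.pdf"),
]
```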

From this foundation, layer 1 information is created by generating thematic summaries or harvesting information from real-world knowledge sources. For instance, dozens of layer 0 engineering reports could be used to create a layer 1 summary node titled “Common Stress-Bearing Flaws in Q3 Product Designs.” Similarly, the system could cluster hundreds of legal clauses to form a summary node such as “Standard Indemnification Clauses in Vendor Agreements.” In short, layer 1 provides the pre-reasoned, thematic layer of the knowledge graph.

The layering approach can be repeated to build higher and higher levels of abstraction. Think of it as creating an automated table of contents for your entire knowledge base, from individual details up to major strategic themes.
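
A minimal sketch of how this layering could be automated, reusing the hypothetical FactNode above and assuming an embedding model, a clustering routine, and an LLM summariser are available; embed(), cluster(), and summarise() are placeholders for whichever components an organisation actually chooses.

```python
def build_next_layer(nodes, layer, embed, cluster, summarise):
    """Cluster one layer's nodes and summarise each cluster into a parent node."""
    vectors = [embed(n.text) for n in nodes]      # embed every node's text
    groups = cluster(vectors)                     # e.g. community detection or k-means,
                                                  # returning lists of node indices
    parents = []
    for i, members in enumerate(groups):
        children = [nodes[m] for m in members]
        summary = summarise([c.text for c in children])  # LLM-written thematic summary
        parent = FactNode(f"L{layer}-{i}", summary, source_doc="derived", layer=layer)
        parents.append((parent, children))        # keep the parent-child edges
    return parents

# Repeat until the top layer is small enough to act as a table of contents:
# layer1 = build_next_layer(layer0, 1, embed, cluster, summarise)
# layer2 = build_next_layer([p for p, _ in layer1], 2, embed, cluster, summarise)
```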

When used as part of a RAG solution, this structure of knowledge overcomes the limitations of fragmented information. When a user asks a question, the Hi-RAG system uses the structured hierarchy to guide its reasoning. It retrieves not only the specific, low-level facts but also the high-level summary contexts. It then maps the shortest, most logical path between the granular details and the overarching concepts. This creates bridges of knowledge: predefined, fact-based reasoning steps grounded in the knowledge structure itself, rather than random fragments.
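
The sketch below illustrates that traversal step with networkx, assuming the parent-child hierarchy has already been loaded as a graph. The node identifiers and the idea of returning a shortest path as the ‘bridge’ are illustrative; a production system would combine this traversal with vector search over both facts and summaries.

```python
import networkx as nx

# Parent-child hierarchy built at index time (node ids only, for brevity).
g = nx.Graph()
g.add_edge("L1-0", "f1")   # "Common Stress-Bearing Flaws..." summary -> supporting facts
g.add_edge("L1-0", "f2")

def reasoning_path(graph, fact_id, summary_id):
    """Return the chain of nodes linking a granular fact to a high-level summary."""
    return nx.shortest_path(graph, source=fact_id, target=summary_id)

# The path (here simply ['f1', 'L1-0']) plus the text of each node on it becomes
# the 'bridge of knowledge' passed to the LLM as grounded context.
print(reasoning_path(g, "f1", "L1-0"))
```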

This method significantly reduces the risk of hallucination in specialised domains and ensures the information fed into the AI’s logic is transparent and follows well-mapped, traceable information sources. The RAG’s final answer is built from a rich context that includes low-level data, high-level summaries, and the explicit reasoning paths that connect them.

Building the Knowledge Base

To support this hierarchical approach, the underlying knowledge graph must be constructed with multiple layers of abstraction in mind. The initial build would focus on extracting and connecting the base entities and relationships from source texts to form layer 0. This requires robust entity extraction models to identify key nouns (products, people, regulations) and relationship extraction to define the verbs that connect them (e.g., “Product X is manufactured by Company Y”, “Regulation Z applies to Process A”).
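
As an illustration, each extracted relationship could be stored as a subject-predicate-object triple, as in the sketch below. Here, extract_triples() is a placeholder for whichever extraction approach (rules, a spaCy pipeline, or an LLM prompt) is selected, and the example triples simply mirror the relationships quoted above.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (subject, relation, object)

def extract_triples(sentence: str) -> List[Triple]:
    """Placeholder: call the chosen entity and relationship extraction model here."""
    raise NotImplementedError("plug in a rule-based, spaCy, or LLM extractor")

# Expected style of output for the examples given in the text:
example_triples = [
    ("Product X", "is_manufactured_by", "Company Y"),
    ("Regulation Z", "applies_to", "Process A"),
]
```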

The graph must then be designed to accommodate the creation of new, higher-level summary nodes. These are not entities that exist in the original text, but rather concepts extracted from existing documents, or – more likely – created automatically by a large language model (LLM), which are then reviewed and guided by the AI design team in conjunction with domain experts.

The graph’s schema needs to allow these summary nodes to connect to the multiple layer 0 nodes they represent, creating a parent-child structure that forms the hierarchy. This ensures that when the system retrieves a high-level concept, it can instantly traverse the graph to find all the underlying granular data that supports it.
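
A minimal sketch of that parent-child expansion, assuming the hierarchy is stored as edges from each summary node down to the nodes it summarises; the edge list and node identifiers are hypothetical.

```python
# Parent -> children edges recorded when each summary node is created.
children = {
    "L2-0": ["L1-0", "L1-1"],
    "L1-0": ["f1", "f2"],
    "L1-1": ["f3"],
}

def supporting_facts(node_id):
    """Walk down the hierarchy from a summary node to every layer 0 fact beneath it."""
    kids = children.get(node_id, [])
    if not kids:                 # a node with no children is a layer 0 fact
        return [node_id]
    facts = []
    for k in kids:
        facts.extend(supporting_facts(k))
    return facts

print(supporting_facts("L2-0"))  # -> ['f1', 'f2', 'f3']
```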

When to Use Hi-RAG?

The Hi-RAG approach is most effective in domains characterised by high complexity and large volumes of multi-layered information. Use cases include:

  • Financial Services: For connecting financial trends to individual transaction anomalies.
  • Legal and Regulatory Compliance: For linking overarching laws to specific contract clauses and internal actions.
  • Engineering & Manufacturing: For linking product components to requirements specifications.
  • Support & Services: For linking remediation recommendations to support requests and issues.

Next Steps

  • Audit Your Current RAGs: Review your existing AI RAG systems. Identify where they fail on questions that require connecting high-level strategy with operational details.
  • Pilot a Knowledge Graph: Start a small project to build a knowledge graph from a core dataset. Measure the improvement in the accuracy and reliability of AI-generated answers.
  • Model a Complex Domain: Select a well-understood area of your business, such as regulatory compliance or product specifications, and test a hierarchical indexing approach to see how it improves reasoning.
  • Invest in New Skills: Focus your teams on developing skills in knowledge engineering and graph databases. The ability to structure information for AI will soon become a major competitive advantage.
