Company (Remote)

RAG Comparative POC for Knowledge Base

Project-Based

Description

Title: Build POC to Compare Normal RAG vs Graph RAG vs Tree RAG on Enterprise Knowledge Base

Project Summary: I need an experienced AI/LLM engineer or small team to build a Proof of Concept that compares 3 retrieval approaches on the same real knowledge base documents:

  1. Normal RAG (vector similarity / vector DB)
  2. Graph RAG (entity + relationship + graph traversal)
  3. Tree RAG (page / heading / section / hierarchy-based retrieval)

The purpose of this POC is not only to make all 3 work, but to compare them fairly on the same documents and same question set, then recommend which approach works best for which question type.

Main Goal: Build a working POC that can:

  • ingest the same source documents
  • create 3 separate indexes from the same documents
  • answer questions using each retrieval approach
  • run a comparison on the same question set
  • generate a final evaluation report with findings and recommendation

Business Objective: We want to understand whether our agent/orchestrator should dynamically select:

  • the correct knowledge base
  • the correct retrieval strategy based on the user question.

Current Thinking / Expected Architecture: There are 2 modes in this POC.

  1. Runtime mode. For one real user question:
  • user asks question
  • orchestrator classifies question
  • system selects KB
  • system selects retrieval strategy
  • selected retriever fetches evidence
  • evidence is normalized
  • same foundation model generates answer with citations
  2. POC comparison mode. For evaluation:
  • same question is intentionally run through all 3 retrieval approaches
  • outputs are compared side by side
  • recommendation is created based on real results

Scope of Work:

Phase 1: Start with one KB only. For a fair comparison, begin with a single knowledge base, for example:

  • Document 1

Later, the design should be extendable to:

  • Document 1
  • Document 2
  • Document 3

Stage 0: Document Preparation and Index Building. Build 3 indexes from the same source documents.

A. Vector Index for Normal RAG. Expected:

  • document parsing
  • chunking with overlap
  • embedding generation
  • vector DB / vector index
  • metadata stored for each chunk:
    • source document
    • page number
    • chunk position
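
The chunking-with-overlap step above can be sketched in plain Python (the chunk size, overlap, and `Document 1` source name are illustrative assumptions, not requirements):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks, keeping per-chunk metadata."""
    step = chunk_size - overlap
    chunks = []
    for pos, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "text": text[start:start + chunk_size],
            "source_document": "Document 1",  # hypothetical source name
            "page_number": None,              # to be filled in by the parser
            "chunk_position": pos,
        })
        if start + chunk_size >= len(text):
            break  # last chunk already covers the end of the text
    return chunks
```

Each chunk then gets an embedding and is written to the vector index together with its metadata.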

B. Graph Index for Graph RAG. Expected:

  • define domain schema
  • identify entity types
  • identify relationship types
  • entity extraction pipeline
  • relationship extraction pipeline
  • entity linking / canonicalization
  • graph storage
  • every entity and relationship must store source-text back reference

Important: Graph retrieval must not return only triples. It must also ground results back to original source passages for answer generation.
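
One way to satisfy this grounding requirement is to store a source back-reference on every edge. A minimal sketch, assuming a plain in-memory triple store rather than a real graph database:

```python
class GroundedGraph:
    """Toy triple store; every edge carries its originating passage."""

    def __init__(self):
        self.edges = []

    def add(self, head, relation, tail, source_doc, page, passage):
        self.edges.append({
            "head": head, "relation": relation, "tail": tail,
            "source": {"document": source_doc, "page": page, "passage": passage},
        })

    def neighbors(self, entity, max_hops=2):
        """Bounded-hop traversal returning edges, each grounded to source text."""
        frontier, visited, hits = {entity}, {entity}, []
        for _ in range(max_hops):
            nxt = set()
            for e in self.edges:
                if e["head"] in frontier or e["tail"] in frontier:
                    if e not in hits:
                        hits.append(e)
                    nxt.update((e["head"], e["tail"]))
            frontier = nxt - visited  # only expand newly reached entities
            visited |= nxt
        return hits
```

In a real build the same idea maps onto Neo4j node/relationship properties; the key point is that every traversal result can hand its original passage to the generation layer.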

C. Tree Index for Tree RAG. Expected:

  • parse document structure
  • detect headings / subheadings / sections / pages
  • build hierarchy like: Document → Chapter → Section → Subsection → Paragraph / Page
  • store hierarchy path and source references

Important: Before Tree RAG indexing, do a document structure audit and clearly report whether the documents are suitable for tree-based retrieval.
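
Assuming the audit finds usable headings, the hierarchy index could be sketched like this (markdown-style `#` headings are an assumption for illustration; real PDFs would need a structure-aware parser):

```python
def build_tree_index(lines):
    """Map each text line to its hierarchy path, e.g. 'Doc > Chapter > Section'."""
    path, index = [], []
    for line in lines:
        if line.startswith("#"):
            # heading: depth = number of leading '#' characters
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            path = path[:level - 1] + [title]
        elif line.strip():
            # body text: record it under the current hierarchy path
            index.append({"hierarchy_path": " > ".join(path), "text": line.strip()})
    return index
```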

Stage 1: Question Analysis and Routing. Build orchestrator/routing logic with these steps in sequence:

  1. classify question type
  2. select KB/domain
  3. select retrieval strategy based on:
    • question type
    • available indexes for the selected KB

Initial routing heuristics:

  • factual / semantic question → Normal RAG
  • relationship / dependency / multi-hop / comparative question → Graph RAG
  • section / heading / page / hierarchy question → Tree RAG
  • aggregation question → Graph or Tree, depending on document structure; may also need post-retrieval computation

These are only initial heuristics. The POC should validate or correct them.
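
As a starting point, the heuristics above could be approximated with a naive keyword router (the keyword lists are illustrative only; the POC would likely replace this with an LLM classifier and validate the mapping against real results):

```python
def route_question(question: str) -> str:
    """Map a question to a retrieval strategy using the initial heuristics."""
    q = question.lower()
    if any(w in q for w in ("depends", "related", "compare", "between", "impact")):
        return "graph_rag"      # relationship / multi-hop / comparative
    if any(w in q for w in ("section", "chapter", "heading", "page")):
        return "tree_rag"       # hierarchy / section-reference
    if any(w in q for w in ("how many", "total", "count", "average")):
        return "aggregation"    # Graph or Tree + post-retrieval computation
    return "normal_rag"         # default: factual / semantic
```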

Stage 2: Retrieval Execution. Runtime mode:

  • only one selected retrieval path runs

POC comparison mode:

  • all 3 retrieval paths run for the same question

Expected retrieval behavior:

Normal RAG:

  • embed user query
  • run vector similarity search
  • return top K chunks with scores and metadata
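
The Normal RAG path above can be sketched with toy embeddings and brute-force cosine similarity (a real build would use a vector DB and a proper embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, indexed_chunks, k=3):
    """Return the k most similar chunks, each annotated with its score."""
    scored = [
        {**chunk, "score": cosine(query_vec, chunk["embedding"])}
        for chunk in indexed_chunks
    ]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:k]
```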

Graph RAG:

  • extract entities from query
  • perform canonicalization / entity linking
  • traverse graph with bounded hops
  • retrieve connected nodes / relationships
  • ground all results back to source passages
  • optional hybrid retrieval support is a plus

Tree RAG:

  • match query against hierarchy
  • navigate headings / section titles / page references
  • return section text + hierarchy path + page references

Stage 3: Evidence Normalization. Create a common evidence schema for all 3 approaches.

Every retrieved item should be normalized into a structure containing:

  • source document
  • location in document
  • retrieval method
  • confidence / relevance score
  • retrieved text

Reason: The generation layer and evaluation layer must consume a common structure regardless of retrieval method.
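
One possible shape for the common evidence schema (the field names here are an assumption, not a fixed spec):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Normalized retrieval result consumed by generation and evaluation."""
    source_document: str
    location: str            # page number, hierarchy path, or graph node id
    retrieval_method: str    # "normal_rag" | "graph_rag" | "tree_rag"
    score: float             # confidence / relevance score
    text: str                # the retrieved passage itself
```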

Stage 4: Answer Generation. Use the same foundation model and same generation policy across all 3 approaches.

Important: For fair comparison, keep fixed:

  • same FM / LLM
  • same prompt template
  • same temperature
  • same max tokens
  • same evidence injection style

Answers must include citations based only on the retrieved evidence.
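
A sketch of what "keep fixed" could look like in code (the model id and parameter values below are placeholders, not recommendations):

```python
# Frozen generation config shared by all three pipelines for a fair comparison.
GENERATION_CONFIG = {
    "model": "anthropic.claude-3-sonnet",  # hypothetical model id
    "temperature": 0.0,
    "max_tokens": 1024,
}

# One shared prompt template with a single evidence-injection style.
PROMPT_TEMPLATE = (
    "Answer the question using ONLY the evidence below.\n"
    "Cite each claim as [source_document, location].\n\n"
    "Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(question, evidence_items):
    """Render normalized evidence items into the shared template."""
    evidence = "\n".join(
        f"- [{e['source_document']}, {e['location']}] {e['text']}"
        for e in evidence_items
    )
    return PROMPT_TEMPLATE.format(evidence=evidence, question=question)
```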

Stage 5: Logging and Metadata. For every run, capture:

  • KB selected
  • retrieval method selected
  • retrieved evidence
  • retrieval latency
  • generation latency
  • confidence / relevance details
  • citations returned
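
A minimal sketch of a per-run log record capturing these fields (the `retriever`/`generator` callables are placeholders for the real pipeline stages):

```python
import time

def timed_run(retriever, generator, question, kb, method):
    """Execute one run and return a log record with both latencies."""
    t0 = time.perf_counter()
    evidence = retriever(question)
    t1 = time.perf_counter()
    answer = generator(question, evidence)
    t2 = time.perf_counter()
    return {
        "kb": kb,
        "retrieval_method": method,
        "evidence": evidence,
        "retrieval_latency_s": round(t1 - t0, 4),
        "generation_latency_s": round(t2 - t1, 4),
        "answer": answer,
    }
```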

Stage 6: POC Evaluation Harness. Build an evaluation mode where the same tagged question set runs across all 3 approaches.

Question set:

  • around 30 to 50 questions
  • based on real use cases
  • tagged by question type:
    • factual
    • multi-hop
    • comparative
    • section-reference
    • aggregation

Evaluation metrics:

  • answer accuracy
  • retrieval relevance
  • citation quality
  • faithfulness / grounding
  • completeness
  • hallucination
  • latency
  • implementation effort
  • maintenance complexity
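
The comparison harness itself can be a simple loop that fans every tagged question out to all 3 pipelines (a sketch; scoring against the metrics above would be layered on top of these rows):

```python
def run_comparison(questions, pipelines):
    """questions: [{'q': ..., 'type': ...}]; pipelines: {name: callable}.

    Returns one row per (question, pipeline) pair so results can be
    grouped by question type for the side-by-side evaluation.
    """
    rows = []
    for item in questions:
        for name, pipeline in pipelines.items():
            rows.append({
                "question": item["q"],
                "question_type": item["type"],
                "method": name,
                "answer": pipeline(item["q"]),
            })
    return rows
```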

Nice to have:

  • recall measured on a labeled subset
  • automated scoring helpers
  • evaluation dashboard or comparison sheet

Final Deliverables:

  1. Working POC codebase
  2. Setup / run instructions
  3. Ingestion pipeline for all 3 index types
  4. Runtime routing flow
  5. POC comparison harness
  6. Sample outputs for all 3 approaches
  7. Evaluation matrix / comparison sheet
  8. Final recommendation report including:
    • strengths and weaknesses of each approach
    • best approach by question type
    • whether dynamic KB + RAG routing is justified
    • suggested production architecture direction

Technical Expectations: The freelancer should have strong experience in:

  • Python
  • LLM / RAG systems
  • vector databases
  • graph databases / Neo4j or equivalent
  • document parsing / PDF processing
  • evaluation of GenAI systems
  • prompt design for evidence-grounded answering

Preferred experience:

  • Graph RAG
  • hierarchical / tree-based retrieval
  • Bedrock / Azure OpenAI / OpenAI APIs
  • LangChain / LlamaIndex / custom pipelines
  • citation-grounded QA systems

What I Need in the Proposal: Please include:

  1. Relevant similar work you have done
  2. Your suggested technical stack
  3. How you would implement all 3 approaches
  4. How you would ensure fair comparison
  5. Estimated timeline
  6. Estimated budget
  7. Key risks / assumptions
  8. Example of deliverables you would provide

Project Success Criteria: The project is successful if:

  • all 3 retrieval approaches work on the same document set
  • outputs can be compared fairly
  • evaluation clearly shows where each approach performs well or poorly
  • final recommendation is backed by data, not theory

Important Notes:

  • This is a POC, not a production system
  • correctness of comparison matters more than UI polish
  • clean architecture and clear evaluation matter a lot
  • documentation is important

Budget: INR 4000–6000

Skills: Python, Machine Learning (ML), Natural Language Processing, Large Language Model, LangChain, Vector Databases

Skills

OpenAI, Neo4j, Python, AI, LangChain, Vector Databases, Machine Learning (ML), Large Language Model (LLM), Natural Language Processing, Azure
