Company (Remote)

RAG Comparative POC for Knowledge Base

Project-Based

Description

Title: Build POC to Compare Normal RAG vs Graph RAG vs Tree RAG on Enterprise Knowledge Base

Project Summary: I need an experienced AI/LLM engineer or small team to build a Proof of Concept that compares 3 retrieval approaches on the same real knowledge base documents:

  1. Normal RAG (vector similarity / vector DB)
  2. Graph RAG (entity + relationship + graph traversal)
  3. Tree RAG (page / heading / section / hierarchy-based retrieval)

The purpose of this POC is not only to make all 3 work, but to compare them fairly on the same documents and same question set, then recommend which approach works best for which question type.

Main Goal: Build a working POC that can:

  • ingest the same source documents
  • create 3 separate indexes from the same documents
  • answer questions using each retrieval approach
  • run a comparison on the same question set
  • generate a final evaluation report with findings and recommendation

Business Objective: We want to understand whether our agent/orchestrator should dynamically select:

  • the correct knowledge base
  • the correct retrieval strategy based on the user question.

Current Thinking / Expected Architecture: There are 2 modes in this POC.

  1. Runtime mode. For one real user question:
  • user asks question
  • orchestrator classifies question
  • system selects KB
  • system selects retrieval strategy
  • selected retriever fetches evidence
  • evidence is normalized
  • same foundation model generates answer with citations
  2. POC comparison mode. For evaluation:
  • same question is intentionally run through all 3 retrieval approaches
  • outputs are compared side by side
  • recommendation is created based on real results

Scope of Work:

Phase 1: Start with one KB only. For a fair comparison, begin with a single knowledge base, for example:

  • Document 1

Later, the design should be extendable to:

  • Document 1
  • Document 2
  • Document 3

Stage 0: Document Preparation and Index Building. Build 3 indexes from the same source documents.

A. Vector Index for Normal RAG. Expected:

  • document parsing
  • chunking with overlap
  • embedding generation
  • vector DB / vector index
  • metadata stored for each chunk:
    • source document
    • page number
    • chunk position
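
The chunking-with-overlap step above can be sketched in plain Python (the chunk size, overlap, and `Document 1` source name are illustrative assumptions, not requirements):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks, keeping per-chunk metadata."""
    step = chunk_size - overlap
    chunks = []
    for pos, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "text": text[start:start + chunk_size],
            "source_document": "Document 1",  # hypothetical source name
            "page_number": None,              # to be filled in by the parser
            "chunk_position": pos,
        })
        if start + chunk_size >= len(text):
            break  # last chunk already covers the end of the text
    return chunks
```

Each chunk then gets an embedding and is written to the vector index together with its metadata.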

B. Graph Index for Graph RAG. Expected:

  • define domain schema
  • identify entity types
  • identify relationship types
  • entity extraction pipeline
  • relationship extraction pipeline
  • entity linking / canonicalization
  • graph storage
  • every entity and relationship must store source-text back reference

Important: Graph retrieval must not return only triples. It must also ground results back to original source passages for answer generation.
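
One way to satisfy this grounding requirement is to store a source back-reference on every edge. A minimal sketch, assuming a plain in-memory triple store rather than a real graph database:

```python
class GroundedGraph:
    """Toy triple store; every edge carries its originating passage."""

    def __init__(self):
        self.edges = []

    def add(self, head, relation, tail, source_doc, page, passage):
        self.edges.append({
            "head": head, "relation": relation, "tail": tail,
            "source": {"document": source_doc, "page": page, "passage": passage},
        })

    def neighbors(self, entity, max_hops=2):
        """Bounded-hop traversal returning edges, each grounded to source text."""
        frontier, visited, hits = {entity}, {entity}, []
        for _ in range(max_hops):
            nxt = set()
            for e in self.edges:
                if e["head"] in frontier or e["tail"] in frontier:
                    if e not in hits:
                        hits.append(e)
                    nxt.update((e["head"], e["tail"]))
            frontier = nxt - visited  # only expand newly reached entities
            visited |= nxt
        return hits
```

In a real build the same idea maps onto Neo4j node/relationship properties; the key point is that every traversal result can hand its original passage to the generation layer.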

C. Tree Index for Tree RAG. Expected:

  • parse document structure
  • detect headings / subheadings / sections / pages
  • build hierarchy like: Document → Chapter → Section → Subsection → Paragraph / Page
  • store hierarchy path and source references

Important: Before Tree RAG indexing, do a document structure audit and clearly report whether the documents are suitable for tree-based retrieval.
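
Assuming the audit finds usable headings, the hierarchy index could be sketched like this (markdown-style `#` headings are an assumption for illustration; real PDFs would need a structure-aware parser):

```python
def build_tree_index(lines):
    """Map each text line to its hierarchy path, e.g. 'Doc > Chapter > Section'."""
    path, index = [], []
    for line in lines:
        if line.startswith("#"):
            # heading: depth = number of leading '#' characters
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            path = path[:level - 1] + [title]
        elif line.strip():
            # body text: record it under the current hierarchy path
            index.append({"hierarchy_path": " > ".join(path), "text": line.strip()})
    return index
```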

Stage 1: Question Analysis and Routing. Build orchestrator/routing logic with these steps in sequence:

  1. classify question type
  2. select KB/domain
  3. select retrieval strategy based on:
    • question type
    • available indexes for the selected KB

Initial routing heuristics:

  • factual / semantic question → Normal RAG
  • relationship / dependency / multi-hop / comparative question → Graph RAG
  • section / heading / page / hierarchy question → Tree RAG
  • aggregation question → Graph or Tree, depending on document structure; may also need post-retrieval computation

These are only initial heuristics. The POC should validate or correct them.
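
As a starting point, the heuristics above could be approximated with a naive keyword router (the keyword lists are illustrative only; the POC would likely replace this with an LLM classifier and validate the mapping against real results):

```python
def route_question(question: str) -> str:
    """Map a question to a retrieval strategy using the initial heuristics."""
    q = question.lower()
    if any(w in q for w in ("depends", "related", "compare", "between", "impact")):
        return "graph_rag"      # relationship / multi-hop / comparative
    if any(w in q for w in ("section", "chapter", "heading", "page")):
        return "tree_rag"       # hierarchy / section-reference
    if any(w in q for w in ("how many", "total", "count", "average")):
        return "aggregation"    # Graph or Tree + post-retrieval computation
    return "normal_rag"         # default: factual / semantic
```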

Stage 2: Retrieval Execution. Runtime mode:

  • only one selected retrieval path runs

POC comparison mode:

  • all 3 retrieval paths run for the same question

Expected retrieval behavior:

Normal RAG:

  • embed user query
  • run vector similarity search
  • return top K chunks with scores and metadata
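
The Normal RAG path above can be sketched with toy embeddings and brute-force cosine similarity (a real build would use a vector DB and a proper embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, indexed_chunks, k=3):
    """Return the k most similar chunks, each annotated with its score."""
    scored = [
        {**chunk, "score": cosine(query_vec, chunk["embedding"])}
        for chunk in indexed_chunks
    ]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:k]
```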

Graph RAG:

  • extract entities from query
  • perform canonicalization / entity linking
  • traverse graph with bounded hops
  • retrieve connected nodes / relationships
  • ground all results back to source passages
  • optional hybrid retrieval support is a plus

Tree RAG:

  • match query against hierarchy
  • navigate headings / section titles / page references
  • return section text + hierarchy path + page references

Stage 3: Evidence Normalization. Create a common evidence schema for all 3 approaches.

Every retrieved item should be normalized into a structure containing:

  • source document
  • location in document
  • retrieval method
  • confidence / relevance score
  • retrieved text

Reason: The generation layer and evaluation layer must consume a common structure regardless of retrieval method.
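
One possible shape for the common evidence schema (the field names here are an assumption, not a fixed spec):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Normalized retrieval result consumed by generation and evaluation."""
    source_document: str
    location: str            # page number, hierarchy path, or graph node id
    retrieval_method: str    # "normal_rag" | "graph_rag" | "tree_rag"
    score: float             # confidence / relevance score
    text: str                # the retrieved passage itself
```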

Stage 4: Answer Generation. Use the same foundation model and same generation policy across all 3 approaches.

Important: For fair comparison, keep fixed:

  • same FM / LLM
  • same prompt template
  • same temperature
  • same max tokens
  • same evidence injection style

Answers must include citations based only on the retrieved evidence.
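
A sketch of what "keep fixed" could look like in code (the model id and parameter values below are placeholders, not recommendations):

```python
# Frozen generation config shared by all three pipelines for a fair comparison.
GENERATION_CONFIG = {
    "model": "anthropic.claude-3-sonnet",  # hypothetical model id
    "temperature": 0.0,
    "max_tokens": 1024,
}

# One shared prompt template with a single evidence-injection style.
PROMPT_TEMPLATE = (
    "Answer the question using ONLY the evidence below.\n"
    "Cite each claim as [source_document, location].\n\n"
    "Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(question, evidence_items):
    """Render normalized evidence items into the shared template."""
    evidence = "\n".join(
        f"- [{e['source_document']}, {e['location']}] {e['text']}"
        for e in evidence_items
    )
    return PROMPT_TEMPLATE.format(evidence=evidence, question=question)
```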

Stage 5: Logging and Metadata. For every run, capture:

  • KB selected
  • retrieval method selected
  • retrieved evidence
  • retrieval latency
  • generation latency
  • confidence / relevance details
  • citations returned
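
A minimal sketch of a per-run log record capturing these fields (the `retriever`/`generator` callables are placeholders for the real pipeline stages):

```python
import time

def timed_run(retriever, generator, question, kb, method):
    """Execute one run and return a log record with both latencies."""
    t0 = time.perf_counter()
    evidence = retriever(question)
    t1 = time.perf_counter()
    answer = generator(question, evidence)
    t2 = time.perf_counter()
    return {
        "kb": kb,
        "retrieval_method": method,
        "evidence": evidence,
        "retrieval_latency_s": round(t1 - t0, 4),
        "generation_latency_s": round(t2 - t1, 4),
        "answer": answer,
    }
```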

Stage 6: POC Evaluation Harness. Build an evaluation mode where the same tagged question set runs across all 3 approaches.

Question set:

  • around 30 to 50 questions
  • based on real use cases
  • tagged by question type:
    • factual
    • multi-hop
    • comparative
    • section-reference
    • aggregation

Evaluation metrics:

  • answer accuracy
  • retrieval relevance
  • citation quality
  • faithfulness / grounding
  • completeness
  • hallucination
  • latency
  • implementation effort
  • maintenance complexity
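
The comparison harness itself can be a simple loop that fans every tagged question out to all 3 pipelines (a sketch; scoring against the metrics above would be layered on top of these rows):

```python
def run_comparison(questions, pipelines):
    """questions: [{'q': ..., 'type': ...}]; pipelines: {name: callable}.

    Returns one row per (question, pipeline) pair so results can be
    grouped by question type for the side-by-side evaluation.
    """
    rows = []
    for item in questions:
        for name, pipeline in pipelines.items():
            rows.append({
                "question": item["q"],
                "question_type": item["type"],
                "method": name,
                "answer": pipeline(item["q"]),
            })
    return rows
```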

Nice to have:

  • recall measured on a labeled subset
  • automated scoring helpers
  • evaluation dashboard or comparison sheet

Final Deliverables:

  1. Working POC codebase
  2. Setup / run instructions
  3. Ingestion pipeline for all 3 index types
  4. Runtime routing flow
  5. POC comparison harness
  6. Sample outputs for all 3 approaches
  7. Evaluation matrix / comparison sheet
  8. Final recommendation report including:
    • strengths and weaknesses of each approach
    • best approach by question type
    • whether dynamic KB + RAG routing is justified
    • suggested production architecture direction

Technical Expectations: The freelancer should have strong experience in:

  • Python
  • LLM / RAG systems
  • vector databases
  • graph databases / Neo4j or equivalent
  • document parsing / PDF processing
  • evaluation of GenAI systems
  • prompt design for evidence-grounded answering

Preferred experience:

  • Graph RAG
  • hierarchical / tree-based retrieval
  • Bedrock / Azure OpenAI / OpenAI APIs
  • LangChain / LlamaIndex / custom pipelines
  • citation-grounded QA systems

What I Need in the Proposal: Please include:

  1. Relevant similar work you have done
  2. Your suggested technical stack
  3. How you would implement all 3 approaches
  4. How you would ensure fair comparison
  5. Estimated timeline
  6. Estimated budget
  7. Key risks / assumptions
  8. Example of deliverables you would provide

Project Success Criteria: The project is successful if:

  • all 3 retrieval approaches work on the same document set
  • outputs can be compared fairly
  • evaluation clearly shows where each approach performs well or poorly
  • final recommendation is backed by data, not theory

Important Notes:

  • This is a POC, not a production system
  • correctness of comparison matters more than UI polish
  • clean architecture and clear evaluation matter a lot
  • documentation is important

Budget: INR 4000–6000

Skills: Python, Machine Learning (ML), Natural Language Processing, Large Language Model, LangChain, Vector Databases

Skills

OpenAI, Neo4j, Python, AI, LangChain, Vector Databases, Machine Learning (ML), Large Language Model (LLM), Natural Language Processing, Azure
