Data Narratives & LLM Benchmarking
Description
Budget: ₹600 - ₹7000
I have a collection of purely numerical data, and I want to turn those rows and columns into clear, decision-driving stories. The plan is to generate the narratives, then let a large language model (LLM) act as an independent judge that scores those stories against insights produced by more traditional statistical analysis.
What I need from you
First, help me tighten the problem statement so the research goals are unambiguous. From there, design and code an end-to-end pipeline (Python is fine) that:
• ingests numerical data,
• produces narrative text (prompt engineering or template-based, whichever yields stronger results),
• feeds both the narrative and the raw statistics into an LLM "judge,"
• captures the judge's decisions alongside classical metrics (accuracy, MAE, R², or similar), and
• outputs a concise statistical report that shows where the LLM agrees or disagrees with the baseline.
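As a rough sketch of how those stages might fit together: the snippet below wires ingestion, a template-based narrative, a judge, and classical statistics into one call. Everything here is an assumption for illustration, in particular the `sales` column name and the stub `judge`, which would be replaced by a real LLM call (OpenAI/LangChain) in the actual deliverable.

```python
import statistics

def ingest(rows):
    """Ingest numerical records (list of dicts); a stand-in for pandas.read_csv."""
    return rows

def classical_stats(rows, value_key="sales"):
    """Baseline statistical analysis over one numeric column (hypothetical schema)."""
    values = [r[value_key] for r in rows]
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}

def narrate(rows, value_key="sales"):
    """Template-based narrative: summarise the mean and a crude trend."""
    values = [r[value_key] for r in rows]
    mean = statistics.mean(values)
    trend = "rising" if values[-1] > values[0] else "flat or falling"
    return f"Mean {value_key} is {mean:.2f}; the series looks {trend}."

def judge(narrative, stats):
    """Stub LLM judge: swap in an OpenAI/LangChain call in practice.
    Here it only checks that the narrative states the computed mean."""
    return {"agrees": f"{stats['mean']:.2f}" in narrative}

def run_pipeline(rows):
    """End-to-end cycle: stats -> narrative -> judge -> combined report."""
    stats = classical_stats(ingest(rows))
    narrative = narrate(rows)
    return {"narrative": narrative, "stats": stats, "judge": judge(narrative, stats)}
```

A usage example: `run_pipeline([{"sales": 10}, {"sales": 12}, {"sales": 15}])` returns the narrative, the baseline statistics, and the judge's verdict in one dict, which is the shape the final report would be built from.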
Automation matters. I want the entire judging cycle triggered by a single command or API call, so that new data drops straight through the process without manual work. A short README that lets me reproduce the results locally is the final checkpoint.
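The single-command trigger could be as simple as an `argparse` entry point that takes a data path and writes a report; this is a minimal sketch under assumed names (`run_pipeline`, the `--out` flag), not the required interface:

```python
import argparse
import json
import sys

def run_pipeline(csv_path):
    """Placeholder for the full pipeline; returns a report dict."""
    return {"input": csv_path, "status": "ok"}

def main(argv=None):
    # One command runs the whole judging cycle on a new data drop.
    parser = argparse.ArgumentParser(
        description="Run the narrative-judging pipeline end to end.")
    parser.add_argument("data", help="path to the numerical CSV to process")
    parser.add_argument("--out", default="report.json",
                        help="where to write the report (illustrative flag)")
    args = parser.parse_args(argv)
    report = run_pipeline(args.data)
    print(json.dumps(report))
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Invoked as `python pipeline.py new_batch.csv`, this gives the one-command reproducibility the brief asks for; wrapping the same `main` in a small FastAPI or Flask route would cover the "or API call" variant.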
Acceptance criteria
- A refined problem statement, delivered as a living document.
- Reproducible code (Python, Pandas, scikit-learn, LangChain/OpenAI or similar) that runs on sample data I provide.
- A metrics table and visual summary that quantify the LLM judge’s performance against the traditional analysis.
- One-click (or single-command) execution proving the automation.
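The metrics table in the criteria above reduces to a handful of standard formulas; the helpers below show one plain-Python way to compute MAE, R², and a judge-vs-baseline agreement rate (in practice `sklearn.metrics` would likely supply the first two):

```python
def mae(y_true, y_pred):
    """Mean absolute error between baseline targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def agreement_rate(judge_calls, baseline_calls):
    """Fraction of cases where the LLM judge and the classical analysis agree."""
    matches = sum(j == b for j, b in zip(judge_calls, baseline_calls))
    return matches / len(judge_calls)
```

Tabulating `agreement_rate` next to MAE/R² per dataset is one way to produce the required summary of where the LLM agrees or disagrees with the baseline.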