AI LLM (open source, on a local server) for PDF data analysis
Description
Budget: €250 - €750
NO PLACEHOLDERS. Do not insert a placeholder: please read the project details carefully, because they contain everything you need to make an accurate estimate.
I'm seeking an experienced AI LLM Consultant with a proven track record in developing and implementing Artificial Intelligence solutions based on Large Language Models (LLMs).
The aim is to create:
- a system capable of reading large PDF files (>300 pages) containing tables (see attachment). PDF files contain text only. The tables differ in terms of the data they contain, their structure, the number of fields, the number of rows and the subject matter covered;
- an LLM system that can be deployed locally (on-premise) on Windows servers (no WSL - Windows Subsystem for Linux) WITHOUT a GPU, PRE-TRAINED in the financial sector (Mistral, FinBERT, Bloom, FinGPT, BloombergGPT, etc.) and capable of analysing the data read from the aforementioned PDF files;
- a method of access both via web (such as Open WebUI) and, primarily, via APIs (to be written in Python or the language of your choice).
Accuracy and the absence of hallucinations are very important.
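One possible safeguard against hallucinated figures, sketched here only as an illustration and not as a required design, is to check that every number in a model answer also appears in the text extracted from the PDF. The function names below are hypothetical, and the sketch assumes English-style number formatting (comma as thousands separator, dot as decimal separator).

```python
import re

# Matches integers and decimals with optional thousands separators, e.g. 1,234.56
# (assumption: English-style formatting; "1.234,56" is NOT handled).
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Return all numbers found in the text, with thousands separators removed."""
    return {match.replace(",", "") for match in _NUMBER.findall(text)}


def unsupported_numbers(answer: str, source_text: str) -> set[str]:
    """Return numbers that appear in the answer but nowhere in the source text."""
    return extract_numbers(answer) - extract_numbers(source_text)


source = "Total assets 1,234.50 Year 2023"
answer = "Total assets were 1234.50 in 2023, up 12 percent"
print(unsupported_numbers(answer, source))
```

A non-empty result flags the answer for review. The comparison is purely textual, so "1234.5" and "1234.50" count as different values, and figures the model computes itself (sums, percentages) are flagged even when they are correct.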
Ensure the system operates fully offline, independent of any external cloud services. The idea is to use open-source tools (such as Ollama) and integrate models that meet our requirements. Regarding Ollama (which is not a mandatory choice), we found that for some analyses cloud-based models would be preferable to local ones because they respond faster.
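For reference, a minimal sketch of how an application could query a locally running Ollama instance using only the Python standard library. It assumes Ollama listens on its default address (http://localhost:11434) and that a model has already been pulled; the model name "mistral" and the helper names are examples, not a decision.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumption: default port, same machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request; temperature 0 for more repeatable answers."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask_local_model("mistral", "Summarise the table below: ...")` requires the Ollama service to be running. The long timeout reflects the fact that CPU-only inference can be slow.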
IF the model requires fine-tuning, it is important to agree on a training plan. Please pay attention to the performance of the environment.
----- Tests already completed ----- Most of the preliminary tests on reading data from PDFs and analysing it were carried out using free online models/services, and the results were as follows:
- GPT-4o fails
- GPT-5-mini fails
- Mistral fails
- AI Studio (Google) OK
- Deepseek-chat-3.2 fails
- scispace.com OK
- super.myninja.com OK
Additional tests were done using Python to read the tables in the PDF using specialised libraries, but with poor results.
----- What is needed to start? ----- To best evaluate your application and technical approach, please create a document (WORD/PDF) addressing the following points:
- Required Software: List the essential software components (runtimes, libraries, specific tools) that would be necessary to implement and operate the on-premise LLM system.
- Proposed Architecture Design: 2 pages with a preliminary schema/design of the architecture to be implemented, specifying the main components (logical units, databases, applications, virtual machines, etc.). Please note that considerations for High Availability (HA), Disaster Recovery (DR), or Load Balancing are not required.
- How to improve answer accuracy and reduce hallucinations.
- A detailed activity plan (day/task) as part of your proposal.
- IMPORTANT: given the specific nature of the project and the lack of expertise among many consultants, we require a commitment to carry out a proof of concept (PoC) as a PRELIMINARY step prior to the project being awarded.
----- How will it be used? ----- Mainly via API/WebService: applications (VB/ASP/ASP.NET) will collect requests from the connected user and request answers from the local AI service via APIs. However, a web interface (similar to ChatGPT, Claude, ...) is also required for ad hoc requests.
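To make the API usage concrete, here is a minimal sketch of a local JSON endpoint that a VB/ASP.NET application could call over HTTP. It uses only the Python standard library. The request field "question", the response field "answer", the port and all function names are hypothetical, and `answer_fn` stands in for whatever component actually queries the model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def handle_question(body: bytes, answer_fn: Callable[[str], str]) -> tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, JSON-serialisable reply)."""
    try:
        data = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, {"error": "invalid JSON"}
    question = data.get("question") if isinstance(data, dict) else None
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "missing 'question'"}
    return 200, {"answer": answer_fn(question)}


def make_handler(answer_fn: Callable[[str], str]) -> type:
    """Create a request handler class bound to the given answer function."""

    class AskHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            status, reply = handle_question(self.rfile.read(length), answer_fn)
            payload = json.dumps(reply).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return AskHandler


def serve(answer_fn: Callable[[str], str], port: int = 8000) -> None:
    """Serve on localhost only; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(answer_fn)).serve_forever()
```

Keeping the validation in `handle_question` separate from the HTTP plumbing means the request logic can be tested without starting a server. A production deployment would more likely use a proper web framework and add authentication.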
----- Collaboration ----- The consulting engagement will commence as soon as possible and will be conducted fully remotely. Participation in daily update meetings is mandatory and non-negotiable. Failure to adhere to this requirement will result in immediate project disengagement, without exception. Expected go-live: end of April. An annual maintenance contract will be provided after the go-live.
----- Budget ----- Please do not ask me «what is your budget for this project?» Bonuses will be paid at the end of the project for compliance with the timing and quality of the results. No upfront payment. No payment before successful completion of all final tests. The PoC should be considered included, as a demonstration of one’s skills. The proposal must cover the entire project, not just a "minimum viable product" (MVP). The idea is to conclude the project at a cost of 600–1,000 USD, plus an annual maintenance contract. Unnecessary files and libraries must be removed.
----- Milestones ----- The project is a black box: either everything works or nothing works, so it isn’t possible to break it down into milestones.
At the end of the project:
- Provide technical support and training to our internal team for managing and further training the system.
- Document the entire architecture and implementation processes.
Desirable Requirements:
- Previous experience in AI projects within the financial sector.