Urgent OCR
Description
This Dropbox folder contains three folders called Bishopsgate archives, Newham archives and special collections and RIBA collections: https://www.dropbox.com/scl/fo/ixwakdbgdhodiq9sqlbk6/AAuJLgrYLb2WB1a--E_b76A?rlkey=y0f9poa3t1rpalmemh2qljs2n&st=dfvttlf8&dl=0
Ignore the folder called riba objects called scanning.
Each of those three folders has many subfolders. For each of those subfolders, follow the instructions below.
OCR INSTRUCTIONS
One file per folder
For each archive folder (e.g. TBUK1, NEWHAM2), create ONE plain text file (.txt). Put all documents from that folder into that one file.
Clear document separation and stable IDs
Every time a new document starts, write:
==============================
ARCHIVE_FOLDER: TBUK1
DOCUMENT_ID: TBUK1_01
DATE: (write date exactly as shown, or Unknown)
PLACE: (write place exactly as shown, or Not stated)
Then paste the full OCR text of that document. For the next document:
==============================
ARCHIVE_FOLDER: TBUK1
DOCUMENT_ID: TBUK1_02
DATE:
PLACE:
Continue sequentially: TBUK1_03, TBUK1_04, etc. For a different folder (e.g. NEWHAM2):
ARCHIVE_FOLDER: NEWHAM2
DOCUMENT_ID: NEWHAM2_01
Do not restart numbering without the folder prefix. If date or place is not visible:
DATE: Unknown
PLACE: Not stated
Do NOT guess.
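For anyone scripting the output, the header block and ID numbering described above could be generated along these lines. This is only a sketch: the function names are hypothetical, the two-digit zero padding is inferred from the TBUK1_01 example, and putting one field per line is an assumption about the intended layout.

```python
# Separator copied from the header template in this brief.
SEPARATOR = "=============================="


def document_id(folder: str, index: int) -> str:
    """Build a stable ID such as TBUK1_01 (folder prefix plus running number)."""
    return f"{folder}_{index:02d}"


def document_header(folder: str, index: int, date: str = "", place: str = "") -> str:
    """Return the header block that starts each new document.

    A missing date or place is never guessed: it falls back to the
    literal values the brief asks for.
    """
    lines = [
        SEPARATOR,
        f"ARCHIVE_FOLDER: {folder}",
        f"DOCUMENT_ID: {document_id(folder, index)}",
        f"DATE: {date if date else 'Unknown'}",
        f"PLACE: {place if place else 'Not stated'}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # First document in TBUK1, with no date or place visible on the page.
    print(document_header("TBUK1", 1))
```

Numbering restarts at 1 for each archive folder, but the folder prefix keeps every ID unique across files.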
- OCR rules
- Copy text exactly as written.
- Do NOT correct spelling or grammar.
- Do NOT rewrite sentences.
- Do NOT summarise.
- Keep paragraph breaks.
- Remove page numbers.
- If a word cannot be read, write: [illegible]
- Do not insert commentary.
- Hand-drawn diagrams
Do NOT attempt full OCR of technical drawings. Instead include:
==============================
ARCHIVE_FOLDER: TBUK1
DOCUMENT_ID: TBUK1_XX
DATE:
PLACE:
HAND-DRAWN ENGINEERING DRAWING
Title: (if visible)
Location: (if visible)
Company: (if visible)
If no readable text at all:
HAND-DRAWN ENGINEERING DRAWING (no readable text)
Do NOT copy measurements or technical numbers from diagrams.
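The drawing placeholder could be produced with a small helper like the one below. It is a sketch under assumptions: the function name is hypothetical, and it assumes that a field which is not visible is simply left out rather than written with an empty value.

```python
def drawing_placeholder(title: str = "", location: str = "", company: str = "") -> str:
    """Return the placeholder text that stands in for a hand-drawn drawing.

    Only Title, Location and Company are recorded; measurements and
    technical numbers from the drawing are deliberately not accepted.
    """
    fields = [("Title", title), ("Location", location), ("Company", company)]
    visible = [(name, value) for name, value in fields if value]
    if not visible:
        return "HAND-DRAWN ENGINEERING DRAWING (no readable text)\n"
    lines = ["HAND-DRAWN ENGINEERING DRAWING"]
    lines += [f"{name}: {value}" for name, value in visible]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(drawing_placeholder())  # a drawing with no readable text at all
```

The placeholder follows the usual document header, so a drawing still takes the next sequential DOCUMENT_ID in its folder.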
- Save format
- Save as .txt
- Use UTF-8 encoding
- For each subfolder, also export one PDF for me to refer to easily. Each PDF must be under 30MB. You can use these lower-resolution versions of the subfolders for the PDFs: https://www.dropbox.com/scl/fo/hjed8pk08njhzfns6hhvb/ALP8N6HCGRHAuYwiJR_tEg4?rlkey=t4p5y3q3954monibbm2tktz90&st=lb5eo4lp&dl=0
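The save requirements above can be checked with a short script such as this one. It is a sketch, not a required tool: the function names and the <folder>.txt file name are assumptions (the brief does not specify a file name), and 30MB is taken conservatively as 30,000,000 bytes.

```python
from pathlib import Path

# Assumption: "30MB" is read as 30 * 1000 * 1000 bytes, the stricter reading.
MAX_PDF_BYTES = 30 * 1000 * 1000


def save_transcript(folder: str, text: str, out_dir: Path) -> Path:
    """Write the single transcript for one archive folder as UTF-8 .txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{folder}.txt"  # hypothetical naming scheme
    path.write_text(text, encoding="utf-8")
    return path


def pdf_within_limit(pdf_path: Path, limit: int = MAX_PDF_BYTES) -> bool:
    """Return True if the exported PDF is strictly under the size limit."""
    return pdf_path.stat().st_size < limit


if __name__ == "__main__":
    saved = save_transcript("TBUK1", "example OCR text\n", Path("output"))
    print(saved)
```

Passing the encoding explicitly matters because the platform default is not UTF-8 everywhere, and archive text often contains characters such as £ or accented names.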
Budget: GBP 200 (Fixed Price)
Proposals: 29 freelancers have applied