Remote

Offline Forum Archive Needed: time-sensitive

Deadline: 2026-04-04

Description

Budget: $100 - $350

I need a complete, browsable copy of the public sections of NewHorizonOrchids.com before its hosting ends. It will go offline March 31, 2026. A straightforward static scrape is fine—no dynamic search, log-ins, or other interactive functions are required. I am mainly interested in preserving the text (the images are a bonus but not essential) and making sure there are no broken links, missing threads, or empty pages.

You are welcome to use HTTrack or any other tool you trust, as long as the final result is a set of clean HTML files that open locally and mirror the original forum’s structure. Please deliver the archive as a zipped folder that I can unpack and open instantly in any browser.

To confirm success, I will spot-check random threads for completeness and make sure internal links resolve offline. If this is your wheelhouse, let’s talk timeline and any access details you may need.

Project Title: Archive Entire Orchid Forum for Offline Use (HTTrack or Equivalent)

Project Description: I need a complete offline backup of the following public forum before it is taken offline:

https://www.newhorizonorchids.com/forum/

This is a phpBB-style forum with thousands of pages of orchid hybridizing and breeding discussions. The goal is to preserve as much content as possible in a fully browsable offline format.


REQUIREMENTS (MUST FOLLOW):

  1. Output Format:
  • Deliver as a full offline HTML website archive (NOT PDF, NOT text)
  • Must open locally via index.html and function like a normal website
  • All internal links must work (threads, pagination, navigation)
  2. Content Coverage:
  • Capture ALL forum sections and threads
  • Include deep pagination (e.g., pages using &start= parameters)
  • Include images and attachments where possible
  • Prioritize complete thread capture over speed
  3. Technical Approach:
  • Use HTTrack, Wget, or equivalent website mirroring tool
  • Configure crawl to include:
    • All thread pages (viewtopic.php)
    • All forum pages (viewforum.php)
    • Pagination (&start= pages)

  4. Exclusions:
  • Do NOT include login pages, search pages, or user account pages
  • Avoid unnecessary duplicate crawling if possible
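The crawl scope above (capture viewtopic.php, viewforum.php, and &start= pagination; exclude search and account pages; avoid duplicates) can be sketched as a URL filter, whatever mirroring tool is used. The helper names below are illustrative, not part of the client's spec; the script names are the standard phpBB ones:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical scope filter for the mirror crawl -- a sketch of the
# include/exclude rules above, not any specific tool's configuration.
FORUM_HOST = "www.newhorizonorchids.com"

# Page types to capture: threads, forum indexes, and the board root.
INCLUDE_SCRIPTS = {"viewtopic.php", "viewforum.php", "index.php"}
# Page types to skip: search, user control panel, member list, post forms.
EXCLUDE_SCRIPTS = {"search.php", "ucp.php", "memberlist.php", "posting.php"}

def should_fetch(url: str) -> bool:
    """Return True if a crawler following the scope rules should fetch url."""
    parts = urlparse(url)
    if parts.netloc and parts.netloc != FORUM_HOST:
        return False                      # stay on the forum host
    script = parts.path.rsplit("/", 1)[-1]
    if script in EXCLUDE_SCRIPTS:
        return False
    # Allow bare directory URLs (e.g. /forum/) and the whitelisted scripts,
    # including their &start= pagination variants.
    return script == "" or script in INCLUDE_SCRIPTS

def canonical(url: str) -> str:
    """Drop phpBB session IDs (sid=...) so the same page is not crawled
    twice under different URLs -- the duplicate-crawling concern above."""
    parts = urlparse(url)
    query = "&".join(
        f"{k}={v[0]}" for k, v in parse_qs(parts.query).items() if k != "sid"
    )
    return parts._replace(query=query).geturl()
```

In HTTrack or Wget the equivalent is expressed through their own include/exclude filter options; this sketch just makes the intended boundary explicit.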

DELIVERABLES:

  • A single folder (or .zip) containing:
    • All HTML files
    • Image assets
    • Supporting files (CSS, JS, etc.)
  • Expected file size: ~2–6 GB


VERIFICATION REQUIRED:

Before delivery, confirm:

  • Threads open correctly offline
  • Pagination works (multi-page threads load correctly)
  • Images display properly
  • No major sections are missing

IMPORTANT:

  • Do NOT send executable files (.exe or installers)
  • Only deliver raw HTML archive or zipped folder
  • This is a time-sensitive project (forum may go offline soon)

Optional (nice to have):

  • Second pass to ensure deeper thread capture
  • Report on total pages/files captured
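The optional capture report could be generated mechanically from the delivered folder. A sketch, where the file-type buckets are assumptions rather than a client spec:

```python
import os

# Image extensions counted toward the "images" bucket -- an assumption.
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def capture_report(root: str) -> dict:
    """Count HTML pages, images, other files, and total bytes in an
    archive folder, for a simple pages/files-captured report."""
    stats = {"html_pages": 0, "images": 0, "other_files": 0, "total_bytes": 0}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            stats["total_bytes"] += os.path.getsize(os.path.join(dirpath, fname))
            lower = fname.lower()
            if lower.endswith((".html", ".htm")):
                stats["html_pages"] += 1
            elif lower.endswith(IMAGE_EXTS):
                stats["images"] += 1
            else:
                stats["other_files"] += 1
    return stats
```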

Goal: Preserve as much of the forum content as possible before it is lost permanently.

Skills

Python, CSS, Web Development, Data Extraction, Go, HTML, Web Crawling, Web Scraping, Software Architecture, PHP
