Remote

Offline Forum Archive Needed: time-sensitive

Deadline: 2026-04-04

Description

Budget: $100 - $350

I need a complete, browsable copy of the public sections of NewHorizonOrchids.com before its hosting ends. It will go offline March 31, 2026. A straightforward static scrape is fine—no dynamic search, log-ins, or other interactive functions are required. I am mainly interested in preserving the text (the images are a bonus but not essential) and making sure there are no broken links, missing threads, or empty pages.

You are welcome to use HTTrack or any other tool you trust, as long as the final result is a set of clean HTML files that open locally and mirror the original forum’s structure. Please deliver the archive as a zipped folder that I can unpack and open instantly in any browser.

To confirm success, I will spot-check random threads for completeness and make sure internal links resolve offline. If this is your wheelhouse, let’s talk timeline and any access details you may need.

Project Title: Archive Entire Orchid Forum for Offline Use (HTTrack or Equivalent)

Project Description: I need a complete offline backup of the following public forum before it is taken offline:

https://www.newhorizonorchids.com/forum/

This is a phpBB-style forum with thousands of pages of orchid hybridizing and breeding discussions. The goal is to preserve as much content as possible in a fully browsable offline format.


REQUIREMENTS (MUST FOLLOW):

  1. Output Format:
  • Deliver as a full offline HTML website archive (NOT PDF, NOT text)
  • Must open locally via index.html and function like a normal website
  • All internal links must work (threads, pagination, navigation)
  2. Content Coverage:
  • Capture ALL forum sections and threads
  • Include deep pagination (e.g., pages using &start= parameters)
  • Include images and attachments where possible
  • Prioritize complete thread capture over speed
  3. Technical Approach:
  • Use HTTrack, Wget, or equivalent website mirroring tool
  • Configure crawl to include:
    • All thread pages (viewtopic.php)
    • All forum pages (viewforum.php)
    • Pagination (&start= pages)

  4. Exclusions:
  • Do NOT include login pages, search pages, or user account pages
  • Avoid unnecessary duplicate crawling if possible
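The crawl scope above (capture viewtopic.php, viewforum.php, and &start= pagination; exclude search and account pages; avoid duplicates) can be sketched as a URL filter, whatever mirroring tool is used. The helper names below are illustrative, not part of the client's spec; the script names are the standard phpBB ones:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical scope filter for the mirror crawl -- a sketch of the
# include/exclude rules above, not any specific tool's configuration.
FORUM_HOST = "www.newhorizonorchids.com"

# Page types to capture: threads, forum indexes, and the board root.
INCLUDE_SCRIPTS = {"viewtopic.php", "viewforum.php", "index.php"}
# Page types to skip: search, user control panel, member list, post forms.
EXCLUDE_SCRIPTS = {"search.php", "ucp.php", "memberlist.php", "posting.php"}

def should_fetch(url: str) -> bool:
    """Return True if a crawler following the scope rules should fetch url."""
    parts = urlparse(url)
    if parts.netloc and parts.netloc != FORUM_HOST:
        return False                      # stay on the forum host
    script = parts.path.rsplit("/", 1)[-1]
    if script in EXCLUDE_SCRIPTS:
        return False
    # Allow bare directory URLs (e.g. /forum/) and the whitelisted scripts,
    # including their &start= pagination variants.
    return script == "" or script in INCLUDE_SCRIPTS

def canonical(url: str) -> str:
    """Drop phpBB session IDs (sid=...) so the same page is not crawled
    twice under different URLs -- the duplicate-crawling concern above."""
    parts = urlparse(url)
    query = "&".join(
        f"{k}={v[0]}" for k, v in parse_qs(parts.query).items() if k != "sid"
    )
    return parts._replace(query=query).geturl()
```

In HTTrack or Wget the equivalent is expressed through their own include/exclude filter options; this sketch just makes the intended boundary explicit.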

DELIVERABLES:

  • A single folder (or .zip) containing:
    • All HTML files
    • Image assets
    • Supporting files (CSS, JS, etc.)
  • Expected file size: ~2–6 GB


VERIFICATION REQUIRED:

Before delivery, confirm:

  • Threads open correctly offline
  • Pagination works (multi-page threads load correctly)
  • Images display properly
  • No major sections are missing

IMPORTANT:

  • Do NOT send executable files (.exe or installers)
  • Only deliver raw HTML archive or zipped folder
  • This is a time-sensitive project (forum may go offline soon)

Optional (nice to have):

  • Second pass to ensure deeper thread capture
  • Report on total pages/files captured
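The optional capture report could be generated mechanically from the delivered folder. A sketch, where the file-type buckets are assumptions rather than a client spec:

```python
import os

# Image extensions counted toward the "images" bucket -- an assumption.
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def capture_report(root: str) -> dict:
    """Count HTML pages, images, other files, and total bytes in an
    archive folder, for a simple pages/files-captured report."""
    stats = {"html_pages": 0, "images": 0, "other_files": 0, "total_bytes": 0}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            stats["total_bytes"] += os.path.getsize(os.path.join(dirpath, fname))
            lower = fname.lower()
            if lower.endswith((".html", ".htm")):
                stats["html_pages"] += 1
            elif lower.endswith(IMAGE_EXTS):
                stats["images"] += 1
            else:
                stats["other_files"] += 1
    return stats
```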

Goal: Preserve as much of the forum content as possible before it is lost permanently.

Skills

Python, CSS, Web Development, Data Extraction, Go, HTML, Web Crawling, Web Scraping, Software Architecture, PHP
