Company · Remote

Python Web Scraping Automation

Deadline: 2026-04-01

Description

Budget: €250 - €750

I need a clean, well-documented Python script that automatically visits a set of webpages, collects both their visible text and all referenced images, and then saves everything in an organised JSON file. Each JSON record should include the page URL, scraped text, local image filenames, and any useful metadata such as timestamps.
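To make the expected output concrete, here is a hypothetical sketch of one such record; the field names (`url`, `text`, `images`, `scraped_at`) are illustrative assumptions, not a fixed schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical record shape based on the posting: page URL, scraped text,
# local image filenames, and a timestamp as metadata.
record = {
    "url": "https://example.com/page",
    "text": "Visible page text...",
    "images": ["logo.png", "banner.jpg"],
    "scraped_at": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(record, indent=2, ensure_ascii=False))
```

The final JSON file would simply hold a list of such records, one per scraped page.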

Core requirements

  • Headless operation: the script must run without a GUI, preferably using requests/BeautifulSoup, or Selenium where dynamic content demands it.
  • Image handling: download images to a defined folder, keep original filenames when possible, and update the JSON accordingly.
  • Modular design: separate functions for URL loading, parsing, image download, data cleaning, and final JSON write-out.
  • Robustness: graceful error handling, retry logic, and simple logging so I can trace any failures or skipped items.
  • Reusability: feed the script a plain-text list of URLs and rerun it without code changes.
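As a rough illustration of the modular layout, URL-list input, retry logic, and logging asked for above, here is a minimal standard-library sketch; all function names are my own assumptions, not a required API:

```python
import json
import logging
import time
from pathlib import Path
from urllib.request import urlopen

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


def load_urls(path):
    """Read one URL per line from a plain-text file, skipping blank lines."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def with_retries(func, attempts=3, delay=1.0):
    """Call func(), retrying on any exception with a fixed delay between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)


def fetch_page(url):
    """Fetch raw HTML with retries; real code would also set a User-Agent."""
    return with_retries(lambda: urlopen(url, timeout=10).read().decode("utf-8", "replace"))


def write_records(records, out_path):
    """Write all collected records to a single JSON file."""
    Path(out_path).write_text(json.dumps(records, indent=2, ensure_ascii=False))
```

Parsing, image download, and data cleaning would slot in as further functions between `fetch_page` and `write_records`.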

Deliverables

  1. Python 3 script (.py) with inline comments.
  2. Example JSON output generated from at least two sample pages.
  3. A short README explaining prerequisites, setup, and execution commands.

I already have the list of target URLs and will provide them once we start. If specific libraries such as Scrapy or Playwright make the job smoother, I’m open to your suggestion—just keep installation simple and cross-platform.

Skills

Playwright · Selenium · Python · BeautifulSoup · Software Architecture · JavaScript · Scrapy · JSON · Web Scraping
