Company: Remote

High-Speed Python Phone Scraper

Deadline: 2026-04-01

Description

Budget: $400 - $600

I need a blistering-fast, fully asynchronous Python scraper that can pull U.S. mobile-phone leads from any URL I feed it, then hand back a clean, de-duplicated .txt file through a Telegram bot interface.

Here is the flow I have in mind:

  • I drop one or more target URLs into the Telegram bot.
  • Your script spins up 500+ concurrent requests with Python asyncio and HTTPX, routing every call through Scrape.do’s residential proxy network (I’ll share the API key directly).
  • As each page or endpoint returns, a regex routine hunts down every U.S. mobile number in the raw HTML or JSON.
  • A Redis Bloom filter provides global, cross-task deduplication, so we never store the same number twice, even across millions of records.
  • When the task finishes, the bot pushes back a downloadable .txt file containing only fresh, unique mobile numbers.
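As a rough illustration of the regex step, the sketch below matches North American Numbering Plan (NANP) numbers in raw HTML or JSON text and normalizes them to E.164 (`+1` plus ten digits). `NANP_RE` and `extract_us_numbers` are hypothetical names, and the pattern is deliberately loose (it does not exclude special codes such as N11). One caveat: in the U.S., mobile and landline numbers share area codes, so the number format alone cannot tell them apart. Meeting the "mobile only" requirement would need an extra line-type lookup, which this sketch does not attempt.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately loose NANP matcher: optional +1/1 prefix,
# area code and exchange each starting with 2-9, common separators.
NANP_RE = re.compile(
    r"""
    (?<![\d+])            # not preceded by a digit or '+', so we never start mid-number
    (?:\+?1[\s.\-]?)?     # optional country code
    \(?([2-9]\d{2})\)?    # area code, optionally parenthesized
    [\s.\-]?
    ([2-9]\d{2})          # exchange code
    [\s.\-]?
    (\d{4})               # line number
    (?!\d)                # not followed by another digit
    """,
    re.VERBOSE,
)


def extract_us_numbers(text: str) -> list[str]:
    """Return unique NANP numbers in `text` as E.164 strings, in first-seen order."""
    found: dict[str, None] = {}  # dict keeps insertion order and drops repeats
    for area, exchange, line in NANP_RE.findall(text):
        found.setdefault(f"+1{area}{exchange}{line}", None)
    return list(found)
```

For example, `extract_us_numbers('Call (212) 555-0123 or +1 312.555.0199')` returns `['+12125550123', '+13125550199']`. Normalizing to one canonical form before deduplication matters: `212-555-0123` and `"2125550123"` in JSON would otherwise count as two different numbers.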

Technical must-haves

  • Advanced mastery of asyncio + HTTPX (I expect you to squeeze maximum throughput without blocking).
  • Familiarity with Scrape.do request g & rotation logic.
  • Hands-on experience with Redis Bloom or an equivalent probabilistic filter at scale.
  • Clean, testable codebase (PEP 8, sensible structure, minimal external dependencies).
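One common way to keep a fixed number of requests in flight without blocking the event loop is an `asyncio.Semaphore` around each call. The sketch below keeps the concurrency logic transport-agnostic: `fetch_all` takes any `fetch(url)` coroutine, and `make_scrapedo_fetch` wraps an `httpx.AsyncClient`. The `https://api.scrape.do/` endpoint and its `token`/`url` query parameters are assumptions to confirm against Scrape.do's documentation; `fetch_all`, `make_scrapedo_fetch` and `SCRAPEDO_API` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Iterable

# A fetcher takes a target URL and returns the response body as text.
Fetch = Callable[[str], Awaitable[str]]

# Assumed Scrape.do endpoint; confirm the URL and parameters in Scrape.do's docs.
SCRAPEDO_API = "https://api.scrape.do/"


async def fetch_all(
    urls: Iterable[str], fetch: Fetch, limit: int = 500
) -> dict[str, str | BaseException]:
    """Fetch every URL with at most `limit` requests in flight.

    Failures are returned in place of the body, so one bad page does not
    cancel the rest of the batch.
    """
    sem = asyncio.Semaphore(limit)

    async def one(url: str) -> str:
        async with sem:  # waits for a slot without blocking the event loop
            return await fetch(url)

    url_list = list(urls)
    results = await asyncio.gather(
        *(one(u) for u in url_list), return_exceptions=True
    )
    return dict(zip(url_list, results))


def make_scrapedo_fetch(client: Any, token: str, api: str = SCRAPEDO_API) -> Fetch:
    """Route each request through Scrape.do using an httpx.AsyncClient.

    The `token` and `url` query parameter names are assumptions.
    """

    async def fetch(url: str) -> str:
        resp = await client.get(api, params={"token": token, "url": url})
        resp.raise_for_status()
        return resp.text

    return fetch


# Usage sketch (requires `pip install httpx`; not run here):
#   async with httpx.AsyncClient(
#       limits=httpx.Limits(max_connections=500), timeout=httpx.Timeout(30.0)
#   ) as client:
#       pages = await fetch_all(urls, make_scrapedo_fetch(client, TOKEN))
```

HTTPX's default connection pool allows only 100 connections, so the client's `httpx.Limits(max_connections=...)` needs to be raised to match the semaphore; otherwise requests queue for a pool slot instead of running concurrently. For very long URL lists, a fixed pool of worker tasks pulling from an `asyncio.Queue` avoids creating one task per URL up front.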

Deliverables

  1. Production-ready Python project (scripts, requirements.txt, README).
  2. Telegram bot module wired to the scraper and export routine.
  3. Redis Bloom setup script or Docker service.
  4. One short video or markdown walkthrough showing local setup, running a sample scrape, and exporting results.
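For the Redis Bloom deliverable, a setup-plus-dedup sketch might look like the following. It talks to Redis only through the generic `execute_command` method (which a redis-py `redis.Redis` client provides), using the RedisBloom commands `BF.RESERVE` and `BF.MADD`. The key name `phones:seen` and the helpers `ensure_filter` and `keep_fresh` are hypothetical; the 0.1 % error rate and 5-million capacity mirror the acceptance criteria.

```python
from __future__ import annotations

from typing import Any, Iterable

# Local Redis with the Bloom module, for example:
#   docker run -d -p 6379:6379 redis/redis-stack-server:latest
# then: r = redis.Redis(host="localhost", port=6379)

FILTER_KEY = "phones:seen"  # hypothetical key name
ERROR_RATE = 0.001          # target false-positive rate (0.1 %)
CAPACITY = 5_000_000        # expected number of unique numbers


def ensure_filter(
    r: Any, key: str = FILTER_KEY, error_rate: float = ERROR_RATE, capacity: int = CAPACITY
) -> bool:
    """Create the Bloom filter once; return True if it was created by this call.

    BF.RESERVE fails if the key already exists, so check first.
    """
    if r.execute_command("EXISTS", key):
        return False
    r.execute_command("BF.RESERVE", key, error_rate, capacity)
    return True


def keep_fresh(r: Any, numbers: Iterable[str], key: str = FILTER_KEY) -> list[str]:
    """Return only numbers the filter has not seen, adding them in one round trip.

    BF.MADD returns 1 for each item it newly added and 0 for each item that
    may already exist (including repeats within the same batch).
    """
    batch = list(numbers)
    if not batch:
        return []
    flags = r.execute_command("BF.MADD", key, *batch)
    return [n for n, added in zip(batch, flags) if added]
```

Because `BF.MADD` tests and inserts in a single Redis command, two scraping tasks racing on the same number cannot both treat it as fresh. Calling `ensure_filter` before the first `BF.MADD` matters: `BF.MADD` on a missing key creates a filter with default parameters rather than the sized one.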

Acceptance criteria

  • Sustains ≥500 concurrent requests without timeouts on a mid-tier VPS.
  • Captures only U.S. mobile numbers; false positives <1 %.
  • Redis Bloom stores no duplicates and wrongly rejects fewer than 0.1 % of fresh numbers (false positives) over a 5-million-record test run.
  • Telegram bot responds within 3 s to start/stop/status commands and delivers the final .txt automatically.
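To check whether the 0.1 % target over 5 million records is realistic, the standard Bloom filter sizing formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions, can be evaluated directly. For n = 5,000,000 and p = 0.001 this comes to roughly 71.9 million bits (about 8.6 MiB) and 10 hash functions, which fits easily on a mid-tier VPS. RedisBloom chooses its own internal layout, so its actual memory use may differ somewhat; `bloom_size` is a hypothetical helper.

```python
from __future__ import annotations

import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Optimal bit count m and hash count k for n items at false-positive rate p.

    m = -n * ln(p) / (ln 2)^2,  k = (m / n) * ln 2
    """
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


if __name__ == "__main__":
    m, k = bloom_size(5_000_000, 0.001)
    print(f"{m:,} bits (~{m / 8 / 2**20:.1f} MiB), {k} hash functions")
```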

The budget in the brief ($400 - $600 fixed) covers the full scope, including a week of post-delivery bug fixes. Let me know your estimated timeline and any prior projects that prove you can hit these concurrency and data-cleanliness targets.

Skills

Software Development, HTML, API Integration, Python, Software Architecture, Redis, Docker, Data Processing, Linux, API, Web Scraping
