High-Speed Python Phone Scraper
Description
Budget: $400 - $600
I need a blisteringly fast, fully asynchronous Python scraper that can pull U.S. mobile-phone leads from any URL I feed it, then hand back a clean, de-duplicated .txt file through a Telegram bot interface.
Here is the flow I have in mind:
• I drop one or more target URLs into the Telegram bot.
• Your script spins up 500+ concurrent requests with Python asyncio and HTTPX, routing every call through Scrape.do's residential proxy network (I'll share the API key directly).
• As each page or endpoint returns, a regex routine hunts down every U.S. mobile number in the raw HTML or JSON.
• A Redis Bloom filter provides global, cross-task de-duplication so we never store the same number twice, even across millions of records.
• When the task finishes, the bot pushes back a downloadable .txt file containing only fresh, unique mobile numbers.
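The concurrency step in the flow above can be sketched with a semaphore that caps the number of requests in flight. This is a minimal illustration, not the required implementation: `fetch_stub` and `fetch_all` are hypothetical names, and the stub only sleeps instead of making a real HTTP call, so the block runs with the standard library alone. In the real project the stub would presumably be replaced by an HTTPX async client call routed through the proxy.

```python
import asyncio


async def fetch_stub(url):
    """Hypothetical stand-in for a real HTTP call; returns a fake body."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>body of {url}</html>"


async def fetch_all(urls, fetch=fetch_stub, limit=500):
    """Fetch every URL with at most `limit` requests in flight at once.

    Returns (results, peak), where results[i] is the body for urls[i]
    (or the exception raised for it) and peak is the highest number of
    simultaneous requests observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` workers are busy
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch(url)
            except Exception as exc:
                # Return the error so one bad URL does not abort the batch.
                return exc
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(url) for url in urls))
    return list(results), peak
```

A call such as `asyncio.run(fetch_all(urls, limit=500))` returns the bodies in input order. Tracking `peak` is one simple way to check the 500-request target during a test run.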
Technical must-haves
- Advanced mastery of asyncio + HTTPX (I expect you to squeeze maximum throughput without blocking).
- Familiarity with Scrape.do request g & rotation logic.
- Hands-on experience with Redis Bloom or an equivalent probabilistic filter at scale.
- Clean, testable codebase (PEP 8, sensible structure, minimal external dependencies).
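For the probabilistic-filter requirement, it may help to estimate the filter size up front. The sketch below applies the textbook formulas for a classic Bloom filter, using 5 million items and a 0.1% error rate as example inputs taken from this brief. The function names are hypothetical, and Redis Bloom's internal sizing may differ from this idealized model. One caveat: a Bloom filter never lets a true duplicate through, so its error rate describes fresh numbers that are wrongly treated as already seen and dropped.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Return (bits, hash_count) for a classic Bloom filter.

    bits  = -n * ln(p) / (ln 2)^2
    hashes = (bits / n) * ln 2
    """
    if capacity <= 0 or not 0 < error_rate < 1:
        raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


def expected_false_positive_rate(bits, hashes, inserted):
    """Approximate false-positive rate after `inserted` distinct items."""
    return (1 - math.exp(-hashes * inserted / bits)) ** hashes


if __name__ == "__main__":
    bits, hashes = bloom_parameters(5_000_000, 0.001)
    print(f"{bits / 8 / 1_000_000:.1f} MB, {hashes} hash functions")
```

Under these assumptions the filter needs roughly 9 MB and 10 hash functions, small enough for a mid-tier VPS. With Redis Bloom, the same capacity and error rate would typically be passed to `BF.RESERVE` when the key is created.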
Deliverables
- Production-ready Python project (scripts, requirements.txt, README).
- Telegram bot module wired to the scraper and export routine.
- Redis Bloom setup script or Docker service.
- One short video or markdown walkthrough showing local setup, running a sample scrape, and exporting results.
Acceptance criteria
- Sustains ≥500 concurrent requests without timeouts on a mid-tier VPS.
- Captures only U.S. mobile numbers; false positives <1 %.
- Redis Bloom keeps duplicates below 0.1 % over a 5-million-record test run.
- Telegram bot responds within 3 s to start/stop/status commands and delivers the final .txt automatically.
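The number-capture criterion can be illustrated with a small extraction routine. This is a sketch under stated assumptions: the pattern and the name `extract_nanp_numbers` are hypothetical, and the rules only check North American Numbering Plan format (area code and exchange start with 2-9 and are not N11 service codes). Digits alone cannot distinguish a mobile line from a landline, so meeting the "mobile only" target would likely need a separate line-type lookup, which is not shown here.

```python
import re

# Candidate U.S. numbers: optional +1, then area code, exchange, and line.
_CANDIDATE = re.compile(
    r"(?<![\d.])"                 # not glued to a preceding digit or decimal point
    r"(?:\+?1[\s.-]?)?"           # optional country code
    r"\(?([2-9]\d{2})\)?[\s.-]?"  # area code, first digit 2-9
    r"([2-9]\d{2})[\s.-]?"        # exchange, first digit 2-9
    r"(\d{4})"                    # line number
    r"(?!\d)"                     # not followed by another digit
)


def extract_nanp_numbers(text):
    """Return unique NANP-formatted numbers as +1XXXXXXXXXX, in first-seen order."""
    seen = set()
    ordered = []
    for area, exchange, line in _CANDIDATE.findall(text):
        # N11 codes (211, 411, 911, ...) are service codes, not subscriber numbers.
        if area[1:] == "11" or exchange[1:] == "11":
            continue
        number = f"+1{area}{exchange}{line}"
        if number not in seen:
            seen.add(number)
            ordered.append(number)
    return ordered
```

Normalizing every match to one canonical form before the de-duplication step matters, because "(202) 555-0123" and "+1 202.555.0123" would otherwise be counted as different numbers.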
The budget in the brief ($400 - $600 fixed) covers the full scope, including a week of post-delivery bug fixes. Let me know your estimated timeline and any prior projects that prove you can hit these concurrency and data-cleanliness targets.