AI Agent Content Aggregation Setup
Description
Our community app needs a hands-off content pipeline that pulls the latest industry news, updates, trends, classes, events, and full manufacturer catalogs directly from websites, blogs, social media, and RSS feeds. The flow should:
• Crawl or subscribe to the sources above, extract fresh text, images, and PDFs
• Clean, structure, and tag the data so it is searchable by topic, date, and supplier
• Generate a simple, brand-consistent cover image when the source lacks one
• Hold each item in a staging queue for my quick review
• Push approved items to the app through its REST/GraphQL API, on a daily or weekly schedule I can adjust
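To make the staging and review step concrete, here is a minimal sketch of how a staged item and its queue might look. It uses only the Python standard library; the names `StagedItem`, `StagingQueue`, and `Status`, and all field names, are hypothetical and chosen for illustration. A real build would likely back this with a database rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class StagedItem:
    """One piece of aggregated content awaiting review (hypothetical schema)."""
    title: str
    source_url: str                 # kept so every item is traceable to its origin
    published: date
    supplier: str
    topics: List[str] = field(default_factory=list)
    cover_image: Optional[str] = None   # None means a cover still has to be generated
    status: Status = Status.PENDING


class StagingQueue:
    """In-memory staging queue keyed by source URL (assumption: URL is unique)."""

    def __init__(self) -> None:
        self._items: Dict[str, StagedItem] = {}

    def add(self, item: StagedItem) -> None:
        # setdefault keeps the first copy, so re-crawling a page adds no duplicate
        self._items.setdefault(item.source_url, item)

    def review(self, source_url: str, approve: bool) -> None:
        item = self._items[source_url]
        item.status = Status.APPROVED if approve else Status.REJECTED

    def approved(self) -> List[StagedItem]:
        # Only these items would be pushed to the app on the next scheduled run
        return [i for i in self._items.values() if i.status is Status.APPROVED]

    def search(self, topic=None, supplier=None, since=None) -> List[StagedItem]:
        # Filter by any combination of topic, supplier, and earliest publish date
        results = []
        for item in self._items.values():
            if topic is not None and topic not in item.topics:
                continue
            if supplier is not None and item.supplier != supplier:
                continue
            if since is not None and item.published < since:
                continue
            results.append(item)
        return results
```

Keying the queue on `source_url` is one simple way to get de-duplication and traceability at the same time; a content hash would be an alternative if the same article can appear under several URLs.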
For catalogs, be ready to download PDFs or scrape product pages, then pull out spec sheets and key attributes. Everything must remain traceable back to the original URL or document.
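As a rough illustration of the catalog step, the sketch below pulls "Key: Value" attributes out of spec-sheet text and keeps the source URL alongside them. It assumes the text has already been extracted from the PDF or product page by some other tool; the function name `extract_attributes` and the "Key: Value" layout are assumptions for this example, and real spec sheets will often need table-aware parsing instead.

```python
import re

# Matches lines such as "Weight: 4.5 kg" or "Max Pressure (bar): 10".
# Naive on purpose: any line with a leading label and a colon will match.
ATTR_LINE = re.compile(r"^\s*([A-Za-z][\w /()-]*?)\s*:\s*(.+?)\s*$")


def extract_attributes(text: str, source_url: str) -> dict:
    """Return spec attributes plus the URL they came from (hypothetical helper)."""
    attributes = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            key = match.group(1).strip().lower()   # normalise keys for searching
            attributes[key] = match.group(2)
    # source_url travels with the data so each attribute stays traceable
    return {"source_url": source_url, "attributes": attributes}
```

For example, calling `extract_attributes("Model: X200\nWeight: 4.5 kg", "https://example.com/x200.pdf")` would return a dict whose `attributes` contain `model` and `weight`, with the PDF URL stored under `source_url`.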
Build the agents end-to-end: scraper, NLP enrichment, database, scheduler, and API integration. Python (Scrapy, LangChain, OpenAI, BeautifulSoup) is ideal, but a comparable stack is acceptable if you can deliver the same reliability and speed.
Timing is critical; I’d like a working MVP ASAP, followed by refinements once we see real data flowing. Please share a concise plan, the toolset you prefer, and examples of similar automations you have already deployed.
Rate: USD 25–50/hr
Skills: Java, Python, Web Scraping, Software Architecture, Scrapy, API Integration, Natural Language Processing, LangChain