Company · Remote

AI Model Training Dialogue Data Collection

Deadline: 2026-04-04

Description

Budget: $10,000 to $20,000

For the German, French, and Italian dialogue data collection project, we require 250 groups per language (750 groups in total). Each group consists of two speakers holding natural, unscripted conversations, which will be used to build high-quality dialogue datasets for AI model training.

All recordings must be made in a quiet indoor environment with minimal background noise and no echo. Use professional recording equipment, such as a dedicated microphone and sound card, to ensure high-fidelity audio at 48 kHz, 32-bit, mono.

Each group will record multiple conversation segments on diverse everyday topics such as travel, health, shopping, technology, entertainment, and lifestyle. Each segment should last 5 to 30 minutes, and the total recording time per group is capped at 2 hours.

Participants must be aged 18 to 45, with a balanced male-to-female ratio, and must be fluent in the respective language (German, French, or Italian). Speech should be clear and natural, with no reading from text and no scripted delivery. Conversations should reflect real human interaction, including overlaps and interruptions, while strictly avoiding sensitive or inappropriate content. Each participant may contribute only once to ensure voice uniqueness, and consent for voice usage must be obtained before recording.

The final deliverables are fully recorded, quality-checked, and accurately annotated conversational datasets with complete metadata, suitable for training speech recognition, voice AI, and large language models.

Skills

Natural Language Processing, Voice Talent, Audio Services, AI Chatbot Development, AI, AI Research, Echo, AI Model Development, German Translator, Research