CompanyRemote

Italian, German and French Audio Dataset Collection

Deadline: 2026-04-04

Description

Budget: $10000 - $20000

This data collection project covers German, French, and Italian. We require 250 groups per language (750 groups in total), each consisting of two speakers engaging in natural, unscripted conversation, to produce high-quality dialogue datasets for AI model training.

All recordings must be made in a quiet indoor environment with minimal background noise and no echo, using professional recording equipment such as a microphone and sound card to ensure high-fidelity audio (48 kHz, 32-bit, mono). Each group will record multiple conversation segments on diverse everyday topics such as travel, health, shopping, technology, entertainment, and lifestyle. Each segment should run from 5 to 30 minutes, and the total recording duration per group is capped at 2 hours.

Participants should be between 18 and 45 years old, with a balanced male-to-female ratio, and must be fluent in the respective language (German, French, or Italian). Speech should be clear and natural, with no reading aloud or scripted behavior. Conversations should reflect real human interaction, including overlaps and interruptions, while strictly avoiding sensitive or inappropriate content. Each participant may contribute only once to ensure voice uniqueness, and proper consent for voice usage must be obtained before recording.

The final deliverables are fully recorded, quality-checked, and accurately annotated conversational datasets with complete metadata, suitable for advanced speech recognition, voice AI, and large language model training.
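The audio and duration requirements above could be checked automatically before delivery. The following is a minimal sketch in Python, not a prescribed tool: the function and class names are hypothetical, and it assumes recordings are delivered as WAV files with 32-bit integer PCM samples (the posting says only "32-bit"; Python's standard `wave` module does not read 32-bit float WAV files).

```python
import wave
from dataclasses import dataclass

# Limits taken from the posting; the constant names are illustrative.
SAMPLE_RATE_HZ = 48_000
SAMPLE_WIDTH_BYTES = 4            # 32-bit samples (assumed integer PCM)
CHANNELS = 1                      # mono
MIN_SEGMENT_S = 5 * 60            # 5 minutes
MAX_SEGMENT_S = 30 * 60           # 30 minutes
MAX_GROUP_TOTAL_S = 2 * 60 * 60   # 2 hours per group


@dataclass
class SegmentInfo:
    sample_rate_hz: int
    sample_width_bytes: int
    channels: int
    duration_s: float


def read_segment_info(path):
    """Read format and duration from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return SegmentInfo(
            sample_rate_hz=rate,
            sample_width_bytes=wav.getsampwidth(),
            channels=wav.getnchannels(),
            duration_s=wav.getnframes() / rate,
        )


def segment_problems(info):
    """Return a list of human-readable spec violations for one segment."""
    problems = []
    if info.sample_rate_hz != SAMPLE_RATE_HZ:
        problems.append(f"sample rate is {info.sample_rate_hz} Hz, expected {SAMPLE_RATE_HZ}")
    if info.sample_width_bytes != SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {info.sample_width_bytes * 8}-bit, expected 32-bit")
    if info.channels != CHANNELS:
        problems.append(f"{info.channels} channels, expected mono")
    if not MIN_SEGMENT_S <= info.duration_s <= MAX_SEGMENT_S:
        problems.append(f"duration is {info.duration_s:.0f} s, expected 300 to 1800 s")
    return problems


def group_problems(segments):
    """Check every segment of one speaker pair, plus the 2-hour total cap."""
    problems = []
    for index, info in enumerate(segments):
        problems.extend(f"segment {index}: {p}" for p in segment_problems(info))
    total_s = sum(info.duration_s for info in segments)
    if total_s > MAX_GROUP_TOTAL_S:
        problems.append(f"total duration is {total_s:.0f} s, cap is {MAX_GROUP_TOTAL_S} s")
    return problems
```

A group passes when `group_problems([read_segment_info(p) for p in paths])` returns an empty list. Acoustic criteria such as background noise, echo, and speech naturalness are outside the scope of this header check and would still need listening or separate signal analysis.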

Skills

Voice Talent, AI, Data Collection, Castilian Spanish Translator, AI Model Development, Natural Language Processing, Audio Services, Audio Processing, Echo, AI Development, Audio Engineering, Research
