Text & Audio API: Translation and transcription service
A content-agnostic translation and transcription service. Submit text or audio in your input language, specify a target language and an output format (text or audio), and, if the language pair is supported, receive the translated result back in that format. The API does not interpret content or assume a domain; what you do with it is up to you.
What is the Text & Audio API?
The Text and Audio API is a stateless, content-agnostic translation and transcription service. You supply input (text or audio) and specify the input language, the target language, and the output format you want back (text or audio). If the language pair is supported, the API returns the translated result. The service does not interpret what you send: there is no agricultural model, no follow-up logic, and no memory between calls.
Two flows are supported: text → text or audio (submit a string, receive the translation as text or synthesised speech), and audio → text or audio (submit an audio file, receive a transcript, optionally translated and synthesised back to speech).
Local dialect coverage is the focus
Built to expand translation and speech support to African and Bantu languages that mainstream tools cover poorly. Production validation on May 13, 2026 returned 200 for synchronous text translation with a user key, 202 for async text job creation with a user key, and 202 for audio upload with a server key. Job polling and locale listing are server-audience routes in the current policy.
Authentication and scopes
All endpoints require a valid API key. Use user keys only for creating text translation requests. Use a server key for audio uploads, locale discovery, and job polling.
X-Api-Key: YOUR_API_KEY
# Bearer style is also accepted
Authorization: Bearer YOUR_API_KEY
Text and audio operations use separate scopes, allowing you to grant minimal permissions to each integration:
Text scope: field:text
- POST /api/v1/text: Synchronous text translation
- POST /api/v1/text/jobs: Create async text job
- GET /api/v1/text/jobs/job_abc123: Check job status with a server key
Audio scope: field:audio
- POST /api/v1/audio: Upload audio file for transcription/translation
- GET /api/v1/audio/audio_job_abc123: Check audio job status
- GET /api/v1/locales: List supported languages
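The key policy above can be captured as a small lookup table. This is a sketch: the routes are transcribed from this page, while key_audience, auth_headers, and the templated {job_id} form of the polling routes are illustrative choices, not part of the API.

```python
from typing import Dict

# Key audience per route, transcribed from the policy on this page:
# user keys create text translations; server keys handle audio uploads,
# locale discovery, and all job polling.
ROUTE_AUDIENCE = {
    ("POST", "/api/v1/text"): "user",
    ("POST", "/api/v1/text/jobs"): "user",
    ("GET", "/api/v1/text/jobs/{job_id}"): "server",
    ("POST", "/api/v1/audio"): "server",
    ("GET", "/api/v1/audio/{job_id}"): "server",
    ("GET", "/api/v1/locales"): "server",
}


def key_audience(method: str, route: str) -> str:
    """Return "user" or "server" for a templated route such as /api/v1/audio/{job_id}."""
    try:
        return ROUTE_AUDIENCE[(method.upper(), route)]
    except KeyError:
        raise ValueError(f"unknown route: {method} {route}") from None


def auth_headers(api_key: str, bearer: bool = False) -> Dict[str, str]:
    """Build either of the two accepted authentication header styles."""
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-Api-Key": api_key}
```

A client holding both keys in a dict such as keys = {"user": ..., "server": ...} could then call auth_headers(keys[key_audience("GET", "/api/v1/locales")]).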
Billing
POST /api/v1/text and POST /api/v1/text/jobs are billed per call under the field_text billing endpoint: 1 credit, or 2 when you request spoken output. POST /api/v1/audio is billed by length: it reserves the worst-case credits when you submit, then settles the exact cost and refunds the difference once the real duration is measured (base 1 + 1 per 30s in + 1 per 30s spoken out). GET endpoints for status polling and locale listing are not billed.
Worked example: a 45-second clip translated to spoken audio with about 20 seconds of speech settles to 4 credits (base 1, plus 2 for the input audio at 1 per 30s, plus 1 for the spoken output). On submit the worst case is reserved; the unused hold is refunded once the real duration is measured.
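The billing rules reduce to a few lines of arithmetic. This sketch assumes each started 30-second block is billed as a full block (a ceiling), which reproduces the worked example (1 + ceil(45/30) + ceil(20/30) = 1 + 2 + 1 = 4) but is not stated outright. It computes the settled cost only, not the worst-case hold taken at submit time; the function names are hypothetical.

```python
import math


def text_credits(spoken_output: bool) -> int:
    """POST /api/v1/text and /api/v1/text/jobs: 1 credit, or 2 with spoken output."""
    return 2 if spoken_output else 1


def audio_credits(input_seconds: float, spoken_seconds: float = 0.0) -> int:
    """Settled cost of POST /api/v1/audio: base 1 + 1 per 30s in + 1 per 30s spoken out.

    Assumption: a partial 30-second block is rounded up to a full block.
    """
    if input_seconds < 0 or spoken_seconds < 0:
        raise ValueError("durations must be non-negative")
    return 1 + math.ceil(input_seconds / 30) + math.ceil(spoken_seconds / 30)
```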
Synchronous text translation
POST /api/v1/text translates a text string and optionally synthesises it to audio in a single synchronous call. Use this for short strings where low latency matters.
Required fields
- text (string): The input text to translate
- lang (string): Target language code (e.g. sw for Swahili, zh for Mandarin). See the supported-languages section below for the full list.
- output (string): Either "text" or "audio"
Optional fields
- source_lang (string, default "auto"): Source language code; set to "auto" for automatic detection
Example: text output
curl -X POST "https://api.fildraai.com/api/v1/text" \
-H "X-Api-Key: YOUR_USER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, how are you today?",
"lang": "sw",
"output": "text",
"source_lang": "en"
}'
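For Python callers, here is a minimal sketch of the same request using only the standard library (urllib) rather than requests. The helper names build_text_request and translate_text are hypothetical; the URL, headers, and body fields come from the cURL example above.

```python
import json
import urllib.request

API_BASE = "https://api.fildraai.com"


def build_text_request(api_key: str, text: str, lang: str,
                       output: str = "text", source_lang: str = "auto") -> urllib.request.Request:
    """Build the same POST /api/v1/text request as the cURL example."""
    body = {"text": text, "lang": lang, "output": output, "source_lang": source_lang}
    return urllib.request.Request(
        f"{API_BASE}/api/v1/text",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def translate_text(api_key: str, text: str, lang: str, **kwargs) -> dict:
    """Send the request and decode the JSON body (urllib raises HTTPError on 4xx/5xx)."""
    req = build_text_request(api_key, text, lang, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: translate_text("YOUR_USER_API_KEY", "Hello, how are you today?", "sw", source_lang="en")["text"]. Splitting request construction from sending keeps the request easy to inspect without a network call.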
Example response
{
"ok": true,
"input_modality": "text",
"output_modality": "text",
"source_lang": "en",
"source_detection": "langdetect",
"target_lang": "sw",
"target_display": "Swahili",
"translation_used": true,
"translation_backend": "bedrock",
"text": "Habari, habari yako leo?",
"audio_url": null,
"audio_format": null,
"audio_duration_seconds": null,
"latency_ms": 1090.5
}
The response includes translation_used (whether translation was applied), text (the output string), and, when output is "audio", an audio_url link and audio_format.
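A sketch of reading those fields from the decoded JSON body. The TextResult type and parse_text_response name are illustrative, and treating a body without "ok": true as a failure is an assumption, since error bodies for this endpoint are not shown on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextResult:
    text: str
    translated: bool              # mirrors translation_used
    audio_url: Optional[str]      # None when output is "text"
    audio_format: Optional[str]


def parse_text_response(resp: dict) -> TextResult:
    """Extract the fields described above from a decoded POST /api/v1/text body."""
    if resp.get("ok") is not True:
        # Assumption: error bodies are not documented here, so anything
        # without "ok": true is treated as a failure.
        raise RuntimeError(f"unexpected response: {resp!r}")
    return TextResult(
        text=resp["text"],
        translated=bool(resp.get("translation_used")),
        audio_url=resp.get("audio_url"),
        audio_format=resp.get("audio_format"),
    )
```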
Asynchronous audio jobs
Audio processing (transcription and synthesis) is handled asynchronously via a job queue, so large audio files are processed in the background without blocking your application.
Known limitation: spoken-audio download
When you request output: "audio", the job returns an audio_url, but that link is not directly downloadable yet. Until this is fixed, use output: "text" if you need the result programmatically. Transcription and translation text are unaffected.
Upload audio
POST /api/v1/audio accepts a multipart form upload containing the audio file (field file) plus lang and output form fields; source_lang is optional. It returns a job_id immediately with HTTP 202.
Poll for status
GET /api/v1/audio/audio_job_abc123 returns the current job status. Replace audio_job_abc123 with the job_id returned by the upload request. When status is DONE, the response includes the transcript, translation, and (if synthesis was requested) an audio URL. Note: the synthesized audio URL is currently not directly downloadable; prefer text output until this is fixed.
Accepted formats
Common audio container formats are accepted: MP3, M4A, WAV, OGG, WebM, and FLAC. The maximum file size is 15 MB, but the effective limit is your plan's audio duration cap (Free 60s, Basic 300s, Pro 900s): a longer clip is rejected even when the file is small. Live in-browser recordings from the FieldAudio playground are capped at 10 seconds. Shorter clips process faster.
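Because the plan duration cap, not the byte size, is usually the binding limit, it can help to measure a clip before uploading. A minimal sketch, assuming PCM WAV input: Python's standard wave module reads the duration from the header, while other formats (MP3, M4A, and so on) would need a separate tool such as ffprobe. PLAN_CAPS_SECONDS copies the caps listed above; the function names are hypothetical.

```python
import wave

# Plan audio duration caps in seconds, copied from the list above.
PLAN_CAPS_SECONDS = {"free": 60, "basic": 300, "pro": 900}


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its frame count and frame rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def fits_plan(path: str, plan: str) -> bool:
    """True if the WAV clip is within the plan's duration cap."""
    return wav_duration_seconds(path) <= PLAN_CAPS_SECONDS[plan.lower()]
```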
Async text jobs
For longer text that needs audio synthesis, use POST /api/v1/text/jobs (user key) to submit a text translation job asynchronously. Retrieve the result from your backend with a server key via GET /api/v1/text/jobs/job_abc123, replacing job_abc123 with the returned job_id.
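A sketch of the two calls for an async text job, built with urllib. It assumes the job body uses the same fields as POST /api/v1/text and that the 202 response carries a job_id like the audio submit response; neither is spelled out on this page. Note the different keys: a user key to create, a server key to poll. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.fildraai.com"


def create_text_job_request(user_key: str, text: str, lang: str,
                            output: str = "audio", source_lang: str = "auto") -> urllib.request.Request:
    """POST /api/v1/text/jobs with a user key; body fields assumed to match POST /api/v1/text."""
    body = {"text": text, "lang": lang, "output": output, "source_lang": source_lang}
    return urllib.request.Request(
        f"{API_BASE}/api/v1/text/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": user_key, "Content-Type": "application/json"},
        method="POST",
    )


def text_job_status_request(server_key: str, job_id: str) -> urllib.request.Request:
    """GET /api/v1/text/jobs/{job_id}; polling requires a server-audience key."""
    safe_id = urllib.parse.quote(job_id, safe="")  # keep the id a single path segment
    return urllib.request.Request(
        f"{API_BASE}/api/v1/text/jobs/{safe_id}",
        headers={"X-Api-Key": server_key},
        method="GET",
    )
```

Send each request from your backend with urllib.request.urlopen(req) and decode the JSON body.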
Example: upload audio (cURL)
curl -X POST "https://api.fildraai.com/api/v1/audio" \
-H "X-Api-Key: YOUR_SERVER_API_KEY" \
-F "file=@recording.wav" \
-F "lang=sw" \
-F "output=text" \
-F "source_lang=auto"
Python: upload audio (requests)
import requests
url = "https://api.fildraai.com/api/v1/audio"
headers = {"X-Api-Key": "YOUR_SERVER_API_KEY"}
# `files` triggers multipart/form-data automatically.
with open("/absolute/path/to/recording.wav", "rb") as fh:
    response = requests.post(
        url,
        headers=headers,
        data={"lang": "sw", "output": "text", "source_lang": "auto"},
        files={"file": ("recording.wav", fh, "audio/wav")},
        timeout=60,
    )
response.raise_for_status()
job = response.json()
print("job_id:", job["job_id"])
# Then poll /api/v1/audio/{job_id} until status is "DONE".
JavaScript: upload audio (fetch + FormData)
// Works in browsers and Node 18+ (built-in fetch + FormData).
// `audioFile` is a File or Blob, from <input type="file"> or recorded audio.
async function uploadAudio(audioFile) {
  const form = new FormData();
  form.append('file', audioFile, audioFile.name || 'recording.wav');
  form.append('lang', 'sw');
  form.append('output', 'text');
  form.append('source_lang', 'auto');
  const response = await fetch('https://api.fildraai.com/api/v1/audio', {
    method: 'POST',
    headers: { 'X-Api-Key': 'YOUR_SERVER_API_KEY' },
    // IMPORTANT: do NOT set Content-Type; fetch adds the correct
    // multipart boundary automatically when the body is FormData.
    body: form,
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const job = await response.json();
  return job.job_id; // poll /api/v1/audio/{job_id} until status is "DONE"
}
Test in Postman
Postman is the fastest way to verify that your API key and audio file work before you write any code. The steps below mirror the cURL, Python, and JavaScript examples above.
- Create a new request. Method POST, URL https://api.fildraai.com/api/v1/audio.
- Add the API key header. Headers tab → add X-Api-Key with your server-audience key value.
- Switch the body to multipart. Body tab → select form-data.
- Add the audio file and form fields. Create a key file, change its type from Text to File, and pick your .wav/.mp3/.webm. Add three text rows: lang=sw, output=text, source_lang=auto.
- Send it. A successful response returns 202 with a job_id. Poll GET /api/v1/audio/{job_id} with the same X-Api-Key header until status: "DONE".
Screenshots (PNG files in /static/img/docs/postman/):
- postman_audio_01_url.png: POST selected and the full URL pasted in.
- postman_audio_02_header.png: X-Api-Key plus your masked server-audience key.
- postman_audio_03_formdata.png: the file row toggled to File with a selected WAV, and the three text rows below it.
- postman_audio_04_response.png: the job_id and initial status: "QUEUED".
Tip: save the API key as a Postman environment variable (e.g. {{FILDRA_SERVER_KEY}}) so you don't paste it into every request and so screenshots you share don't leak it.
Polling job status
After submitting an audio or text job, poll the status endpoint from your backend until the job reaches a terminal state. The current production policy requires a server-audience key for status polling.
Active states
- QUEUED: Job received and waiting to be processed
- PROCESSING: Transcription or synthesis is in progress
Terminal states
- DONE: Processing finished; results are available in the response
- FAILED: Processing failed; check error_code and error_message for details
Example responses
Submitting a job returns 202 with a job_id. Polling GET /api/v1/audio/{job_id} returns 200 at every stage; read status to know where the job is.
# 1. Submit response (HTTP 202)
{ "ok": true, "status": "QUEUED", "job_id": "53aeb56cfb8f4c1fb5c2d64198f2f9be" }
# 2. Completed job (HTTP 200, status DONE)
{
"status": "DONE",
"job_id": "53aeb56cfb8f4c1fb5c2d64198f2f9be",
"input_modality": "audio",
"output_modality": "text",
"source_lang": "en",
"target_lang": "sw",
"transcript_text": "Good morning, the maize in the lower field is turning yellow.",
"text": "Habari za asubuhi, mahindi katika shambani la chini yanabadilika kuwa manjano.",
"audio_url": null,
"audio_duration_seconds": null,
"error_code": null,
"error_message": null
}
# 3. Failed job (HTTP 200, status FAILED)
{
"status": "FAILED",
"job_id": "…",
"error_code": "adapter_job_failed",
"error_message": "Audio exceeds the maximum duration for your plan."
}
When you requested output: "audio", a DONE job also carries audio_url, audio_format, and audio_duration_seconds. Note: the synthesized audio_url is not directly downloadable yet (see the warning above); prefer text output until that is fixed.
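A sketch of turning a polled job body into a result, based on the three example responses above. job_result and JobFailed are hypothetical names; the function returns None while the job is QUEUED or PROCESSING, so it can sit inside any polling loop.

```python
from typing import Optional

ACTIVE_STATES = {"QUEUED", "PROCESSING"}


class JobFailed(Exception):
    """Raised when a job ends in FAILED; carries the API's error fields."""

    def __init__(self, code: Optional[str], message: Optional[str]):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def job_result(job: dict) -> Optional[dict]:
    """Return the useful fields of a DONE job, None while still active, raise on FAILED."""
    status = job.get("status")
    if status in ACTIVE_STATES:
        return None
    if status == "FAILED":
        raise JobFailed(job.get("error_code"), job.get("error_message"))
    if status == "DONE":
        return {
            "transcript": job.get("transcript_text"),  # present for audio jobs
            "text": job.get("text"),
            "audio_url": job.get("audio_url"),         # set only for output "audio"
        }
    raise ValueError(f"unexpected status: {status!r}")
```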
Polling interval
For short audio messages, start polling after 3–5 seconds and use exponential backoff; most jobs complete within 10–30 seconds. Avoid polling more often than once every 2 seconds to stay within rate limits.
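One way to follow this guidance is a loop that waits about 4 seconds before the first poll, then doubles the interval starting from the 2-second floor. The max_interval and timeout defaults are assumptions, not documented limits, and fetch_status stands in for whatever function performs the GET with your server key. Passing sleep in as a parameter keeps the loop testable.

```python
import time
from typing import Callable


def poll_until_terminal(fetch_status: Callable[[], dict],
                        initial_delay: float = 4.0,
                        min_interval: float = 2.0,
                        max_interval: float = 15.0,
                        timeout: float = 120.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until status is DONE or FAILED, doubling the wait between polls.

    fetch_status performs GET /api/v1/audio/{job_id} (or /api/v1/text/jobs/{job_id})
    and returns the decoded JSON body. max_interval and timeout are assumptions.
    """
    sleep(initial_delay)        # first poll after 3-5 s, per the guidance above
    waited = initial_delay
    interval = min_interval     # never poll more often than once every 2 s
    while True:
        job = fetch_status()
        if job.get("status") in ("DONE", "FAILED"):
            return job
        if waited + interval > timeout:
            raise TimeoutError(f"job still {job.get('status')} after {waited:.0f} s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)   # exponential backoff, capped
```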
Supported languages
Use GET /api/v1/locales to retrieve the current list of supported languages at runtime. Each locale entry includes whether audio synthesis (audio_supported) is available, since not all text locales have voice synthesis.
The table below is the built-in baseline list. It covers 14 languages; all 14 translate text and also speak back with output: "audio".
| Language | Code | Text translation | Audio (TTS) |
|---|---|---|---|
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
|  |  | Yes | Yes |
Languages marked No for audio translate text but return 400 (audio_unavailable) if you request output: "audio".
Field name: lang, not target_lang
The request body uses lang as the target language code (the response reports it back as target_lang). source_lang is optional and defaults to auto-detection. Codes are short (en, sw, zh), not regional forms like sw-ke.
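If your application stores regional locale tags, a small normalisation step keeps requests in the short form. This is a sketch under the assumption that the primary subtag is the code the API expects; to_short_code is a hypothetical helper, and its result should still be checked against GET /api/v1/locales.

```python
import re

# Primary language subtag: two or three letters (lowercased before matching).
_PRIMARY_SUBTAG = re.compile(r"[a-z]{2,3}")


def to_short_code(tag: str) -> str:
    """Reduce a tag such as "sw-KE" or "zh_CN" to its short form ("sw", "zh").

    "auto" passes through unchanged so the helper also works for source_lang.
    """
    cleaned = tag.strip().lower()
    if cleaned == "auto":
        return cleaned
    primary = re.split(r"[-_]", cleaned, maxsplit=1)[0]
    if not _PRIMARY_SUBTAG.fullmatch(primary):
        raise ValueError(f"not a language tag: {tag!r}")
    return primary
```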
Check at runtime
Language support expands over time. Always query /api/v1/locales at runtime rather than hardcoding the supported language list in your application.
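A sketch of using the runtime list to decide between audio and text output. It assumes each locale entry has a code field next to audio_supported and that you have already fetched and decoded the list; the exact response shape of /api/v1/locales is not shown on this page, and choose_output is a hypothetical name.

```python
from typing import Iterable, Mapping


def choose_output(lang: str, locales: Iterable[Mapping], want_audio: bool) -> str:
    """Pick "audio" or "text" for a target language from /api/v1/locales entries.

    Assumption: entries look like {"code": "sw", "audio_supported": True, ...}.
    """
    for entry in locales:
        if entry.get("code") == lang:
            if want_audio and entry.get("audio_supported"):
                return "audio"
            return "text"   # avoids the 400 returned when audio is unavailable
    raise LookupError(f"target language {lang!r} not in the locales list")
```

Falling back to text keeps the call from failing; if your users expect speech, you may prefer to surface the missing voice instead of silently downgrading.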
Output modes
The output parameter controls what the API returns. Both text and audio endpoints support the same two modes:
output: "text"
Returns the translated text string in the target language. No audio synthesis. Use this when the downstream system will display or speak the text through its own TTS engine.
output: "audio"
Translates the text and synthesises it to speech in the target language. Returns an audio_url to the generated audio file and the audio_format (e.g. mp3).
Error codes
Common error responses from the Text and Audio API:
Client errors (4xx)
- 400 empty_text: The input text is empty
- 400 unsupported_target_lang: The target language is not supported
- 400 audio_unavailable: Audio synthesis is not supported for this language
- 400 invalid_output: output must be "text" or "audio"
- 401: Missing or invalid API key
- 404: Job not found (for status polling)
Server errors (5xx)
- 503: Translation or synthesis service temporarily unavailable
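Several of the 400 responses can be caught before a request is sent. The sketch below mirrors the client error codes; precheck_text_request is a hypothetical helper, the supported and audio_supported sets are assumed to come from GET /api/v1/locales, and treating whitespace-only text as empty is an assumption. The server remains the source of truth.

```python
from typing import Optional, Set

VALID_OUTPUTS = {"text", "audio"}


def precheck_text_request(text: str, lang: str, output: str,
                          supported: Set[str], audio_supported: Set[str]) -> Optional[str]:
    """Return the 400 error code the API would likely send, or None if the request looks valid."""
    if not text.strip():                 # assumption: whitespace-only counts as empty
        return "empty_text"
    if output not in VALID_OUTPUTS:
        return "invalid_output"
    if lang not in supported:
        return "unsupported_target_lang"
    if output == "audio" and lang not in audio_supported:
        return "audio_unavailable"
    return None
```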
Audio file validation
Audio uploads require a non-empty file. If the file field is missing from the multipart form or the uploaded file is empty, the API returns 400 immediately, before the job is queued. Validate file size and presence before uploading to avoid wasted calls.
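A minimal preflight check along those lines. It assumes the accepted formats map to the file extensions shown, and reads "15 MB" as 15,000,000 bytes, the stricter of the two common readings; preflight_audio is a hypothetical name.

```python
import os

ACCEPTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".webm", ".flac"}
MAX_UPLOAD_BYTES = 15_000_000  # assumption: decimal megabytes, the stricter reading


def preflight_audio(path: str) -> None:
    """Raise ValueError for problems the API would reject before queuing a job."""
    if not os.path.isfile(path):
        raise ValueError(f"no file at {path}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("audio file is empty")
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is 15 MB")
    ext = os.path.splitext(path)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unexpected extension {ext!r}")
```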