Text & Audio API: Translation for Bantu and African languages

The Text and Audio API translates agricultural guidance into Bantu languages and converts it to speech — so farmers can receive advice in their own language, by voice, without requiring literacy or data connectivity. It supports synchronous text translation and asynchronous audio transcription and synthesis via a job queue.

POST /api/v1/text · Scope: field:text
POST /api/v1/audio · Scope: field:audio

What is the Text & Audio API?

The Text and Audio API is FildraAI's language access layer. Agricultural guidance produced by FieldGuide or other parts of the platform can be translated into local Bantu and African languages and optionally converted to spoken audio. This removes literacy and language barriers for smallholder farmers who may not read English or French but can receive voice guidance on a basic phone.

The API handles both directions of a language pipeline: you can send text and receive a translation (with optional audio output), or send an audio file (for example, a recorded farmer question) and receive a transcript and a translated response.

Designed for last-mile reach

This API was built specifically for contexts where connectivity is intermittent, devices are low-end, and farmers communicate in local languages that standard translation services do not support well. Audio jobs are processed asynchronously so they can be queued and retrieved when connectivity is available.

Authentication and scopes

All endpoints require a valid API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Text and audio operations use separate scopes, allowing you to grant minimal permissions to each integration:

Text scope — `field:text`

POST /api/v1/text — Synchronous text translation
POST /api/v1/text/jobs — Create async text job
GET /api/v1/text/jobs/{job_id} — Check job status

Audio scope — `field:audio`

POST /api/v1/audio — Upload audio file for transcription/translation
GET /api/v1/audio/{job_id} — Check audio job status
GET /api/v1/locales — List supported languages

Billing

POST /api/v1/text and POST /api/v1/text/jobs are billed per call under the field_text billing endpoint. Audio endpoints are server-audience only and are billed separately. GET endpoints (status polling and locale listing) are not billed.

Synchronous text translation

POST /api/v1/text translates a text string and optionally synthesises it to audio in a single synchronous call. Use this for short strings where low latency matters.

Required fields

text (string) — The input text to translate
lang (string) — Target language locale key (e.g. sw-ke)
output (string) — Either "text" or "audio"

Optional fields

source_lang (string, default "auto") — Source language; set to "auto" for automatic detection

Example — text output
POST /api/v1/text
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "text": "Apply urea fertiliser at 50kg per hectare before the first rains.",
  "lang": "sw-ke",
  "output": "text",
  "source_lang": "en-us"
}

The response includes translation_used (whether translation was applied), text (the output string), and — when output is "audio" — an audio_url link and audio_format.
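
The request above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library, not an official client: BASE_URL is a placeholder (the host is not stated here), the function names are illustrative, and the response is assumed to be a JSON object carrying the fields described above.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_text_request(api_key, text, lang, output="text", source_lang="auto"):
    """Build (but do not send) a POST /api/v1/text request."""
    if not text.strip():
        raise ValueError("text must not be empty")  # server would answer 400 empty_text
    if output not in ("text", "audio"):
        raise ValueError('output must be "text" or "audio"')  # 400 invalid_output
    payload = {"text": text, "lang": lang, "output": output, "source_lang": source_lang}
    return urllib.request.Request(
        BASE_URL + "/api/v1/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def translate(api_key, text, lang, **kwargs):
    """Send the request and return the parsed body (assumed to be JSON)."""
    request = build_text_request(api_key, text, lang, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Checking the two client-side conditions before sending avoids spending a billed call on a request that would be rejected with 400.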

Asynchronous audio jobs

Audio processing — transcription and synthesis — is handled asynchronously via a job queue. This allows large audio files to be processed in the background without blocking your application.

Upload audio

POST /api/v1/audio accepts a multipart form upload containing the audio file plus the lang and output form fields. It returns a job_id immediately.
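
For clients without an HTTP library that handles multipart encoding, the upload body can be assembled by hand. The sketch below is illustrative only: encode_multipart is a hypothetical helper, and the generic application/octet-stream content type is an assumption, since the content type the server expects for audio parts is not documented here.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Encode text fields and one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    # One part per plain form field (lang, output, source_lang, ...).
    for name, value in fields.items():
        lines.append(("--" + boundary).encode())
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode())
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # The file part carries a filename and its own content type.
    lines.append(("--" + boundary).encode())
    lines.append(
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (file_field, filename)).encode()
    )
    lines.append(("Content-Type: " + content_type).encode())
    lines.append(b"")
    lines.append(file_bytes)
    # Closing boundary, followed by a final CRLF.
    lines.append(("--" + boundary + "--").encode())
    lines.append(b"")
    return b"\r\n".join(lines), "multipart/form-data; boundary=" + boundary
```

The returned header value goes in Content-Type alongside the usual Authorization header. Most HTTP client libraries can build this body for you, which is usually the safer choice.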

Poll for status

GET /api/v1/audio/{job_id} returns the current job status. When status is completed, the response includes the transcript, the translation, and, if synthesis was requested, the audio URL.

Accepted formats

The audio endpoint accepts common audio formats. File size limits apply — see the API reference for current limits. Short voice messages (under 60 seconds) process fastest.

Async text jobs

For longer text that needs audio synthesis, use POST /api/v1/text/jobs to submit a text translation job asynchronously and retrieve the result with GET /api/v1/text/jobs/{job_id}.

Example — upload audio
POST /api/v1/audio
Authorization: Bearer YOUR_API_KEY
Content-Type: multipart/form-data

file=@farmer_question.m4a
lang=sw-ke
output=text
source_lang=auto

Polling job status

After submitting an audio or text job, poll the status endpoint until the job reaches a terminal state. Job status values:

Active states

queued — Job received and waiting to be processed
processing — Transcription or synthesis is in progress

Terminal states

completed — Processing finished; results are available in the response
failed — Processing failed; check error_code and error_message for details

Polling interval

For short audio messages, wait 3–5 seconds before the first poll, then back off exponentially; most jobs complete within 10–30 seconds. To stay within rate limits, do not poll more often than once every 2 seconds.
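
The polling guidance can be sketched as a small loop. This is one possible shape, not a prescribed client: fetch_status is a caller-supplied function that performs the GET and returns the parsed job as a dict with a "status" key (an assumption about the response shape), and the 30-second interval cap and 300-second overall timeout are illustrative values that do not come from the API.

```python
import time

TERMINAL_STATES = {"completed", "failed"}


def poll_job(fetch_status, initial_delay=3.0, min_interval=2.0,
             max_interval=30.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job is completed or failed, doubling the wait each time.

    The interval starts at min_interval (2 s, the documented floor) and is
    capped at max_interval; TimeoutError is raised once the planned waiting
    time would exceed timeout.
    """
    sleep(initial_delay)  # give the job a head start before the first poll
    waited = initial_delay
    interval = min_interval
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        if waited + interval > timeout:
            raise TimeoutError("job did not finish within %.0f seconds" % timeout)
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
```

Passing sleep as a parameter keeps the loop testable without real delays. A caller would typically check the returned job for failed and read error_code and error_message in that case.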

Supported languages

Use GET /api/v1/locales to retrieve the current list of supported languages at runtime. Each locale entry carries an audio_supported flag that indicates whether audio synthesis is available, since not all text locales have voice synthesis.

Languages with audio support

Kiswahili (East Africa) — sw-ke
French (West/Central Africa) — fr-af
English — en-us

Text translation only

Chichewa / Nyanja — ny-mw
Lingala — ln-cd
Simplified Chinese — zh-cn
Traditional Chinese — zh-tw

Check at runtime

Language support expands over time. Always query /api/v1/locales at runtime rather than hardcoding the supported language list in your application.
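
One way to use the runtime locale list is to pick the output mode per language and fall back to text where synthesis is missing. The sketch below assumes the parsed /api/v1/locales response is a list of objects, each with a locale key and the audio_supported flag; the locale key name and the list shape are assumptions, so adjust them to the actual response.

```python
def pick_output(locales, lang, prefer_audio=True):
    """Return "audio" or "text" for lang, or raise if lang is not listed.

    locales: list of dicts in the assumed shape, e.g.
    {"locale": "sw-ke", "audio_supported": True}
    """
    by_key = {entry["locale"]: entry for entry in locales}
    if lang not in by_key:
        # The server would reject this with 400 unsupported_target_lang.
        raise ValueError("unsupported target language: " + lang)
    if prefer_audio and by_key[lang].get("audio_supported", False):
        return "audio"
    # Text-only locale (or audio not wanted): avoids 400 audio_unavailable.
    return "text"
```

For instance, with an entry for ny-mw whose audio_supported is false, pick_output returns "text" even when audio is preferred.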

Output modes

The output parameter controls what the API returns. Both text and audio endpoints support the same two modes:

output: "text"

Returns the translated text string in the target language, with no audio synthesis. Use this when the downstream system will display the text or speak it through its own TTS engine.

output: "audio"

Translates the text and synthesises it to speech in the target language. Returns an audio_url to the generated audio file and the audio_format (e.g. mp3).

Error codes

Common error responses from the Text and Audio API:

Client errors (4xx)

400 empty_text — The input text is empty
400 unsupported_target_lang — The target language is not supported
400 audio_unavailable — Audio synthesis not supported for this language
400 invalid_output — output must be "text" or "audio"
401 — Missing or invalid API key
404 — Job not found (for status polling)

Server errors (5xx)

503 — Translation or synthesis service temporarily unavailable
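
A client might turn the codes above into a small decision function. The sketch below is illustrative: the action names are hypothetical, and how the error code is carried in the response body is not specified here, so the caller is assumed to have extracted it already.

```python
def next_action(http_status, error_code=None):
    """Map an error response to a suggested client action (names are illustrative)."""
    if http_status == 503:
        return "retry_later"       # temporary outage: retry with backoff
    if http_status == 401:
        return "check_api_key"     # retrying with the same key will not help
    if http_status == 404:
        return "unknown_job"       # job_id not found when polling
    if http_status == 400:
        if error_code == "audio_unavailable":
            return "fallback_to_text"  # resend with output set to "text"
        return "fix_request"       # empty_text, unsupported_target_lang, invalid_output
    return "unexpected"
```

Only the 503 case is worth retrying unchanged; the 4xx cases need a different request or different credentials.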

Audio file validation

Audio uploads require a non-empty file. If the file field is missing from the multipart form or the uploaded file is empty, the API returns 400 immediately — before the job is queued. Validate file size and presence before uploading to avoid wasted calls.
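
A pre-upload check along these lines could look like the following. It is a sketch under one explicit assumption: the size limit is not stated here, so max_bytes must be supplied by the caller from the API reference.

```python
import os


def check_audio_file(path, max_bytes):
    """Return the file size in bytes, or raise ValueError if the upload would be rejected.

    max_bytes is caller-supplied; take the current limit from the API reference.
    """
    if not os.path.isfile(path):
        raise ValueError("audio file not found: " + path)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("audio file is empty: " + path)
    if size > max_bytes:
        raise ValueError(
            "audio file is %d bytes, over the %d byte limit" % (size, max_bytes)
        )
    return size
```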