Research Programme

Evidence-First Agricultural Intelligence Research

FildraAI is built as a research programme, not just a product. We connect computer vision for crop disease detection, automatic speech recognition for field audio, machine learning for continuous agricultural variables, and structured knowledge systems, all validated in real field environments across crops, livestock, and fisheries. Each component is evaluated with transparent methods so researchers, agronomists, and farmers can see how the system behaves in practice.

Research Philosophy

Our Approach to Agricultural AI

We use AI and machine learning to support agronomy — not to replace agronomists. Three core principles shape how we design models, collect data, and communicate results.

Agricultural AI must earn trust through transparency, not demand it through authority.

When a model suggests "Northern Leaf Blight" or recommends a specific fungicide application rate, the agronomist in the field needs to understand why. They need to see what the model observed, compare it against their own experience, and make an informed decision. Black-box predictions have no place in agriculture, where livelihoods depend on getting it right.

Our research programme is designed around this reality. Every model we deploy comes with explainability tools. Every recommendation traces back to evidence. Every knowledge base entry cites its sources. We measure success not just by accuracy metrics, but by whether farmers and agronomists can understand and act on our outputs.

"Lab metrics are useful, but on-farm performance is the real benchmark. A model that achieves 98% accuracy on curated datasets but fails when farmers capture images under real conditions is not a successful model — it is a research artefact that never made the transition to practice."

Principle 01

Explainability First

Predictions must be easy to inspect. All deployed image models support AI focus area maps and structured outputs showing typical symptoms, look-alike issues, and management options. Agronomists can see why a result was suggested, not just the final label.

Principle 02

Field-First Validation

We prioritise data gathered in real fields — mixed cropping, partial nutrient stress, complex backgrounds — and compare early predictions with end-of-season outcomes. Lab metrics provide a starting point. On-farm performance is the real benchmark.

Principle 03

Transparent Methods, Safe Details

We share evaluation setups, baselines, and typical failure modes. Sensitive items — deployment pipelines, internal hyperparameters, customer data — stay private. But the behaviour and limitations of our models are open for discussion.

Academic Foundations

Datasets & Research Foundations

Our models are informed by publicly available datasets and research from leading institutions across Africa, Asia, and North America — combined with our own field data. We treat these datasets as shared scientific infrastructure and follow citation and licensing conditions for every source.

9+
Cited Datasets
6+
Countries Represented
3
Research Domains
4
Crops in Production

We separate datasets used for research benchmarking from those permitted in production deployment, respecting the licensing conditions of each source.

🇹🇿

Nelson Mandela African Institution of Science and Technology

Arusha, Tanzania

Publishes Harvard Dataverse maize disease datasets with field-collected images and expert labels for Northern Leaf Blight, Gray Leaf Spot, and Common Rust — captured directly from Tanzanian farms under real agricultural conditions.

Maize Harvard Dataverse
🇺🇬

Makerere University

Kampala, Uganda

Hosts maize image collections with clear field protocols across Ugandan agro-ecologies and seasons, with rigorous annotation standards and disease classification.

Maize Peer Reviewed
🇳🇦

Namibia University of Science and Technology

Windhoek, Namibia

Contributes maize disease datasets from semi-arid and arid systems, extending evaluation into Southern African environments with light conditions and stress combinations that differ from many global benchmarks.

Maize Southern Africa
🇬🇭

KaraAgro AI

Accra, Ghana

Provides curated maize imagery from West African farms, improving geographic balance across our maize-focused models and including healthy plants alongside multiple disease classes.

Maize West Africa
🇺🇸

PlantVillage — Penn State University

Pennsylvania, USA

One of the most widely cited plant disease image repositories. Where licensing is restrictive, we use PlantVillage primarily as a reference benchmark rather than production training data.

Multi-Crop Benchmark
🇮🇳

PlantDoc — IIT

India

Introduces in-the-wild imagery rather than controlled lab conditions. Useful for stress-testing robustness and identifying failure modes where background clutter and image quality differ from curated field images.

Multi-Crop ACM Published
🇧🇩

Rice Leaf Disease and Pest Dataset

Mendeley Data, 2024

A comprehensive rice leaf disease and pest dataset providing annotated imagery for multiple disease classes and pest types. Supports training and evaluation of rice-specific visual diagnosis models across diverse growing conditions.

Rifat, S.H.; Layes, T.A.; Hasan, A.; Mojumdar, M.U. (2024). Rice Leaf Disease and Pest Dataset Overview. Mendeley Data, V1. doi: 10.17632/vwv3nry3wr.1
Rice Mendeley Data

Rice Leaf Diseases — UCI Machine Learning Repository

UCI Repository, 2017

A foundational rice disease dataset covering bacterial blight, blast, and brown spot — three of the most economically damaging rice diseases. Provides early benchmarks for leaf-level visual classification.

Shah, J., Prajapati, H., & Dabhi, V. (2017). Rice Leaf Diseases [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5R013
Rice UCI Repository
🇳🇵

Rice Disease Dataset — Kaggle

Kaggle, 2021

A community-contributed rice disease dataset with images spanning multiple disease conditions in South Asian growing contexts. Broadens the geographic and phenotypic diversity of our rice model training and evaluation data.

Shrestha, N.L.; Pandey, P.; Tiwari, A.; Giri, R. (2021). Rice Disease Dataset. Kaggle. doi: 10.34740/KAGGLE/DSV/2481060
Rice Kaggle

Proper Citation & Licensing: We cite all datasets and papers in our technical documentation and publications. Each repository is accessed under its own licence terms. Some datasets are used only for research benchmarking; production models are trained on sources whose licences are compatible with commercial deployment.

Full citation details, including DOIs and BibTeX entries, are maintained in our internal technical notes and shared with research partners who would like to reproduce or extend our experiments.

Research Tracks

How Our Research Fits Together

The programme is organised into connected tracks: computer vision for crop images, automatic speech recognition for field audio, machine learning for continuous variables, an evidence-first knowledge system, and field validation. Each track informs the others.

Understanding how these tracks connect explains why our recommendations are contextual, explainable, and grounded in evidence — rather than opaque model outputs.

Track 01 · Computer Vision Core Track

Crop Disease Detection & AI Focus Areas

DenseSwin (maize), residual networks (rice), and compact CNN baselines (tomato, cassava) form the foundation of our plant health pipeline. We focus on performance under farmer-captured image conditions and on making model attention visible through AI focus area overlays.

  • Hybrid CNN + attention architectures tuned for maize, rice, tomato, and cassava
  • Explainability via AI focus areas plus symptom, look-alike, and management summaries
  • Confidence calibration so probabilities translate into actionable advice
  • Evaluation across diverse African and Asian field conditions
Explore AI Models
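The focus-area overlays described above can be sketched as a post-processing step: a coarse class-activation map from the final convolutional layer is normalised, upsampled to image resolution, and thresholded into the regions shown to the agronomist. This is a minimal illustration of the idea, not our production pipeline; the function name, threshold, and upsampling choice are assumptions for the sketch.

```python
import numpy as np

def focus_area_overlay(activation: np.ndarray, image_hw: tuple,
                       threshold: float = 0.6) -> np.ndarray:
    """Turn a coarse class-activation map into a binary focus mask
    at full image resolution (Grad-CAM-style post-processing)."""
    # Normalise activations to [0, 1]; guard against a flat map.
    a = activation - activation.min()
    peak = a.max()
    a = a / peak if peak > 0 else a
    # Upsample to image resolution by block replication (nearest neighbour).
    h, w = image_hw
    reps_h = -(-h // a.shape[0])  # ceiling division
    reps_w = -(-w // a.shape[1])
    up = np.kron(a, np.ones((reps_h, reps_w)))[:h, :w]
    # Keep only the regions the model attended to most strongly.
    return up >= threshold

# Toy example: a 4x4 activation map with a hotspot in the top-left corner
# becomes an 8x8 mask highlighting the corresponding image region.
cam = np.zeros((4, 4))
cam[0, 0] = 1.0
mask = focus_area_overlay(cam, (8, 8))
```

In practice the binary mask would be rendered as a coloured overlay on the original photo, next to the symptom, look-alike, and management summaries.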
Track 02 · Speech & Audio FieldAudio

Automatic Speech Recognition for Field Use

FieldAudio is active research — not a roadmap item. We are fine-tuning ASR models for African Bantu languages (Swahili, Chichewa, Tumbuka, Nyanja), building TTS synthesis calibrated to agricultural vocabulary in these languages, and evaluating STT pipelines under real field conditions: wind, livestock noise, and low-bandwidth connections.

  • ASR fine-tuning on agricultural command vocabulary across Bantu language families
  • TTS synthesis optimised for how farmers actually speak field terms and crop names
  • STT evaluation in real field conditions: wind, livestock noise, low-bandwidth connections
  • Language coverage: Swahili (Kenya/Tanzania), Chichewa (Malawi/Zambia), and expanding
Explore FieldAudio
Track 03 · Machine Learning Core Track

ML for Rates, Yields & Scenarios

When questions move from "what is this?" to "how much should I apply?", we use supervised regression and statistical models. These estimate continuous targets such as fertiliser rates, spray volumes, and expected yield — not single numbers, but ranges with confidence intervals.

  • Agronomy-aware features: nutrient balances, growing degree days, stress indicators
  • Regularised linear models and tree-based ensembles for continuous predictions
  • Conservative / typical / upper confidence bands rather than single-point estimates
  • Safety guardrails derived from regulation, labels, and internal policies
Explore ML Models
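The conservative / typical / upper bands can be illustrated with a deliberately simple recipe: fit a regression model, then derive asymmetric band offsets from empirical residual quantiles. This is a sketch of the banding idea only, using a plain least-squares model on a made-up soil-deficit feature; our production models and features differ.

```python
import numpy as np

def rate_bands(X: np.ndarray, y: np.ndarray, x_new: np.ndarray,
               q: float = 0.1) -> dict:
    """Fit a least-squares linear model and report conservative /
    typical / upper estimates from empirical residual quantiles."""
    # Add an intercept column and solve the least-squares problem.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    # Residual quantiles give (possibly asymmetric) band offsets.
    lo, hi = np.quantile(residuals, [q, 1 - q])
    typical = float(np.concatenate([[1.0], x_new]) @ coef)
    return {"conservative": typical + lo,
            "typical": typical,
            "upper": typical + hi}

# Synthetic example: application rate roughly linear in a soil-deficit score.
rng = np.random.default_rng(0)
deficit = rng.uniform(0, 10, size=200)
rate = 20 + 5 * deficit + rng.normal(0, 2, size=200)
bands = rate_bands(deficit.reshape(-1, 1), rate, np.array([4.0]))
```

Reporting the three figures together, rather than a single point estimate, is what lets an advisor choose the conservative end of the range when conditions are uncertain.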
Track 04 · Knowledge Discovery Evidence-First

FieldKB & Evidence-First Search

FieldKB is our structured agronomy knowledge system. It links regulations, practice notes, field images, and model outputs into one searchable, country-aware layer — always showing its sources and avoiding one-size-fits-all answers.

  • Multilingual, country-aware retrieval routing by crop × country × topic
  • Text and image evidence with clear citations and licence tags
  • Integration with AI focus areas so users can ask "why this diagnosis?" and see visual evidence
  • RAG-style orchestration favouring transparent, document-backed answers
Browse FieldKB
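The crop × country × topic routing can be sketched as a fallback over increasingly general keys: prefer a country-specific collection for the exact crop and topic, then fall back to broader collections. The collection names and key scheme below are illustrative assumptions, not FieldKB's actual schema.

```python
# Hypothetical FieldKB routing table: None acts as a wildcard for
# "any crop", "any country", or "any topic". Names are illustrative.
COLLECTIONS = {
    ("maize", "ZM", "disease"): "zm-maize-disease-notes",
    ("maize", None, "disease"): "maize-disease-general",
    (None, "ZM", "regulation"): "zm-agro-regulations",
    (None, None, None): "global-agronomy-core",
}

def route(crop, country, topic) -> str:
    """Return the collection for the most specific matching key,
    falling back from (crop, country, topic) to broader keys."""
    for key in [(crop, country, topic),
                (crop, None, topic),
                (None, country, topic),
                (None, None, None)]:
        if key in COLLECTIONS:
            return COLLECTIONS[key]
    return COLLECTIONS[(None, None, None)]
```

A Zambian maize disease query lands in the country-specific notes, while the same question from Malawi falls back to the general maize disease collection — which is how the system avoids one-size-fits-all answers without failing when no local source exists.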
Track 05 · Field Validation Field-Driven

Field Trials & On-Farm Evaluation

Our primary field operations are based in Zambia, spanning smallholder belts and commercial hubs. We collect images, management histories, and outcomes to test how models behave in practice and keep the knowledge base grounded in real farms.

  • Sites across Eastern, Central, and Southern Provinces and the Lusaka corridor
  • Linked datasets: images, weather summaries, soil tests, and management logs
  • Season-end reviews comparing model suggestions with realised yields and outcomes
  • Data-use agreements and privacy controls for farmer and partner data
See Data Sources
Track 06 · Domain Expansion Scoping

Livestock, Fisheries & New Crops

We are actively scoping research into livestock health monitoring, aquaculture and inland fisheries intelligence, and additional staple crops. These domains share the same accountability requirements as our crop work and will follow the same validation-first approach before deployment.

  • Livestock health monitoring — visual and behavioural indicator research
  • Fisheries — aquaculture conditions, fish health, and yield intelligence
  • Additional crops — sorghum, groundnut, soybean, cassava expansion
  • Same evidence-first, field-validated approach applied across all new domains
Scoping Phase — Not Yet Deployed

Expanding Scope

Where We Are Going

As our validation phases progress, we are beginning to scope research into adjacent agricultural domains. These expansions follow the same discipline that governs our current work: validate before deploying, and never overclaim what has not been proven in the field.

These are research directions, not product commitments. We list them here transparently so partners, researchers, and farmers understand the trajectory of our work.

Active Research

Agricultural ASR & FieldAudio

Automatic speech recognition optimised for noisy field conditions and low-resource agricultural languages. Research is active and informing FieldAudio product development.

Scoping

Livestock Health Intelligence

Visual and behavioural indicator research for common livestock conditions in smallholder farming contexts across sub-Saharan Africa. Dataset scoping and partner identification underway.

Scoping

Aquaculture & Fisheries

Inland fisheries and aquaculture intelligence for smallholder fish farmers — water quality indicators, fish health, and environmental monitoring. Early-stage scoping across East and Southern Africa.

Scoping

Sorghum & Millet

Expanding visual diagnosis to sorghum and pearl millet — drought-tolerant staples critical to food security in arid and semi-arid regions where our platform is being validated.

Scoping

Legumes: Groundnut & Soybean

Groundnut and soybean are critical rotation crops in our field validation regions. Extending disease and pest intelligence to these crops is a near-term research priority.

Planned

Expanded Multilingual ASR

Beyond current language support, we are planning research into additional agricultural languages across East, West, and Southern Africa to broaden accessibility for FieldAudio.

Integration

From Input to Recommendation

Understanding how our research tracks connect explains why FildraAI recommendations are contextual, explainable, and grounded in evidence — not opaque model outputs.

When a farmer or advisor submits a query — whether an image for diagnosis, a voice note describing symptoms, or a question about treatment rates — the request flows through multiple research tracks. Each track adds context, validation, and explainability before results reach the user.

Step 01

Input — Image or Voice

A farmer captures a crop image or records a voice note describing symptoms. FieldAudio transcribes speech to structured queries; FieldVision processes the image through crop-specific computer vision models, generating AI focus area overlays that make model attention visible.

Step 02

Knowledge Retrieval & Evidence Assembly

The diagnosis, crop, and location feed into FieldKB, which retrieves country-aware guidance, regulatory information, and management practices. Users see the sources behind every recommendation — government guidelines, peer-reviewed research, or validated field data.

Step 03

Continuous Variable Estimation

When the question is "how much?" — fertiliser rates, spray volumes, expected yields — ML regression models propose rate ranges. Rather than a single number, users receive conservative, typical, and upper-bound estimates with confidence intervals.

Step 04

Field Validation Feedback Loop

Fieldwork and on-farm evaluation feed back into all tracks. When we compare model predictions with actual outcomes at season end, the results update datasets, stress-test models, and refine FieldKB entries — keeping our research grounded in what actually happens in fields.

Transparency

Openness with Responsibility

We aim to make our research easy to understand and scrutinise while protecting production-critical details that enable continued investment in field-driven agricultural AI.

What We Share

Open Research Components

Evaluation protocols, high-level architectures, performance metrics, and typical failure modes are documented so partners can reproduce results or challenge assumptions. Model behaviour characteristics and known limitations — including where systems should not be trusted — are open for discussion.

What Stays Protected

Production & Partner Data

Full training pipelines, internal feature engineering, hyperparameter configurations, and partner-specific datasets remain confidential. This allows continued investment in long-term, field-driven AI development. Customer and farmer data is never shared beyond agreed use cases.

Data Privacy

Farmer & Field Data Governance

Field data is collected under clear data-use agreements with explicit consent. Public pages aggregate information at province or district level — we never publish farmer-identifying details. Usage data improves our services but is never sold. Farmers retain ownership of their agricultural data.

Limitations

Where We Acknowledge Uncertainty

Our tools support agronomy decisions but do not replace local expertise, labels, or regulation. Where models are uncertain — overlapping stress symptoms, out-of-distribution images, novel disease presentations — we emphasise caution and defer to human judgement.

Licensing & Provenance: FieldKB entries record source, licence, and region for every piece of evidence. Datasets with non-commercial or restrictive terms are separated from production training and used only for research or benchmarking purposes.

Model Limitations: We explicitly document where models should not be trusted — novel pathogens, unusual environmental conditions, crops outside our training distribution, and languages with insufficient ASR training data. Transparency about limitations builds more trust than claims of universal applicability.

Interested in Research Collaboration?

We collaborate with universities, research institutes, agribusinesses, and public agencies. If you have datasets to share, models to benchmark, or field trials to design — across crops, livestock, fisheries, or audio — we welcome joint work on accountable agricultural AI grounded in real fields and clear evidence.