AI Research 11 min read

Why we trained our own embedding model

Off-the-shelf embeddings worked. Our custom model worked dramatically better. Here's what we learned about domain specificity.

Anna Roman
Lead Researcher
Apr 14, 2026

The use case

Most of our customers run knowledge bases, product catalogs, FAQs, and internal docs. The agent retrieves relevant chunks at query time, so retrieval quality directly drives answer quality.
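To make "retrieves relevant chunks at query time" concrete, here is a minimal sketch of embedding-based retrieval: score every chunk against the query by cosine similarity and keep the top k. The function names (`cosine`, `retrieve_top_k`) and the dict-of-vectors layout are assumptions for illustration, not our production code, and a real system would typically use an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, k=5):
    """Return the ids of the k chunks most similar to the query.

    chunk_vecs is a hypothetical mapping of chunk id -> embedding vector.
    """
    scored = [(cosine(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]
```

The embedding model only decides where the vectors land; everything after that is this kind of similarity ranking, which is why a better-fitting model shows up directly in retrieval quality.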

We started with OpenAI text-embedding-3-large. Then Cohere. Then a popular open-source model. Each was decent. None was great for our specific data.

Training a domain-specific model

We fine-tuned a 350M-parameter base model on 8M query-document pairs from our customer data (with consent and proper privacy controls). Training took 4 days on 8 H100s.
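This post does not spell out the training objective. As an illustration only, a common choice for query-document pairs is a contrastive loss with in-batch negatives: each query's paired document is the positive, and every other document in the batch serves as a negative. The sketch below assumes L2-normalized vectors, and the temperature value is a hypothetical default, not a setting from our run.

```python
import math


def in_batch_contrastive_loss(query_vecs, doc_vecs, temperature=0.05):
    """Mean cross-entropy over a batch of (query, document) pairs.

    doc_vecs[i] is the positive for query_vecs[i]; all other documents
    in the batch act as negatives. Vectors are assumed L2-normalized,
    so the dot product equals cosine similarity.
    """
    if len(query_vecs) != len(doc_vecs):
        raise ValueError("expected one document per query")
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [sum(a * b for a, b in zip(q, d)) / temperature
                  for d in doc_vecs]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(query_vecs)
```

The loss is near zero when each query scores its own document far above the rest of the batch, and grows as unrelated documents score higher than the true pair.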

[Figure: Training loss curves]

Our model is smaller than text-embedding-3-large but performs measurably better on our retrieval benchmark. Domain specificity beats raw scale when the domain is concentrated enough.

Results

Retrieval recall@5 on our benchmark rose from 71% (the best off-the-shelf model) to 88%, a gain of 17 percentage points. End-to-end answer quality, as judged by humans, improved by 14 percentage points.
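For readers unfamiliar with the metric, here is a small sketch of how recall@k can be computed. It assumes the definition "fraction of a query's relevant documents that appear in the top k results, averaged over queries"; the benchmark's exact definition is not given here, and when each query has a single relevant document this coincides with the hit rate. The names `recall_at_k`, `retrieved`, and `relevant` are illustrative.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Average recall@k over all queries that have relevant documents.

    retrieved: query id -> ranked list of document ids (best first)
    relevant:  query id -> collection of relevant document ids
    """
    scores = []
    for query_id, relevant_ids in relevant.items():
        if not relevant_ids:
            continue  # recall is undefined when nothing is relevant
        top_k = set(retrieved.get(query_id, [])[:k])
        scores.append(len(top_k & set(relevant_ids)) / len(relevant_ids))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with two queries.
retrieved = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d8"]}
relevant = {"q1": {"d1", "d4"}, "q2": {"d8"}}
print(recall_at_k(retrieved, relevant, k=2))  # 0.75
```

In the toy example, q1 finds one of its two relevant documents in the top 2 (0.5) and q2 finds its only relevant document (1.0), so the average is 0.75.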

We're publishing the eval methodology in our research GitHub repo. We are not releasing the weights, which are competitively sensitive.

#ml #research
Anna Roman
Lead Researcher

Leads our applied ML research. Published widely on multi-agent systems. Believes good evals are 80% of good AI.
