
Lessons from running 50M+ messages a month

Five hard-won lessons from operating high-throughput message infrastructure across 100k+ businesses.

Alex Petrov
Engineering Director
Mar 12, 2026

Where we are

MyChatBot processes 50M+ messages per month across WhatsApp, Telegram, Instagram, voice and email, up from 1M per month two years ago. Here's how our thinking changed as we scaled.

1. Tail latency is the only latency that matters

We spent the first year optimizing median latency. The median was fine; customers feel P95 and P99, and that's where the bad experiences live. Now every dashboard we look at shows P99 first.
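To see why the median hides the problem, here is a minimal Python sketch of the nearest-rank percentile. This is one of several percentile definitions, and monitoring systems usually estimate percentiles from histograms rather than by sorting raw samples. The latencies are made-up numbers for illustration, not MyChatBot data.

```python
import math

def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank; multiplying before dividing keeps the arithmetic exact for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies: 98 fast requests and 2 slow ones.
latencies = [40.0] * 98 + [2500.0, 3100.0]
print(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))
# 40.0 40.0 2500.0
```

With two slow requests out of a hundred, the median and even P95 stay at 40 ms while P99 jumps to 2.5 s. Those two requests are the customers who notice.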

2. Idempotency or you'll cry

Every message handler must be idempotent. Networks fail. Webhooks retry. If your handler isn't idempotent, a retried delivery sends a duplicate message to a real customer. We learned this the embarrassing way.
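A minimal Python sketch of the usual pattern: deduplicate on an idempotency key before doing the side effect. It assumes each incoming event carries a key that stays stable across retries (for example, an ID the upstream platform attaches to the event); the class name and the `msg-123` key are hypothetical. The in-memory set is for illustration only. A real deployment would need a shared, persistent store with an atomic "insert if absent" operation.

```python
from typing import Callable

class IdempotentSender:
    """Hypothetical wrapper: sends at most once per idempotency key (in-memory sketch)."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._processed: set[str] = set()

    def handle(self, idempotency_key: str, text: str) -> bool:
        """Returns True if the message was sent, False if this key was already processed."""
        if idempotency_key in self._processed:
            return False  # duplicate delivery (e.g. a webhook retry): do nothing
        # If send raises, the key is not recorded, so a later retry can try again.
        # A crash between send and add could still cause one duplicate; closing that
        # gap needs the store and the send to be coordinated, which this sketch skips.
        self._send(text)
        self._processed.add(idempotency_key)
        return True

sent: list[str] = []
handler = IdempotentSender(sent.append)
handler.handle("msg-123", "Your order has shipped")  # first delivery
handler.handle("msg-123", "Your order has shipped")  # webhook retry of the same event
print(sent)  # ['Your order has shipped']
```

Recording the key only after a successful send favors "retry until it works" over "never send twice". Which side of that trade-off you want depends on whether a lost message or a duplicate hurts your customers more.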

3. Backpressure beats rate limits

Rate limits feel safe, but they push the problem upstream, usually to your customers, who then complain. Backpressure (slowing down gracefully when downstream is slow) keeps the system stable without surprising anyone.
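A minimal asyncio sketch of backpressure. It assumes a single in-process queue between a producer (say, webhook intake) and a consumer that calls a slow downstream API; the function names and the `max_in_flight` parameter are hypothetical. The bounded `asyncio.Queue(maxsize=...)` is the mechanism: when the queue is full, `await queue.put(...)` suspends the producer until the consumer catches up. Across services the same idea shows up as bounded buffers and flow control. This sketch only covers the in-process case.

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        # put() suspends while the queue is full, so the producer slows to the
        # consumer's pace instead of dropping or rejecting messages.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more messages

async def consumer(queue: asyncio.Queue, delivered: list[int], depth_seen: list[int]) -> None:
    while True:
        depth_seen.append(queue.qsize())  # record how much work is waiting
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # stand-in for a slow downstream API call
        delivered.append(item)

async def main(n: int = 50, max_in_flight: int = 5) -> tuple[list[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_in_flight)
    delivered: list[int] = []
    depth_seen: list[int] = []
    await asyncio.gather(producer(queue, n), consumer(queue, delivered, depth_seen))
    return delivered, max(depth_seen)

delivered, peak_depth = asyncio.run(main())
print(len(delivered), peak_depth)
```

Nothing is dropped and the queue never holds more than `max_in_flight` items; the producer simply waits. A fixed rate limit, by contrast, rejects or delays work according to a static budget, whether or not the consumer is actually struggling.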

4. Logs are not metrics

Don't try to derive metrics from logs at scale. Build first-class metrics: counters and histograms your code updates directly. Logs are for debugging individual cases; metrics are for understanding the system.
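A hand-rolled Python sketch of what "first-class" means here. The code updates a labeled counter and a fixed-bucket histogram at the moment a send completes. Answering "what share of sends took over 250 ms?" then costs the same at 1M or 50M messages, unlike scanning logs. The bucket bounds, label names and class name are hypothetical. In practice you would use a metrics client library (the Prometheus client libraries, for instance, offer counter and histogram types) rather than writing your own.

```python
import bisect
from collections import Counter

# Labeled counter: one integer per (channel, outcome) pair, however much traffic flows through.
messages_total: Counter = Counter()

class LatencyHistogram:
    """Fixed-bucket latency histogram: memory depends on the bucket count, not on traffic."""

    def __init__(self, bounds_ms: list[float]) -> None:
        self.bounds_ms = sorted(bounds_ms)
        # counts[i] covers (bounds[i-1], bounds[i]]; the last slot holds values above the top bound.
        self.counts = [0] * (len(self.bounds_ms) + 1)
        self.total = 0

    def observe(self, latency_ms: float) -> None:
        # bisect_left returns the first bound >= latency_ms, i.e. the bucket it belongs to.
        self.counts[bisect.bisect_left(self.bounds_ms, latency_ms)] += 1
        self.total += 1

    def fraction_over(self, threshold_ms: float) -> float:
        """Share of observations above threshold_ms, which must be one of the bucket bounds."""
        if threshold_ms not in self.bounds_ms:
            raise ValueError("threshold must be a bucket bound for an exact answer")
        if self.total == 0:
            return 0.0
        idx = bisect.bisect_right(self.bounds_ms, threshold_ms)
        return sum(self.counts[idx:]) / self.total

# Hypothetical instrumentation at the point where a send completes.
send_latency = LatencyHistogram([50, 100, 250, 1000])
for channel, outcome, ms in [
    ("whatsapp", "delivered", 30), ("whatsapp", "delivered", 45),
    ("telegram", "delivered", 80), ("email", "delivered", 120),
    ("whatsapp", "delivered", 900), ("instagram", "failed", 2500),
]:
    messages_total[(channel, outcome)] += 1
    send_latency.observe(ms)

print(send_latency.counts)  # [2, 1, 1, 1, 1]
```

The logs for those same sends can still carry message IDs and error details for debugging one case. The aggregates above are what a dashboard or alert should read.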

5. Pager hygiene is engineering culture

If your team is paged at night for things that aren't actionable, you'll burn them out. Every pageable alert must have a runbook and a clear action. If either is missing, the alert shouldn't page.
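One way to enforce that rule mechanically is to lint alert definitions in CI. A minimal Python sketch, assuming alerts are loaded into a simple record with hypothetical `pages`, `runbook_url` and `action` fields; real alerting tools keep this metadata in their own formats. The alert names and URL are also made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    pages: bool            # True if this alert wakes someone up
    runbook_url: str = ""  # hypothetical field names; adapt to your alerting config
    action: str = ""

def pager_violations(alerts: list[Alert]) -> list[str]:
    """Names of paging alerts that are missing a runbook or a clear action."""
    return [
        a.name
        for a in alerts
        if a.pages and not (a.runbook_url.strip() and a.action.strip())
    ]

alerts = [
    Alert("send_p99_latency_high", pages=True,
          runbook_url="https://runbooks.example.com/send-latency",
          action="Fail over sends to the secondary provider"),
    Alert("disk_80_percent", pages=True),               # no runbook, no action
    Alert("webhook_retries_elevated", pages=False),     # non-paging alerts are exempt
]
print(pager_violations(alerts))  # ['disk_80_percent']
```

Failing the build on a non-empty result forces the choice the rule asks for: write the runbook and action, or downgrade the alert so it stops paging.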

#operations #engineering
Alex Petrov
Engineering Director

Runs engineering. Previously CTO at two startups. Strong opinions on engineering culture, weakly held.
