Data Explorer Slackbot

Go · Genkit · AWS Bedrock · LiteLLM · MCP · Kubernetes · Slack

Context

Getting business insights required knowing SQL, understanding complex data schemas, and having the time to write queries. With 120+ people across the company needing answers - driver experience teams checking market SLAs, executives looking at customer trends - everyone had questions but few could answer them. The data team was swamped with tickets, and one-off questions got ignored because they couldn't be prioritized over product work already in the backlog. Wait times stretched into weeks, so product decisions got made without the numbers. In a culture that required data-driven evidence for every decision, this bottleneck slowed everything down. By the time the data came back, the sunk cost fallacy had already kicked in.

Action

I built a Slackbot that opened up data access to everyone in the company. When AWS Bedrock Knowledge Bases turned out to be too limited and buggy, I built custom MCP servers in Go - a choice I stand by for its type safety and performance. The system pulls from our data warehouse plus real-time clickstream and usage data, all running through Genkit for local development, evals, and A/B testing. I worked through the full spectrum of LLM challenges: guardrails and gateways, infinite loops, long response times, tool failures, parsing issues, graceful degradation, retries, MCP connectivity and session management, context engineering, persistent conversations, and prompting techniques like ReAct. I designed it to be modular so other internal automation and API gateways could use the same infrastructure.
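To make the failure handling concrete, here is a minimal sketch in plain Go of the retry-with-backoff and graceful-degradation pattern around a tool call. The names (toolCall, callWithRetry), the timeout and backoff values, and the fallback message are illustrative assumptions, not the production code, which wraps real MCP tool invocations.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// toolCall stands in for any MCP tool invocation (e.g. a warehouse query).
// Hypothetical type for this sketch.
type toolCall func(ctx context.Context) (string, error)

// callWithRetry retries a tool call with exponential backoff and jitter,
// and degrades gracefully to a readable fallback message instead of
// failing the whole conversation when the tool stays unavailable.
func callWithRetry(ctx context.Context, call toolCall, attempts int) (string, error) {
	backoff := 500 * time.Millisecond // illustrative starting backoff
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Bound each attempt so a slow tool can't stall the Slack response.
		attemptCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
		result, err := call(attemptCtx)
		cancel()
		if err == nil {
			return result, nil
		}
		lastErr = err
		// Exponential backoff with jitter before the next attempt.
		sleep := backoff + time.Duration(rand.Int63n(int64(backoff)))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return "", ctx.Err()
		}
		backoff *= 2
	}
	// Graceful degradation: the user sees a plain-language message in the
	// Slack thread rather than a raw error, while the error is still
	// returned for logging and monitoring.
	return "I couldn't reach the data source just now - please try again in a minute.",
		fmt.Errorf("tool call failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	// A deliberately flaky tool to exercise the retry path.
	flaky := func(ctx context.Context) (string, error) {
		return "", errors.New("warehouse connection reset")
	}
	msg, err := callWithRetry(context.Background(), flaky, 3)
	fmt.Println(msg, err)
}
```

The same idea extends to the other failure modes listed above: loop limits cap runaway tool chains, and the fallback path is what keeps a single broken MCP session from taking down an otherwise useful answer.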

Result

Questions that took weeks now get answered in seconds. The bot handles hundreds of conversations a day, generates charts, and goes beyond the initial ask - exploring promising paths that surface deeper insights and suggesting follow-up questions for people who wouldn't otherwise know what to ask next. The whole company now asks questions directly while the data team maintains evals and monitors for drift. They've shifted from ticket triage to high-value work like machine learning - things they'd much rather be doing. Building it the right way took longer, but the payoff is faster iteration, automated tooling, and continuous improvement.