The Architecture of Simplifi
A deep dive into how we built Simplifi to automatically categorize millions of transactions in real time.
When we set out to build Simplifi, we knew the biggest bottleneck in personal finance apps wasn't the UI; it was the data categorization engine. Users hate manually tagging transactions. To solve this, we engineered a new architecture designed for sub-millisecond automated tagging.
The Data Ingestion Pipeline
Our pipeline ingests data directly from Plaid and custom bank APIs. We use an event-driven architecture built on Kafka to absorb the high volume of incoming webhook events. As soon as a transaction hits our ingestion layer, it is published as an event and pushed to a worker node for evaluation.
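To make the flow concrete, here is a minimal sketch of the webhook-to-worker handoff. It is not our production code: an in-process queue.Queue stands in for a Kafka topic so the example runs without a broker, and the names handle_webhook, worker, run_demo, and the payload fields (transaction_id, description, amount) are hypothetical.

```python
import json
import queue
import threading

# ASSUMPTION: an in-process queue stands in for a Kafka topic.
# A real deployment would publish and consume through a Kafka client.
transactions_topic = queue.Queue()


def handle_webhook(raw_body: str) -> None:
    """Ingestion layer: parse the webhook payload and publish it as an event."""
    event = json.loads(raw_body)
    transactions_topic.put(event)


def worker(results: list) -> None:
    """Worker node: consume events in order until a None sentinel arrives."""
    while True:
        event = transactions_topic.get()
        if event is None:
            break
        # Placeholder for the evaluation step (categorization happens here).
        results.append({"transaction_id": event["transaction_id"], "status": "evaluated"})


def run_demo() -> list:
    """Send two hypothetical webhook payloads through the pipeline."""
    results = []
    consumer = threading.Thread(target=worker, args=(results,))
    consumer.start()
    handle_webhook('{"transaction_id": "tx_1", "description": "STARBUCKS #1234", "amount": 4.5}')
    handle_webhook('{"transaction_id": "tx_2", "description": "UBER *TRIP", "amount": 18.2}')
    transactions_topic.put(None)  # sentinel: tell the worker to stop
    consumer.join()
    return results
```

The point of the queue in the middle is decoupling: the webhook handler returns as soon as the event is published, and workers can be added or restarted without the ingestion layer knowing about them.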
Machine Learning at the Edge
Instead of relying on heavy cloud functions, we deploy machine learning models directly at the edge. We use a proprietary natural language processing (NLP) model, trained on over 500 million transaction descriptions, to identify merchants, locations, and expense categories with 99.9% accuracy.
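The model itself is proprietary, but the shape of the problem is easy to illustrate. The sketch below is a toy keyword lookup, not the NLP model described above: it only shows the kind of input (a raw transaction description) and output (a merchant and a category) a categorizer works with. The Categorization type, the KEYWORD_RULES table, and the category names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Categorization:
    merchant: str
    category: str


# ASSUMPTION: a hand-written keyword table stands in for the trained model.
KEYWORD_RULES = {
    "starbucks": Categorization("Starbucks", "Coffee Shops"),
    "uber": Categorization("Uber", "Transportation"),
    "whole foods": Categorization("Whole Foods", "Groceries"),
}

UNKNOWN = Categorization("Unknown", "Uncategorized")


def categorize(description: str) -> Categorization:
    """Return the first rule whose keyword appears in the description."""
    normalized = description.lower()
    for keyword, result in KEYWORD_RULES.items():
        if keyword in normalized:
            return result
    return UNKNOWN
```

For example, categorize("STARBUCKS #1234 SEATTLE WA") returns the Starbucks entry, while a description that matches no keyword falls back to UNKNOWN. A trained model replaces the lookup table but keeps the same contract: description in, merchant and category out.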
What's Next?
Our engineering team is currently working on v2 of the categorization engine, which will include predictive cashflow mapping based on historical burn rates. The architecture scales seamlessly, allowing us to roll out these features without infrastructure overhauls.