Why categorization?
Transaction categorization is a foundational technology. It allows banks, financial institutions, and their customers to derive meaning from a set of transactions. Categorization forms the basis of most data-derived value and services in the domain of finance, from personal finance management to fraud detection and creditworthiness scoring. Without grouping transactions into categories, we cannot answer the most basic questions about our most valuable data.
What do we offer?
We offer a machine learning and analytics solution for cleansing, enriching, and classifying financial transactions. Our API-first approach ensures easy integration, either on-premise or in the cloud. We offer pre-trained models, custom category structures, white-label branding, and support and integration services. Moreover, our models can be embedded in mobile devices, and can be applied to federated and privacy-sensitive learning applications.
How are we different?
Key features:
- Decoupled category trees from label prediction
- High performance incremental learning
- Prediction confidence threshold
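The last feature can be pictured as a simple gate: a prediction is only accepted when the model's confidence clears a configurable threshold, and ambiguous transactions are left uncategorized for review. The function name, scores, and threshold value below are illustrative, not part of our API.

```python
# Sketch: applying a prediction confidence threshold (illustrative values).
def predict_with_threshold(scores, threshold=0.7):
    """Return the top label only if its confidence clears the threshold."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None  # None -> leave uncategorized

predict_with_threshold({"music": 0.91, "fuel": 0.04})  # confident: "music"
predict_with_threshold({"music": 0.55, "fuel": 0.45})  # ambiguous: None
```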
Decoupled category trees
We have found that when it comes to category trees, the "one size fits none" adage applies. Retail banking customers want their transactions categorized differently from business users. To solve this we use a novel multi-class, multi-label classifier algorithm. Each transaction can have more than one label predicted for it. This approach allows us to model several topics for a single transaction, so that a monthly subscription payment to an online music streaming service can simultaneously be labelled expenses, credit card payment, online, and music.
Custom category trees for different domains can then be layered on top of the labelled output, allowing your users to see their data in the most intuitive way.
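The decoupling described above can be sketched as follows: the classifier emits a flat set of labels, and each domain-specific category tree is just a mapping layered on top of them. The label names and tree mappings here are illustrative, not our real taxonomy.

```python
# Sketch: decoupling label prediction from category trees.
# Flat multi-label output from the classifier for one transaction
# (e.g. a music streaming subscription).
predicted_labels = {"expenses", "credit_card_payment", "online", "music"}

# Per-domain category trees are layered on top as label -> path mappings.
RETAIL_TREE = {
    "music": ("Leisure", "Music & Streaming"),
    "expenses": ("Spending",),
}
BUSINESS_TREE = {
    "expenses": ("Operating Costs", "Subscriptions"),
    "online": ("Operating Costs", "Online Services"),
}

def categorize(labels, tree):
    """Map predicted labels onto a domain-specific category tree."""
    return sorted(tree[label] for label in labels if label in tree)

# The same prediction, viewed through two different trees.
retail_view = categorize(predicted_labels, RETAIL_TREE)
business_view = categorize(predicted_labels, BUSINESS_TREE)
```

Because the trees are applied after prediction, changing a customer's category structure never requires retraining the underlying classifier.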
Incremental learning
Our algorithms fall into a group called "out-of-core learners". This means that models can be updated incrementally, learning from every user interaction in real time, using very little memory. There is no need to retrain your models on the entire batch of training data whenever new labelled data is collected.