Why Active Learning?
Collecting good training data is an integral part of any machine learning solution. After all, your models can only be as good as the data you train them on. However, this is also the most costly part of the process, because human beings need to teach the models by showing them examples of the desired output for a given input. This manual labeling process needs to be as streamlined as possible, so that effort is not wasted adding data to categories that already perform well. The question is, how do you know where best to focus your efforts?
What is Active Learning?
Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs. To achieve this, an algorithm needs to have meta-cognitive properties: it knows when it doesn't know. Our solution leverages this to forward transactions with low-confidence predictions to a data pool for human review, so that the labels people provide reinforce the model where it is weakest.
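The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the function name `route_by_confidence`, the 0.8 threshold, and the shape of the input (a mapping from transaction ID to per-class probabilities) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def route_by_confidence(
    predictions: Dict[str, Dict[str, float]],
    threshold: float = 0.8,  # hypothetical cut-off, tune per use case
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split predictions into auto-accepted labels and a human-review pool."""
    accepted: List[Tuple[str, str]] = []
    review_pool: List[str] = []
    for item_id, class_probs in predictions.items():
        # The model's confidence is taken to be its highest class probability.
        label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            accepted.append((item_id, label))
        else:
            # "It knows when it doesn't know": defer to a human labeler.
            review_pool.append(item_id)
    return accepted, review_pool


# Hypothetical example data: two transactions with class probabilities.
example = {
    "t1": {"groceries": 0.95, "travel": 0.05},
    "t2": {"groceries": 0.55, "travel": 0.45},
}
accepted, review_pool = route_by_confidence(example)
```

Here the confident prediction for `t1` is accepted automatically, while `t2` lands in the review pool, where a human label can later be added to the training data.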
Moreover, it is often better for a model to say "I don't know what this is" than to say "I know" and be wrong. This ties into cost-sensitive prediction, where the cost of a false positive is weighed against the cost of a false negative, so that the system can account for the trade-off.
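One way to picture this trade-off is to compare the expected cost of each possible action and pick the cheapest. The sketch below assumes a binary prediction and treats "send to human review" as a third action with its own fixed cost; the function `decide` and all cost values are hypothetical and chosen only for illustration.

```python
def decide(p_positive: float, cost_fp: float, cost_fn: float, cost_review: float) -> str:
    """Return the action with the lowest expected cost.

    p_positive  -- model's estimated probability that the item is positive
    cost_fp     -- assumed cost of predicting positive when it is negative
    cost_fn     -- assumed cost of predicting negative when it is positive
    cost_review -- assumed fixed cost of asking a human instead
    """
    expected_costs = {
        # Predicting positive is wrong with probability (1 - p_positive).
        "positive": (1.0 - p_positive) * cost_fp,
        # Predicting negative is wrong with probability p_positive.
        "negative": p_positive * cost_fn,
        # Deferring to a human costs the same regardless of the true label.
        "review": cost_review,
    }
    return min(expected_costs, key=expected_costs.get)


# Same model output, different cost assumptions:
cheap_fp = decide(0.85, cost_fp=2, cost_fn=10, cost_review=1)
costly_fp = decide(0.85, cost_fp=10, cost_fn=2, cost_review=1)
```

With inexpensive false positives, the 0.85 prediction is accepted as positive. When false positives become expensive, the same prediction is sent to review, because saying "I don't know" is now cheaper than risking being wrong.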
Benefits of Active Learning
- Manage and extend your training data pool with samples where the model needs reinforcement
- Cost-sensitive margin thresholds allow you to control false positive prediction rates
- Save time and money that would otherwise be spent labeling data for categories that already perform well