What is Fraud?
Fraud is defined as the intentional use of false or misleading information to illegally deprive another person/entity of money, property, sensitive data, or other rights.
Almost all of us have received a fraud alert from our credit card company (perhaps when traveling or making a larger-than-usual purchase) asking us to verify whether a transaction attempt was legitimate.
This lucrative unlawful industry takes on many sizes, shapes, and attack vectors.
Challenges with Fraud
Evolving Patterns: Detecting fraud patterns using rigid rule-based systems is a fragile and error-prone approach no matter the scale or type of abuse you are attempting to combat. Constantly evolving abuse patterns, combined with the imbalanced datasets from which we attempt to extract insights, make fraud detection akin to finding a needle in a haystack.
Adaptive ML Techniques: Rigid rules-based approaches quickly become outdated as fraudulent users change tactics. Due to this fragility, businesses and governments alike have adopted ML techniques for anomaly detection, fraud prevention, and other counter-abuse investment areas.
Heavily Regulated Industries: Industries such as finance and healthcare present an even greater challenge by requiring ML model transparency for any claims made by an automated system.
Imbalanced Datasets: The nature of fraud results in highly imbalanced datasets: in real-world data, fraud typically accounts for less than 1% of transactions. Whether you’re dealing with credit card transactions or money laundering schemes, generally only a small percentage of the total transactions that cross a PoS system are actually fraudulent.
Limited/Sensitive Information (PII): Regulatory laws and compliance measures limit the data we can use to detect fraud. Personally identifiable information (PII) must be redacted from data sources to abstract the user from the transaction. For example, in the case of credit card abuse systems, often the only features we can use to make a prediction are transactionID, geoLocation, terminalID, etc.
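As an illustration of abstracting the user from the transaction, a record's PII field can be replaced with a one-way salted hash before it reaches a model. This is only a sketch: the `pseudonymize` helper, the field names, and the salt are all hypothetical, and real compliance requires a vetted scheme.

```python
import hashlib

def pseudonymize(value: str, salt: str) -> str:
    """Replace a PII value with a one-way salted hash so the record
    no longer exposes the user directly. (Illustrative only.)"""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

# Hypothetical transaction record; only the "user" field is PII.
record = {"transactionID": "txn-0042", "user": "jane.doe@example.com",
          "geoLocation": "47.61,-122.33", "terminalID": "T-981"}
record["user"] = pseudonymize(record["user"], salt="s3cr3t")
```

The hash is deterministic, so the same user still maps to the same token across transactions, preserving some modeling signal without storing the raw identifier.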
Misleading Evaluation Metrics: Traditional aggregate metrics are misleading: 99.8% accuracy could still mean that your model failed to detect every fraudulent transaction. ML algorithms by default tend to work best when samples across classes are balanced. This is because most algorithms optimize to maximize accuracy and reduce error. However, for a typical credit card dataset with 0.2% fraudulent transactions, even a model that always outputs “not fraud” would achieve 99.8% accuracy. While this high accuracy looks great at first glance, it’s extremely misleading, as we’ve failed to identify a single fraudulent transaction.
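To make this concrete, here is a minimal sketch with made-up numbers (1,000 transactions, 2 of them fraudulent) showing how an always-“not fraud” model scores near-perfect accuracy while its recall is zero:

```python
# 1,000 transactions, 2 fraudulent (0.2%); 1 = fraud, 0 = legitimate.
labels = [1] * 2 + [0] * 998

# A "model" that always predicts "not fraud".
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)       # 0.998 -- looks great

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)  # 0.0 -- caught no fraud at all

print(f"accuracy={accuracy:.3f} recall={recall:.1f}")
```

This is why class-aware metrics such as precision, recall, and AUC-PR are preferred over raw accuracy for imbalanced problems.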
Not all inferences are weighted equally. For example, a misclassified fraudulent transaction (false negative; predicting not fraud for a transaction that is indeed fraud) is likely to have a greater negative impact (potential loss of a large transaction $$$) than a misclassified legitimate transaction (false positive; predicting fraud for a transaction that is not fraud), which could be a mild inconvenience to a customer (text message confirming a purchase).
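One way to capture this asymmetry is a simple cost-weighted evaluation. The sketch below uses purely hypothetical dollar figures (the cost constants and the `expected_cost` helper are illustrative, not drawn from any specific system):

```python
# Hypothetical cost matrix: a missed fraud (false negative) costs the
# full transaction amount, while a false alarm (false positive) costs
# only a small customer-friction penalty. All figures are illustrative.
FALSE_NEGATIVE_COST = 500.0   # assumed average fraudulent transaction ($)
FALSE_POSITIVE_COST = 2.0     # assumed cost of a verification text ($)

def expected_cost(false_negatives: int, false_positives: int) -> float:
    """Total business cost of a model's mistakes."""
    return (false_negatives * FALSE_NEGATIVE_COST
            + false_positives * FALSE_POSITIVE_COST)

# Two models with the same total error count, very different cost:
print(expected_cost(false_negatives=10, false_positives=0))   # 5000.0
print(expected_cost(false_negatives=0, false_positives=10))   # 20.0
```

Comparing models by expected cost rather than error count naturally pushes thresholds toward catching more fraud at the price of a few extra customer confirmations.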
A Gap for ML Observability: As fraudsters continue to reinvent adversarial techniques to exploit models in production, monitoring for anomalies (between production and a baseline dataset) and troubleshooting performance degradations across cohorts to identify system-wide threats/vulnerabilities have become both business-critical and time-sensitive.
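As one concrete monitoring technique, the Population Stability Index (PSI) is commonly used to quantify drift between a baseline and a production distribution. The sketch below uses made-up bin fractions; a common rule of thumb flags significant drift when PSI exceeds 0.2:

```python
import math

def psi(baseline_fracs, production_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions.
    Rule of thumb: PSI > 0.2 signals significant drift."""
    total = 0.0
    for b, p in zip(baseline_fracs, production_fracs):
        b, p = max(b, eps), max(p, eps)   # guard against empty bins
        total += (p - b) * math.log(p / b)
    return total

# Hypothetical binned fractions of a feature (e.g. transaction amount),
# where production has shifted toward larger amounts.
baseline   = [0.40, 0.35, 0.20, 0.05]
production = [0.25, 0.30, 0.25, 0.20]
print(f"PSI = {psi(baseline, production):.3f}")  # above the 0.2 threshold
```

Running such a check per feature and per cohort on a schedule turns "the model feels worse" into an actionable, time-stamped drift alert.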
Importance of Fraud Detection
A 20-year study spanning dozens of industries reveals that fraud costs the global economy over $5 trillion. In many cases, fraud accounts for losses that outweigh the actual sums of money stolen: discovering complex fraud patterns can be a drain on time and productivity while tarnishing a company’s reputation and ruining customer relationships. The cost of catching and preventing malicious actors increases dramatically when you are subject to a targeted attack, with experienced fraudsters systematically exploiting vulnerabilities in your ML models and prevention systems. To make matters worse, the likelihood of recovering capital lost to fraudulent payouts is slim. According to the Association of Certified Fraud Examiners’ (ACFE) 2018 Report to the Nations, 53% of victims recovered nothing, 32% made a partial recovery of funds, and only 15% recovered all (monetary) losses.