ML Algorithm Cheatsheet
How to choose the right algorithm — decision guide by problem type, data size, and accuracy needs
Choosing the right ML algorithm is one of the most common challenges in practice. The answer depends on four key factors: the kind of problem you have (classification, regression, clustering, anomaly detection), how much labelled data you have, the trade-off you can accept between speed and accuracy, and whether the relationships in your data are linear.
This cheatsheet condenses the decision logic from the Microsoft Azure ML Algorithm Cheat Sheet — a widely used reference for algorithm selection.
Key Points
- Start with the simplest algorithm that could work — complexity is not always better
- No single "best" algorithm — try 2-3 candidates and compare on a validation set
- Data size matters: small data → simpler models; large data → deep learning or boosting
- Linearly separable data → logistic regression / linear SVM; non-linear → tree-based or neural net
- Accuracy vs speed: Random Forest often beats Logistic Regression on accuracy when the data is non-linear, but it is slower to train
- Interpretability needed → Decision Tree, Logistic Regression, Linear Regression
- Tabular data → Gradient Boosting (XGBoost/LightGBM) is usually the strongest starting point and frequently wins competitions
- Image / audio / text → Deep Learning (CNN, Transformer)
- No labels available → Unsupervised (K-Means, DBSCAN, PCA, Autoencoders)
- Rare events or fraud → Anomaly Detection (Isolation Forest, One-Class SVM)
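The advice to start simple and compare 2-3 candidates on a validation set can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; the synthetic dataset from `make_classification`, the three chosen models, and the 25% validation split are assumptions made for the example, not recommendations from the cheat sheet.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): replace with your own features X and labels y.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

# Hold out 25% of the rows as a validation set, keeping class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Simplest candidate first, then two tree-based alternatives.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each candidate on the training split and score it on the validation split.
val_accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_accuracy[name] = model.score(X_val, y_val)

best = max(val_accuracy, key=val_accuracy.get)
for name, acc in sorted(val_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
print("best on validation set:", best)
```

If the simplest model lands within noise of the more complex ones, it is usually the safer pick, since it is faster to train and easier to interpret.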
Algorithm selection flowchart — based on the Microsoft Azure ML Cheat Sheet
| Problem Type | Best Algorithms | When to Use | Avoid When |
|---|---|---|---|
| Binary Classification | Logistic Regression, SVM, XGBoost | Spam/not-spam, churn yes/no, fraud yes/no | Output needs to be a continuous value |
| Multi-class Classification | Random Forest, XGBoost, Neural Network | 3+ categories: sentiment, topic, digit recognition | Only 2 classes — use binary instead |
| Regression | Linear Regression, XGBoost, Neural Network | Price, temperature, demand forecasting | Target is a category not a number |
| Clustering | K-Means, DBSCAN, Hierarchical | Customer segmentation, topic discovery, no labels | You have labels — use classification instead |
| Anomaly Detection | Isolation Forest, One-Class SVM, Autoencoder | Fraud, defects, network intrusions (rare events) | Anomalies are about as common as normal events |
| Recommendation | Collaborative Filtering, Matrix Factorisation | Product, movie, content recommendations | No user interaction history exists |
| Time Series | ARIMA, LSTM, Prophet, LightGBM | Demand forecast, stock prices, sensor data | Data has no temporal dependency |
| NLP / Text | TF-IDF + LR, BERT, Transformer | Sentiment, NER, classification, generation | Data is not text/sequence based |
| Image / Vision | CNN, ResNet, ViT (Vision Transformer) | Object detection, image classification, OCR | Dataset is too small (fewer than a few thousand images) |
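For quick reuse in a script or notebook, the table above could be encoded as a small lookup. This is only a sketch: the names `CANDIDATES` and `candidates_for` are hypothetical, and the lists simply transcribe the "Best Algorithms" column.

```python
from typing import Dict, List

# Candidate algorithms per problem type, transcribed from the table above.
CANDIDATES: Dict[str, List[str]] = {
    "binary classification": ["Logistic Regression", "SVM", "XGBoost"],
    "multi-class classification": ["Random Forest", "XGBoost", "Neural Network"],
    "regression": ["Linear Regression", "XGBoost", "Neural Network"],
    "clustering": ["K-Means", "DBSCAN", "Hierarchical"],
    "anomaly detection": ["Isolation Forest", "One-Class SVM", "Autoencoder"],
    "recommendation": ["Collaborative Filtering", "Matrix Factorisation"],
    "time series": ["ARIMA", "LSTM", "Prophet", "LightGBM"],
    "nlp / text": ["TF-IDF + LR", "BERT", "Transformer"],
    "image / vision": ["CNN", "ResNet", "ViT"],
}


def candidates_for(problem_type: str) -> List[str]:
    """Return the candidate algorithms for a problem type (case-insensitive)."""
    key = problem_type.strip().lower()
    if key not in CANDIDATES:
        raise KeyError(
            f"unknown problem type {problem_type!r}; "
            f"expected one of {sorted(CANDIDATES)}"
        )
    # Return a copy so callers cannot mutate the shared table.
    return list(CANDIDATES[key])


print(candidates_for("Clustering"))  # ['K-Means', 'DBSCAN', 'Hierarchical']
```

The returned list is a starting shortlist only; the candidates still need to be compared on your own validation data.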
Real-World Example
Kaggle competition winners very often use XGBoost or LightGBM for structured/tabular data, and fine-tuned Transformers for text/image tasks. The Microsoft Azure ML Cheat Sheet is a go-to reference for enterprise teams deciding which algorithm to try first; it has been downloaded millions of times.