Feature Engineering

Feature engineering is the process of transforming raw data into informative inputs that ML models can learn from effectively. For traditional (non-deep-learning) ML, features are often hand-crafted, and their quality strongly affects model performance.

Key Points

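Normalisation / Standardisation: scale features to similar ranges, using min-max (rescale to [0, 1]) or z-score (subtract the mean, divide by the standard deviation); critical for distance-based models such as k-NN and k-means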
One-Hot Encoding: convert categorical variables to binary columns (e.g., colour → is_red, is_blue)
Label Encoding: map categories to integers; beware that the integers imply an ordering (e.g., red = 0 < blue = 1) that the model may treat as meaningful
Feature Selection: remove irrelevant/redundant features (correlation analysis, mutual information, LASSO)
Feature Extraction: create new features from raw data (e.g., TF-IDF from text, PCA components)
Handling Missing Data: impute with mean/median/mode, or use a model that handles missing values natively (e.g., XGBoost)
Outlier Treatment: remove, cap (winsorize), or transform (log) extreme values
Feature Interaction: create cross features (e.g., age × income for loan risk)
Time Series: lag features, rolling averages, calendar features (day of week, is_holiday)
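
Several of the key points above can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not a production recipe; in practice a library such as scikit-learn or pandas would normally do this work. All function names and the toy columns (income, colour, sales, ages) are hypothetical and chosen only for the example.

```python
from statistics import mean, median, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardise to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Turn each category into a dict of binary indicator columns."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def impute_median(values):
    """Replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def cap(values, lower, upper):
    """Clip extreme values to fixed bounds (a simple form of capping)."""
    return [min(max(v, lower), upper) for v in values]


def lag(values, k=1):
    """Shift a series back by k steps; the first k rows have no history."""
    return ([None] * k + list(values))[: len(values)]


def rolling_mean(values, window):
    """Trailing mean over the last `window` observations (None until full)."""
    return [
        None if i + 1 < window else mean(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]


# Hypothetical toy columns, for illustration only.
income = impute_median([30, 50, None, 70])  # missing value filled with the median
income_scaled = min_max_scale(income)       # min-max normalisation
income_z = z_score(income)                  # z-score standardisation
colour = one_hot(["red", "blue", "red"])    # colour -> is_blue, is_red
age_x_income = [a * i for a, i in zip([25, 40, 31, 58], income)]  # cross feature
sales = [10, 20, 30, 40]
sales_lag1 = lag(sales, 1)                  # previous period's sales
sales_roll2 = rolling_mean(sales, 2)        # 2-period rolling average
```

One design point worth noting: statistics such as the median, minimum, maximum, mean and standard deviation should be computed on the training data only and then reused on validation and test data, otherwise information leaks from the held-out sets into the features.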

Real-World Example

Airbnb's pricing model engineers hundreds of features from raw listing data: days since last booking, host response rate, neighbourhood review score trends. The features often matter more than the algorithm choice for tabular data.
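
As a rough illustration of what features like these might look like in code, the sketch below derives three of them from a hypothetical listing record. It is not Airbnb's actual implementation: the function names, inputs, and the choice of a least-squares slope as the "trend" are all assumptions made for the example.

```python
from datetime import date


def days_since_last_booking(booking_dates, today):
    """Days between `today` and the latest booking on or before it (None if none)."""
    past = [d for d in booking_dates if d <= today]
    if not past:
        return None
    return (today - max(past)).days


def response_rate(messages_received, messages_answered):
    """Share of guest messages the host answered (None if no messages)."""
    if messages_received == 0:
        return None
    return messages_answered / messages_received


def score_trend(scores):
    """Least-squares slope of scores against their position in the sequence.

    Assumption: scores are ordered oldest to newest and evenly spaced in time.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical listing data, for illustration only.
features = {
    "days_since_last_booking": days_since_last_booking(
        [date(2024, 1, 1), date(2024, 1, 10)], today=date(2024, 1, 15)
    ),
    "host_response_rate": response_rate(messages_received=20, messages_answered=18),
    "review_score_trend": score_trend([4.0, 4.2, 4.4]),
}
```

A positive review_score_trend here means the neighbourhood's scores are rising over the observed periods; a negative one means they are falling.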
