Unsupervised learning finds hidden patterns in unlabelled data. Without explicit ground truth, algorithms discover structure through clustering (grouping similar points), dimensionality reduction (compressing features), or anomaly detection (finding outliers).

Key Points

  • Clustering: group data points so intra-cluster similarity is high and inter-cluster similarity is low
  • k-Means: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points; repeat until assignments stop changing; requires k upfront
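  • DBSCAN: density-based clustering; finds clusters of arbitrary shape; labels points in low-density regions as noise (outliers); needs no k, but requires a neighbourhood radius (eps) and a minimum neighbour count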
  • Hierarchical Clustering: builds a dendrogram (tree of nested clusters); no need to specify k upfront, because cutting the tree at a chosen height yields the clusters
  • PCA (Principal Component Analysis): linear dimensionality reduction; projects data onto orthogonal axes (principal components) ordered by how much variance they capture
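  • t-SNE: non-linear embedding of high-dimensional data into 2D/3D for visualisation; preserves local neighbourhoods, so it is useful for exploration, but cluster sizes and distances between clusters in the plot are not reliable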
  • Autoencoders: neural networks trained to reconstruct their input through a narrow bottleneck (encoder → bottleneck → decoder), so the bottleneck learns a compressed representation
  • Association Rule Mining: finds co-occurrence rules such as "customers who buy X also buy Y" (market basket analysis), e.g. with the Apriori algorithm
  • Anomaly Detection: flags points that deviate from the bulk of the data, e.g. with isolation forests, one-class SVMs, or autoencoders (a high reconstruction error suggests an outlier)
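The k-Means loop described in the list can be sketched as follows. This is a minimal, standard-library-only illustration of Lloyd's algorithm, assuming Euclidean distance and a simple random-sample initialisation; the function name `kmeans` and the toy points are hypothetical, not taken from any particular library.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points
    # (a simple choice; k-means++ is a common, better-spread alternative).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Cluster ids are arbitrary: whether the lower-left group is labelled 0 or 1 depends on the initial sample. On less tidy data, different seeds can converge to different local optima, which is why k-Means is often run several times.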
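DBSCAN's "density plus noise" idea can be illustrated with a compact textbook version. This sketch assumes Euclidean distance and a brute-force O(n²) neighbour search; the parameter names `eps` and `min_samples` mirror the ones scikit-learn uses, but the function itself and the data are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or NOISE (-1) for outliers."""
    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:  # not a core point
            labels[i] = NOISE         # may later become a border point
            continue
        labels[i] = cluster_id        # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:  # j is core: keep expanding
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
          (10.0, 0.0)]
print(dbscan(points, eps=0.5, min_samples=3))
```

Because clusters grow by chaining through core points rather than around a centre, the same procedure can trace elongated or curved shapes that k-Means would split.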
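Agglomerative hierarchical clustering records a merge history, and that history is what a dendrogram draws. The sketch below assumes single linkage (cluster distance = closest pair of members) and Euclidean distance; the names `single_linkage` and `cut` are hypothetical helpers, and the naive loops are only suitable for small inputs.

```python
import math

def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, merge_height) = one dendrogram join
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return merges

def cut(merges, n_points, height):
    """Cut the dendrogram at `height`: replay only merges made at or below it."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for a, b, d in merges:
        if d > height:
            break  # single-linkage merge heights never decrease
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical data: two tight pairs and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
merges = single_linkage(points)
print([d for _, _, d in merges])
print(cut(merges, len(points), height=2.0))
```

The tree is built once; choosing k happens afterwards by picking a cut height, which is why hierarchical clustering does not need k upfront.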
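To make "variance-maximising axes" concrete, here is a minimal sketch that finds only the first principal component. It assumes the sample covariance matrix and uses power iteration to get its top eigenvector, a deliberately simple substitute for the eigendecomposition or SVD that real PCA implementations use. The function names and data are hypothetical.

```python
import math

def first_principal_component(data, iters=200):
    """Return (mean, unit direction of maximum variance, variance along it)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centred = [[row[k] - mean[k] for k in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue (assuming the start vector
    # is not orthogonal to it).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance of the data along direction v.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return mean, v, variance

def project(data, mean, direction):
    """Compress each point to a single coordinate along `direction`."""
    return [sum((row[k] - mean[k]) * direction[k] for k in range(len(row)))
            for row in data]

# Hypothetical 2D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mean, direction, variance = first_principal_component(data)
coords = project(data, mean, direction)
print(direction, variance)
print(coords)
```

Here the component points roughly along the diagonal and captures almost all of the total variance, so each 2D point can be summarised by one number with little information lost.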
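Market basket analysis with Apriori rests on one property: every subset of a frequent itemset must also be frequent, so candidates with an infrequent subset can be pruned. Below is a small, unoptimised sketch of the level-wise search plus rule generation. It assumes support is the fraction of transactions containing an itemset and confidence is support(X ∪ Y) / support(X); the function names and baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Generate rules X -> Y where X and Y split a frequent itemset."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
freq = apriori(transactions, min_support=0.4)
for lhs, rhs, conf in rules(freq, min_confidence=0.9):
    print(lhs, "->", rhs, round(conf, 2))
```

In this toy data every basket with butter also contains bread, so "butter → bread" comes out with confidence 1.0.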
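Of the anomaly detectors listed, the isolation forest is the easiest to sketch without a library. The intuition is that random axis-aligned splits isolate an outlier in very few steps, so a short average path length signals an anomaly. This simplified version assumes random subsampling, a depth limit of ceil(log2(sample size)), and the usual normalisation constant c(n); all names and the data are hypothetical, and a production implementation would differ in many details.

```python
import math
import random

def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    return 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n

def build_tree(points, rng, depth, max_depth):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))  # random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)          # random split value within the range
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        # Add the expected extra depth for points left unsplit in this leaf.
        return depth + avg_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def isolation_scores(points, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(points, m), rng, 0, max_depth)
             for _ in range(n_trees)]
    scores = []
    for x in points:
        mean_path = sum(path_length(x, t) for t in trees) / n_trees
        # Scores close to 1 suggest an anomaly; around 0.5 or lower, normal.
        scores.append(2 ** (-mean_path / avg_path_length(m)))
    return scores

# Hypothetical data: a regular grid of normal points plus one distant outlier.
points = [(i / 5, j / 5) for i in range(6) for j in range(5)] + [(10.0, 10.0)]
scores = isolation_scores(points)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

Because the outlier sits far from everything else, most trees separate it with the very first split, giving it a much higher score than any grid point.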

Real-World Example

Spotify groups listeners into "taste clusters" without knowing genre labels in advance: the algorithm discovers on its own that people who listen to Metallica also tend to like Pantera and Black Sabbath, and these listeners naturally form a cluster that a human can then recognise as "Heavy Metal".