Neural Networks & Deep Learning
Perceptrons, backpropagation, CNNs, RNNs, transformers, attention mechanism
Neural networks are computational models loosely inspired by the brain. Stacked layers of neurons learn hierarchical representations — from pixels to edges to shapes to objects. Deep Learning refers to neural networks with many (deep) layers, enabling them to tackle complex problems in vision, language, and beyond.
Key Points
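- Perceptron: a single neuron that computes a weighted sum of its inputs plus a bias and passes the result through an activation function
- Activation Functions: ReLU (the usual default for hidden layers), Sigmoid (squashes values into (0, 1); used for binary outputs), Softmax (turns a vector of scores into multi-class probabilities)
- Backpropagation: computes the gradient of the loss with respect to every weight via the chain rule; an optimiser such as gradient descent then uses those gradients to update the weights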
- CNN (Convolutional Neural Network): specialised for images; uses convolutional filters to extract features
- RNN / LSTM: designed for sequences; LSTM gating mitigates (but does not fully eliminate) the vanishing gradient problem of plain RNNs
- Transformer: attention-based architecture; processes whole sequences in parallel; basis of GPT, BERT
- Attention Mechanism: lets the model weight the most relevant parts of the input; full self-attention costs O(n²) time and memory in the sequence length n
- Batch Normalisation: normalises layer inputs over each mini-batch for faster, more stable training
- Dropout: randomly zeros a fraction of neuron activations during training to reduce overfitting; it is switched off at inference time
- Transfer Learning: start with a pre-trained model (e.g., ResNet, BERT); fine-tune on your task
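To make the perceptron and backpropagation points concrete, here is a minimal sketch of a single sigmoid neuron trained by gradient descent. The AND-gate data, the learning rate, the epoch count, and the function names (`predict`, `train_step`, `train_and_gate`) are all illustrative assumptions, not part of these notes. For one neuron the chain rule collapses to a single line; a deep network repeats the same step layer by layer.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Perceptron forward pass: weighted sum plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update on a single example.

    With a sigmoid output and binary cross-entropy loss, the chain rule
    gives dL/dz = prediction - target, so dL/dw_i = (prediction - target) * x_i.
    """
    error = predict(weights, bias, inputs) - target
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * error
    return new_weights, new_bias


def train_and_gate(epochs=2000):
    """Fit the neuron to the logical AND function (a toy, assumed dataset)."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in data:
            weights, bias = train_step(weights, bias, inputs, target)
    return weights, bias


if __name__ == "__main__":
    w, b = train_and_gate()
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pair, round(predict(w, b, pair), 3))
```

AND is linearly separable, which is why one neuron is enough here; a problem such as XOR would need at least one hidden layer.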
Real-World Example
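GPT-4 is a Transformer-based model. OpenAI has not disclosed its size; unofficial estimates put it at roughly 1.76 trillion parameters. It was pre-trained with next-token prediction on a very large text corpus (reportedly trillions of tokens) and then fine-tuned with human feedback. Self-attention is what lets it relate tokens that are far apart within its context window, for example earlier turns in a conversation.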
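As a rough illustration of self-attention, the sketch below implements scaled dot-product attention in plain Python. It is a simplification under stated assumptions: a real Transformer first maps each token through learned query, key, and value projections and runs several heads in parallel, whereas here the toy token vectors are reused directly for all three roles. The function name `self_attention` and the example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights). weights is an n x n matrix, one row per
    query, which is where the O(n^2) cost in sequence length comes from.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Three toy "token" vectors (assumed, not real embeddings).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(tokens, tokens, tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Every token scores every other token in one pass, so distance within the sequence does not weaken the connection; the price is the n x n weight matrix.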