Neural Networks & Deep Learning

Neural networks are computational models loosely inspired by the brain. Stacked layers of neurons learn hierarchical representations — from pixels to edges to shapes to objects. Deep Learning refers to neural networks with many (deep) layers, enabling them to tackle complex problems in vision, language, and beyond.

Key Points

Perceptron: single neuron — weighted sum of inputs plus bias, passed through an activation function
Activation Functions: ReLU (most common in hidden layers), Sigmoid (squashes values into (0, 1); used for binary outputs), Softmax (multi-class probabilities that sum to 1)
Backpropagation: computes gradients via chain rule; how the network learns from errors
CNN (Convolutional Neural Network): specialised for images; uses convolutional filters to extract features
RNN / LSTM: designed for sequences; LSTM gating mitigates the vanishing gradient problem that limits plain RNNs
Transformer: attention-based architecture; processes whole sequences in parallel; basis of GPT, BERT
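Attention Mechanism: allows the model to focus on relevant parts of the input; self-attention compares every token with every other token, so cost grows as O(n²) in sequence length n
Batch Normalisation: normalises each layer's inputs over the mini-batch, which speeds up and stabilises training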
Dropout: randomly zeros neurons during training to prevent overfitting
Transfer Learning: start with a pre-trained model (e.g., ResNet, BERT); fine-tune on your task
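
To make the perceptron, activation function, and backpropagation points concrete, here is a minimal NumPy sketch of a two-layer network trained on XOR. It is an illustration under assumptions, not a reference implementation: the helper names (init_params, forward, loss_and_grads, train), the hidden size, the learning rate, and the choice of binary cross-entropy loss are all made up for this example.

```python
import numpy as np

def init_params(n_in, n_hidden, rng, scale=0.5):
    # Small random weights, zero biases (an arbitrary choice for this sketch).
    return {
        "W1": rng.normal(0.0, scale, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, scale, (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    z1 = X @ params["W1"] + params["b1"]   # weighted sum of inputs plus bias
    h = np.maximum(0.0, z1)                # ReLU activation in the hidden layer
    z2 = h @ params["W2"] + params["b2"]
    p = 1.0 / (1.0 + np.exp(-z2))          # sigmoid gives a value in (0, 1)
    return z1, h, p

def loss_and_grads(params, X, y):
    n = X.shape[0]
    z1, h, p = forward(params, X)
    eps = 1e-12                            # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # Backpropagation: apply the chain rule from the output back to the input.
    dz2 = (p - y) / n                      # gradient of the loss w.r.t. z2
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0)}
    dh = dz2 @ params["W2"].T              # push the gradient through W2
    dz1 = dh * (z1 > 0)                    # ReLU passes gradient only where z1 > 0
    grads["W1"] = X.T @ dz1
    grads["b1"] = dz1.sum(axis=0)
    return loss, grads

def train(X, y, n_hidden=8, lr=0.1, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], n_hidden, rng)
    losses = []
    for _ in range(steps):
        loss, grads = loss_and_grads(params, X, y)
        losses.append(loss)
        for name in params:                # plain gradient descent update
            params[name] -= lr * grads[name]
    return params, losses

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X_xor, y_xor)
```

Plotting losses against the step index shows how the error changes as the gradients are applied. How well the network ends up fitting XOR depends on the random seed and the hyperparameters chosen here.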
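
The batch normalisation and dropout points can also be sketched in a few lines of NumPy. This is a simplified illustration with assumed function names and signatures: batch_norm covers only the training-time computation (no running statistics for inference), and dropout uses the common "inverted" scaling so that activations keep the same expected value.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalise each feature (column) using the statistics of the current mini-batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are the learnable scale and shift.
    return gamma * x_hat + beta

def dropout(x, rate, rng, training=True):
    # At inference time dropout is a no-op.
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep      # True for units that survive
    # Dividing by keep preserves the expected activation ("inverted dropout").
    return x * mask / keep

rng = np.random.default_rng(0)
batch = rng.normal(3.0, 2.0, size=(64, 5))             # 64 examples, 5 features
normed = batch_norm(batch, np.ones(5), np.zeros(5))    # roughly zero mean, unit variance per feature
dropped = dropout(normed, rate=0.5, rng=rng)           # about half the entries set to zero
```

Deep learning frameworks ship their own versions of both layers, which additionally track running statistics and handle the switch between training and evaluation modes.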

Real-World Example

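GPT-4 is a Transformer-based model. OpenAI has not published its size; unofficial estimates put it at roughly 1.76 trillion parameters. It was pre-trained with next-token prediction on a very large text corpus (the token count is also undisclosed, though commonly assumed to be in the trillions). Self-attention lets the model relate tokens that are far apart, which is how it tracks long-range context within a conversation.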
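
As a rough illustration of self-attention, the sketch below implements single-head scaled dot-product attention in NumPy. It is not how GPT-4 is implemented: the sizes (n, d_model, d_k) and the random projection matrices Wq, Wk, Wv are placeholders, and real GPT-style models add multiple heads, causal masking, and learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project every token into query, key, and value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # One score per pair of tokens: an (n, n) matrix, hence the O(n^2) cost.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    # Each output is a weighted mix of all value vectors, near or far.
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d_model, d_k = 6, 8, 4                  # 6 tokens, illustrative sizes
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Row i of weights shows how much token i attends to every other token, regardless of distance. Doubling n quadruples the size of the score matrix.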
