Key Libraries
NumPy, Pandas, Requests, Pydantic, SQLAlchemy essentials
Python's ecosystem is one of its greatest strengths. A handful of libraries dominate production Python across data engineering, web APIs, and application development. Knowing when to reach for which library — and how its key abstractions work — is as important as knowing the language itself.
Key Points
- NumPy: N-dimensional array (ndarray) with vectorised C operations — broadcasting, universal functions (ufuncs), the foundation of the scientific Python stack
- Pandas: DataFrame and Series built on NumPy — data wrangling, groupby, merge/join, time-series resampling
- Requests: de facto HTTP client — Session for connection pooling and auth persistence; use httpx for async
- Pydantic (v2): data validation and settings via Python type annotations; used by FastAPI for request/response models. The v2 validation core is written in Rust and is reported by its maintainers to be roughly 5–50x faster than v1
- SQLAlchemy: SQL toolkit and ORM — Core (SQL expression language) and ORM (unit-of-work, identity map, lazy/eager loading)
- FastAPI: async web framework built on Starlette + Pydantic — automatic OpenAPI docs, dependency injection, async endpoints
- pytest: test framework with fixtures, parametrize, plugins — preferred over unittest in modern Python
- Celery: distributed task queue — broker (Redis/RabbitMQ), worker processes, beat scheduler for periodic tasks
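The broadcasting and ufunc behaviour listed for NumPy above can be sketched in a few lines. This is a minimal illustration with made-up sample values, not a benchmark; the array contents and variable names are assumptions chosen for the example.

```python
import numpy as np

# Three samples (rows) with two features (columns); values are illustrative.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Reducing along axis 0 gives one mean per column, shape (2,).
col_means = data.mean(axis=0)

# Broadcasting: the (2,) vector is stretched across all 3 rows of the
# (3, 2) array, so each column is centred without a Python loop.
centred = data - col_means

# np.maximum is a ufunc: it is applied element-wise in compiled code.
clipped = np.maximum(centred, 0.0)

print(centred)
print(clipped)
```

Broadcasting aligns shapes from the trailing dimension, which is why a per-column vector combines with a rows-by-columns matrix without any reshaping.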
| Library | Category | Key abstraction | When to use |
|---|---|---|---|
| NumPy | Numerical computing | ndarray + ufuncs | Array math, ML data prep |
| Pandas | Data analysis | DataFrame / Series | Tabular data, ETL, analytics |
| Requests / httpx | HTTP client | Session, Response | REST API calls |
| Pydantic v2 | Validation / schemas | BaseModel, Field | Data contracts, FastAPI models |
| SQLAlchemy | Database toolkit / ORM | Engine, Session, select() | Relational DB access |
| FastAPI | Web framework | Router, Depends | Async APIs with auto docs |
| Celery | Task queue | @app.task, delay() | Background/async task processing |
| pytest | Testing | fixture, parametrize | Unit and integration tests; BDD via the pytest-bdd plugin |
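The Pandas entries mention merge/join and time-series resampling. A small sketch of both follows, using hypothetical in-memory frames (`orders`, `products`) in place of real data; the column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data standing in for real order and product tables.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "product_id": [1, 2, 1],
    "revenue": [100.0, 50.0, 70.0],
})
products = pd.DataFrame({"product_id": [1, 2], "product": ["widget", "gadget"]})

# Left join: keep every order and attach the product name by key.
enriched = orders.merge(products, on="product_id", how="left")

# resample() needs a DatetimeIndex; "MS" buckets rows by month start.
monthly = enriched.set_index("order_date")["revenue"].resample("MS").sum()

print(enriched)
print(monthly)
```

A left join is used here so that an order with an unknown `product_id` would still appear (with a missing product name) rather than being silently dropped.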
Key Python libraries: NumPy vectorisation, Pandas groupby, Pydantic v2 validation, SQLAlchemy ORM, Requests session
import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session
# NumPy — vectorised operations (no Python loops)
a = np.arange(1_000_000)
result = np.sqrt(a) * np.log(a + 1)  # one vectorised pass in C; typically far faster than a Python-level loop
# Pandas — groupby aggregation
df = pd.read_csv("orders.csv", parse_dates=["order_date"])
summary = (
    df.assign(month=df["order_date"].dt.to_period("M"))
    .groupby(["month", "product"])
    .agg(total=("revenue", "sum"), count=("id", "count"))
    .reset_index()
)
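# Pydantic v2: validation with field constraints
# (lax mode by default, so "30" is coerced to 30; strict mode must be enabled explicitly)
class UserCreate(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str  # plain str does not check the address format; EmailStr does (needs email-validator)
    age: int = Field(ge=0, le=150)

user = UserCreate(name="Alice", email="a@b.com", age=30)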
# user.model_dump() → dict, user.model_json_schema() → JSON Schema
# SQLAlchemy ORM (2.0-style select)
# User is assumed to be a mapped ORM class defined elsewhere
engine = create_engine("postgresql+psycopg://user:pass@localhost/db")
with Session(engine) as session:
    stmt = select(User).where(User.age > 18).order_by(User.name)
    users = session.scalars(stmt).all()
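# Requests with a Session (connection pooling + persistent auth header)
import requests
token = "<your-api-token>"  # placeholder; load from an env var or secret store in practice
http = requests.Session()  # named http to avoid shadowing the SQLAlchemy session above
http.headers["Authorization"] = f"Bearer {token}"
resp = http.get("https://api.example.com/data", timeout=10)
resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
data = resp.json()
Real-World Example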
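Pydantic v2's validation core is written in Rust, and its maintainers report it as 5–50x faster than v1. FastAPI uses it for request validation and response serialisation, so the speed-up benefits every request; actual throughput depends on model complexity and hardware, so measure it for your own workload. SQLAlchemy's default lazy loading is a common source of N+1 queries: accessing a relationship inside a loop issues one extra query per parent row. Eager-load relationships that you iterate over, typically with selectinload() for collections or joinedload() for many-to-one references.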