Key Libraries (Python)

Python's ecosystem is one of its greatest strengths. A handful of libraries dominate production Python across data engineering, web APIs, and application development. Knowing when to reach for which library — and how its key abstractions work — is as important as knowing the language itself.

Key Points

NumPy: N-dimensional array (ndarray) with vectorised C operations — broadcasting, universal functions (ufuncs), the foundation of the scientific Python stack
Pandas: DataFrame and Series built on NumPy — data wrangling, groupby, merge/join, time-series resampling
Requests: de facto HTTP client — Session for connection pooling and auth persistence; use httpx for async
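Pydantic (v2): data validation via Python type annotations (settings management now lives in the separate pydantic-settings package); used by FastAPI for request/response models; the v2 validation core is written in Rust and is reported to be roughly 5–50x faster than v1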
SQLAlchemy: SQL toolkit and ORM — Core (SQL expression language) and ORM (unit-of-work, identity map, lazy/eager loading)
FastAPI: async web framework built on Starlette + Pydantic — automatic OpenAPI docs, dependency injection, async endpoints
pytest: test framework with fixtures, parametrize, plugins — preferred over unittest in modern Python
Celery: distributed task queue — broker (Redis/RabbitMQ), worker processes, beat scheduler for periodic tasks
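
Several of the abstractions named in the list (broadcasting, merge/join, time-series resampling) are easier to grasp on a tiny input. The following is a minimal sketch, not a recommended pipeline; the `orders` and `products` frames and their column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy broadcasting: a (3, 4) matrix minus a (4,) row of column means.
# The 1-D row is stretched across all three rows without an explicit loop.
x = np.arange(12, dtype=float).reshape(3, 4)
centred = x - x.mean(axis=0)

# Pandas merge: attach a category to each order via a shared key column.
# Both frames are illustrative stand-ins for real data.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-09"]),
    "product": ["widget", "gadget", "widget"],
    "revenue": [10.0, 20.0, 5.0],
})
products = pd.DataFrame({
    "product": ["widget", "gadget"],
    "category": ["tools", "toys"],
})
merged = orders.merge(products, on="product", how="left")

# Time-series resampling: total revenue per calendar week.
# resample() needs a datetime index, hence set_index() first.
weekly = merged.set_index("order_date")["revenue"].resample("W").sum()
```

A left merge keeps every order even when no matching product row exists, which is usually the safer default when enriching fact data with lookup tables.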

Library	Category	Key abstraction	When to use
NumPy	Numerical computing	ndarray + ufuncs	Array math, ML data prep
Pandas	Data analysis	DataFrame / Series	Tabular data, ETL, analytics
Requests / httpx	HTTP client	Session, Response	REST API calls
Pydantic v2	Validation / schemas	BaseModel, Field	Data contracts, FastAPI models
SQLAlchemy	Database ORM	Session, select(), Engine	Relational DB access
FastAPI	Web framework	Router, Depends	Async APIs with auto docs
Celery	Task queue	@app.task, delay()	Background/async task processing
pytest	Testing	fixture, parametrize	Unit, integration, BDD tests

Key Python libraries: NumPy vectorisation, Pandas groupby, Pydantic v2 validation, SQLAlchemy ORM, Requests session

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

# NumPy — vectorised operations (no Python loops)
a = np.arange(1_000_000)
result = np.sqrt(a) * np.log(a + 1)   # typically far faster than an equivalent list comprehension

# Pandas — groupby aggregation
df = pd.read_csv("orders.csv", parse_dates=["order_date"])
summary = (df
    .assign(month=df["order_date"].dt.to_period("M"))
    .groupby(["month", "product"])
    .agg(total=("revenue", "sum"), count=("id", "count"))
    .reset_index())

# Pydantic v2: declarative validation (lax mode by default, so age="30" would be coerced to 30)
class UserCreate(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: int = Field(ge=0, le=150)

user = UserCreate(name="Alice", email="a@b.com", age=30)
# user.model_dump() → dict, user.model_json_schema() → JSON Schema

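# SQLAlchemy ORM (2.0 style); User is a minimal illustrative model
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    age: Mapped[int]

engine = create_engine("postgresql+psycopg://user:pass@localhost/db")
with Session(engine) as session:
    stmt = select(User).where(User.age > 18).order_by(User.name)
    users = session.scalars(stmt).all()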

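# Requests with session (connection pool + auth)
import os
import requests

token = os.environ["API_TOKEN"]  # placeholder: wherever the bearer token is stored
http = requests.Session()        # named http to avoid clashing with the ORM session above
http.headers["Authorization"] = f"Bearer {token}"
resp = http.get("https://api.example.com/data", timeout=10)
resp.raise_for_status()          # raise on 4xx/5xx instead of failing silently
data = resp.json()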
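
The Pydantic model in the listing above only shows the happy path. As a small sketch of what happens with bad input, the block below repeats the same `UserCreate` model and adds a hypothetical `StrictUserCreate` subclass to contrast the default lax mode with strict mode.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class UserCreate(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: int = Field(ge=0, le=150)


class StrictUserCreate(UserCreate):
    # Illustrative subclass: strict=True disables type coercion.
    model_config = ConfigDict(strict=True)


# Lax mode (the default): the numeric string "30" is coerced to the int 30.
lax_user = UserCreate(name="Alice", email="a@b.com", age="30")

# Strict mode: the same input is rejected because "30" is not an int.
try:
    StrictUserCreate(name="Alice", email="a@b.com", age="30")
    strict_rejected = False
except ValidationError:
    strict_rejected = True

# Constraint violations are collected and reported together,
# one entry per failing field.
try:
    UserCreate(name="", email="a@b.com", age=200)
    errors = []
except ValidationError as exc:
    errors = exc.errors()

failed_fields = sorted(err["loc"][0] for err in errors)
```

Note that `email: str` accepts any string; Pydantic's `EmailStr` type (which needs the optional email-validator package) is the usual choice when the address format should be checked.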

Real-World Example

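Pydantic v2, whose validation core (pydantic-core) is written in Rust, is reported by its maintainers to validate data roughly 5–50x faster than v1. FastAPI uses it for request validation and response serialisation, so the speed-up applies to every request; actual throughput still depends on model complexity, payload size, and hardware. SQLAlchemy's default lazy loading is a common source of N+1 queries: each access to an unloaded relationship inside a loop issues its own SELECT. For relationships accessed in a loop, load them up front with selectinload() (usually the better fit for collections) or joinedload() (usually for many-to-one references).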
