Data Structures | Python | Languages & Frameworks

Python's built-in data structures — list, dict, set, tuple — are highly optimised C implementations. The collections module extends these with specialised structures. Knowing the underlying implementation and time complexity of each operation is essential for writing efficient Python.

Key Points

list: dynamic array — O(1) append/index, O(n) insert/delete in middle, O(n) search
dict (Python 3.7+): ordered by insertion, O(1) average get/set/delete — backed by a hash table with open addressing
set: unordered, O(1) add/remove/contains — backed by hash table; frozenset is immutable and hashable
tuple: immutable sequence — faster than list for iteration, safe as dict key/set member, unpacking is idiomatic
collections.deque: double-ended queue backed by doubly-linked list of fixed-size blocks — O(1) appendleft/popleft (unlike list which is O(n) for popleft)
collections.defaultdict: like dict but returns a default value for missing keys — defaultdict(list), defaultdict(int)
collections.Counter: dict subclass for counting hashable objects — most_common(n), element arithmetic
collections.OrderedDict: dict with move_to_end() — useful for LRU cache implementation
heapq: min-heap operations on a regular list — heappush, heappop, nlargest, nsmallest — O(log n)

Structure	Access	Search	Insert/Delete	Notes
list	O(1)	O(n)	O(1) end / O(n) mid	Dynamic array
deque	O(n)	O(n)	O(1) both ends	Doubly-linked blocks
dict	O(1)	O(1) by key	O(1)	Hash table, insertion-ordered
set	N/A	O(1)	O(1)	Hash table, unordered
heapq (list)	O(1) min	O(n)	O(log n)	Min-heap
Counter	O(1)	O(1)	O(1)	dict subclass

Python data structures: defaultdict grouping, Counter frequency, deque BFS, heapq priority queue

from collections import defaultdict, Counter, deque
import heapq

# defaultdict — group items
groups: dict[str, list] = defaultdict(list)
for name, dept in employees:
    groups[dept].append(name)

# Counter — frequency analysis
words = "the cat sat on the mat the cat".split()
c = Counter(words)
print(c.most_common(2))   # [('the', 3), ('cat', 2)]
c["the"] += 10            # arithmetic on counts

# deque — sliding window / BFS queue
window = deque(maxlen=3)   # auto-discards oldest
for item in stream:
    window.append(item)    # O(1)

bfs_queue = deque([root])
while bfs_queue:
    node = bfs_queue.popleft()   # O(1) — use deque, NOT list.pop(0)
    bfs_queue.extend(node.children)

# heapq — k largest / priority queue
nums = [3, 1, 4, 1, 5, 9, 2, 6]
print(heapq.nlargest(3, nums))   # [9, 6, 5]

# Min-heap for Dijkstra / task scheduling
heap = []
heapq.heappush(heap, (priority, task_id, task))
prio, tid, task = heapq.heappop(heap)

# dict tricks
merged = {**dict1, **dict2}       # merge (dict2 wins on conflict)
filtered = {k: v for k, v in d.items() if v > 0}
inverted = {v: k for k, v in d.items()}

Real-World Example

deque.popleft() is O(1) while list.pop(0) is O(n) — for BFS on a graph with millions of nodes, the list version is orders of magnitude slower. Counter is the fastest way to build a frequency map — faster than a manual defaultdict loop because the counting is done in C. heapq is perfect for the "top-k" pattern: O(n log k) vs O(n log n) for full sort.

←PreviousOOP in Python NextFunctional Programming→