Data Structures
Lists, tuples, sets, dicts, deque, heapq, defaultdict, Counter
Python's built-in data structures — list, dict, set, tuple — are highly optimised C implementations. The collections module extends these with specialised structures. Knowing the underlying implementation and time complexity of each operation is essential for writing efficient Python.
Key Points
- list: dynamic array — O(1) append/index, O(n) insert/delete in middle, O(n) search
- dict (Python 3.7+): ordered by insertion, O(1) average get/set/delete — backed by a hash table with open addressing
- set: unordered, O(1) add/remove/contains — backed by hash table; frozenset is immutable and hashable
- tuple: immutable sequence — faster than list for iteration, safe as dict key/set member, unpacking is idiomatic
- collections.deque: double-ended queue backed by doubly-linked list of fixed-size blocks — O(1) appendleft/popleft (unlike list which is O(n) for popleft)
- collections.defaultdict: like dict but returns a default value for missing keys — defaultdict(list), defaultdict(int)
- collections.Counter: dict subclass for counting hashable objects — most_common(n), element arithmetic
- collections.OrderedDict: dict with move_to_end() — useful for LRU cache implementation
- heapq: min-heap operations on a regular list — heappush, heappop, nlargest, nsmallest — O(log n)
| Structure | Access | Search | Insert/Delete | Notes |
|---|---|---|---|---|
| list | O(1) | O(n) | O(1) end / O(n) mid | Dynamic array |
| deque | O(n) | O(n) | O(1) both ends | Doubly-linked blocks |
| dict | O(1) | O(1) by key | O(1) | Hash table, insertion-ordered |
| set | N/A | O(1) | O(1) | Hash table, unordered |
| heapq (list) | O(1) min | O(n) | O(log n) | Min-heap |
| Counter | O(1) | O(1) | O(1) | dict subclass |
Python data structures: defaultdict grouping, Counter frequency, deque BFS, heapq priority queue
from collections import defaultdict, Counter, deque
import heapq
# defaultdict — group items
groups: dict[str, list] = defaultdict(list)
for name, dept in employees:
groups[dept].append(name)
# Counter — frequency analysis
words = "the cat sat on the mat the cat".split()
c = Counter(words)
print(c.most_common(2)) # [('the', 3), ('cat', 2)]
c["the"] += 10 # arithmetic on counts
# deque — sliding window / BFS queue
window = deque(maxlen=3) # auto-discards oldest
for item in stream:
window.append(item) # O(1)
bfs_queue = deque([root])
while bfs_queue:
node = bfs_queue.popleft() # O(1) — use deque, NOT list.pop(0)
bfs_queue.extend(node.children)
# heapq — k largest / priority queue
nums = [3, 1, 4, 1, 5, 9, 2, 6]
print(heapq.nlargest(3, nums)) # [9, 6, 5]
# Min-heap for Dijkstra / task scheduling
heap = []
heapq.heappush(heap, (priority, task_id, task))
prio, tid, task = heapq.heappop(heap)
# dict tricks
merged = {**dict1, **dict2} # merge (dict2 wins on conflict)
filtered = {k: v for k, v in d.items() if v > 0}
inverted = {v: k for k, v in d.items()}Real-World Example
deque.popleft() is O(1) while list.pop(0) is O(n) — for BFS on a graph with millions of nodes, the list version is orders of magnitude slower. Counter is the fastest way to build a frequency map — faster than a manual defaultdict loop because the counting is done in C. heapq is perfect for the "top-k" pattern: O(n log k) vs O(n log n) for full sort.