Python's built-in data structures — list, dict, set, tuple — are highly optimised C implementations. The collections module extends these with specialised structures. Knowing the underlying implementation and time complexity of each operation is essential for writing efficient Python.

Key Points

  • list: dynamic array — O(1) append/index, O(n) insert/delete in middle, O(n) search
  • dict (Python 3.7+): ordered by insertion, O(1) average get/set/delete — backed by a hash table with open addressing
  • set: unordered, O(1) add/remove/contains — backed by hash table; frozenset is immutable and hashable
  • tuple: immutable sequence — faster than list for iteration, safe as dict key/set member, unpacking is idiomatic
  • collections.deque: double-ended queue backed by doubly-linked list of fixed-size blocks — O(1) appendleft/popleft (unlike list which is O(n) for popleft)
  • collections.defaultdict: like dict but returns a default value for missing keys — defaultdict(list), defaultdict(int)
  • collections.Counter: dict subclass for counting hashable objects — most_common(n), element arithmetic
  • collections.OrderedDict: dict with move_to_end() — useful for LRU cache implementation
  • heapq: min-heap operations on a regular list — heappush, heappop, nlargest, nsmallest — O(log n)
StructureAccessSearchInsert/DeleteNotes
listO(1)O(n)O(1) end / O(n) midDynamic array
dequeO(n)O(n)O(1) both endsDoubly-linked blocks
dictO(1)O(1) by keyO(1)Hash table, insertion-ordered
setN/AO(1)O(1)Hash table, unordered
heapq (list)O(1) minO(n)O(log n)Min-heap
CounterO(1)O(1)O(1)dict subclass

Python data structures: defaultdict grouping, Counter frequency, deque BFS, heapq priority queue

from collections import defaultdict, Counter, deque
import heapq

# defaultdict — group items
groups: dict[str, list] = defaultdict(list)
for name, dept in employees:
    groups[dept].append(name)

# Counter — frequency analysis
words = "the cat sat on the mat the cat".split()
c = Counter(words)
print(c.most_common(2))   # [('the', 3), ('cat', 2)]
c["the"] += 10            # arithmetic on counts

# deque — sliding window / BFS queue
window = deque(maxlen=3)   # auto-discards oldest
for item in stream:
    window.append(item)    # O(1)

bfs_queue = deque([root])
while bfs_queue:
    node = bfs_queue.popleft()   # O(1) — use deque, NOT list.pop(0)
    bfs_queue.extend(node.children)

# heapq — k largest / priority queue
nums = [3, 1, 4, 1, 5, 9, 2, 6]
print(heapq.nlargest(3, nums))   # [9, 6, 5]

# Min-heap for Dijkstra / task scheduling
heap = []
heapq.heappush(heap, (priority, task_id, task))
prio, tid, task = heapq.heappop(heap)

# dict tricks
merged = {**dict1, **dict2}       # merge (dict2 wins on conflict)
filtered = {k: v for k, v in d.items() if v > 0}
inverted = {v: k for k, v in d.items()}

Real-World Example

deque.popleft() is O(1) while list.pop(0) is O(n) — for BFS on a graph with millions of nodes, the list version is orders of magnitude slower. Counter is the fastest way to build a frequency map — faster than a manual defaultdict loop because the counting is done in C. heapq is perfect for the "top-k" pattern: O(n log k) vs O(n log n) for full sort.