Ch14 Graphs · Ahmet Çelik

Numbering note: Ch14 Graphs aligns in both course and textbook. Textbook §14.X citations use the textbook’s own section numbers. Union-Find is §14.7.3.

Source note: Graph algorithms in slides are given as pseudocode using instructor labels: setLabel/getLabel, UNEXPLORED/VISITED/DISCOVERY/BACK/CROSS/FORWARD, opposite(v,e), incidentEdges(v), level lists L0, L1, …. That pseudocode is preserved below as the exam artifact. Java implementations are added alongside (textbook/general background).

0. Overview

A graph is a pair G = (V, E): V is a set of vertices, E is a collection of edges (pairs of vertices); vertices and edges are positions storing elements (slide 2).

Edge types (slide 3): A directed edge is an ordered pair (u,v) with origin u and destination v. An undirected edge is an unordered pair. A digraph has all edges directed; an undirected graph has all edges undirected. A weighted graph has numerical weights on edges.

Key vocabulary:

Endpoints: the two vertices an edge connects.
Incident: an edge is incident on each endpoint.
Adjacent: two vertices joined by an edge.
Degree deg(v): number of edges incident on v.
Parallel edges / self-loops: slide 5.
Path / simple path: alternating vertex-edge sequence; simple = all distinct.
Cycle / simple cycle: circular path; simple = all distinct.
Connected: path between every pair of vertices (slide 38).
Connected component: maximal connected subgraph (slide 38).
(Free) tree: connected, acyclic undirected graph (slide 39).
Forest: acyclic undirected graph; components are trees (slide 39).
Spanning tree: spanning subgraph that is a tree (slide 40).
DAG: directed graph with no directed cycles (slide 78).

(n) = number of vertices in the graph, i.e. (n = |V|).

(m) = number of edges in the graph, i.e. (m = |E|).

Properties (slide 8):

Σ_v deg(v) = 2m (each edge counted twice — the key accounting identity behind every O(n+m) bound).
Undirected simple: m ≤ n(n−1)/2.
Directed simple: m ≤ n(n−1) (slide 58).

1. Graph Representations

Edge List Structure (slides 12–13)

Stores a sequence of all vertex objects and a sequence of all edge objects; each edge holds references to its two endpoint vertex objects. No per-vertex index of edges — answering adjacency/incidence requires scanning the whole edge list.

Operation	Edge List	Why
Space	O(n + m)	One object per vertex and per edge.
incidentEdges(v)	O(m)	No per-vertex index; must scan all edges.
areAdjacent(v,w)	O(m)	Same reason.
insertEdge	O(1)	Append to edge sequence.
removeEdge	O(1)	Splice out by stored position reference.
removeVertex(v)	O(m)	Must scan all edges to find and delete v’s edges.

Adjacency List Structure (slides 14–15)

Augments the edge list with, for each vertex, an incidence sequence of references to its incident edges; edges store back-references to their positions in both endpoints’ incidence sequences.

Operation	Adj List	Why
Space	O(n + m)	Each edge appears in exactly 2 incidence lists.
incidentEdges(v)	O(deg(v))	Walk v’s own incidence sequence.
areAdjacent(v,w)	O(min(deg(v),deg(w)))	Scan the shorter incidence sequence.
insertEdge	O(1)	Append to both endpoints’ sequences.
removeEdge	O(1)	Back-references enable O(1) splice from both sequences.
removeVertex(v)	O(deg(v))	Only v’s incident edges need removal via its list.

Adjacency Matrix Structure (slides 16–17)

Vertices get integer indices 0…n−1; an n×n 2D array stores the edge object for adjacent pairs (null otherwise). O(1) adjacency tests but O(n²) space and vertex-add cost.

Operation	Adj Matrix	Why
Space	O(n²)	One cell per ordered vertex pair, edge or not.
incidentEdges(v)	O(n)	Scan v’s full row/column.
areAdjacent(v,w)	O(1)	Direct cell lookup.
insertEdge	O(1)	Write one cell.
removeEdge	O(1)	Clear one cell.
removeVertex(v)	O(n²)	Must resize/rebuild the matrix.

Comparison Table (slide 18)

Operation	Edge List	Adj List	Adj Matrix	Why
Space	O(n+m)	O(n+m)	O(n²)	Matrix reserves a cell per vertex pair regardless of edges.
incidentEdges(v)	O(m)	O(deg(v))	O(n)	Only adj list indexes per-vertex.
areAdjacent(v,w)	O(m)	O(min(deg(v),deg(w)))	O(1)	Matrix answers by single lookup.
insertVertex	O(1)	O(1)	O(n²)	Matrix must grow in both dimensions.
insertEdge	O(1)	O(1)	O(1)	Constant-size update in all three.
removeVertex(v)	O(m)	O(deg(v))	O(n²)	Cost tracks how the structure indexes v’s edges.
removeEdge	O(1)	O(1)	O(1)	Remove one object/cell.

2. Depth-First Search (DFS)

Strategy: Recursively explore as far along each branch as possible before backtracking — “DFS is to graphs what an Euler tour is to binary trees” (slide 41).

Invariant:

Every reachable vertex is labeled VISITED exactly once; each edge is labeled DISCOVERY (reached a new vertex) or BACK (led to an already-visited vertex). DISCOVERY edges form a spanning tree of the connected component (slide 48).

Slide pseudocode (slide 42):

Algorithm DFS(G, v)
    setLabel(v, VISITED)
    for all e in G.incidentEdges(v)
        if getLabel(e) = UNEXPLORED
            w <- opposite(v, e)
            if getLabel(w) = UNEXPLORED
                setLabel(e, DISCOVERY)
                DFS(G, w)
            else
                setLabel(e, BACK)

Algorithm DFS(G)                    // whole graph
    for all u in G.vertices()   setLabel(u, UNEXPLORED)
    for all e in G.edges()      setLabel(e, UNEXPLORED)
    for all v in G.vertices()
        if getLabel(v) = UNEXPLORED   DFS(G, v)

Java (textbook/general background, §14.3.1):

public static <V,E> void DFS(Graph<V,E> g, Vertex<V> u,
                             Set<Vertex<V>> known, Map<Vertex<V>,Edge<E>> forest) {
    known.add(u);
    for (Edge<E> e : g.outgoingEdges(u)) {
        Vertex<V> w = g.opposite(u, e);
        if (!known.contains(w)) {
            forest.put(w, e);
            DFS(g, w, known, forest);
        }
    }
}

Quantity	Cost	Why
Time	O(n + m)	Each vertex labeled twice, each edge twice; incidentEdges once per vertex; Σ deg(v) = 2m (slide 49). Requires adjacency-list structure.
Space	O(n)	Recursion stack depth up to n on a path-shaped graph.

Trace (slides 45–46): Start at A; advance along first unexplored neighbour, marking discovery edges; mark back edges when a visited vertex is found.

Path finding (slide 50, pathDFS): Push vertices/discovery edges onto stack S; when z is reached, return S contents as the path.

Cycle finding (slide 52, cycleDFS): On a back edge (v,w), pop the stack from top down to w — that’s the cycle.

Directed DFS (slide 60): Traverse edges along their direction only; yields four edge types: discovery, back, forward, cross. Starting at s discovers exactly the vertices reachable from s.

3. Breadth-First Search (BFS)

Strategy: Explore in levels L0, L1, L2, …, visiting all vertices at distance i before distance i+1.

Invariant:

A vertex in L_i sits at exactly i discovery edges from s, and every path from s to that vertex uses at least i edges (slide 27, Property 3) — this is why BFS finds unweighted shortest paths.

Slide pseudocode (slide 22):

Algorithm BFS(G, s)
    L0 <- new empty sequence;  L0.addLast(s);  setLabel(s, VISITED)
    i <- 0
    while not Li.isEmpty()
        L(i+1) <- new empty sequence
        for all v in Li.elements()
            for all e in G.incidentEdges(v)
                if getLabel(e) = UNEXPLORED
                    w <- opposite(v, e)
                    if getLabel(w) = UNEXPLORED
                        setLabel(e, DISCOVERY)
                        setLabel(w, VISITED)
                        L(i+1).addLast(w)
                    else
                        setLabel(e, CROSS)
        i <- i + 1

Algorithm BFS(G)
    for all u in G.vertices()  setLabel(u, UNEXPLORED)
    for all e in G.edges()     setLabel(e, UNEXPLORED)
    for all v in G.vertices()
        if getLabel(v) = UNEXPLORED  BFS(G, v)

Java (textbook/general background, §14.3.3):

public static <V,E> void BFS(Graph<V,E> g, Vertex<V> s,
                             Set<Vertex<V>> known, Map<Vertex<V>,Edge<E>> forest) {
    Queue<Vertex<V>> queue = new LinkedList<>();
    known.add(s); queue.add(s);
    while (!queue.isEmpty()) {
        Vertex<V> u = queue.remove();   // FIFO = level order
        for (Edge<E> e : g.outgoingEdges(u)) {
            Vertex<V> w = g.opposite(u, e);
            if (!known.contains(w)) {
                known.add(w);
                forest.put(w, e);       // DISCOVERY edge
                queue.add(w);
            }   // else: CROSS edge
        }
    }
}

Note: Slides use explicit level lists L0, L1, …; the FIFO queue is the equivalent single-structure form. (Textbook §14.3.3)

Quantity	Cost	Why
Time	O(n + m)	Each vertex/edge labeled constant times; incidentEdges once per vertex; Σ deg(v) = 2m (slide 34). Requires adjacency-list structure.
Space	O(n)	Frontier holds at most O(n) vertices.

Applications (slide 35, all O(n+m)): connected components, spanning forest, simple cycle detection, minimum-edge path.

Contrast DFS vs BFS: Both visit all vertices/edges in O(n+m) and produce spanning trees. DFS uses a stack (recursion), needs O(depth) frontier memory, suits cycle detection/reachability/topological sort/strong connectivity. BFS uses a FIFO queue, may hold O(n) frontier, uniquely guarantees minimum-edge (unweighted shortest) paths. Choose BFS for fewest-edge distances; choose DFS for structure/reachability questions.

4. Directed Graphs (Digraphs)

A digraph has all edges directed (slide 57); edge (a,b) goes from a to b only. Keeping separate in-edge and out-edge adjacency lists allows O(deg) listing of in/out edges (slide 58).

Strong connectivity (slide 62): every vertex can reach every other. Test (slide 63, O(n+m)): pick any v; DFS from v in G (fail if any vertex unreached); DFS from v in the reversed graph G’ (fail if any vertex unreached); else strongly connected.

Strongly connected components (slide 64): maximal strongly connected subgraphs; computable in O(n+m) by DFS.

Transitive closure (slides 65–66): G* has same vertices as G, plus edge u→v whenever G has a directed path from u to v. Computed by DFS from each vertex in O(n(n+m)), or by Floyd-Warshall DP.

DAGs and Topological Ordering

A DAG is a digraph with no directed cycles (slide 78). A topological ordering numbers vertices v1,…,vn such that edge (vi,vj) ⇒ i < j.

Theorem (slide 78): A digraph admits a topological ordering if and only if it is a DAG. A directed cycle would force a vertex before itself — impossible.

Slide algorithm 1 — removal method (slide 80; “different than the one in the book”):

Algorithm TopologicalSort(G)
    H <- G;  n <- G.numVertices()
    while H is not empty do
        let v be a vertex with no outgoing edges
        label v <- n;  n <- n - 1;  remove v from H

Slide algorithm 2 — DFS method (slide 81):

Algorithm topologicalDFS(G, v)
    setLabel(v, VISITED)
    for all e in G.outEdges(v)            // outgoing edges
        w <- opposite(v, e)
        if getLabel(w) = UNEXPLORED       // discovery edge
            topologicalDFS(G, w)
        // else forward or cross edge
    label v with topological number n;  n <- n - 1

Both run in O(n+m) (slides 80–81).

Java (DFS-based, §14.5.1):

public static <V,E> List<Vertex<V>> topologicalSort(Graph<V,E> g) {
    List<Vertex<V>> topo = new ArrayList<>();
    Set<Vertex<V>> known = new HashSet<>();
    Deque<Vertex<V>> ready = new ArrayDeque<>();
    for (Vertex<V> u : g.vertices())
        if (!known.contains(u)) tsDFS(g, u, known, ready);
    while (!ready.isEmpty()) topo.add(ready.pop());
    return topo;
}
private static <V,E> void tsDFS(Graph<V,E> g, Vertex<V> u,
                                Set<Vertex<V>> known, Deque<Vertex<V>> ready) {
    known.add(u);
    for (Edge<E> e : g.outgoingEdges(u)) {
        Vertex<V> w = g.opposite(u, e);
        if (!known.contains(w)) tsDFS(g, w, known, ready);
    }
    ready.push(u);   // assign topological number on finish
}

Trace (slides 82–91): DFS assigns numbers on finish from 9 down to 1; sink-like vertices get the smallest finish numbers.

5. Weighted Graphs and Shortest Paths

In a weighted graph each edge has a weight; path length = sum of edge weights (slides 94–95). Structural facts (slide 96): a subpath of a shortest path is shortest (optimal substructure); there exists a shortest-path tree from the source.

Dijkstra’s Algorithm

Strategy: Greedy cloud growth (slide 97) — repeatedly absorb the nearest vertex outside the cloud, then relax its outgoing edges.

Invariant:

d(v) = shortest path distance from s found so far. When the closest outside vertex u is added, d(u) is final: any alternative route would pass through a vertex with an equal-or-larger label and only add nonneg weight (slide 105).

Assumptions (slide 97): connected, undirected, nonneg edge weights.

Edge relaxation (slide 98): d(z) <- min{ d(z), d(u) + weight(e) } for edge e = (u, z) where u was just added.

Java (§14.6.2; slide 101 pseudocode was image-only):

public static <V> Map<Vertex<V>,Integer> dijkstra(Graph<V,Integer> g, Vertex<V> src) {
    Map<Vertex<V>,Integer> d = new HashMap<>();
    for (Vertex<V> v : g.vertices()) d.put(v, Integer.MAX_VALUE);
    d.put(src, 0);
    PriorityQueue<Vertex<V>> pq =
        new PriorityQueue<>(Comparator.comparingInt(d::get));
    pq.addAll(d.keySet());
    Set<Vertex<V>> cloud = new HashSet<>();
    while (!pq.isEmpty()) {
        Vertex<V> u = pq.remove();
        if (d.get(u) == Integer.MAX_VALUE) break;
        cloud.add(u);
        for (Edge<Integer> e : g.outgoingEdges(u)) {
            Vertex<V> z = g.opposite(u, e);
            if (!cloud.contains(z)) {
                int relaxed = d.get(u) + e.getElement();
                if (relaxed < d.get(z)) {
                    d.put(z, relaxed);
                    pq.remove(z); pq.add(z);
                }
            }
        }
    }
    return d;
}

Quantity	Cost	Why
Time	O((n+m) log n) = O(m log n)	Each vertex inserted/removed from PQ once; each edge can trigger a key-decrease; all O(log n). Needs adjacency-list structure (slide 102).
Space	O(n + m)	Labels + PQ.

Why correct (slide 105): greedy, adds by increasing distance. First wrong vertex F would require its predecessor D to have been wrong too — contradiction.

Why it fails on negative weights (slide 106): a late-added negative-incident vertex can invalidate already-finalized cloud distances.

Bellman-Ford Algorithm

Note: Labeled “not in book” on slide 107.

Slide pseudocode (slide 107):

Algorithm BellmanFord(G, s)
    for all v in G.vertices()
        if v = s  setDistance(v, 0)
        else      setDistance(v, inf)
    for i <- 1 to n-1 do
        for each e in G.edges()          // relax edge e
            u <- G.origin(e)
            z <- G.opposite(u, e)
            r <- getDistance(u) + weight(e)
            if r < getDistance(z)  setDistance(z, r)

Properties (slide 107): works with negative-weight edges; must assume directed edges; O(nm); can detect a negative-weight cycle (if an n-th pass still improves a distance, a negative cycle exists).

Contrast Dijkstra vs BFS: BFS finds fewest-edge paths in O(n+m) when weights are all equal (unweighted). Dijkstra generalizes to nonneg weighted graphs at O(m log n) by using a PQ keyed on tentative distance rather than FIFO order. Use BFS for unweighted shortest paths; Dijkstra for nonneg weights; Bellman-Ford for negative weights.

6. Minimum Spanning Trees

An MST is a spanning tree of minimum total edge weight (slide 112).

Cycle property (slide 113): For MST T and edge e not in T, every edge f on the cycle e forms with T satisfies weight(f) ≤ weight(e).

Partition property (slide 114): For any partition of V into U and V∖U, a minimum-weight edge crossing the partition belongs to some MST.

Prim-Jarník’s Algorithm

MIT 6.046J — Demaine’s framing: The whole MST story runs on one lemma: the minimum-weight edge crossing any cut (S, V∖S) is guaranteed to belong to some MST (proved by a cut-and-paste argument — remove whatever edge T* used to cross that cut, drop in the lighter one; weight can only decrease, so the result is still optimal). Prim applies this lemma with a specific, growing cut: S starts as a single vertex s (key = 0, all others ∞), and at every step you absorb the cheapest edge that exits S. Mechanically it is Dijkstra with one change: instead of relaxing d(v) = min{d(v), d(u) + w(u,v)}, you store only the edge weight w(u,v) — you are not accumulating a path, you are choosing the cheapest attachment. Extract min-key vertex u; for each neighbor v outside S, if w(u,v) < key(v) then key(v) ← w(u,v) and v.parent ← u (this is a decrease-key on the PQ, same as Dijkstra). When u is extracted, the edge (u, u.parent) is the minimum-weight edge crossing the cut at that moment, so the cut lemma certifies it belongs to some MST. Running time is identical to Dijkstra: O((n+m) log n) with a binary heap over an adjacency list.

Strategy: Grow one MST as a cloud from an arbitrary start, always attaching the cheapest edge connecting a new vertex to the cloud (slide 115).

Invariant:

d(v) = smallest weight of any edge connecting v to the current cloud. Repeatedly adding the minimum-d outside vertex is safe by the partition property.

Java (§14.7.1; slide 116 pseudocode was image-only):

public static <V> Set<Edge<Integer>> prim(Graph<V,Integer> g, Vertex<V> start) {
    Map<Vertex<V>,Integer> d = new HashMap<>();
    Map<Vertex<V>,Edge<Integer>> con = new HashMap<>();
    for (Vertex<V> v : g.vertices()) d.put(v, Integer.MAX_VALUE);
    d.put(start, 0);
    PriorityQueue<Vertex<V>> pq =
        new PriorityQueue<>(Comparator.comparingInt(d::get));
    pq.addAll(d.keySet());
    Set<Vertex<V>> cloud = new HashSet<>();
    Set<Edge<Integer>> mst = new HashSet<>();
    while (!pq.isEmpty()) {
        Vertex<V> u = pq.remove();
        cloud.add(u);
        if (con.containsKey(u)) mst.add(con.get(u));
        for (Edge<Integer> e : g.outgoingEdges(u)) {
            Vertex<V> z = g.opposite(u, e);
            if (!cloud.contains(z) && e.getElement() < d.get(z)) {
                d.put(z, e.getElement());
                con.put(z, e);
                pq.remove(z); pq.add(z);
            }
        }
    }
    return mst;
}

Quantity	Cost	Why
Time	O((n+m) log n) = O(m log n)	Same accounting as Dijkstra (slide 129).
Space	O(n + m)

Trace (slides 117–118): Start at A, update fringe labels to lightest connecting edge each step, absorb minimum-label vertex.

Kruskal’s Algorithm

Strategy: Maintain vertex clusters; repeatedly take the globally lightest edge; if it joins two different clusters, add it and merge (slide 130). End: one cluster, one MST.

Invariant:

Edges considered in nondecreasing weight order; an edge is added only when its endpoints are in different clusters, so no cycle can form. By the cycle property, the greedy choice is always safe.

Java (§14.7.2; slide 131 pseudocode was image-only):

public static <V> List<Edge<Integer>> kruskal(Graph<V,Integer> g) {
    List<Edge<Integer>> mst = new ArrayList<>();
    PriorityQueue<Edge<Integer>> pq =
        new PriorityQueue<>(Comparator.comparingInt(Edge::getElement));
    for (Edge<Integer> e : g.edges()) pq.add(e);
    UnionFind<Vertex<V>> uf = new UnionFind<>(g.vertices());
    while (!pq.isEmpty() && mst.size() < g.numVertices() - 1) {
        Edge<Integer> e = pq.remove();
        Vertex<V>[] ends = g.endVertices(e);
        if (uf.find(ends[0]) != uf.find(ends[1])) {
            mst.add(e);
            uf.union(ends[0], ends[1]);
        }
    }
    return mst;
}

Quantity	Cost	Why
Time	O(m log n)	Sorting/PQ-ordering m edges = O(m log m) = O(m log n); union-find is near-linear (§7).
Space	O(n + m)

Trace (slides 132–133): Add edges in weight order 1, 2, 3, …, skipping any that join a vertex to its own cluster.

Contrast Prim vs Kruskal: Prim grows one cloud, uses vertex-PQ (Dijkstra-like). Kruskal grows a forest of clusters, uses edge-PQ + union-find. Both O(m log n) and both justified by cycle/partition properties.

7. Union-Find (slides 156–163)

Used by Kruskal. Three operations: makeSet(x), union(A,B), find(p) (slide 157).

Tree-based implementation (slide 160): each set is a tree rooted at a node with a self-referencing set-pointer. union makes one root point to the other; find follows pointers to the root.

Variant	n-operation cost	Why
Plain tree	O(n²) worst	Unbalanced unions can give O(n) path length.
• Union by size (slide 162)	O(n log n)	Smaller tree points to larger; each pointer-follow at least doubles subtree size → O(log n) per find.
• Path compression (slide 163)	O(n log* n)	After find, every node on the path is re-pointed to root; “proof not in book.”

8. Common Exam Traps

“DFS/BFS both find shortest paths.” Only BFS does, and only for unweighted graphs (slide 27). DFS gives spanning tree + reachability, not distances.
“Dijkstra works with negative edges.” False — a late-added negative edge corrupts finalized distances (slide 106). Use Bellman-Ford.
“Adjacency matrix is more space-efficient.” False for sparse graphs: O(n²) always vs O(n+m) for lists (slide 18).
“areAdjacent is O(1) in an adjacency list.” False — O(min(deg(v),deg(w))); only the matrix is O(1) (slide 18).
“Any digraph has a topological ordering.” False — only DAGs do (slide 78 theorem).
“DFS on undirected graph can produce cross edges.” False — undirected DFS yields only discovery + back; cross/forward arise in directed DFS (slide 60).
“BFS uses a stack.” No — BFS uses a FIFO queue (level lists); a stack turns it into DFS.
“Dijkstra/Prim are O(n²) here.” No — with adjacency list + PQ, both are O(m log n) (slides 102, 129).
Σ deg(v) = 2m is the key identity behind every O(n+m) traversal bound (slides 8, 34, 49).

9. Quick-Reference Summary

Use BFS for unweighted shortest paths and level structure; use DFS for reachability, cycle detection, topological ordering, and strong connectivity. Pick adjacency list by default (O(n+m) space, O(deg) incidence); use adjacency matrix only when O(1) adjacency tests matter on a dense graph. For nonneg-weight shortest paths use Dijkstra (O(m log n)); use Bellman-Ford (O(nm)) for negative weights. For MST use Prim-Jarník or Kruskal, both O(m log n). For task scheduling on a DAG use topological sort (O(n+m)).

Quick-Reference Complexity Summary

Algorithm / Structure	Best	Average	Worst	Space	Notes
Edge List	—	—	—	O(n+m)	O(m) adjacency (slide 18)
Adjacency List	—	—	—	O(n+m)	O(deg) incidence; default (slide 18)
Adjacency Matrix	—	—	—	O(n²)	O(1) areAdjacent; O(n²) vertex ops (slide 18)
DFS	O(n+m)	O(n+m)	O(n+m)	O(n)	spanning tree, reachability (slide 49)
BFS	O(n+m)	O(n+m)	O(n+m)	O(n)	unweighted shortest paths (slide 34)
Topological Sort	O(n+m)	O(n+m)	O(n+m)	O(n)	DAG only (slides 80–81)
Dijkstra	O(m log n)	O(m log n)	O(m log n)	O(n+m)	nonneg weights (slide 102)
Bellman-Ford	O(nm)	O(nm)	O(nm)	O(n)	neg weights; directed; “not in book” (slide 107)
Prim-Jarník	O(m log n)	O(m log n)	O(m log n)	O(n+m)	one cloud (slide 129)
Kruskal	O(m log n)	O(m log n)	O(m log n)	O(n+m)	union-find clusters (slides 130–131)
Union-Find (n ops)	—	—	O(n log* n)	O(n)	union-by-size + path compression (slide 163)