Kosaraju’s Algorithm for Strongly Connected Components — and How to Use It for NFL Betting Models

Mon, Aug 18, 2025
by SportsBetting.dog

Kosaraju’s algorithm finds strongly connected components (SCCs) in a directed graph in linear time (Θ(V+E)) with two depth-first searches. In sports betting—especially NFL modeling—SCCs help you (1) detect “rock–paper–scissors” matchup cycles, (2) build condensed, acyclic team graphs for stable power rankings, (3) regularize ML models with structure-aware features, and (4) surface upset risk when a team’s style lands in a bad cycle. Below you’ll find an in-depth guide, from the algorithm to data engineering, modeling blueprints, and practical betting workflows.



1) What Kosaraju’s Algorithm Does

A strongly connected component in a directed graph is a maximal set of vertices where each vertex is reachable from every other vertex in that set.

Kosaraju’s algorithm (high level):

  1. Run DFS on the graph G to compute finishing times for all vertices (a stack or list ordered by DFS exit time).

  2. Compute the transpose graph Gᵀ by reversing every edge.

  3. Pop vertices by decreasing finishing time from step 1, and run DFS on Gᵀ.
    Each DFS tree you get in this step is a strongly connected component.

Why it works: The first DFS orders vertices so that when you explore the reversed graph, you “peel off” one SCC at a time without leaking into another.

Complexity:

  • Build transpose: O(V + E)

  • DFS passes: O(V + E)

  • Total: Θ(V+E)
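
For a quick sanity check, here is a minimal worked example on a five-node graph. This sketch uses NetworkX's built-in Kosaraju routine (a pure-Python implementation appears in Section 6.1); the only assumption is that networkx is installed.

# Five nodes: 0 -> 1 -> 2 -> 0 is a cycle, 2 -> 3 -> 4 is an acyclic tail.
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
sccs = list(nx.kosaraju_strongly_connected_components(G))
print(sccs)  # three components: {0, 1, 2} (the cycle), {3}, {4} -- order may vary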



2) Why SCCs Matter in NFL Betting Predictions

On paper, the NFL season looks like a ranking problem. In reality it’s interaction-heavy:

  • Styles make fights: Some teams defend wide-zone beautifully but struggle against the QB run game; others are the opposite.

  • Transitivity fails: Team A beats B, B beats C, C beats A. That cycle breaks pure Elo-style ordering.

  • Personnel and scheme are directional: “Team X’s pass rush disrupts Team Y’s long-developing intermediate game,” but not necessarily vice versa.

A directed graph captures this:

  • Nodes: teams (or team–QB pairs, or team–scheme states).

  • Directed edge X → Y: X has a matchup advantage over Y under specific conditions (weights can encode the expected margin/cover probability).

SCCs then highlight clusters of mutual reachability—i.e., groups where advantages are circular or balanced. Treating each SCC as a meta-team (node in a condensation DAG) stabilizes rankings and reveals when a line looks short/long due to cycle dynamics rather than raw team strength.



3) Building the Graph for NFL Modeling

3.1 Nodes

Choose granularity to match the question:

  • Team level: 32 nodes (simple, robust).

  • Team–QB state: Accounts for QB changes (Team_A_with_QB1, Team_A_with_QB2).

  • Team–scheme state: Early-season vs. late-season identity; coordinator changes.

3.2 Edges and Weights

Create X → Y if, conditional on context C, X’s estimated edge over Y exceeds a threshold.

Candidate signals (use rolling windows with decay):

  • EPA splits: Rush EPA, Pass EPA, vs. 11/12/13 personnel EPA allowed, play-action EPA, RPO EPA.

  • Pressure & time-to-throw: Pressure rate vs. protection rates; heat maps of blitz efficacy.

  • Coverage vs. route family: How Y handles crossers, seams, sideline comebacks; how X creates those.

  • Quarterback style: Designed QB runs, scramble EPA, intermediate depth affinity.

  • OL/DL trench scores: Run-block/Pass-block vs. pass-rush/win rates.

  • Situational: Red zone TD rate, 3rd-and-short conversion, 4th-down aggressiveness.

  • Context: Rest differential, travel, surface, weather (wind is huge), tempo, injury status.

Edge weight idea:

w(X → Y; C) = E[X margin vs Y | C]   or   Pr(X covers vs Y | C) - 0.5

Keep a time decay so last 3–5 games matter more than September, and use uncertainty bands (Bayesian shrinkage) to avoid overfitting.
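
A minimal sketch of the decayed weight, assuming you already have per-matchup margins; the game-list format and half-life value here are illustrative, not a fixed recipe.

# Exponentially decayed average margin for the edge X -> Y.
# `games` is a list of (weeks_ago, margin_for_X) pairs -- an assumed format.
def decayed_edge_weight(games, half_life_weeks=4.0):
    num = den = 0.0
    for weeks_ago, margin in games:
        w = 0.5 ** (weeks_ago / half_life_weeks)  # recent games count more
        num += w * margin
        den += w
    return num / den if den else 0.0

print(decayed_edge_weight([(2, 6.0), (10, -3.0)]))  # the recent +6 dominates the old -3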

3.3 Edge Construction Rules

  • Add X → Y if w(X → Y) > τ (e.g., +1 point expected margin or +5% cover edge).

  • Optionally add Y → X if w(Y → X) > τ. Cycles can and will form.

  • Use season phase filters (early/mid/late), or QB-state filters.



4) Using SCCs in Betting Workflows

4.1 Condensation DAG for Rankings

  • Compute SCCs on your weighted graph (sometimes drop very weak edges first).

  • Condense each SCC to a meta-node. The condensation graph is a DAG.

  • Topologically order the DAG.
    This gives a structure-aware ranking: power within SCCs (use PageRank or average margin), then across SCCs via DAG levels.

Benefit: Reduces instability from cycles; prevents one favorable matchup from inflating global rank.
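
A minimal sketch of the condensation step using NetworkX, which provides condensation and topological ordering out of the box; the five team labels are placeholders.

# Collapse each SCC into a meta-node, then read tiers off the DAG in
# topological order. Teams A, B, C form a cycle; D and E hang off acyclically.
import networkx as nx

G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "E")])
dag = nx.condensation(G)  # each meta-node carries a "members" set of teams
for meta in nx.topological_sort(dag):
    print(meta, sorted(dag.nodes[meta]["members"]))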

4.2 Cycle-Aware Upset Risk

If Team A is rated higher by base Elo but the pair (A, B) sits in an SCC whose active edges form a cycle (A → B → C → A), flag elevated upset risk when A faces B, provided the edge B → A is meaningful in the current context (e.g., weather amplifies B’s run game).
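
A small sketch of that flag, under the assumption that you already have an SCC id per team (comp) and a dictionary of directed edge weights (w); both names are placeholders.

# Flag elevated upset risk: favorite and underdog share an SCC, and the
# reverse edge (underdog -> favorite) clears the context threshold tau.
def upset_risk_flag(comp, w, favorite, underdog, tau=1.0):
    same_scc = comp[favorite] == comp[underdog]
    reverse_edge = w.get((underdog, favorite), 0.0)
    return same_scc and reverse_edge > tau

comp = {"A": 0, "B": 0, "C": 0}
w = {("A", "B"): 2.5, ("B", "A"): 1.4}  # B's edge back over A survives in this context
print(upset_risk_flag(comp, w, "A", "B"))  # True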

4.3 Portfolio Construction

Within an SCC, variance stems from style interactions, not just noise. Position smaller per game but more broadly across the SCC, or use correlation-aware sizing to avoid stacking similar exposure when cycles are present.

4.4 Scheduling & Lookahead

The condensation DAG exposes which SCCs are downstream threats. If a team sits atop the DAG but faces multiple opponents from a “bad-matchup” SCC in the next month, temper forecasts and adjust priors.

4.5 Market Microstructure

Build a second graph on price discovery (edges from market maker → copycat books → retail steam). SCCs there show feedback loops that exaggerate moves. Use this to time entries (hit openers vs. wait for steam).



5) ML Integration: Turning SCCs into Features

5.1 Direct Features

  • SCC ID (categorical): One-hot or learned embedding.

  • SCC size: Larger SCCs imply more cyclic dynamics (feature: scc_size).

  • Intra-SCC centrality: PageRank/Betweenness of the team node within its SCC.

  • Cycle metrics: Count of simple cycles including the team; shortest cycle length to opponent.

  • Edge asymmetry: w(X → Y) - w(Y → X) as a feature for the game X vs Y (see the sketch after this list).
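
A sketch of those features for a single game X vs Y, assuming the directed matchup graph G is a NetworkX DiGraph with a weight attribute on each edge; the feature names simply mirror the list above.

# Derive SCC-based features for one matchup (X, Y) from the weighted DiGraph G.
import networkx as nx

def scc_features(G, X, Y):
    comp = {}
    for cid, scc in enumerate(nx.strongly_connected_components(G)):
        for node in scc:
            comp[node] = cid
    members = [n for n in G if comp[n] == comp[X]]
    sub = G.subgraph(members)
    pr = nx.pagerank(sub) if len(members) > 1 else {X: 1.0}

    def w(a, b):
        return G[a][b].get("weight", 0.0) if G.has_edge(a, b) else 0.0

    return {
        "scc_id": comp[X],
        "scc_size": len(members),
        "pagerank_in_scc": pr.get(X, 0.0),
        "is_same_scc": int(comp[X] == comp[Y]),
        "edge_asymmetry": w(X, Y) - w(Y, X),
    }

G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 2.0), ("B", "C", 1.5), ("C", "A", 1.0)])
print(scc_features(G, "A", "B"))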

5.2 Regularization

  • Graph Laplacian penalty: Encourage similar parameters for teams connected within an SCC; relax across SCC boundaries to maintain style differentiation.
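
A minimal numpy sketch of this penalty, assuming a vector of per-team parameters and a precomputed list of within-SCC index pairs (both are placeholders you would supply from your own model).

# Penalize parameter differences only for team pairs that share an SCC.
import numpy as np

def within_scc_penalty(theta, same_scc_pairs, lam=0.1):
    return lam * sum((theta[i] - theta[j]) ** 2 for i, j in same_scc_pairs)

theta = np.array([1.2, 0.9, -0.4])           # per-team parameters
print(within_scc_penalty(theta, [(0, 1)]))   # only the within-SCC pair is penalized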

5.3 Model Families

  • Gradient boosting / tabular NN: Add SCC features to classic features (injuries, EPA, etc.).

  • GNNs (GraphSAGE/GAT): Use the matchup graph as the input; predict cover probability on the edge (game) with message passing constrained to within-SCC hops plus limited spillover.

  • Hierarchical models: Team-level priors informed by SCC membership; opponent-specific effects at the edge level.

5.4 Training & Targets

  • Targets: Cover (± against the closing line), margin, total points deviation.

  • Losses: Log loss for cover, pinball loss for quantiles (risk-aware spreads), or bivariate (side+total) if modeling joint outcomes.

  • Calibration: Platt/Isotonic per-SCC improves reliability when styles drive skew.
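
A sketch of the per-SCC calibration idea using scikit-learn's IsotonicRegression; the array names (scc_ids, raw_probs, outcomes) are assumptions about how you store out-of-sample predictions.

# Fit one isotonic calibrator per SCC bucket, then apply the matching one at
# prediction time.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_per_scc_calibrators(scc_ids, raw_probs, outcomes):
    calibrators = {}
    for cid in np.unique(scc_ids):
        mask = scc_ids == cid
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(raw_probs[mask], outcomes[mask])
        calibrators[cid] = iso
    return calibrators

def calibrate(calibrators, cid, p):
    return float(calibrators[cid].predict([p])[0]) if cid in calibrators else p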



6) From Algorithm to Code (Illustrative)

6.1 Kosaraju (educational, pure Python)

from collections import defaultdict
import sys
sys.setrecursionlimit(10**6)

def kosaraju(n, edges):
    # Build the adjacency lists of G and its transpose G^T in one pass.
    g = defaultdict(list)
    gt = defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        gt[v].append(u)

    visited = [False]*n
    order = []

    # Pass 1: DFS on G, appending each vertex as it finishes (exit order).
    def dfs1(u):
        visited[u] = True
        for v in g[u]:
            if not visited[v]:
                dfs1(v)
        order.append(u)

    for u in range(n):
        if not visited[u]:
            dfs1(u)

    comp = [-1]*n
    cid = 0

    # Pass 2: DFS on G^T in decreasing finish-time order; each tree is one SCC.
    def dfs2(u, cid):
        comp[u] = cid
        for v in gt[u]:
            if comp[v] == -1:
                dfs2(v, cid)

    for u in reversed(order):
        if comp[u] == -1:
            dfs2(u, cid)
            cid += 1

    return comp, cid  # comp[i] = component id of node i, 0..cid-1

6.2 Network Construction for NFL

# Pseudocode for building directed edges from matchup features
teams = list_of_teams_or_states  # e.g., 32 or team-QB variants
edges = []
for X in teams:
    for Y in teams:
        if X == Y: 
            continue
        w = expected_margin(X, Y, context=C)  # your model blending EPA splits, OL/DL, weather, etc.
        if w > +1.0:   # threshold in points
            edges.append((X.id, Y.id))

Then run kosaraju to get SCC IDs, compute SCC sizes, centralities (with NetworkX), and append those as features to your tabular/GNN pipeline.
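
A short sketch of that hand-off, reusing the kosaraju() function from 6.1 on a synthetic four-team edge list; the feature-dictionary layout is just one possible shape.

# Feed an integer edge list into kosaraju() (Section 6.1) and derive basic
# per-team SCC features. Teams 0-2 form a cycle; team 3 is a singleton.
from collections import Counter

edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
comp, n_scc = kosaraju(4, edges)
scc_size = Counter(comp)  # SCC id -> number of member teams
features = {t: {"scc_id": comp[t], "scc_size": scc_size[comp[t]]} for t in range(4)}
print(n_scc, features)    # 2 components; teams 0-2 share one, team 3 is alone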



7) Validating the Edge & SCC Pipeline

7.1 Backtesting Principles

  • Temporal CV: Rolling origin evaluation by week. Never leak future games into past edges.

  • Decayed windows: e.g., last 8 weeks with exponential decay; re-estimate edges every week.

  • Out-of-sample calibration: Reliability diagrams by SCC and non-SCC games.

  • Uplift tests: Compare AUC/Brier/log loss for (baseline) vs. (baseline + SCC features).

  • Bet simulation: Kelly-fractioned staking on edges where model p − implied p ≥ threshold; stratify results by presence of cycles.
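
A compact sketch of the bet-simulation bullet with fractional Kelly sizing; the game-record fields (week, model_p, implied_p, decimal_odds, won) are illustrative, and model_p is assumed to come from a model trained only on earlier weeks so the rolling-origin rule holds.

# Walk the season week by week, betting only when the model edge clears the
# threshold, and size stakes with fractional Kelly.
def kelly_fraction(p, decimal_odds, fraction=0.25):
    b = decimal_odds - 1.0
    full_kelly = (p * b - (1.0 - p)) / b
    return max(0.0, fraction * full_kelly)   # never bet a negative edge

def simulate(games, edge_threshold=0.03, bankroll=100.0):
    for week in sorted({g["week"] for g in games}):
        for g in [g for g in games if g["week"] == week]:
            edge = g["model_p"] - g["implied_p"]
            if edge < edge_threshold:
                continue
            stake = bankroll * kelly_fraction(g["model_p"], g["decimal_odds"])
            bankroll += stake * (g["decimal_odds"] - 1.0) if g["won"] else -stake
    return bankroll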

7.2 Diagnostics

  • SCC churn: Excessive week-to-week SCC changes suggest thresholds too sensitive.

  • Cycle density: If almost everything is one SCC, thresholds are too low; if all singletons, too high.

  • Error by style: Check model residuals against scheme embeddings; SCCs should reduce systematic style errors.



8) Practical Betting Plays with SCC Insights

  1. Cycle Flag Upsets
    When divisions form cycles (e.g., A→B, B→C, C→A) and weather/venue accentuates the “bad edge,” upgrade the dog’s win/cover probability.

  2. Condensed Rankings for Teasers/Parlays
    Use the condensation DAG to identify stable tiers. Avoid stacking legs within a volatile SCC where outcomes are strongly interaction-driven.

  3. Totals via Matchup Composition
    Some SCCs form around pass-rush vs. deep-dropback dynamics, which can inflate sacks and negative plays (unders) or force a tempo-driven quick game (overs). Encode SCC type as a categorical feature in totals models.

  4. Live Betting
    Build an in-game state graph (drives or play families as nodes). SCCs of states (e.g., “second-and-long → checkdown → third-and-medium → blitz → punt → second-and-long…”) reveal sticky loops. When a team clicks into a favorable loop vs. this opponent, adjust in-game win and total projections faster than naive models.



9) Extensions & Variants

  • Edge weighting: Keep weights and run SCC on a thresholded graph at multiple τ levels; features become “SCC@τ”.

  • Community detection vs. SCC: Undirected clustering (e.g., Louvain) captures similarity; SCCs capture directional dominance cycles—different and complementary.

  • HITS/PageRank within SCCs: Rank teams inside an SCC by authority/hub scores; use as priors for head-to-head predictions.
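
A sketch of the “SCC@τ” idea from the first bullet above: recompute SCC membership on graphs thresholded at several τ levels. The weighted edge-list format is an assumption.

# SCC membership at multiple edge-weight thresholds (tau). Each threshold
# yields its own node -> SCC id mapping, usable as a separate feature column.
import networkx as nx

def scc_at_taus(weighted_edges, taus=(0.5, 1.0, 2.0)):
    features = {}
    for tau in taus:
        G = nx.DiGraph([(u, v) for u, v, w in weighted_edges if w > tau])
        comp = {n: cid
                for cid, scc in enumerate(nx.strongly_connected_components(G))
                for n in scc}
        features[tau] = comp
    return features

weighted_edges = [("A", "B", 2.0), ("B", "C", 1.5), ("C", "A", 0.8)]
print(scc_at_taus(weighted_edges))  # the cycle only survives at the lowest tau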



10) Limitations and Gotchas

  • Non-stationarity: Coaching changes, injuries, and late-season scheme pivots can invalidate edges. Use stateful nodes.

  • Small samples: NFL is data-sparse. Apply Bayesian shrinkage; require edge persistence (e.g., 3+ supporting games).

  • Confounding context: Weather and travel produce edges that won’t generalize to neutral conditions; tag edges with context and only activate when matched.

  • Playoffs & rematches: Teams adapt; cycles can break in rematches. Reduce reliance on historical SCCs and weight game-specific plan indicators (tempo change, protection adjustments).



11) A Concrete Modeling Blueprint

  1. Data Layer

    • Play-by-play → engineered matchup features with decay.

    • Injury/QB status normalization.

    • Scheme embeddings from clustering route families, front structures, and motion rates.

  2. Graph Layer

    • Build directed graph per week (and per context).

    • Threshold edges; compute SCCs via Kosaraju.

    • Derive features: scc_id, scc_size, pagerank_in_scc, cycle_count, asymmetry(X→Y), is_same_scc(X,Y).

  3. Prediction Layer

    • Baseline Elo/Bayesian margin model.

    • Gradient boosted trees (or GNN for the full graph).

    • Loss: log loss for side, pinball for margin quantiles.

    • Calibration by SCC.

  4. Decision Layer

    • Convert probabilities to edges vs. the market (implied probability).

    • Apply correlation-aware portfolio sizing; dampen stakes in volatile SCCs.

    • Risk rules (max unit per SCC cluster, stop after N correlated losses).

  5. Monitoring

    • Weekly SCC heatmap and cycle reports.

    • Drift detection on scheme embeddings; automatically rebuild edges when drift > threshold.



12) Takeaways

  • Kosaraju’s algorithm gives you the SCCs that expose directional cycles in team matchups.

  • Modeling the NFL as a directed interaction graph (not just a ranking list) surfaces where transitivity breaks, which is exactly where markets can be slow to adjust.

  • Injecting SCC-derived features into ML pipelines boosts calibration and helps you price upset risk and style-driven variance.

  • The condensation DAG yields stable tiers for portfolio construction and schedule lookahead.

