
6. Empirical analysis¶

In this notebook we apply the epistasia package to real datasets. We demonstrate:

  1. Loading and validating empirical fitness data.
  2. Constructing a Landscape object.
  3. Computing Walsh–Hadamard (WH) spectra, epistasis amplitude, and variance-by-order statistics.
  4. Comparing uncertainty and null bootstraps.
  5. Visual inspection of significant orders.
In [1]:
###########################
#         IMPORTS         #
###########################

import os
import sys
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
plt.style.use('./style.mplstyle')  # include plot style

###########################
#         HELPERS         #
###########################

# Simple header function for clean console output
def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))

#############################################
#     LOAD BINARY LANDSCAPES DEPENDENCE     #
#############################################

# Include your local path to the library here
base_path = os.path.expanduser("~/FunEcoLab_IBFG Dropbox/")
sys.path.insert(1, base_path)

# Import the main package
import epistasia as ep

6.1 Reading empirical data: landscape with 10 species¶

In this tutorial we begin working with empirical data obtained from a biological landscape involving 10 different species. These data contain experimental measurements of the community biomass $F(\mathbf{x})$ across multiple system configurations and will serve as the basis for applying the analyses developed in the previous tutorials.

The landscape includes all $2^{10} = 1024$ possible binary configurations corresponding to presence/absence patterns of the 10 species. Each configuration is associated with a measured value of $F$ for five replicates $r=1,\dots,R=5$.
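The full configuration set can be enumerated directly; a minimal sketch (the bit ordering is illustrative, and the empirical file below provides its own state encoding):

```python
import numpy as np

N = 10  # number of species

# All 2**N presence/absence patterns, one row per configuration.
# Row k holds the N-bit binary expansion of k, most significant bit first.
states = (np.arange(2**N)[:, None] >> np.arange(N - 1, -1, -1)) & 1

print(states.shape)              # (1024, 10)
print(states.sum(axis=1).max())  # the richest community contains all N species
```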

In [2]:
# --- Read empirical data --------------------
data_path = os.path.join(base_path, "EpistasisClass", "Sabela_et_al_2025_N10.csv")

df = pd.read_csv(data_path, index_col=False)

L_raw = ep.Landscape.from_dataframe(df)
display(L_raw)
C10 C9 C8 C7 C6 C5 C4 C1 C12 C14 rep_1 rep_2 rep_3 rep_4 rep_5
state Order
0000000000 0 0 0 0 0 0 0 0 0 0 0 0.0000 0.0000 0.0000 0.0000 0.0000
0000000001 1 0 0 0 0 0 0 0 0 0 1 0.0698 0.0314 0.1130 0.0394 0.1686
0000000010 1 0 0 0 0 0 0 0 0 1 0 0.2733 0.0223 0.1909 0.1499 0.2060
0000000011 2 0 0 0 0 0 0 0 0 1 1 0.2190 0.1452 0.1322 0.1254 0.1761
0000000100 1 0 0 0 0 0 0 0 1 0 0 0.2136 0.1371 0.1554 0.0972 0.0821
0000000101 2 0 0 0 0 0 0 0 1 0 1 0.2352 0.1197 0.2309 0.1341 0.1814
0000000110 2 0 0 0 0 0 0 0 1 1 0 0.2335 0.1743 0.1683 0.0993 0.2327
0000000111 3 0 0 0 0 0 0 0 1 1 1 0.2103 0.1514 0.1544 0.1331 0.1620
0000001000 1 0 0 0 0 0 0 1 0 0 0 0.0896 0.0469 0.1036 0.0527 0.0336
0000001001 2 0 0 0 0 0 0 1 0 0 1 0.0997 0.0703 0.1475 0.0527 0.1536

6.2. Clean batch effect¶

Before analysing the empirical landscape, we remove systematic distortions introduced by the measurement batches. In our noise model, each observed replicate $\tilde{F}_r(\mathbf{x})$ differs from the underlying biological signal $\bar{F}(\mathbf{x})$ by an additive shift $a_r$, a multiplicative scaling $b_r$, and a configuration-specific noise term. The identifiability constraints introduced previously (centering of $\{a_r\}$ and unit geometric mean of $\{b_r\}$) ensure that these batch parameters can be uniquely estimated.

To clean the batch effects, we fit a hierarchical Bayesian model directly on the observed values

\begin{equation} \tilde{F}_r(\mathbf{x}) \sim \mathcal{N}\!\big(\bar{F}(\mathbf{x}) + a_r,\; \sigma^2 (\mathbf{x}) b_r^2\big), \end{equation}

where $\bar{F}(\mathbf{x})$, $a_r$, $b_r$, and the global noise scale $\sigma^2 (\mathbf{x})$ are inferred jointly from all configurations and replicates. This approach retains the full structure of the data (unlike methods based on standardized residuals), propagates uncertainty to all parameters, and enforces the identifiability constraints internally through the parametrization of $a_r$ and $b_r$.
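For intuition, the log-likelihood of this noise model can be written down directly. A minimal numpy sketch, assuming independent Gaussian observations; the actual `correct_batch_effect` fit (via Stan) additionally places priors on all parameters:

```python
import numpy as np

def batch_loglik(F_obs, F_bar, a, b, sigma):
    """Log-likelihood of observed replicates under the batch model.

    F_obs : (M, R) observed replicates, one column per batch r
    F_bar : (M,)   latent biological signal F_bar(x)
    a, b  : (R,)   additive shifts and multiplicative scalings
    sigma : (M,)   configuration-specific noise scale sigma(x)
    """
    mu = F_bar[:, None] + a[None, :]   # mean: F_bar(x) + a_r
    sd = sigma[:, None] * b[None, :]   # sd:   sigma(x) * b_r
    z = (F_obs - mu) / sd
    return -0.5 * np.sum(z**2 + np.log(2.0 * np.pi * sd**2))
```

At $a_r = 0$, $b_r = 1$ the model collapses to i.i.d. Gaussian noise around $\bar{F}(\mathbf{x})$.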

Once the posterior means $\bar{a}_r$ and $\bar{b}_r$ are obtained, we invert the batch model to estimate the residual stochastic component

\begin{equation} \hat{\pi}_r(\mathbf{x}) = \frac{ \tilde{F}_r(\mathbf{x}) - \bar{F}(\mathbf{x}) - \bar{a}_r }{ \bar{b}_r }. \end{equation}

Finally, a batch-corrected replicate is reconstructed as

\begin{equation} F_r^{\mathrm{adj}}(\mathbf{x}) = \bar{F}(\mathbf{x}) + \hat{\pi}_r(\mathbf{x}), \end{equation}

which removes the additive and multiplicative batch distortions while preserving the biological variability. The resulting corrected landscape can then be used safely for downstream analyses such as epistasis quantification, inference of structure, or comparison across experiments.
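Given the posterior means, the inversion above is plain array arithmetic. A minimal sketch with the identifiability constraints enforced explicitly and a simple plug-in estimate of $\bar{F}$ (an illustration of the two equations above, not the package's exact estimator):

```python
import numpy as np

def correct_batch(F_obs, a_bar, b_bar):
    """Invert the additive/multiplicative batch model.

    F_obs        : (M, R) observed replicates, one column per batch
    a_bar, b_bar : (R,)   posterior-mean batch parameters
    """
    # Identifiability: centered shifts, unit geometric mean for scalings.
    a = a_bar - a_bar.mean()
    b = b_bar / np.exp(np.log(b_bar).mean())

    # Plug-in estimate of the latent signal (a simple choice for this sketch).
    F_bar = (F_obs - a[None, :]).mean(axis=1)

    # Residual stochastic component, then batch-corrected replicates.
    pi_hat = (F_obs - F_bar[:, None] - a[None, :]) / b[None, :]
    return F_bar[:, None] + pi_hat
```

The corrected replicates retain the residual variability $\hat{\pi}_r(\mathbf{x})$ while the $a_r$ and $b_r$ distortions are removed.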

In practice, this step is implemented with the following command:

In [3]:
# Correct Batch effects
L, post = ep.correct_batch_effect(
    L_raw, return_posteriors=True,
    seed=12345678, chains=3, iter_warmup=1000, iter_sampling=1000, link="log",
)
print("Cleaned data")
display(L)
11:08:01 - cmdstanpy - INFO - CmdStan start processing
11:08:26 - cmdstanpy - INFO - CmdStan done processing.
Cleaned data
C10 C9 C8 C7 C6 C5 C4 C1 C12 C14 rep_1 rep_2 rep_3 rep_4 rep_5
state Order
0000000000 0 0 0 0 0 0 0 0 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000
0000000001 1 0 0 0 0 0 0 0 0 0 1 0.048467 0.038793 0.133346 0.046642 0.144859
0000000010 1 0 0 0 0 0 0 0 0 1 0 0.215349 0.030650 0.212454 0.166285 0.199879
0000000011 2 0 0 0 0 0 0 0 0 1 1 0.162641 0.163804 0.130344 0.141865 0.180170
0000000100 1 0 0 0 0 0 0 0 1 0 0 0.160771 0.154306 0.164416 0.111621 0.099257
0000000101 2 0 0 0 0 0 0 0 1 0 1 0.170876 0.139765 0.253409 0.152870 0.192046
0000000110 2 0 0 0 0 0 0 0 1 1 0 0.171582 0.195181 0.171896 0.115580 0.228525
0000000111 3 0 0 0 0 0 0 0 1 1 1 0.153575 0.170719 0.157193 0.150281 0.170767
0000001000 1 0 0 0 0 0 0 1 0 0 0 0.065712 0.055235 0.120205 0.060541 0.043004
0000001001 2 0 0 0 0 0 0 1 0 0 1 0.069255 0.081904 0.170409 0.062216 0.146146

After running the Bayesian correction we can visualise the corrected replicates.

In [4]:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Choose a specific binary state
x = 123  # modify as needed

vals_raw = L_raw.values[x, :]
vals_adj = L.values[x, :]

colors = ["#ffc300", "#118ab2"]  # with batch / without batch

fig, axes = plt.subplots(1, 2, figsize=(7, 3.6), sharey=True)

# ---------------------------------------------------
# LEFT PANEL — WITH BATCH
# ---------------------------------------------------
ax = axes[0]

# Panel label A
ax.text(
    -0.15, 1.05, "A",
    transform=ax.transAxes,
    fontsize=16,
    fontweight="bold",
    va="top",
    ha="left"
)

# Scatter points
ax.scatter(
    range(1, L_raw.R + 1),
    vals_raw,
    marker="^",
    s=140,
    facecolor=colors[0],
    edgecolor="black",
    linewidth=2,
    zorder=3
)

# Horizontal mean line
mean_raw = vals_raw.mean()
ax.axhline(mean_raw, color="gray", linestyle="--", linewidth=2, zorder=2)
ax.text(
    4.5, mean_raw + 0.005,
    r"$\hat{F}$",
    fontsize=18,
    va="bottom",
    ha="left",
    color="gray"
)

ax.set_ylabel(r"Observed $\tilde{F}(\mathbf{x})$", fontsize=14)

# Replicate labels
ax.set_xticks(range(1, L_raw.R + 1))
ax.set_xticklabels([f"$\\tilde{{F}}_{{{i}}}$" for i in range(1, L.R + 1)],rotation=0,fontsize=15)

# Ticks every 0.1 on y-axis
ax.yaxis.set_major_locator(MultipleLocator(0.1))

# y-limits
ax.set_ylim(0.1, 0.4)

# Label (BATCH)
ax.text(
    0.08, 0.98,
    "BATCH",
    transform=ax.transAxes,
    va="top", ha="left",
    fontsize=12,
    fontweight="bold"
)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)


# ---------------------------------------------------
# RIGHT PANEL — NO BATCH
# ---------------------------------------------------
ax = axes[1]

# Panel label B
ax.text(
    -0.15, 1.05, "B",
    transform=ax.transAxes,
    fontsize=16,
    fontweight="bold",
    va="top",
    ha="left"
)

ax.scatter(
    range(1, L.R + 1),
    vals_adj,
    marker="^",
    s=140,
    facecolor=colors[1],
    edgecolor="black",
    linewidth=2,
    zorder=3
)

mean_adj = vals_adj.mean()
ax.axhline(mean_adj, color="gray", linestyle="--", linewidth=2, zorder=2)
ax.text(
    4.5, mean_adj + 0.005,
    r"$\hat{F}$",
    fontsize=18,
    va="bottom",
    ha="left",
    color="gray"
)

ax.set_xticks(range(1, L.R + 1))
ax.set_xticklabels([f"$\\tilde{{F}}_{{{i}}}$" for i in range(1, L.R + 1)],rotation=0,fontsize=15)

ax.yaxis.set_major_locator(MultipleLocator(0.1))
ax.set_ylim(0.08, 0.35)

ax.text(
    0.04, 0.98,
    "NO BATCH",
    transform=ax.transAxes,
    va="top", ha="left",
    fontsize=12,
    fontweight="bold"
)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)


# --------------------------------------------------

fig.suptitle(f"{L.states[x]}")

# ---------------------------------------------------
plt.tight_layout()
plt.show()
[Figure: replicate values of the selected state before (A, with batch) and after (B, batch-corrected) correction]

Each replicate $F_r \in \mathbb{R}^{2^N}$ can be viewed as a point in landscape space and summarized as a box plot. If the model successfully removed the batch effects, the replicate distributions should now align.

In [5]:
colors = ["#ffc300", "#118ab2"]  # colors for (with batch, without batch)

fig, axes = plt.subplots(1, 2, figsize=(10, 3.6), sharey=True)

# ---------------------------
# Left: with batch effects
# ---------------------------
ax = axes[0]
vals = L_raw.values

ax.text(
    0.02, 1,            
    "BATCH",        
    transform=ax.transAxes,
    va="top", ha="left",
    fontsize=13,
    fontweight="bold"          
)


bp1 = ax.boxplot(
    [vals[:, r] for r in range(L_raw.R)],
    tick_labels=[f"Replicate {r+1}" for r in range(L_raw.R)],  # 'labels' was renamed in Matplotlib 3.9
    patch_artist=True,
    flierprops=dict(
    marker=".",
    markersize=6,
    markerfacecolor="black",
    markeredgecolor="none",
    markeredgewidth=0
)
)

ax.set_ylim(0,0.8)

ax.yaxis.set_major_locator(MultipleLocator(0.4))

# Color boxes
for patch in bp1["boxes"]:
    patch.set(facecolor=colors[0], alpha=1)
    patch.set(edgecolor="black")   
    patch.set(linewidth=2)

# Style medians
for med in bp1["medians"]:
    med.set(color="black", linewidth=2)

ax.set_ylabel(r"Observed $\tilde{F}(\mathbf{x})$")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")

# Make whiskers and caps thicker
for whisker in bp1["whiskers"]:
    whisker.set(linewidth=2)   
for cap in bp1["caps"]:
    cap.set(linewidth=2)

# ---------------------------
# Right: after correction
# ---------------------------
ax = axes[1]
vals2 = L.values

ax.text(
    0.02, 1,
    "NO BATCH",
    transform=ax.transAxes,
    va="top", ha="left",
    fontsize=13,
    fontweight="bold"
)

bp2 = ax.boxplot(
    [vals2[:, r] for r in range(L.R)],
    tick_labels=[f"Replicate {r+1}" for r in range(L.R)],  # 'labels' was renamed in Matplotlib 3.9
    patch_artist=True,
    flierprops=dict(
    marker=".",
    markersize=6,
    markerfacecolor="black",
    markeredgecolor="none",
    markeredgewidth=0,
)
)

ax.set_ylim(0,0.8)

ax.yaxis.set_major_locator(MultipleLocator(0.4))

# Color boxes
for patch in bp2["boxes"]:
    patch.set(facecolor=colors[1], alpha=1)
    patch.set(edgecolor="black")  
    patch.set(linewidth=2)

for med in bp2["medians"]:
    med.set(color="black", linewidth=2)

ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")

# ---------------------------
# Remove top/right spines to match your style
# ---------------------------

for whisker in bp2["whiskers"]:
    whisker.set(linewidth=2)
for cap in bp2["caps"]:
    cap.set(linewidth=2)

# Layout
plt.tight_layout()
plt.show()
[Figure: box plots of the replicate distributions before and after batch correction]

Posterior distributions of the additive batch effects $a_r$:

In [6]:
from math import ceil

# Extract posterior draws for a_r
a = np.asarray(post["a"])   # shape: (draws, R)
n_draws, R = a.shape

# Two rows, dynamic number of columns
n_rows = 2
n_cols = ceil(R / n_rows)

fig, axes = plt.subplots(
    n_rows, n_cols,
    figsize=(3.0 * n_cols, 2.5 * n_rows),
)
axes = np.atleast_2d(axes)

# Plot posterior histograms for each a_r
for r in range(R):
    row = r // n_cols
    col = r % n_cols
    ax = axes[row, col]
    ax.hist(a[:, r], bins=40, density=True,color="#a9d6e5")
 
    mean_r = a[:, r].mean()
    ax.axvline(mean_r, color="blue", linestyle="--", linewidth=1.2,label="Mean")
    ax.set_xlabel(rf"$a_{r+1}$")
    ax.set_ylabel(rf"$P(a_{r+1}|\text{{data}})$")

    if(r==0):
        ax.legend(frameon=False,loc="best",fontsize=10)

# Hide unused axes
for idx in range(R, n_rows * n_cols):
    row = idx // n_cols
    col = idx % n_cols
    axes[row, col].axis("off")

fig.tight_layout()
plt.show()
[Figure: posterior histograms of the additive effects $a_r$]

Posterior distributions of the multiplicative batch effects $b_r$:

In [7]:
# Extract posterior draws for b_r
b = np.asarray(post["b"])   # shape: (draws, R)
n_draws, R = b.shape

# Two rows, dynamic number of columns
n_rows = 2
n_cols = ceil(R / n_rows)

fig, axes = plt.subplots(
    n_rows, n_cols,
    figsize=(3.0 * n_cols, 2.5 * n_rows),
)
axes = np.atleast_2d(axes)

# Plot posterior histograms for each b_r
for r in range(R):
    row = r // n_cols
    col = r % n_cols
    ax = axes[row, col]

    # Histogram
    ax.hist(b[:, r], bins=40, density=True, color="#ffbf69")

    # Posterior mean line
    mean_r = b[:, r].mean()
    ax.axvline(mean_r, color="red", linestyle="--", linewidth=1.2,label="Mean")

    # Labels
    ax.set_xlabel(rf"$b_{r+1}$")
    ax.set_ylabel(rf"$P(b_{r+1}|\text{{data}})$")

    if(r==0):
        ax.legend(frameon=False,loc="best",fontsize=10)

# Hide unused axes
for idx in range(R, n_rows * n_cols):
    row = idx // n_cols
    col = idx % n_cols
    axes[row, col].axis("off")

fig.tight_layout()
plt.show()
[Figure: posterior histograms of the multiplicative effects $b_r$]

Posterior distribution of the noise scale $\sigma$:

In [8]:
# Extract posterior draws for sigma
sigma = np.asarray(post["sigma"])   # shape: (draws,)

fig, ax = plt.subplots(figsize=(5, 3))

# Histogram
ax.hist(sigma, bins=40, density=True, color="#76c893")

# Vertical posterior mean line
mean_sigma = sigma.mean()
ax.axvline(mean_sigma, color="green", linestyle="--", linewidth=2,label="Mean")

# Axis labels
ax.set_xlabel(r"$\sigma$")
ax.set_ylabel(r"$P(\sigma|\text{data})$")

# Legend
ax.legend(frameon=False,loc="best",fontsize=10)

fig.tight_layout()
plt.show()
[Figure: posterior histogram of the noise scale $\sigma$]

6.3. Epistasis amplitude¶

The epistasis amplitude measures how much each interaction order contributes to the global structure of the functional landscape. For a given order $S$, it is defined as the average squared strength of all $S$-way interactions:

\begin{equation} \langle E_S^2 \rangle = \binom{N}{S}^{-1} \sum_{|s|=S} E_s^2 . \end{equation}

It summarizes the global magnitude of epistasis at order $S$: large values indicate that interactions of that order play an important role, while small values indicate weak or negligible contributions. Because it averages over all backgrounds, the amplitude is more stable than individual epistatic coefficients and provides a clean “spectral” view of the map.
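As a cross-check, $\langle E_S^2 \rangle$ can be computed directly from Walsh–Hadamard coefficients grouped by order. A minimal sketch with an explicit fast transform (the normalization convention is assumed; `epistasis_amplitude()` below additionally provides bootstraps and the null model):

```python
import numpy as np
from math import comb

def amplitude_by_order(F, N):
    """Average squared interaction strength <E_S^2> for each order S.

    F : (2**N,) landscape values; index k encodes the binary configuration.
    """
    # Fast Walsh-Hadamard transform (butterfly recursion), then normalize
    # so that E[s] is the interaction coefficient of the subset s.
    E = np.asarray(F, dtype=float).copy()
    h = 1
    while h < len(E):
        for i in range(0, len(E), 2 * h):
            x, y = E[i:i + h].copy(), E[i + h:i + 2 * h].copy()
            E[i:i + h], E[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    E /= 2**N

    order = np.array([bin(k).count("1") for k in range(2**N)])
    return {S: float(np.sum(E[order == S] ** 2)) / comb(N, S)
            for S in range(N + 1)}
```

For a purely additive landscape every amplitude with $S \ge 2$ vanishes, which makes a convenient sanity check.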

The function epistasis_amplitude() computes these amplitudes together with bootstrap confidence intervals and a noise-based null model. By comparing the empirical amplitude to the null distribution, we can assess which interaction orders contain detectable biological signal.

In [9]:
df_amp = ep.epistasis_amplitude(
    L,
    B_uncertainty=1000,
    B_null=1000,
    ci_level=0.95,
    multipliers="normal",
    as_dataframe=True,
)

display(df_amp)
Order Epistasis amplitude <E^2>_k CI low CI high Null CI low Null CI high SNR (null) Null mean Variance (obs) Variance (null median) P-value order P-value var
0 1 0.000588 0.000545 0.000646 3.098245e-07 0.000010 252.132979 0.000003 0.000588 0.000002 0.000 0.84
1 2 0.000254 0.000236 0.000304 1.522962e-06 0.000030 31.659593 0.000012 0.000254 0.000010 0.000 0.84
2 3 0.000099 0.000103 0.000221 6.459941e-06 0.000121 3.226393 0.000047 0.000099 0.000041 0.056 0.84
3 4 0.000231 0.000261 0.000721 2.457570e-05 0.000479 1.884806 0.000187 0.000231 0.000160 0.308 0.84
4 5 0.000615 0.000729 0.002420 1.083796e-04 0.001929 1.255203 0.000744 0.000615 0.000632 0.515 0.84
5 6 0.001903 0.002416 0.009543 4.158275e-04 0.007581 0.953118 0.002979 0.001903 0.002568 0.654 0.84
6 7 0.006041 0.007893 0.038562 1.565740e-03 0.030968 0.772123 0.011884 0.006041 0.010310 0.744 0.84
7 8 0.014572 0.020879 0.148194 6.739769e-03 0.123953 0.450841 0.046982 0.014572 0.039928 0.883 0.84
8 9 0.093389 0.079788 0.723876 1.946717e-02 0.637378 0.552262 0.192567 0.093389 0.150255 0.696 0.84
9 10 0.056497 0.000609 5.027775 2.548395e-04 3.808067 0.049240 0.698194 0.056497 0.271621 0.743 0.84

The variance spectrum tells us how much of the total functional variation $V_{\text{tot}}$ is explained by additive, pairwise, and higher-order interactions. The epistasis amplitude $\langle E_S^2 \rangle$ complements this view by measuring the average squared strength of all interactions of order $S$. It captures the global magnitude of epistasis at that order, independently of background.

The function ep.plot_variance_and_amplitude() jointly visualizes both quantities, together with bootstrap confidence intervals and a noise-based null model, allowing us to assess how interaction strength and functional variance are distributed across orders of epistasis.

In [10]:
fig, axes, data = ep.plot_variance_and_amplitude(
    L,
    B_uncertainty=100,
    B_null=100,
    as_fraction=True,
    show_components=True,
    rng=np.random.default_rng(125),
    return_data=True,
    figsize=(15, 4),
    ci_method_uncertainty = "percentile"  # {"percentile","studentized","bca"}
)
[Figure: variance spectrum and epistasis amplitude across interaction orders]

The score $1 - p_{\text{value}}$ quantifies how confidently we can reject the null hypothesis of “no interaction” at each order. Values close to 1 indicate strong evidence that the observed epistasis cannot be explained by noise alone, whereas values near 0 suggest that the interaction magnitude is statistically indistinguishable from the null model. Plotting interaction order vs. $1 - p_{\text{value}}$ provides a clear visual summary of which orders contain detectable biological signal.

In [11]:
plt.plot(df_amp["Order"],1-df_amp["P-value order"],"o-")
plt.ylabel("Detection score (1 - p-value)")
plt.xlabel("Epistatic order")
Out[11]:
Text(0.5, 0, 'Epistatic order')
[Figure: detection score $1 - p_{\text{value}}$ vs. epistatic order]

6.4. Local epistasis¶
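The local (focal) effect of a species $i$ is the change in $F$ when $i$ is toggled while the rest of the community, the background, is held fixed. A minimal sketch of this definition over all backgrounds (an illustration only; `ep.focal_effect` below additionally handles missing states, NaNs, and bootstrap uncertainty):

```python
import numpy as np

def focal_effects(F, N, i):
    """First-order effect of locus i in every background.

    F : (2**N,) landscape values indexed by the integer encoding of the state
    i : focal locus (bit position in the integer encoding)
    Returns the 2**(N-1) differences F(x with x_i=1) - F(x with x_i=0).
    """
    bit = 1 << i
    backgrounds = [k for k in range(2**N) if not (k & bit)]
    return np.array([F[k | bit] - F[k] for k in backgrounds])

# Example: on the landscape F(k) = (number of set bits in k)**2 with N = 2,
# the effect of locus 0 depends on the background, i.e. epistasis is present.
F = np.array([0.0, 1.0, 1.0, 4.0])
print(focal_effects(F, 2, 0))   # background 00 -> 1.0, background 10 -> 3.0
```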

In [12]:
delta2 = ep.focal_effect(
    L,
    i=2,
    missing_policy="drop",       # tolerates missing states
    nan_policy="omit",           # ignore NaNs inside alternating sum
    B_uncertainty=1000,         # bootstrap draws for uncertainty bands
    B_null=1000,                # bootstrap draws for null test
    multipliers="rademacher",  # type of wild multipliers
    ci_level=0.95,
    as_dataframe=True,          # return as DataFrame
)

display(delta2)
Background Background loci Background loci names Background active loci Background active names Order Loci involved Loci names Epistasis (mean) Experimental SD ... Prob(Effect > 0) Prob(Effect < 0) Null CI (low) Null CI (high) Signal-to-Null-Noise (SNR) Null variance (per background) Variance (observed) Variance (null median) p-var p-null
0 000000000 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) () () 1 (2,) (C8,) 0.195680 0.050457 ... 0.999 0.001 -0.075533 0.066451 5.141814 0.001448 0.002148 0.001446 0.0 0.002
1 000000001 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (9,) (C14,) 1 (2,) (C8,) 0.177508 0.040981 ... 0.998 0.002 -0.078768 0.072248 4.593752 0.001493 0.002148 0.001446 0.0 0.001
2 000000010 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (8,) (C12,) 1 (2,) (C8,) 0.057968 0.086060 ... 0.938 0.062 -0.077590 0.072754 1.494345 0.001505 0.002148 0.001446 0.0 0.119
3 000000011 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (8, 9) (C12, C14) 1 (2,) (C8,) 0.029363 0.039252 ... 0.792 0.208 -0.068860 0.069715 0.761386 0.001487 0.002148 0.001446 0.0 0.427
4 000000100 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (7,) (C1,) 1 (2,) (C8,) 0.113849 0.080499 ... 0.995 0.005 -0.075365 0.069782 2.887325 0.001555 0.002148 0.001446 0.0 0.010
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
507 111111011 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (0, 1, 3, 4, 5, 6, 8, 9) (C10, C9, C7, C6, C5, C4, C12, C14) 1 (2,) (C8,) 0.025767 0.048197 ... 0.773 0.227 -0.069277 0.075126 0.676876 0.001449 0.002148 0.001446 0.0 0.473
508 111111100 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (0, 1, 3, 4, 5, 6, 7) (C10, C9, C7, C6, C5, C4, C1) 1 (2,) (C8,) 0.033614 0.022873 ... 0.826 0.174 -0.071192 0.075507 0.869225 0.001495 0.002148 0.001446 0.0 0.351
509 111111101 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (0, 1, 3, 4, 5, 6, 7, 9) (C10, C9, C7, C6, C5, C4, C1, C14) 1 (2,) (C8,) -0.007927 0.106824 ... 0.426 0.574 -0.071104 0.068443 0.213425 0.001380 0.002148 0.001446 0.0 0.829
510 111111110 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (0, 1, 3, 4, 5, 6, 7, 8) (C10, C9, C7, C6, C5, C4, C1, C12) 1 (2,) (C8,) 0.018088 0.052666 ... 0.685 0.315 -0.079199 0.069145 0.488170 0.001373 0.002148 0.001446 0.0 0.603
511 111111111 (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) (0, 1, 3, 4, 5, 6, 7, 8, 9) (C10, C9, C7, C6, C5, C4, C1, C12, C14) 1 (2,) (C8,) 0.008970 0.085860 ... 0.594 0.406 -0.075263 0.072812 0.235165 0.001455 0.002148 0.001446 0.0 0.828

512 rows × 23 columns
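In the table above, the "Background" column packs each background state into a bit string whose positions line up with the "Background loci" tuple (note the focal locus, index 2 here, is excluded from the background). As a minimal illustration (this helper is not part of epistasia), the "Background active loci" tuple can be recovered from those two columns like this:

```python
def active_loci(background: str, background_loci: tuple) -> tuple:
    """Map the 1-bits of a background string onto the background locus indices."""
    return tuple(locus for bit, locus in zip(background, background_loci) if bit == "1")

# Matches row 3 of the table above: "000000011" over loci (0,1,3,4,5,6,7,8,9)
print(active_loci("000000011", (0, 1, 3, 4, 5, 6, 7, 8, 9)))  # → (8, 9)
```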

In [ ]:
full_df = ep.compute_full_epistasis(
    L,
    min_order=1,
    max_order=L.N,
    B_uncertainty=1000,
    B_null=1000,
)
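The `B_null` resamples underlie the "p-null" column reported for each interaction. As a rough sketch of the idea only (an assumption about the estimator, not the package's exact implementation), a two-sided bootstrap p-value is the fraction of null resamples at least as extreme as the observed effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in null bootstrap distribution for one interaction
# (values are illustrative, not taken from the landscape).
observed = 0.195
null_samples = rng.normal(0.0, 0.04, size=1000)

# Two-sided p-value: fraction of null draws as extreme as the observation
p_null = (np.abs(null_samples) >= abs(observed)).mean()
print(p_null)
```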
In [ ]:
display(full_df.loc[(full_df["p-null"] < 0.05) & (full_df["Order"] == 1)])
Interaction type Background Background loci Background loci names Background active loci Background active names Order Loci involved Loci names Epistasis (mean) ... Prob(Effect > 0) Prob(Effect < 0) Null CI (low) Null CI (high) Signal-to-Null-Noise (SNR) Null variance (per background) Variance (observed) Variance (null median) p-var p-null
87 order-1 001010111 (1, 2, 3, 4, 5, 6, 7, 8, 9) (C9, C8, C7, C6, C5, C4, C1, C12, C14) (3, 5, 7, 8, 9) (C7, C5, C1, C12, C14) 1 (0,) (C10,) -0.068199 ... 0.033 0.967 -0.064757 0.074268 1.904629 0.001282 0.001972 0.001442 0.000 0.048
119 order-1 001110111 (1, 2, 3, 4, 5, 6, 7, 8, 9) (C9, C8, C7, C6, C5, C4, C1, C12, C14) (3, 4, 5, 7, 8, 9) (C7, C6, C5, C1, C12, C14) 1 (0,) (C10,) -0.099070 ... 0.008 0.992 -0.079080 0.071927 2.558043 0.001500 0.001972 0.001442 0.000 0.017
126 order-1 001111110 (1, 2, 3, 4, 5, 6, 7, 8, 9) (C9, C8, C7, C6, C5, C4, C1, C12, C14) (3, 4, 5, 6, 7, 8) (C7, C6, C5, C4, C1, C12) 1 (0,) (C10,) 0.075802 ... 0.982 0.018 -0.069493 0.075900 1.982035 0.001463 0.001972 0.001442 0.000 0.044
176 order-1 010110000 (1, 2, 3, 4, 5, 6, 7, 8, 9) (C9, C8, C7, C6, C5, C4, C1, C12, C14) (2, 4, 5) (C8, C6, C5) 1 (0,) (C10,) -0.201185 ... 0.001 0.999 -0.074638 0.069136 5.056771 0.001583 0.001972 0.001442 0.000 0.002
180 order-1 010110100 (1, 2, 3, 4, 5, 6, 7, 8, 9) (C9, C8, C7, C6, C5, C4, C1, C12, C14) (2, 4, 5, 7) (C8, C6, C5, C1) 1 (0,) (C10,) -0.119612 ... 0.008 0.992 -0.068739 0.075601 3.169977 0.001424 0.001972 0.001442 0.000 0.004
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5081 order-1 111011001 (0, 1, 2, 3, 4, 5, 6, 7, 8) (C10, C9, C8, C7, C6, C5, C4, C1, C12) (0, 1, 2, 4, 5, 8) (C10, C9, C8, C6, C5, C12) 1 (9,) (C14,) -0.073547 ... 0.017 0.983 -0.071584 0.071792 2.022724 0.001322 0.001691 0.001442 0.002 0.044
5086 order-1 111011110 (0, 1, 2, 3, 4, 5, 6, 7, 8) (C10, C9, C8, C7, C6, C5, C4, C1, C12) (0, 1, 2, 4, 5, 6, 7) (C10, C9, C8, C6, C5, C4, C1) 1 (9,) (C14,) -0.087174 ... 0.012 0.988 -0.073647 0.078973 2.151806 0.001641 0.001691 0.001442 0.002 0.032
5106 order-1 111110010 (0, 1, 2, 3, 4, 5, 6, 7, 8) (C10, C9, C8, C7, C6, C5, C4, C1, C12) (0, 1, 2, 3, 4, 7) (C10, C9, C8, C7, C6, C1) 1 (9,) (C14,) 0.077175 ... 0.976 0.024 -0.072471 0.072388 2.078133 0.001379 0.001691 0.001442 0.002 0.042
5115 order-1 111111011 (0, 1, 2, 3, 4, 5, 6, 7, 8) (C10, C9, C8, C7, C6, C5, C4, C1, C12) (0, 1, 2, 3, 4, 5, 7, 8) (C10, C9, C8, C7, C6, C5, C1, C12) 1 (9,) (C14,) -0.086298 ... 0.010 0.990 -0.077273 0.070661 2.036906 0.001795 0.001691 0.001442 0.002 0.036
5116 order-1 111111100 (0, 1, 2, 3, 4, 5, 6, 7, 8) (C10, C9, C8, C7, C6, C5, C4, C1, C12) (0, 1, 2, 3, 4, 5, 6) (C10, C9, C8, C7, C6, C5, C4) 1 (9,) (C14,) -0.089111 ... 0.012 0.988 -0.071283 0.068164 2.325085 0.001469 0.001691 0.001442 0.002 0.026

567 rows × 24 columns
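Rows like those above can be tallied per interaction order with plain pandas. A short sketch on a toy frame that reuses the column names from `full_df` (the values are made up):

```python
import pandas as pd

# Toy stand-in for full_df with the same column names as the output above
toy = pd.DataFrame({
    "Order":  [1, 1, 2, 2, 3],
    "p-null": [0.01, 0.20, 0.03, 0.60, 0.04],
})

# Fraction of interactions below the significance cutoff, per order
alpha = 0.05
sig_frac = toy.assign(sig=toy["p-null"] < alpha).groupby("Order")["sig"].mean()
print(sig_frac)
```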

6.5. Filter significant epistatic interactions¶

In [ ]:
df_sig = ep.filter_significant_interactions(
    full_df,
    min_snr_null=2.0,
    max_p_value_var=None,
    min_prob_sign=0.0,
    max_p_null=0.05,
    # Keep only order-2 interactions
    min_order=2,
    max_order=2,
    min_feature_overlap=2,
    feature_col="Loci names",
    p_col_for_filter="p-null",
)

edges_df = ep.epistasis_to_network(
    df_sig,
    order=2,
    agg="mean_abs",                # mean |epistasis| per locus pair
    feature_names=L.feature_names,
)

display(edges_df.head())

ep.plot_epistasis_network(
    edges_df,
    use_names=True,  
    node_size=2000,
    font_size=10,
)
i j weight metric Locus i name Locus j name
0 0 1 0.149846 mean_abs_epistasis C10 C9
1 0 2 0.156017 mean_abs_epistasis C10 C8
2 0 3 0.155220 mean_abs_epistasis C10 C7
3 0 4 0.143409 mean_abs_epistasis C10 C6
4 0 5 0.128522 mean_abs_epistasis C10 C5
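The edge list returned above is an ordinary DataFrame, so the strongest pairwise couplings can be read off with a sort. A toy stand-in with the same columns as `edges_df` (weights copied from the head shown above):

```python
import pandas as pd

# Toy edges_df with the same columns as the table above
edges = pd.DataFrame({
    "i": [0, 0, 0],
    "j": [1, 2, 3],
    "weight": [0.149846, 0.156017, 0.155220],
    "Locus i name": ["C10", "C10", "C10"],
    "Locus j name": ["C9", "C8", "C7"],
})

# Rank edges by coupling strength to find the dominant pairs
top = edges.sort_values("weight", ascending=False)
print(top[["Locus i name", "Locus j name", "weight"]])
```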
[Image: network of significant order-2 epistatic interactions, nodes labeled by locus name]

6.6. Volcano plot¶

In [ ]:
# One panel per order, with significance threshold and % significant annotated
ep.plot_epistasis_volcano(
    full_df,
    orders=[1,2,3,4,5,6,7,8,9,10],
    mode="by-order",
    alpha=0.05,
    clip=1e-6,
    colormap="custom",
)
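The essence of a volcano panel is effect size on the x-axis against -log10 of the p-value on the y-axis, with the `clip` floor guarding against log10(0). A minimal matplotlib sketch of one panel, using toy values standing in for the "Epistasis (mean)" and "p-null" columns of `full_df`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for non-interactive use
import matplotlib.pyplot as plt

# Toy effect sizes and bootstrap p-values (illustrative values only)
effect = np.array([-0.20, -0.07, 0.02, 0.08, 0.15])
p_null = np.array([0.002, 0.048, 0.600, 0.044, 0.010])

alpha, clip = 0.05, 1e-6
neglogp = -np.log10(np.clip(p_null, clip, None))  # clip avoids log10(0)

fig, ax = plt.subplots()
ax.scatter(effect, neglogp, c=(p_null < alpha), cmap="coolwarm")
ax.axhline(-np.log10(alpha), ls="--", lw=1)  # significance threshold
ax.set_xlabel("Epistasis (mean)")
ax.set_ylabel("-log10(p-null)")
fig.savefig("volcano_sketch.png")
```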
[Image: volcano plots of epistasis effect size vs. -log10(p-null), one panel per order]

Documentation built with MkDocs.
