Nature Methods Brief Communication Workflow

This workflow turns the breast Xenium WTA reference projection into a submission-grade analysis bundle.

What It Adds

Held-out Perturb-seq reference validation with AUROC/AUPRC.
Expression-matched random programs and group-label shuffles for empirical calibration.
Ranked candidate filtering with min_claim_cells=50; ROI/cell-type groups with fewer than 50 cells are reported as supplementary only.
Bootstrap confidence intervals for group-level program scores.
Moran-style spatial autocorrelation on the Xenium spatial graph.
Ablations for top_n = 25, 50, 100 and graph k = 5, 15, 30.
Two main figures plus a Nature Methods short-communication scaffold.
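
As an illustration of the held-out validation metrics named in the list above, here is a minimal, dependency-free sketch of AUROC and AUPRC (computed as average precision). It is not the package's implementation: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for this example, and the workflow may use a different estimator or tie-handling rule.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Items i..j-1 share one score; give each the average 1-based rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive.

    Tied scores are ordered by input position, which is a simplification.
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


# Toy data (hypothetical): 1 = held-out cell carries the perturbation, 0 = control.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

AUPRC is worth reporting next to AUROC when positives are rare, because its chance level equals the positive fraction rather than 0.5.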

Python API

import spatialperturb as sp

results = sp.run_nature_methods_breast_analysis(
    "/data/taobo.hu/SpatialPerturb/prepared/xenium_wta_breast.h5ad",
    reference_datasets=["gse241115_breast_cropseq", "gse281048_pathway_atlas"],
    config={
        "cache_dir": "/data/taobo.hu/SpatialPerturb/cache",
        "pathway_cell_line": "MCF7",
        "reference_effect_size_only": True,
        "n_random": 25,
        "n_spatial_permutations": 25,
        "n_bootstrap": 100,
        "min_claim_cells": 50,
    },
    output_dir="/data/taobo.hu/SpatialPerturb/reports/nature_methods_breast_shortcomm",
)
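
The resampling counts in the config above (n_random, n_bootstrap) bound how fine the resulting statistics can be. The sketch below shows one common way such numbers are used: an empirical p-value with the +1 correction and a percentile bootstrap interval for a group mean. Both functions are hypothetical helpers written for this example, and the (b + 1) / (n + 1) estimator is an assumption; the package may calibrate differently.

```python
import random


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value, (b + 1) / (n + 1), where b counts nulls >= observed."""
    n_ge = sum(1 for s in null_scores if s >= observed)
    return (n_ge + 1) / (len(null_scores) + 1)


def bootstrap_ci(values, n_bootstrap=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values` (nearest-rank quantiles)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_bootstrap)
    )
    lo_idx = int((alpha / 2) * (n_bootstrap - 1))
    hi_idx = int(round((1 - alpha / 2) * (n_bootstrap - 1)))
    return means[lo_idx], means[hi_idx]


# With 25 null draws, the smallest p-value this estimator can return is 1/26.
floor_p = empirical_p_value(10.0, [0.0] * 25)
print(floor_p)

# Hypothetical per-cell program scores for one ROI/cell-type group.
group_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bootstrap_ci(group_scores, n_bootstrap=100))
```

Under that assumed estimator, n_random=25 cannot yield an empirical p-value below 1/26 (about 0.038), so a stricter threshold would need more null draws.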

CLI

spatialperturb run-nature-methods-breast-analysis \
  /data/taobo.hu/SpatialPerturb/prepared/xenium_wta_breast.h5ad \
  /data/taobo.hu/SpatialPerturb/reports/nature_methods_breast_shortcomm \
  --cache-dir /data/taobo.hu/SpatialPerturb/cache

A100 Run

tmux new -d -s sp_nm_breast \
  "bash /data/taobo.hu/SpatialPerturb/code/SpatialPerturb/scripts/a100_run_nature_methods_breast_analysis.sh 2>&1 | tee /data/taobo.hu/SpatialPerturb/reports/nature_methods_breast_shortcomm/run.log"

bash /data/taobo.hu/SpatialPerturb/code/SpatialPerturb/scripts/a100_monitor_nature_methods_breast_analysis.sh --watch

The A100 runner first tries to create the R/Seurat environment at /data/taobo.hu/SpatialPerturb/envs/r-seurat with micromamba, mamba, or conda. For the submission run, gse281048_pathway_atlas is expected to be prepared and filtered to MCF7 through that environment. If a development environment lacks R, the manifest records the secondary-reference failure explicitly and the primary GSE241115 analysis is left unchanged.

The current submission run treats both references as ready: GSE241115 contributes the breast CROP-seq baseline and GSE281048 contributes 218 MCF7 pathway programs, expanding the reference panel from 50 to 268 programs.

Key Outputs

manifest.json
nature_methods_summary.md
nature_methods_shortcomm_scaffold.md
biological_interpretation.md
figures/main_figure_1.png
figures/main_figure_2.png
tables/reference_validation.tsv
tables/calibrated_program_scores_by_group.tsv
tables/spatial_autocorrelation.tsv
tables/ablation_summary.tsv
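
A quick completeness check over the files listed above can catch a partial run before figures are rendered. This is a minimal sketch using only the standard library; `missing_outputs` is a hypothetical helper, not part of spatialperturb, and it only tests that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Relative paths copied from the Key Outputs list.
EXPECTED_OUTPUTS = [
    "manifest.json",
    "nature_methods_summary.md",
    "nature_methods_shortcomm_scaffold.md",
    "biological_interpretation.md",
    "figures/main_figure_1.png",
    "figures/main_figure_2.png",
    "tables/reference_validation.tsv",
    "tables/calibrated_program_scores_by_group.tsv",
    "tables/spatial_autocorrelation.tsv",
    "tables/ablation_summary.tsv",
]


def missing_outputs(output_dir):
    """Return the expected relative paths that are not regular files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_outputs.py <output_dir>
    if len(sys.argv) > 1:
        for rel in missing_outputs(sys.argv[1]):
            print("missing:", rel)
```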

Production Panel Output

The submission-facing panel layer lives in spatialperturb.figurekit. It is separate from exploratory plotting helpers in spatialperturb.pl and enforces production defaults for Nature-style figure assembly:

PDF/PS fonts are exported as Type 42 and SVG text is left editable.
Panel sizes are declared in millimeters through PanelSpec.
Axis labels, ticks, legends and panel labels use fixed small-format typography.
Dense spatial scatter layers are rasterized while axes, labels, legends and colorbars remain vector.
Every saved panel exports .pdf, .png and .svg, writes a matching Source Data .tsv file and records an entry in panel_manifest.tsv.
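
For readers who want to see what these defaults look like in plain Matplotlib, the sketch below reproduces the font, sizing and rasterization rules from the list above. It does not use spatialperturb.figurekit or PanelSpec, whose signatures are not shown here; `PRODUCTION_RC`, `mm_to_inches`, `save_panel`, the 60 x 45 mm size and the 300 dpi value are all assumptions made for illustration.

```python
MM_PER_INCH = 25.4

# Matplotlib rcParams: Type 42 (TrueType) fonts in PDF/PS, and SVG text kept as text.
PRODUCTION_RC = {
    "pdf.fonttype": 42,
    "ps.fonttype": 42,
    "svg.fonttype": "none",
}


def mm_to_inches(width_mm, height_mm):
    """Convert a panel size in millimeters to the inches Matplotlib expects."""
    return (width_mm / MM_PER_INCH, height_mm / MM_PER_INCH)


def save_panel(x, y, stem, width_mm=60.0, height_mm=45.0):
    """Save one scatter panel as <stem>.pdf, <stem>.png and <stem>.svg."""
    import matplotlib
    import matplotlib.pyplot as plt

    paths = []
    # Font types are read when the file is written, so save inside the context.
    with matplotlib.rc_context(PRODUCTION_RC):
        fig, ax = plt.subplots(figsize=mm_to_inches(width_mm, height_mm))
        # Only the dense point layer is rasterized; axes and labels stay vector.
        ax.scatter(x, y, s=1, rasterized=True)
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        for ext in ("pdf", "png", "svg"):
            path = f"{stem}.{ext}"
            fig.savefig(path, dpi=300)
            paths.append(path)
        plt.close(fig)
    return paths
```

Calling `save_panel(xs, ys, "panels/example")` would write three files next to each other, assuming the panels/ directory already exists; the Source Data table and manifest entry described above are outside the scope of this sketch.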

import spatialperturb as sp

outputs = sp.render_nature_methods_panels(
    "/data/taobo.hu/SpatialPerturb/reports/nature_methods_breast_shortcomm",
    "/data/taobo.hu/SpatialPerturb/manuscripts/nature_methods_shortcomm",
    strict=True,
)

This writes eight independent panels under panels/, each exported as .pdf, .png and .svg, with their matched source tables under source_data/:

fig1a_workflow_schema
fig1b_reference_validation
fig1c_null_calibration
fig1d_ablation_robustness
fig2a_xenium_map_celltype_roi
fig2b_top_program_spatial_map
fig2c_roi_celltype_heatmap
fig2d_spatial_autocorrelation

The main panels intentionally use a curated short-communication selection rather than all 268 programs. Figure 1 emphasizes reference readiness, 50 -> 268 program expansion, GSE281048 validation and null-calibration caveats. Figure 2 focuses on ranked candidate spatial programs with interpretable breast-cancer context: Mast-cell FOS, basal-like structured DCIS CEBPB, dendritic SP1/MTOR/RPS6KB1/MAPK3, luminal-like amorphous DCIS PTGS2, invasive-associated CAF MAPK8, and mitotic invasive tumor IFNAR1/TYK2. Full FDR values, bootstrap intervals, ablations, redundancy and all 268 programs belong in Source Data, supplementary tables or Extended Data.
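
The spatial autocorrelation behind fig2d_spatial_autocorrelation is described only as Moran-style, so the exact weighting is not specified here. As a point of reference, this is a minimal sketch of the classical Moran's I with binary weights on a k-nearest-neighbor graph; `knn_edges`, `morans_i` and the toy coordinates are assumptions for this example, and the package's statistic may be normalized or weighted differently.

```python
def knn_edges(coords, k):
    """Directed edges (i, j) from each point to its k nearest neighbors (Euclidean)."""
    edges = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def morans_i(values, edges):
    """Moran's I with unit weight per edge: (n / W) * sum_ij(d_i * d_j) / sum_i(d_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0 or not edges:
        raise ValueError("need non-constant values and at least one edge")
    num = sum(dev[i] * dev[j] for i, j in edges)
    return (n / len(edges)) * (num / denom)


# Toy example: six cells on a line, each linked to its 2 nearest neighbors.
coords = [(float(i), 0.0) for i in range(6)]
edges = knn_edges(coords, k=2)
smooth = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]       # spatial gradient
alternating = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # checkerboard-like pattern
print(morans_i(smooth, edges), morans_i(alternating, edges))
```

A smooth gradient gives a positive value and an alternating pattern gives a negative one; significance would still need a permutation null, which this sketch omits.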

Interpretation Guardrail

Projection scores quantify Perturb-seq reference-like transcriptional states. They should be framed as spatially localized candidate regulatory states, not as proof that the tissue cells underwent the corresponding knockout, CRISPRi perturbation, pathway intervention, or drug treatment.