Single Cell ATAC-seq

Identifying candidate cis-regulatory elements (cCREs)

Notebook: 08-28-2025

📌 Note: For a better understanding of the design of this notebook, please read the main section page first. Thanks 🙏

1. Background

These notes highlight strategies and resources for the downstream analysis of single-cell chromatin accessibility.
The main goal is to compile practical references and methods to support the identification of candidate cis-regulatory elements (cCREs) for advanced analysis — including tasks such as linking peaks to genes or computing differentially accessible regions (DARs).

Calling snATAC-seq peaks using MACS2 and linking peaks to genes

The following video provides a summary of the main topics discussed in this section:

A big thanks to Leonardo Collado-Torres from Lieber Institute for Brain Development for compiling and editing this recording.

Diagrams and concept maps

When working with multiome data (RNA + ATAC), Cell Ranger ARC processes the raw data and identifies valid barcodes that contain both RNA and ATAC information. From here, we can build two parallel matrices: one for gene expression counts and another for peak accessibility counts (Image 1).
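To make the "two parallel matrices" idea concrete, here is a minimal sketch in plain Python. It is not Cell Ranger ARC's file format or the Seurat API: the barcodes, feature names, and both helper functions are hypothetical, and the only point is that both matrices share the same set of barcodes as columns.

```python
# Hypothetical toy data: barcode -> {feature: count}. Not a real Cell Ranger ARC format.
rna_counts = {
    "AAAC-1": {"GeneA": 3, "GeneB": 0},
    "AAAG-1": {"GeneA": 1, "GeneB": 5},
    "AACT-1": {"GeneA": 0, "GeneB": 2},   # RNA only, so it is dropped below
}
atac_counts = {
    "AAAC-1": {"chr1-100-600": 2},
    "AAAG-1": {"chr1-100-600": 0},
    "AATT-1": {"chr1-100-600": 4},        # ATAC only, so it is dropped below
}


def shared_barcodes(rna, atac):
    """Barcodes that carry both RNA and ATAC information."""
    return sorted(set(rna) & set(atac))


def build_matrix(counts, barcodes):
    """Return (features, matrix) with one row per feature and one column per barcode."""
    features = sorted({f for bc in barcodes for f in counts[bc]})
    matrix = [[counts[bc].get(f, 0) for bc in barcodes] for f in features]
    return features, matrix


shared = shared_barcodes(rna_counts, atac_counts)
genes, rna_matrix = build_matrix(rna_counts, shared)      # gene expression counts
peaks, atac_matrix = build_matrix(atac_counts, shared)    # peak accessibility counts
```

Because both matrices are built from the same `shared` list, column *j* refers to the same cell in each one, which is what later makes peak–gene correlation possible.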

Figure: Cell Ranger default peak calls (Sketch 1)

To explore regulatory landscapes, we visualize read coverage and chromatin features with CoveragePlot(), which overlays MACS2-called peaks, linked peak–gene relationships, and the gene model (Images 2–3). This provides a clear view of how open chromatin regions may regulate transcription.

Figure: Peak overview (Sketch 2, Sketch 2b)

Next, we create a chromatin assay and apply GC bias correction, TF-IDF normalization, and standardization. Then, using Signac::LinkPeaks, we correlate each peak's accessibility with the expression of nearby genes within a window (e.g., ±500 kb around a TSS) and evaluate the strength of each regulatory connection (Image 4).
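The core idea of peak–gene linking can be sketched in a few lines of plain Python. This is only an illustration of "keep peaks within a window of the TSS, then correlate accessibility with expression"; it is not the Signac::LinkPeaks implementation, and it leaves out the bias correction mentioned above. The function names, the toy values, and the choice to measure distance from the peak midpoint are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def link_peaks_to_gene(tss, expression, peaks, window=500_000):
    """Correlate each peak within +/- window bp of the TSS with the gene's expression.

    peaks: {name: (start, end, per_cell_accessibility)}; distance uses the peak
    midpoint, which is an assumption made for this sketch.
    """
    links = {}
    for name, (start, end, accessibility) in peaks.items():
        midpoint = (start + end) // 2
        if abs(midpoint - tss) <= window:
            links[name] = pearson(accessibility, expression)
    return links


# Hypothetical gene with TSS at 1,000,000 and expression measured in four cells.
expression = [0, 1, 2, 3]
peaks = {
    "peak_near": (900_000, 900_500, [0, 2, 4, 6]),      # inside window, tracks expression
    "peak_anti": (1_200_000, 1_200_400, [3, 2, 1, 0]),  # inside window, anti-correlated
    "peak_far": (2_000_000, 2_000_500, [0, 2, 4, 6]),   # outside the 500 kb window
}
links = link_peaks_to_gene(1_000_000, expression, peaks)
```

In this toy example `peak_far` has the same accessibility profile as `peak_near` but is never scored, because the distance filter runs before any correlation is computed.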

There are two complementary strategies:

- Global peak linking, where peaks are defined across all cells, merged, normalized, and then linked to genes (broad overview).
- Local peak linking, where peaks are defined per cluster (via MACS2), quantified, normalized, and linked (more cell-type-specific view) (Image 5).

Figure: Global vs local peaks (Sketch 3, Sketch 4)

The basic “cell-to-cell” approach follows a straightforward workflow: load the Seurat object, load and quantify MACS2 peaks, build the chromatin assay, normalize, and then perform downstream analyses like LinkPeaks or differential accessibility (DARs) (Image 6).
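For the "normalize" step, the following is a rough sketch of one log-scaled TF-IDF variant, written in plain Python so the arithmetic is visible. It is an assumption-laden illustration: Signac's RunTFIDF offers several methods, and this code does not claim to reproduce any of them exactly. The function name and the scale factor of 10,000 are choices made for this sketch.

```python
from math import log1p


def tfidf(counts, scale_factor=10_000):
    """Log-scaled TF-IDF on a peaks x cells count matrix (list of rows).

    TF  = count / total counts in that cell (column sum)
    IDF = number of cells / total counts for that peak (row sum)
    The result is log1p(TF * IDF * scale_factor).
    """
    n_cells = len(counts[0])
    cell_totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    normalized = []
    for row in counts:
        peak_total = sum(row)
        idf = n_cells / peak_total if peak_total else 0.0
        normalized.append([
            log1p((row[j] / cell_totals[j]) * idf * scale_factor) if cell_totals[j] else 0.0
            for j in range(n_cells)
        ])
    return normalized


# Two peaks (rows) across two cells (columns); toy counts only.
toy_counts = [
    [1, 0],   # rare peak, open in one cell
    [1, 2],   # common peak, open in both cells
]
toy_tfidf = tfidf(toy_counts)
```

In the first cell both peaks have the same raw count, but the rarer peak receives the larger normalized value, which is the behaviour TF-IDF is meant to give.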

Finally, we can compare cell-to-cell vs pseudobulk strategies. In cell-to-cell, we build a shared peak set (using reduce), quantify fragments, and construct a chromatin assay for fine-grained resolution. In pseudobulk, we aggregate expression at the cluster level for broader, more robust signals. Both converge into analyses of peak–gene links and differential accessibility, but at different resolutions (Image 7).
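Two operations in this comparison are easy to sketch in plain Python: merging overlapping peaks into a shared peak set (the idea behind GenomicRanges::reduce) and summing per-cell counts within each cluster (the idea behind pseudobulk aggregation). Neither function below is the R implementation. The names and toy data are hypothetical, and the merge assumes 1-based closed coordinates and treats directly adjacent intervals as mergeable.

```python
from collections import defaultdict


def reduce_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals.

    Assumes 1-based closed coordinates; adjacent intervals (no gap) are merged.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


def pseudobulk(counts, clusters):
    """Sum per-cell counts within each cluster.

    counts: {cell: {feature: n}}; clusters: {cell: cluster_label}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, features in counts.items():
        for feature, n in features.items():
            totals[clusters[cell]][feature] += n
    return {label: dict(feats) for label, feats in totals.items()}


# Peaks called separately per cluster, then merged into one shared set.
cluster_peaks = [
    ("chr1", 100, 200), ("chr1", 150, 300),   # overlap
    ("chr1", 301, 400),                       # adjacent to the merged peak
    ("chr1", 500, 600), ("chr2", 100, 200),
]
shared_peaks = reduce_intervals(cluster_peaks)

cell_counts = {"c1": {"GeneA": 2}, "c2": {"GeneA": 3, "GeneB": 1}, "c3": {"GeneA": 4}}
cell_clusters = {"c1": "Astro", "c2": "Astro", "c3": "Micro"}
bulk = pseudobulk(cell_counts, cell_clusters)
```

The shared peak set keeps one column per cell for the cell-to-cell route, while `pseudobulk` collapses cells into one profile per cluster, trading resolution for a less sparse signal.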

Figure: Cell-to-cell vs pseudobulk (Sketch 5, Sketch 6)

Together, these steps form a pipeline for uncovering how chromatin accessibility influences gene expression across cell states, balancing global robustness with local specificity.

2. Technical Support Notes

a. Key Literature

Chawla et al., 2025 (Nat Genet)
- Single-nucleus chromatin accessibility profiling identifies cell types and functional variants contributing to major depression.
- DOI link
- Includes code for DAR analysis (cell-type/cluster-specific cCREs and TFs): GitHub repo
Zeng et al., 2024 (Science)
- Genetic regulation of cell type–specific chromatin accessibility shapes brain disease etiology.
- DOI link
Anderson et al., 2023 (Cell Genomics)
- Single nucleus multiomics identifies ZEB1 and MAFB as candidate regulators of Alzheimer’s disease-specific cis-regulatory elements.
- DOI link
- Includes code for peak calling: Workflow

b. Vignettes & Tutorials

c. Core Functions

Signac
GenomicRanges
- reduce
- IRanges methods
Seurat v5+
- AggregateExpression

d. Custom Repo Practices

Peak calling workflows: scMultiomics_AD repo
Link detection strategies:
- Filter MACS peak calls (e.g., by expression or cell type).
- Keep only genes with sufficient UMIs (e.g., ≥200).
- Restrict peak–gene links by distance (≤100 kb from TSS).
- Example: link filtering.
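The two numeric filters in this list (≥200 UMIs per gene, ≤100 kb from the TSS) can be sketched as below. This is plain Python and not the code from the repo mentioned above. The field names (`gene`, `tss`, `peak_mid`), the helper name, and the choice to measure distance from the peak midpoint are assumptions made for illustration.

```python
def filter_links(links, gene_umis, min_umis=200, max_distance=100_000):
    """Keep peak-gene links whose gene has enough UMIs and whose peak is near the TSS.

    links: list of dicts with hypothetical keys "gene", "tss", "peak_mid"
    gene_umis: {gene: total UMI count}
    """
    kept = []
    for link in links:
        enough_umis = gene_umis.get(link["gene"], 0) >= min_umis
        close_enough = abs(link["peak_mid"] - link["tss"]) <= max_distance
        if enough_umis and close_enough:
            kept.append(link)
    return kept


# Toy links; all values are made up.
candidate_links = [
    {"gene": "GeneA", "tss": 1_000_000, "peak_mid": 1_050_000},  # kept
    {"gene": "GeneA", "tss": 1_000_000, "peak_mid": 1_300_000},  # too far (300 kb)
    {"gene": "GeneB", "tss": 5_000_000, "peak_mid": 5_010_000},  # too few UMIs
]
umis_per_gene = {"GeneA": 850, "GeneB": 40}
kept_links = filter_links(candidate_links, umis_per_gene)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the final link set is to each cutoff.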

e. Behind-the-Scenes Tools & File Formats

To run some functions, such as Signac::CallPeaks (which internally invokes MACS), you need MACS installed, so it helps to read a bit about how MACS works. The most basic resources for getting set up are:

MACS Project: GitHub and Changelog
Installation: MACS2 guide and MACS3 guide
BED = Browser Extensible Data format (standard for genomic intervals): UCSC FAQ

If you are wondering why the BED format matters here, the short story is that CallPeaks() calls MACS2, which produces BED/narrowPeak files. Signac then reads those into GRanges, removing the need for you to handle the files manually. The relation is direct: CallPeaks() = automated creation + import of MACS2 BED output into R.

This matters if you want to:

- Export: you can always export Signac’s GRanges peaks back into a BED file (with rtracklayer::export()). This is useful if you want to load them in IGV or the UCSC Genome Browser, or compare them with ENCODE cCREs.
- Import: if you already have BED peaks (e.g., from ENCODE, or from your own MACS2 runs outside of Signac), you can load them with rtracklayer::import() and use them directly in Signac.
- Consistency: this BED ↔ GRanges conversion makes it easy to interoperate between R and external genomics tools.
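One detail worth knowing about the BED ↔ GRanges round trip is the coordinate convention: BED starts are 0-based with an exclusive end, while GRanges uses 1-based, inclusive coordinates. rtracklayer handles this shift for you on import and export. The plain-Python sketch below only shows the arithmetic; the two helper names are hypothetical and this is not rtracklayer's code.

```python
def bed_line_to_range(line):
    """Parse the first three BED columns into a 1-based, inclusive (chrom, start, end)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start) + 1, int(end)   # shift start only; end stays the same


def range_to_bed_line(chrom, start, end):
    """Write a 1-based, inclusive range back as a three-column BED line."""
    return f"{chrom}\t{start - 1}\t{end}"


# A narrowPeak-style line with extra columns; only the first three are used here.
bed_line = "chr1\t99\t200\tpeak_1\t500"
peak_range = bed_line_to_range(bed_line)
round_trip = range_to_bed_line(*peak_range)
```

The width is the same in both conventions (200 - 99 in BED, 200 - 100 + 1 as a 1-based range), which is a quick sanity check when peaks seem to be off by one between R and a genome browser.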

Thanks for reading through these notes. I hope they give you a useful starting point or spark new ideas for your own analysis.

Feel free to share thoughts, suggestions, or resources. Collaboration makes the whole community stronger.

Until next time, happy exploring!

Cynthia SC