We evaluated the regulatory activity of cCREs using four complementary functional assays: genome-wide STARR-seq, MPRA, CRISPR perturbations, and transgenic mouse enhancer assays. Across these assays, nearly all human cCREs (97%) were tested in at least one cellular context, with a substantial fraction showing significant activity. Because STARR-seq provided the highest throughput and enabled systematic assessment of both activating and repressive effects, we focused subsequent analyses on this assay. To resolve cCRE-specific activity from STARR-seq fragments, which often span multiple elements, we developed CAPRA, a CRE-centric framework that quantifies enhancer and silencer activity and supports downstream analyses of cell type specificity, sequence features, and combinatorial interactions.
Frequently Asked Questions
Each functional assay tests regulatory activity in a different biological and experimental context. In MPRA experiments, cCREs are typically placed adjacent to a promoter, testing intrinsic sequence-driven regulatory potential. In STARR-seq, cCREs are embedded in the 3′ untranslated region of a reporter transcript, enabling high-throughput detection of both activating and repressive effects but outside native chromatin. In contrast, CRISPR perturbation assays interrogate cCREs in their endogenous genomic and chromatin context, using context-specific transcriptional readouts. These differences are informative rather than contradictory: together, they distinguish cCREs with latent regulatory potential from those that are active in specific chromatin environments. Even within a single assay type, technical factors (e.g., library design or experimental implementation across laboratories) can influence results. We systematically evaluated these sources of variability and their implications in Supplementary Note 2.1.
Across all functional assays, approximately 28% of tested cCREs showed significant activity in at least one assay and cellular context. This estimate likely underestimates the true fraction of functional elements, because most assays were performed in a limited number of cell types and are biased toward detecting enhancer activity. In K562 cells, where the most extensive functional data are available, 91% of promoter cCREs and 65% of enhancer cCREs exhibited significant activity. These higher rates highlight the strong dependence of functional readouts on biosample selection and underscore the importance of expanding assays across additional cellular contexts.
Although whole-genome STARR-seq (WG-STARR-seq) assays test regulatory sequences outside their native chromatin context, they directly probe the intrinsic regulatory potential encoded in DNA sequence. This makes STARR-seq particularly powerful for identifying sequence features that can drive activation or repression when the appropriate transcription factors are present. Our cross-cell-type analyses show that STARR-seq activity patterns closely reflect differences in transcription factor availability across biosamples, linking sequence composition to cell type-specific regulatory activity. In this way, STARR-seq complements chromatin-based and CRISPR perturbation assays by separating sequence-driven potential from chromatin accessibility and higher-order genomic context.
CAPRA anchors functional analysis directly on annotated cCREs, enabling element-level interpretation of WG-STARR-seq data. In these human WG-STARR-seq experiments, reporter fragments did not include unique molecular identifiers (UMIs), making it difficult to directly match individual DNA input fragments to their corresponding RNA outputs. As a result, traditional analyses rely on peak calling, which often aggregates signal across large genomic regions and obscures the contributions of individual regulatory elements. By quantifying RNA-to-DNA ratios at cCRE anchors, CAPRA resolves enhancer and silencer activity at single-element resolution and enables systematic analyses of sequence features, cell type specificity, and combinatorial interactions between cCREs.
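The idea of quantifying RNA-to-DNA ratios at cCRE anchors can be illustrated with a minimal sketch. This is not the CAPRA implementation. The function names, the rule that a fragment is assigned to the cCREs it fully contains, the library-size scaling, and the pseudocount are all assumptions made for illustration, and the coordinates and counts are invented toy values. Fragments containing one cCRE are treated as "solo" and fragments containing two as "double".

```python
import math
from collections import defaultdict


def contained_ccres(chrom, start, end, ccres):
    """IDs of cCREs fully contained in a fragment (assumed assignment rule)."""
    return tuple(
        cid for cid, (c, s, e) in ccres.items()
        if c == chrom and start <= s and e <= end
    )


def rna_dna_log_ratios(fragments, ccres, pseudocount=1.0):
    """log2(RNA/DNA) per cCRE set, pooling fragments that contain the same cCREs.

    fragments: iterable of (chrom, start, end, dna_count, rna_count).
    ccres: dict mapping cCRE ID -> (chrom, start, end).
    Keys of the result are tuples: one ID for solo, two IDs for double fragments.
    """
    fragments = list(fragments)
    total_dna = sum(f[3] for f in fragments)
    total_rna = sum(f[4] for f in fragments)
    if total_dna == 0 or total_rna == 0:
        raise ValueError("both libraries need at least one read")
    # Rescale RNA counts to the DNA library size (hypothetical normalization).
    scale = total_dna / total_rna

    dna = defaultdict(float)
    rna = defaultdict(float)
    for chrom, start, end, d, r in fragments:
        key = contained_ccres(chrom, start, end, ccres)
        if key:  # skip fragments that contain no annotated cCRE
            dna[key] += d
            rna[key] += r * scale
    return {
        key: math.log2((rna[key] + pseudocount) / (dna[key] + pseudocount))
        for key in dna
    }


# Toy data: two hypothetical cCREs and four fragments.
ccres = {"A": ("chr1", 100, 300), "B": ("chr1", 400, 600)}
fragments = [
    ("chr1", 50, 350, 10, 40),     # solo fragment containing A
    ("chr1", 350, 650, 10, 5),     # solo fragment containing B
    ("chr1", 50, 650, 10, 15),     # double fragment containing A and B
    ("chr1", 1000, 1200, 10, 20),  # contains no cCRE
]
ratios = rna_dna_log_ratios(fragments, ccres)
```

In this toy example the solo ratio for A is positive (enhancer-like) and the solo ratio for B is negative (silencer-like). Keeping solo and double fragments under separate keys is what allows the activity of individual elements to be compared with the activity of element pairs.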
Motif enrichment analyses of cell type–specific STARR-seq activity recovered motifs for the expected transcription factors in K562 and HepG2, and also highlighted TP53 and GFI1B motifs in HepG2-active enhancers. TP53-associated enhancers were active in cell lines with functional TP53 but inactive in K562, consistent with known differences in TP53 status across these lines. This illustrates how regulatory activity measured in cancer cell lines can reflect disruptions to underlying regulatory programs.
In contrast, GFI1B, a transcriptional repressor expressed in erythroid cells, was associated with cCREs that showed reduced or repressive activity in K562. This provided early evidence that our STARR-seq–based framework can detect silencing behavior, a property we later leveraged to identify and characterize silencer elements genome-wide.
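As a rough illustration of what a motif enrichment test asks, the sketch below computes a one-sided hypergeometric p-value: given a set of tested cCREs, some of which contain a motif, how surprising is the number of motif-containing elements among the cell type-specific active ones? The source does not state which enrichment test was used, so this choice of test, the function name, and the counts are assumptions for illustration only.

```python
from math import comb


def hypergeom_upper_tail(total, with_motif, active, active_with_motif):
    """P(X >= active_with_motif) when `active` elements are drawn without
    replacement from `total` elements, of which `with_motif` carry the motif."""
    upper = min(with_motif, active)
    numerator = sum(
        comb(with_motif, i) * comb(total - with_motif, active - i)
        for i in range(active_with_motif, upper + 1)
    )
    return numerator / comb(total, active)


# Hypothetical counts: 1,000 tested cCREs, 100 with the motif,
# 50 active in one cell type, 20 of those carrying the motif.
p_value = hypergeom_upper_tail(1000, 100, 50, 20)
```

A small p-value indicates that the motif occurs among active elements more often than expected by chance. In practice many motifs are tested at once, so multiple-testing correction would also be needed.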
Top: Genome browser view of three distal enhancer cCREs (labeled 1-3) in the MTNR1A intron, with DNase (green) and H3K27ac (yellow) signals in K562. A STARR-seq peak is shown in black. Bottom: CAPRA quantifications for the three enhancers shown in the top panel, EH38E3620077 (1), EH38E3620078 (2) and EH38E3620079 (3), using solo fragments (top) and double fragments (bottom). High quantifications are denoted in purple (CAPRA, p = 0.03).