The ENCODE4 Registry classifies candidate cis-regulatory elements (cCREs) using their biochemical signatures and genomic context, extending earlier frameworks to capture a broader spectrum of regulatory activity. In addition to canonical promoter- and enhancer-associated elements, the Registry now includes chromatin-accessible and transcription-factor–anchored classes that enable systematic study of silencers, latent enhancers, and context-dependent regulatory elements.
Overview of our cCRE classification scheme. cCREs are classified based on their patterns of biochemical signals (chromatin accessibility in green, H3K4me3 in red, H3K27ac in yellow, CTCF in blue, transcription factor in purple) and their distance from annotated TSSs. High signals are denoted by peaks. The +/- symbol indicates that the corresponding signal may or may not be present; its presence does not affect classification. New categories of elements are denoted by stars.
Frequently Asked Questions
cCREs are classified using a cell type–agnostic framework that integrates each element’s dominant biochemical signals across all available biosamples and its distance to the nearest annotated transcription start site (TSS). This approach is analogous to gene annotation, which defines genes independently of their expression level in any one cell type.
Promoter-like signatures (promoter) must fall within 200 bp of a TSS and have high chromatin accessibility and H3K4me3 signals.
TSS-proximal enhancer-like signatures (proximal enhancer) have high chromatin accessibility and H3K27ac signals and are within 2 kb of an annotated TSS. If they are within 200 bp of a TSS, they must also have low H3K4me3 signal.
TSS-distal enhancer-like signatures (distal enhancer) have high chromatin accessibility and H3K27ac signals and are farther than 2 kb from an annotated TSS.
Chromatin accessibility + H3K4me3 (CA-H3K4me3) elements have high chromatin accessibility and H3K4me3 signals but low H3K27ac signals, and do not fall within 200 bp of a TSS.
Chromatin accessibility + CTCF (CA-CTCF) elements have high chromatin accessibility and CTCF signals but low H3K4me3 and H3K27ac signals.
Chromatin accessibility + transcription factor (CA-TF) elements have high chromatin accessibility; low H3K4me3, H3K27ac, and CTCF signals; and are bound by a transcription factor.
Chromatin accessibility (CA) elements have high chromatin accessibility and low H3K4me3, H3K27ac, and CTCF signals.
Transcription factor (TF) elements have low chromatin accessibility, H3K4me3, H3K27ac, and CTCF signals and are bound by a transcription factor.
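The rules above can be read as an ordered decision procedure. The Python sketch below is a hypothetical illustration, not the Registry's pipeline: it assumes each signal has already been reduced to a high/low flag (how "high" is defined is left to the caller), that `tss_distance_bp` is the non-negative distance to the nearest annotated TSS, and that the function name `classify_ccre` is our own.

```python
from typing import Optional


def classify_ccre(
    tss_distance_bp: int,
    high_ca: bool,
    high_h3k4me3: bool,
    high_h3k27ac: bool,
    high_ctcf: bool,
    tf_bound: bool,
) -> Optional[str]:
    """Assign a cCRE class from high/low signal flags (illustrative sketch)."""
    # Promoter-like: within 200 bp of a TSS, high accessibility and H3K4me3.
    if high_ca and high_h3k4me3 and tss_distance_bp <= 200:
        return "promoter"
    # Enhancer-like: high accessibility and H3K27ac, split by TSS distance.
    # Any element reaching this branch within 200 bp must have low H3K4me3,
    # because high-H3K4me3 elements there were already called promoters.
    if high_ca and high_h3k27ac:
        return "proximal enhancer" if tss_distance_bp <= 2000 else "distal enhancer"
    if high_ca:
        # Low H3K27ac from here on.
        if high_h3k4me3:  # not within 200 bp of a TSS (promoter check failed)
            return "CA-H3K4me3"
        if high_ctcf:  # low H3K4me3 and H3K27ac
            return "CA-CTCF"
        if tf_bound:  # low H3K4me3, H3K27ac, and CTCF
            return "CA-TF"
        return "CA"
    # Low accessibility: only TF binding with all histone/CTCF signals low qualifies.
    if tf_bound and not (high_h3k4me3 or high_h3k27ac or high_ctcf):
        return "TF"
    return None  # does not match any class


print(classify_ccre(5000, True, False, True, False, False))  # distal enhancer
```

Checking the promoter rule first is what lets the later branches omit the "within 200 bp" conditions: the ordering encodes them implicitly.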
Cell type–agnostic cCREs represent a global classification based on evidence aggregated across all biosamples and are most appropriate for genome-wide analyses or comparisons across studies.
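One simple way to aggregate evidence across biosamples into a single cell type–agnostic profile is to keep, for each assay, the strongest signal observed in any biosample. The sketch below assumes that reduction (a per-assay maximum Z-score) and an illustrative high-signal cutoff of Z > 1.64; both are assumptions for illustration, and `max_z_profile`, `high_flags`, and the biosample names are hypothetical.

```python
from typing import Dict

HIGH_Z_CUTOFF = 1.64  # assumed cutoff for calling a signal "high"


def max_z_profile(z_by_biosample: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Collapse per-biosample Z-scores into one maximum Z-score per assay."""
    profile: Dict[str, float] = {}
    for assay_scores in z_by_biosample.values():
        for assay, z in assay_scores.items():
            # Biosamples lacking an assay simply contribute nothing for it.
            if assay not in profile or z > profile[assay]:
                profile[assay] = z
    return profile


def high_flags(profile: Dict[str, float], cutoff: float = HIGH_Z_CUTOFF) -> Dict[str, bool]:
    """Turn aggregated Z-scores into high/low flags."""
    return {assay: z > cutoff for assay, z in profile.items()}


scores = {
    "biosample_A": {"DNase": 3.1, "H3K27ac": 0.2},
    "biosample_B": {"DNase": 1.0, "H3K27ac": 2.5, "CTCF": 0.4},
}
profile = max_z_profile(scores)
print(profile)              # {'DNase': 3.1, 'H3K27ac': 2.5, 'CTCF': 0.4}
print(high_flags(profile))  # {'DNase': True, 'H3K27ac': True, 'CTCF': False}
```

Note that the maximum can combine signals from different biosamples (accessibility from one, H3K27ac from another), which is one reason a global label may not describe the element's state in any single cell type.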
Aggregate cCREs are group-level annotations for biosamples combined by shared organ or tissue of origin. For each biosample group, cCREs were included if they met one or more of the following criteria based on DNase Z-scores:
Group-supported activity: Z > 1.64 (95th percentile) in ≥5 biosamples
Moderately strong activity: Z > 2.02 (98th percentile) in ≥2 biosamples
Highly strong activity: Z > 2.32 (99th percentile) in ≥1 biosample
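The three inclusion criteria amount to counting, within one biosample group, how many biosamples exceed each Z-score cutoff. A minimal Python sketch follows, with the thresholds copied from the list above; the function names and rule labels are our own, and the strict `>` comparison simply mirrors the text.

```python
from typing import Iterable, List

# (category, DNase Z-score cutoff, minimum number of biosamples above the cutoff)
AGGREGATE_RULES = [
    ("group-supported", 1.64, 5),
    ("moderately strong", 2.02, 2),
    ("highly strong", 2.32, 1),
]


def aggregate_categories(dnase_z: Iterable[float]) -> List[str]:
    """Return every activity category a cCRE satisfies within one biosample group."""
    scores = list(dnase_z)
    return [
        label
        for label, cutoff, min_count in AGGREGATE_RULES
        if sum(z > cutoff for z in scores) >= min_count
    ]


def include_in_group(dnase_z: Iterable[float]) -> bool:
    """A cCRE is kept for the group if it meets at least one criterion."""
    return bool(aggregate_categories(dnase_z))


print(aggregate_categories([2.5, 0.3, 0.1]))  # ['highly strong']
```

Because the criteria are combined with "or", a single very strong biosample is enough for inclusion, while weaker signals need support from more biosamples in the group.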
Biosample-specific cCREs are annotated in individual cell or tissue types. In general, we recommend biosample-specific annotations when studying tissue-specific or context-dependent regulation.
Only a subset of the ~2.3M human cCREs is active in any given biosample. In a typical biosample, on the order of 100,000 cCREs have chromatin accessibility. The exact number varies depending on the quality and depth of the chromatin accessibility data, particularly DNase-seq, with higher-quality datasets generally identifying more active elements. Despite this variability, most well-profiled biosamples fall within a similar range, providing a consistent scale for interpreting biosample-specific regulatory landscapes.
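To put the scale in perspective, roughly 100,000 of ~2.3 million elements is about 4% of the Registry. The sketch below counts the elements called accessible in one biosample; it assumes a mapping from every Registry cCRE ID to that biosample's DNase Z-score and reuses Z > 1.64 as an illustrative accessibility cutoff, which is an assumption rather than a documented rule. The function name `active_fraction` is hypothetical.

```python
from typing import Mapping, Tuple

ACCESSIBLE_Z = 1.64  # illustrative cutoff (assumption), not a documented rule


def active_fraction(dnase_z: Mapping[str, float], cutoff: float = ACCESSIBLE_Z) -> Tuple[int, float]:
    """Return (number of accessible cCREs, fraction of all scored cCREs)."""
    if not dnase_z:
        raise ValueError("no cCREs scored")
    active = sum(1 for z in dnase_z.values() if z > cutoff)
    return active, active / len(dnase_z)


# At the scale quoted above: 100,000 / 2,300,000 is roughly 4%.
print(round(100_000 / 2_300_000, 3))  # 0.043
```

Since deeper, higher-quality DNase-seq data tend to raise `active`, counts from different biosamples are best compared with data quality in mind.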
Nearly all cCREs from ENCODE3 are retained in ENCODE4, while additional elements arise from expanded biosample coverage and improvements to the computational pipeline. These newly added cCREs are enriched for regulatory activity and evolutionary conservation relative to non-cCRE regions and show strong overlap with regulatory annotations from multiple independent studies. Differences in activity and conservation between older and newly added cCREs largely reflect shifts in class composition, with newer elements enriched for distal enhancers and transcription-factor–anchored classes that tend to be more cell type–specific and evolutionarily dynamic.
Yes. Analyses using variational autoencoders trained on cCRE sequences show that promoter and enhancer classes segregate along latent dimensions strongly correlated with GC content. GC-rich elements are more common among promoter-associated cCREs and broadly active enhancers, while AT-rich elements are enriched among distal, cell type–specific enhancers. These sequence preferences align with transcription factor motif composition and binding biases, supporting a model in which biochemical classification captures underlying, biologically meaningful sequence features.
The Registry reflects the availability, quality, and distribution of underlying functional genomics data. Tissues with extensive profiling, such as blood and brain, are more comprehensively annotated than less-sampled tissues. In addition, many biosamples lack complete chromatin datasets, which can limit confident classification—particularly for enhancers—and lead to conservative labels such as chromatin-accessible (CA). We recommend interpreting aggregate annotations with care and prioritizing biosample-level analyses when assessing tissue-specific regulatory activity.