The ENCODE4 Registry represents a substantial expansion in both scale and scope, encompassing over 2.3 million human and nearly 1 million mouse candidate cis-regulatory elements (cCREs). This growth reflects the integration of newly generated ENCODE4 datasets, a large increase in the diversity of biosamples profiled, and methodological advances in the cCRE discovery and annotation pipeline. Together, these updates enable more comprehensive and context-aware annotation of regulatory elements across tissues, cell types, and cellular states.
Frequently Asked Questions
The expansion of the ENCODE4 Registry is driven by two main factors. First, ENCODE4 substantially increased the number and diversity of biosamples profiled, more than doubling the representation of tissues, cell types, and cellular states. Second, updates to the computational pipeline improved recovery of regulatory elements and expanded the set of cCRE classes that can be annotated. In addition to canonical promoter- and enhancer-associated elements, the Registry now includes chromatin-accessible and transcription-factor–anchored cCREs (e.g., CA, CA-TF, and TF cCREs), enabling annotation of regulatory elements such as silencers and dynamic enhancers that were underrepresented or missed in earlier releases. These new classes are discussed in more detail in later sections.
Expanded biosample coverage enables both broader and more structured analyses of regulatory elements across biological contexts. Beyond increasing the total number of tissues and cell types represented, ENCODE4 includes multiple coordinated biosample collections designed to support specific analytical questions. These include deeply profiled reference cell lines with matched functional genomics assays (e.g., Bru-seq, long-read RNA-seq, Hi-C, PRO-cap), iPSC-derived lineages that allow regulatory comparisons across differentiation from a shared genetic background, and donor-matched collections such as EN-TEx that capture regulatory variation across tissues within the same individual. Together, these mini-collections enable analyses of cell type specificity, cell state transitions, developmental trajectories, and inter-individual variability, providing a richer framework for interpreting cCRE function than would be possible from isolated biosamples alone.
Multi-mapper cCREs represent regulatory elements located in genomic regions where short sequencing reads cannot be uniquely assigned, such as duplicated loci or repetitive elements. While these cCREs are included for completeness, they do not have full biosample-specific classifications because recomputing all chromatin and transcription factor signals using remapped reads across thousands of datasets was computationally infeasible. We recommend including multi-mapper cCREs in genome-wide overlap or completeness analyses, but for biosample-specific or class-based analyses, we generally suggest using the ~2.35 million human cCREs with full annotations and classifications.
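As a rough illustration of the recommendation above, the following Python sketch separates fully classified cCREs from multi-mapper cCREs before a class-based analysis. It rests on assumptions that are not taken from the Registry itself: the input is taken to be tab-separated, BED-like text with the columns chromosome, start, end, accession, and class label, and the `multi-mapper` label, the `EXAMPLE` accessions, and the coordinates are hypothetical placeholders. Check the actual column layout and labeling of the downloaded file before adapting it.

```python
from typing import Iterable, List, NamedTuple, Tuple


class CCRE(NamedTuple):
    chrom: str
    start: int
    end: int
    accession: str
    label: str


# Hypothetical marker for multi-mapper cCREs. The real Registry files may
# flag these elements differently, so treat this value as an assumption.
MULTI_MAPPER_LABEL = "multi-mapper"


def parse_bed_line(line: str) -> CCRE:
    """Parse one line of the assumed five-column, tab-separated layout."""
    chrom, start, end, accession, label = line.rstrip("\n").split("\t")[:5]
    return CCRE(chrom, int(start), int(end), accession, label)


def split_by_annotation(lines: Iterable[str]) -> Tuple[List[CCRE], List[CCRE]]:
    """Return (classified, multi_mapper) lists of cCRE records."""
    classified: List[CCRE] = []
    multi_mapper: List[CCRE] = []
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_bed_line(line)
        if record.label == MULTI_MAPPER_LABEL:
            multi_mapper.append(record)
        else:
            classified.append(record)
    return classified, multi_mapper


# Made-up records for illustration only; these are not real cCREs.
example_lines = [
    "chr1\t1000\t1250\tEXAMPLE0001\tCA\n",
    "chr1\t5000\t5300\tEXAMPLE0002\tCA-TF\n",
    "chr2\t7000\t7200\tEXAMPLE0003\tmulti-mapper\n",
]

classified, multi_mapper = split_by_annotation(example_lines)

# Genome-wide overlap or completeness analyses would use both lists;
# biosample-specific or class-based analyses would use only `classified`.
all_ccres = classified + multi_mapper
```

Keeping the two groups in separate lists, rather than discarding the multi-mapper records, leaves both of the recommended analysis modes available from a single pass over the file.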
The smaller number of mouse cCREs primarily reflects differences in data availability rather than fundamental biological divergence. ENCODE4 includes substantially more human biosamples and experiments than mouse, and analyses indicate that comparable numbers of regulatory elements would be identified if coverage were similar. Future expansions of the Registry will incorporate additional mouse datasets to improve cross-species comparability.