Download Resources

The "Download Resources" subworkflow downloads resources required for common bioinformatics analyses from the providers listed below. Each provider has its own license and citation requirements; please review and acknowledge the original sources when using an OBLX-generated library.

Note: OBLX downloads the Twist exome BED files from UCSC. These files are not released under an open-source license; please check whether their terms permit your use case.

Provider	Download URL	Citation
GENCODE (https://www.gencodegenes.org/)	https://ftp.ebi.ac.uk/pub/databases/gencode	Mudge et al. (2025)
UCSC (https://genome.ucsc.edu/)	https://hgdownload.soe.ucsc.edu	Casper et al. (2026)
GATK / Broad resource bundle (https://gatk.broadinstitute.org/)	https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0	Van der Auwera & O’Connor (2020)
gnomAD (https://gnomad.broadinstitute.org/)	https://storage.googleapis.com/gcp-public-data--gnomad/release	Chen et al. (2024), Karczewski et al. (2020)
UniProt (https://www.uniprot.org/)	https://rest.uniprot.org/uniprotkb/stream	The UniProt Consortium (2025)
NCBI (https://www.ncbi.nlm.nih.gov/snp/)	https://ftp.ncbi.nih.gov/	Phan et al. (2025)
Ensembl (https://www.ensembl.org)	https://ftp.ensembl.org/pub/	Keane et al. (2011)
GenBank (https://www.ncbi.nlm.nih.gov/genbank/)	Downloaded via efetch using the GenBank accessions listed in https://github.com/TRON-Bioinformatics/oblx/blob/dev/workflow/resources/tcga_viruses.tsv	Clark et al. (2015)

Input

No input is required. However, the organism and releases of individual resources can be specified in the config file.

Usage

To run the pull_resources subworkflow, use the following command:

snakemake --until pull_resources \
    --directory </path/to/output/directory> \
    --software-deployment-method [conda|apptainer] \
    --latency-wait 60 \
    [--configfile <path/to/config/file>] \
    [--profile </path/to/cluster/profile/>]

--directory: Directory to store the results of the workflow.
--software-deployment-method: Either conda or apptainer. Container images for apptainer are configured in config/container_config.yaml.
--latency-wait: Number of seconds (e.g. 60) to wait for a job's output files to appear after the job has finished, to compensate for file system (I/O) latency.
--configfile (optional): Config file that defines, e.g., the reference genome version to use (see Configuration).
--profile (optional): Cluster profile used to submit jobs, e.g. to an HPC system.

Output

The pull_resources step gathers all files that are required for index generation or are used directly by downstream tools. An overview of all downloaded and generated resources is given below for human mode and for mouse mode.

The workflow generates the following directory structure (in human mode):

</path/to/output/dir>/resources
├── exome_definition
│   ├── ref_cds.bed
│   ├── ref_exome.bed
│   ├── ref_exome.bed.gz
│   ├── ref_exome.bed.gz.tbi
│   ├── twist_comprehensive_exome.bed
│   ├── twist_core_exome.bed
│   ├── twist_exome2.bed
│   └── twist_refseq.bed
├── gatk_bundle
│   ├── 1000G_omni2.5.hg38.vcf.gz
│   ├── 1000G_omni2.5.hg38.vcf.gz.tbi
│   ├── 1000G_phase1.snps.high_confidence.hg38.vcf.gz
│   ├── 1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
│   ├── hapmap_3.3.hg38.vcf.gz
│   ├── hapmap_3.3.hg38.vcf.gz.tbi
│   ├── Homo_sapiens_assembly38.dbsnp138.vcf.gz
│   ├── Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi
│   ├── Homo_sapiens_assembly38.known_indels.vcf.gz
│   ├── Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
│   ├── Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
│   └── Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
├── germline_variants
│   ├── gnomAD
│   │   └── {exomes,genomes}
│   │       ├── af_only_gnomad_hg38.vcf.gz
│   │       ├── af_only_gnomad_hg38.vcf.gz.tbi
│   │       ├── common_biallelic_chr1.vcf.gz
│   │       └── common_biallelic_chr1.vcf.gz.tbi
│   ├── dbSNP_151.vcf.gz
│   └── dbSNP_151.vcf.gz.tbi
├── mappability
│   ├── encode_exclusion.bed
│   ├── grcExclusions.bed
│   └── ucsc_problematic.bed
├── chromosome_sizes.txt
├── ref_annot.bed
├── ref_annot.gtf
├── ref_annot_gene2symbol.tsv
├── ref_annot_splice_sites.tsv
├── ref_annot_transcript2gene.tsv
├── ref_annot_metadata_SwissProt.tsv
├── ref_annot_metadata_TrEMBL.tsv
├── ref_genome_primary.fasta
├── ref_genome_grc_masked.fasta
├── ref_genome_masked_final.fasta
├── ref_genome.fasta
├── ref_genome.fasta.fai
├── ref_genome.dict
├── ref_transcripts.fasta
├── ref_genome_repeatmasker.bed
├── uniprot
│   └── uniprot_annotations.tsv
└── viruses
    └── tcga_virus_decoy.fasta

GENCODE reference files

The following files are downloaded from GENCODE (Mudge et al., 2025) or derived from the GENCODE downloads. In human mode, regions flagged as problematic by the GRC are hard-masked in the reference FASTA; repetitive regions are not masked.

chromosome_sizes.txt: Lengths of the chromosomes
ref_annot.gtf: Comprehensive gene annotation based on primary assembly (PRI) (gencode.v<release>.primary_assembly.annotation.gtf.gz)
ref_annot.bed: BED12 file of the transcripts (transformed from GTF file)
ref_genome.fasta: Symlink to the reference genome FASTA. In human mode, it points to ref_genome_masked_final.fasta, i.e. the primary assembly with two sets of regions hard-masked: the regions flagged by the GRC as false duplications or contamination (Behera et al., 2022), as listed in resources/mappability/grcExclusions.bed (downloaded from UCSC, see section Mappability), and the pseudoautosomal regions defined in workflow/resources/GRCh38_pseudoautosomal_regions.bed (from Ensembl). In mouse mode, it points to the unmasked primary assembly (ref_genome_primary.fasta).
ref_genome_grc_masked.fasta (human mode only): Primary assembly in which the regions flagged as problematic by the GRC (e.g. false duplications and contamination, see the GIAB README) are hard-masked.
ref_genome_masked_final.fasta (human mode only): ref_genome_grc_masked.fasta with the pseudoautosomal regions (defined in workflow/resources/GRCh38_pseudoautosomal_regions.bed, from Ensembl) additionally hard-masked.
ref_annot_metadata_SwissProt.tsv: UniProtKB/SwissProt entries associated with each transcript (from the Ensembl xref pipeline; gencode.v<release>.metadata.SwissProt.gz)
ref_annot_metadata_TrEMBL.tsv: UniProtKB/TrEMBL entries associated with each transcript (from the Ensembl xref pipeline; gencode.v<release>.metadata.TrEMBL.gz)
ref_genome_primary.fasta: Primary (PRI) assembly (GRC<build>.primary_assembly.genome.fa.gz)
ref_transcripts.fasta: Transcript sequences (gencode.v<release>.transcripts.fa.gz)
ref_annot_transcript2gene.tsv: Translation of transcript ID to gene ID (transformed from GTF file)
ref_annot_gene2symbol.tsv: Translation of gene ID to gene symbol (gencode.v<release>.metadata.HGNC.gz)
ref_annot_splice_sites.tsv: Splice sites of reference transcripts generated from the GTF (see splice2neo)
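Both masked FASTA files are produced by hard-masking, i.e. replacing every base inside a BED interval with N. The sketch below shows that operation in pure Python on in-memory data, assuming 0-based, half-open BED coordinates and FASTA record names that match the BED chromosome names. The helpers read_bed, hard_mask and mask_fasta are hypothetical; the document does not say which tool OBLX uses for masking, and for whole genomes a dedicated tool such as bedtools maskfasta is the more practical choice.

```python
from collections import defaultdict


def read_bed(lines):
    """Collect 0-based, half-open (start, end) intervals per chromosome from BED lines."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def hard_mask(seq, intervals):
    """Return seq with every base inside the given intervals replaced by 'N'."""
    bases = bytearray(seq, "ascii")
    for start, end in intervals:
        start, end = max(start, 0), min(end, len(bases))  # clip to the sequence
        if start < end:
            bases[start:end] = b"N" * (end - start)
    return bases.decode("ascii")


def mask_fasta(fasta_lines, regions):
    """Yield (name, masked_sequence) for each record of a FASTA given as lines."""
    name, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, hard_mask("".join(chunks), regions.get(name, []))
            name, chunks = line[1:].split()[0], []  # record name = first word of header
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, hard_mask("".join(chunks), regions.get(name, []))
```

Masking the GRC exclusions first and the pseudoautosomal regions second simply means calling mask_fasta twice with the two region sets; the order does not matter for hard-masking because both steps only write N.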

UCSC RepeatMasker regions

RepeatMasker-annotated repeat regions are downloaded from UCSC.

Note: Repetitive regions are not masked in ref_genome.fasta.

ref_genome_repeatmasker.bed: RepeatMasker regions from the UCSC goldenPath downloads, converted to BED format.
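UCSC distributes RepeatMasker annotations in several formats, for example as the rmsk database table dump (rmsk.txt.gz), whose 17 tab-separated columns include genoName, genoStart, genoEnd, strand and repName. Assuming that table is the source (the document does not name the file OBLX converts), the translation to BED is a column selection, because UCSC tables already use 0-based, half-open coordinates. The helper rmsk_to_bed is hypothetical, and the BED6 layout it emits is an assumption; the actual columns of ref_genome_repeatmasker.bed are not documented here.

```python
# Column indices in the UCSC rmsk table dump (assumed input format, 17 columns)
RMSK_COLUMNS = 17
GENO_NAME, GENO_START, GENO_END, STRAND, REP_NAME = 5, 6, 7, 9, 10


def rmsk_to_bed(lines):
    """Yield BED6 lines (chrom, start, end, repName, score 0, strand) from rmsk rows.

    UCSC database tables use 0-based, half-open coordinates, so genoStart/genoEnd
    can be copied into the BED start/end columns without shifting.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != RMSK_COLUMNS:
            continue  # skip blank or malformed rows
        yield "\t".join((
            fields[GENO_NAME],
            fields[GENO_START],
            fields[GENO_END],
            fields[REP_NAME],
            "0",  # BED scores range 0-1000; RepeatMasker's swScore is not on that scale
            fields[STRAND],
        ))
```

The function accepts any iterable of lines, so a gzip-compressed dump can be streamed by passing a handle opened with gzip.open(path, "rt").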

Exome definition

The exome definition directory contains exonic and CDS (coding sequence) BED files. These files can be used, for example, to restrict variant callers such as Mutect2 to the specified regions. In both human and mouse mode, this directory contains the following files:

ref_cds.bed: CDS intervals derived from the annotation GTF, with overlapping CDS regions merged (following the DeepVariant RNA-seq variant calling tutorial).
ref_exome.bed: Exonic intervals extended on both sides by intron_slop bases (config parameter; default: 20). Exons are selected from the annotation GTF by the transcript tag set in the config parameter exome_transcript_definition.
ref_exome.bed.gz, ref_exome.bed.gz.tbi: bgzip-compressed copy of ref_exome.bed and its tabix index.
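The padding of ref_exome.bed can be sketched as an interval operation: extend each exon by intron_slop bases on both sides, clip at the chromosome boundaries, then merge overlapping intervals. Whether OBLX merges ref_exome.bed after padding is not stated; the merge step is included because the same routine with slop=0 yields merged intervals as described for ref_cds.bed. The helper pad_and_merge is hypothetical and works on BED-style (0-based, half-open) tuples; with bedtools, the equivalent would be slop followed by merge.

```python
from collections import defaultdict


def pad_and_merge(intervals, slop=20, chrom_sizes=None):
    """Extend each (chrom, start, end) interval by `slop` bases on both sides,
    clip to chromosome bounds, and merge overlapping or book-ended intervals.

    Chromosomes are returned in lexicographic order; coordinates are 0-based, half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        new_start = max(0, start - slop)
        new_end = end + slop
        if chrom_sizes and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])  # do not run past the chromosome end
        by_chrom[chrom].append((new_start, new_end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is not None and start <= current_end:
                current_end = max(current_end, end)  # overlapping or adjacent: extend
            else:
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_end is not None:
            merged.append((chrom, current_start, current_end))
    return merged
```

For example, two exons at chr1:100-200 and chr1:230-300 become a single interval chr1:80-320 with the default slop of 20, because the padded intervals overlap.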

In human mode, this directory additionally contains the following Twist exome BED files. Each was downloaded as a bigBed file from the UCSC exomeProbesets directory and converted to BED format using ucsc-bigbedtobed version 469:

twist_refseq.bed: Exome capture kit Twist_Exome_RefSeq_targets_hg38.bb
twist_core_exome.bed: Exome capture kit Twist_Exome_Target_hg38.bb
twist_comprehensive_exome.bed: Exome capture kit Twist_ComprehensiveExome_targets_hg38.bb
twist_exome2.bed: Exome capture kit TwistExome21.bb

GATK bundle

The GATK bundle is only available for human and is therefore only present in the output when pull_resources is run in human mode. The download URL of the GATK bundle can be set in the config file via the parameter gatk_url; by default, it points to the public Google Cloud bucket: https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0.

The following files are downloaded:

1000G_omni2.5.hg38.vcf.gz
1000G_phase1.snps.high_confidence.hg38.vcf.gz
hapmap_3.3.hg38.vcf.gz
Homo_sapiens_assembly38.dbsnp138.vcf.gz
Homo_sapiens_assembly38.known_indels.vcf.gz
Mills_and_1000G_gold_standard.indels.hg38.vcf.gz

For all VCF files, the index is created using tabix v1.11.

Germline variants

Germline variant resources are downloaded from gnomAD and dbSNP and then processed. The gnomAD files are only generated when pull_resources is run in human mode. The gnomAD version is set in the config file (see Configuration); the default is v4.1, which is also the version that was tested. A tabix index (tabix v1.11) is created for every file. The files af_only_gnomad_hg38.vcf.gz and common_biallelic_chr1.vcf.gz are generated separately for gnomAD exomes and genomes and are stored in the respective subdirectory.

af_only_gnomad_hg38.vcf.gz: gnomAD variants annotated only with the population allele frequency, for use with Mutect2.
All chromosomes of the respective gnomAD release (exomes or genomes) are downloaded. Variants with FILTER=PASS and a population allele frequency above minimum_allele_frequency (config parameter, see Configuration; default: 0.001) are retained. Finally, all annotations except the allele frequency are removed.
Use this file as --germline-resource when running Mutect2.
common_biallelic_chr1.vcf.gz: Common germline variant sites VCF for GetPileupSummaries.
This file is created from the chromosome 1 variants of the respective gnomAD release, filtered for AF > 0.05, at most two alleles (--max-alleles 2) and FILTER=PASS. These filters follow the Mutect2 best-practices workflow, where they are described for the "variants_for_contamination" resource.
dbSNP_151.vcf.gz: dbSNP from the NCBI FTP server. Currently, only build 151 is supported for human.
Ensembl-style chromosome names (without the chr prefix) are translated to GENCODE-style names (with the chr prefix) using chromosome mappings.
In mouse mode, the dbSNP file is named dbSNP_mouse.vcf.gz and is downloaded from the Ensembl FTP server for the Ensembl release that matches the GENCODE release specified in the config. Its chromosome names are converted to the GENCODE convention via https://github.com/dpryan79/ChromosomeMappings.
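The per-record logic of the germline filters can be sketched as follows: keep PASS records above an allele-frequency threshold and reduce INFO to AF (af_only_gnomad_hg38.vcf.gz), select common sites with at most two alleles (common_biallelic_chr1.vcf.gz), and rename chromosomes from Ensembl to GENCODE style (dbSNP files). This is a hypothetical Python illustration on sites-only VCF data lines, not the commands OBLX runs; the defaults mirror the thresholds given above (0.001 and 0.05), and the function names are made up for this example.

```python
def info_dict(info):
    """Parse a VCF INFO column such as 'AC=5;AF=0.002;DB' into a dict (flags map to True)."""
    parsed = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_af(raw):
    """Return the (maximum) allele frequency from a raw AF value, or None if missing."""
    if not isinstance(raw, str) or raw in ("", "."):
        return None
    return max(float(x) for x in raw.split(","))


def af_only_record(line, min_af=0.001):
    """Keep PASS records with AF > min_af and strip INFO down to the AF field."""
    fields = line.rstrip("\n").split("\t")
    raw_af = info_dict(fields[7]).get("AF")
    af = parse_af(raw_af)
    if fields[6] != "PASS" or af is None or af <= min_af:
        return None
    fields[7] = f"AF={raw_af}"
    return "\t".join(fields)


def is_common_biallelic(line, min_af=0.05):
    """True for PASS sites with at most two alleles (REF plus one ALT) and AF > min_af."""
    fields = line.rstrip("\n").split("\t")
    af = parse_af(info_dict(fields[7]).get("AF"))
    return fields[6] == "PASS" and "," not in fields[4] and af is not None and af > min_af


def rename_chrom(line, mapping):
    """Translate the CHROM column of a data line, e.g. Ensembl '1' -> GENCODE 'chr1'.

    Header lines are returned unchanged here; a real conversion would also have to
    rewrite the ##contig header lines.
    """
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest
```

In practice, such filters are typically run with bcftools (e.g. `view --max-alleles`, `annotate --rename-chrs`) rather than in Python; the sketch only makes the selection criteria explicit.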

Mappability

The mappability directory contains BED files with regions that are difficult to map with short reads. These files are only downloaded when pull_resources is run in human mode. They are downloaded from UCSC and converted to BED format using ucsc-bigbedtobed v469.

encode_exclusion.bed: File encBlacklist.bb transformed to bed format.
grcExclusions.bed: File grcExclusions.bb transformed to bed format.
ucsc_problematic.bed: File comments.bb transformed to bed format.

UniProt

The file resources/uniprot/uniprot_annotations.tsv contains data fetched from https://rest.uniprot.org/uniprotkb/stream for the fields specified in workflow/scripts/programmatically_get_uniprot.py. Its transcript_id column contains the GENCODE transcript identifiers, obtained from the TrEMBL and SwissProt mappings provided by GENCODE (resources/ref_annot_metadata_{TrEMBL,SwissProt}.tsv).
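The transcript_id column can be understood as a join between the UniProt table and the two GENCODE metadata tables. A minimal sketch, assuming (this is not documented here) that each metadata table is tab-separated without a header, with the transcript ID in the first column and the UniProtKB accession in the second, and that the UniProt TSV identifies each entry in a column named Entry. One output row is emitted per mapped transcript; the function names are hypothetical, and programmatically_get_uniprot.py may work differently.

```python
import csv
from collections import defaultdict


def accession_to_transcripts(*metadata_tables):
    """Map UniProtKB accession -> set of GENCODE transcript IDs.

    Assumed layout of each metadata table: transcript ID in column 1,
    UniProtKB accession in column 2 (tab-separated, no header).
    """
    mapping = defaultdict(set)
    for table in metadata_tables:
        for row in csv.reader(table, delimiter="\t"):
            if len(row) >= 2:  # skip empty or truncated lines
                mapping[row[1]].add(row[0])
    return mapping


def add_transcript_ids(uniprot_tsv, mapping, accession_column="Entry"):
    """Yield one dict per (UniProt row, mapped transcript) with a transcript_id key."""
    for row in csv.DictReader(uniprot_tsv, delimiter="\t"):
        for transcript_id in sorted(mapping.get(row[accession_column], ())):
            yield {**row, "transcript_id": transcript_id}
```

Both functions accept open text files or any iterable of lines, so the SwissProt and TrEMBL tables can be passed together to build a single accession-to-transcript mapping.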

Viruses

resources/viruses/tcga_virus_decoy.fasta contains genome sequences of common cancer-related viruses in FASTA format, which can be used for contamination detection and profiling. The list of viruses (workflow/resources/tcga_viruses.tsv) was obtained from TCGA, and the sequences of the listed GenBank accessions are downloaded via efetch. These viral decoy sequences can be appended to the reference genome (resources/ref_genome.fasta) to investigate reads that map to viral sequences. In that case, all tool-specific indices that depend on the reference FASTA (e.g. the aligner index) must be rebuilt from the updated file. Viral sequences are only downloaded in human mode.
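Appending the decoys is a concatenation of two FASTA files; the main pitfalls are duplicate sequence names and a missing final newline in the first file. The following sketch (hypothetical helper names fasta_names and append_decoys) writes the combined sequences to a new file, which leaves the OBLX library, including the target of the ref_genome.fasta symlink, unchanged. The FASTA index, sequence dictionary and aligner indices then still have to be regenerated for the new file.

```python
import tempfile
from pathlib import Path


def fasta_names(path):
    """Return the sequence names (first word after '>') of a FASTA file."""
    with open(path) as handle:
        return [line[1:].split()[0] for line in handle if line.startswith(">")]


def append_decoys(reference_fasta, decoy_fasta, output_fasta):
    """Write reference + decoy records to a new FASTA, refusing duplicate sequence names."""
    clashes = set(fasta_names(reference_fasta)) & set(fasta_names(decoy_fasta))
    if clashes:
        raise ValueError(f"sequence names present in both files: {sorted(clashes)}")
    with open(output_fasta, "w") as out:
        for path in (reference_fasta, decoy_fasta):
            with open(path) as handle:
                for line in handle:
                    # guard against a missing trailing newline gluing two records together
                    out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref, decoy, combined = (Path(tmp) / n for n in ("ref.fa", "decoy.fa", "combined.fa"))
        ref.write_text(">chr1\nACGT")
        decoy.write_text(">virus_1 example decoy\nTTTT\n")
        append_decoys(ref, decoy, combined)
        print(fasta_names(combined))  # ['chr1', 'virus_1']
```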

Overview of downloaded resources in human mode

Path in OBLX Library	Origin	Short description
resources/chromosome_sizes.txt	GENCODE	Chromosome sizes file for the resources/ref_genome.fasta file.
resources/exome_definition/ref_cds.bed	GENCODE	Reference CDS BED containing all CDS regions specified in the GTF.
resources/exome_definition/ref_exome.bed	GENCODE	Reference exome BED containing all exonic regions specified in the GTF with N positions padded left and right (defined with config parameter `intron_slop`).
resources/exome_definition/ref_exome.bed.gz	GENCODE	Bgzipped file of ref_exome.bed.
resources/exome_definition/ref_exome.bed.gz.tbi	GENCODE	Tabix index of ref_exome.bed.gz.
resources/exome_definition/twist_comprehensive_exome.bed	UCSC	Twist comprehensive exome definition BED.
resources/exome_definition/twist_core_exome.bed	UCSC	Twist core exome definition BED.
resources/exome_definition/twist_exome2.bed	UCSC	Twist Exome2 definition BED.
resources/exome_definition/twist_refseq.bed	UCSC	Twist RefSeq exome definition BED.
resources/gatk_bundle/1000G_omni2.5.hg38.vcf.gz	GATK	1000 Genomes Omni 2.5 SNP resource VCF.
resources/gatk_bundle/1000G_omni2.5.hg38.vcf.gz.tbi	GATK	Tabix index for the 1000 Genomes Omni 2.5 VCF.
resources/gatk_bundle/1000G_phase1.snps.high_confidence.hg38.vcf.gz	GATK	1000 Genomes high-confidence SNP resource VCF.
resources/gatk_bundle/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi	GATK	Tabix index for the 1000 Genomes high-confidence SNP VCF.
resources/gatk_bundle/hapmap_3.3.hg38.vcf.gz	GATK	HapMap 3.3 SNP resource VCF.
resources/gatk_bundle/hapmap_3.3.hg38.vcf.gz.tbi	GATK	Tabix index for the HapMap 3.3 VCF.
resources/gatk_bundle/Homo_sapiens_assembly38.dbsnp138.vcf.gz	GATK	GATK dbSNP 138 VCF.
resources/gatk_bundle/Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi	GATK	Tabix index for the GATK dbSNP 138 VCF.
resources/gatk_bundle/Homo_sapiens_assembly38.known_indels.vcf.gz	GATK	GATK known indels VCF.
resources/gatk_bundle/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi	GATK	Tabix index for the GATK known indels VCF.
resources/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz	GATK	GATK gold-standard indel resource VCF.
resources/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi	GATK	Tabix index for the GATK gold-standard indel VCF.
resources/germline_variants/dbSNP_151.vcf.gz	dbSNP	dbSNP 151 germline variant VCF.
resources/germline_variants/dbSNP_151.vcf.gz.tbi	dbSNP	Tabix index for the dbSNP 151 VCF.
resources/germline_variants/gnomAD/exomes/af_only_gnomad_hg38.vcf.gz	gnomAD	gnomAD exomes allele-frequency-only VCF.
resources/germline_variants/gnomAD/exomes/af_only_gnomad_hg38.vcf.gz.tbi	gnomAD	Tabix index for the gnomAD exomes allele-frequency-only VCF.
resources/germline_variants/gnomAD/exomes/common_biallelic_chr1.vcf.gz	gnomAD	Common biallelic chr1 variants from gnomAD exomes.
resources/germline_variants/gnomAD/exomes/common_biallelic_chr1.vcf.gz.tbi	gnomAD	Tabix index for the gnomAD exomes chr1 VCF.
resources/germline_variants/gnomAD/genomes/af_only_gnomad_hg38.vcf.gz	gnomAD	gnomAD genomes allele-frequency-only VCF.
resources/germline_variants/gnomAD/genomes/af_only_gnomad_hg38.vcf.gz.tbi	gnomAD	Tabix index for the gnomAD genomes allele-frequency-only VCF.
resources/germline_variants/gnomAD/genomes/common_biallelic_chr1.vcf.gz	gnomAD	Common biallelic chr1 variants from gnomAD genomes.
resources/germline_variants/gnomAD/genomes/common_biallelic_chr1.vcf.gz.tbi	gnomAD	Tabix index for the gnomAD genomes chr1 VCF.
resources/mappability/encode_exclusion.bed	UCSC	ENCODE exclusion regions BED (downloaded from UCSC).
resources/mappability/grcExclusions.bed	UCSC	GRC exclusion regions BED (downloaded from UCSC) used to mask the reference genome in human mode.
resources/mappability/ucsc_problematic.bed	UCSC	UCSC problematic regions BED (downloaded from UCSC).
resources/ref_annot.bed	GENCODE	BED12 transcript annotation derived from the GTF.
resources/ref_annot.gtf	GENCODE	Comprehensive gene annotation GTF based on the primary assembly.
resources/ref_annot_gene2symbol.tsv	GENCODE	Gene-to-symbol mapping table.
resources/ref_annot_metadata_SwissProt.tsv	GENCODE	UniProtKB/SwissProt entries associated with each transcript (from the Ensembl xref pipeline).
resources/ref_annot_metadata_TrEMBL.tsv	GENCODE	UniProtKB/TrEMBL entries associated with each transcript (from the Ensembl xref pipeline).
resources/ref_annot_splice_sites.tsv	GENCODE	Splice sites of reference transcripts generated from the GTF (see https://github.com/TRON-Bioinformatics/splice2neo)
resources/ref_annot_transcript2gene.tsv	GENCODE	Transcript-to-gene mapping table.
resources/ref_genome.dict	GENCODE	Sequence dictionary for the reference genome (generated with GATK CreateSequenceDictionary).
resources/ref_genome.fasta	GENCODE	Symlink to the final reference genome FASTA (masked in human mode, primary assembly in mouse mode).
resources/ref_genome.fasta.fai	GENCODE	Index for the reference genome FASTA.
resources/ref_genome_grc_masked.fasta	GENCODE	Primary assembly with GRC-flagged problematic regions hard-masked (human only).
resources/ref_genome_masked_final.fasta	GENCODE	GRC-masked assembly with pseudoautosomal regions additionally hard-masked (human only).
resources/ref_genome_primary.fasta	GENCODE	Primary assembly genome FASTA downloaded from Gencode.
resources/ref_genome_repeatmasker.bed	UCSC	RepeatMasker-derived genome repeat regions BED downloaded from UCSC.
resources/ref_transcripts.fasta	GENCODE	Transcript sequences FASTA derived from Gencode resources.
resources/uniprot/uniprot_annotations.tsv	UniProt	Combined UniProt annotation table with current status (directly downloaded from UniProt's rest API) of SwissProt and TrEMBL. The `transcript_id` column maps to the transcript identifiers specified in the `ref_annot.gtf` file.
resources/viruses/tcga_virus_decoy.fasta	TCGA/GenBank	Viral decoy FASTA used for contamination/decoy-aware analyses.

Overview of downloaded resources in mouse mode

Path in OBLX Library	Origin	Short description
resources/chromosome_sizes.txt	GENCODE	Chromosome sizes file for the resources/ref_genome.fasta file.
resources/exome_definition/ref_cds.bed	GENCODE	Reference CDS BED containing all CDS regions specified in the GTF.
resources/exome_definition/ref_exome.bed	GENCODE	Reference exome BED containing all exonic regions specified in the GTF with N positions padded left and right (defined with config parameter `intron_slop`).
resources/exome_definition/ref_exome.bed.gz	GENCODE	Bgzipped file of ref_exome.bed.
resources/exome_definition/ref_exome.bed.gz.tbi	GENCODE	Tabix index of ref_exome.bed.gz.
resources/germline_variants/dbSNP_mouse.vcf.gz	dbSNP	dbSNP germline variant VCF.
resources/germline_variants/dbSNP_mouse.vcf.gz.tbi	dbSNP	Tabix index for the dbSNP VCF.
resources/ref_annot.bed	GENCODE	BED12 transcript annotation derived from the GTF.
resources/ref_annot.gtf	GENCODE	Comprehensive gene annotation GTF based on the primary assembly.
resources/ref_annot_gene2symbol.tsv	GENCODE	Gene-to-symbol mapping table.
resources/ref_annot_metadata_SwissProt.tsv	GENCODE	UniProtKB/SwissProt entries associated with each transcript (from the Ensembl xref pipeline).
resources/ref_annot_metadata_TrEMBL.tsv	GENCODE	UniProtKB/TrEMBL entries associated with each transcript (from the Ensembl xref pipeline).
resources/ref_annot_splice_sites.tsv	GENCODE	Splice sites of reference transcripts generated from the GTF (see https://github.com/TRON-Bioinformatics/splice2neo)
resources/ref_annot_transcript2gene.tsv	GENCODE	Transcript-to-gene mapping table.
resources/ref_genome.dict	GENCODE	Sequence dictionary for the reference genome (generated with GATK CreateSequenceDictionary).
resources/ref_genome.fasta	GENCODE	Symlink to the primary assembly reference genome FASTA (in mouse mode).
resources/ref_genome.fasta.fai	GENCODE	Index for the reference genome FASTA.
resources/ref_genome_primary.fasta	GENCODE	Primary assembly genome FASTA downloaded from Gencode.
resources/ref_genome_repeatmasker.bed	UCSC	RepeatMasker-derived genome repeat regions BED downloaded from UCSC.
resources/ref_transcripts.fasta	GENCODE	Transcript sequences FASTA derived from Gencode resources.
resources/uniprot/uniprot_annotations.tsv	UniProt	Combined UniProt annotation table with current status (directly downloaded from UniProt's rest API) of SwissProt and TrEMBL. The `transcript_id` column maps to the transcript identifiers specified in the `ref_annot.gtf` file.

References

Behera, S., LeFaive, J., Orchard, P., Mahmoud, M., Paulin, L. F., Farek, J., Soto, D. C., Parker, S. C. J., Smith, A. V., Dennis, M. Y., Zook, J. M., & Sedlazeck, F. J. (2022). Fixing reference errors efficiently improves sequencing results. Genomics. https://doi.org/10.1101/2022.07.18.500506
Casper, J., Speir, M. L., Raney, B. J., Perez, G., Nassar, L. R., Lee, C. M., Hinrichs, A. S., Gonzalez, J. N., Fischer, C., Diekhans, M., Clawson, H., Benet-Pages, A., Barber, G. P., Vaske, C. J., van Baren, M. J., Wang, K., Rodriguez, Y. J. P., Jenkins-Kiefer, J. A., Chalamala, M., … Haeussler, M. (2026). The UCSC Genome Browser database: 2026 update. Nucleic Acids Research, 54(D1), D1331–D1335. https://doi.org/10.1093/nar/gkaf1250
Chen, S., Francioli, L. C., Goodrich, J. K., Collins, R. L., Kanai, M., Wang, Q., Alföldi, J., Watts, N. A., Vittal, C., Gauthier, L. D., Poterba, T., Wilson, M. W., Tarasova, Y., Phu, W., Grant, R., Yohannes, M. T., Koenig, Z., Farjoun, Y., Banks, E., … Karczewski, K. J. (2024). A genomic mutational constraint map using variation in 76,156 human genomes. Nature, 625(7993), 92–100. https://doi.org/10.1038/s41586-023-06045-0
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., … MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809), 434–443. https://doi.org/10.1038/s41586-020-2308-7
Mudge, J. M., Carbonell-Sala, S., Diekhans, M., Martinez, J. G., Hunt, T., Jungreis, I., Loveland, J. E., Arnan, C., Barnes, I., Bennett, R., Berry, A., Bignell, A., Cerdán-Vélez, D., Cochran, K., Cortés, L. T., Davidson, C., Donaldson, S., Dursun, C., Fatima, R., … Frankish, A. (2025). GENCODE 2025: Reference gene annotation for human and mouse. Nucleic Acids Research, 53(D1), D966–D975. https://doi.org/10.1093/nar/gkae1078
Phan, L., Zhang, H., Wang, Q., Villamarin, R., Hefferon, T., Ramanathan, A., & Kattman, B. (2025). The evolution of dbSNP: 25 years of impact in genomic research. Nucleic Acids Research, 53(D1), D925–D931. https://doi.org/10.1093/nar/gkae977
The UniProt Consortium, Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Adesina, A., Ahmad, S., Bowler-Barnett, E. H., Bye-A-Jee, H., Carpentier, D., Denny, P., Fan, J., Garmiri, P., Gonzales, L. J. D. C., Hussein, A., Ignatchenko, A., Insana, G., Ishtiaq, R., Joshi, V., … Zhang, J. (2025). UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Research, 53(D1), D609–D617. https://doi.org/10.1093/nar/gkae1010
van der Auwera, G., & O’Connor, B. D. (2020). Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. O’Reilly Media, Incorporated. https://books.google.de/books?id=wwiCswEACAAJ