Configuration
The configuration file defines the values for the parameters below. Default
values are described in
workflow/schemas/config.schema.yaml.
If the user provides a config file, it is validated against this schema.
- organism: Takes human or mouse
- release: GENCODE release version (starts with M for mouse)
- genome_build: Genome build version; has to match the release (currently supported: GRCh38, GRCm39, GRCm38)
- gnomad_release: Version of the gnomAD release (suggested version: 4.1)
- intron_slop: Number of intronic bases by which the exome definition ref_exome.bed is extended (suggested: 20)
- exome_transcript_definition: Defines which GENCODE transcripts are used to create the exome definition. The default is the "basic" tag. Please note that GENCODE basic definitions were deprecated in version 48 and replaced by the tag "GENCODE_Primary". These are not the same sets of transcripts. For now, we recommend continuing to use basic transcripts.
- star_sjdb_overhang: --sjdbOverhang parameter for STAR index generation (should be read length - 1; in most cases 100 should be sufficient)
- star_genome_sa_index_n_bases: Defines the STAR parameter genomeSAindexNbases (default 14). The lower the value, the smaller the index. This can be useful when creating test data for CI tests, to keep the files small.
- minimum_allele_frequency (only required in human mode): Variants in af_only_gnomad_hg38.vcf.gz are kept only if their population allele frequency is > this cutoff (suggested: 0.001)
- gencode_url: URL of GENCODE, from which the reference genome and annotation are downloaded (default: "https://ftp.ebi.ac.uk/pub/databases/gencode")
- ucsc_url (only required in human mode): URL of UCSC FTP (default: "https://hgdownload.soe.ucsc.edu/gbdb/hg38")
- ucsc_golden_path_url (only required in human mode): URL of UCSC golden path (default: "https://hgdownload.soe.ucsc.edu/goldenPath")
- gatk_url (only required in human mode): URL of GATK resource bundle (default: "https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0")
- gnomad_url (only required in human mode): URL of gnomAD (default: "https://storage.googleapis.com/gcp-public-data--gnomad/release")
- chrom_filter: List of chromosome names that should be contained in the af_only_gnomad_hg38.vcf.gz file
In addition, the
config/container_config.yaml
file specifies the URLs of the Apptainer/Docker containers that are used when the workflow is run with --sdm apptainer.
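The dependencies between organism, release and genome_build described in the parameter list can be sketched as a small pre-flight check. This is only an illustration: check_organism_settings and SUPPORTED_BUILDS are hypothetical names and not part of the workflow, and the actual validation is done against workflow/schemas/config.schema.yaml. The sketch assumes that GRCh38 belongs to human and GRCm39/GRCm38 to mouse; it does not check which GENCODE release belongs to which mouse build.

```python
# Hypothetical helper, not part of the workflow: checks the documented
# constraints between organism, release and genome_build.
SUPPORTED_BUILDS = {
    "human": {"GRCh38"},
    "mouse": {"GRCm39", "GRCm38"},
}


def check_organism_settings(config: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    organism = config.get("organism")
    release = str(config.get("release", ""))
    build = config.get("genome_build")

    if organism not in SUPPORTED_BUILDS:
        problems.append(f"organism must be 'human' or 'mouse', got {organism!r}")
        return problems

    # GENCODE mouse releases start with 'M' (e.g. M36); human releases do not.
    is_mouse_release = release.startswith("M")
    if is_mouse_release != (organism == "mouse"):
        problems.append(f"release {release!r} does not fit organism {organism!r}")

    # The genome build has to belong to the selected organism.
    if build not in SUPPORTED_BUILDS[organism]:
        problems.append(f"genome_build {build!r} is not supported for {organism}")
    return problems
```

For a consistent config such as organism: human, release: "49", genome_build: GRCh38 the function returns an empty list; mixing a human release with organism: mouse yields one message per violated constraint.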
Example Human
An example config file for creating a human OBLX library. Unless a different
value is required, we recommend using the default configuration (see
workflow/schemas/config.schema.yaml).
organism: human
release: "49"
genome_build: GRCh38
gnomad_release: 4.1
# adds the defined slop to the generic exome definition (+/- intron_slop bp of intronic sequence)
intron_slop: 20
# Parameter for STAR index creation. Adapt to the read size if necessary
star_sjdb_overhang: 100
star_genome_sa_index_n_bases: 14
# Remove variants from gnomad VCF with population allele frequency <= this cutoff
minimum_allele_frequency: 0.001
gencode_url: "https://ftp.ebi.ac.uk/pub/databases/gencode"
ucsc_url: "https://hgdownload.soe.ucsc.edu/gbdb/hg38"
ucsc_golden_path_url: "https://hgdownload.soe.ucsc.edu/goldenPath"
gatk_url: "https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0"
gnomad_url: "https://storage.googleapis.com/gcp-public-data--gnomad/release"
chrom_filter:
- chr1
- chr2
- chr3
- chr4
- chr5
- chr6
- chr7
- chr8
- chr9
- chr10
- chr11
- chr12
- chr13
- chr14
- chr15
- chr16
- chr17
- chr18
- chr19
- chr20
- chr21
- chr22
- chrX
- chrY
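How minimum_allele_frequency and chrom_filter act on the gnomAD variants can be illustrated with a short sketch. It works on plain (chromosome, allele frequency) tuples instead of a real VCF, keep_variant is a hypothetical name, and applying both filters in a single step is an assumption made for illustration; the workflow itself may apply them separately.

```python
def keep_variant(
    chrom: str,
    allele_frequency: float,
    chrom_filter: list[str],
    minimum_allele_frequency: float,
) -> bool:
    """True if a variant passes both filters as described in this section."""
    # Strictly greater than: a variant exactly at the cutoff is removed.
    return chrom in chrom_filter and allele_frequency > minimum_allele_frequency


# Toy records: (chromosome, population allele frequency)
variants = [("chr1", 0.2), ("chr1", 0.001), ("chrM", 0.5), ("chrX", 0.0011)]
kept = [v for v in variants if keep_variant(v[0], v[1], ["chr1", "chrX"], 0.001)]
print(kept)  # [('chr1', 0.2), ('chrX', 0.0011)]
```

The record at exactly 0.001 is dropped because the comparison is strict, and chrM is dropped because it is not listed in the chromosome filter.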
Example Mouse
An example config file for mouse mode:
organism: mouse
release: "M36"
genome_build: GRCm39
# adds the defined slop to the generic exome definition (+/- intron_slop bp of intronic sequence)
intron_slop: 20
# Parameter for STAR index creation. Adapt to the read size if necessary
star_sjdb_overhang: 100
star_genome_sa_index_n_bases: 14
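The two STAR settings can be derived from the data at hand. The sketch below is a rough guide, not part of the workflow: sjdb_overhang applies the "read length - 1" rule from the parameter list, and sa_index_n_bases follows the scaling suggested in the STAR manual for small genomes, min(14, log2(GenomeLength)/2 - 1). Both function names are hypothetical, and rounding to the nearest integer is an assumption.

```python
import math


def sjdb_overhang(read_length: int) -> int:
    """Value for star_sjdb_overhang: read length minus one."""
    return read_length - 1


def sa_index_n_bases(genome_length: int) -> int:
    """Value for star_genome_sa_index_n_bases, scaled down for small genomes.

    Uses min(14, log2(GenomeLength)/2 - 1) as suggested in the STAR manual;
    rounding to the nearest integer is an assumption of this sketch.
    """
    return min(14, round(math.log2(genome_length) / 2 - 1))


print(sjdb_overhang(101))           # 100
print(sa_index_n_bases(1_000_000))  # 9
```

For a full human or mouse genome the result stays at the default of 14; smaller values only become relevant for reduced test genomes, such as those used in CI.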