Supported bioinformatics pipelines
OBLX libraries can provide reference files for multiple bioinformatics workflows. Two examples are shown here:
nf-core/sarek
To run nf-core/sarek with an OBLX library, create a nextflow.config with the
utility script shipped with OBLX. The generated config points Sarek to the
required reference and resource files in the OBLX library:
bash utils/write_sarek_config.sh </path/to/oblx/library> </path/to/output/nextflow.config>
Then run nf-core/sarek with the generated nextflow.config file:
nextflow run nf-core/sarek -r 3.8.1 \
-c </path/to/output/nextflow.config> \
-profile singularity \
--input <samplesheet.csv> \
--tools mutect2,snpeff \
--only_paired_variant_calling \
--wes \
--intervals </path/to/oblx/library>/resources/exome_definition/ref_exome.bed \
--outdir </path/to/output/directory>
tronflows
The tronflows are a collection of standard bioinformatics workflows (https://github.com/TRON-Bioinformatics/tronflow).
tronflow-alignment
For tronflow-alignment, the reference has to be specified via the --reference flag.
Example for bwa-mem2:
nextflow run tron-bioinformatics/tronflow-alignment \
-profile conda \
--input_files $input \
--output $output \
--algorithm mem2 \
--library paired \
--reference </path/to/oblx/library>/indices/bwa_mem2/ref_genome.fasta
Example for STAR:
nextflow run tron-bioinformatics/tronflow-alignment \
-profile conda \
--input_files $input \
--output $output \
--algorithm star \
--library paired \
--reference </path/to/oblx/library>/indices/star/
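The two examples above differ only in the --algorithm value and the matching --reference path. A minimal sketch of that mapping follows. `tronflow_reference` is a hypothetical helper name that is not part of OBLX or tronflow-alignment, and it covers only the two algorithms shown in this section.

```shell
#!/usr/bin/env bash
# tronflow_reference LIBRARY ALGORITHM
# Hypothetical helper: prints the --reference value for the two
# tronflow-alignment examples in this section (mem2 and star).
tronflow_reference() {
    local library="${1%/}"   # drop one trailing slash, if present
    local algorithm="$2"
    case "${algorithm}" in
        mem2) echo "${library}/indices/bwa_mem2/ref_genome.fasta" ;;
        star) echo "${library}/indices/star/" ;;
        *)
            # Other algorithms are not covered by the examples above.
            echo "no reference mapping for algorithm: ${algorithm}" >&2
            return 1
            ;;
    esac
}

# Usage (with a real library path):
#   --reference "$(tronflow_reference /path/to/oblx/library star)"
```

Keeping the mapping in one place avoids combining an algorithm with the index directory of a different aligner.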
tronflow-bam-preprocessing
To run tronflow-bam-preprocessing with the OBLX library, use the following command:
nextflow run tron-bioinformatics/tronflow-bam-preprocessing \
-profile conda \
--input_files $input \
--reference </path/to/oblx/library>/resources/ref_genome.fasta \
--dbsnp </path/to/oblx/library>/resources/germline_variants/dbSNP_151.vcf.gz \
--known_indels1 </path/to/oblx/library>/resources/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
--known_indels2 </path/to/oblx/library>/resources/gatk_bundle/Homo_sapiens_assembly38.known_indels.vcf.gz \
--intervals </path/to/oblx/library>/resources/exome_definition/ref_exome.bed
Supported bioinformatics tools
The table below lists the bioinformatics tools that use files from the generated OBLX library and the files each tool requires.
Tool versions: The pipeline does not pin downstream tool versions, but the indices are produced with specific tool versions. To guarantee compatibility, use the same tool versions for downstream analysis. The exact versions used to build each index are defined in the per-rule conda environments under workflow/envs/ and the corresponding apptainer/docker images in config/container_config.yaml.
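To look up the versions used to build the indices, the conda environment files can be scanned for version-constrained entries. The following is a minimal sketch, not part of OBLX. It assumes the files under workflow/envs/ are conda YAML files ending in .yaml or .yml, with dependencies written as list entries such as `- tool=version`. `list_env_versions` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_env_versions ENV_DIR
# Hypothetical helper: prints "<env file><TAB><entry>" for every YAML list
# entry that contains "=", e.g. a "- tool=version" line in a conda env file.
list_env_versions() {
    local env_dir="${1%/}"
    local env_file
    for env_file in "${env_dir}"/*.yaml "${env_dir}"/*.yml; do
        [ -f "${env_file}" ] || continue   # skip glob patterns without a match
        awk -v name="$(basename "${env_file}")" '
            /^[ \t]*-[ \t]*[^ \t=]+=/ {
                sub(/^[ \t]*-[ \t]*/, "")   # strip the list marker
                printf "%s\t%s\n", name, $0
            }' "${env_file}"
    done
}

# Usage (from the OBLX repository root):
#   list_env_versions workflow/envs
```

Entries without a version constraint, such as channel names, are skipped because they contain no `=`.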
| Tool Name | Link | Paths in Genome Library required to run the tool |
|---|---|---|
| Arriba | https://github.com/suhrig/arriba | • resources/ref_genome.fasta |
| | | • resources/ref_annot.gtf |
| | | • indices/star |
| bowtie2 | https://github.com/benlangmead/bowtie2 | • indices/bowtie2 |
| bwa-mem | https://github.com/lh3/bwa | • indices/bwa_mem/ |
| bwa-mem2 | https://github.com/bwa-mem2/bwa-mem2 | • indices/bwa_mem2/ |
| DeepVariant DNA | https://github.com/google/deepvariant | • resources/exome_definition/ref_exome.bed |
| | | • workflow/resources/GRCh38_pseudoautosomal_regions.bed (from OBLX workflow) |
| | | • resources/ref_genome.fasta |
| DeepVariant RNA | https://github.com/google/deepvariant | • resources/exome_definition/ref_cds.bed |
| | | • resources/ref_genome.fasta |
| featureCounts | https://doi.org/10.1093/bioinformatics/btt656 | • resources/exome_definition/ref_exome.bed.gz |
| FreeBayes | https://github.com/freebayes/freebayes | • resources/ref_genome.fasta |
| GATK ApplyBQSR | https://github.com/broadinstitute/gatk | • resources/ref_genome.fasta |
| | | • resources/ref_genome.dict |
| GATK BaseRecalibrator | https://github.com/broadinstitute/gatk | • resources/ref_genome.fasta |
| | | • resources/ref_genome.dict |
| | | • resources/germline_variants/dbSNP_151.vcf.gz |
| | | • resources/germline_variants/dbSNP_151.vcf.gz.tbi |
| | | • resources/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz |
| | | • resources/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi |
| | | • resources/gatk_bundle/Homo_sapiens_assembly38.known_indels.vcf.gz |
| | | • resources/gatk_bundle/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi |
| GATK GetPileupSummaries | https://github.com/broadinstitute/gatk | • resources/germline_variants/gnomAD/exomes/common_biallelic_chr1.vcf.gz |
| | | • resources/germline_variants/gnomAD/exomes/common_biallelic_chr1.vcf.gz.tbi |
| GATK HaplotypeCaller | https://github.com/broadinstitute/gatk | • resources/ref_genome.fasta |
| | | • resources/ref_genome.fasta.fai |
| | | • resources/germline_variants/dbSNP_151.vcf.gz |
| | | • resources/germline_variants/dbSNP_151.vcf.gz.tbi |
| | | • resources/exome_definition/ref_exome.bed |
| GATK Mutect2 | https://github.com/broadinstitute/gatk | • resources/ref_genome.fasta |
| | | • resources/ref_genome.fasta.fai |
| | | • resources/germline_variants/gnomAD/exomes/af_only_gnomad_hg38.vcf.gz |
| | | • resources/germline_variants/gnomAD/exomes/af_only_gnomad_hg38.vcf.gz.tbi |
| | | • resources/exome_definition/ref_exome.bed |
| hisat2 | https://github.com/daehwankimlab/hisat2 | • indices/hisat2 |
| Kallisto | https://github.com/pachterlab/kallisto | • indices/kallisto/ref_transcript.idx |
| minimap2 | https://github.com/lh3/minimap2 | • resources/ref_genome.fasta |
| qualimap | https://github.com/scchess/Qualimap | • resources/ref_annot.gtf |
| RSeQC | https://github.com/MonashBioinformaticsPlatform/RSeQC | • resources/chromosome_sizes.txt |
| | | • resources/ref_annot.bed |
| | | • resources/ref_annot.gtf |
| Salmon | https://github.com/COMBINE-lab/salmon | • indices/salmon/transcriptome_index/ |
| snpEff | https://github.com/pcingola/snpeff | • indices/snpeff/snpeff.config |
| splice2neo | https://github.com/TRON-Bioinformatics/splice2neo | • indices/R/ref_annot_txdb.sqlite |
| | | • indices/R/ref_genome.2bit |
| | | • indices/R/ref_cds.Rds |
| | | • indices/R/ref_transcript_ranges.Rds |
| | | • indices/R/ref_transcripts.Rds |
| STAR | https://github.com/alexdobin/STAR | • indices/star |
| Strelka2 | https://github.com/illumina/strelka | • resources/ref_genome.fasta |
| | | • resources/exome_definition/ref_exome.bed.gz |
| | | • resources/exome_definition/ref_exome.bed.gz.tbi |
| stringtie | https://github.com/gpertea/stringtie | • resources/ref_genome.fasta |
| | | • resources/ref_annot.gtf |
| tximport | https://github.com/thelovelab/tximport | • resources/ref_annot_transcript2gene.tsv |
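Before starting a tool, it can help to confirm that the paths listed for it exist in the library. The following is a minimal sketch, not part of OBLX. `check_oblx_paths` is a hypothetical helper name, and the relative paths passed to it are taken from the table above.

```shell
#!/usr/bin/env bash
# check_oblx_paths LIBRARY REL_PATH...
# Hypothetical helper: prints "missing: <path>" for every relative path that
# does not exist below LIBRARY and returns 1 if at least one is missing.
check_oblx_paths() {
    local library="${1%/}"
    shift
    local missing=0
    local rel
    for rel in "$@"; do
        # -e accepts files as well as directories such as indices/star
        if [ ! -e "${library}/${rel}" ]; then
            echo "missing: ${rel}"
            missing=1
        fi
    done
    return "${missing}"
}

# Usage for GATK ApplyBQSR (with a real library path):
#   check_oblx_paths /path/to/oblx/library \
#       resources/ref_genome.fasta \
#       resources/ref_genome.dict
```

The function only tests for existence. It does not verify that an index was built with a compatible tool version.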