R/add_peptide.R
add_peptide.RdAnnotates splice junctions with resulting CDS and peptide sequence.
add_peptide(df, cds, flanking_size = 14, bsg = NULL, keep_ranges = FALSE)A data.frame with splice junctions in rows and at least the columns:
junc_id junction id consisting of genomic coordinates
tx_id the ID of the affected transcript/CDS (see add_tx)
as a named GRangesList of coding sequences (CDS) ranges
the number of wild-type amino acids flanking the junction or novel sequence caused by the splice junction to the left and to the right. Frame shift junctions are flanked by wild-type amino acids to the left and are annotated until the first stop codon.
BSgenome object such as
BSgenome.Hsapiens.UCSC.hg19
Should GRanges of CDS and modified CDS be
kept? If TRUE, the list columns cds_lst and cds_mod_lst are added to the output.
A data.frame with the same rows as the input df but with the
following additional column(s):
cds_mod_id An identifier made from tx_id and junc_id
junc_pos_cds the junction position in the modified CDS sequence
frame_shift Indicator whether junction leads to frame shift.
is_first_reading_frame Indicator whether modified CDS sequence is
translated into protein sequence using the 1st reading frame
(i.e. reading is starting from the first nucleotide) for in-frame peptides.
normalized_cds_junc_pos The normalized position of the junction in the
modified CDS sequence to the left junction side.
protein The full protein sequence of the translated modified CDS.
normalized_protein_junc_pos The normalized position of the junction in
the protein sequence to the left junction side.
peptide_context_junc_pos The junction position relative to the peptide_context sequence
junc_in_orf Indicator whether the junction is located in an open reading frame.
peptide_context_seq_raw The peptide sequence around the junction including stop codons.
peptide_context The peptide sequence around the junction truncated after stop codons.
truncated_cds Indicator whether the mutated gene product is a truncated
from of the WT gene product. If TRUE, peptide_context = NA.
cds_description Descriptor of of the mutated gene product. Can be one of
c("mutated cds", "truncated wt cds", "no mutated gene product", "no wt cds", "not in ORF")
If the keep_ranges is TRUE, the following additional columns are added to
the output data.frame:
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
add_peptide(toy_junc_df, toy_cds, bsg = bsg, flanking_size = 13)
#> # A tibble: 17 × 18
#> junc_id tx_id protein protein_wt frame_shift cds_mod_id junc_pos_cds
#> <chr> <chr> <chr> <chr> <lgl> <chr> <int>
#> 1 chr2:152389996-… ENST… MADDED… MADDEDYEE… TRUE ENST00000… 16209
#> 2 chr2:152389996-… ENST… MADDED… MADDEDYEE… TRUE ENST00000… 16314
#> 3 chr2:152389955-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 21417
#> 4 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 16209
#> 5 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 16314
#> 6 chr2:179415981-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83566
#> 7 chr2:179415987-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83935
#> 8 chr2:179415000-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83566
#> 9 chr2:179445336-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 59084
#> 10 chr2:179446225-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 59065
#> 11 chr2:179445336-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 58759
#> 12 chr2:179642044-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 4605
#> 13 chr2:179642146-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 4645
#> 14 chr2:179642044-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 4480
#> 15 chr2:152226533-… ENST… NA NA NA NA NA
#> 16 chr2:152222731-… ENST… NA NA NA NA NA
#> 17 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 22962
#> # ℹ 11 more variables: protein_junc_pos <dbl>, normalized_cds_junc_pos <int>,
#> # normalized_protein_junc_pos_pre <dbl>, is_first_reading_frame <lgl>,
#> # normalized_protein_junc_pos <dbl>, junc_in_orf <lgl>, truncated_cds <lgl>,
#> # cds_description <chr>, peptide_context_seq_raw <chr>,
#> # peptide_context_junc_pos <dbl>, peptide_context <chr>