R/add_peptide.R
add_peptide.Rd
Annotate splice junctions with resulting CDS and peptide sequence
add_peptide(df, cds, flanking_size = 14, bsg = NULL, keep_ranges = FALSE)
A data.frame with splice junctions in rows and at least the columns:
junc_id
junction id consisting of genomic coordinates
tx_id
the ID of the affected transcript/CDS (see add_tx
)
as a named GRangesList
of coding sequences (CDS) ranges
the number of wild-type amino acids flanking the junction or novel sequence caused by the splice junction to the left and to the right. Frame shift junctions are flanked by wild-type amino acids to the left and are annotated until the first stop codon.
BSgenome
object such as
BSgenome.Hsapiens.UCSC.hg19
Should GRanges of CDS and modified CDS be
kept? If TRUE, the list columns cds_lst
and cds_mod_lst
are added to the output.
A data.frame as the input with the additional column(s):
cds_mod_id
an identifier made from tx_id
and junc_id
junc_pos_cds
the junction position in the modified CDS sequence
frame_shift
Indicator whether junction leads to frame shift.
is_first_reading_frame
Indicator whether modified CDS sequence is translated into protein
sequence using the 1st reading frame (i.e. reading is starting from the first nucleotide) for in-frame peptides.
normalized_cds_junc_pos
The normalized position of the junction in the modified CDS sequence to the left junction side.
protein
the full protein sequence of the translated modified CDS.
normalized_protein_junc_pos
The normalized position of the junction in the protein
sequence to the left junction side.
peptide_context_junc_pos
The junction position relative to the peptide_context
sequence
junc_in_orf
Indicator whether the junction is located in an open reading frame.
peptide_context_seq_raw
the peptide sequence around the junction including stop codons.
peptide_context
the peptide sequence around the junction truncated after stop codons.
truncated_cds
Indicator whether the mutated gene product is a truncated from of the WT gene product. If TRUE, peptide_context
= NA.
cds_description
Descriptor of of the mutated gene product. Can be one of c("mutated cds", "truncated cds", "no mutated gene product", "no wt cds", "not in ORF")
If the keep_ranges
is TRUE, the following additional columns are added to
the output data.frame:
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
add_peptide(toy_junc_df, toy_cds, bsg = bsg, flanking_size = 13)
#> # A tibble: 17 × 17
#> junc_id tx_id protein protein_wt frame_shift cds_mod_id junc_pos_cds
#> <chr> <chr> <chr> <chr> <lgl> <chr> <int>
#> 1 chr2:152389996-… ENST… MADDED… MADDEDYEE… TRUE ENST00000… 16209
#> 2 chr2:152389996-… ENST… MADDED… MADDEDYEE… TRUE ENST00000… 16314
#> 3 chr2:152389955-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 21417
#> 4 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 16209
#> 5 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 16314
#> 6 chr2:179415981-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83566
#> 7 chr2:179415987-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83935
#> 8 chr2:179415000-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 83566
#> 9 chr2:179445336-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 59084
#> 10 chr2:179446225-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 59065
#> 11 chr2:179445336-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 58759
#> 12 chr2:179642044-… ENST… MTTQAP… MTTQAPTFT… TRUE ENST00000… 4605
#> 13 chr2:179642146-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 4645
#> 14 chr2:179642044-… ENST… MTTQAP… MTTQAPTFT… FALSE ENST00000… 4480
#> 15 chr2:152226533-… ENST… NA NA NA NA NA
#> 16 chr2:152222731-… ENST… NA NA NA NA NA
#> 17 chr2:152388410-… ENST… MADDED… MADDEDYEE… FALSE ENST00000… 22962
#> # ℹ 10 more variables: protein_junc_pos <dbl>, is_first_reading_frame <lgl>,
#> # normalized_cds_junc_pos <int>, normalized_protein_junc_pos <dbl>,
#> # junc_in_orf <lgl>, truncated_cds <lgl>, cds_description <chr>,
#> # peptide_context_seq_raw <chr>, peptide_context_junc_pos <dbl>,
#> # peptide_context <chr>