Annotate splice junctions with resulting transcript sequence
Source: R/add_context_seq.R
add_context_seq(df, transcripts, size = 400, bsg = NULL, keep_ranges = FALSE)
df A data.frame with splice junctions in rows and at least the columns:
junc_id junction id consisting of genomic coordinates
tx_id the ID of the affected transcript (see add_tx)
transcripts a named GRangesList of transcripts
size the size of the output sequence around the junction position (may be shorter if the transcript is shorter)
bsg a BSgenome object such as BSgenome.Hsapiens.UCSC.hg19
keep_ranges Should the GRanges of the transcript and of the modified transcript be kept? If TRUE, the list columns tx_lst and tx_mod_lst are added to the output.
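The effect of size can be illustrated with a minimal base R sketch. It assumes the window is centered on the junction position and clipped at the transcript boundaries. The helper context_window() and the toy sequence are hypothetical and not part of the package; the actual implementation may differ.

```r
# Hypothetical helper (not part of the package): cut a window of `size`
# bases around a junction position in a transcript sequence.
context_window <- function(tx_seq, junc_pos, size = 400) {
  half <- size %/% 2
  tx_len <- nchar(tx_seq)
  # Clamp the window to the transcript boundaries, so the result can be
  # shorter than `size` when the junction is close to either end.
  start <- max(1, junc_pos - half + 1)
  end <- min(tx_len, junc_pos + half)
  list(
    cts_seq = substr(tx_seq, start, end),
    cts_junc_pos = junc_pos - start + 1,  # junction position inside the window
    cts_size = end - start + 1
  )
}

# Made-up 30-base sequence, used for illustration only
toy_seq <- "ACGTACGTACGTTTGGCCAAACGTACGTAC"

# Junction far enough from both ends: full window of 20 bases
res <- context_window(toy_seq, junc_pos = 12, size = 20)

# Junction near the transcript start: the window is clipped
res_short <- context_window(toy_seq, junc_pos = 4, size = 20)
```

In this sketch a junction in the middle of the sequence gives a full-size window with the junction at position size / 2, while a junction near the start gives a shorter window.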
A data.frame with the same rows as the input df but with the
following additional column(s):
tx_mod_id an identifier made from tx_id and junc_id
junc_pos_tx the junction position in the modified transcript sequence
cts_seq the context sequence
cts_junc_pos the junction position in the context sequence
cts_size the size of the context sequence
cts_id a unique id for the context sequence as hash value using the
XXH128 hash algorithm
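A hash-based identifier of this kind can be sketched with rlang::hash(), which uses the XXH128 algorithm. That the identifier is computed from the context sequence alone, and with this particular function, is an assumption made for illustration; the sequences below are made up.

```r
library(rlang)  # rlang::hash() hashes an R object with XXH128

# Made-up context sequences; the first and third are identical
cts_seq <- c("GATGAAACCT", "GATCAGTTCA", "GATGAAACCT")

# One hash per sequence; identical sequences yield identical ids
cts_id <- vapply(cts_seq, rlang::hash, character(1), USE.NAMES = FALSE)
```

Under this assumption, junctions that produce the same context sequence share one cts_id, which makes the id usable for de-duplicating sequences.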
If keep_ranges is TRUE, the following additional list columns are added to
the output data.frame:
tx_lst the GRanges of the original transcript
tx_mod_lst the GRanges of the modified transcript
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
add_context_seq(toy_junc_df, toy_transcripts, size = 20, bsg = bsg)
#> # A tibble: 17 × 8
#> junc_id tx_id tx_mod_id junc_pos_tx cts_seq cts_junc_pos cts_size cts_id
#> <chr> <chr> <chr> <int> <chr> <chr> <int> <chr>
#> 1 chr2:152389… ENST… ENST0000… 16412 GATGAA… 10 20 83d60…
#> 2 chr2:152389… ENST… ENST0000… 16517 GATCAG… 10 20 563a4…
#> 3 chr2:152389… ENST… ENST0000… 21620 GTGGAG… 0,10,1555,1… 1565 73426…
#> 4 chr2:152388… ENST… ENST0000… 16412 GATGAA… 10 20 ca48b…
#> 5 chr2:152388… ENST… ENST0000… 16517 GATCAG… 10 20 aae48…
#> 6 chr2:179415… ENST… ENST0000… 83789 TCTCCA… 10 20 ea545…
#> 7 chr2:179415… ENST… ENST0000… 84158 TCTCCA… 0,10,379,389 389 e84b4…
#> 8 chr2:179415… ENST… ENST0000… 83789 TCTCCA… 10 20 a1ab2…
#> 9 chr2:179445… ENST… ENST0000… 59307 ACCAGG… 10 20 cfcaf…
#> 10 chr2:179446… ENST… ENST0000… 59288 GACATC… 0,10,899,909 909 af1aa…
#> 11 chr2:179445… ENST… ENST0000… 58982 GACCCT… 10 20 21e75…
#> 12 chr2:179642… ENST… ENST0000… 4828 CCAAAA… 10 20 f2ed0…
#> 13 chr2:179642… ENST… ENST0000… 4868 ACTGTG… 0,10,112,122 122 6896b…
#> 14 chr2:179642… ENST… ENST0000… 4703 TTTCAT… 10 20 4b22f…
#> 15 chr2:152226… ENST… ENST0000… 3878 AACCCA… 0,10,3812,3… 3822 5abc8…
#> 16 chr2:152222… ENST… ENST0000… 76 AACCCA… 0,10,3812,3… 3822 5abc8…
#> 17 chr2:152388… ENST… ENST0000… 23165 GTGGAG… 0,10,1555,1… 1565 73426…