R/add_context_seq.R
add_context_seq.Rd
Annotate splice junctions with resulting transcript sequence
add_context_seq(df, transcripts, size = 400, bsg = NULL, keep_ranges = FALSE)
df
A data.frame with splice junctions in rows and at least the following columns:
junc_id
junction id consisting of genomic coordinates
tx_id
the ID of the affected transcript (see add_tx)
transcripts
a named GRangesList of transcripts
size
the size of the output sequence around the junction position (may be shorter if the transcript is shorter)
bsg
a BSgenome object such as BSgenome.Hsapiens.UCSC.hg19
keep_ranges
Should the GRanges of the transcripts and of the modified transcripts be kept? If TRUE, the list columns tx_lst and tx_mod_lst are added to the output.
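The effect of the size argument can be sketched in base R. This is only an illustration under stated assumptions: context_window is a hypothetical helper, not a function of the package, and it assumes the window covers size/2 bases on each side of the junction and is clipped at the transcript ends. That assumption is consistent with the example output, where size = 20 gives a cts_junc_pos of 10, but the package's actual implementation may differ.

```r
# Hypothetical helper (not part of the package): extract a window of `size`
# bases around a junction position and clip it at the transcript ends.
# Assumption: junc_pos is the 1-based position of the last base before the
# junction in the transcript sequence.
context_window <- function(tx_seq, junc_pos, size = 400) {
  half <- size %/% 2
  tx_len <- nchar(tx_seq)
  start <- max(1, junc_pos - half + 1)   # clip at the transcript start
  end <- min(tx_len, junc_pos + half)    # clip at the transcript end
  list(
    cts_seq = substr(tx_seq, start, end),
    cts_junc_pos = junc_pos - start + 1, # junction position inside the window
    cts_size = end - start + 1
  )
}

# A made-up 40-base transcript: the full window of 20 bases fits.
full <- context_window(paste(rep("ACGT", 10), collapse = ""), junc_pos = 20, size = 20)
full$cts_junc_pos  # 10
full$cts_size      # 20

# A made-up 12-base transcript: the window is shorter than `size`.
short <- context_window("ACGTACGTACGT", junc_pos = 4, size = 20)
short$cts_size     # 12
```

The second call shows why the returned sequence can be shorter than size: the window is cut off where the transcript ends.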
A data.frame with the same rows as the input df but with the following additional column(s):
tx_mod_id
an identifier made from tx_id and junc_id
junc_pos_tx
the junction position in the modified transcript sequence
cts_seq
the context sequence
cts_junc_pos
the junction position in the context sequence
cts_size
the size of the context sequence
cts_id
a unique id for the context sequence, computed as a hash value using the XXH128 hash algorithm
If keep_ranges is TRUE, the list columns tx_lst and tx_mod_lst (the GRanges of the transcripts and of the modified transcripts) are also added to the output data.frame.
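Because cts_id is a hash of the context sequence, identical context sequences should receive identical ids. The following sketch illustrates this with rlang::hash(), which is documented to use the XXH128 algorithm. Whether the package calls rlang::hash() internally, and on exactly which input, is an assumption here; the sequences are made up.

```r
# Assumption: rlang::hash() stands in for the hashing call used by the
# package; it hashes an R object with the XXH128 algorithm.
if (requireNamespace("rlang", quietly = TRUE)) {
  # Made-up context sequences; the first and third are identical.
  cts_seq <- c("AACCCATTGA", "GATGAACCTG", "AACCCATTGA")
  cts_id <- vapply(cts_seq, rlang::hash, character(1), USE.NAMES = FALSE)

  # Identical sequences share an id, different sequences do not.
  print(cts_id[1] == cts_id[3])
  print(cts_id[1] == cts_id[2])
}
```

This matches what the example output suggests: rows that appear to carry the same context sequence (for instance rows 15 and 16) show the same truncated cts_id.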
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
add_context_seq(toy_junc_df, toy_transcripts, size = 20, bsg = bsg)
#> # A tibble: 17 × 8
#> junc_id tx_id tx_mod_id junc_pos_tx cts_seq cts_junc_pos cts_size cts_id
#> <chr> <chr> <chr> <int> <chr> <chr> <int> <chr>
#> 1 chr2:152389… ENST… ENST0000… 16412 GATGAA… 10 20 83d60…
#> 2 chr2:152389… ENST… ENST0000… 16517 GATCAG… 10 20 563a4…
#> 3 chr2:152389… ENST… ENST0000… 21620 GTGGAG… 0,10,1555,1… 1565 73426…
#> 4 chr2:152388… ENST… ENST0000… 16412 GATGAA… 10 20 ca48b…
#> 5 chr2:152388… ENST… ENST0000… 16517 GATCAG… 10 20 aae48…
#> 6 chr2:179415… ENST… ENST0000… 83789 TCTCCA… 10 20 ea545…
#> 7 chr2:179415… ENST… ENST0000… 84158 TCTCCA… 0,10,379,389 389 e84b4…
#> 8 chr2:179415… ENST… ENST0000… 83789 TCTCCA… 10 20 a1ab2…
#> 9 chr2:179445… ENST… ENST0000… 59307 ACCAGG… 10 20 cfcaf…
#> 10 chr2:179446… ENST… ENST0000… 59288 GACATC… 0,10,899,909 909 af1aa…
#> 11 chr2:179445… ENST… ENST0000… 58982 GACCCT… 10 20 21e75…
#> 12 chr2:179642… ENST… ENST0000… 4828 CCAAAA… 10 20 f2ed0…
#> 13 chr2:179642… ENST… ENST0000… 4868 ACTGTG… 0,10,112,122 122 6896b…
#> 14 chr2:179642… ENST… ENST0000… 4703 TTTCAT… 10 20 4b22f…
#> 15 chr2:152226… ENST… ENST0000… 3878 AACCCA… 0,10,3812,3… 3822 5abc8…
#> 16 chr2:152222… ENST… ENST0000… 76 AACCCA… 0,10,3812,3… 3822 5abc8…
#> 17 chr2:152388… ENST… ENST0000… 23165 GTGGAG… 0,10,1555,1… 1565 73426…