Annotate splice junctions with resulting transcript sequence

add_context_seq(df, transcripts, size = 400, bsg = NULL, keep_ranges = FALSE)

Arguments

df

A data.frame with splice junctions in rows and at least the columns:

  • junc_id the junction ID, consisting of genomic coordinates

  • tx_id the ID of the affected transcript (see add_tx)
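
A minimal sketch of a valid df input follows, with a check that both required columns are present. The junc_id values assume a "chr:start-end:strand" pattern and the tx_id values are placeholders. The helper check_junc_df() is hypothetical and not part of the package. It also assumes that every tx_id must match a name in the transcripts object; with a real GRangesList those names would come from names(transcripts).

```r
# Hypothetical input: junc_id values assume a "chr:start-end:strand" pattern,
# tx_id values are placeholders.
junc_df <- data.frame(
  junc_id = c("chr2:1000-2000:+", "chr2:1500-2000:+"),
  tx_id   = c("TX_A", "TX_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of splice2neo): stops if a required column is
# missing or if a tx_id has no matching transcript name.
check_junc_df <- function(df, tx_names) {
  missing_cols <- setdiff(c("junc_id", "tx_id"), names(df))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  unknown_tx <- setdiff(df$tx_id, tx_names)
  if (length(unknown_tx) > 0) {
    stop("tx_id not found in transcripts: ", paste(unknown_tx, collapse = ", "))
  }
  invisible(TRUE)
}

# With a real GRangesList, tx_names would be names(transcripts)
check_junc_df(junc_df, tx_names = c("TX_A", "TX_B", "TX_C"))
```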

transcripts

A named GRangesList of transcripts.

size

The size of the output context sequence around the junction position. The sequence may be shorter if the transcript is shorter.
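
To illustrate how size relates to cts_junc_pos and cts_size, here is a hypothetical helper, context_window(), which is not part of splice2neo. It assumes the junction lies after position junc_pos_tx of the modified transcript and that the window takes up to size / 2 nucleotides on each side, clipped at the transcript ends. For a junction far from either end and size = 20, it gives cts_junc_pos = 10 and cts_size = 20, which agrees with the ordinary rows of the example output. Rows where cts_junc_pos holds several positions are not covered by this sketch.

```r
# Hypothetical helper (not part of splice2neo): window of at most `size` nt
# centred on a junction located after position `junc_pos_tx` of a transcript
# of length `tx_len`, clipped at both transcript ends.
context_window <- function(junc_pos_tx, tx_len, size = 400L) {
  half  <- size %/% 2L
  start <- max(1L, junc_pos_tx - half + 1L)
  end   <- min(tx_len, junc_pos_tx + half)
  c(start        = start,
    end          = end,
    cts_junc_pos = junc_pos_tx - start + 1L,  # junction position inside the window
    cts_size     = end - start + 1L)          # shorter than `size` near the ends
}

context_window(16412L, tx_len = 20000L, size = 20L)
# start = 16403, end = 16422, cts_junc_pos = 10, cts_size = 20

context_window(5L, tx_len = 100L, size = 20L)
# clipped at the 5' end: start = 1, end = 15, cts_junc_pos = 5, cts_size = 15
```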

bsg

A BSgenome object, such as BSgenome.Hsapiens.UCSC.hg19.

keep_ranges

Should the GRanges of the original and modified transcripts be kept? If TRUE, the list columns tx_lst and tx_mod_lst are added to the output.

Value

A data.frame with the same rows as the input df but with the following additional column(s):

  • tx_mod_id an identifier made from tx_id and junc_id

  • junc_pos_tx the junction position in the modified transcript sequence

  • cts_seq the context sequence

  • cts_junc_pos the junction position in the context sequence

  • cts_size the size of the context sequence

  • cts_id a unique ID for the context sequence, computed as a hash value with the XXH128 algorithm
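
The sketch below shows how the identifier columns could be derived. It assumes cts_id is computed with rlang::hash(), which hashes an R object with the XXH128 algorithm and returns a 32-character hexadecimal string; whether add_context_seq() uses exactly this call is an assumption. The tx_mod_id format (tx_id and junc_id joined by an underscore) and all input values are hypothetical. Because the hash depends only on the sequence, identical context sequences get identical IDs, as rows 15 and 16 of the example output appear to show with a shared cts_seq and cts_id.

```r
# Assumption: cts_id is an XXH128 hash of the context sequence, computed here
# with rlang::hash(); the "_"-joined tx_mod_id format is hypothetical.
res <- data.frame(
  junc_id = c("chr2:1000-2000:+", "chr2:1500-2000:+"),  # hypothetical junctions
  tx_id   = c("TX_A", "TX_A"),
  cts_seq = c("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT"),
  stringsAsFactors = FALSE
)

res$tx_mod_id <- paste(res$tx_id, res$junc_id, sep = "_")
res$cts_size  <- nchar(res$cts_seq)
# 128-bit hash returned as a 32-character hexadecimal string
res$cts_id    <- vapply(res$cts_seq, rlang::hash, character(1), USE.NAMES = FALSE)

res[, c("tx_mod_id", "cts_size", "cts_id")]
```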

If keep_ranges is TRUE, the following additional columns are added to the output data.frame:

  • tx_lst a list of GRanges with the original transcript, as given by the tx_id column and the transcripts object.

  • tx_mod_lst a list of GRanges with the modified transcript (see modify_tx)

Examples


library(splice2neo)

# Run only if the hg19 BSgenome package is installed
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)) {
  bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
  add_context_seq(toy_junc_df, toy_transcripts, size = 20, bsg = bsg)
}
#> # A tibble: 17 × 8
#>    junc_id      tx_id tx_mod_id junc_pos_tx cts_seq cts_junc_pos cts_size cts_id
#>    <chr>        <chr> <chr>           <int> <chr>   <chr>           <int> <chr> 
#>  1 chr2:152389… ENST… ENST0000…       16412 GATGAA… 10                 20 83d60…
#>  2 chr2:152389… ENST… ENST0000…       16517 GATCAG… 10                 20 563a4…
#>  3 chr2:152389… ENST… ENST0000…       21620 GTGGAG… 0,10,1555,1…     1565 73426…
#>  4 chr2:152388… ENST… ENST0000…       16412 GATGAA… 10                 20 ca48b…
#>  5 chr2:152388… ENST… ENST0000…       16517 GATCAG… 10                 20 aae48…
#>  6 chr2:179415… ENST… ENST0000…       83789 TCTCCA… 10                 20 ea545…
#>  7 chr2:179415… ENST… ENST0000…       84158 TCTCCA… 0,10,379,389      389 e84b4…
#>  8 chr2:179415… ENST… ENST0000…       83789 TCTCCA… 10                 20 a1ab2…
#>  9 chr2:179445… ENST… ENST0000…       59307 ACCAGG… 10                 20 cfcaf…
#> 10 chr2:179446… ENST… ENST0000…       59288 GACATC… 0,10,899,909      909 af1aa…
#> 11 chr2:179445… ENST… ENST0000…       58982 GACCCT… 10                 20 21e75…
#> 12 chr2:179642… ENST… ENST0000…        4828 CCAAAA… 10                 20 f2ed0…
#> 13 chr2:179642… ENST… ENST0000…        4868 ACTGTG… 0,10,112,122      122 6896b…
#> 14 chr2:179642… ENST… ENST0000…        4703 TTTCAT… 10                 20 4b22f…
#> 15 chr2:152226… ENST… ENST0000…        3878 AACCCA… 0,10,3812,3…     3822 5abc8…
#> 16 chr2:152222… ENST… ENST0000…          76 AACCCA… 0,10,3812,3…     3822 5abc8…
#> 17 chr2:152388… ENST… ENST0000…       23165 GTGGAG… 0,10,1555,1…     1565 73426…