Build canonical junctions from transcripts
canonical_junctions(tx)
tx: a GRangesList of reference transcripts
a character vector of canonical splice junction ids
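We build all canonical splice junctions contained in the annotated input transcripts. For each pair of adjacent exons in a transcript, where \(s_i\) and \(e_i\) denote the start and end of exon \(i\), the canonical exon-exon junction is given by the following coordinates: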
strand = +: \(e_i\), \(s_{i+1}\)
strand = -: \(e_{i+1}\), \(s_{i}\)
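The two rules above can be sketched in base R for a single transcript. This is a minimal illustration, not the package's implementation: the helper `exon_exon_junctions()` is hypothetical, the coordinates are made up, and the sketch assumes that exons are ordered by their rank in the transcript (so by descending genomic position on the minus strand).

```r
# Hypothetical helper (not part of splice2neo): build junction ids of the
# form "chr:pos1-pos2:strand" for adjacent exons of a single transcript.
# Assumes `starts` and `ends` are ordered by exon rank in the transcript.
exon_exon_junctions <- function(chr, starts, ends, strand) {
  n <- length(starts)
  if (n < 2) return(character(0))   # single-exon transcripts have no junction
  i <- seq_len(n - 1)
  if (strand == "+") {
    left  <- ends[i]        # e_i
    right <- starts[i + 1]  # s_{i+1}
  } else {
    left  <- ends[i + 1]    # e_{i+1}
    right <- starts[i]      # s_i
  }
  # as.integer() avoids scientific notation for large coordinates
  paste0(chr, ":", as.integer(left), "-", as.integer(right), ":", strand)
}

# Toy coordinates, made up for illustration
plus_jx <- exon_exon_junctions("chr1", starts = c(100, 300, 500),
                               ends = c(200, 400, 600), strand = "+")
minus_jx <- exon_exon_junctions("chr1", starts = c(500, 300, 100),
                                ends = c(600, 400, 200), strand = "-")
```

In both cases the smaller genomic coordinate comes first in the id, which matches the format of the ids returned in the example output.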
We also include canonical intron-retention junctions. These are the 5' donor or 3' acceptor sites of canonical exon-exon junctions that are not used in all isoforms of the gene, i.e. they lie within an exon of another transcript. A canonical intron-retention junction is defined by the coordinates of the last exon base and the base that follows it, so it suffices to check whether both bases are contained in a single exon.
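The containment check can be illustrated with a small base R sketch. The helper `spans_single_exon()` is hypothetical and not part of splice2neo. It assumes 1-based inclusive exon coordinates on the same chromosome and strand, and it treats the acceptor side symmetrically (last intron base and first exon base), which is an assumption rather than something stated above.

```r
# Hypothetical helper (not part of splice2neo): for each position `pos`,
# return TRUE if the two adjacent bases `pos` and `pos + 1` both lie inside
# at least one exon given by `exon_starts` / `exon_ends`.
spans_single_exon <- function(pos, exon_starts, exon_ends) {
  vapply(
    pos,
    function(p) any(exon_starts <= p & exon_ends >= p + 1),
    logical(1)
  )
}

# Toy annotation, made up for illustration:
# isoform A has exons 100-200 and 300-400 (exon-exon junction 200-300),
# isoform B has a single exon 100-400 that retains the intron.
exon_starts <- c(100, 300, 100)
exon_ends   <- c(200, 400, 400)

donor_ir    <- spans_single_exon(200, exon_starts, exon_ends)  # bases 200 and 201
acceptor_ir <- spans_single_exon(299, exon_starts, exon_ends)  # bases 299 and 300

# Without isoform B, no exon covers both bases, so no retention junction
no_ir <- spans_single_exon(200, exon_starts[1:2], exon_ends[1:2])
```

With isoform B present, both the donor and the acceptor side are covered by a single exon. With only isoform A, the check fails because position 201 is intronic.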
gtf_file <- system.file("extdata","GTF_files","Aedes_aegypti.partial.gtf",
package="splice2neo")
tx <- parse_gtf(gtf_file)
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> OK
canonical_junctions(tx[1:10])
#> [1] "supercont1.1:35644-35699:+" "supercont1.1:35901-35993:+"
#> [3] "supercont1.1:36098-52193:+" "supercont1.1:52851-52973:+"
#> [5] "supercont1.1:67696-68065:+" "supercont1.1:68188-68248:+"
#> [7] "supercont1.1:68492-78190:+" "supercont1.1:78879-79230:+"
#> [9] "supercont1.1:79304-79389:+" "supercont1.1:86420-86471:+"
#> [11] "supercont1.1:86703-86765:+" "supercont1.1:87532-96341:+"
#> [13] "supercont1.1:229407-229997:+" "supercont1.1:322232-322301:+"
#> [15] "supercont1.1:323178-365519:+" "supercont1.1:366979-377341:+"
#> [17] "supercont1.1:378002-378174:+" "supercont1.1:379119-379216:+"
#> [19] "supercont1.1:380180-382693:+" "supercont1.1:384202-389885:+"
#> [21] "supercont1.1:453215-453294:+" "supercont1.1:454174-458284:+"
#> [23] "supercont1.1:458711-458777:+" "supercont1.1:529329-530742:+"
#> [25] "supercont1.1:560183-560246:+" "supercont1.1:560494-560556:+"