Select mutations by maximal coverage

select_greedy(set_df, mut_col = "mut_id", sample_col = "patient_id")

Arguments

set_df

A data frame of mutations (sets) in patients. Each row is one occurrence of a mutation in a patient.

mut_col

Character. Name of the column holding the mutation identifier (the set).

sample_col

Character. Name of the column holding the sample identifier (the elements of each set).
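To make the set interpretation concrete, the base R sketch below builds a small long-format input with non-default column names and regroups it so that each mutation becomes the set of samples it occurs in. The names `occ`, `sample` and `mutation` are illustrative only and are not part of the package.

```r
# Illustrative long-format input: one row per occurrence of a mutation in a sample.
# The names `occ`, `sample` and `mutation` are hypothetical.
occ <- data.frame(
  sample   = c("p1", "p1", "p2", "p3"),
  mutation = c("m01", "m02", "m01", "m02"),
  stringsAsFactors = FALSE
)

# Regroup by mutation: each mutation (set) maps to the samples (elements) it occurs in.
sets <- split(occ$sample, occ$mutation)
sets
# $m01 is c("p1", "p2"); $m02 is c("p1", "p3")
```

With these column names, the documented arguments would be set as `mut_col = "mutation"` and `sample_col = "sample"` instead of relying on the defaults.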

Value

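A tibble with one row per selected mutation (set), in selection order, giving the number of samples it newly covers (`n`), the total number of samples (`n_samples`), and the per-step and cumulative counts and percentages of covered samples (`n_cum`, `coverage`, `coverage_cum`).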
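The function name and the example output are consistent with greedy selection for maximum coverage: at each step, pick the mutation that covers the most samples not yet covered, and stop once every sample is covered. The base R sketch below illustrates that idea under stated assumptions; it is not the package's implementation. The helper name `greedy_cover_sketch()` is hypothetical, ties are assumed to go to the first mutation in sorted order, and the `rank` and `order` columns are not reproduced because their definitions are not documented here.

```r
# Minimal sketch of greedy maximum coverage (NOT the package's implementation).
# Assumption: ties go to the first mutation in sorted order (which.max picks the first maximum).
greedy_cover_sketch <- function(df, mut_col = "mut_id", sample_col = "patient_id") {
  # Each mutation (set) -> unique samples (elements) it occurs in
  sets <- lapply(split(df[[sample_col]], df[[mut_col]]), unique)
  universe <- unique(df[[sample_col]])
  uncovered <- universe
  picked <- character(0)
  gain <- integer(0)
  while (length(uncovered) > 0) {
    # Number of still-uncovered samples each mutation would add
    new_hits <- vapply(sets, function(s) sum(s %in% uncovered), integer(1))
    best <- names(which.max(new_hits))
    picked <- c(picked, best)
    gain <- c(gain, new_hits[[best]])
    uncovered <- setdiff(uncovered, sets[[best]])
  }
  n_samples <- length(universe)
  data.frame(
    mut_id = picked,                               # selected mutations, in selection order
    n = gain,                                      # samples newly covered at this step
    n_samples = n_samples,                         # total number of samples
    n_cum = cumsum(gain),                          # samples covered so far
    coverage = 100 * gain / n_samples,             # percent newly covered at this step
    coverage_cum = 100 * cumsum(gain) / n_samples, # percent covered so far
    stringsAsFactors = FALSE
  )
}

# The patient/mutation pairs of `mut_toy` from the Examples
toy <- data.frame(
  patient_id = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3", "p3"),
  mut_id     = c("m01", "m02", "m03", "m01", "m04", "m05", "m02", "m04", "m06"),
  stringsAsFactors = FALSE
)
res <- greedy_cover_sketch(toy)
res
```

On these pairs the sketch picks m01 (2 new samples) and then m02 (1 new sample, since p1 is already covered), giving cumulative coverages of about 66.7% and 100%, in line with the `n`, `n_cum` and `coverage_cum` columns of the example output.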

Examples

mut_toy
#> # A tibble: 9 × 6
#>   patient_id mut_id chr   start   end gene 
#>   <chr>      <chr>  <chr> <dbl> <dbl> <chr>
#> 1 p1         m01    1      1000  1000 g1   
#> 2 p1         m02    1      2000  2000 g1   
#> 3 p1         m03    2      3000  3000 g2   
#> 4 p2         m01    1      1000  1000 g1   
#> 5 p2         m04    2      4000  4000 g2   
#> 6 p2         m05    3      5000  5000 g3   
#> 7 p3         m02    1      2000  2000 g1   
#> 8 p3         m04    2      4000  4000 g2   
#> 9 p3         m06    2      6000  6000 g2   

select_greedy(mut_toy)
#> 100% covered by 2 sets.
#> # A tibble: 2 × 8
#>   mut_id     n n_samples  rank order n_cum coverage coverage_cum
#>   <chr>  <int>     <int> <dbl> <int> <int>    <dbl>        <dbl>
#> 1 m01        2         3     1     1     2     66.7         66.7
#> 2 m02        1         3     2     2     3     33.3        100