select_greedy.Rd
Select mutations by maximal coverage
select_greedy(set_df, mut_col = "mut_id", sample_col = "patient_id")
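A tibble with one row per selected mutation (set), in selection order, giving the number of samples each mutation adds (n, n_cum) and the corresponding coverage in percent (coverage, coverage_cum).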
mut_toy
#> # A tibble: 9 × 6
#>   patient_id mut_id chr   start   end gene
#>   <chr>      <chr>  <chr> <dbl> <dbl> <chr>
#> 1 p1         m01    1      1000  1000 g1
#> 2 p1         m02    1      2000  2000 g1
#> 3 p1         m03    2      3000  3000 g2
#> 4 p2         m01    1      1000  1000 g1
#> 5 p2         m04    2      4000  4000 g2
#> 6 p2         m05    3      5000  5000 g3
#> 7 p3         m02    1      2000  2000 g1
#> 8 p3         m04    2      4000  4000 g2
#> 9 p3         m06    2      6000  6000 g2
select_greedy(mut_toy)
#> 100% covered by 2 sets.
#> # A tibble: 2 × 8
#>   mut_id     n n_samples  rank order n_cum coverage coverage_cum
#>   <chr>  <int>     <int> <dbl> <int> <int>    <dbl>        <dbl>
#> 1 m01        2         3     1     1     2     66.7         66.7
#> 2 m02        1         3     2     2     3     33.3        100
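
The output above is consistent with a greedy set-cover selection: at each step, pick the mutation that adds the most not-yet-covered samples, and stop once every sample is covered. The following base R sketch illustrates that idea on the same toy data. It is an assumption about the general technique, not the package's actual implementation: `greedy_cover_sketch` and `toy` are hypothetical names, ties are broken by first appearance, and the `rank`, `order` and `n_samples` columns are not reproduced.

```r
# Illustrative greedy set cover; a hypothetical helper, NOT the code behind select_greedy().
greedy_cover_sketch <- function(set_df, mut_col = "mut_id", sample_col = "patient_id") {
  # Map each mutation to the unique samples carrying it, in order of first appearance.
  muts <- factor(set_df[[mut_col]], levels = unique(set_df[[mut_col]]))
  sets <- lapply(split(set_df[[sample_col]], muts), unique)

  uncovered <- unique(set_df[[sample_col]])
  n_samples <- length(uncovered)
  picked <- character(0)
  n_new <- integer(0)

  while (length(uncovered) > 0) {
    # Number of still-uncovered samples each mutation would add.
    gain <- vapply(sets, function(s) sum(s %in% uncovered), integer(1))
    best <- which.max(gain)  # assumption: ties go to the first mutation
    picked <- c(picked, names(sets)[best])
    n_new <- c(n_new, gain[[best]])
    uncovered <- setdiff(uncovered, sets[[best]])
  }

  data.frame(
    mut_id = picked,
    n = n_new,
    n_cum = cumsum(n_new),
    coverage = round(100 * n_new / n_samples, 1),
    coverage_cum = round(100 * cumsum(n_new) / n_samples, 1)
  )
}

# Only the two columns used by the sketch are rebuilt from mut_toy.
toy <- data.frame(
  patient_id = rep(c("p1", "p2", "p3"), each = 3),
  mut_id = c("m01", "m02", "m03", "m01", "m04", "m05", "m02", "m04", "m06")
)
res <- greedy_cover_sketch(toy)
print(res)
```

On the toy data the sketch selects m01 first (covering p1 and p2) and then m02 (adding p3), matching the order and the `n`, `n_cum`, `coverage` and `coverage_cum` values shown above. Note that m02 and m04 tie in the second step; a different tie-breaking rule could return m04 or m06 instead.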