Include/Exclude Files
Effect of excluderegions and include/exclude options
When calculating reported discrepancy rates (substitution and indel errors), binned quality score accuracy, and phasing statistics, GQC considers only the regions that it writes to the file “includednonexcludedregions.<benchmarkname>.bed”. GQC creates this BED file by first building an excluded regions BED file that merges:
Regions in the “excluderegions” BED file in GQC’s config file (these are typically the benchmark’s low-confidence regions)
Regions in a BED file passed to GQC with the --excludefile option
Benchmark regions not covered by regions in a BED file passed to GQC with the --includefile option
These excluded regions are then subtracted from the entire benchmark genome to obtain the “includednonexcluded” regions.
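The merge-then-subtract logic described above can be sketched in a few lines of Python. This is only an illustration of the interval arithmetic, not GQC's actual implementation: the function names (`merge`, `subtract`, `included_nonexcluded`) and the in-memory tuple representation are assumptions made for this example, and intervals follow the BED convention of 0-based, half-open coordinates.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


def subtract(universe, excluded):
    """Return the parts of `universe` not covered by `excluded`.

    Both arguments are expected to be merged (and therefore sorted).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in universe:
        pos = start  # leftmost position not yet accounted for
        for ex_start, ex_end in by_chrom[chrom]:
            if ex_end <= pos:
                continue  # excluded interval lies entirely to the left
            if ex_start >= end:
                break  # excluded interval lies entirely to the right
            if ex_start > pos:
                kept.append((chrom, pos, ex_start))
            pos = max(pos, ex_end)
            if pos >= end:
                break
        if pos < end:
            kept.append((chrom, pos, end))
    return kept


def included_nonexcluded(genome, config_exclude, exclude_file=(), include_file=None):
    """Hypothetical helper mirroring the three merged sources of exclusion."""
    genome = merge(genome)
    excluded = list(config_exclude) + list(exclude_file)
    if include_file is not None:
        # Benchmark regions NOT covered by the include file are excluded too.
        excluded += subtract(genome, merge(include_file))
    return subtract(genome, merge(excluded))


# Toy example: a single 100-base "chromosome".
genome = [("chr1", 0, 100)]
result = included_nonexcluded(
    genome,
    config_exclude=[("chr1", 10, 20)],
    exclude_file=[("chr1", 15, 30)],
    include_file=[("chr1", 0, 60)],
)
print(result)  # [('chr1', 0, 10), ('chr1', 30, 60)]
```

In the toy example, the two exclude sources merge into 10-30, the include file contributes its complement 60-100, and subtracting both from the genome leaves 0-10 and 30-60.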
Running GQC for restricted parts of the genome benchmark
GQC statistics can be restricted to particular types of sequence in the benchmark by passing GQC an include file, a BED file covering the regions of interest. Some examples are listed in the following table.
| Sequence type | Include file |
|---|---|
| Gene sequence | |
| Segmental duplications | |
| Centromere sequence | |
| Human satellites (HSATs) | |
| Ribosomal DNA (rDNAs) | |