Coverage is a metric that represents how many times a specific DNA site was read during sequencing, that is, how many sequenced bases overlap that site. The average coverage can be measured genome-wide with tools such as bamtools, samtools, bedtools, and awk.
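As a minimal sketch of the genome-wide average, the function below takes per-base depth as three tab-separated columns (chromosome, position, depth) and averages the third column with awk. The function name `mean_depth` and the file name `sample.bam` are illustrative assumptions, not part of my original pipeline. The input layout is the one `samtools depth -a` prints.

```shell
# Mean depth from per-base depth text on stdin.
# Assumed input layout: chrom <TAB> pos <TAB> depth
mean_depth() {
  awk -F'\t' '
    { sum += $3; n++ }                       # accumulate depth and count sites
    END {
      if (n > 0) printf "%.2f\n", sum / n    # average over all reported sites
      else print "NA"                        # no input lines
    }'
}

# Hypothetical usage, assuming samtools is installed and sample.bam exists:
#   samtools depth -a sample.bam | mean_depth
```

The `-a` flag matters here: without it, sites with zero depth are not reported and the average is inflated.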
The average coverage can also be measured over the specific set of genes targeted in a whole exome sequencing run. The difference between the agilent.bed file and the ucsc.bed file is that the former was generated by Agilent and the latter by me. The Agilent file contains little information about which exons are sequenced by gene, by site, and by DNA frame, so it does not support a more thorough per-gene analysis of variants. That is why I mine the UCSC databases to add more informative features to the sequenced sites. This should increase the number of results, improve interpretability, and help justify the full cost of sequencing a genome, so that we take full advantage of our data.
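To illustrate why gene annotation in the BED file helps, here is a rough sketch that averages per-base depth by gene. It assumes a tab-separated table in which one column holds the gene name and another holds the depth at a site. Both column numbers are passed as arguments because they depend on how many columns the BED file has. The function name `mean_depth_by_gene` and the example column numbers are hypothetical.

```shell
# Mean per-base depth per gene from tab-separated text on stdin.
# $1 = column number holding the gene name (assumption about the input)
# $2 = column number holding the per-base depth (assumption about the input)
mean_depth_by_gene() {
  awk -F'\t' -v g="$1" -v d="$2" '
    { sum[$g] += $d; n[$g]++ }               # accumulate depth per gene
    END {
      for (k in sum) printf "%s\t%.2f\n", k, sum[k] / n[k]
    }' | sort                                # awk array order is unspecified
}

# Hypothetical usage, assuming a BED file with the gene name in column 4:
#   bedtools coverage -a ucsc.bed -b sample.bam -d | mean_depth_by_gene 4 6
```

With `bedtools coverage -d`, the position and depth columns follow the columns of the BED file, so the depth column shifts with the width of that file.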
- Filtering parameters and thresholds
- mapq, mapping quality
- XM, number of mismatches in the alignment
- XT, how many unique reads by site
- $8 or $11, mapping depth by nucleotide by site
- Maximum depth can also be capped with samtools -d
- Excessive mismatches can also be penalized with bcftools mpileup -C
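The first two thresholds in the list above can be sketched as an awk filter over SAM text. This assumes the aligner writes an `XM:i:` tag in the optional fields (column 12 onward) and that MAPQ is in column 5, as the SAM format defines. The function name `filter_sam` and the cutoffs 20 and 3 are illustrative assumptions, not recommended values.

```shell
# Keep SAM records with MAPQ >= $1 and at most $2 mismatches (XM tag).
# Records without an XM tag pass the mismatch check.
filter_sam() {
  awk -F'\t' -v minq="$1" -v maxmm="$2" '
    /^@/ { print; next }                     # pass header lines through
    $5 < minq { next }                       # drop low mapping quality
    {
      for (i = 12; i <= NF; i++)             # scan optional tags
        if ($i ~ /^XM:i:/) {
          split($i, t, ":")
          if (t[3] + 0 > maxmm) next         # drop reads with too many mismatches
        }
      print
    }'
}

# Hypothetical usage, assuming samtools is installed:
#   samtools view -h sample.bam | filter_sam 20 3 | samtools view -b -o filtered.bam -
```

Filtering on text like this is slow for large files, but it keeps every threshold visible in one place.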