Chapter-3.2-FASTQ-and-FASTQC
Format
- Sequence ID (@)
- Sequence
- Quality ID (+)
- Quality Score
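The four-line layout above can be sketched as a small reader. This is only an illustration under stated assumptions: the names `FastqRecord` and `read_fastq` are made up for this example, and each record is assumed to take exactly four lines (no wrapped sequences).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical helper type for this sketch)."""
    seq_id: str    # line 1, without the leading "@"
    sequence: str  # line 2
    quality: str   # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Toy input with two records, made up for illustration.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for record in records:
    print(record.seq_id, record.sequence, record.quality)
```

The length check reflects the format itself: every base has exactly one quality character.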
Quality Score
- Phred quality: Q = -10 log10 Pr(bp is wrongly sequenced)
- Encoding: ASCII code of the quality character = Q + 33
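A short sketch of how the Phred definition and the +33 offset work together. The function names `phred_scores` and `error_probability` are hypothetical, chosen for this example; the offset of 33 is the one stated above.

```python
from typing import List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Convert each quality character to Q = ASCII code - offset."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# "!" has ASCII code 33, "5" has 53, "I" has 73.
scores = phred_scores("!5I")
for q in scores:
    print(q, error_probability(q))
```

So a higher Q means a lower chance that the base call is wrong: Q = 20 corresponds to an error probability of 1 in 100, and Q = 40 to 1 in 10,000.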
Why Quality Control
- Sequencer output
- Sequence "reads" + quality = FASTQ File
- Is the quality of my sequence data OK?
- What can I do if the quality is not good? - Is the problem in the read, the sample, or the run? Read: if the molecules are too close together, the sequencer might give a different color.
- Problem: FASTQ files are massive! They are too large to inspect by hand, which is why quality control tools are needed.
- Common tool: FASTQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FASTQC: Per Base Sequence Quality
Good Quality | Poor Quality |
---|---|
Consistent | High Variance |
High-Quality along the read | Quality decrease with length |
Smooth over length | Sequence-position bias |
Fits with expectation | Does not fit with expectation |
If the colors in a sample overlap with each other, this is considered bad quality.
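The "quality decreases with length" pattern from the table can be checked with a rough per-position summary. This is a simplified sketch, not how FASTQC itself is implemented: the function name `per_base_mean_quality` and the toy reads are made up for illustration, and only the mean per position is computed, assuming the +33 encoding.

```python
from statistics import mean
from typing import List, Sequence


def per_base_mean_quality(quality_lines: Sequence[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position across all reads."""
    longest = max((len(q) for q in quality_lines), default=0)
    means = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        scores = [ord(q[pos]) - offset for q in quality_lines if pos < len(q)]
        means.append(mean(scores))
    return means


# Toy quality strings whose scores drop toward the end of the read.
toy_reads = ["IIII5", "III5!", "II5!!"]
profile = per_base_mean_quality(toy_reads)
for position, value in enumerate(profile, start=1):
    print(position, round(value, 2))
```

A profile that stays high and flat matches the "Good Quality" column; one that falls steadily along the read matches "Quality decrease with length".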