Chapter 3.2 FASTQ and FASTQC (Bioinformatics, STAT115-2020)

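FASTQ Format

Each read is stored as a record of four lines:

  1. Sequence ID: starts with "@", followed by the read identifier
  2. Sequence: the called bases (A, C, G, T, or N)
  3. Separator: starts with "+", optionally followed by the sequence ID again
  4. Quality scores: one ASCII character per base, encoding the quality of that base call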
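The four-line layout above can be read with a small streaming parser. This is a minimal sketch, assuming each record occupies exactly four lines (no wrapped sequence lines). The names `FastqRecord` and `read_fastq` are hypothetical helpers, not part of any library. Reading four lines at a time keeps memory use flat, which matters for very large FASTQ files.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ entry: read identifier, called bases, and quality string."""
    seq_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records four lines at a time, so a large file is streamed rather than loaded."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        # Line 1 must start with "@", line 3 with "+".
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator, got {separator!r}")
        # Line 4 holds exactly one quality character per base in line 2.
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch for {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads; the second repeats its ID on the "+" line.
example = "@read1\nGATTACA\n+\nIIIIHH#\n@read2\nACGT\n+read2\n!!II\n"
for record in read_fastq(io.StringIO(example)):
    print(record.seq_id, record.sequence, record.quality)
```

For real data, an established parser such as Biopython's FASTQ reader handles more edge cases than this sketch.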

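Quality score (Phred quality): $Q = -10 \log_{10} P(\text{base is called incorrectly})$. Each $Q$ is stored as the ASCII character with code $Q + 33$ (Phred+33 encoding). For example, $Q = 30$ means a 1 in 1000 chance that the base call is wrong.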
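The Phred formula and the +33 offset can be checked directly. A minimal sketch, assuming Phred+33 encoding (older Illumina pipelines used other offsets, so the offset is left as a parameter). The function names are hypothetical helpers for illustration.

```python
import math


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 string: each character's ASCII code minus 33 is Q."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> float:
    """Forward direction: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


scores = phred_scores("II5+!")
print(scores)  # [40, 40, 20, 10, 0]
print([error_probability(q) for q in scores])
```

Here "I" (ASCII 73) decodes to Q = 40, an error probability of 1 in 10,000, while "!" (ASCII 33) decodes to Q = 0, meaning the base call carries no confidence.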

Why Quality Control

  • Sequencer output
    • Sequence "reads" + quality = FASTQ File
  • Is the quality of my sequence data OK?
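  • What can I do if the quality is not good? Is the problem at the level of the read, the sample, or the run?
    • Read: if molecules on the flow cell are too close together, their signals can overlap and give a different (wrong) color.
  • Problem: FASTQ files are massive, so their quality cannot be checked by hand and needs dedicated quality-control tools.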
  • Common tool: FASTQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

FASTQC: Per Base Sequence Quality

Good quality                      Poor quality
Consistent                        High variance
High quality along the read       Quality decreases with read length
Smooth over length                Sequence-position bias
Fits with expectation             Does not fit with expectation
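The criteria in the table (consistency, and whether quality drops along the read) can be approximated by summarizing Phred scores per read position. This is a minimal sketch, not FASTQC's implementation: FASTQC draws box plots per position, whereas this only computes the mean and standard deviation. The cutoff of mean Q below 30 in `low_quality_positions` is a hypothetical threshold chosen for illustration, and Phred+33 encoding is assumed.

```python
from statistics import mean, pstdev


def per_base_quality(quality_strings, offset=33):
    """Return (mean, population std dev) of Phred scores at each read position.

    Position 0 is the first base; reads shorter than the longest read
    simply do not contribute to the later positions.
    """
    columns = []  # columns[i] collects the Q values seen at position i
    for qual in quality_strings:
        for pos, ch in enumerate(qual):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(ord(ch) - offset)
    return [(mean(col), pstdev(col)) for col in columns]


def low_quality_positions(summary, min_mean=30):
    """Hypothetical rule: flag positions whose mean Q falls below min_mean."""
    return [pos for pos, (avg, _) in enumerate(summary) if avg < min_mean]


# Made-up reads whose quality drops toward the end of the read.
quals = ["IIIII", "IIII5", "III5+"]
summary = per_base_quality(quals)
for pos, (avg, sd) in enumerate(summary):
    print(f"position {pos}: mean Q = {avg:.1f}, sd = {sd:.1f}")
print("flagged:", low_quality_positions(summary))
```

In this toy input the mean falls and the spread grows toward the last position, the "quality decreases with read length" and "high variance" pattern from the poor-quality column.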

If the colors (fluorescent signals) in a sample overlap with each other, the base calls become unreliable and the data show poor quality.