Sample: S1_fastqc.zip

Date: 2017-04-10
Sample data: /Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip
R packages: Report generated with the R package fastqcr version 0.1.0
Experiment description: Sequencing data

Required R packages

library(fastqcr)
library(dplyr)

Reading the file

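# Path to the FastQC result archive (here, the demo file shipped with fastqcr)
qc.path <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
# Read all modules
qc <- qc_read(qc.path)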
# Elements contained in the qc object
names(qc)

##  [1] "summary"                       "basic_statistics"              "per_base_sequence_quality"     "per_tile_sequence_quality"    
##  [5] "per_sequence_quality_scores"   "per_base_sequence_content"     "per_sequence_gc_content"       "per_base_n_content"           
##  [9] "sequence_length_distribution"  "sequence_duplication_levels"   "overrepresented_sequences"     "adapter_content"              
## [13] "kmer_content"                  "total_deduplicated_percentage"

Plotting and Interpreting

Summary

The summary lists the modules that were tested and the status of each test result:

normal (PASS),
slightly abnormal (WARN: warning),
or very unusual (FAIL: failure).

Some experiments are expected to produce libraries that are biased in particular ways. Treat the summary evaluations as pointers to where you should concentrate your attention, and work out why your library may not look normal.

qc_plot(qc, "summary")
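
The same statuses can also be inspected as a table, which helps when you only want the modules that need attention. The sketch below is illustrative, not part of the generated report: it assumes the demo archive shipped with fastqcr stands in for your own file, and that the summary element has columns named status and module.

```r
library(fastqcr)
library(dplyr)

# Assumption: the demo archive shipped with fastqcr stands in for your own file
qc.path <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
qc <- qc_read(qc.path)

# Keep only the modules that did not pass
# Assumption: qc$summary has the columns `status` and `module`
flagged <- qc$summary %>%
  filter(status %in% c("WARN", "FAIL")) %>%
  select(module, status)

flagged
```

An empty result means every tested module passed. A non-empty one is a list of modules to look at first, not a verdict on the library.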

Basic Statistics

Basic statistics reports general metrics for the file, such as:

Total sequences: the number of reads,
Sequence length: the read length (minimum - maximum),
%GC: the GC content.

qc_plot(qc, "Basic statistics")
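
The metrics listed above can also be pulled out of the imported object directly, for example to reuse them in a table. This is a minimal sketch under two assumptions: the basic_statistics element has columns named Measure and Value, and the measure labels match those written by FastQC ("Total Sequences", "Sequence length", "%GC").

```r
library(fastqcr)

# Assumption: the demo archive shipped with fastqcr stands in for your own file
qc.path <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
qc <- qc_read(qc.path)

# Assumption: basic_statistics has the columns `Measure` and `Value`
basic <- qc$basic_statistics

# Look up the three metrics by their FastQC labels
wanted <- c("Total Sequences", "Sequence length", "%GC")
key <- setNames(basic$Value[match(wanted, basic$Measure)], wanted)

key
```

The values are returned as FastQC wrote them, so convert them with as.numeric() before doing arithmetic. A range such as a minimum and maximum sequence length will not convert cleanly.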

Per base sequence quality

qc_plot(qc, "Per base sequence quality")

Per sequence quality scores

qc_plot(qc, "Per sequence quality scores")

Per base sequence content

qc_plot(qc, "Per base sequence content")

Per sequence GC content

qc_plot(qc, "Per sequence GC content")

Per base N content

qc_plot(qc, "Per base N content")

Sequence length distribution

qc_plot(qc, "Sequence length distribution")

Sequence duplication levels

qc_plot(qc, "Sequence duplication levels")

Overrepresented sequences

qc_plot(qc, "Overrepresented sequences")

Adapter content

qc_plot(qc, "Adapter content")

Kmer content

qc_plot(qc, "Kmer content")

Useful Links

FastQC report for a good Illumina dataset
FastQC report for a bad Illumina dataset
Online documentation for each FastQC report