Aggregate multiple FastQC reports into a data frame.

qc_aggregate(qc.dir = ".", progressbar = TRUE)

# S3 method for qc_aggregate
summary(object, ...)

qc_stats(object)

Arguments

qc.dir

path to the FastQC result directory to scan.

progressbar

logical value. If TRUE, shows a progress bar.

object

an object of class qc_aggregate.

...

other arguments.

Value

  • qc_aggregate() returns an object of class qc_aggregate, which is a tibble (data frame) with the following columns:

    • sample: sample names

    • module: FastQC modules

    • status: FastQC module status for each sample

    • tot.seq: total sequences (i.e., the number of reads)

    • seq.length: sequence length

    • pct.gc: % of GC content

    • pct.dup: % of duplicate reads

  • summary(): generates a summary of a qc_aggregate object. Returns a data frame with the following columns:

    • module: FastQC modules

    • nb_samples: the number of samples tested

    • nb_pass, nb_fail, nb_warn: the number of samples with status PASS, FAIL and WARN, respectively.

    • failed, warned: the names of the samples that failed and warned, respectively.

  • qc_stats(): returns a data frame containing general statistics of the FastQC reports. Columns are: sample, pct.dup, pct.gc, tot.seq and seq.length.
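
The relationship between the aggregated data frame and its summary can be illustrated without any FastQC files. The following is a minimal base R sketch, not the package's implementation: toy_qc, collapse_names and toy_summary are hypothetical names, the data are invented, and the sketch assumes one row per sample and module.

```r
# Hypothetical toy data mimicking three columns of a qc_aggregate result.
toy_qc <- data.frame(
  sample = rep(c("S1", "S2", "S3"), times = 2),
  module = rep(c("Basic Statistics", "Per sequence GC content"), each = 3),
  status = c("PASS", "PASS", "PASS", "WARN", "FAIL", "FAIL"),
  stringsAsFactors = FALSE
)

# Collapse sample names into one string, or NA when there are none.
collapse_names <- function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = ", ")
}

# Build one summary row per module (assumes one row per sample and module).
per_module <- lapply(split(toy_qc, toy_qc$module), function(d) {
  data.frame(
    module     = d$module[1],
    nb_samples = nrow(d),
    nb_fail    = sum(d$status == "FAIL"),
    nb_pass    = sum(d$status == "PASS"),
    nb_warn    = sum(d$status == "WARN"),
    failed     = collapse_names(d$sample[d$status == "FAIL"]),
    warned     = collapse_names(d$sample[d$status == "WARN"]),
    stringsAsFactors = FALSE
  )
})
toy_summary <- do.call(rbind, per_module)
rownames(toy_summary) <- NULL
print(toy_summary)
```

In this toy example, "Per sequence GC content" has two failing samples and one warning sample, so its failed and warned entries list those sample names, while "Basic Statistics" gets NA in both.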

Functions

  • qc_aggregate: Aggregate FastQC Reports for Multiple Samples

  • qc_stats: Creates general statistics of FastQC reports.

Examples

# Demo QC dir
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.dir
#> [1] "/Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results"
# List of files in the directory
list.files(qc.dir)
#> [1] "S1_fastqc.zip" "S2_fastqc.zip" "S3_fastqc.zip" "S4_fastqc.zip"
#> [5] "S5_fastqc.zip"
# Aggregate the reports
qc <- qc_aggregate(qc.dir, progressbar = FALSE)
qc
#> # A tibble: 60 x 7
#>    sample module                       status tot.seq seq.length pct.gc pct.dup
#>  * <chr>  <chr>                        <chr>  <chr>   <chr>       <dbl>   <dbl>
#>  1 S1     Basic Statistics             PASS   NA      NA             NA    17.2
#>  2 S1     Per base sequence quality    PASS   NA      NA             NA    17.2
#>  3 S1     Per tile sequence quality    PASS   NA      NA             NA    17.2
#>  4 S1     Per sequence quality scores  PASS   NA      NA             NA    17.2
#>  5 S1     Per base sequence content    FAIL   NA      NA             NA    17.2
#>  6 S1     Per sequence GC content      WARN   NA      NA             NA    17.2
#>  7 S1     Per base N content           PASS   NA      NA             NA    17.2
#>  8 S1     Sequence Length Distribution WARN   NA      NA             NA    17.2
#>  9 S1     Sequence Duplication Levels  PASS   NA      NA             NA    17.2
#> 10 S1     Overrepresented sequences    PASS   NA      NA             NA    17.2
#> # ... with 50 more rows
# Generate a summary of qc_aggregate
summary(qc)
#> # A tibble: 12 x 7
#> # Groups:   module [?]
#>    module            nb_samples nb_fail nb_pass nb_warn failed      warned
#>    <chr>                  <dbl>   <dbl>   <dbl>   <dbl> <chr>       <chr>
#>  1 Adapter Content            5       0       5       0 NA          NA
#>  2 Basic Statistics           5       0       5       0 NA          NA
#>  3 Kmer Content               5       0       5       0 NA          NA
#>  4 Overrepresented …          5       0       5       0 NA          NA
#>  5 Per base N conte…          5       0       5       0 NA          NA
#>  6 Per base sequenc…          5       5       0       0 S1, S2, S3… NA
#>  7 Per base sequenc…          5       0       5       0 NA          NA
#>  8 Per sequence GC …          5       2       0       3 S3, S4      S1, S2, S5
#>  9 Per sequence qua…          5       0       5       0 NA          NA
#> 10 Per tile sequenc…          5       0       5       0 NA          NA
#> 11 Sequence Duplica…          5       0       5       0 NA          NA
#> 12 Sequence Length …          5       0       0       5 NA          S1, S2, S3…
# General statistics of FastQC reports
qc_stats(qc)
#> # A tibble: 5 x 5
#>   sample pct.dup pct.gc tot.seq seq.length
#>   <chr>    <dbl>  <dbl> <chr>   <chr>
#> 1 S1        17.2     NA NA      NA
#> 2 S2        15.7     NA NA      NA
#> 3 S3        22.1     NA NA      NA
#> 4 S4        19.9     NA NA      NA
#> 5 S5        18.2     NA NA      NA
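
In the qc tibble printed above, the per-sample statistics repeat on every module row, so a per-sample table with the columns of the qc_stats() output can be obtained by keeping one row per sample. The following base R sketch shows that idea on hypothetical toy data; it is an illustration under that assumption, not how qc_stats() is implemented. The names toy_qc, stat_cols and toy_stats are invented for the example, and only the pct.dup values for S1 and S2 are taken from the output above.

```r
# Hypothetical toy data: per-sample statistics repeated on every module row.
toy_qc <- data.frame(
  sample     = c("S1", "S1", "S2", "S2"),
  module     = rep(c("Basic Statistics", "Per base N content"), times = 2),
  status     = "PASS",
  tot.seq    = NA_character_,
  seq.length = NA_character_,
  pct.gc     = NA_real_,
  pct.dup    = c(17.2, 17.2, 15.7, 15.7),
  stringsAsFactors = FALSE
)

# Keep the first row of each sample and only the general-statistics columns.
stat_cols <- c("sample", "pct.dup", "pct.gc", "tot.seq", "seq.length")
toy_stats <- toy_qc[!duplicated(toy_qc$sample), stat_cols]
rownames(toy_stats) <- NULL
print(toy_stats)
```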