Aggregate multiple FastQC reports into a data frame.

qc_aggregate(qc.dir = ".", progressbar = TRUE)

# S3 method for qc_aggregate
summary(object, ...)

qc_stats(object)

Arguments

qc.dir

path to the FastQC result directory to scan.

progressbar

logical value. If TRUE, shows a progress bar.

object

an object of class qc_aggregate.

...

other arguments.

Value

  • qc_aggregate() returns an object of class qc_aggregate, which is a data frame (tibble) with the following columns:

    • sample: sample names

    • module: FastQC module names

    • status: FastQC module status (PASS, WARN or FAIL) for each sample

    • tot.seq: total sequences (i.e., the number of reads)

    • seq.length: sequence length

    • pct.gc: % of GC content

    • pct.dup: % of duplicate reads

  • summary(): generates a summary of a qc_aggregate object. Returns a data frame with the following columns:

    • module: FastQC module names

    • nb_samples: the number of samples tested

    • nb_pass, nb_fail, nb_warn: the number of samples that passed, failed, and raised a warning, respectively.

    • failed, warned: the names of the samples that failed and raised a warning, respectively.

  • qc_stats(): returns a data frame containing general statistics of the FastQC reports. Columns are: sample, pct.dup, pct.gc, tot.seq and seq.length.
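Because the aggregated result is an ordinary data frame with the columns listed above, it can be filtered and tabulated with base R. The following is a minimal sketch, not part of the package: the data frame `qc` is a hand-built, hypothetical stand-in for the output of qc_aggregate(), mocking only the sample, module and status columns.

```r
# Hypothetical stand-in for the result of qc_aggregate(); only the documented
# columns sample, module and status are mocked (values are illustrative).
qc <- data.frame(
  sample = c("S1", "S1", "S2", "S2"),
  module = c("Basic Statistics", "Per base sequence content",
             "Basic Statistics", "Per base sequence content"),
  status = c("PASS", "FAIL", "PASS", "WARN"),
  stringsAsFactors = FALSE
)

# Keep only the sample/module pairs that did not pass
problems <- qc[qc$status != "PASS", c("sample", "module", "status")]

# Cross-tabulate module against status: a rough, hand-rolled analogue of
# the per-module counts that summary() reports
status_counts <- table(qc$module, qc$status)
```

With a real qc_aggregate object, the same subsetting should apply unchanged, assuming the status values are the strings PASS, WARN and FAIL as in the example output.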

Functions

  • qc_aggregate: Aggregate FastQC Reports for Multiple Samples

  • qc_stats: Create General Statistics of FastQC Reports

Examples

# Demo QC dir
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.dir
#> [1] "/Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results"
# List of files in the directory
list.files(qc.dir)
#> [1] "S1_fastqc.zip" "S2_fastqc.zip" "S3_fastqc.zip" "S4_fastqc.zip"
#> [5] "S5_fastqc.zip"
# Aggregate the report
qc <- qc_aggregate(qc.dir, progressbar = FALSE)
qc
#> # A tibble: 60 x 7
#>    sample module                       status tot.seq  seq.length pct.gc pct.dup
#>  * <chr>  <chr>                        <chr>  <chr>    <chr>       <dbl>   <dbl>
#>  1 S1     Basic Statistics             PASS   50299587 35-76          48    17.2
#>  2 S1     Per base sequence quality    PASS   50299587 35-76          48    17.2
#>  3 S1     Per tile sequence quality    PASS   50299587 35-76          48    17.2
#>  4 S1     Per sequence quality scores  PASS   50299587 35-76          48    17.2
#>  5 S1     Per base sequence content    FAIL   50299587 35-76          48    17.2
#>  6 S1     Per sequence GC content      WARN   50299587 35-76          48    17.2
#>  7 S1     Per base N content           PASS   50299587 35-76          48    17.2
#>  8 S1     Sequence Length Distribution WARN   50299587 35-76          48    17.2
#>  9 S1     Sequence Duplication Levels  PASS   50299587 35-76          48    17.2
#> 10 S1     Overrepresented sequences    PASS   50299587 35-76          48    17.2
#> # … with 50 more rows
# Generates a summary of qc_aggregate
summary(qc)
#> # A tibble: 12 x 7
#> # Groups: module [?]
#>    module            nb_samples nb_fail nb_pass nb_warn failed       warned
#>    <chr>                  <dbl>   <dbl>   <dbl>   <dbl> <chr>        <chr>
#>  1 Adapter Content            5       0       5       0 NA           NA
#>  2 Basic Statistics           5       0       5       0 NA           NA
#>  3 Kmer Content               5       0       5       0 NA           NA
#>  4 Overrepresented …          5       0       5       0 NA           NA
#>  5 Per base N conte…          5       0       5       0 NA           NA
#>  6 Per base sequenc…          5       5       0       0 S1, S2, S3,… NA
#>  7 Per base sequenc…          5       0       5       0 NA           NA
#>  8 Per sequence GC …          5       2       0       3 S3, S4       S1, S2, S5
#>  9 Per sequence qua…          5       0       5       0 NA           NA
#> 10 Per tile sequenc…          5       0       5       0 NA           NA
#> 11 Sequence Duplica…          5       0       5       0 NA           NA
#> 12 Sequence Length …          5       0       0       5 NA           S1, S2, S3…
# General statistics of fastqc reports.
qc_stats(qc)
#> # A tibble: 5 x 5
#>   sample pct.dup pct.gc tot.seq  seq.length
#>   <chr>    <dbl>  <dbl> <chr>    <chr>
#> 1 S1        17.2     48 50299587 35-76
#> 2 S2        15.7     48 50299587 35-76
#> 3 S3        22.1     49 67255341 35-76
#> 4 S4        19.9     49 67255341 35-76
#> 5 S5        18.2     48 65011962 35-76
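In the qc_stats() output above, tot.seq and seq.length are character columns (<chr>), so they may need converting before any numeric comparison. The sketch below shows one possible way to do that in base R. The data frame `stats` is a hand-built stand-in that copies three rows from the printed output, and the columns min.length and max.length are hypothetical names introduced here, not something the package provides.

```r
# Stand-in for qc_stats(qc), copying rows S1, S3 and S5 from the output above;
# tot.seq and seq.length are kept as character, as printed.
stats <- data.frame(
  sample     = c("S1", "S3", "S5"),
  pct.dup    = c(17.2, 22.1, 18.2),
  pct.gc     = c(48, 49, 48),
  tot.seq    = c("50299587", "67255341", "65011962"),
  seq.length = c("35-76", "35-76", "35-76"),
  stringsAsFactors = FALSE
)

# Convert read counts to numeric so they can be compared and sorted
stats$tot.seq <- as.numeric(stats$tot.seq)

# Sample with the largest number of reads
deepest <- stats$sample[which.max(stats$tot.seq)]

# Split the "min-max" length range into two integer columns
# (min.length / max.length are illustrative names, not package columns)
len_range <- do.call(rbind, strsplit(stats$seq.length, "-", fixed = TRUE))
stats$min.length <- as.integer(len_range[, 1])
stats$max.length <- as.integer(len_range[, 2])
```

This assumes every seq.length value has the form "min-max"; reports with a single fixed read length would need a separate branch.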