Aggregate FastQC Reports for Multiple Samples

Aggregate multiple FastQC reports into a data frame.

qc_aggregate(qc.dir = ".", progressbar = TRUE)

# S3 method for qc_aggregate
summary(object, ...)

qc_stats(object)

Arguments

qc.dir: path to the FastQC result directory to scan.
progressbar: logical value. If TRUE, shows a progress bar.
object: an object of class qc_aggregate.
...: other arguments.
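
Before pointing qc.dir at a large directory, it can help to preview which archives it contains. The following is a minimal base R sketch, not part of fastqcr: it assumes the archives follow the `<name>_fastqc.zip` naming seen in the Examples section, and the directory, file names and the variables demo.dir, zips and samples are hypothetical.

```r
# Throwaway directory holding empty placeholder files (hypothetical names).
demo.dir <- tempfile("fastqc_demo")
dir.create(demo.dir)
invisible(file.create(
  file.path(demo.dir, c("A_fastqc.zip", "B_fastqc.zip", "notes.txt"))
))

# Keep only the files whose names end in "_fastqc.zip".
zips <- list.files(demo.dir, pattern = "_fastqc\\.zip$")

# Strip the suffix to obtain one label per archive.
samples <- sub("_fastqc\\.zip$", "", zips)
samples
```

Whether qc_aggregate() derives its sample column in exactly this way is an assumption; the sketch only illustrates the naming convention.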

Value

qc_aggregate() returns an object of class qc_aggregate, which is a data frame (tibble) with one row per sample and module and the following columns:
- sample: sample names
- module: FastQC modules
- status: FastQC module status for each sample (PASS, WARN or FAIL)
- tot.seq: total sequences (i.e., the number of reads), stored as character
- seq.length: sequence length, possibly a range such as 35-76, stored as character
- pct.gc: % of GC content
- pct.dup: % of duplicate reads
summary() generates a summary of a qc_aggregate object. It returns a data frame with the following columns:
- module: FastQC modules
- nb_samples: the number of samples tested
- nb_pass, nb_fail, nb_warn: the number of samples that passed, failed and raised a warning, respectively
- failed, warned: the names of the samples that failed and raised a warning, respectively (NA if there are none)
qc_stats() returns a data frame containing general statistics of the FastQC reports. Its columns are sample, pct.dup, pct.gc, tot.seq and seq.length.
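
To make the relationship between the long-format table and the summary columns concrete, here is a minimal base R sketch that derives the same kind of per-module counts. It is not fastqcr's implementation: the data frame qc_toy is an invented stand-in for a qc_aggregate() result, and collapse_samples and module_summary are hypothetical names.

```r
# Hypothetical stand-in for a qc_aggregate() result: one row per
# sample/module pair. All values are invented for illustration.
qc_toy <- data.frame(
  sample = rep(c("A", "B", "C"), each = 2),
  module = rep(c("Basic Statistics", "Per base sequence content"), times = 3),
  status = c("PASS", "FAIL", "PASS", "WARN", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# Fix the status levels so every count column exists even when it is zero.
status <- factor(qc_toy$status, levels = c("PASS", "FAIL", "WARN"))

# Cross-tabulate module against status: one row per module.
counts <- table(qc_toy$module, status)

# Collapse the names of the samples with a given status, per module.
# Yields NA for modules where no sample has that status.
collapse_samples <- function(df, which_status) {
  vapply(split(df, df$module), function(d) {
    hits <- d$sample[d$status == which_status]
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = ", ")
  }, character(1))
}

# table() and split() both order modules alphabetically, so rows line up.
# nb_samples assumes each sample appears once per module.
module_summary <- data.frame(
  module     = rownames(counts),
  nb_samples = as.vector(rowSums(counts)),
  nb_pass    = as.vector(counts[, "PASS"]),
  nb_fail    = as.vector(counts[, "FAIL"]),
  nb_warn    = as.vector(counts[, "WARN"]),
  failed     = unname(collapse_samples(qc_toy, "FAIL")),
  warned     = unname(collapse_samples(qc_toy, "WARN")),
  stringsAsFactors = FALSE
)
module_summary
```

In practice summary() already does this work; the sketch is only meant to show what each column counts.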

Functions

qc_aggregate(): Aggregates FastQC reports for multiple samples.
qc_stats(): Creates general statistics of FastQC reports.

Examples

# Demo QC dir
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.dir
#> [1] "/private/var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/RtmpT6jSz8/temp_libpatha9b37e9f6eab/fastqcr/fastqc_results"

# List of files in the directory
list.files(qc.dir)
#> [1] "S1_fastqc.zip" "S2_fastqc.zip" "S3_fastqc.zip" "S4_fastqc.zip"
#> [5] "S5_fastqc.zip"

# Aggregate the report
qc <- qc_aggregate(qc.dir, progressbar = FALSE)
qc
#> # A tibble: 60 × 7
#>    sample module                       status tot.seq  seq.length pct.gc pct.dup
#>  * <chr>  <chr>                        <chr>  <chr>    <chr>       <dbl>   <dbl>
#>  1 S1     Basic Statistics             PASS   50299587 35-76          48    17.2
#>  2 S1     Per base sequence quality    PASS   50299587 35-76          48    17.2
#>  3 S1     Per tile sequence quality    PASS   50299587 35-76          48    17.2
#>  4 S1     Per sequence quality scores  PASS   50299587 35-76          48    17.2
#>  5 S1     Per base sequence content    FAIL   50299587 35-76          48    17.2
#>  6 S1     Per sequence GC content      WARN   50299587 35-76          48    17.2
#>  7 S1     Per base N content           PASS   50299587 35-76          48    17.2
#>  8 S1     Sequence Length Distribution WARN   50299587 35-76          48    17.2
#>  9 S1     Sequence Duplication Levels  PASS   50299587 35-76          48    17.2
#> 10 S1     Overrepresented sequences    PASS   50299587 35-76          48    17.2
#> # … with 50 more rows

# Generates a summary of qc_aggregate
summary(qc)
#> # A tibble: 12 × 7
#> # Groups:   module [12]
#>    module                       nb_samples nb_fail nb_pass nb_warn failed warned
#>    <chr>                             <dbl>   <dbl>   <dbl>   <dbl> <chr>  <chr> 
#>  1 Adapter Content                       5       0       5       0 NA     NA    
#>  2 Basic Statistics                      5       0       5       0 NA     NA    
#>  3 Kmer Content                          5       0       5       0 NA     NA    
#>  4 Overrepresented sequences             5       0       5       0 NA     NA    
#>  5 Per base N content                    5       0       5       0 NA     NA    
#>  6 Per base sequence content             5       5       0       0 S1, S… NA    
#>  7 Per base sequence quality             5       0       5       0 NA     NA    
#>  8 Per sequence GC content               5       2       0       3 S3, S4 S1, S…
#>  9 Per sequence quality scores           5       0       5       0 NA     NA    
#> 10 Per tile sequence quality             5       0       5       0 NA     NA    
#> 11 Sequence Duplication Levels           5       0       5       0 NA     NA    
#> 12 Sequence Length Distribution          5       0       0       5 NA     S1, S…

# General statistics of the FastQC reports
qc_stats(qc)
#> # A tibble: 5 × 5
#>   sample pct.dup pct.gc tot.seq  seq.length
#>   <chr>    <dbl>  <dbl> <chr>    <chr>     
#> 1 S1        17.2     48 50299587 35-76     
#> 2 S2        15.7     48 50299587 35-76     
#> 3 S3        22.1     49 67255341 35-76     
#> 4 S4        19.9     49 67255341 35-76     
#> 5 S5        18.2     48 65011962 35-76
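
The output above shows tot.seq and seq.length as character columns, so they need converting before any arithmetic or numeric sorting. The following base R sketch shows one way to do that. To stay runnable without the package, it rebuilds the table by hand from the values printed above; the names stats, len, min.length and max.length are hypothetical.

```r
# Values copied from the qc_stats(qc) output printed above.
stats <- data.frame(
  sample     = c("S1", "S2", "S3", "S4", "S5"),
  pct.dup    = c(17.2, 15.7, 22.1, 19.9, 18.2),
  pct.gc     = c(48, 48, 49, 49, 48),
  tot.seq    = c("50299587", "50299587", "67255341", "67255341", "65011962"),
  seq.length = rep("35-76", 5),
  stringsAsFactors = FALSE
)

# tot.seq is stored as text, so convert it before doing arithmetic.
stats$tot.seq <- as.numeric(stats$tot.seq)

# seq.length may be a range such as "35-76"; split it into min and max.
# A single value (no dash) ends up in both columns.
len <- strsplit(stats$seq.length, "-", fixed = TRUE)
stats$min.length <- as.numeric(vapply(len, function(x) x[1], character(1)))
stats$max.length <- as.numeric(vapply(len, function(x) x[length(x)], character(1)))

# Samples ordered from most to least duplicated.
stats[order(stats$pct.dup, decreasing = TRUE), c("sample", "pct.dup", "tot.seq")]
```

With a real result, the same conversions can be applied directly to the data frame returned by qc_stats(qc).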