Aggregate multiple FastQC reports into a data frame.
qc_aggregate(qc.dir = ".", progressbar = TRUE)
# S3 method for qc_aggregate
summary(object, ...)
qc_stats(object)
qc.dir: path to the FastQC result directory to scan.
progressbar: logical value. If TRUE, shows a progress bar.
object: an object of class qc_aggregate.
...: other arguments.
qc_aggregate() returns an object of class qc_aggregate which is a (tibble) data frame with the following column names:
sample: sample names
module: FastQC modules
status: FastQC module status for each sample
tot.seq: total sequences (i.e., the number of reads)
seq.length: sequence length
pct.gc: % of GC content
pct.dup: % of duplicate reads
summary: Generates a summary of qc_aggregate. Returns a data frame with the following columns:
module: FastQC modules
nb_samples: the number of samples tested
nb_pass, nb_fail, nb_warn: the number of samples that passed, failed and warned, respectively.
failed, warned: the names of the samples that failed and warned, respectively.
qc_stats: Returns a data frame containing general statistics of the FastQC reports. Columns are: sample, pct.dup, pct.gc, tot.seq and seq.length.
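To make the summary columns concrete, the following base R sketch counts statuses per module on a toy data frame that mimics the sample, module and status columns of a qc_aggregate object. The data and the names qc_toy, counts and failed are made up for illustration, and this is not how summary() is implemented in the package; it only shows what nb_pass, nb_warn, nb_fail and failed represent.

```r
# Toy stand-in for a qc_aggregate result (hypothetical values, not real FastQC output)
qc_toy <- data.frame(
  sample = rep(c("S1", "S2", "S3"), each = 2),
  module = rep(c("Basic Statistics", "Per sequence GC content"), times = 3),
  status = c("PASS", "WARN", "PASS", "WARN", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# Fix the levels so every status gets a column, even when its count is zero
status <- factor(qc_toy$status, levels = c("PASS", "WARN", "FAIL"))

# Cross-tabulate modules against statuses (analogous to nb_pass, nb_warn, nb_fail)
counts <- table(module = qc_toy$module, status = status)
counts

# Names of the samples that failed each module (analogous to the failed column)
is_fail <- qc_toy$status == "FAIL"
failed <- tapply(qc_toy$sample[is_fail], qc_toy$module[is_fail],
                 paste, collapse = ", ")
failed
```

Modules with no failing sample simply do not appear in failed here, whereas the summary output reports NA for them.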
qc_aggregate(): Aggregate FastQC reports for multiple samples.
qc_stats(): Creates general statistics of FastQC reports.
# Demo QC dir
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.dir
#> [1] "/private/var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/RtmpT6jSz8/temp_libpatha9b37e9f6eab/fastqcr/fastqc_results"
# List of files in the directory
list.files(qc.dir)
#> [1] "S1_fastqc.zip" "S2_fastqc.zip" "S3_fastqc.zip" "S4_fastqc.zip"
#> [5] "S5_fastqc.zip"
# Aggregate the report
qc <- qc_aggregate(qc.dir, progressbar = FALSE)
qc
#> # A tibble: 60 × 7
#>    sample module                       status tot.seq  seq.length pct.gc pct.dup
#>  * <chr>  <chr>                        <chr>  <chr>    <chr>       <dbl>   <dbl>
#>  1 S1     Basic Statistics             PASS   50299587 35-76          48    17.2
#>  2 S1     Per base sequence quality    PASS   50299587 35-76          48    17.2
#>  3 S1     Per tile sequence quality    PASS   50299587 35-76          48    17.2
#>  4 S1     Per sequence quality scores  PASS   50299587 35-76          48    17.2
#>  5 S1     Per base sequence content    FAIL   50299587 35-76          48    17.2
#>  6 S1     Per sequence GC content      WARN   50299587 35-76          48    17.2
#>  7 S1     Per base N content           PASS   50299587 35-76          48    17.2
#>  8 S1     Sequence Length Distribution WARN   50299587 35-76          48    17.2
#>  9 S1     Sequence Duplication Levels  PASS   50299587 35-76          48    17.2
#> 10 S1     Overrepresented sequences    PASS   50299587 35-76          48    17.2
#> # … with 50 more rows
# Generates a summary of qc_aggregate
summary(qc)
#> # A tibble: 12 × 7
#> # Groups:   module [12]
#>    module                       nb_samples nb_fail nb_pass nb_warn failed warned
#>    <chr>                             <dbl>   <dbl>   <dbl>   <dbl> <chr>  <chr>
#>  1 Adapter Content                       5       0       5       0 NA     NA
#>  2 Basic Statistics                      5       0       5       0 NA     NA
#>  3 Kmer Content                          5       0       5       0 NA     NA
#>  4 Overrepresented sequences             5       0       5       0 NA     NA
#>  5 Per base N content                    5       0       5       0 NA     NA
#>  6 Per base sequence content             5       5       0       0 S1, S… NA
#>  7 Per base sequence quality             5       0       5       0 NA     NA
#>  8 Per sequence GC content               5       2       0       3 S3, S4 S1, S…
#>  9 Per sequence quality scores           5       0       5       0 NA     NA
#> 10 Per tile sequence quality             5       0       5       0 NA     NA
#> 11 Sequence Duplication Levels           5       0       5       0 NA     NA
#> 12 Sequence Length Distribution          5       0       0       5 NA     S1, S…
# General statistics of fastqc reports.
qc_stats(qc)
#> # A tibble: 5 × 5
#>   sample pct.dup pct.gc tot.seq  seq.length
#>   <chr>    <dbl>  <dbl> <chr>    <chr>
#> 1 S1        17.2     48 50299587 35-76
#> 2 S2        15.7     48 50299587 35-76
#> 3 S3        22.1     49 67255341 35-76
#> 4 S4        19.9     49 67255341 35-76
#> 5 S5        18.2     48 65011962 35-76
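In the output above, tot.seq and seq.length are character columns, so they need converting before any arithmetic or sorting. A minimal base R sketch follows, using a hand-built data frame with values copied from the qc_stats() output. The names stats, tot.seq.num, min.length and max.length are illustrative, and the code assumes seq.length is either a single number or a min-max range such as 35-76.

```r
# Values copied from the qc_stats() output shown above
stats <- data.frame(
  sample = c("S1", "S2", "S3", "S4", "S5"),
  tot.seq = c("50299587", "50299587", "67255341", "67255341", "65011962"),
  seq.length = rep("35-76", 5),
  stringsAsFactors = FALSE
)

# tot.seq is character, so convert it before doing arithmetic
stats$tot.seq.num <- as.numeric(stats$tot.seq)

# Split seq.length on "-" and keep the first and last pieces as min and max
len <- strsplit(stats$seq.length, "-", fixed = TRUE)
stats$min.length <- as.numeric(vapply(len, function(x) x[1], character(1)))
stats$max.length <- as.numeric(vapply(len, function(x) x[length(x)], character(1)))

# Samples ordered by read count, largest first
stats[order(stats$tot.seq.num, decreasing = TRUE), c("sample", "tot.seq.num")]
```

With a real result, the same conversions would be applied to the data frame returned by qc_stats(qc) instead of the hand-built stats.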