Read FastQC data into R.

qc_read(file, modules = "all", verbose = TRUE)

Arguments

file

Path to the file to be imported. Can be the path to either :

  • the fastqc zipped file (e.g.: 'path/to/samplename_fastqc.zip'). No need to unzip,

  • or the unzipped folder name (e.g.: 'path/to/samplename_fastqc'),

  • or the sample name (e.g.: 'path/to/samplename' )

  • or the fastqc_data.txt file,

modules

Character vector containing the names of FastQC modules for which you want to import/inspect the data. Default is all. Allowed values include one or the combination of:

  • "Summary",

  • "Basic Statistics",

  • "Per base sequence quality",

  • "Per tile sequence quality",

  • "Per sequence quality scores",

  • "Per base sequence content",

  • "Per sequence GC content",

  • "Per base N content",

  • "Sequence Length Distribution",

  • "Sequence Duplication Levels",

  • "Overrepresented sequences",

  • "Adapter Content",

  • "Kmer Content"

Partial match of module names allowed. For example, you can use modules = "GC content", instead of the full names modules = "Per sequence GC content".

verbose

logical value. If TRUE, print filename when reading.

Value

Returns a list of tibbles containing the data for specified modules.

Examples

# Demo file qc.file <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr") qc.file
#> [1] "/Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip"
# Read all modules qc_read(qc.file)
#> Reading: /Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip
#> $summary #> # A tibble: 12 x 3 #> status module sample #> <chr> <chr> <chr> #>  1 PASS Basic Statistics S1.fastq #>  2 PASS Per base sequence quality S1.fastq #>  3 PASS Per tile sequence quality S1.fastq #>  4 PASS Per sequence quality scores S1.fastq #>  5 FAIL Per base sequence content S1.fastq #>  6 WARN Per sequence GC content S1.fastq #>  7 PASS Per base N content S1.fastq #>  8 WARN Sequence Length Distribution S1.fastq #>  9 PASS Sequence Duplication Levels S1.fastq #> 10 PASS Overrepresented sequences S1.fastq #> 11 PASS Adapter Content S1.fastq #> 12 PASS Kmer Content S1.fastq #> #> $basic_statistics #> # A tibble: 8 x 2 #> `Basic Statistics` pass #> <chr> <chr> #> 1 Measure Value #> 2 Filename S1.fastq #> 3 File type Conventional base calls #> 4 Encoding Sanger / Illumina 1.9 #> 5 Total Sequences 50299587 #> 6 Sequences flagged as poor quality 0 #> 7 Sequence length 35-76 #> 8 %GC 48 #> #> $per_base_sequence_quality #> # A tibble: 43 x 7 #> Base Mean Median `Lower Quartile` `Upper Quartile` `10th Percentil… #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> #>  1 1 31.2 32 32 32 32 #>  2 2 31.5 32 32 32 32 #>  3 3 31.7 32 32 32 32 #>  4 4 31.7 32 32 32 32 #>  5 5 31.7 32 32 32 32 #>  6 6 35.3 36 36 36 36 #>  7 7 35.3 36 36 36 36 #>  8 8 35.3 36 36 36 36 #>  9 9 35.3 36 36 36 36 #> 10 10-11 35.3 36 36 36 36 #> # ... with 33 more rows, and 1 more variable: `90th Percentile` <dbl> #> #> $per_tile_sequence_quality #> # A tibble: 18,576 x 3 #> Tile Base Mean #> <dbl> <chr> <dbl> #>  1 11101 1 0.175 #>  2 11101 2 0.0478 #>  3 11101 3 0.0668 #>  4 11101 4 0.0558 #>  5 11101 5 0.0485 #>  6 11101 6 0.0194 #>  7 11101 7 0.104 #>  8 11101 8 0.0629 #>  9 11101 9 0.103 #> 10 11101 10-11 0.0580 #> # ... with 18,566 more rows #> #> $per_sequence_quality_scores #> # A tibble: 34 x 2 #> Quality Count #> <dbl> <dbl> #>  1 2 75 #>  2 3 0 #>  3 4 0 #>  4 5 0 #>  5 6 0 #>  6 7 0 #>  7 8 0 #>  8 9 0 #>  9 10 0 #> 10 11 0 #> # ... with 24 more rows #> #> $per_base_sequence_content #> # A tibble: 43 x 5 #> Base G A T C #> <chr> <dbl> <dbl> <dbl> <dbl> #>  1 1 24.1 27.4 24.5 24.0 #>  2 2 23.5 27.2 25.5 23.8 #>  3 3 23.2 25.8 26.3 24.7 #>  4 4 23.5 25.9 26.2 24.3 #>  5 5 23.7 26.3 26.1 23.9 #>  6 6 24.3 25.4 25.6 24.7 #>  7 7 24.1 25.7 26.1 24.1 #>  8 8 23.5 25.8 26.2 24.5 #>  9 9 23.5 25.6 26.4 24.5 #> 10 10-11 23.6 25.9 26.4 24.1 #> # ... with 33 more rows #> #> $per_sequence_gc_content #> # A tibble: 101 x 2 #> `GC Content` Count #> <dbl> <dbl> #>  1 0 81 #>  2 1 44 #>  3 2 14 #>  4 3 39.5 #>  5 4 58 #>  6 5 78.5 #>  7 6 143 #>  8 7 264. #>  9 8 342. #> 10 9 428. #> # ... with 91 more rows #> #> $per_base_n_content #> # A tibble: 43 x 2 #> Base `N-Count` #> <chr> <dbl> #>  1 1 0.0634 #>  2 2 0.000310 #>  3 3 0.000270 #>  4 4 0.000153 #>  5 5 0.000149 #>  6 6 0.00938 #>  7 7 0.00256 #>  8 8 0.000260 #>  9 9 0.000282 #> 10 10-11 0.000604 #> # ... with 33 more rows #> #> $sequence_length_distribution #> # A tibble: 42 x 2 #> Length Count #> <dbl> <dbl> #>  1 35 1282 #>  2 36 144 #>  3 37 160 #>  4 38 172 #>  5 39 177 #>  6 40 164 #>  7 41 174 #>  8 42 183 #>  9 43 167 #> 10 44 198 #> # ... with 32 more rows #> #> $sequence_duplication_levels #> # A tibble: 16 x 3 #> `Duplication Level` `Percentage of deduplicated` `Percentage of total` #> <chr> <dbl> <dbl> #>  1 1 83.8 69.4 #>  2 2 12.7 21.1 #>  3 3 2.63 6.53 #>  4 4 0.591 1.96 #>  5 5 0.152 0.629 #>  6 6 0.0400 0.199 #>  7 7 0.0128 0.0744 #>  8 8 0.00532 0.0352 #>  9 9 0.00243 0.0181 #> 10 >10 0.00393 0.0415 #> 11 >50 0.0000298 0.00218 #> 12 >100 0.0000247 0.00524 #> 13 >500 0.00000300 0.00202 #> 14 >1k 0.00000266 0.00260 #> 15 >5k 0 0 #> 16 >10k+ 0.00000241 0.0565 #> #> $overrepresented_sequences #> # A tibble: 0 x 0 #> #> $adapter_content #> # A tibble: 64 x 5 #> Position `Illumina Unive… `Illumina Small… `Nextera Transp… `SOLID Small RN… #> <dbl> <dbl> <dbl> <dbl> <dbl> #>  1 1 0.00000994 0.00000199 0.0000139 0 #>  2 2 0.0000119 0.00000199 0.0000258 0.00000199 #>  3 3 0.0000179 0.00000398 0.0000358 0.00000596 #>  4 4 0.0000258 0.00000398 0.0000437 0.00000596 #>  5 5 0.0000338 0.00000398 0.0000497 0.00000596 #>  6 6 0.0000378 0.00000398 0.0000596 0.00000795 #>  7 7 0.0000437 0.00000398 0.0000676 0.00000795 #>  8 8 0.0000537 0.00000398 0.0000736 0.00000994 #>  9 9 0.0000557 0.00000398 0.0000835 0.00000994 #> 10 10 0.0000557 0.00000398 0.0000954 0.00000994 #> # ... with 54 more rows #> #> $kmer_content #> # A tibble: 0 x 0 #> #> $total_deduplicated_percentage #> [1] 82.76 #> #> attr(,"class") #> [1] "list" "qc_read"
# Read a specified module qc_read(qc.file,"Per base sequence quality")
#> Reading: /Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip
#> $per_base_sequence_quality #> # A tibble: 43 x 7 #> Base Mean Median `Lower Quartile` `Upper Quartile` `10th Percentil… #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> #>  1 1 31.2 32 32 32 32 #>  2 2 31.5 32 32 32 32 #>  3 3 31.7 32 32 32 32 #>  4 4 31.7 32 32 32 32 #>  5 5 31.7 32 32 32 32 #>  6 6 35.3 36 36 36 36 #>  7 7 35.3 36 36 36 36 #>  8 8 35.3 36 36 36 36 #>  9 9 35.3 36 36 36 36 #> 10 10-11 35.3 36 36 36 36 #> # ... with 33 more rows, and 1 more variable: `90th Percentile` <dbl> #> #> attr(,"class") #> [1] "list" "qc_read"