CRISPRessoWGS Examples
Example:
Using Bioconda:
CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350
Using Docker:
docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350
The output of this command consists of:
- REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt: this file contains the same information provided in the input description file, plus some additional columns:
a. sequence: sequence in the reference genome for the region specified.
b. gene_overlapping: gene/s overlapping the region specified.
c. n_reads: number of reads recovered for the region.
d. bam_file_with_reads_in_region: file containing only the subset of reads that overlap the region, even partially. This file is indexed and can be loaded, for example, into IGV to visualize single reads or to compare two conditions. For example, in the figure below (fig X) we show reads mapped to a region inside the coding sequence of the gene Crygc subjected to NHEJ (CRISPR_WGS_SRR1542350) vs reads from a control experiment (CONTROL_WGS_SRR1542349).
e. fastq.gz_file_trimmed_reads_in_region: file containing only the subset of reads fully covering the specified regions, and trimmed to match the sequence in that region. These reads are used for the subsequent analysis with CRISPResso.
- ANALYZED_REGIONS (folder): this folder contains all the BAM and FASTQ files, one for each region analyzed.
- A set of folders, each containing the CRISPResso report for one of the input regions with enough reads (the default is at least 10 reads; the threshold can be adjusted with the option --min_reads_to_use_region).
- CRISPRessoWGS_RUNNING_LOG.txt: execution log and messages for the external utilities called.
This utility is particularly useful for investigating and quantifying mutation frequency in a list of potential target or off-target sites, obtained for example from prediction tools or from other orthogonal assays.
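As a quick check of which regions pass the read threshold, the report can be filtered on its n_reads column. The following is a minimal sketch, assuming the report is tab-delimited with a header row. The toy file, its values, and every column name except n_reads are made up for illustration; point `report` at the real REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt instead.

```shell
#!/bin/sh
# Toy stand-in for REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt.
# ASSUMPTION: tab-delimited with a header row. All values, and every
# column name except n_reads, are hypothetical.
report=$(mktemp)
printf 'chr_id\tbpstart\tbpend\tname\tn_reads\n'  > "$report"
printf 'chr1\t100\t200\tregion_A\t42\n'          >> "$report"
printf 'chr1\t900\t990\tregion_B\t3\n'           >> "$report"
printf 'chr2\t500\t600\tregion_C\t10\n'          >> "$report"

min_reads=10   # same value as the documented default of --min_reads_to_use_region

# Locate the n_reads column by its header name, then print the region name
# (column 4 in this toy file) for rows at or above the threshold.
kept=$(awk -F '\t' -v min="$min_reads" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "n_reads") col = i; next }
    col && $col + 0 >= min { print $4 }
' "$report")

echo "$kept"
rm -f "$report"
```

Looking up the column by header name rather than by position keeps the filter independent of how many columns the input description file contributes.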