CRISPRessoPooled Mixed Mode (Amplicons + Genome)

Mixed Mode Input

In this mode, the tool first aligns the reads to the genome and, as in the Genome mode, discovers aligned regions whose read count exceeds a tunable threshold. Next, it aligns the amplicon sequences to the reference genome and keeps only the reads that match both the amplicon locations and the discovered genomic locations, excluding spurious reads from other regions and reads that were not properly trimmed. Finally, CRISPResso is run on each of the surviving regions.
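The filtering logic described above can be illustrated with a small toy sketch. This is not the CRISPRessoPooled implementation: the `Region` type, the `filter_reads` function, the `min_reads` parameter, and the use of `>=` for the threshold test are all assumptions made for illustration, and each read is reduced to the genomic interval it aligned to.

```python
from collections import Counter
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical genomic interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True when both intervals lie on the same chromosome and intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_reads(read_regions, amplicon_locations, min_reads):
    """Count, per amplicon, the reads that survive both filters.

    read_regions: one Region per aligned read (toy stand-in for an alignment).
    amplicon_locations: dict of amplicon name -> Region on the genome.
    min_reads: assumed threshold for a region to count as "discovered".
    """
    counts = Counter(read_regions)
    # Step 1: discover regions supported by enough reads (as in Genome mode).
    discovered = {region for region, n in counts.items() if n >= min_reads}
    # Step 2: keep only reads in discovered regions that match an amplicon location.
    kept = {}
    for name, location in amplicon_locations.items():
        n_kept = sum(counts[r] for r in discovered if overlaps(r, location))
        if n_kept > 0:
            kept[name] = n_kept
    return kept
```

In this sketch, an amplicon whose genomic location has too few supporting reads is dropped, and reads from discovered regions that match no amplicon are ignored, which mirrors the "both conditions" requirement of the Mixed mode.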

To run the tool in this mode the user must provide:

  1. Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  2. A description file containing the amplicon sequences used to enrich regions in the genome and some additional information (as described in the Amplicons mode section).

  3. The reference genome in bowtie2 format (as described in the Genome mode section).

  4. Optionally, the gene annotations from UCSC (as described in the Genome mode section).

Mixed Mode Output

The output of CRISPRessoPooled Mixed Amplicons + Genome mode consists of these files:

  1. REPORT_READS_ALIGNED_TO_GENOME_AND_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    1. Amplicon_Specific_fastq.gz_filename: name of the file containing the raw reads recovered for the amplicon.

    2. n_reads: number of reads recovered for the amplicon.

    3. Gene_overlapping: gene(s) overlapping the amplicon region.

    4. chr_id: chromosome of the amplicon in the reference genome.

    5. bpstart: start coordinate of the amplicon in the reference genome.

    6. bpend: end coordinate of the amplicon in the reference genome.

    7. Reference_Sequence: sequence in the reference genome for the region mapped for the amplicon.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the amplicons with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.
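As a quick way to inspect the first output file, the sketch below lists the amplicons that recovered at least a given number of reads, together with their genomic coordinates. It uses only the column names listed above (`n_reads`, `chr_id`, `bpstart`, `bpend`). It assumes the report is a tab-delimited text file with a header row, and that an empty or non-numeric `n_reads` value means no reads were recovered; the function name `regions_with_reads` and its `min_reads` parameter are hypothetical.

```python
import csv


def regions_with_reads(report_path, min_reads=1):
    """Return (chr_id, bpstart, bpend, n_reads) for amplicons with enough reads.

    Assumes a tab-delimited report with a header row containing the
    columns n_reads, chr_id, bpstart and bpend.
    """
    kept = []
    with open(report_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            raw = (row.get("n_reads") or "").strip()
            try:
                # Accept counts written as "1500" or "1500.0".
                n_reads = int(float(raw))
            except ValueError:
                # Assumption: empty or NaN counts mean nothing was recovered.
                continue
            if n_reads >= min_reads:
                kept.append(
                    (
                        row["chr_id"],
                        int(float(row["bpstart"])),
                        int(float(row["bpend"])),
                        n_reads,
                    )
                )
    return kept
```

For example, `regions_with_reads("REPORT_READS_ALIGNED_TO_GENOME_AND_AMPLICONS.txt", min_reads=1000)` would return only the amplicons with at least 1000 recovered reads, provided the file matches the assumptions stated above.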

The Mixed mode combines the benefits of the two previous running modes. It recovers, in an unbiased way, all the genomic regions contained in the library, which makes it possible to detect contamination or mapping artifacts. In addition, because the location of each amplicon in the reference genome is known, reads that are not properly trimmed or that map to pseudogenes or other problematic regions are automatically discarded, providing the cleanest set of reads for quantifying mutations in the target regions with CRISPResso.

If the focus of the analysis is to obtain the best quantification of editing efficiency for a set of amplicons, we suggest running the tool in the Mixed mode. The Genome mode is instead suggested for checking problematic libraries, since a report is generated for each region discovered, even if the region cannot be mapped to any amplicon (however, this may be time consuming). Finally, the Amplicons mode is the fastest, although the least reliable in terms of quantification accuracy.