CRISPRessoPooled Amplicons Mode
Amplicons Mode Input
Given a set of amplicon sequences, in this mode the tool demultiplexes the reads by aligning each read to the amplicon with the best alignment, and creates one compressed FASTQ file per amplicon. Reads that do not align to any amplicon are discarded. After this preprocessing, CRISPResso is run on each FASTQ file and a separate report is generated for each amplicon.
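The demultiplexing idea (assign each read to its best-matching amplicon, discard reads that match none) can be illustrated with a toy sketch. This is not how CRISPRessoPooled aligns reads: the similarity score from Python's `difflib` is a stand-in for a real aligner, and the function names and the `min_similarity` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def assign_read(read, amplicons, min_similarity=0.6):
    """Return the name of the most similar amplicon, or None to discard the read.

    Assumption: difflib's ratio() stands in for a real alignment score.
    """
    best_name, best_score = None, 0.0
    for name, sequence in amplicons.items():
        score = SequenceMatcher(None, read, sequence, autojunk=False).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None


def demultiplex(reads, amplicons, min_similarity=0.6):
    """Group reads by best-matching amplicon; unassigned reads are dropped."""
    bins = {name: [] for name in amplicons}
    for read in reads:
        name = assign_read(read, amplicons, min_similarity)
        if name is not None:
            bins[name].append(read)
    return bins


amplicons = {
    "Site1": "CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC",
    "Site2": "GTCCTGGTTTTTGGTTTGGGAAATATAGTCATC",
}
reads = [amplicons["Site1"], amplicons["Site2"], "NNNNNNNNNNNNNNNNNNNN"]
bins = demultiplex(reads, amplicons)
```

In the real tool each bin would be written to its own fastq.gz file and passed to CRISPResso; the sketch only shows the assignment step.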
To run the tool in this mode the user must provide:
- Paired-end reads (two files) or single-end reads (one file) in FASTQ format (fastq.gz files are also accepted).
- A description file containing the amplicon sequences used to enrich regions in the genome, plus some additional information. This file is a tab-delimited text file with up to 14 columns (only the first 2 are required):
  - AMPLICON_NAME: an identifier for the amplicon (must be unique).
  - AMPLICON_SEQUENCE: amplicon sequence used in the design of the experiment.
  - sgRNA_SEQUENCE (OPTIONAL): sgRNA sequence used for this amplicon, without the PAM sequence. If not available, enter NA.
  - EXPECTED_AMPLICON_AFTER_HDR (OPTIONAL): expected amplicon sequence in case of HDR. If more than one, separate them with commas and no spaces. If not available, enter NA.
  - CODING_SEQUENCE (OPTIONAL): subsequence(s) of the amplicon corresponding to coding sequences. If more than one, separate them with commas and no spaces. If not available, enter NA.
  - PRIME_EDITING_PEGRNA_SPACER_SEQ (OPTIONAL): pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9 the PAM would be on the right side of the given sequence. If not available, enter NA.
  - PRIME_EDITING_NICKING_GUIDE_SEQ (OPTIONAL): nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9 the PAM would be on the right side of the sequence. If not available, enter NA.
  - PRIME_EDITING_PEGRNA_EXTENSION_SEQ (OPTIONAL): extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that it starts with the RT template including the edit, followed by the primer-binding site (PBS). If not available, enter NA.
  - PRIME_EDITING_PEGRNA_SCAFFOLD_SEQ (OPTIONAL): if given, reads containing any of this scaffold sequence before the extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'. If not available, enter NA.
  - PRIME_EDITING_PEGRNA_SCAFFOLD_MIN_MATCH_LENGTH (OPTIONAL): minimum number of bases matching the scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences. If not available, enter NA.
  - PRIME_EDITING_OVERRIDE_PRIME_EDITED_REF_SEQ (OPTIONAL): if given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence. If not available, enter NA.
  - QWC or QUANTIFICATION_WINDOW_COORDINATES (OPTIONAL): bp positions in the amplicon sequence specifying the quantification window. Any indels or substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. A range is written as "start-stop", and multiple ranges are separated by an underscore (_). A value of 0 disables this filter. If not available, enter NA.
  - W or QUANTIFICATION_WINDOW_SIZE (OPTIONAL): size (in bp) of the quantification window extending from the position specified by the --cleavage_offset or --quantification_window_center parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window, and indels in the entire amplicon are considered. The default is 1, meaning 1 bp on each side of the cleavage position for a total length of 2 bp. If not available, enter NA.
  - WC or QUANTIFICATION_WINDOW_CENTER (OPTIONAL): center of the quantification window, given with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3, which is suitable for the Cas9 system. For alternate nucleases other cleavage offsets may be appropriate; for example, for Cpf1/Cas12a this parameter would be set to 1. For base editors, this could be set to -17. If not available, enter NA.
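The QWC syntax described above ("start-stop" ranges joined by underscores, 0-based, with 0 or NA meaning no restriction) can be expanded into explicit positions with a short sketch. The function name is hypothetical, and treating `stop` as inclusive is an assumption; the description above does not say whether the end position is included.

```python
def parse_quantification_window(value):
    """Expand a QWC string such as '5-7_10-11' into sorted 0-based positions.

    Returns None when no window restriction applies ('NA', '0', or empty).
    """
    value = value.strip()
    if value in ("NA", "0", ""):
        return None
    positions = set()
    for part in value.split("_"):
        start_text, stop_text = part.split("-")
        start, stop = int(start_text), int(stop_text)
        if start < 0 or stop < start:
            raise ValueError(f"invalid range: {part!r}")
        # Assumption: the stop position is inclusive.
        positions.update(range(start, stop + 1))
    return sorted(positions)


window = parse_quantification_window("5-7_10-11")
```

Under the inclusive assumption, the example covers positions 5, 6, 7, 10 and 11 of the amplicon.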
A file in the correct format should look like this:
Site1 CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC CCCTGTGCCCAGCCC NA NA
Site2 GTCCTGGTTTTTGGTTTGGGAAATATAGTCATC NA GTCCTGGTTTTTGGTTTAAAAAAATATAGTCATC NA
Site3 TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC NA NA GGAAATATA
The user can easily create this file with any text editor or with spreadsheet software such as Excel (Microsoft), Numbers (Apple) or Sheets (Google Docs), and then save it as a tab-delimited file.
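The file can also be generated from a script. The sketch below writes one tab-delimited row per amplicon, fills missing optional columns with NA, and checks that names are unique. The function name, the dictionary keys used for the QWC/W/WC columns, and the output filename `AMPLICONS_FILE.txt` are illustrative assumptions; no header row is written, matching the example above.

```python
import csv
import os
import tempfile

# Column order as listed in the description above (14 columns).
COLUMNS = [
    "AMPLICON_NAME",
    "AMPLICON_SEQUENCE",
    "sgRNA_SEQUENCE",
    "EXPECTED_AMPLICON_AFTER_HDR",
    "CODING_SEQUENCE",
    "PRIME_EDITING_PEGRNA_SPACER_SEQ",
    "PRIME_EDITING_NICKING_GUIDE_SEQ",
    "PRIME_EDITING_PEGRNA_EXTENSION_SEQ",
    "PRIME_EDITING_PEGRNA_SCAFFOLD_SEQ",
    "PRIME_EDITING_PEGRNA_SCAFFOLD_MIN_MATCH_LENGTH",
    "PRIME_EDITING_OVERRIDE_PRIME_EDITED_REF_SEQ",
    "QUANTIFICATION_WINDOW_COORDINATES",
    "QUANTIFICATION_WINDOW_SIZE",
    "QUANTIFICATION_WINDOW_CENTER",
]


def write_amplicons_file(path, amplicons):
    """Write a list of dicts as a tab-delimited amplicon description file."""
    names = [a.get("AMPLICON_NAME") for a in amplicons]
    if len(names) != len(set(names)):
        raise ValueError("AMPLICON_NAME values must be unique")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for amplicon in amplicons:
            if not amplicon.get("AMPLICON_NAME") or not amplicon.get("AMPLICON_SEQUENCE"):
                raise ValueError("the first two columns are required")
            # Missing or empty optional values become NA.
            writer.writerow([str(amplicon.get(col) or "NA") for col in COLUMNS])


rows = [
    {
        "AMPLICON_NAME": "Site1",
        "AMPLICON_SEQUENCE": "CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC",
        "sgRNA_SEQUENCE": "CCCTGTGCCCAGCCC",
    },
    {
        "AMPLICON_NAME": "Site3",
        "AMPLICON_SEQUENCE": "TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC",
        "CODING_SEQUENCE": "GGAAATATA",
    },
]
path = os.path.join(tempfile.mkdtemp(), "AMPLICONS_FILE.txt")
write_amplicons_file(path, rows)
with open(path) as handle:
    lines = handle.read().splitlines()
```

Writing all 14 columns with explicit NA values keeps every row the same width, which avoids the misaligned columns that hand-edited files sometimes have.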
Amplicons Mode Output
The output of CRISPRessoPooled Amplicons mode consists of:
- REPORT_READS_ALIGNED_TO_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:
  - Demultiplexed_fastq.gz_filename: name of the file containing the raw reads for each amplicon.
  - n_reads: number of reads recovered for each amplicon.
- A set of fastq.gz files, one for each amplicon.
- A set of folders, one for each amplicon, each containing a full CRISPResso report.
- SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).
- CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.
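As a quick check on a finished run, the read-count report can be scanned for amplicons with few recovered reads. This is a minimal sketch under assumptions: it supposes that REPORT_READS_ALIGNED_TO_AMPLICONS.txt is tab-delimited with a header row, that the first column holds the amplicon name, and that the count column is named n_reads as listed above. The function name, the header names other than the two documented columns, and the sample rows are made up for illustration and are not real results.

```python
import csv
import io


def amplicons_below(report, min_reads):
    """Return names of amplicons whose n_reads value is below min_reads.

    Assumptions: tab-delimited input, header row present, name in column 1.
    """
    reader = csv.DictReader(report, delimiter="\t")
    name_column = reader.fieldnames[0]
    return [row[name_column] for row in reader if int(row["n_reads"]) < min_reads]


# Illustrative content only; a real report carries all input columns as well.
example_report = (
    "Name\tAmplicon_Sequence\tDemultiplexed_fastq.gz_filename\tn_reads\n"
    "Site1\tACGT\tSite1.fastq.gz\t1500\n"
    "Site2\tACGT\tSite2.fastq.gz\t12\n"
)
low = amplicons_below(io.StringIO(example_report), min_reads=100)
```

For a real run, pass an open file handle for the report in place of the `io.StringIO` object.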