About

CRISPResso is a software pipeline designed to enable rapid and intuitive interpretation of genome editing experiments. A limited web implementation is available at: http://crispresso2.pinellolab.org/ or http://crispresso.com.

Briefly, CRISPResso:

  • Aligns sequencing reads to a reference sequence
  • Quantifies insertions, mutations and deletions to determine whether a read is modified or unmodified by genome editing
  • Summarizes editing results in intuitive plots and datasets

Tools

CRISPResso is a suite of complementary tools:

  • CRISPResso - for analyzing and interpreting single experimental conditions on a single amplicon
  • CRISPRessoBatch - for analyzing and comparing multiple experimental conditions at the same site
  • CRISPRessoPooled - for analyzing multiple amplicons from a pooled amplicon sequencing experiment
  • CRISPRessoWGS - for analyzing specific sites in whole-genome sequencing samples
  • CRISPRessoCompare - for comparing editing between two samples (e.g., treated vs control)
  • CRISPRessoAggregate - for aggregating results from previously-run CRISPResso analyses

How can you use CRISPResso?

CRISPResso can be used to analyze genome editing outcomes using cleaving nucleases (e.g. Cas9 or Cpf1) or noncleaving nucleases (e.g. base editors). The following operations can be automatically performed:

  • Filtering of low-quality reads
  • Adapter trimming
  • Alignment of reads to one or multiple reference sequences (in the case of multiple alleles)
  • Quantification of HDR and NHEJ outcomes (if the HDR sequence is provided)
  • Quantification of frameshift/in-frame mutations and identification of affected splice sites (if an exon sequence is provided)
  • Visualization of the indel distribution and position (for cleaving nucleases)
  • Visualization of distribution and position of substitutions (for base editors)
  • Visualization of alleles and their frequencies

CRISPResso processing

CRISPResso Schematic

Quality filtering

Input reads are first filtered based on the quality score (phred33) in order to remove potentially false positive indels. The filtering based on the phred33 quality score can be modulated by adjusting the optional parameters (see additional notes below).
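
As a rough illustration of what filtering on phred33 scores involves, the sketch below decodes a FASTQ quality string (each character's ASCII code minus 33) and applies an average-quality threshold and a single-base threshold. The function names are hypothetical and this is not CRISPResso's own implementation; the two thresholds only correspond conceptually to --min_average_read_quality and --min_single_bp_quality.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string into integer Phred scores (ASCII code - 33)."""
    return [ord(char) - 33 for char in quality_string]


def keep_read(quality_string, min_average=0, min_single_bp=0):
    """Return True if the read passes both (hypothetical) quality thresholds."""
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # an empty read carries no usable information
    average = sum(scores) / len(scores)
    return average >= min_average and min(scores) >= min_single_bp


# 'I' encodes Q40 and '#' encodes Q2 in phred33, so the average here is 32.4.
print(keep_read("IIII#", min_average=30))                    # True
print(keep_read("IIII#", min_average=30, min_single_bp=20))  # False: one base is Q2
```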

Adapter trimming

Next, adapters are trimmed from the reads. If no adapters are present, select 'No Trimming' under the 'Trimming adapter' heading in the optional parameters. If reads contain adapter sequences that need to be trimmed, select the adapters used for trimming under the 'Trimming adapter' heading in the optional parameters. Possible adapters include Nextera PE, TruSeq3 PE, TruSeq3 SE, TruSeq2 PE, and TruSeq2 SE. The adapters are trimmed from the reads using fastp.

Read merging

If paired-end reads are provided, reads are merged using fastp. This produces a single read for alignment to the amplicon sequence, and reduces sequencing errors that may be present at the end of sequencing reads.

Alignment

The preprocessed reads are then aligned to the reference sequence with a global sequence alignment algorithm that takes into account our biological knowledge of nuclease function. If multiple alleles are present at the editing site, each allele can be passed to CRISPResso and sequenced reads will be assigned to the reference sequence of origin.
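
For readers unfamiliar with global alignment, here is a minimal Needleman-Wunsch scoring sketch. It assumes a single linear gap penalty and illustrative match/mismatch values, and it leaves out the substitution matrix, the separate gap open/extend penalties and the cut-site gap incentive that CRISPResso's aligner exposes as parameters. Treat it as background, not as a reimplementation.

```python
def global_alignment_score(read, reference, match=5, mismatch=-4, gap=-2):
    """Score the best end-to-end alignment of two sequences (linear gap penalty)."""
    # previous[j] holds the best score for aligning the processed read prefix
    # against the first j bases of the reference.
    previous = [j * gap for j in range(len(reference) + 1)]
    for i in range(1, len(read) + 1):
        current = [i * gap]
        for j in range(1, len(reference) + 1):
            pair_score = match if read[i - 1] == reference[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + pair_score,  # align the two bases
                previous[j] + gap,             # gap in the reference (insertion)
                current[j - 1] + gap,          # gap in the read (deletion)
            ))
        previous = current
    return previous[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 20: four matches
print(global_alignment_score("ACGT", "AGT"))   # 13: three matches and one gap
```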

Visualization and analysis

Finally, after analyzing the aligned reads, a set of informative graphs is generated, allowing for the quantification and visualization of the position and type of outcomes within the amplicon sequence.

How is CRISPResso2 different from CRISPResso?

CRISPResso2 introduces four key innovations for the analysis of genome editing data:

  1. Comprehensive analysis of sequencing data from base editors. We have added additional analysis and visualization capabilities especially for experiments using base editors.
  2. Allele specific quantification of heterozygous references. If the targeted editing region has more than one allele, reads arising from each allele can be deconvoluted.
  3. A novel biologically-informed alignment algorithm. This algorithm incorporates knowledge about the mutations produced by gene editing tools to create more biologically-likely alignments.
  4. Ultra-fast processing time.

Installation

CRISPResso can be installed using the conda package manager Bioconda, or it can be run using the Docker containerization system.

Bioconda

To install CRISPResso using Bioconda, download and install Anaconda Python, following the instructions at: https://docs.anaconda.com/free/anaconda/install/.

Open a terminal and type:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

To install CRISPResso into the current conda environment, type:

conda install crispresso2

Alternatively, to create a new environment named crispresso2_env with CRISPResso, type:

conda create -n crispresso2_env -c bioconda crispresso2

Activate your conda environment:

conda activate crispresso2_env

Verify that CRISPResso is installed using the command:

CRISPResso -h

Bioconda for Apple Silicon

If you would like to install CRISPResso using Bioconda on a Mac with Apple silicon, you need to make a slight change. First, ensure that you have Rosetta installed. Next, you must tell Bioconda to install the Intel versions of the packages. If you would like to do this system-wide, which we recommend, run the command:

conda config --add subdirs osx-64

Then you can proceed with the installation instructions above.

If you would like to use the Intel versions in a single environment, then run:

CONDA_SUBDIR=osx-64 conda create -n crispresso2_env -c bioconda crispresso2

If you choose to use the CONDA_SUBDIR=osx-64 method, note that if you install additional packages into the environment you will need to add CONDA_SUBDIR=osx-64 to the beginning of each command. Alternatively, you could set this environment variable in your shell, but we recommend using the conda config --add subdirs osx-64 method because it is less error-prone.

Docker

CRISPResso can be used via the Docker containerization system. This system allows CRISPResso to run on your system without configuring and installing additional packages. To run CRISPResso, first download and install docker: https://docs.docker.com/engine/installation/.

Next, Docker must be configured to access your hard drive and to run with sufficient memory. These parameters can be found in the Docker settings menu. To allow Docker to access your hard drive, select 'Shared Drives' and make sure your drive name is selected. To adjust the memory allocation, select the 'Advanced' tab and allocate at least 4G of memory.

To run CRISPResso, make sure Docker is running, then open a command prompt (Mac) or Powershell (Windows). Change directories to the location where your data is, and run the following command:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso -h

The first time you run this command, it will download the Docker image. The -v parameter mounts the current directory to be accessible by CRISPResso, and the -w parameter sets the CRISPResso working directory. As long as you are running the command from the directory containing your data, you should not change the Docker -v or -w parameters.

Additional parameters for CRISPResso as described below can be added to this command. For example,

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso -r1 sample.fastq.gz -a ATTAACCAAG

CRISPResso

CRISPResso is designed to be run on a single amplicon. For experiments involving multiple amplicons in the same fastq, see the instructions for CRISPRessoPooled or CRISPRessoWGS.

CRISPResso requires only two parameters: input sequences in the form of fastq files (given by the --fastq_r1 and --fastq_r2 parameters), and the amplicon sequence to align to (given by the --amplicon_seq parameter).
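
If you drive CRISPResso from a script, the required parameters can be assembled programmatically. The helper below is a hypothetical convenience wrapper (not part of CRISPResso) that uses only flags documented on this page; the file name and amplicon are the placeholder values from the Docker example above.

```python
import shlex


def build_crispresso_command(fastq_r1, amplicon_seq, fastq_r2=None, name=None):
    """Assemble a CRISPResso argument list from the required inputs."""
    command = ["CRISPResso", "--fastq_r1", fastq_r1, "--amplicon_seq", amplicon_seq]
    if fastq_r2 is not None:
        command += ["--fastq_r2", fastq_r2]  # only needed for paired-end reads
    if name is not None:
        command += ["--name", name]
    return command


command = build_crispresso_command("sample.fastq.gz", "ATTAACCAAG")
print(shlex.join(command))
```

Once CRISPResso is installed, the resulting list can be passed to subprocess.run(command, check=True).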

CRISPResso Parameters

Fastq R1

-r1, --fastq_r1

Help: First fastq file

Type: str


Fastq R2

-r2, --fastq_r2

Help: Second fastq file for paired end reads

Type: str


Amplicon Sequence

-a, --amplicon_seq

Help: Amplicon Sequence (can be comma-separated list of multiple sequences)

Type: str


Amplicon Name

-an, --amplicon_name

Help: Amplicon Name (can be comma-separated list of multiple names, corresponding to amplicon sequences given in --amplicon_seq)

Type: str

Default: Reference


Amplicon Min Alignment Score

-amas, --amplicon_min_alignment_score

Help: Amplicon Minimum Alignment Score; score between 0 and 100; sequences must have at least this homology score with the amplicon to be aligned (can be comma-separated list of multiple scores, corresponding to amplicon sequences given in --amplicon_seq)

Type: str


Default Minimum Alignment Score

--default_min_aln_score, --min_identity_score

Help: Default minimum homology score for a read to align to a reference amplicon

Type: int

Default: 60


Expand Ambiguous Alignments

--expand_ambiguous_alignments

Help: If more than one reference amplicon is given, reads that align to multiple reference amplicons will count equally toward each amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Assign Ambiguous Alignments To First Reference

--assign_ambiguous_alignments_to_first_reference

Help: If more than one reference amplicon is given, ambiguous reads that align with the same score to multiple amplicons will be assigned to the first amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False
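
The three behaviors described for ambiguous reads (exclude by default, expand, or assign to the first reference) can be summarized in a short sketch. It assumes that "ambiguous" means a tie for the best alignment score, and the function is hypothetical rather than taken from CRISPResso.

```python
def assign_read(scores, min_score=60, expand_ambiguous=False, assign_to_first=False):
    """Return the indices of the reference amplicons a read is counted toward.

    scores: one alignment score (0-100) per reference amplicon, in input order.
    """
    best = max(scores)
    if best < min_score:
        return []        # the read does not align well enough to any amplicon
    tied = [index for index, score in enumerate(scores) if score == best]
    if len(tied) == 1:
        return tied      # unambiguous: a single best reference
    if expand_ambiguous:
        return tied      # count toward every tied amplicon
    if assign_to_first:
        return tied[:1]  # count toward the first tied amplicon only
    return []            # assumed default: ambiguous reads are excluded


print(assign_read([95, 95]))                         # []
print(assign_read([95, 95], expand_ambiguous=True))  # [0, 1]
print(assign_read([95, 95], assign_to_first=True))   # [0]
```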


Guide Seq

-g, --guide_seq, --sgRNA

Help: sgRNA sequence, if more than one, please separate by commas. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately adjacent to but not including the PAM sequence (5' of NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter '--quantification_window_center' nucleotides from the 3' end of the guide. In addition, the use of alternate nucleases besides SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the '--cleavage_offset' parameter to 1, since the default setting of -3 is suitable only for SpCas9.

Type: str
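
Before running an analysis it can help to check which strand of the amplicon the guide matches. The sketch below looks for exact matches of the guide and of its reverse complement; the helper names and the toy sequences are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def locate_guide(amplicon, guide):
    """Return (start_index, strand) for every exact occurrence of the guide."""
    amplicon, guide = amplicon.upper(), guide.upper()
    hits = []
    for strand, query in (("+", guide), ("-", reverse_complement(guide))):
        start = amplicon.find(query)
        while start != -1:
            hits.append((start, strand))
            start = amplicon.find(query, start + 1)
    return hits


toy_amplicon = "TTGACCTGAAGGCC"   # toy guide followed by an AGG PAM
toy_guide = "GACCTGA"
print(locate_guide(toy_amplicon, toy_guide))                      # [(2, '+')]
print(locate_guide(reverse_complement(toy_amplicon), toy_guide))  # [(5, '-')]
```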


Guide Name

-gn, --guide_name

Help: sgRNA names, if more than one, please separate by commas.

Type: str


Flexiguide Seq

-fg, --flexiguide_seq

Help: sgRNA sequence (flexible) (can be comma-separated list of multiple flexiguides). The flexiguide sequence will be aligned to the amplicon sequence(s), as long as the guide sequence has homology as set by --flexiguide_homology.

Type: str

Default: None


Flexiguide Homology

-fh, --flexiguide_homology

Help: flexiguides will yield guides in amplicons with at least this homology to the flexiguide sequence.

Type: int

Default: 80


Flexiguide Name

-fgn, --flexiguide_name

Help: flexiguide name

Type: str


Flexiguide Gap Open Penalty

--flexiguide_gap_open_penalty

Help:

Type: int

Default: -20


Flexiguide Gap Extend Penalty

--flexiguide_gap_extend_penalty

Help:

Type: int

Default: -2


Discard Guide Positions Overhanging Amplicon Edge

--discard_guide_positions_overhanging_amplicon_edge

Help: If set, for guides that align to multiple positions, guide positions will be discarded if plotting around those regions would include bp that extend beyond the end of the amplicon.

Type: bool

Default: False


Expected HDR Amplicon Sequence

-e, --expected_hdr_amplicon_seq

Help: Amplicon sequence expected after HDR

Type: str


Exon Specification Coding Sequence/s

-c, --coding_seq

Help: Subsequence/s of the amplicon sequence covering one or more coding sequences for frameshift analysis. If more than one (for example, split by intron/s), please separate by commas.

Type: str
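
The idea behind frameshift analysis is that an indel shifts the reading frame unless its net length change is a multiple of three. A minimal sketch, assuming the inserted and deleted base counts have already been restricted to the coding sequence (the function is hypothetical, not CRISPResso's code):

```python
def classify_coding_indel(inserted_bp, deleted_bp):
    """Classify a read by the net length change within the coding sequence."""
    if inserted_bp == 0 and deleted_bp == 0:
        return "no indel"
    net_change = inserted_bp - deleted_bp
    return "in-frame" if net_change % 3 == 0 else "frameshift"


print(classify_coding_indel(0, 3))  # in-frame
print(classify_coding_indel(1, 0))  # frameshift
```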


Config File

--config_file

Help: File path to JSON file with config elements

Type: str

Default: None


Minimum Average Read Quality (phred33 Scale)

-q, --min_average_read_quality

Help: Minimum average quality score (phred33) to keep a read

Type: int


Minimum Single bp Quality (phred33 Scale)

-s, --min_single_bp_quality

Help: Minimum single bp score (phred33) to keep a read

Type: int


Minimum bp Quality or N (phred33 Scale)

--min_bp_quality_or_N

Help: Bases with a quality score (phred33) less than this value will be set to 'N'

Type: int
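
Masking differs from the two read-level filters above in that the read is kept and only the unreliable bases are replaced. A minimal sketch of that behavior, with a hypothetical function name:

```python
def mask_low_quality_bases(sequence, quality_string, min_bp_quality):
    """Replace bases whose phred33 score is below the threshold with 'N'."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must be the same length")
    return "".join(
        base if ord(quality) - 33 >= min_bp_quality else "N"
        for base, quality in zip(sequence, quality_string)
    )


# 'I' is Q40 and '#' is Q2, so only the third base falls below Q20.
print(mask_low_quality_bases("ACGT", "II#I", 20))  # ACNT
```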


File Prefix

--file_prefix

Help: File prefix for output plots and tables

Type: str


Sample Name

-n, --name

Help: Output name of the report (default: the name is obtained from the filename of the fastq file/s used in input)

Type: str


Suppress Amplicon Name Truncation

--suppress_amplicon_name_truncation

Help: If set, amplicon names will not be truncated when creating output filename prefixes. If not set, amplicon names longer than 21 characters will be truncated when creating filename prefixes.

Type: bool

Default: False


Output Folder

-o, --output_folder

Help: Output folder to use for the analysis (default: current folder)

Type: str


Verbosity

-v, --verbosity

Help: Verbosity level of output to the console (1-4); 4 is the most verbose

Type: int

Default: 3


Split Interleaved Input

--split_interleaved_input, --split_paired_end

Help: Splits a single fastq file containing paired end reads into two files before running CRISPResso

Type: bool

Default: False


Trimming Adapter

--trim_sequences

Help: Enable the trimming with fastp

Type: bool

Default: False


Trimmomatic Command

--trimmomatic_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Trimmomatic Options String

--trimmomatic_options_string

Help: DEPRECATED in v2.3.0, use --fastp_options_string

Type: str


Flash Command

--flash_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Fastp Command

--fastp_command

Help: Command to run fastp

Type: str

Default: fastp


Fastp Options String

--fastp_options_string

Help: Override options for fastp, e.g. --length_required 70 --umi

Type: str


Min Paired End Reads Overlap

--min_paired_end_reads_overlap

Help: Parameter for the fastp read merging step. Minimum required overlap length between two reads to provide a confident overlap

Type: int

Default: 10


Max Paired End Reads Overlap

--max_paired_end_reads_overlap

Help: DEPRECATED in v2.3.0

Type: str

Default: None


Samtools Exclude Flags Core

--samtools_exclude_flags

Help: Exclude reads with any of the specified flags set in the SAM/BAM file. Flags can be specified in either base 16 (hex) or base 10. Default is 0 (no reads filtered).

Type: str

Default: 0
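
The exclusion test is a bitwise AND between a read's SAM flag and the excluded flags: a read is dropped if any excluded bit is set. The sketch below shows that logic with hypothetical helper names; it relies on Python's int(value, 0) to accept both decimal and 0x-prefixed hex strings.

```python
def parse_flags(value):
    """Parse a flag value written in decimal ('2308') or hex ('0x904')."""
    return int(value, 0)


def is_excluded(read_flag, exclude_flags):
    """Return True if the read has any of the excluded flag bits set."""
    return (read_flag & parse_flags(exclude_flags)) != 0


# 0x904 = 2308 = unmapped (4) + secondary (256) + supplementary (2048).
print(is_excluded(256, "0x904"))  # True: secondary alignment
print(is_excluded(99, "2308"))    # False: properly paired primary alignment
print(is_excluded(256, "0"))      # False: the default of 0 filters nothing
```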


Stringent Flash Merging

--stringent_flash_merging

Help: DEPRECATED in v2.3.0

Type: bool

Default: False


Quantification Window Size

-w, --quantification_window_size, --window_around_sgrna

Help: Defines the size (in bp) of the quantification window extending from the position specified by the '--cleavage_offset' or '--quantification_window_center' parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. Multiple quantification window sizes (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: 1


Quantification Window Center

-wc, --quantification_window_center, --cleavage_offset

Help: Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17 to only include mutations near the 5' end of the sgRNA. Multiple quantification window centers (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: -3
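
To make the interaction between the window center and the window size concrete, the sketch below computes the cut position and the quantification window for a guide found on the forward strand of the amplicon. It is an illustration under stated assumptions (exact forward-strand match, 0-based indices, hypothetical function name), not CRISPResso's own code.

```python
def quantification_window(amplicon, guide, window_center=-3, window_size=1):
    """Return (cut_index, window_indices) for a guide on the forward strand.

    cut_index is the 0-based amplicon position immediately 5' of the expected cut.
    """
    start = amplicon.upper().find(guide.upper())
    if start == -1:
        raise ValueError("guide not found on the forward strand of the amplicon")
    guide_end = start + len(guide) - 1  # index of the guide's 3' base
    cut_index = guide_end + window_center
    if window_size == 0:
        return cut_index, list(range(len(amplicon)))  # window disabled
    first = max(0, cut_index - window_size + 1)
    last = min(len(amplicon) - 1, cut_index + window_size)
    return cut_index, list(range(first, last + 1))


toy_guide = "ACGTACGTACGTACGTACGT"               # 20 nt, illustrative only
toy_amplicon = "TTTT" + toy_guide + "AGG" + "CCCC"
print(quantification_window(toy_amplicon, toy_guide))  # (20, [20, 21])
```

With the defaults, the window covers one base on each side of the cut, 2 bp in total, as described for --quantification_window_size.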


Exclude bp From Left

--exclude_bp_from_left

Help: Exclude bp from the left side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Exclude bp From Right

--exclude_bp_from_right

Help: Exclude bp from the right side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Use Legacy Insertion Quantification

--use_legacy_insertion_quantification

Help: If set, the legacy insertion quantification method will be used (i.e. with a 1bp quantification window, indels at the cut site and 1bp away from the cut site would be quantified). By default (if this parameter is not set) with a 1bp quantification window, only insertions at the cut site will be quantified.

Type: bool

Default: False


Ignore Substitutions

--ignore_substitutions

Help: Ignore substitution events for the quantification and visualization

Type: bool

Default: False


Ignore Insertions

--ignore_insertions

Help: Ignore insertion events for the quantification and visualization

Type: bool

Default: False


Ignore Deletions

--ignore_deletions

Help: Ignore deletion events for the quantification and visualization

Type: bool

Default: False


Discard Indel Reads

--discard_indel_reads

Help: Discard reads with indels in the quantification window from analysis

Type: bool

Default: False


Needleman Wunsch Gap Open

--needleman_wunsch_gap_open

Help: Gap open option for Needleman-Wunsch alignment

Type: int

Default: -20


Needleman Wunsch Gap Extend

--needleman_wunsch_gap_extend

Help: Gap extend option for Needleman-Wunsch alignment

Type: int

Default: -2


Needleman Wunsch Gap Incentive

--needleman_wunsch_gap_incentive

Help: Gap incentive value for inserting indels at cut sites

Type: int

Default: 1


Needleman Wunsch Alignment Matrix Location

--needleman_wunsch_aln_matrix_loc

Help: Location of the matrix specifying substitution scores in the NCBI format (see ftp://ftp.ncbi.nih.gov/blast/matrices/)

Type: str

Default: EDNAFULL


Plot Histogram Outliers

--plot_histogram_outliers

Help: If set, all values will be shown on histograms. By default (if unset), histogram ranges are limited to plotting data within the 99th percentile.

Type: bool

Default: False


Plot Window Size

--plot_window_size, --offset_around_cut_to_plot

Help: Defines the size of the window extending from the quantification window center to plot. Nucleotides within plot_window_size of the quantification_window_center for each guide are plotted.

Type: int

Default: 20


Min Frequency Alleles Around Cut To Plot

--min_frequency_alleles_around_cut_to_plot

Help: Minimum % reads required to report an allele in the alleles table plot.

Type: float

Default: 0.2


Expand Allele Plots By Quantification

--expand_allele_plots_by_quantification

Help: If set, alleles with different modifications in the quantification window (but not necessarily in the plotting window (e.g. for another sgRNA)) are plotted on separate lines, even though they may have the same apparent sequence. To force the allele plot and the allele table to be the same, set this parameter. If unset, all alleles with the same sequence will be collapsed into one row.

Type: bool

Default: False


Allele Plot Percentages Only for Assigned Reference

--allele_plot_pcts_only_for_assigned_reference

Help: If set, in the allele plots, the percentages will show the percentage as a percent of reads aligned to the assigned reference. Default behavior is to show percentage as a percent of all reads.

Type: bool

Default: False


Quantification Window Coordinates

-qwc, --quantification_window_coordinates

Help: Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of the '--quantification_window_center', '--cleavage_offset', '--quantification_window_size' and '--window_around_sgrna' parameters. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign (e.g. 'start-stop'), and multiple ranges can be separated by the underscore (_). The value can be a comma-separated list, corresponding to the amplicon sequences given in --amplicon_seq (e.g. 5-10,5-10_20-30 would specify the 6th-11th bp in the first reference and the 6th-11th and 21st-31st bp in the second reference). A value of 0 disables this filter for a particular amplicon (e.g. 0,90-110 would disable the quantification window for the first amplicon and specify the quantification window of 90-110 for the second). Note that if there are multiple amplicons provided, and only one quantification window coordinate is provided, the same quantification window will be used for all amplicons and be adjusted to account for insertions/deletions. (default: None)

Type: str
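
The coordinate syntax can be unpacked mechanically: commas separate amplicons, underscores separate ranges, and each 'start-stop' range is 0-based and inclusive. The parser below is a hypothetical sketch of that reading of the format, not the parser CRISPResso uses.

```python
def parse_quantification_window_coordinates(spec):
    """Return one list of 0-based positions per amplicon, or None where disabled."""
    per_amplicon = []
    for amplicon_spec in spec.split(","):
        amplicon_spec = amplicon_spec.strip()
        if amplicon_spec == "0":
            per_amplicon.append(None)  # a value of 0 disables the window
            continue
        positions = []
        for coordinate_range in amplicon_spec.split("_"):
            start, stop = (int(value) for value in coordinate_range.split("-"))
            positions.extend(range(start, stop + 1))  # stop is inclusive
        per_amplicon.append(positions)
    return per_amplicon


windows = parse_quantification_window_coordinates("5-10,5-10_20-30")
print(windows[0])       # [5, 6, 7, 8, 9, 10], i.e. the 6th-11th bp
print(len(windows[1]))  # 17 positions: 6 from 5-10 plus 11 from 20-30
```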


Annotate Wildtype Allele

--annotate_wildtype_allele

Help: Wildtype alleles in the allele table plots will be marked with this string (e.g. **).

Type: str


Keep Intermediate

--keep_intermediate

Help: Keep all the intermediate files

Type: bool

Default: False


Dump

--dump

Help: Dump numpy arrays and pandas dataframes to file for debugging purposes

Type: bool

Default: False


Write Detailed Allele Table

--write_detailed_allele_table

Help: If set, a detailed allele table will be written with the following columns:

  • #Reads: the number of reads this allele represents.
  • Aligned_Sequence: the alignment of the read sequence.
  • Reference_Sequence: the alignment of the amplicon sequence.
  • n_inserted: the number of insertions within the quantification window.
  • n_deleted: the number of deletions within the quantification window.
  • n_mutated: the number of substitutions within the quantification window.
  • Reference_Name: the amplicon name to which this allele is assigned.
  • Read_Status: the bin to which this allele is classified.
  • Aligned_Reference_Names: if there are multiple amplicons, this lists the amplicon names. The order corresponds to the alignment scores in Aligned_Reference_Scores.
  • Aligned_Reference_Scores: the alignment score (out of 100) for each amplicon.
  • ref_positions: this represents the indices in the Aligned_Sequence that map back to the original sequence. Negative values represent places that don't map back to the original reference.
  • all_insertion_positions: all of the indices where there is an insertion regardless of the quantification window.
  • all_insertion_left_positions: for all insertions, the left most index (e.g. where each insertion starts).
  • insertion_positions: the insertion positions within the quantification window.
  • insertion_coordinates: the start and end indices of the insertions within the quantification window.
  • insertion_sizes: the size of each insertion within the quantification window.
  • all_deletion_positions: all of the indices where there is a deletion regardless of the quantification window.
  • deletion_positions: the indices where there is a deletion within the quantification window.
  • deletion_coordinates: the start and end indices of the deletions within the quantification window.
  • deletion_sizes: the size of the deletions within the quantification window.
  • all_substitution_positions: all of the indices where there is a substitution.
  • substitution_positions: the indices where there is a substitution within the quantification window.
  • substitution_values: the nucleotide to which it is substituted within the quantification window.
  • %Reads: the percentage of reads this allele represents.

Type: bool

Default: False


Fastq Output

--fastq_output

Help: If set, a fastq file with annotations for each read will be produced.

Type: bool

Default: False


Bam Output

--bam_output

Help: If set, a bam file with alignments for each read will be produced.

Type: bool

Default: False


Bowtie2 Index

-x, --bowtie2_index

Help: Basename of Bowtie2 index for the reference genome

Type: str


Zip Output

--zip_output

Help: If set, the output will be placed in a zip folder.

Type: bool

Default: False


Max Rows Alleles Around Cut To Plot

--max_rows_alleles_around_cut_to_plot

Help: Maximum number of rows to report in the alleles table plot.

Type: int

Default: 50


Suppress Report

--suppress_report

Help: Suppress output report

Type: bool

Default: False


Place Report In Output Folder

--place_report_in_output_folder

Help: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output.

Type: bool

Default: False


Suppress Plots

--suppress_plots

Help: Suppress output plots

Type: bool

Default: False


Base Editor Output

--base_editor_output

Help: Outputs plots and tables to aid in analysis of base editor studies.

Type: bool

Default: False


Conversion Nuc From

--conversion_nuc_from

Help: For base editor plots, this is the nucleotide targeted by the base editor

Type: str

Default: C


Conversion Nuc To

--conversion_nuc_to

Help: For base editor plots, this is the nucleotide produced by the base editor

Type: str

Default: T
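
For base editor experiments, the conversion of interest is a reference base of one type read as another (C to T by default). The sketch below lists the alignment columns where that happens for one aligned read/reference pair; the function name is hypothetical, and the indices are alignment columns, not amplicon coordinates.

```python
def conversion_positions(aligned_reference, aligned_read, nuc_from="C", nuc_to="T"):
    """Return alignment columns where nuc_from in the reference is read as nuc_to."""
    if len(aligned_reference) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(aligned_reference, aligned_read))
        if ref_base == nuc_from and read_base == nuc_to
    ]


print(conversion_positions("ACGCTC", "ACGTTT"))  # [3, 5]
```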


Prime Editing Spacer Sequence

--prime_editing_pegRNA_spacer_seq

Help: pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.

Type: str


Prime Editing Extension Sequence

--prime_editing_pegRNA_extension_seq

Help: Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).

Type: str


Prime Editing pegRNA Extension Quantification Window Size

--prime_editing_pegRNA_extension_quantification_window_size

Help: Quantification window size (in bp) at flap site for measuring modifications anchored at the right side of the extension sequence. Similar to the --quantification_window_size parameter, the total length of the quantification window will be 2x this parameter. Default: 5bp (10bp total window size)

Type: int

Default: 5


Prime Editing pegRNA Scaffold Sequence

--prime_editing_pegRNA_scaffold_seq

Help: If given, reads containing any of this scaffold sequence before the extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value is 'GGCACCGAGUCGGUGC'.

Type: str


Prime Editing pegRNA Scaffold Min Match Length

--prime_editing_pegRNA_scaffold_min_match_length

Help: Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.

Type: int

Default: 1


Prime Editing Nicking Guide Sequence

--prime_editing_nicking_guide_seq

Help: Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence

Type: str


Prime Editing Override Prime Edited Reference Sequence

--prime_editing_override_prime_edited_ref_seq

Help: If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.

Type: str


Prime Editing Override Sequence Checks

--prime_editing_override_sequence_checks

Help: If set, checks to assert that the prime editing guides and extension sequence are in the proper orientation are not performed. This may be useful if the checks are failing inappropriately, but the user is confident that the sequences are correct.

Type: bool

Default: False


CRISPResso 1 Mode

--crispresso1_mode

Help: Parameter usage as in CRISPResso 1

Type: bool

Default: False


dsODN

--dsODN

Help: Label reads with the dsODN sequence provided

Type: str


Auto

--auto

Help: Infer amplicon sequence from most common reads

Type: bool

Default: False


Debug

--debug

Help: Show debug messages

Type: bool

Default: False


No Rerun

--no_rerun

Help: Don't rerun CRISPResso2 if a run using the same parameters has already been finished.

Type: bool

Default: False


Number of Processes

-p, --n_processes

Help: Specify the number of processes to use for analysis. Please use with caution since increasing this parameter will significantly increase the memory required to run CRISPResso. Can be set to 'max'.

Type: str

Default: 1


Bam Input

--bam_input

Help: Aligned reads for processing in bam format

Type: str


BAM Chromosome Location

--bam_chr_loc

Help: Chromosome location in bam for reads to process. For example: 'chr1:50-100' or 'chrX'.

Type: str
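
The two region forms shown in the help text can be split apart as follows. This is a hypothetical helper for validating the string before a run, and it makes no assumption about whether CRISPResso treats the coordinates as 0- or 1-based.

```python
def parse_region(region):
    """Split 'chr1:50-100' into ('chr1', 50, 100); 'chrX' gives ('chrX', None, None)."""
    if ":" not in region:
        return region, None, None  # whole chromosome
    chromosome, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chromosome, int(start), int(end)


print(parse_region("chr1:50-100"))  # ('chr1', 50, 100)
print(parse_region("chrX"))         # ('chrX', None, None)
```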


Disable Guardrails

--disable_guardrails

Help: Disable guardrail warnings

Type: bool

Default: False


Use Matplotlib

--use_matplotlib

Help: Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed

Type: bool

Default: False


Halt On Plot Fail

--halt_on_plot_fail

Help: Halt execution if a plot fails to generate

Type: bool

Default: False


CRISPResso Examples

Example run: Non-homologous end joining (NHEJ)

Download the test datasets nhej.r1.fastq.gz and nhej.r2.fastq.gz to your current directory. These are the first 25,000 sequences from a paired-end sequencing experiment. To analyze this experiment, run the command:

Using Bioconda:

CRISPResso --fastq_r1 nhej.r1.fastq.gz --fastq_r2 nhej.r2.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -n nhej

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 nhej.r1.fastq.gz --fastq_r2 nhej.r2.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -n nhej

This should produce a folder called 'CRISPResso_on_nhej'. Open the file called CRISPResso_on_nhej/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

Example run: Multiple alleles

Download the test dataset allele_specific.fastq.gz to your current directory. These are the first 25,000 sequences from an editing experiment targeting one allele. To analyze this experiment, run the following command:

Using Bioconda:

CRISPResso --fastq_r1 allele_specific.fastq.gz --amplicon_seq CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCACTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG,CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCCCTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG --amplicon_name P23H,WT --guide_seq GTGCGGAGCCACTTCGAGCAGC

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 allele_specific.fastq.gz --amplicon_seq CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCACTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG,CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCCCTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG --amplicon_name P23H,WT --guide_seq GTGCGGAGCCACTTCGAGCAGC

This should produce a folder called 'CRISPResso_on_allele_specific'. Open the file called CRISPResso_on_allele_specific/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

Example run: Base editing experiment

Download the test dataset base_editor.fastq.gz to your current directory. These are the first 25,000 sequences from an editing experiment performed at the EMX1 locus. To analyze this experiment, run the following command:

Using Bioconda:

CRISPResso --fastq_r1 base_editor.fastq.gz --amplicon_seq GGCCCCAGTGGCTGCTCTGGGGGCCTCCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAGAACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACCTCCAATGACTAGGGTGG --guide_seq GAGTCCGAGCAGAAGAAGAA --quantification_window_size 10 --quantification_window_center -10 --base_editor_output

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 base_editor.fastq.gz --amplicon_seq GGCCCCAGTGGCTGCTCTGGGGGCCTCCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAGAACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACCTCCAATGACTAGGGTGG --guide_seq GAGTCCGAGCAGAAGAAGAA --quantification_window_size 10 --quantification_window_center -10 --base_editor_output

This should produce a folder called 'CRISPResso_on_base_editor'. Open the file called CRISPResso_on_base_editor/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

CRISPResso output

The output of CRISPResso2 consists of a set of informative graphs that allow for the quantification and visualization of the position and type of outcomes within an amplicon sequence.

Data file descriptions

CRISPResso2_report.html is a summary report that can be viewed in a web browser containing all of the output plots and summary statistics.

Alleles_frequency_table.zip can be unzipped to a tab-separated text file that shows all reads and alignments to references. The first column shows the aligned sequence of the sequenced read. The second column shows the aligned sequence of the reference sequence. Gaps in each of these columns represent insertions and deletions. The third column, 'Reference_Name', shows the name of the reference that the read aligned to. The fourth column, 'Read_Status', shows whether the read was modified or unmodified. The fifth through seventh columns ('n_deleted', 'n_inserted', 'n_substituted') show the number of bases deleted, inserted, and substituted as compared to the reference sequence. The eighth column shows the number of reads having that sequence, and the ninth column shows the percentage of all reads having that sequence.
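As a minimal sketch of working with this table, the snippet below reads the tab-separated file directly from the zip archive using only the Python standard library and sums read counts per 'Read_Status'. The 'Read_Status' column name comes from the description above; the read-count column is taken by position (eighth column) rather than by name. The helper name summarize_alleles, the assumption of one table per archive, and the toy archive at the bottom (including its placeholder headers and status labels) are illustrative and not taken from CRISPResso itself.

```python
import csv
import io
import zipfile
from collections import Counter


def summarize_alleles(zip_file):
    """Sum read counts per Read_Status from an Alleles_frequency_table.zip.

    zip_file may be a path or a binary file-like object. The archive is
    assumed to hold a single tab-separated text file with a header row.
    """
    reads_by_status = Counter()
    with zipfile.ZipFile(zip_file) as archive:
        member = archive.namelist()[0]  # assumption: one table per archive
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            header = next(reader)
            status_col = header.index("Read_Status")
            count_col = 7  # eighth column: number of reads with this sequence
            for row in reader:
                reads_by_status[row[status_col]] += int(row[count_col])
    return dict(reads_by_status)


# Toy archive for demonstration only. The headers of columns 1, 2, 8 and 9
# and the status labels are placeholders, not necessarily what CRISPResso writes.
toy_rows = [
    ["col1", "col2", "Reference_Name", "Read_Status",
     "n_deleted", "n_inserted", "n_substituted", "col8", "col9"],
    ["ACGT", "ACGT", "Reference", "UNMODIFIED", "0", "0", "0", "90", "90.0"],
    ["A-GT", "ACGT", "Reference", "MODIFIED", "1", "0", "0", "10", "10.0"],
]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    table = "\n".join("\t".join(row) for row in toy_rows) + "\n"
    archive.writestr("Alleles_frequency_table.txt", table)
buffer.seek(0)

print(summarize_alleles(buffer))
```

For a real run, the same function could be pointed at the path of the zip file inside the CRISPResso output folder.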

CRISPResso_mapping_statistics.txt is a tab-delimited text file showing the number of reads in the input (READS IN INPUTS), the number of reads after filtering, trimming and merging (READS AFTER PREPROCESSING), the number of reads aligned (READS ALIGNED), and the number of reads for which the alignment had to be computed vs read from cache.
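The following sketch shows one way to turn this file into an alignment percentage. It assumes the file has a single header row followed by a single row of integer counts, and it uses the three column names quoted above. The function names and the toy file contents are hypothetical, and the toy text omits the cache-related columns.

```python
import csv
import io


def read_mapping_statistics(handle):
    """Return the first data row of the mapping statistics file as a dict of ints."""
    reader = csv.DictReader(handle, delimiter="\t")
    row = next(reader)
    return {name: int(value) for name, value in row.items()}


def percent_aligned(stats):
    """Percentage of input reads that were aligned."""
    return 100.0 * stats["READS ALIGNED"] / stats["READS IN INPUTS"]


# Toy file contents for demonstration only (not real results).
toy_text = (
    "READS IN INPUTS\tREADS AFTER PREPROCESSING\tREADS ALIGNED\n"
    "25000\t24000\t23000\n"
)
stats = read_mapping_statistics(io.StringIO(toy_text))
print(f"{percent_aligned(stats):.1f}% of input reads aligned")
```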

CRISPResso_quantification_of_editing_frequency.txt is a tab-delimited text file showing the number of reads aligning to each reference amplicon, as well as the status (modified/unmodified, number of insertions, deletions, and/or substitutions) of those reads.

CRISPResso_RUNNING_LOG.txt is a text file that shows a log of the CRISPResso run.

CRISPResso2_info.json can be read by other CRISPResso tools and contains information about the run and results.

The remainder of the files are produced for each amplicon, and each file is prefixed by the name of the amplicon if more than one amplicon is given.

Alleles_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows alleles and alignments to the specified reference for a subsequence around the sgRNA (here, shown by 'NNNNN'). This data report is produced for each amplicon when a guide is found in the amplicon sequence. A report is generated for each guide. The number of nucleotides shown in this report can be modified by changing the --plot_window_size parameter.

Substitution_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the frequency of substitutions in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). The first row shows the reference sequence. The following rows show the number of substitutions to each base. For example, the first numeric value in the second row (marked ‘A’) shows the number of bases that have a substitution resulting in an A at the first basepair of the amplicon sequence. The number of unmodified bases at each position is not shown in this table (because they aren’t substitutions). Thus, if the first basepair of the amplicon sequence is an A, the first value in the row marked ‘A’ will show 0. A report is generated for each guide. The number of nucleotides shown in this report can be modified by changing the --plot_window_size parameter.

Substitution_frequency_table.txt is a tab-separated text file that shows the frequency of substitutions in the amplicon sequence across the entire amplicon. The first row shows the reference sequence. The following rows show the number of substitutions to each base. For example, the first numeric value in the second row (marked ‘A’) shows the number of bases that have a substitution resulting in an A at the first basepair of the amplicon sequence. The number of unmodified bases at each position is not shown in this table (because they aren’t substitutions). Thus, if the first basepair of the amplicon sequence is an A, the first value in the row marked ‘A’ will show 0.

Insertion_histogram.txt is a tab-separated text file that shows a histogram of the insertion sizes in the amplicon sequence in the quantification window. Insertions outside of the quantification window are not included. The ins_size column shows the insertion length, and the fq column shows the number of reads having that insertion size.

Deletion_histogram.txt is a tab-separated text file that shows a histogram of the deletion sizes in the amplicon sequence in the quantification window. Deletions outside of the quantification window are not included. The del_size column shows the length of the deletion, and the fq column shows the number of reads having that deletion size.

Substitution_histogram.txt is a tab-separated text file that shows a histogram of the number of substitutions in the amplicon sequence in the quantification window. Substitutions outside of the quantification window are not included. The sub_count column shows the number of substitutions, and the fq column shows the number of reads having that number of substitutions.
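The three histogram files share the same two-column layout, so a single reader can handle all of them. The sketch below assumes a header row naming the size column (ins_size, del_size or sub_count) and the fq column, with integer values in both. The function names and the toy data are hypothetical.

```python
import csv
import io


def read_histogram(handle, size_column):
    """Read a histogram file into a {size: read_count} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row[size_column]): int(row["fq"]) for row in reader}


def fraction_nonzero(histogram):
    """Fraction of reads whose size (or count) value is not zero."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    nonzero = sum(count for size, count in histogram.items() if size != 0)
    return nonzero / total


# Toy Insertion_histogram.txt contents for demonstration only.
toy_text = "ins_size\tfq\n0\t900\n1\t60\n2\t40\n"
insertion_histogram = read_histogram(io.StringIO(toy_text), "ins_size")
print(insertion_histogram)
print(fraction_nonzero(insertion_histogram))
```

The same call with "del_size" or "sub_count" as the size column would read the deletion and substitution histograms.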

Effect_vector_insertion.txt is a tab-separated text file with a one-row header that shows the percentage of reads with an insertion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with an insertion at that location.

Effect_vector_deletion.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a deletion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a deletion at that location.

Effect_vector_substitution.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a substitution at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a substitution at that location.

Effect_vector_combined.txt is a tab-separated text file with a one-row header that shows the percentage of reads with any modification (insertion, deletion, or substitution) at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a modification at that location.

Modification_count_vectors.txt is a tab-separated file showing the number of modifications for each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the number of reads with insertions (row 2), insertions_left (row 3), deletions (row 4), substitutions (row 5) and the sum of all modifications (row 6). Additionally, the last row shows the number of reads aligned.

If an insertion occurs between bases 5 and 6, the insertions vector will be incremented at bases 5 and 6. However, the insertions_left vector will only be incremented at base 5 so the sum of the insertions_left row represents an accurate count of the number of insertions, whereas the sum of the insertions row will yield twice the number of insertions.
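The difference between the two rows can be illustrated with a small counting sketch. This is not CRISPResso's implementation; the function name and the input convention (each insertion given as the 1-based position of the base to its left) are assumptions made for illustration, and every insertion is assumed to lie between two amplicon bases.

```python
def count_insertions(amplicon_length, insertion_sites):
    """Build 'insertions' and 'insertions_left' vectors.

    insertion_sites lists, for each inserted read, the 1-based position of
    the base immediately left of the insertion (5 for an insertion between
    bases 5 and 6).
    """
    insertions = [0] * amplicon_length
    insertions_left = [0] * amplicon_length
    for left_base in insertion_sites:
        insertions_left[left_base - 1] += 1  # left flanking base only
        insertions[left_base - 1] += 1       # left flanking base ...
        insertions[left_base] += 1           # ... and right flanking base
    return insertions, insertions_left


# Two reads with an insertion between bases 5 and 6, one between bases 8 and 9.
insertions, insertions_left = count_insertions(10, [5, 5, 8])
print(insertions)
print(insertions_left)
print(sum(insertions_left), sum(insertions))  # 3 insertions vs. twice that
```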

Quantification_window_modification_count_vectors.txt is a tab-separated file showing the number of modifications for positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the number of reads with insertions (row 2), insertions_left (row 3), deletions (row 4), substitutions (row 5) and the sum of all modifications (row 6). Additionally, the last row shows the number of reads aligned.

Nucleotide_frequency_table.txt is a tab-separated file showing the number of each residue at each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the number of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Quantification_window_nucleotide_frequency_table.txt is a tab-separated file showing the number of each residue at positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the number of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Nucleotide_percentage_table.txt is a tab-separated file showing the percentage of each residue at each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the percentage of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Quantification_window_nucleotide_percentage_table.txt is a tab-separated file showing the percentage of each residue at positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the percentage of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

The following report files are produced when the base editor mode is enabled:

Selected_nucleotide_percentage_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the percentage of each base at selected nucleotides in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). If the base editing experiment targets cytosines (as set by the --base_editor_from parameter), each C in the quantification window will be numbered (e.g. C5 represents the cytosine at the 5th position in the selected nucleotides). The percentage of each base at these selected target cytosines is reported, with the first row showing the numbered cytosines, and the remainder of the rows showing the percentage of each nucleotide present at these locations. This file shows nucleotides within '--plot_window_size' bp of the position specified by the parameter '--quantification_window_center' relative to the 3' end of each guide.

Selected_nucleotide_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the frequency of each base at selected nucleotides in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). If the base editing experiment targets cytosines (as set by the --base_editor_from parameter), each C in the quantification window will be numbered (e.g. C5 represents the cytosine at the 5th position in the selected nucleotides). The frequency of each base at these selected target cytosines is reported, with the first row showing the numbered cytosines, and the remainder of the rows showing the frequency of each nucleotide present at these locations. This file shows nucleotides within '--plot_window_size' bp of the position specified by the parameter '--quantification_window_center' relative to the 3' end of each guide.

The following report files are produced when the amplicon contains a coding sequence:

Frameshift_analysis.txt is a text file describing the number of noncoding, in-frame, and frameshift mutations. This report file is produced when the amplicon contains a coding sequence.

Splice_sites_analysis.txt is a text file describing the number of splicing sites that are unmodified and modified. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_insertion_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding insertion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding insertion at that location. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_deletion_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding deletion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding deletion at that location. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_substitution_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding substitution at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding substitution at that location. This report file is produced when the amplicon contains a coding sequence.

Guardrails

Guardrails automatically check the inputs and results of experiments against standardized values. Any guardrail warnings that are triggered are printed on the command line and at the top of generated reports. To turn off the guardrails, add the --disable_guardrails argument.

TotalReadsGuardrail: Checks if the number of reads is lower than expected. (Default: 10000)

OverallReadsAlignedGuardrail: Checks if the number of aligned reads is lower than expected. (Default: 90% of the total reads)

DisproportionateReadsAlignedGuardrail: Checks if the number of reads aligned to an amplicon is higher or lower than expected proportionally. (Default: 30% more or less than expected)

LowRatioOfModsInWindowToOutGuardrail: Checks if the ratio of modifications inside to outside the quantification window is lower than expected. (Default: 0.01)

HighRateOfModificationAtEndsGuardrail: Checks if there is a high rate of modifications at the ends of the read. (Default: 0.01)

HighRateOfSubstitutionsOutsideWindowGuardrail: Checks if there is a high rate of substitutions outside of the quantification windows. (Default: 0.002)

HighRateOfSubstitutionsGuardrail: Checks if the proportion of substitutions to other modifications is higher than expected. (Default: 0.3)

ShortSequenceGuardrail: Checks if the provided sequences (both Amplicons and Guides) are shorter than expected. (Amplicon Default: 50, Guide Default: 19)

LongAmpliconShortReadsGuardrail: Checks if the provided amplicon is longer than a set multiple of the average read length. (Default: 1.5)
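To make the thresholds concrete, here is a rough sketch of how four of these checks could be expressed using the default values listed above. It is not CRISPResso's implementation: the function name, its arguments, and the choice of strict comparisons are assumptions for illustration only.

```python
def check_guardrails(total_reads, aligned_reads, amplicon_seq, guide_seq, mean_read_length):
    """Return the names of the guardrails (from the list above) that would trigger."""
    warnings = []
    if total_reads < 10000:  # TotalReadsGuardrail default
        warnings.append("TotalReadsGuardrail")
    if total_reads > 0 and aligned_reads / total_reads < 0.90:  # 90% of total reads
        warnings.append("OverallReadsAlignedGuardrail")
    if len(amplicon_seq) < 50 or len(guide_seq) < 19:  # amplicon and guide defaults
        warnings.append("ShortSequenceGuardrail")
    if len(amplicon_seq) > 1.5 * mean_read_length:  # 1.5x the average read length
        warnings.append("LongAmpliconShortReadsGuardrail")
    return warnings


# Toy inputs for demonstration only.
print(check_guardrails(5000, 4000, "A" * 200, "G" * 20, 100))
print(check_guardrails(25000, 24000, "A" * 200, "G" * 20, 150))
```

The first call trips the read-count, alignment-rate and amplicon-length checks; the second passes all four.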

CRISPRessoBatch

CRISPRessoBatch allows users to specify multiple input files, along with other command line arguments, in a batch file, and then runs CRISPResso2 analysis on each file in parallel. Samples for which the amplicon and guide sequences are the same will be compared across the batch, producing useful summary tables and comparison plots.

CRISPRessoBatch Parameters

Amplicon Sequence

-a, --amplicon_seq

Help: Amplicon Sequence (can be comma-separated list of multiple sequences)

Type: str


Amplicon Name

-an, --amplicon_name

Help: Amplicon Name (can be comma-separated list of multiple names, corresponding to amplicon sequences given in --amplicon_seq)

Type: str

Default: Reference


Amplicon Min Alignment Score

-amas, --amplicon_min_alignment_score

Help: Amplicon Minimum Alignment Score; score between 0 and 100; sequences must have at least this homology score with the amplicon to be aligned (can be comma-separated list of multiple scores, corresponding to amplicon sequences given in --amplicon_seq)

Type: str


Default Minimum Alignment Score

--default_min_aln_score, --min_identity_score

Help: Default minimum homology score for a read to align to a reference amplicon

Type: int

Default: 60


Expand Ambiguous Alignments

--expand_ambiguous_alignments

Help: If more than one reference amplicon is given, reads that align to multiple reference amplicons will count equally toward each amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Assign Ambiguous Alignments To First Reference

--assign_ambiguous_alignments_to_first_reference

Help: If more than one reference amplicon is given, ambiguous reads that align with the same score to multiple amplicons will be assigned to the first amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False
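The three behaviors described for ambiguous reads (exclude by default, expand, or assign to the first reference) can be sketched as follows. This is an illustration rather than CRISPResso's code: it treats a read as ambiguous when two or more references tie for the best alignment score, it ignores minimum alignment score thresholds, and the function and argument names are hypothetical.

```python
def assign_read(scores, expand=False, assign_to_first=False):
    """Pick the reference(s) a read counts toward.

    scores maps reference name -> alignment score, in the order the
    amplicons were given. Returns a list of reference names.
    """
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    if len(tied) == 1:
        return tied          # unambiguous: a single best reference
    if expand:
        return tied          # count toward every tied reference
    if assign_to_first:
        return tied[:1]      # count toward the first tied reference only
    return []                # default: exclude the ambiguous read


tied_scores = {"P23H": 95.0, "WT": 95.0}
print(assign_read(tied_scores))
print(assign_read(tied_scores, expand=True))
print(assign_read(tied_scores, assign_to_first=True))
```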


Guide Seq

-g, --guide_seq, --sgRNA

Help: sgRNA sequence, if more than one, please separate by commas. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately adjacent to but not including the PAM sequence (5' of NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter '--quantification_window_center' nucleotides from the 3' end of the guide. In addition, the use of alternate nucleases besides SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the '--cleavage_offset' parameter to 1, since the default setting of -3 is suitable only for SpCas9.

Type: str


Guide Name

-gn, --guide_name

Help: sgRNA names, if more than one, please separate by commas.

Type: str


Flexiguide Seq

-fg, --flexiguide_seq

Help: sgRNA sequence (flexible) (can be comma-separated list of multiple flexiguides). The flexiguide sequence will be aligned to the amplicon sequence(s), as long as the guide sequence has homology as set by --flexiguide_homology.

Type: str

Default: None


Flexiguide Homology

-fh, --flexiguide_homology

Help: flexiguides will yield guides in amplicons with at least this homology to the flexiguide sequence.

Type: int

Default: 80


Flexiguide Name

-fgn, --flexiguide_name

Help: flexiguide name

Type: str


Flexiguide Gap Open Penalty

--flexiguide_gap_open_penalty

Help:

Type: int

Default: -20


Flexiguide Gap Extend Penalty

--flexiguide_gap_extend_penalty

Help:

Type: int

Default: -2


Discard Guide Positions Overhanging Amplicon Edge

--discard_guide_positions_overhanging_amplicon_edge

Help: If set, for guides that align to multiple positions, guide positions will be discarded if plotting around those regions would include bp that extend beyond the end of the amplicon.

Type: bool

Default: False


Expected HDR Amplicon Sequence

-e, --expected_hdr_amplicon_seq

Help: Amplicon sequence expected after HDR

Type: str


Exon Specification Coding Sequence/s

-c, --coding_seq

Help: Subsequence/s of the amplicon sequence covering one or more coding sequences for frameshift analysis. If more than one (for example, split by intron/s), please separate by commas.

Type: str


Config File

--config_file

Help: File path to JSON file with config elements

Type: str

Default: None


Minimum Average Read Quality (phred33 Scale)

-q, --min_average_read_quality

Help: Minimum average quality score (phred33) to keep a read

Type: int


Minimum Single bp Quality (phred33 Scale)

-s, --min_single_bp_quality

Help: Minimum single bp score (phred33) to keep a read

Type: int


Minimum bp Quality or N (phred33 Scale)

--min_bp_quality_or_N

Help: Bases with a quality score (phred33) less than this value will be set to 'N'

Type: int
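The three quality parameters above can be illustrated with a short sketch. In phred33 encoding, each base's quality score is the ASCII code of its quality character minus 33. The function names, the order in which the checks run, and the use of strict "less than" comparisons for discarding reads are assumptions for illustration; this is not how CRISPResso or fastp implement the filtering.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (phred33): score = ASCII code - 33."""
    return [ord(char) - 33 for char in quality_string]


def filter_read(sequence, quality_string, min_average=0, min_single_bp=0, min_bp_or_n=0):
    """Return the (possibly N-masked) sequence, or None if the read is discarded."""
    scores = phred33_scores(quality_string)
    if sum(scores) / len(scores) < min_average:
        return None  # fails the minimum average read quality
    if min(scores) < min_single_bp:
        return None  # fails the minimum single bp quality
    # Mask individual low-quality bases instead of discarding the read.
    return "".join(
        "N" if score < min_bp_or_n else base
        for base, score in zip(sequence, scores)
    )


# Toy read: 'I' decodes to 40 and '#' decodes to 2, so the average is 30.5.
print(phred33_scores("II#I"))
print(filter_read("ACGT", "II#I", min_average=30))
print(filter_read("ACGT", "II#I", min_single_bp=10))
print(filter_read("ACGT", "II#I", min_bp_or_n=20))
```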


File Prefix

--file_prefix

Help: File prefix for output plots and tables

Type: str


Sample Name

-n, --name

Help: Output name of the report (default: the name is obtained from the filename of the fastq file/s used in input)

Type: str


Suppress Amplicon Name Truncation

--suppress_amplicon_name_truncation

Help: If set, amplicon names will not be truncated when creating output filename prefixes. If not set, amplicon names longer than 21 characters will be truncated when creating filename prefixes.

Type: bool

Default: False


Output Folder

-o, --output_folder

Help: Output folder to use for the analysis (default: current folder)

Type: str


Verbosity

-v, --verbosity

Help: Verbosity level of output to the console (1-4); 4 is the most verbose

Type: int

Default: 3


Split Interleaved Input

--split_interleaved_input, --split_paired_end

Help: Splits a single fastq file containing paired end reads into two files before running CRISPResso

Type: bool

Default: False


Trimming Adapter

--trim_sequences

Help: Enable the trimming with fastp

Type: bool

Default: False


Trimmomatic Command

--trimmomatic_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Trimmomatic Options String

--trimmomatic_options_string

Help: DEPRECATED in v2.3.0, use --fastp_options_string

Type: str


Flash Command

--flash_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Fastp Command

--fastp_command

Help: Command to run fastp

Type: str

Default: fastp


Fastp Options String

--fastp_options_string

Help: Override options for fastp, e.g. --length_required 70 --umi

Type: str


Min Paired End Reads Overlap

--min_paired_end_reads_overlap

Help: Parameter for the fastp read merging step. Minimum required overlap length between two reads to provide a confident overlap

Type: int

Default: 10


Max Paired End Reads Overlap

--max_paired_end_reads_overlap

Help: DEPRECATED in v2.3.0

Type: str

Default: None


Stringent Flash Merging

--stringent_flash_merging

Help: DEPRECATED in v2.3.0

Type: bool

Default: False


Quantification Window Size

-w, --quantification_window_size, --window_around_sgrna

Help: Defines the size (in bp) of the quantification window extending from the position specified by the '--cleavage_offset' or '--quantification_window_center' parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. Multiple quantification window sizes (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: 1


Quantification Window Center

-wc, --quantification_window_center, --cleavage_offset

Help: Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17 to only include mutations near the 5' end of the sgRNA. Multiple quantification window centers (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: -3
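The interplay between the quantification window center and size can be sketched as below. This is an illustrative reading of the parameter descriptions, not CRISPResso's implementation: the index arithmetic is an assumption chosen so that a size of 1 yields 1 bp on each side of the cleavage position, only guides on the forward strand are handled, a size of 0 (which disables the window) is not covered, and the function name and the toy amplicon are hypothetical.

```python
def quantification_window(amplicon, guide, window_center=-3, window_size=1):
    """Return (cut_after, positions) for a guide on the forward strand.

    cut_after is the 0-based index of the base immediately left of the
    window center; positions are the 0-based amplicon indices inside the
    window. Returns None if the guide is not found.
    """
    if window_size < 1:
        raise ValueError("this sketch only covers window sizes of 1 or more")
    start = amplicon.find(guide)
    if start == -1:
        return None
    # The center is counted from the 3' end of the guide (negative = upstream).
    cut_after = start + len(guide) + window_center - 1
    first = max(0, cut_after - window_size + 1)
    last = min(len(amplicon) - 1, cut_after + window_size)
    return cut_after, list(range(first, last + 1))


# Toy amplicon: 5 bp of padding, a 20 nt guide, then a PAM and more padding.
guide = "GAGTCCGAGCAGAAGAAGAA"
amplicon = "TTTTT" + guide + "GGGAAAAA"

# Cas9 defaults: a 2 bp window spanning the cut, 3 bp from the guide's 3' end.
print(quantification_window(amplicon, guide))
# Base editing settings used in the example run: center -10, size 10.
print(quantification_window(amplicon, guide, window_center=-10, window_size=10))
```

Under these assumptions, the base editing settings (center -10, size 10) produce a window that covers exactly the 20 guide positions.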


Exclude bp From Left

--exclude_bp_from_left

Help: Exclude bp from the left side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Exclude bp From Right

--exclude_bp_from_right

Help: Exclude bp from the right side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Use Legacy Insertion Quantification

--use_legacy_insertion_quantification

Help: If set, the legacy insertion quantification method will be used (i.e. with a 1bp quantification window, indels at the cut site and 1bp away from the cut site would be quantified). By default (if this parameter is not set) with a 1bp quantification window, only insertions at the cut site will be quantified.

Type: bool

Default: False


Ignore Substitutions

--ignore_substitutions

Help: Ignore substitution events for the quantification and visualization

Type: bool

Default: False


Ignore Insertions

--ignore_insertions

Help: Ignore insertion events for the quantification and visualization

Type: bool

Default: False


Ignore Deletions

--ignore_deletions

Help: Ignore deletion events for the quantification and visualization

Type: bool

Default: False


Discard Indel Reads

--discard_indel_reads

Help: Discard reads with indels in the quantification window from analysis

Type: bool

Default: False


Needleman Wunsch Gap Open

--needleman_wunsch_gap_open

Help: Gap open option for Needleman-Wunsch alignment

Type: int

Default: -20


Needleman Wunsch Gap Extend

--needleman_wunsch_gap_extend

Help: Gap extend option for Needleman-Wunsch alignment

Type: int

Default: -2


Needleman Wunsch Gap Incentive

--needleman_wunsch_gap_incentive

Help: Gap incentive value for inserting indels at cut sites

Type: int

Default: 1


Needleman Wunsch Alignment Matrix Location

--needleman_wunsch_aln_matrix_loc

Help: Location of the matrix specifying substitution scores in the NCBI format (see ftp://ftp.ncbi.nih.gov/blast/matrices/)

Type: str

Default: EDNAFULL


Plot Histogram Outliers

--plot_histogram_outliers

Help: If set, all values will be shown on histograms. By default (if unset), histogram ranges are limited to plotting data within the 99th percentile.

Type: bool

Default: False


Plot Window Size

--plot_window_size, --offset_around_cut_to_plot

Help: Defines the size of the window extending from the quantification window center to plot. Nucleotides within plot_window_size of the quantification_window_center for each guide are plotted.

Type: int

Default: 20


Min Frequency Alleles Around Cut To Plot

--min_frequency_alleles_around_cut_to_plot

Help: Minimum % reads required to report an allele in the alleles table plot.

Type: float

Default: 0.2


Expand Allele Plots By Quantification

--expand_allele_plots_by_quantification

Help: If set, alleles with different modifications in the quantification window (but not necessarily in the plotting window (e.g. for another sgRNA)) are plotted on separate lines, even though they may have the same apparent sequence. To force the allele plot and the allele table to be the same, set this parameter. If unset, all alleles with the same sequence will be collapsed into one row.

Type: bool

Default: False


Allele Plot Percentages Only for Assigned Reference

--allele_plot_pcts_only_for_assigned_reference

Help: If set, in the allele plots, the percentages will show the percentage as a percent of reads aligned to the assigned reference. Default behavior is to show percentage as a percent of all reads.

Type: bool

Default: False


Quantification Window Coordinates

-qwc, --quantification_window_coordinates

Help: Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of the '--quantification_window_center', '--cleavage_offset' and '--window_around_sgrna' parameters. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign (e.g. 'start-stop'), and multiple ranges can be separated by the underscore (_). The value can be a comma-separated list corresponding to the amplicon sequences given in --amplicon_seq (e.g. 5-10,5-10_20-30 would specify the 6th-11th bp in the first reference and the 6th-11th and 21st-31st bp in the second reference). A value of 0 disables this filter for a particular amplicon (e.g. 0,90-110 would disable the quantification window for the first amplicon and specify the quantification window of 90-110 for the second). Note that if there are multiple amplicons provided, and only one quantification window coordinate is provided, the same quantification window will be used for all amplicons and be adjusted to account for insertions/deletions. (default: None)

Type: str
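
The coordinate syntax described above can be illustrated with a small parser. This is only a sketch of the string format as documented here, not CRISPResso's own parsing code: the function name parse_qwc is hypothetical, and the adjustment for insertions/deletions that is applied when one window is reused across amplicons is not modeled.

```python
def parse_qwc(value, n_amplicons):
    """Parse a --quantification_window_coordinates string.

    Returns one list per amplicon; each list holds (start, stop) tuples of
    0-based positions. An empty list means the filter is disabled ('0').
    """
    per_amplicon = value.split(",")
    # A single entry is reused for every amplicon (the indel adjustment
    # mentioned in the help text is NOT modeled in this sketch).
    if len(per_amplicon) == 1 and n_amplicons > 1:
        per_amplicon = per_amplicon * n_amplicons
    if len(per_amplicon) != n_amplicons:
        raise ValueError("expected one entry per amplicon")

    parsed = []
    for entry in per_amplicon:
        if entry == "0":
            parsed.append([])  # window disabled for this amplicon
            continue
        ranges = []
        for chunk in entry.split("_"):  # underscore separates ranges
            start, stop = (int(x) for x in chunk.split("-"))
            if start > stop:
                raise ValueError(f"range start exceeds stop: {chunk}")
            ranges.append((start, stop))
        parsed.append(ranges)
    return parsed


print(parse_qwc("5-10,5-10_20-30", 2))
# [[(5, 10)], [(5, 10), (20, 30)]]
print(parse_qwc("0,90-110", 2))
# [[], [(90, 110)]]
```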


Annotate Wildtype Allele

--annotate_wildtype_allele

Help: Wildtype alleles in the allele table plots will be marked with this string (e.g. **).

Type: str


Keep Intermediate

--keep_intermediate

Help: Keep all the intermediate files

Type: bool

Default: False


Dump

--dump

Help: Dump numpy arrays and pandas dataframes to file for debugging purposes

Type: bool

Default: False


Write Detailed Allele Table

--write_detailed_allele_table

Help: If set, a detailed allele table will be written with the following columns:

  • #Reads: the number of reads this allele represents.
  • Aligned_Sequence: the alignment of the read sequence.
  • Reference_Sequence: the alignment of the amplicon sequence.
  • n_inserted: the number of insertions within the quantification window.
  • n_deleted: the number of deletions within the quantification window.
  • n_mutated: the number of substitutions within the quantification window.
  • Reference_Name: the amplicon name to which this allele is assigned.
  • Read_Status: the bin to which this allele is classified.
  • Aligned_Reference_Names: if there are multiple amplicons, this lists the amplicon names. The order corresponds to the alignment scores in Aligned_Reference_Scores.
  • Aligned_Reference_Scores: the alignment score (out of 100) for each amplicon.
  • ref_positions: this represents the indices in the Aligned_Sequence that map back to the original sequence. Negative values represent places that don't map back to the original reference.
  • all_insertion_positions: all of the indices where there is an insertion regardless of the quantification window.
  • all_insertion_left_positions: for all insertions, the left most index (e.g. where each insertion starts).
  • insertion_positions: the insertion positions within the quantification window.
  • insertion_coordinates: the start and end indices of the insertions within the quantification window.
  • insertion_sizes: the size of each insertion within the quantification window.
  • all_deletion_positions: all of the indices where there is a deletion regardless of the quantification window.
  • deletion_positions: the indices where there is a deletion within the quantification window.
  • deletion_coordinates: the start and end indices of the deletions within the quantification window.
  • deletion_sizes: the size of the deletions within the quantification window.
  • all_substitution_positions: all of the indices where there is a substitution.
  • substitution_positions: the indices where there is a substitution within the quantification window.
  • substitution_values: the nucleotide to which it is substituted within the quantification window.
  • %Reads: the percentage of reads this allele represents.

Type: bool

Default: False


Fastq Output

--fastq_output

Help: If set, a fastq file with annotations for each read will be produced.

Type: bool

Default: False


Bam Output

--bam_output

Help: If set, a bam file with alignments for each read will be produced.

Type: bool

Default: False


Bowtie2 Index

-x, --bowtie2_index

Help: Basename of Bowtie2 index for the reference genome

Type: str


Zip Output

--zip_output

Help: If set, the output will be placed in a zip folder.

Type: bool

Default: False


Max Rows Alleles Around Cut To Plot

--max_rows_alleles_around_cut_to_plot

Help: Maximum number of rows to report in the alleles table plot.

Type: int

Default: 50
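
As an illustration of how --min_frequency_alleles_around_cut_to_plot and --max_rows_alleles_around_cut_to_plot might interact, the sketch below filters a list of alleles by percentage and then caps the number of rows. The function name, the tuple layout, and the example alleles are all made up for illustration; the order in which CRISPResso applies the two limits is an assumption.

```python
def select_alleles_to_plot(alleles, min_frequency=0.2, max_rows=50):
    """Pick alleles for the allele table plot.

    alleles: list of (sequence, percent_of_reads) tuples (hypothetical layout).
    Alleles below min_frequency percent are dropped, the rest are sorted from
    most to least frequent, and at most max_rows are returned.
    """
    kept = [allele for allele in alleles if allele[1] >= min_frequency]
    kept.sort(key=lambda allele: allele[1], reverse=True)
    return kept[:max_rows]


# Made-up example alleles with their percentage of reads.
example = [("AA-T", 9.85), ("AAAT", 90.0), ("AACT", 0.15)]
print(select_alleles_to_plot(example))
# [('AAAT', 90.0), ('AA-T', 9.85)]
```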


Suppress Report

--suppress_report

Help: Suppress output report

Type: bool

Default: False


Place Report In Output Folder

--place_report_in_output_folder

Help: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output.

Type: bool

Default: False


Suppress Plots

--suppress_plots

Help: Suppress output plots

Type: bool

Default: False


Base Editor Output

--base_editor_output

Help: Outputs plots and tables to aid in analysis of base editor studies.

Type: bool

Default: False


Conversion Nuc From

--conversion_nuc_from

Help: For base editor plots, this is the nucleotide targeted by the base editor

Type: str

Default: C


Conversion Nuc To

--conversion_nuc_to

Help: For base editor plots, this is the nucleotide produced by the base editor

Type: str

Default: T
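
To make the role of --conversion_nuc_from and --conversion_nuc_to concrete, the following sketch counts target nucleotides and conversions between an aligned reference and an aligned read. It is a simplified illustration rather than the code behind the base editor plots; the function name is hypothetical and the read is a made-up single C-to-T edit of the guide sequence used in the batch example.

```python
def count_conversions(aligned_ref, aligned_read, nuc_from="C", nuc_to="T"):
    """Count target bases in the reference and how many were converted.

    Both sequences must come from the same alignment (equal length).
    Returns (n_target_positions, n_converted).
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")
    targets = 0
    converted = 0
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == nuc_from:
            targets += 1
            if read_base == nuc_to:
                converted += 1
    return targets, converted


reference = "GGAATCCCTTCTGCAGCACC"
read = "GGAATCTCTTCTGCAGCACC"  # made-up read with one C->T conversion
print(count_conversions(reference, read))
# (8, 1)
```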


Prime Editing Spacer Sequence

--prime_editing_pegRNA_spacer_seq

Help: pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.

Type: str


Prime Editing Extension Sequence

--prime_editing_pegRNA_extension_seq

Help: Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).

Type: str


Prime Editing pegRNA Extension Quantification Window Size

--prime_editing_pegRNA_extension_quantification_window_size

Help: Quantification window size (in bp) at flap site for measuring modifications anchored at the right side of the extension sequence. Similar to the --quantification_window_size parameter, the total length of the quantification window will be 2x this parameter. Default: 5bp (10bp total window size)

Type: int

Default: 5


Prime Editing pegRNA Scaffold Sequence

--prime_editing_pegRNA_scaffold_seq

Help: If given, reads containing any of this scaffold sequence before extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value is 'GGCACCGAGUCGGUGC'.

Type: str


Prime Editing pegRNA Scaffold Min Match Length

--prime_editing_pegRNA_scaffold_min_match_length

Help: Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.

Type: int

Default: 1


Prime Editing Nicking Guide Sequence

--prime_editing_nicking_guide_seq

Help: Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence.

Type: str


Prime Editing Override Prime Edited Reference Sequence

--prime_editing_override_prime_edited_ref_seq

Help: If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.

Type: str


Prime Editing Override Sequence Checks

--prime_editing_override_sequence_checks

Help: If set, checks to assert that the prime editing guides and extension sequence are in the proper orientation are not performed. This may be useful if the checks are failing inappropriately, but the user is confident that the sequences are correct.

Type: bool

Default: False


CRISPResso 1 Mode

--crispresso1_mode

Help: Parameter usage as in CRISPResso 1

Type: bool

Default: False


dsODN

--dsODN

Help: Label reads with the dsODN sequence provided

Type: str


Auto

--auto

Help: Infer amplicon sequence from most common reads

Type: bool

Default: False


Debug

--debug

Help: Show debug messages

Type: bool

Default: False


No Rerun

--no_rerun

Help: Don't rerun CRISPResso2 if a run using the same parameters has already been finished.

Type: bool

Default: False


Number of Processes

-p, --n_processes

Help: Specify the number of processes to use for analysis. Please use with caution since increasing this parameter will significantly increase the memory required to run CRISPResso. Can be set to 'max'.

Type: str

Default: 1


Bam Input

--bam_input

Help: Aligned reads for processing in bam format

Type: str


BAM Chromosome Location

--bam_chr_loc

Help: Chromosome location in bam for reads to process. For example: 'chr1:50-100' or 'chrX'.

Type: str
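
The two location forms shown in the help text ('chr1:50-100' and 'chrX') can be told apart with a short parser. This is a sketch of the string format only; the function name is hypothetical and CRISPResso may hand the value to other tools without parsing it this way.

```python
import re

# Chromosome name, optionally followed by ':start-end'.
_LOC_PATTERN = re.compile(r"^(?P<chrom>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_chr_loc(loc):
    """Split a location such as 'chr1:50-100' or 'chrX'.

    Returns (chrom, start, end); start and end are None when only a
    chromosome name is given.
    """
    match = _LOC_PATTERN.match(loc)
    if match is None:
        raise ValueError(f"unrecognized location: {loc!r}")
    chrom = match.group("chrom")
    if match.group("start") is None:
        return chrom, None, None
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        raise ValueError(f"start exceeds end in {loc!r}")
    return chrom, start, end


print(parse_chr_loc("chr1:50-100"))
# ('chr1', 50, 100)
print(parse_chr_loc("chrX"))
# ('chrX', None, None)
```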


Batch Settings

-bs, --batch_settings

Help: Settings file for batch. Must be tab-separated text file. The header row contains CRISPResso parameters (e.g., fastq_r1, fastq_r2, amplicon_seq, and other optional parameters). Each following row sets parameters for an additional batch.

Type: str
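
A batch settings file in the tab-separated layout described above could be generated as follows. This is a minimal sketch: the column choice (name and fastq_r1) and the sample names are assumptions for illustration, and the batch.batch file distributed with the test dataset may be laid out differently.

```python
import csv
import io


def batch_settings_text(rows, columns=("name", "fastq_r1")):
    """Build the text of a tab-separated batch settings file.

    The header row holds CRISPResso parameter names; each following row
    sets those parameters for one sample.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Sample names are assumed here (the accession is reused as the name).
samples = [
    ("SRR3305543", "SRR3305543.fastq.gz"),
    ("SRR3305544", "SRR3305544.fastq.gz"),
    ("SRR3305545", "SRR3305545.fastq.gz"),
    ("SRR3305546", "SRR3305546.fastq.gz"),
]

text = batch_settings_text(samples)
print(text, end="")
# To save the file: open("batch.batch", "w").write(text)
```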


Skip Failed

--skip_failed

Help: Continue with batch analysis even if one sample fails

Type: bool

Default: False


Min Reads For Inclusion

--min_reads_for_inclusion

Help: Minimum number of reads for a batch to be included in the batch summary

Type: int


Batch Output Folder

-bo, --batch_output_folder

Help: Directory where batch analysis output will be stored

Type: str


Suppress Batch Summary Plots

--suppress_batch_summary_plots

Help: Suppress batch summary plots - e.g. if many samples are run at once, the summary plots of all sub-runs may be too large. This parameter suppresses the production of these plots.

Type: bool

Default: False


CRISPResso Command

--crispresso_command

Help: CRISPResso command to call

Type: str

Default: CRISPResso


Disable Guardrails

--disable_guardrails

Help: Disable guardrail warnings

Type: bool

Default: False


Use Matplotlib

--use_matplotlib

Help: Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed

Type: bool

Default: False


Halt On Plot Fail

--halt_on_plot_fail

Help: Halt execution if a plot fails to generate

Type: bool

Default: False


CRISPRessoBatch Examples

Example run: Batch mode

Download the test dataset files SRR3305543.fastq.gz, SRR3305544.fastq.gz, SRR3305545.fastq.gz, and SRR3305546.fastq.gz to your current directory. These files are the first 25,000 sequences from an editing experiment performed with several base editors. Also include a batch file that lists these files and the sample names: batch.batch. To analyze this experiment, run the following command:

Using Bioconda:

CRISPRessoBatch --batch_settings batch.batch --amplicon_seq CATTGCAGAGAGGCGTATCATTTCGCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCC -p 4 --base_editor_output -g GGAATCCCTTCTGCAGCACC -wc -10 -w 20

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoBatch --batch_settings batch.batch --amplicon_seq CATTGCAGAGAGGCGTATCATTTCGCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCC -p 4 --base_editor_output -g GGAATCCCTTCTGCAGCACC -wc -10 -w 20

This should produce a folder called CRISPRessoBatch_on_batch. Open the file called CRISPRessoBatch_on_batch/CRISPResso2Batch_report.html in a web browser, and you should see an output like this: CRISPResso2Batch_report.html.

CRISPRessoPooled

CRISPRessoPooled is a utility to analyze and quantify targeted sequencing CRISPR/Cas9 experiments involving pooled amplicon sequencing libraries. One common experimental strategy is to pool multiple amplicons (e.g. a single on-target site plus a set of potential off-target sites) into a single deep sequencing reaction [1]. CRISPRessoPooled demultiplexes reads from multiple amplicons and runs the CRISPResso utility with appropriate reads for each amplicon separately.

Modes

This tool can run in 3 different modes: Amplicons mode, Genome mode, and Mixed mode (Amplicons + Genome), each described below.

[1] Briefly, genomic DNA samples for pooled applications can be prepared by first amplifying the target regions for each gene/target of interest with regions of 150-400bp depending on the desired coverage. In a second round of PCR, with minimized cycle numbers, barcode and adaptors are added. With optimization, these two rounds of PCR can be merged into a single reaction. These reactions are then quantified, normalized, pooled, and undergo quality control before being sequenced.

CRISPRessoPooled Amplicons Mode

Amplicons Mode Input

Given a set of amplicon sequences, in this mode the tool demultiplexes the reads, aligning each read to the amplicon with best alignment, and creates separate compressed FASTQ files, one for each amplicon. Reads that do not align to any amplicon are discarded. After this preprocessing, CRISPResso is run for each FASTQ file, and separated reports are generated, one for each amplicon.

To run the tool in this mode the user must provide:

  • Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  • A description file containing the amplicon sequences used to enrich regions in the genome and some additional information. In particular, this file is a tab-delimited text file with up to 14 columns (first 2 columns required):

    1. AMPLICON_NAME: an identifier for the amplicon (must be unique).

    2. AMPLICON_SEQUENCE: amplicon sequence used in the design of the experiment.

    3. sgRNA_SEQUENCE (OPTIONAL): sgRNA sequence used for this amplicon without the PAM sequence. If not available, enter NA.

    4. EXPECTED_AMPLICON_AFTER_HDR (OPTIONAL): expected amplicon sequence in case of HDR. If more than one, separate by commas and not spaces. If not available, enter NA.

    5. CODING_SEQUENCE (OPTIONAL): Subsequence(s) of the amplicon corresponding to coding sequences. If more than one, separate by commas and not spaces. If not available, enter NA.

    6. PRIME_EDITING_PEGRNA_SPACER_SEQ (OPTIONAL): pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence. If not available, enter NA.

    7. PRIME_EDITING_NICKING_GUIDE_SEQ (OPTIONAL): Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence. If not available, enter NA.

    8. PRIME_EDITING_PEGRNA_EXTENSION_SEQ (OPTIONAL): Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS). If not available, enter NA.

    9. PRIME_EDITING_PEGRNA_SCAFFOLD_SEQ (OPTIONAL): If given, reads containing any of this scaffold sequence before extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'. If not available, enter NA.

    10. PRIME_EDITING_PEGRNA_SCAFFOLD_MIN_MATCH_LENGTH (OPTIONAL): Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences. If not available, enter NA.

    11. PRIME_EDITING_OVERRIDE_PRIME_EDITED_REF_SEQ (OPTIONAL): If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence. If not available, enter NA.

    12. QWC or QUANTIFICATION_WINDOW_COORDINATES (OPTIONAL): Bp positions in the amplicon sequence specifying the quantification window. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign like "start-stop", and multiple ranges can be separated by the underscore (_). A value of 0 disables this filter. If not available, enter NA.

    13. W or QUANTIFICATION_WINDOW_SIZE (OPTIONAL): Defines the size (in bp) of the quantification window extending from the position specified by the --cleavage_offset or --quantification_window_center parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. (default: 1) If not available, enter NA.

    14. WC or QUANTIFICATION_WINDOW_CENTER (OPTIONAL): Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1/Cas12a this parameter would be set to 1. For base editors, this could be set to -17. (default: -3) If not available, enter NA.
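
The relationship between the sgRNA, the window center (column 14) and the window size (column 13) can be sketched as below. This is an illustration under stated assumptions, not CRISPResso's implementation: the function name is hypothetical, only guides found on the forward strand of the amplicon are handled, and the window center is taken to be the 0-based index of the nucleotide immediately left of the predicted cut.

```python
def quantification_window(amplicon, guide, window_center=-3, window_size=1):
    """Locate the predicted cut and the positions in the quantification window.

    Returns (left_of_cut, positions), where left_of_cut is the 0-based index
    of the nucleotide just left of the cut and positions lists the indices
    within window_size bp on each side of the cut.
    """
    start = amplicon.find(guide)
    if start == -1:
        raise ValueError("guide not found on the forward strand of the amplicon")
    # window_center is counted from the 3' end of the guide (negative = inside).
    left_of_cut = start + len(guide) + window_center - 1
    first = max(left_of_cut - window_size + 1, 0)
    last = min(left_of_cut + window_size, len(amplicon) - 1)
    return left_of_cut, list(range(first, last + 1))


# Site1 from the example description file in this section.
amplicon = "CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC"
guide = "CCCTGTGCCCAGCCC"
print(quantification_window(amplicon, guide))
# (22, [22, 23])
```

With the defaults, the window covers 1bp on each side of the cut, for a total length of 2bp, which matches the description of column 13.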

A file in the correct format should look like this:

Site1	CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC	CCCTGTGCCCAGCCC	NA	NA
Site2	GTCCTGGTTTTTGGTTTGGGAAATATAGTCATC	NA	GTCCTGGTTTTTGGTTTAAAAAAATATAGTCATC	NA
Site3	TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC	NA	NA	GGAAATATA

The user can easily create this file with any text editor or with spreadsheet software like Excel (Microsoft), Numbers (Apple) or Sheets (Google Docs) and then save it as a tab-delimited file.
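
The description file can also be produced programmatically. The sketch below writes the first five columns for the three example sites, filling missing optional values with NA and joining multiple values with commas. The dictionary keys and the function name are hypothetical and chosen only for this illustration.

```python
# Optional columns 3-5, in file order (keys are hypothetical).
OPTIONAL_KEYS = ("guide", "hdr", "coding")


def amplicon_rows(amplicons):
    """Return tab-delimited description-file lines for the given amplicons."""
    seen_names = set()
    rows = []
    for amp in amplicons:
        if amp["name"] in seen_names:  # AMPLICON_NAME must be unique
            raise ValueError(f"duplicate amplicon name: {amp['name']}")
        seen_names.add(amp["name"])
        fields = [amp["name"], amp["sequence"]]
        for key in OPTIONAL_KEYS:
            value = amp.get(key)
            if isinstance(value, (list, tuple)):
                value = ",".join(value)  # commas, not spaces
            fields.append(value if value else "NA")
        rows.append("\t".join(fields))
    return rows


amplicons = [
    {"name": "Site1",
     "sequence": "CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC",
     "guide": "CCCTGTGCCCAGCCC"},
    {"name": "Site2",
     "sequence": "GTCCTGGTTTTTGGTTTGGGAAATATAGTCATC",
     "hdr": "GTCCTGGTTTTTGGTTTAAAAAAATATAGTCATC"},
    {"name": "Site3",
     "sequence": "TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC",
     "coding": ["GGAAATATA"]},
]

rows = amplicon_rows(amplicons)
print("\n".join(rows))
```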

Amplicons Mode Output

The output of CRISPRessoPooled Amplicons mode consists of:

  1. REPORT_READS_ALIGNED_TO_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    1. Demultiplexed_fastq.gz_filename: name of the files containing the raw reads for each amplicon.

    2. n_reads: number of reads recovered for each amplicon.

  2. A set of fastq.gz files, one for each amplicon.

  3. A set of folders, one for each amplicon, containing a full CRISPResso report.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

CRISPRessoPooled Genome Mode

Genome Mode Input

In this mode the tool aligns each read to the best location in the genome. Then potential amplicons are discovered looking for regions with enough reads (the default setting is to have at least 1000 reads, but the parameter can be adjusted with the option --min_reads_to_use_region). If a gene annotation file from UCSC is provided, the tool also reports the overlapping gene/s to the region. In this way it is possible to check if the amplified regions map to expected genomic locations and/or also to pseudogenes or other problematic regions. Finally CRISPResso is run in each region discovered.

To run the tool in this mode the user must provide:

  • Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  • The full path of the reference genome in bowtie2 format (e.g. /genomes/human_hg19/hg19). Instructions on how to build a custom index, as well as precomputed indexes for the human and mouse genome assemblies, are available from the bowtie2 website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml.

  • Optionally the full path of a gene annotations file from UCSC. The user can download this file from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start) selecting as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (e.g. /genomes/human_hg19/gencode_v19.gz)

Genome Mode Output

The output of CRISPRessoPooled Genome mode consists of:

  1. REPORT_READS_ALIGNED_TO_GENOME_ONLY.txt: this file contains the list of all the regions discovered, one per line with the following information:

    1. chr_id: chromosome of the region in the reference genome.

    2. bpstart: start coordinate of the region in the reference genome.

    3. bpend: end coordinate of the region in the reference genome.

    4. fastq_file: location of the fastq.gz file containing the reads mapped to the region.

    5. n_reads: number of reads mapped to the region.

    6. sequence: the sequence, on the reference genome for the region.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the regions with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

This running mode is particularly useful to check for mapping artifacts or contamination in the library. In an optimal experiment, the list of the regions discovered should contain only the regions for which amplicons were designed.
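
One way to perform this check is to compare the discovered regions against the intervals where amplicons were designed. The sketch below assumes that REPORT_READS_ALIGNED_TO_GENOME_ONLY.txt is tab-delimited with a header row naming the columns listed above; the function name, the report contents and the expected interval are made up for illustration.

```python
import csv
import io


def unexpected_regions(report_text, expected):
    """List discovered regions that overlap none of the expected intervals.

    report_text: contents of the genome-mode report (assumed tab-delimited
                 with a header containing chr_id, bpstart, bpend, n_reads).
    expected:    list of (chr_id, start, end) tuples for designed amplicons.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    flagged = []
    for row in reader:
        chrom = row["chr_id"]
        start, end = int(row["bpstart"]), int(row["bpend"])
        overlaps = any(
            chrom == exp_chrom and start <= exp_end and exp_start <= end
            for exp_chrom, exp_start, exp_end in expected
        )
        if not overlaps:
            flagged.append((chrom, start, end, int(row["n_reads"])))
    return flagged


# Made-up report contents and a single designed amplicon location.
report = (
    "chr_id\tbpstart\tbpend\tn_reads\n"
    "chr2\t1000\t1200\t5200\n"
    "chr7\t500\t700\t1300\n"
)
print(unexpected_regions(report, [("chr2", 950, 1250)]))
# [('chr7', 500, 700, 1300)]
```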

CRISPRessoPooled Mixed Mode (Amplicons + Genome)

Mixed Mode Input

In this mode, the tool first aligns reads to the genome and, as in the Genome mode, discovers aligning regions with reads exceeding a tunable threshold. Next it will align the amplicon sequences to the reference genome and will use only the reads that match both the amplicon locations and the discovered genomic locations, excluding spurious reads coming from other regions, or reads not properly trimmed. Finally CRISPResso is run using each of the surviving regions.

To run the tool in this mode the user must provide:

  1. Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  2. A description file containing the amplicon sequences used to enrich regions in the genome and some additional information (as described in the Amplicons mode section).

  3. The reference genome in bowtie2 format (as described in Genome mode section).

  4. Optionally the gene annotations from UCSC (as described in Genome mode section).

Mixed Mode Output

The output of CRISPRessoPooled Mixed Amplicons + Genome mode consists of these files:

  1. REPORT_READS_ALIGNED_TO_GENOME_AND_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    1. Amplicon_Specific_fastq.gz_filename: name of the file containing the raw reads recovered for the amplicon.

    2. n_reads: number of reads recovered for the amplicon.

    3. Gene_overlapping: gene/s overlapping the amplicon region.

    4. chr_id: chromosome of the amplicon in the reference genome.

    5. bpstart: start coordinate of the amplicon in the reference genome.

    6. bpend: end coordinate of the amplicon in the reference genome.

    7. Reference_Sequence: sequence in the reference genome for the region mapped for the amplicon.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the amplicons with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

The Mixed mode combines the benefits of the two previous running modes. In this mode it is possible to recover in an unbiased way all the genomic regions contained in the library, and hence discover contaminations or mapping artifacts. In addition, by knowing the location of the amplicon with respect to the reference genome, reads not properly trimmed or mapped to pseudogenes or other problematic regions will be automatically discarded, providing the cleanest set of reads to quantify the mutations in the target regions with CRISPResso.

If the focus of the analysis is to obtain the best quantification of editing efficiency for a set of amplicons, we suggest running the tool in the Mixed mode. The Genome mode is instead suggested to check problematic libraries, since a report is generated for each region discovered, even if the region is not mappable to any amplicon (however, this may be time consuming). Finally the Amplicons mode is the fastest, although the least reliable in terms of quantification accuracy.
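
The input requirements of the three modes can be summarized in a small helper that assembles a CRISPRessoPooled command line. This is a sketch that assumes the mode follows from which inputs are supplied, as the input lists above suggest; the helper name and the file names are hypothetical, and the flags are the ones named in the CRISPRessoPooled parameter list.

```python
def pooled_command(fastq_r1, fastq_r2=None, amplicons_file=None,
                   bowtie2_index=None, gene_annotations=None):
    """Return (mode, argv) for a CRISPRessoPooled run."""
    if amplicons_file and bowtie2_index:
        mode = "mixed"       # amplicon description file + genome index
    elif amplicons_file:
        mode = "amplicons"   # amplicon description file only
    elif bowtie2_index:
        mode = "genome"      # genome index only
    else:
        raise ValueError("provide an amplicons file, a bowtie2 index, or both")

    argv = ["CRISPRessoPooled", "--fastq_r1", fastq_r1]
    if fastq_r2:
        argv += ["--fastq_r2", fastq_r2]
    if amplicons_file:
        argv += ["--amplicons_file", amplicons_file]
    if bowtie2_index:
        argv += ["--bowtie2_index", bowtie2_index]
    if gene_annotations:
        # Gene annotations are only described for the genome-based modes.
        if not bowtie2_index:
            raise ValueError("gene annotations require a bowtie2 index")
        argv += ["--gene_annotations", gene_annotations]
    return mode, argv


mode, argv = pooled_command(
    "pooled_R1.fastq.gz",            # hypothetical file name
    amplicons_file="amplicons.txt",  # hypothetical file name
    bowtie2_index="/genomes/human_hg19/hg19",
)
print(mode)
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run once the file names point at real inputs.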

CRISPRessoPooled Parameters

Fastq R1

-r1, --fastq_r1

Help: First fastq file

Type: str


Fastq R2

-r2, --fastq_r2

Help: Second fastq file for paired end reads

Type: str


Amplicon Sequence

-a, --amplicon_seq

Help: Amplicon Sequence (can be comma-separated list of multiple sequences)

Type: str


Amplicon Name

-an, --amplicon_name

Help: Amplicon Name (can be comma-separated list of multiple names, corresponding to amplicon sequences given in --amplicon_seq)

Type: str

Default: Reference


Amplicon Min Alignment Score

-amas, --amplicon_min_alignment_score

Help: Amplicon Minimum Alignment Score; score between 0 and 100; sequences must have at least this homology score with the amplicon to be aligned (can be comma-separated list of multiple scores, corresponding to amplicon sequences given in --amplicon_seq)

Type: str


Default Minimum Alignment Score

--default_min_aln_score, --min_identity_score

Help: Default minimum homology score for a read to align to a reference amplicon

Type: int

Default: 60


Expand Ambiguous Alignments

--expand_ambiguous_alignments

Help: If more than one reference amplicon is given, reads that align to multiple reference amplicons will count equally toward each amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Assign Ambiguous Alignments To First Reference

--assign_ambiguous_alignments_to_first_reference

Help: If more than one reference amplicon is given, ambiguous reads that align with the same score to multiple amplicons will be assigned to the first amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Guide Seq

-g, --guide_seq, --sgRNA

Help: sgRNA sequence, if more than one, please separate by commas. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately adjacent to but not including the PAM sequence (5' of NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter '--quantification_window_center' nucleotides from the 3' end of the guide. In addition, the use of alternate nucleases besides SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the '--cleavage_offset' parameter to 1, since the default setting of -3 is suitable only for SpCas9.

Type: str


Guide Name

-gn, --guide_name

Help: sgRNA names, if more than one, please separate by commas.

Type: str


Flexiguide Seq

-fg, --flexiguide_seq

Help: sgRNA sequence (flexible) (can be comma-separated list of multiple flexiguides). The flexiguide sequence will be aligned to the amplicon sequence(s), as long as the guide sequence has homology as set by --flexiguide_homology.

Type: str

Default: None


Flexiguide Homology

-fh, --flexiguide_homology

Help: flexiguides will yield guides in amplicons with at least this homology to the flexiguide sequence.

Type: int

Default: 80


Flexiguide Name

-fgn, --flexiguide_name

Help: flexiguide name

Type: str


Flexiguide Gap Open Penalty

--flexiguide_gap_open_penalty

Help:

Type: int

Default: -20


Flexiguide Gap Extend Penalty

--flexiguide_gap_extend_penalty

Help:

Type: int

Default: -2


Discard Guide Positions Overhanging Amplicon Edge

--discard_guide_positions_overhanging_amplicon_edge

Help: If set, for guides that align to multiple positions, guide positions will be discarded if plotting around those regions would include bp that extend beyond the end of the amplicon.

Type: bool

Default: False


Expected HDR Amplicon Sequence

-e, --expected_hdr_amplicon_seq

Help: Amplicon sequence expected after HDR

Type: str


Exon Specification Coding Sequence/s

-c, --coding_seq

Help: Subsequence/s of the amplicon sequence covering one or more coding sequences for frameshift analysis. If more than one (for example, split by intron/s), please separate by commas.

Type: str


Config File

--config_file

Help: File path to JSON file with config elements

Type: str

Default: None


Minimum Average Read Quality (phred33 Scale)

-q, --min_average_read_quality

Help: Minimum average quality score (phred33) to keep a read

Type: int


Minimum Single bp Quality (phred33 Scale)

-s, --min_single_bp_quality

Help: Minimum single bp score (phred33) to keep a read

Type: int


Minimum bp Quality or N (phred33 Scale)

--min_bp_quality_or_N

Help: Bases with a quality score (phred33) less than this value will be set to 'N'

Type: int
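
As a rough illustration of the three quality parameters above, the sketch below decodes phred33 quality strings and applies each threshold. The function names are hypothetical and the handling of empty reads is an assumption; CRISPResso's own filtering code may differ in detail.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string: phred33 score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality_string]


def keep_read(quality_string, min_average_read_quality=0, min_single_bp_quality=0):
    """Apply the -q (average) and -s (single bp) thresholds to one read."""
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # assumption: an empty read is discarded
    average = sum(scores) / len(scores)
    return average >= min_average_read_quality and min(scores) >= min_single_bp_quality


def mask_low_quality(sequence, quality_string, min_bp_quality_or_N):
    """Replace bases scoring below the threshold with 'N' (--min_bp_quality_or_N)."""
    scores = phred33_scores(quality_string)
    return "".join(
        base if score >= min_bp_quality_or_N else "N"
        for base, score in zip(sequence, scores)
    )
```

In phred33 encoding, 'I' corresponds to a score of 40 and '#' to a score of 2, so a read with qualities `III#` has an average of 30.5 and a minimum of 2.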


File Prefix

--file_prefix

Help: File prefix for output plots and tables

Type: str


Sample Name

-n, --name

Help: Output name of the report (default: the name is obtained from the filename of the fastq file/s used in input)

Type: str


Suppress Amplicon Name Truncation

--suppress_amplicon_name_truncation

Help: If set, amplicon names will not be truncated when creating output filename prefixes. If not set, amplicon names longer than 21 characters will be truncated when creating filename prefixes.

Type: bool

Default: False


Output Folder

-o, --output_folder

Help: Output folder to use for the analysis (default: current folder)

Type: str


Verbosity

-v, --verbosity

Help: Verbosity level of output to the console (1-4); 4 is the most verbose

Type: int

Default: 3


Split Interleaved Input

--split_interleaved_input, --split_paired_end

Help: Splits a single fastq file containing paired end reads into two files before running CRISPResso

Type: bool

Default: False


Trimming Adapter

--trim_sequences

Help: Enable adapter trimming with fastp

Type: bool

Default: False


Trimmomatic Command

--trimmomatic_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Trimmomatic Options String

--trimmomatic_options_string

Help: DEPRECATED in v2.3.0, use --fastp_options_string

Type: str


Flash Command

--flash_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Fastp Command

--fastp_command

Help: Command to run fastp

Type: str

Default: fastp


Fastp Options String

--fastp_options_string

Help: Override options for fastp, e.g. --length_required 70 --umi

Type: str


Min Paired End Reads Overlap

--min_paired_end_reads_overlap

Help: Parameter for the fastp read merging step. Minimum required overlap length between two reads to provide a confident overlap

Type: int

Default: 10


Max Paired End Reads Overlap

--max_paired_end_reads_overlap

Help: DEPRECATED in v2.3.0

Type: str

Default: None


Samtools Exclude Flags

--samtools_exclude_flags

Help: Exclude reads with any of the specified flags set in the SAM/BAM file. Flags can be specified in either base 16 (hex) or base 10. Default is 4 (read unmapped).

Type: str

Default: 4
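
SAM flags are bit fields, so "any of the specified flags set" is a bitwise AND. The sketch below shows how a value given in base 10 or base 16 could be interpreted; the helper names are hypothetical and this is not the code CRISPResso or samtools uses.

```python
def parse_flag(value):
    """Parse a flag value given in base 10 ('4') or base 16 ('0x4')."""
    text = value.strip().lower()
    if text.startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def is_excluded(read_flag, exclude_flags):
    """A read is excluded if it has any of the excluded bits set."""
    return (read_flag & exclude_flags) != 0
```

For example, 0x104 (260 in base 10) combines flag 4 (read unmapped) with flag 256 (not primary alignment), so a read carrying either bit would be excluded.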


Stringent Flash Merging

--stringent_flash_merging

Help: DEPRECATED in v2.3.0

Type: bool

Default: False


Quantification Window Size

-w, --quantification_window_size, --window_around_sgrna

Help: Defines the size (in bp) of the quantification window extending from the position specified by the '--cleavage_offset' or '--quantification_window_center' parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. Multiple quantification window sizes (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: 1
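
The relationship between the window size and the positions that are examined can be sketched as below. This assumes the cut position is expressed as the 0-based index of the base immediately to the left of the cut; the function name is hypothetical, and the sketch ignores other filters such as --exclude_bp_from_left and --exclude_bp_from_right.

```python
def quantification_window(cut_index, window_size, amplicon_length):
    """Return the 0-based amplicon indices inside the quantification window.

    cut_index       -- index of the base immediately left of the cut (assumption)
    window_size     -- value of -w; 0 means the whole amplicon is considered
    amplicon_length -- length of the amplicon sequence
    """
    if window_size == 0:
        return list(range(amplicon_length))
    # window_size bases on each side of the cut, clipped to the amplicon
    start = max(0, cut_index - window_size + 1)
    end = min(amplicon_length - 1, cut_index + window_size)
    return list(range(start, end + 1))
```

With the default size of 1 and a cut after index 16, the window covers indices 16 and 17, for a total length of 2bp.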


Quantification Window Center

-wc, --quantification_window_center, --cleavage_offset

Help: Center of the quantification window with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate; for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17 to only include mutations near the 5' end of the sgRNA. Multiple quantification window centers (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: -3
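
To make the offset convention concrete, the sketch below locates a guide on either strand of an amplicon and converts the offset into an amplicon position. It is a simplified illustration with hypothetical function names: it uses exact string matching only, and it reports each cut as the 0-based index of the base immediately left of the cut on the amplicon as given.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def cut_indices(amplicon, guide, window_center=-3):
    """Return sorted cut positions for every exact match of the guide."""
    amplicon, guide = amplicon.upper(), guide.upper()
    cuts = []

    # Guide on the given strand: its 3' end is the right end of the match
    start = amplicon.find(guide)
    while start != -1:
        cuts.append(start + len(guide) + window_center - 1)
        start = amplicon.find(guide, start + 1)

    # Guide on the opposite strand: its 3' end is the left end of the match
    guide_rc = reverse_complement(guide)
    start = amplicon.find(guide_rc)
    while start != -1:
        cuts.append(start - window_center - 1)
        start = amplicon.find(guide_rc, start + 1)

    return sorted(cuts)
```

For a 20 nt guide starting at index 5 with the default offset of -3, the cut falls after index 21, that is, between the 17th and 18th bases of the guide. With an offset of 1 (the Cpf1 example above), the same guide gives index 25, one base past the 3' end of the guide.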


Exclude bp From Left

--exclude_bp_from_left

Help: Exclude bp from the left side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Exclude bp From Right

--exclude_bp_from_right

Help: Exclude bp from the right side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Use Legacy Insertion Quantification

--use_legacy_insertion_quantification

Help: If set, the legacy insertion quantification method will be used (i.e. with a 1bp quantification window, indels at the cut site and 1bp away from the cut site would be quantified). By default (if this parameter is not set) with a 1bp quantification window, only insertions at the cut site will be quantified.

Type: bool

Default: False


Ignore Substitutions

--ignore_substitutions

Help: Ignore substitution events for the quantification and visualization

Type: bool

Default: False


Ignore Insertions

--ignore_insertions

Help: Ignore insertion events for the quantification and visualization

Type: bool

Default: False


Ignore Deletions

--ignore_deletions

Help: Ignore deletion events for the quantification and visualization

Type: bool

Default: False


Discard Indel Reads

--discard_indel_reads

Help: Discard reads with indels in the quantification window from analysis

Type: bool

Default: False


Needleman Wunsch Gap Open

--needleman_wunsch_gap_open

Help: Gap open option for Needleman-Wunsch alignment

Type: int

Default: -20


Needleman Wunsch Gap Extend

--needleman_wunsch_gap_extend

Help: Gap extend option for Needleman-Wunsch alignment

Type: int

Default: -2


Needleman Wunsch Gap Incentive

--needleman_wunsch_gap_incentive

Help: Gap incentive value for inserting indels at cut sites

Type: int

Default: 1


Needleman Wunsch Alignment Matrix Location

--needleman_wunsch_aln_matrix_loc

Help: Location of the matrix specifying substitution scores in the NCBI format (see ftp://ftp.ncbi.nih.gov/blast/matrices/)

Type: str

Default: EDNAFULL


Plot Histogram Outliers

--plot_histogram_outliers

Help: If set, all values will be shown on histograms. By default (if unset), histogram ranges are limited to plotting data within the 99th percentile.

Type: bool

Default: False


Plot Window Size

--plot_window_size, --offset_around_cut_to_plot

Help: Defines the size of the window extending from the quantification window center to plot. Nucleotides within plot_window_size of the quantification_window_center for each guide are plotted.

Type: int

Default: 20


Min Frequency Alleles Around Cut To Plot

--min_frequency_alleles_around_cut_to_plot

Help: Minimum % reads required to report an allele in the alleles table plot.

Type: float

Default: 0.2


Expand Allele Plots By Quantification

--expand_allele_plots_by_quantification

Help: If set, alleles with different modifications in the quantification window (but not necessarily in the plotting window (e.g. for another sgRNA)) are plotted on separate lines, even though they may have the same apparent sequence. To force the allele plot and the allele table to be the same, set this parameter. If unset, all alleles with the same sequence will be collapsed into one row.

Type: bool

Default: False


Allele Plot Percentages Only for Assigned Reference

--allele_plot_pcts_only_for_assigned_reference

Help: If set, in the allele plots, the percentages will show the percentage as a percent of reads aligned to the assigned reference. Default behavior is to show percentage as a percent of all reads.

Type: bool

Default: False


Quantification Window Coordinates

-qwc, --quantification_window_coordinates

Help: Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of '--quantification_window_center' ('--cleavage_offset') and '--quantification_window_size' ('--window_around_sgrna'). Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign (e.g. 'start-stop'), and multiple ranges can be separated by the underscore (_). A comma-separated list of values can be given, corresponding to the amplicon sequences given in --amplicon_seq (e.g. 5-10,5-10_20-30 would specify the 6th-11th bp in the first reference and the 6th-11th and 21st-31st bp in the second reference). A value of 0 disables this filter for a particular amplicon (e.g. 0,90-110 would disable the quantification window for the first amplicon and specify a quantification window of 90-110 for the second). Note that if multiple amplicons are provided and only one quantification window coordinate is provided, the same quantification window will be used for all amplicons and adjusted to account for insertions/deletions. (default: None)

Type: str
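
The coordinate syntax described above (commas between amplicons, underscores between ranges, dashes within a range) can be parsed as in the following sketch. The function names are hypothetical, and ranges are treated as 0-based and inclusive of both ends, following the example given in the help text.

```python
def parse_window_coordinates(spec):
    """Parse a -qwc string into one entry per amplicon.

    Each entry is a list of (start, stop) tuples, or None when the value
    for that amplicon is 0 (window filter disabled).
    """
    per_amplicon = []
    for amplicon_spec in spec.split(","):
        if amplicon_spec == "0":
            per_amplicon.append(None)
            continue
        ranges = []
        for range_spec in amplicon_spec.split("_"):
            start, stop = (int(value) for value in range_spec.split("-"))
            ranges.append((start, stop))
        per_amplicon.append(ranges)
    return per_amplicon


def window_indices(ranges):
    """Expand (start, stop) ranges into sorted 0-based indices, ends included."""
    indices = set()
    for start, stop in ranges:
        indices.update(range(start, stop + 1))
    return sorted(indices)
```

Here `5-10` expands to indices 5 through 10, which are the 6th through 11th bp of the amplicon.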


Annotate Wildtype Allele

--annotate_wildtype_allele

Help: Wildtype alleles in the allele table plots will be marked with this string (e.g. **).

Type: str


Keep Intermediate

--keep_intermediate

Help: Keep all the intermediate files

Type: bool

Default: False


Dump

--dump

Help: Dump numpy arrays and pandas dataframes to file for debugging purposes

Type: bool

Default: False


Write Detailed Allele Table

--write_detailed_allele_table

Help: If set, a detailed allele table will be written with the following columns:

  • #Reads: the number of reads this allele represents.
  • Aligned_Sequence: the alignment of the read sequence.
  • Reference_Sequence: the alignment of the amplicon sequence.
  • n_inserted: the number of insertions within the quantification window.
  • n_deleted: the number of deletions within the quantification window.
  • n_mutated: the number of substitutions within the quantification window.
  • Reference_Name: the amplicon name to which this allele is assigned.
  • Read_Status: the bin to which this allele is classified.
  • Aligned_Reference_Names: if there are multiple amplicons, this lists the amplicon names. The order corresponds to the alignment scores in Aligned_Reference_Scores.
  • Aligned_Reference_Scores: the alignment score (out of 100) for each amplicon.
  • ref_positions: this represents the indices in the Aligned_Sequence that map back to the original sequence. Negative values represent places that don't map back to the original reference.
  • all_insertion_positions: all of the indices where there is an insertion regardless of the quantification window.
  • all_insertion_left_positions: for all insertions, the left most index (e.g. where each insertion starts).
  • insertion_positions: the insertion positions within the quantification window.
  • insertion_coordinates: the start and end indices of the insertions within the quantification window.
  • insertion_sizes: the size of each insertion within the quantification window.
  • all_deletion_positions: all of the indices where there is a deletion regardless of the quantification window.
  • deletion_positions: the indices where there is a deletion within the quantification window.
  • deletion_coordinates: the start and end indices of the deletions within the quantification window.
  • deletion_sizes: the size of the deletions within the quantification window.
  • all_substitution_positions: all of the indices where there is a substitution.
  • substitution_positions: the indices where there is a substitution within the quantification window.
  • substitution_values: the nucleotide to which it is substituted within the quantification window.
  • %Reads: the percentage of reads this allele represents.

Type: bool

Default: False


Fastq Output

--fastq_output

Help: If set, a fastq file with annotations for each read will be produced.

Type: bool

Default: False


Bam Output

--bam_output

Help: If set, a bam file with alignments for each read will be produced.

Type: bool

Default: False


Bowtie2 Index

-x, --bowtie2_index

Help: Basename of Bowtie2 index for the reference genome

Type: str


Zip Output

--zip_output

Help: If set, the output will be placed in a zip folder.

Type: bool

Default: False


Max Rows Alleles Around Cut To Plot

--max_rows_alleles_around_cut_to_plot

Help: Maximum number of rows to report in the alleles table plot.

Type: int

Default: 50
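
Together with --min_frequency_alleles_around_cut_to_plot, this parameter limits which alleles appear in the alleles table plot. A minimal sketch of that selection, assuming alleles are ranked by read percentage (the function name and the tuple layout are hypothetical):

```python
def alleles_to_plot(alleles, min_frequency=0.2, max_rows=50):
    """Select alleles for the alleles table plot.

    alleles       -- list of (sequence, percent_of_reads) tuples
    min_frequency -- minimum % of reads required to report an allele
    max_rows      -- maximum number of rows to report
    """
    kept = [allele for allele in alleles if allele[1] >= min_frequency]
    # Most frequent alleles first, then cap the number of rows
    kept.sort(key=lambda allele: allele[1], reverse=True)
    return kept[:max_rows]
```

An allele representing 0.1% of reads is dropped under the default threshold of 0.2, regardless of how many rows are allowed.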


Suppress Report

--suppress_report

Help: Suppress output report

Type: bool

Default: False


Place Report In Output Folder

--place_report_in_output_folder

Help: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output.

Type: bool

Default: False


Suppress Plots

--suppress_plots

Help: Suppress output plots

Type: bool

Default: False


Base Editor Output

--base_editor_output

Help: Outputs plots and tables to aid in analysis of base editor studies.

Type: bool

Default: False


Conversion Nuc From

--conversion_nuc_from

Help: For base editor plots, this is the nucleotide targeted by the base editor

Type: str

Default: C


Conversion Nuc To

--conversion_nuc_to

Help: For base editor plots, this is the nucleotide produced by the base editor

Type: str

Default: T
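
To illustrate what the two conversion parameters describe, the sketch below counts how many target nucleotides in an aligned reference are read as the converted nucleotide. It is a simplified illustration with a hypothetical function name, not the calculation behind CRISPResso's base editor plots.

```python
def count_conversions(reference_aligned, read_aligned, nuc_from="C", nuc_to="T"):
    """Count target bases and converted bases in one pairwise alignment.

    Both sequences must have the same length; '-' marks a gap.
    Returns (converted, targets).
    """
    if len(reference_aligned) != len(read_aligned):
        raise ValueError("aligned sequences must have the same length")
    targets = 0
    converted = 0
    for ref_base, read_base in zip(reference_aligned.upper(), read_aligned.upper()):
        if ref_base != nuc_from:
            continue  # only positions holding the targeted nucleotide count
        targets += 1
        if read_base == nuc_to:
            converted += 1
    return converted, targets
```

With the defaults, a reference of `ACCGTC` read as `ATCGTT` gives 2 converted positions out of 3 target C positions.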


Prime Editing Spacer Sequence

--prime_editing_pegRNA_spacer_seq

Help: pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.

Type: str


Prime Editing Extension Sequence

--prime_editing_pegRNA_extension_seq

Help: Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).

Type: str


Prime Editing pegRNA Extension Quantification Window Size

--prime_editing_pegRNA_extension_quantification_window_size

Help: Quantification window size (in bp) at flap site for measuring modifications anchored at the right side of the extension sequence. Similar to the --quantification_window_size parameter, the total length of the quantification window will be 2x this parameter. Default: 5bp (10bp total window size)

Type: int

Default: 5


Prime Editing pegRNA Scaffold Sequence

--prime_editing_pegRNA_scaffold_seq

Help: If given, reads containing any of this scaffold sequence before extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value is 'GGCACCGAGUCGGUGC'.

Type: str


Prime Editing pegRNA Scaffold Min Match Length

--prime_editing_pegRNA_scaffold_min_match_length

Help: Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.

Type: int

Default: 1


Prime Editing Nicking Guide Sequence

--prime_editing_nicking_guide_seq

Help: Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence

Type: str


Prime Editing Override Prime Edited Reference Sequence

--prime_editing_override_prime_edited_ref_seq

Help: If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.

Type: str


Prime Editing Override Sequence Checks

--prime_editing_override_sequence_checks

Help: If set, checks to assert that the prime editing guides and extension sequence are in the proper orientation are not performed. This may be useful if the checks are failing inappropriately, but the user is confident that the sequences are correct.

Type: bool

Default: False


CRISPResso 1 Mode

--crispresso1_mode

Help: Parameter usage as in CRISPResso 1

Type: bool

Default: False


dsODN

--dsODN

Help: Label reads with the dsODN sequence provided

Type: str


Auto

--auto

Help: Infer amplicon sequence from most common reads

Type: bool

Default: False


Debug

--debug

Help: Show debug messages

Type: bool

Default: False


No Rerun

--no_rerun

Help: Don't rerun CRISPResso2 if a run using the same parameters has already been finished.

Type: bool

Default: False


Number of Processes

-p, --n_processes

Help: Specify the number of processes to use for analysis. Please use with caution since increasing this parameter will significantly increase the memory required to run CRISPResso. Can be set to 'max'.

Type: str

Default: 1


Bam Input

--bam_input

Help: Aligned reads for processing in bam format

Type: str


BAM Chromosome Location

--bam_chr_loc

Help: Chromosome location in bam for reads to process. For example: 'chr1:50-100' or 'chrX'.

Type: str


Skip Failed

--skip_failed

Help: Continue with batch analysis even if one sample fails

Type: bool

Default: False


CRISPResso Command

--crispresso_command

Help: CRISPResso command to call

Type: str

Default: CRISPResso


Amplicons File

-f, --amplicons_file

Help: Amplicons description file. This file is a tab-delimited text file with up to 14 columns (2 required):

  • amplicon_name: an identifier for the amplicon (must be unique).
  • amplicon_seq: amplicon sequence used in the experiment.
  • guide_seq (OPTIONAL): sgRNA sequence used for this amplicon without the PAM sequence. Multiple guides can be given separated by commas and not spaces.
  • expected_hdr_amplicon_seq (OPTIONAL): expected amplicon sequence in case of HDR.
  • coding_seq (OPTIONAL): Subsequence(s) of the amplicon corresponding to coding sequences. If more than one separate them by commas and not spaces.
  • prime_editing_pegRNA_spacer_seq (OPTIONAL): pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.
  • prime_editing_nicking_guide_seq (OPTIONAL): Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence.
  • prime_editing_pegRNA_extension_seq (OPTIONAL): Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).
  • prime_editing_pegRNA_scaffold_seq (OPTIONAL): If given, reads containing any of this scaffold sequence before extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'.
  • prime_editing_pegRNA_scaffold_min_match_length (OPTIONAL): Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.
  • prime_editing_override_prime_edited_ref_seq (OPTIONAL): If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.
  • quantification_window_coordinates (OPTIONAL): Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of '--quantification_window_center' ('--cleavage_offset') and '--quantification_window_size' ('--window_around_sgrna'). Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign like 'start-stop', and multiple ranges can be separated by the underscore (_). A value of 0 disables this filter. A comma-separated list of values can be given, corresponding to the amplicon sequences given in --amplicon_seq (e.g. 5-10,5-10_20-30 would specify the 6th-11th bp in the first reference and the 6th-11th and 21st-31st bp in the second reference). (default: None)
  • quantification_window_size (OPTIONAL): Defines the size (in bp) of the quantification window extending from the position specified by the '--cleavage_offset' or '--quantification_window_center' parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp.
  • quantification_window_center (OPTIONAL): Center of the quantification window with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate; for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17.

Type: str
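
A small amplicons file with the first three columns could be generated as in the sketch below. The function name, the dictionary keys used as input, and the example sequences are hypothetical; columns are written in the order listed above, and no header row is written because this section does not state whether one is expected.

```python
import csv
import io


def write_amplicons_file(handle, amplicons):
    """Write amplicon_name, amplicon_seq and (optionally) guide_seq as tab-delimited rows."""
    seen_names = set()
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for amplicon in amplicons:
        name = amplicon["amplicon_name"]
        if name in seen_names:
            raise ValueError("amplicon_name must be unique: " + name)
        seen_names.add(name)
        row = [name, amplicon["amplicon_seq"]]
        guides = amplicon.get("guide_seq", [])
        if guides:
            # Multiple guides are separated by commas and not spaces
            row.append(",".join(guides))
        writer.writerow(row)


# Usage with placeholder sequences (not real amplicons)
buffer = io.StringIO()
write_amplicons_file(buffer, [
    {"amplicon_name": "site1", "amplicon_seq": "ACGTACGT", "guide_seq": ["ACGT", "CGTA"]},
    {"amplicon_name": "site2", "amplicon_seq": "TTTTGGGG"},
])
```

Passing an open file instead of the in-memory buffer writes the same rows to disk.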


Gene Annotations

--gene_annotations

Help: Gene Annotation Table from UCSC Genome Browser Tables (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), please select as table 'knownGene', as output format 'all fields from selected table' and as file returned 'gzip compressed'

Type: str


Bowtie2 Options String

--bowtie2_options_string

Help: Override options for the Bowtie2 alignment command. By default, this is ' --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-3(1-H)' where H is the default homology score.

Type: str
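
As a worked example of the default --score-min setting: in Bowtie2, a function of the form L,a,b evaluates to a + b * x, where x is the read length. The sketch below assumes that H is the default homology score expressed as a fraction (60 becomes 0.6); that interpretation and the function name are assumptions made for illustration.

```python
def bowtie2_min_score(read_length, homology_percent=60):
    """Minimum alignment score implied by --score-min L,-5,-3(1-H)."""
    h = homology_percent / 100.0      # assumption: H is a fraction between 0 and 1
    slope = -3 * (1 - h)              # -1.2 for the default homology score of 60
    return -5 + slope * read_length   # Bowtie2 linear function: a + b * x
```

For a 100 bp read and the default homology score of 60, this gives -5 + (-1.2 * 100) = -125, so alignments scoring below about -125 would be rejected. A higher homology score gives a less negative threshold and therefore a stricter filter.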


Use Legacy Bowtie2 Options String

--use_legacy_bowtie2_options_string

Help: Use legacy (more stringent) Bowtie2 alignment parameters: ' -k 1 --end-to-end -N 0 --np 0 '.

Type: bool

Default: False


Minimum Reads to Use Region

--min_reads_to_use_region

Help: Minimum number of reads that align to a region to perform the CRISPResso analysis

Type: float

Default: 1000


Skip Reporting Problematic Regions

--skip_reporting_problematic_regions

Help: Skip reporting of problematic regions. By default, when both amplicons (-f) and genome (-x) are provided, reads that align to the genome at positions other than where the amplicons align are reported as problematic.

Type: bool

Default: False


Compile Postrun References

--compile_postrun_references

Help: If set, a file will be produced which compiles the reference sequences of frequent amplicons.

Type: bool

Default: False


Compile Postrun Reference Allele Cutoff

--compile_postrun_reference_allele_cutoff

Help: Only alleles with at least this percentage frequency in the population will be reported in the postrun analysis. This parameter is given as a percent, so 30 is 30%.

Type: float

Default: 30


Alternate Alleles

--alternate_alleles

Help: Path to tab-separated file with alternate allele sequences for pooled experiments. This file has the columns 'region_name','reference_seqs', and 'reference_names' and gives the reference sequences of alternate alleles that will be passed to CRISPResso for each individual region for allelic analysis. Multiple reference alleles and reference names for a given region name are separated by commas (no spaces).

Type: str


Limit Open Files For Demux

--limit_open_files_for_demux

Help: If set, only one file will be opened during demultiplexing of read alignment locations. This will be slightly slower as the reads must be sorted, but may be necessary if the number of amplicons is greater than the number of files that can be opened due to OS constraints.

Type: bool

Default: False


Aligned Pooled Bam

--aligned_pooled_bam

Help: Path to aligned input for CRISPRessoPooled processing. If this parameter is specified, the alignments in the given bam will be used to demultiplex reads. If this parameter is not set (default), input reads provided by --fastq_r1 (and optionally --fastq_r2) will be aligned to the reference genome using bowtie2. If the input bam is given, the corresponding reference fasta must also be given to extract reference genomic sequences via the parameter --bowtie2_index. Note that if the aligned reads are paired-end sequenced, they should already be merged into 1 read (e.g. via Flash) before alignment.

Type: str


Demultiplex Only At Amplicons

--demultiplex_only_at_amplicons

Help: DEPRECATED in v2.3.2, see demultiplex_at_amplicons_and_genome

Type: bool

Default: False


Demultiplex Genome Wide

--demultiplex_genome_wide

Help: If set, and an amplicon file (--amplicons_file) and reference sequence (--bowtie2_index) are provided, the entire genome will be demultiplexed and reads with the exact same start and stop coordinates as an amplicon will be assigned to that amplicon. If this flag is not set, reads overlapping alignment positions of amplicons will be demultiplexed and assigned to that amplicon.

Type: bool

Default: False


Disable Guardrails

--disable_guardrails

Help: Disable guardrail warnings

Type: bool

Default: False


Use Matplotlib

--use_matplotlib

Help: Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed

Type: bool

Default: False


Halt On Plot Fail

--halt_on_plot_fail

Help: Halt execution if a plot fails to generate

Type: bool

Default: False


CRISPRessoPooled Examples

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt --name ONLY_AMPLICONS_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt --name ONLY_AMPLICONS_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Amplicons mode consists of:

  1. REPORT_READS_ALIGNED_TO_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. Demultiplexed_fastq.gz_filename: name of the files containing the raw reads for each amplicon.

    b. n_reads: number of reads recovered for each amplicon.

  2. A set of fastq.gz files, one for each amplicon.

  3. A set of folders, one for each amplicon, containing a full CRISPResso report.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

Genome mode: In this mode the tool aligns each read to the best location in the genome. Potential amplicons are then discovered by looking for regions with enough reads (the default setting is to have at least 1000 reads, but the parameter can be adjusted with the option --min_reads_to_use_region). If a gene annotation file from UCSC is provided, the tool also reports the gene/s overlapping each region. In this way it is possible to check whether the amplified regions map to expected genomic locations and/or also to pseudogenes or other problematic regions. Finally, CRISPResso is run on each region discovered.

To run the tool in this mode the user must provide:

  1. Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  2. The full path of the reference genome in bowtie2 format (e.g. /genomes/human_hg19/hg19). Instructions on how to build a custom index, as well as precomputed indexes for human and mouse genome assemblies, are available from the bowtie2 website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml.

  3. Optionally the full path of a gene annotations file from UCSC. The user can download this file from the UCSC Genome Browser ( http://genome.ucsc.edu/cgi-bin/hgTables?command=start ) selecting as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (e.g. /genomes/human_hg19/gencode_v19.gz)

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -x /GENOMES/hg19/hg19 --name ONLY_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -x /GENOMES/hg19/hg19 --name ONLY_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Genome mode consists of:

  1. REPORT_READS_ALIGNED_TO_GENOME_ONLY.txt: this file contains the list of all the regions discovered, one per line with the following information:

    • chr_id: chromosome of the region in the reference genome.

    • bpstart: start coordinate of the region in the reference genome.

    • bpend: end coordinate of the region in the reference genome.

    • fastq_file: location of the fastq.gz file containing the reads mapped to the region.

    • n_reads: number of reads mapped to the region.

    • sequence: the sequence of the region on the reference genome.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the regions with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

    This running mode is particularly useful to check for mapping artifacts or contamination in the library. In an optimal experiment, the list of the regions discovered should contain only the regions for which amplicons were designed.
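
As a rough illustration of that check, the sketch below compares the regions listed in REPORT_READS_ALIGNED_TO_GENOME_ONLY.txt with the coordinates of the designed amplicons and returns the discovered regions that overlap none of them. It assumes the report is tab-delimited with a header row carrying the column names listed above; the sample rows, the `designed` intervals and the function name are hypothetical and only serve the example.

```python
import csv
import io


def unexpected_regions(report_text, designed):
    """Return discovered regions that overlap no designed amplicon interval.

    designed: list of (chr_id, start, end) tuples (hypothetical input).
    """
    unexpected = []
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        start, end = int(row["bpstart"]), int(row["bpend"])
        # Two intervals on the same chromosome overlap when each one
        # starts before the other one ends.
        overlaps = any(
            row["chr_id"] == chrom and start <= d_end and d_start <= end
            for chrom, d_start, d_end in designed
        )
        if not overlaps:
            unexpected.append((row["chr_id"], start, end, int(row["n_reads"])))
    return unexpected


# Made-up report content, for illustration only.
report = (
    "chr_id\tbpstart\tbpend\tfastq_file\tn_reads\tsequence\n"
    "chr1\t1000\t1200\tREGION_chr1_1000_1200.fastq.gz\t5000\tACGT\n"
    "chr7\t500\t650\tREGION_chr7_500_650.fastq.gz\t40\tTTGA\n"
)
designed = [("chr1", 950, 1250)]
print(unexpected_regions(report, designed))
```

To apply the same check to a real run, the text of the report file can be read from disk and passed in place of `report`.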

Mixed mode (Amplicons + Genome): in this mode, the tool first aligns reads to the genome and, as in the Genome mode, discovers aligning regions with reads exceeding a tunable threshold. Next it will align the amplicon sequences to the reference genome and will use only the reads that match both the amplicon locations and the discovered genomic locations, excluding spurious reads coming from other regions, or reads not properly trimmed. Finally CRISPResso is run using each of the surviving regions.

To run the tool in this mode the user must provide:

  • Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  • A description file containing the amplicon sequences used to enrich regions in the genome and some additional information (as described in the Amplicons mode section).

  • The reference genome in bowtie2 format (as described in Genome mode section).

  • Optionally the gene annotations from UCSC (as described in Genome mode section).

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt -x /GENOMES/hg19/hg19 --name AMPLICONS_AND_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt -x /GENOMES/hg19/hg19 --name AMPLICONS_AND_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Mixed Amplicons + Genome mode consists of these files:

  1. REPORT_READS_ALIGNED_TO_GENOME_AND_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. Amplicon_Specific_fastq.gz_filename: name of the file containing the raw reads recovered for the amplicon.

    b. n_reads: number of reads recovered for the amplicon.

    c. Gene_overlapping: gene/s overlapping the amplicon region.

    d. chr_id: chromosome of the amplicon in the reference genome.

    e. bpstart: start coordinate of the amplicon in the reference genome.

    f. bpend: end coordinate of the amplicon in the reference genome.

    g. Reference_Sequence: sequence in the reference genome for the region mapped for the amplicon.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the amplicons with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

The Mixed mode combines the benefits of the two previous running modes. In this mode it is possible to recover in an unbiased way all the genomic regions contained in the library, and hence discover contaminations or mapping artifacts. In addition, by knowing the location of the amplicon with respect to the reference genome, reads not properly trimmed or mapped to pseudogenes or other problematic regions will be automatically discarded, providing the cleanest set of reads to quantify the mutations in the target regions with CRISPResso.

If the focus of the analysis is to obtain the best quantification of editing efficiency for a set of amplicons, we suggest running the tool in the Mixed mode. The Genome mode is instead suggested for checking problematic libraries, since a report is generated for each region discovered, even if the region is not mappable to any amplicon (however, this may be time consuming). Finally, the Amplicons mode is the fastest, although the least reliable in terms of quantification accuracy.

CRISPRessoWGS

CRISPRessoWGS is a utility for the analysis of genome editing experiments from whole genome sequencing (WGS) data. CRISPRessoWGS allows exploring any region of the genome to quantify targeted editing or potential off-target effects. The intended use case for CRISPRessoWGS is the analysis of targeted regions: WGS reads from those regions are realigned using CRISPResso's alignment algorithm for more accurate genome editing quantification. To scan the entire genome for mutations, VarScan or MuTect are more suitable; the regions they identify can then be analyzed and visualized using CRISPRessoWGS.

CRISPRessoWGS Inputs

To run CRISPRessoWGS you must provide:

A genome-aligned BAM file. Many options are available for aligning reads from a WGS experiment to the genome; we suggest using either Bowtie2 or BWA.

A FASTA file containing the reference sequence used to align the reads and create the BAM file. The reference files for the most common organisms can be downloaded from UCSC: http://hgdownload.soe.ucsc.edu/downloads.html. Download and uncompress only the file ending with .fa.gz; for example, for the latest version of the human genome, download and uncompress the file hg38.fa.gz.

Descriptions file (--region_file) containing the coordinates of the regions to analyze and some additional information. In particular, this file is a tab delimited text file with up to 7 columns (4 required):

  • chr_id: chromosome of the region in the reference genome.

  • bpstart: start coordinate of the region in the reference genome.

  • bpend: end coordinate of the region in the reference genome.

  • REGION_NAME: an identifier for the region (must be unique).

  • sgRNA_SEQUENCE (OPTIONAL): sgRNA sequence used for this genomic segment without the PAM sequence. If not available, enter NA.

  • EXPECTED_SEGMENT_AFTER_HDR (OPTIONAL): expected genomic segment sequence in case of HDR. If more than one, separate by commas and not spaces. If not available, enter NA.

  • CODING_SEQUENCE (OPTIONAL): Subsequence(s) of the genomic segment corresponding to coding sequences. If more than one, separate by commas and not spaces. If not available, enter NA.
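
A small sketch of how such a file could be assembled is shown below. It writes the seven columns in the order listed above, separates them with tabs, joins multiple sequences with commas and fills missing optional values with NA. The helper names, the region names and the sequences are hypothetical; whether a header line is expected is not stated here, so the sketch writes none.

```python
def region_line(chr_id, bpstart, bpend, name, sgrna=None, hdr=None, coding=None):
    """Format one tab-delimited region line, using NA for missing optional values."""

    def multi(value):
        # Optional fields may hold several sequences: join with commas, no spaces.
        if value is None:
            return "NA"
        if isinstance(value, str):
            return value
        return ",".join(value)

    fields = [chr_id, str(bpstart), str(bpend), name,
              sgrna or "NA", multi(hdr), multi(coding)]
    return "\t".join(fields)


def build_region_file(regions):
    """Return the file content for a list of region dicts; names must be unique."""
    names = [region["name"] for region in regions]
    if len(names) != len(set(names)):
        raise ValueError("REGION_NAME values must be unique")
    return "\n".join(region_line(**region) for region in regions) + "\n"


# Made-up regions, for illustration only.
regions = [
    {"chr_id": "chr1", "bpstart": 100, "bpend": 300, "name": "SITE_A"},
    {"chr_id": "chr2", "bpstart": 5, "bpend": 50, "name": "SITE_B",
     "sgrna": "GACT", "coding": ["ACG", "TTA"]},
]
print(build_region_file(regions), end="")
```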

CRISPRessoWGS Parameters

Amplicon Min Alignment Score

-amas, --amplicon_min_alignment_score

Help: Amplicon Minimum Alignment Score; score between 0 and 100; sequences must have at least this homology score with the amplicon to be aligned (can be comma-separated list of multiple scores, corresponding to amplicon sequences given in --amplicon_seq)

Type: str


Default Minimum Alignment Score

--default_min_aln_score, --min_identity_score

Help: Default minimum homology score for a read to align to a reference amplicon

Type: int

Default: 60


Expand Ambiguous Alignments

--expand_ambiguous_alignments

Help: If more than one reference amplicon is given, reads that align to multiple reference amplicons will count equally toward each amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Assign Ambiguous Alignments To First Reference

--assign_ambiguous_alignments_to_first_reference

Help: If more than one reference amplicon is given, ambiguous reads that align with the same score to multiple amplicons will be assigned to the first amplicon. Default behavior is to exclude ambiguous alignments.

Type: bool

Default: False


Guide Seq

-g, --guide_seq, --sgRNA

Help: sgRNA sequence, if more than one, please separate by commas. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately adjacent to but not including the PAM sequence (5' of NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter '--quantification_window_center' nucleotides from the 3' end of the guide. In addition, the use of alternate nucleases besides SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the '--cleavage_offset' parameter to 1, since the default setting of -3 is suitable only for SpCas9.

Type: str


Guide Name

-gn, --guide_name

Help: sgRNA names, if more than one, please separate by commas.

Type: str


Flexiguide Seq

-fg, --flexiguide_seq

Help: sgRNA sequence (flexible) (can be comma-separated list of multiple flexiguides). The flexiguide sequence will be aligned to the amplicon sequence(s), as long as the guide sequence has homology as set by --flexiguide_homology.

Type: str

Default: None


Flexiguide Homology

-fh, --flexiguide_homology

Help: flexiguides will yield guides in amplicons with at least this homology to the flexiguide sequence.

Type: int

Default: 80


Flexiguide Name

-fgn, --flexiguide_name

Help: flexiguide name

Type: str


Flexiguide Gap Open Penalty

--flexiguide_gap_open_penalty

Help:

Type: int

Default: -20


Flexiguide Gap Extend Penalty

--flexiguide_gap_extend_penalty

Help:

Type: int

Default: -2


Discard Guide Positions Overhanging Amplicon Edge

--discard_guide_positions_overhanging_amplicon_edge

Help: If set, for guides that align to multiple positions, guide positions will be discarded if plotting around those regions would include bp that extend beyond the end of the amplicon.

Type: bool

Default: False


Expected HDR Amplicon Sequence

-e, --expected_hdr_amplicon_seq

Help: Amplicon sequence expected after HDR

Type: str


Exon Specification Coding Sequence/s

-c, --coding_seq

Help: Subsequence/s of the amplicon sequence covering one or more coding sequences for frameshift analysis. If more than one (for example, split by intron/s), please separate by commas.

Type: str


Config File

--config_file

Help: File path to JSON file with config elements

Type: str

Default: None


Minimum Average Read Quality (phred33 Scale)

-q, --min_average_read_quality

Help: Minimum average quality score (phred33) to keep a read

Type: int


Minimum Single bp Quality (phred33 Scale)

-s, --min_single_bp_quality

Help: Minimum single bp score (phred33) to keep a read

Type: int


Minimum bp Quality or N (phred33 Scale)

--min_bp_quality_or_N

Help: Bases with a quality score (phred33) less than this value will be set to 'N'

Type: int
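
The three quality parameters above can be pictured with the following sketch. It decodes a phred33 quality string (each character's ASCII code minus 33), applies an average and a single-bp threshold, and masks low-quality bases with N. The function names are hypothetical, reads are assumed to be non-empty, and treating a score equal to the threshold as passing is an assumption; this is not the filtering code used by CRISPResso.

```python
def phred33_scores(quality_string):
    """Decode a phred33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filters(quality_string, min_average=0, min_single_bp=0):
    """Keep a read only if its mean and its lowest base quality reach the thresholds."""
    scores = phred33_scores(quality_string)
    mean_quality = sum(scores) / len(scores)
    return mean_quality >= min_average and min(scores) >= min_single_bp


def mask_low_quality(sequence, quality_string, min_bp_quality_or_n):
    """Replace bases whose quality is below the threshold with 'N'."""
    scores = phred33_scores(quality_string)
    return "".join(
        base if score >= min_bp_quality_or_n else "N"
        for base, score in zip(sequence, scores)
    )


print(phred33_scores("I!5"))
print(passes_filters("III!", min_average=30))
print(passes_filters("III!", min_single_bp=20))
print(mask_low_quality("ACGT", "II!I", 20))
```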


File Prefix

--file_prefix

Help: File prefix for output plots and tables

Type: str


Sample Name

-n, --name

Help: Output name of the report (default: the name is obtained from the filename of the fastq file/s used in input)

Type: str


Suppress Amplicon Name Truncation

--suppress_amplicon_name_truncation

Help: If set, amplicon names will not be truncated when creating output filename prefixes. If not set, amplicon names longer than 21 characters will be truncated when creating filename prefixes.

Type: bool

Default: False


Output Folder

-o, --output_folder

Help: Output folder to use for the analysis (default: current folder)

Type: str


Verbosity

-v, --verbosity

Help: Verbosity level of output to the console (1-4); 4 is the most verbose

Type: int

Default: 3


Trimming Adapter

--trim_sequences

Help: Enable the trimming with fastp

Type: bool

Default: False


Trimmomatic Command

--trimmomatic_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Trimmomatic Options String

--trimmomatic_options_string

Help: DEPRECATED in v2.3.0, use --fastp_options_string

Type: str


Flash Command

--flash_command

Help: DEPRECATED in v2.3.0, use --fastp_command

Type: str

Default: None


Fastp Command

--fastp_command

Help: Command to run fastp

Type: str

Default: fastp


Fastp Options String

--fastp_options_string

Help: Override options for fastp, e.g. --length_required 70 --umi

Type: str


Min Paired End Reads Overlap

--min_paired_end_reads_overlap

Help: Parameter for the fastp read merging step. Minimum required overlap length between two reads to provide a confident overlap

Type: int

Default: 10


Max Paired End Reads Overlap

--max_paired_end_reads_overlap

Help: DEPRECATED in v2.3.0

Type: str

Default: None


Samtools Exclude Flags

--samtools_exclude_flags

Help: Exclude reads with any of the specified flags set in the SAM/BAM file. Flags can be specified in either base 16 (hex) or base 10. Default is 4 (read unmapped).

Type: str

Default: 4
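
To make the flag test concrete, the sketch below parses an exclude value written in base 10 or base 16 and checks whether a read's SAM flag has any of the excluded bits set. It assumes hex values carry a 0x prefix, as samtools expects; the function names are hypothetical.

```python
def parse_flag(value):
    """Parse a flag given in base 10 ("4") or base 16 ("0x4")."""
    return int(value, 0)


def is_excluded(read_flag, exclude_flags="4"):
    """Return True if the read has any of the excluded flag bits set."""
    return (read_flag & parse_flag(exclude_flags)) != 0


print(is_excluded(4))             # unmapped read, excluded by the default
print(is_excluded(99))            # properly paired, mapped read
print(is_excluded(355, "0x100"))  # secondary alignment excluded via hex value
```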


Stringent Flash Merging

--stringent_flash_merging

Help: DEPRECATED in v2.3.0

Type: bool

Default: False


Quantification Window Size

-w, --quantification_window_size, --window_around_sgrna

Help: Defines the size (in bp) of the quantification window extending from the position specified by the '--cleavage_offset' or '--quantification_window_center' parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. Multiple quantification window sizes (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: 1


Quantification Window Center

-wc, --quantification_window_center, --cleavage_offset

Help: Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17 to only include mutations near the 5' end of the sgRNA. Multiple quantification window centers (corresponding to each guide specified by --guide_seq) can be specified with a comma-separated list.

Type: str

Default: -3
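
The interplay of the quantification window center and size can be sketched as follows. The example locates a guide on the forward strand of an amplicon, places the window center relative to the guide's 3' end, and lists the amplicon positions covered by the window. It is a simplified illustration under stated assumptions (single exact match, forward strand only, no handling of --exclude_bp_from_left/right), not CRISPResso's own implementation; the function name and the sequences are hypothetical.

```python
def quantification_window(amplicon, guide, window_center=-3, window_size=1):
    """Return (cut_index, window_indices) for a guide on the forward strand.

    cut_index is the 0-based index of the base immediately 5' of the
    predicted cut; the cut lies between cut_index and cut_index + 1.
    """
    guide_start = amplicon.find(guide)
    if guide_start < 0:
        raise ValueError("guide not found on the forward strand of the amplicon")
    guide_end = guide_start + len(guide) - 1  # index of the guide's 3'-most base
    cut_index = guide_end + window_center
    if window_size == 0:
        # A size of 0 disables the window: the whole amplicon is considered.
        return cut_index, list(range(len(amplicon)))
    first = max(0, cut_index - window_size + 1)
    last = min(len(amplicon) - 1, cut_index + window_size)
    return cut_index, list(range(first, last + 1))


# Made-up 20 nt guide followed by a TGG PAM, for illustration only.
guide = "GACTGACTGACTGACTGACT"
amplicon = "A" * 10 + guide + "TGG" + "C" * 10
print(quantification_window(amplicon, guide))
print(quantification_window(amplicon, guide, window_size=5)[1])
```

With the default center of -3 and size of 1, the window covers the two bases flanking the predicted cut, which matches the "1bp on each side" description; a size of 5 covers 10 bases in total.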


Exclude bp From Left

--exclude_bp_from_left

Help: Exclude bp from the left side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Exclude bp From Right

--exclude_bp_from_right

Help: Exclude bp from the right side of the amplicon sequence for the quantification of the indels

Type: int

Default: 15


Use Legacy Insertion Quantification

--use_legacy_insertion_quantification

Help: If set, the legacy insertion quantification method will be used (i.e. with a 1bp quantification window, indels at the cut site and 1bp away from the cut site would be quantified). By default (if this parameter is not set) with a 1bp quantification window, only insertions at the cut site will be quantified.

Type: bool

Default: False


Ignore Substitutions

--ignore_substitutions

Help: Ignore substitution events in the quantification and visualization

Type: bool

Default: False


Ignore Insertions

--ignore_insertions

Help: Ignore insertion events in the quantification and visualization

Type: bool

Default: False


Ignore Deletions

--ignore_deletions

Help: Ignore deletion events in the quantification and visualization

Type: bool

Default: False


Discard Indel Reads

--discard_indel_reads

Help: Discard reads with indels in the quantification window from analysis

Type: bool

Default: False


Needleman Wunsch Gap Open

--needleman_wunsch_gap_open

Help: Gap open option for Needleman-Wunsch alignment

Type: int

Default: -20


Needleman Wunsch Gap Extend

--needleman_wunsch_gap_extend

Help: Gap extend option for Needleman-Wunsch alignment

Type: int

Default: -2


Needleman Wunsch Gap Incentive

--needleman_wunsch_gap_incentive

Help: Gap incentive value for inserting indels at cut sites

Type: int

Default: 1


Needleman Wunsch Alignment Matrix Location

--needleman_wunsch_aln_matrix_loc

Help: Location of the matrix specifying substitution scores in the NCBI format (see ftp://ftp.ncbi.nih.gov/blast/matrices/)

Type: str

Default: EDNAFULL


Plot Histogram Outliers

--plot_histogram_outliers

Help: If set, all values will be shown on histograms. By default (if unset), histogram ranges are limited to plotting data within the 99th percentile.

Type: bool

Default: False


Plot Window Size

--plot_window_size, --offset_around_cut_to_plot

Help: Defines the size of the window extending from the quantification window center to plot. Nucleotides within plot_window_size of the quantification_window_center for each guide are plotted.

Type: int

Default: 20


Min Frequency Alleles Around Cut To Plot

--min_frequency_alleles_around_cut_to_plot

Help: Minimum % reads required to report an allele in the alleles table plot.

Type: float

Default: 0.2


Expand Allele Plots By Quantification

--expand_allele_plots_by_quantification

Help: If set, alleles with different modifications in the quantification window (but not necessarily in the plotting window (e.g. for another sgRNA)) are plotted on separate lines, even though they may have the same apparent sequence. To force the allele plot and the allele table to be the same, set this parameter. If unset, all alleles with the same sequence will be collapsed into one row.

Type: bool

Default: False


Allele Plot Percentages Only for Assigned Reference

--allele_plot_pcts_only_for_assigned_reference

Help: If set, in the allele plots, the percentages will show the percentage as a percent of reads aligned to the assigned reference. Default behavior is to show percentage as a percent of all reads.

Type: bool

Default: False


Quantification Window Coordinates

-qwc, --quantification_window_coordinates

Help: Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of the '--quantification_window_center', '--cleavage_offset', '--quantification_window_size' or '--window_around_sgrna' parameters. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign (e.g. 'start-stop'), and multiple ranges can be separated by the underscore (_). A comma-separated list of values can be given, corresponding to the amplicon sequences given in --amplicon_seq (e.g. 5-10,5-10_20-30 would specify the 6th-11th bp in the first reference and the 6th-11th and 21st-31st bp in the second reference). A value of 0 disables this filter for a particular amplicon (e.g. 0,90-110 would disable the quantification window for the first amplicon and specify the quantification window of 90-110 for the second). Note that if there are multiple amplicons provided, and only one quantification window coordinate is provided, the same quantification window will be used for all amplicons and be adjusted to account for insertions/deletions. (default: None)

Type: str
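
The coordinate syntax can be unpacked with a short sketch like the one below: commas separate amplicons, underscores separate ranges within one amplicon, dashes separate the 0-based inclusive start and stop of a range, and a bare 0 disables the filter for that amplicon. The function names are hypothetical and the parsing is only an illustration of the format described in the help text.

```python
def parse_window_coordinates(value):
    """Parse e.g. '5-10,5-10_20-30' into one entry per amplicon.

    Each entry is a list of (start, stop) tuples, 0-based and inclusive,
    or None when the value for that amplicon is '0' (filter disabled).
    """
    per_amplicon = []
    for amplicon_spec in value.split(","):
        if amplicon_spec == "0":
            per_amplicon.append(None)
            continue
        ranges = []
        for part in amplicon_spec.split("_"):
            start, stop = (int(x) for x in part.split("-"))
            if start > stop:
                raise ValueError(f"range start exceeds stop in {part!r}")
            ranges.append((start, stop))
        per_amplicon.append(ranges)
    return per_amplicon


def window_indices(ranges):
    """Expand inclusive (start, stop) ranges into a sorted list of positions."""
    return sorted({i for start, stop in ranges for i in range(start, stop + 1)})


print(parse_window_coordinates("5-10,5-10_20-30"))
print(parse_window_coordinates("0,90-110"))
print(window_indices([(5, 10)]))  # 0-based positions of the 6th-11th bp
```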


Annotate Wildtype Allele

--annotate_wildtype_allele

Help: Wildtype alleles in the allele table plots will be marked with this string (e.g. **).

Type: str


Keep Intermediate

--keep_intermediate

Help: Keep all the intermediate files

Type: bool

Default: False


Dump

--dump

Help: Dump numpy arrays and pandas dataframes to file for debugging purposes

Type: bool

Default: False


Write Detailed Allele Table

--write_detailed_allele_table

Help: If set, a detailed allele table will be written with the following columns:

  • #Reads: the number of reads this allele represents.
  • Aligned_Sequence: the alignment of the read sequence.
  • Reference_Sequence: the alignment of the amplicon sequence.
  • n_inserted: the number of insertions within the quantification window.
  • n_deleted: the number of deletions within the quantification window.
  • n_mutated: the number of substitutions within the quantification window.
  • Reference_Name: the amplicon name to which this allele is assigned.
  • Read_Status: the bin to which this allele is classified.
  • Aligned_Reference_Names: if there are multiple amplicons, this lists the amplicon names. The order corresponds to the alignment scores in Aligned_Reference_Scores.
  • Aligned_Reference_Scores: the alignment score (out of 100) for each amplicon.
  • ref_positions: this represents the indices in the Aligned_Sequence that map back to the original sequence. Negative values represent places that don't map back to the original reference.
  • all_insertion_positions: all of the indices where there is an insertion regardless of the quantification window.
  • all_insertion_left_positions: for all insertions, the left most index (e.g. where each insertion starts).
  • insertion_positions: the insertion positions within the quantification window.
  • insertion_coordinates: the start and end indices of the insertions within the quantification window.
  • insertion_sizes: the size of each insertion within the quantification window.
  • all_deletion_positions: all of the indices where there is a deletion regardless of the quantification window.
  • deletion_positions: the indices where there is a deletion within the quantification window.
  • deletion_coordinates: the start and end indices of the deletions within the quantification window.
  • deletion_sizes: the size of the deletions within the quantification window.
  • all_substitution_positions: all of the indices where there is a substitution.
  • substitution_positions: the indices where there is a substitution within the quantification window.
  • substitution_values: the nucleotide to which it is substituted within the quantification window.
  • %Reads: the percentage of reads this allele represents.

Type: bool

Default: False
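
As an example of working with these columns, the sketch below sums #Reads per Reference_Name and Read_Status and reports each group's share of all reads. It assumes the table is tab-delimited with a header row using the column names listed above; the sample rows, the status labels in them and the function name are made up for illustration.

```python
import csv
import io
from collections import Counter


def reads_by_status(table_text):
    """Sum #Reads per (Reference_Name, Read_Status) and add the percentage of all reads."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        counts[(row["Reference_Name"], row["Read_Status"])] += int(row["#Reads"])
    total = sum(counts.values())
    return {key: (n, 100.0 * n / total) for key, n in counts.items()}


# Made-up rows showing only the columns used here.
table = (
    "#Reads\tReference_Name\tRead_Status\n"
    "60\tReference\tUNMODIFIED\n"
    "30\tReference\tMODIFIED\n"
    "10\tReference\tMODIFIED\n"
)
for key, (n_reads, pct) in reads_by_status(table).items():
    print(key, n_reads, pct)
```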


Fastq Output

--fastq_output

Help: If set, a fastq file with annotations for each read will be produced.

Type: bool

Default: False


Bam Output

--bam_output

Help: If set, a bam file with alignments for each read will be produced.

Type: bool

Default: False


Bowtie2 Index

-x, --bowtie2_index

Help: Basename of Bowtie2 index for the reference genome

Type: str


Zip Output

--zip_output

Help: If set, the output will be placed in a zip folder.

Type: bool

Default: False


Max Rows Alleles Around Cut To Plot

--max_rows_alleles_around_cut_to_plot

Help: Maximum number of rows to report in the alleles table plot.

Type: int

Default: 50


Suppress Report

--suppress_report

Help: Suppress output report

Type: bool

Default: False


Place Report In Output Folder

--place_report_in_output_folder

Help: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output.

Type: bool

Default: False


Suppress Plots

--suppress_plots

Help: Suppress output plots

Type: bool

Default: False


Base Editor Output

--base_editor_output

Help: Outputs plots and tables to aid in analysis of base editor studies.

Type: bool

Default: False


Conversion Nuc From

--conversion_nuc_from

Help: For base editor plots, this is the nucleotide targeted by the base editor

Type: str

Default: C


Conversion Nuc To

--conversion_nuc_to

Help: For base editor plots, this is the nucleotide produced by the base editor

Type: str

Default: T
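
The roles of --conversion_nuc_from and --conversion_nuc_to can be illustrated with a minimal sketch that, for one aligned read/reference pair, counts how many reference positions carry the targeted nucleotide and how many of those show the produced nucleotide in the read. The function name and sequences are hypothetical, and the sketch ignores the quantification window and gaps; it is not the code behind the base editor plots.

```python
def conversion_counts(aligned_ref, aligned_read, nuc_from="C", nuc_to="T"):
    """Return (targets, converted) for one aligned read/reference pair.

    targets: reference positions holding nuc_from.
    converted: those positions where the read holds nuc_to.
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")
    targets = converted = 0
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == nuc_from:
            targets += 1
            if read_base == nuc_to:
                converted += 1
    return targets, converted


# Made-up aligned sequences, for illustration only.
print(conversion_counts("ACCGTC", "ATCGTT"))
print(conversion_counts("ACCGTC", "GCCGTC", nuc_from="A", nuc_to="G"))
```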


Prime Editing Spacer Sequence

--prime_editing_pegRNA_spacer_seq

Help: pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.

Type: str


Prime Editing Extension Sequence

--prime_editing_pegRNA_extension_seq

Help: Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).

Type: str


Prime Editing pegRNA Extension Quantification Window Size

--prime_editing_pegRNA_extension_quantification_window_size

Help: Quantification window size (in bp) at flap site for measuring modifications anchored at the right side of the extension sequence. Similar to the --quantification_window_size parameter, the total length of the quantification window will be 2x this parameter. Default: 5bp (10bp total window size)

Type: int

Default: 5


Prime Editing pegRNA Scaffold Sequence

--prime_editing_pegRNA_scaffold_seq

Help: If given, reads containing any of this scaffold sequence before the extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value is 'GGCACCGAGUCGGUGC'.

Type: str


Prime Editing pegRNA Scaffold Min Match Length

--prime_editing_pegRNA_scaffold_min_match_length

Help: Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.

Type: int

Default: 1


Prime Editing Nicking Guide Sequence

--prime_editing_nicking_guide_seq

Help: Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence.

Type: str


Prime Editing Override Prime Edited Reference Sequence

--prime_editing_override_prime_edited_ref_seq

Help: If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.

Type: str


Prime Editing Override Sequence Checks

--prime_editing_override_sequence_checks

Help: If set, checks to assert that the prime editing guides and extension sequence are in the proper orientation are not performed. This may be useful if the checks are failing inappropriately, but the user is confident that the sequences are correct.

Type: bool

Default: False


CRISPResso 1 Mode

--crispresso1_mode

Help: Parameter usage as in CRISPResso 1

Type: bool

Default: False


dsODN

--dsODN

Help: Label reads with the dsODN sequence provided

Type: str


Auto

--auto

Help: Infer amplicon sequence from most common reads

Type: bool

Default: False


Debug

--debug

Help: Show debug messages

Type: bool

Default: False


No Rerun

--no_rerun

Help: Don't rerun CRISPResso2 if a run using the same parameters has already been finished.

Type: bool

Default: False


Number of Processes

-p, --n_processes

Help: Specify the number of processes to use for analysis. Please use with caution since increasing this parameter will significantly increase the memory required to run CRISPResso. Can be set to 'max'.

Type: str

Default: 1


Skip Failed

--skip_failed

Help: Continue with batch analysis even if one sample fails

Type: bool

Default: False


CRISPResso Command

--crispresso_command

Help: CRISPResso command to call

Type: str

Default: CRISPResso


Gene Annotations

--gene_annotations

Help: Gene Annotation Table from UCSC Genome Browser Tables (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), please select as table 'knownGene', as output format 'all fields from selected table' and as file returned 'gzip compressed'

Type: str


Bam File

-b, --bam_file

Help: WGS aligned bam file

Type: str

Default: bam filename


Region File

-f, --region_file

Help: Regions description file. A BED format file containing the regions to analyze, one per line. The REQUIRED columns are:

  • chr_id (chromosome name)
  • bpstart (start position)
  • bpend (end position)

The optional columns are:

  • name (a unique identifier for the region)
  • guide_seq
  • expected_hdr_amplicon_seq
  • coding_seq

See CRISPResso --help for more details on these last 3 parameters.

Type: str


Reference File

-r, --reference_file

Help: A FASTA format reference file (for example hg19.fa for the human genome)

Type: str


Minimum Reads to Use Region

--min_reads_to_use_region

Help: Minimum number of reads that align to a region to perform the CRISPResso analysis for WGS

Type: float

Default: 10


Disable Guardrails

--disable_guardrails

Help: Disable guardrail warnings

Type: bool

Default: False


Use Matplotlib

--use_matplotlib

Help: Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed

Type: bool

Default: False


Halt On Plot Fail

--halt_on_plot_fail

Help: Halt execution if a plot fails to generate

Type: bool

Default: False


CRISPRessoWGS Examples

Example:

Using Bioconda:

CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350

The output of this command consists of:

  1. REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. sequence: sequence in the reference genome for the region specified.

    b. gene_overlapping: gene/s overlapping the region specified.

    c. n_reads: number of reads recovered for the region.

    d. bam_file_with_reads_in_region: file containing only the subset of reads that overlap the region, even partially. This file is indexed and can easily be loaded, for example, into IGV to visualize single reads or to compare two conditions. For example, in the figure below (fig X) we show reads mapped to a region inside the coding sequence of the gene Crygc subjected to NHEJ (CRISPR_WGS_SRR1542350) vs reads from a control experiment (CONTROL_WGS_SRR1542349).

    e. fastq.gz_file_trimmed_reads_in_region: file containing only the subset of reads fully covering the specified regions, and trimmed to match the sequence in that region. These reads are used for the subsequent analysis with CRISPResso.

  2. ANALYZED_REGIONS (folder): this folder contains all the BAM and FASTQ files, one for each region analyzed.

  3. A set of folders with the CRISPResso report on the regions provided in input with enough reads (the default setting is to have at least 10 reads, but the parameter can be adjusted with the option --min_reads_to_use_region).

  4. CRISPRessoWGS_RUNNING_LOG.txt: execution log and messages for the external utilities called.

This utility is particularly useful for investigating and quantifying mutation frequency in a list of potential target or off-target sites, coming for example from prediction tools or from other orthogonal assays.
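
The report file can be filtered after a run. The sketch below is a minimal example that assumes the report is tab-separated with a header row containing an n_reads column; the sample rows and the regions_with_enough_reads helper are invented for illustration and are not part of CRISPResso.

```python
import csv
import io


def regions_with_enough_reads(report_handle, min_reads=10):
    """Return the report rows whose n_reads is at least min_reads."""
    reader = csv.DictReader(report_handle, delimiter="\t")
    return [row for row in reader if float(row["n_reads"]) >= min_reads]


# Made-up stand-in for REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt.
example_report = io.StringIO(
    "chr_id\tbpstart\tbpend\tname\tn_reads\n"
    "chr1\t1000\t1200\tsite_A\t42\n"
    "chr2\t5000\t5250\tsite_B\t3\n"
)
kept = regions_with_enough_reads(example_report)
print([row["name"] for row in kept])
```

For a real run, replace the in-memory example with open(path_to_report) and adjust the column names if they differ.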

CRISPRessoCompare

CRISPRessoCompare is a utility for the comparison of a pair of CRISPResso analyses. CRISPRessoCompare produces a summary of differences between two conditions, for example a CRISPR-treated and an untreated control sample (see figure below). Informative plots are generated showing the differences in editing rates and localization within the reference amplicon.

Example figure for CRISPRessoCompare

CRISPRessoCompare Inputs

To run CRISPRessoCompare you must provide:

  1. Two output folders generated with CRISPResso using the same reference amplicon and settings but on different datasets.
  2. Optionally a name for each condition to use for the plots, and the name of the output folder.
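
When many comparisons need to be launched, the command can be assembled programmatically. The sketch below only uses flags documented in this section (-n1, -n2, -n, -o and the two positional folders); the build_compare_command helper and the folder names are hypothetical.

```python
import shlex


def build_compare_command(folder_1, folder_2, sample_1_name=None,
                          sample_2_name=None, name=None, output_folder=None):
    """Assemble a CRISPRessoCompare argument list (hypothetical helper)."""
    cmd = ["CRISPRessoCompare"]
    # Optional flags are added only when a value is supplied.
    for flag, value in (("-n1", sample_1_name), ("-n2", sample_2_name),
                        ("-n", name), ("-o", output_folder)):
        if value is not None:
            cmd += [flag, value]
    # The two CRISPResso output folders are positional arguments.
    cmd += [folder_1, folder_2]
    return cmd


cmd = build_compare_command(
    "CRISPResso_on_treated/",
    "CRISPResso_on_control/",
    sample_1_name="Treated",
    sample_2_name="Control",
    name="treated_vs_control",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where CRISPRessoCompare is installed.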

CRISPRessoCompare Parameters

Sample Name

-n, --name

Help: Output name of the report (default: the name is obtained from the filename of the fastq file/s used in input)

Type: str


Output Folder

-o, --output_folder

Help: Output folder to use for the analysis (default: current folder)

Type: str


Verbosity

-v, --verbosity

Help: Verbosity level of output to the console (1-4); 4 is the most verbose

Type: int

Default: 3


Min Frequency Alleles Around Cut To Plot

--min_frequency_alleles_around_cut_to_plot

Help: Minimum % reads required to report an allele in the alleles table plot.

Type: float

Default: 0.2


Zip Output

--zip_output

Help: If set, the output will be placed in a zip folder.

Type: bool

Default: False


Max Rows Alleles Around Cut To Plot

--max_rows_alleles_around_cut_to_plot

Help: Maximum number of rows to report in the alleles table plot.

Type: int

Default: 50


Suppress Report

--suppress_report

Help: Suppress output report

Type: bool

Default: False


Place Report In Output Folder

--place_report_in_output_folder

Help: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output.

Type: bool

Default: False


Debug

--debug

Help: Show debug messages

Type: bool

Default: False


Crispresso Output Folder 1

crispresso_output_folder_1 (Positional Argument)

Help: First output folder with CRISPResso analysis

Type: str


Crispresso Output Folder 2

crispresso_output_folder_2 (Positional Argument)

Help: Second output folder with CRISPResso analysis

Type: str


Sample 1 Name

-n1, --sample_1_name

Help: Sample 1 name


Sample 2 Name

-n2, --sample_2_name

Help: Sample 2 name


Reported Qvalue Cutoff

--reported_qvalue_cutoff

Help: Q-value cutoff for significance in tests for differential editing. Each base position is tested (for insertions, deletions, substitutions, and all modifications) using Fisher's exact test, followed by Bonferroni correction. The number of bases with significance below this threshold in the quantification window are counted and reported in the output summary.

Type: float

Default: 0.05


Disable Guardrails

--disable_guardrails

Help: Disable guardrail warnings

Type: bool

Default: False


Use Matplotlib

--use_matplotlib

Help: Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed

Type: bool

Default: False


Halt On Plot Fail

--halt_on_plot_fail

Help: Halt execution if a plot fails to generate

Type: bool

Default: False


CRISPRessoCompare Examples

Example:

Using Bioconda:

CRISPRessoCompare -n1 "VEGFA CRISPR" -n2 "VEGFA CONTROL"  -n VEGFA_Site_1_SRR10467_VS_SRR1046787 CRISPResso_on_VEGFA_Site_1_SRR1046762/ CRISPResso_on_VEGFA_Site_1_SRR1046787/

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoCompare -n1 "VEGFA CRISPR" -n2 "VEGFA CONTROL"  -n VEGFA_Site_1_SRR10467_VS_SRR1046787 CRISPResso_on_VEGFA_Site_1_SRR1046762/ CRISPResso_on_VEGFA_Site_1_SRR1046787/

The output will consist of:

  1. Comparison_Efficiency.pdf: a figure containing a comparison of the edit frequencies for each category (NHEJ, MIXED NHEJ-HDR and HDR) as well as the net effect obtained by subtracting the second sample (second folder in the command line) from the first sample (first folder in the command line).
  2. Comparison_Combined_Insertion_Deletion_Substitution_Locations.pdf: a figure showing the average profile for the mutations for the two samples in the same scale and their difference with the same convention used in the previous figure (first sample – second sample).
  3. CRISPRessoCompare_significant_base_counts.txt: a text file reporting the number of bases for each amplicon and in the quantification window for each amplicon that were significantly enriched for Insertions, Deletions, and Substitutions, as well as All Modifications (Fisher's exact test, Bonferroni corrected p-values).
  4. CRISPRessoCompare_RUNNING_LOG.txt: detailed execution log.
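
The sign convention used in the figures (first sample minus second sample) can be illustrated with a few lines of Python. The percentages below are invented for the example, and the net_effect helper is hypothetical.

```python
def net_effect(sample_1_pct, sample_2_pct):
    """Per-category difference: first sample minus second sample."""
    return {
        category: round(sample_1_pct[category] - sample_2_pct[category], 2)
        for category in sample_1_pct
    }


# Made-up percentages of reads in each category.
treated = {"Unmodified": 40.0, "NHEJ": 50.0, "MIXED NHEJ-HDR": 2.5, "HDR": 7.5}
control = {"Unmodified": 99.0, "NHEJ": 1.0, "MIXED NHEJ-HDR": 0.0, "HDR": 0.0}
difference = net_effect(treated, control)
print(difference)
```

A positive value means the category is more frequent in the first folder given on the command line.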

CRISPRessoPooledWGSCompare

CRISPRessoPooledWGSCompare is an extension of the CRISPRessoCompare utility allowing the user to run and summarize multiple CRISPRessoCompare analyses where several regions are analyzed in two different conditions, as in the case of the CRISPRessoPooled or CRISPRessoWGS utilities.

To run CRISPRessoPooledWGSCompare you must provide:

  1. Two output folders generated with CRISPRessoPooled or CRISPRessoWGS using the same reference amplicon and settings but on different datasets.
  2. Optionally a name for each condition to use for the plots, and the name of the output folder.

CRISPRessoPooledWGSCompare Examples

Example:

Using Bioconda:

CRISPRessoPooledWGSCompare CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/ CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046787/ -n1 SRR1046762 -n2 SRR1046787 -n AMPLICONS_AND_GENOME_SRR1046762_VS_SRR1046787

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooledWGSCompare CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/ CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046787/ -n1 SRR1046762 -n2 SRR1046787 -n AMPLICONS_AND_GENOME_SRR1046762_VS_SRR1046787

The output from these files will consist of:

  1. COMPARISON_SAMPLES_QUANTIFICATION_SUMMARIES.txt: this file contains a summary of the quantification for each of the two conditions for each region and their difference (read counts and percentages for the various classes: Unmodified, NHEJ, MIXED NHEJ-HDR and HDR).
  2. A set of folders with CRISPRessoCompare reports on the common regions with enough reads in both conditions.
  3. CRISPRessoPooledWGSCompare_significant_base_count_summary.txt: a text file summarizing for each sample and amplicon in both conditions the number of bases for each amplicon and in the quantification window for each amplicon that were significantly enriched for Insertions, Deletions, and Substitutions, as well as All Modifications (Fisher's exact test, Bonferroni corrected p-values).
  4. CRISPRessoPooledWGSCompare_RUNNING_LOG.txt: detailed execution log.

CRISPRessoAggregate

CRISPRessoAggregate is a tool to aggregate output from multiple CRISPResso runs. This is useful for aggregating runs that have been processed previously, including single CRISPResso runs, as well as runs from CRISPRessoBatch, CRISPRessoPooled and CRISPRessoWGS modes.

CRISPRessoAggregate Examples

Example:

Using Bioconda:

CRISPRessoAggregate --name "VEGFA" --prefix CRISPRessoRuns/VEGFA/

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoAggregate --name "VEGFA" --prefix CRISPRessoRuns/VEGFA/

The output will consist of:

  1. CRISPResso2Aggregate_report.html: an HTML file containing links to all aggregated runs.
  2. CRISPRessoAggregate_amplicon_information.txt: A tab-separated file with a line for each amplicon that was found in any run. The 'Amplicon Name' column shows the unique name for this amplicon sequence. 'Number of sources' shows how many runs the amplicon was found in, and 'Amplicon sources' shows which run folders the amplicon was found in, as well as the name of the amplicon in that run.
  3. CRISPRessoAggregate_mapping_statistics.txt: A tab-separated file showing the number of reads sequenced and mapped for each run.
  4. CRISPRessoAggregate_quantification_of_editing_frequency.txt: A tab-separated file with the number of reads and edits for each run folder. Data from run folders with multiple amplicons show the sum totals for all amplicons.
  5. CRISPRessoAggregate_quantification_of_editing_frequency_by_amplicon.txt: A tab-separated file showing the number of reads and edits for each amplicon for each run folder. Data from run folders with multiple amplicons will appear on multiple lines, with one line per amplicon.

CRISPRessoPro

CRISPRessoPro is a Python package that extends the functionality of the CRISPResso suite of tools, offering additional features tailored for commercial users.

While CRISPResso is an open-source tool available for free academic use, it is maintained and supported by a team of full-time developers. This support is made possible through licenses purchased by for-profit companies using CRISPResso. Please see the license terms here.

Currently, CRISPRessoPro adds the following functionality:

  • Creation of new interactive plots in reports using d3.

  • Customization of plot colors

  • Modification of guardrail warning trigger cutoffs.

We are actively developing additional features for CRISPRessoPro and expect to release updates frequently to meet the evolving needs of our commercial partners.

CRISPRessoPro is available at no extra charge to all commercial users with a valid license. For more information on accessing CRISPRessoPro, please contact support@edilytics.com. For licensing inquiries, reach out to licensing@edilytics.com.

Customizable Colors and Guardrails

If CRISPRessoPro is installed, by default the colors and guardrails will remain the same as CRISPResso. To alter this, use the --config_file argument and a filepath to a .json file with the following format:

{
  "colors": {
    "Substitution": "#0000FF",
    "Insertion": "#008000",
    "Deletion": "#FF0000",
    "A": "#7FC97F",
    "T": "#BEAED4",
    "C": "#FDC086",
    "G": "#FFFF99",
    "N": "#C8C8C8",
    "-": "#1E1E1E"
  },
  "guardrails": {
    "min_total_reads": 10000,
    "aligned_cutoff": 0.9,
    "alternate_alignment": 0.3,
    "min_ratio_of_mods_in_to_out": 0.01,
    "modifications_at_ends": 0.01,
    "outside_window_max_sub_rate": 0.002,
    "max_rate_of_subs": 0.3,
    "guide_len": 19,
    "amplicon_len": 50,
    "amplicon_to_read_length": 1.5
  }
}

The values above are the defaults, shown as an example; change them as desired to any color or guardrail specification.
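
A configuration file can also be generated from a script. The sketch below starts from the default values listed above and overrides two of them; the build_config helper and the chosen override values are hypothetical. It writes out every key because this section does not say whether a partial file is accepted.

```python
import copy
import json

# Default values copied from the example configuration above.
DEFAULT_CONFIG = {
    "colors": {
        "Substitution": "#0000FF", "Insertion": "#008000", "Deletion": "#FF0000",
        "A": "#7FC97F", "T": "#BEAED4", "C": "#FDC086", "G": "#FFFF99",
        "N": "#C8C8C8", "-": "#1E1E1E",
    },
    "guardrails": {
        "min_total_reads": 10000, "aligned_cutoff": 0.9,
        "alternate_alignment": 0.3, "min_ratio_of_mods_in_to_out": 0.01,
        "modifications_at_ends": 0.01, "outside_window_max_sub_rate": 0.002,
        "max_rate_of_subs": 0.3, "guide_len": 19, "amplicon_len": 50,
        "amplicon_to_read_length": 1.5,
    },
}


def build_config(color_overrides=None, guardrail_overrides=None):
    """Return a copy of the defaults with the given values replaced."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    sections = (("colors", color_overrides), ("guardrails", guardrail_overrides))
    for section, overrides in sections:
        for key, value in (overrides or {}).items():
            # Reject misspelled keys instead of silently adding them.
            if key not in config[section]:
                raise KeyError(f"Unknown {section} key: {key}")
            config[section][key] = value
    return config


config = build_config({"Deletion": "#8B0000"}, {"min_total_reads": 5000})
config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving config_text to a .json file and passing its path with --config_file applies the overrides.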

Troubleshooting

Please check that your input file(s) are in FASTQ format (compressed fastq.gz also accepted).

If you get an empty report, please double check that your amplicon sequence is correct and in the correct orientation. It can be helpful to inspect the first few lines of your FASTQ file - the start of the amplicon sequence should match the start of your sequences. If not, check to see if the files are trimmed (see point below).
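
The orientation check described above can be scripted. This is a minimal sketch, not part of CRISPResso: it compares the start of the first few reads with the start of the amplicon and of its reverse complement. The amplicon, the reads, and the helper names are made up for illustration.

```python
import itertools

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_sequences(fastq_lines, n_reads=5):
    """Yield the sequence line of the first n_reads FASTQ records."""
    # Each FASTQ record has four lines; the sequence is the second one.
    sequence_lines = itertools.islice(fastq_lines, 1, None, 4)
    for line in itertools.islice(sequence_lines, n_reads):
        yield line.strip()


def orientation_of(read, amplicon, prefix_len=10):
    """Classify a read by comparing its start with both amplicon orientations."""
    start = read[:prefix_len].upper()
    if start == amplicon[:prefix_len].upper():
        return "forward"
    if start == reverse_complement(amplicon)[:prefix_len].upper():
        return "reverse_complement"
    return "no_match"


# Made-up amplicon and FASTQ records, for illustration only.
amplicon = "AAGGCCTTACGTACGTTTGACCAGT"
fastq_lines = [
    "@read_1", "AAGGCCTTACGTACG", "+", "IIIIIIIIIIIIIII",
    "@read_2", "ACTGGTCAAACGTAC", "+", "IIIIIIIIIIIIIII",
    "@read_3", "GGGGGGGGGGGGGGG", "+", "IIIIIIIIIIIIIII",
]
results = [orientation_of(read, amplicon) for read in first_sequences(fastq_lines)]
print(results)
```

For a real file, pass an open handle instead of the list, for example gzip.open(path, "rt") for a fastq.gz file. Reads that mostly report no_match may still carry adapters or may come from a different amplicon.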

It is important to determine whether your reads are trimmed or not. CRISPResso2 assumes that the reads ARE ALREADY TRIMMED! If reads are not already trimmed, select the adapters used for trimming under the ‘Trimming Adapter’ heading under the ‘Optional Parameters’. This is FUNDAMENTAL to CRISPResso analysis. Failure to trim adapters may result in false positives: the report will show an unrealistic 100% modified alleles and a sharp peak at the edges of the reference amplicon in figure 4.

The quality filter assumes that your reads use the Phred33 scale, and the threshold should be adjusted for each user’s specific application. A reasonable value for this parameter is 30.
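
For reference, a Phred33 quality string encodes each score as the character with ASCII code score + 33. The sketch below decodes a quality string and applies an average-quality threshold of 30; treating the threshold as a read-level average is an assumption of this example, and the helper names are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a Phred33 quality string into integer quality scores."""
    return [ord(char) - 33 for char in quality_string]


def passes_average_quality(quality_string, min_average_quality=30):
    """Return True if the mean quality of the read reaches the threshold."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores) >= min_average_quality


print(phred33_scores("I?#"))
print(passes_average_quality("IIII"), passes_average_quality("II##"))
```

Here 'I' decodes to 40, '?' to 30 and '#' to 2, so a read with qualities "II##" averages 21 and would fail a threshold of 30.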

If your amplicon sequence is longer than your sequenced read length, the R1 and R2 reads should overlap by at least 10bp. For example, if you sequence using 150bp reads, the maximum amplicon length should be 290 bp.
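
The arithmetic behind this example is two read lengths minus the required overlap. A short sketch, with a hypothetical helper name:

```python
def max_amplicon_length(read_length, min_overlap=10):
    """Longest amplicon that paired reads can span with the given overlap."""
    return 2 * read_length - min_overlap


print(max_amplicon_length(150))
```

With 150 bp reads and a 10 bp overlap this gives 290 bp, matching the figure above.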

Especially in repetitive regions, multiple alignments may have the best score. If you want to investigate alternate best-scoring alignments, you can view all alignments using this tool: http://rna.informatik.uni-freiburg.de/Teaching/index.jsp?toolName=Gotoh. As input, sequences from the 'Alleles_frequency_table.txt' can be used. Specifically, for a given row, the value in the 'Aligned_Sequence' should be entered into the 'Sequence a' box after removing any dashes, and the value in the 'Reference_Sequence' should be entered into the 'Sequence b' box after removing any dashes. The alternate alignments can be selected in the 'Results' panel in the Output section.

Cite

For more on how CRISPResso works read the freely available published paper here.

If you like CRISPResso please support us by citing it in your work:

Clement K, Rees H, Canver MC, Gehrke JM, Farouni R, Hsu JY, Cole MA, Liu DR, Joung JK, Bauer DE, Pinello L.
CRISPResso2 provides accurate and rapid genome editing sequence analysis.
Nat Biotechnol. 2019 Mar; 37(3):224-226. doi: 10.1038/s41587-019-0032-3. PubMed PMID: 30809026.

@article{clement2019crispresso2,
  title={CRISPResso2 provides accurate and rapid genome editing sequence analysis},
  author={Clement, Kendell and Rees, Holly and Canver, Matthew C and Gehrke, Jason M and Farouni, Rick and Hsu, Jonathan Y and Cole, Mitchel A and Liu, David R and Joung, J Keith and Bauer, Daniel E and others},
  journal={Nature biotechnology},
  volume={37},
  number={3},
  pages={224--226},
  year={2019},
  publisher={Nature Publishing Group US New York}
}

Pinello L, Canver MC, Hoban MD, Orkin SH, Kohn DB, Bauer DE, Yuan GC.
Analyzing CRISPR genome-editing experiments with CRISPResso.
Nature biotechnology. 2016 Jul;34(7):695-7.

@article{pinello2016analyzing,
  title={Analyzing CRISPR genome-editing experiments with CRISPResso},
  author={Pinello, Luca and Canver, Matthew C and Hoban, Megan D and Orkin, Stuart H and Kohn, Donald B and Bauer, Daniel E and Yuan, Guo-Cheng},
  journal={Nature biotechnology},
  volume={34},
  number={7},
  pages={695--697},
  year={2016},
  publisher={Nature Publishing Group US New York}
}

License

CRISPResso2 is made available for free to academic researchers under this limited license for non-commercial use.

IMPORTANT: If you plan to use CRISPResso2 for profit, you will need to purchase a license. Please contact licensing@edilytics.com for more information.

CRISPResso2 END USER LICENSE AGREEMENT

BEFORE PROCEEDING, PLEASE READ THE END USER LICENSE AGREEMENT BELOW.

BY USING THIS SOFTWARE TOOL YOU ATTEST TO (I) BEING AN ACADEMIC RESEARCHER, (II) USING IT SOLELY FOR RESEARCH PURPOSES AND (III) YOUR ACCEPTANCE OF THE END USER LICENSE AGREEMENT.

  1. General. As used herein, the term “you” or “your” means any individual or entity accessing this site or using the software tool “CRISPResso2” (the “Software Tool”) pursuant to this End-User License Agreement (“EULA”).

  2. License to Use. The Software Tool is free for your use subject to the terms and conditions set forth below. The General Hospital Corporation, dba Massachusetts General Hospital (“MGH”) reserves the right to change, from time to time and at its sole discretion, this EULA. Your continued use of the Software Tool after any such modification constitutes your agreement and acceptance of such changes.

MGH owns all right, title and interest in the Software Tool. MGH grants to you, the “Licensee,” a royalty-free, non-exclusive, non-transferable, revocable license to use the Software Tool for non-commercial research or academic purposes only; it is NOT made available here as a free tool or download for any commercial or clinical use. You may not copy or distribute the Software Tool in any form. This license is limited to the individual that accesses the Software Tool. No right to sublicense or assign this EULA is granted herein.

The Software Tool optionally makes calls to unmodified versions of fastp https://github.com/OpenGene/fastp software, which is covered under its own license (MIT).

By using this Software Tool, you agree to allow MGH the right to collect data and statistics (i) on system usage patterns and (ii) to improve this Software Tool.

  1. Limitations on Use. THE SOFTWARE TOOL HAS NOT BEEN REGISTERED OR APPROVED BY THE U.S. FOOD AND DRUG AGENCY, OR ANY OTHER GOVERNMENTAL AGENCY. THE SOFTWARE TOOL MAY BE USED ONLY AS A REFERENCE TOOL AND FOR CLINICAL EDUCATION, SIMILAR TO THE USE OF A TEXTBOOK OR A JOURNAL ARTICLE. THE SOFTWARE TOOL SHALL NOT BE USED AS A DIAGNOSTIC DECISION MAKING SYSTEM AND MUST NOT BE USED TO MAKE A CLINICAL DIAGNOSIS OR REPLACE OR OVERRULE A LICENSED HEALTH CARE PROFESSIONAL'S JUDGMENT OR CLINICAL DIAGNOSIS.

  2. Disclaimer of Warranties. TO THE FULLEST EXTENT PERMITTED BY LAW, MGH PROVIDES THE SOFTWARE TOOL "AS IS" AND “AS AVAILABLE” WITH ALL FAULTS, ERRORS AND DEFECTS, AND NEITHER MGH NOR ANY OF ITS PERSONNEL NOR ANY OF ITS AFFILIATES IS RESPONSIBLE FOR ENSURING THAT ANY USE OF SOFTWARE TOOL WILL BE CLINICALLY SOUND, WITHOUT ERROR, UNINTERRUPTED OR OTHERWISE SUCCESSFUL. THE RIGHTS GRANTED IN THIS EULA ARE MADE AVAILABLE WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT.

  3. Limitation of Liability. TO THE FULLEST EXTENT PERMITTED BY LAW, MGH SHALL NOT BE LIABLE TO YOU FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT WITHOUT LIMITATION, ANY DAMAGES RESULTING FROM LOSS OF USE OR LOST BUSINESS, REVENUE, PROFITS, DATA OR GOODWILL) ARISING IN CONNECTION WITH YOUR USE OF THE SOFTWARE TOOL OR OTHERWISE, WHETHER IN AN ACTION IN CONTRACT, TORT, STRICT LIABILITY, NEGLIGENCE OR OTHERWISE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

  4. Indemnification. You agree to defend, indemnify and hold harmless MGH and its affiliates, trustees, officers, employees, staff members, agents or contractors from and against any claim, charge, demand, action or suit, whether in contract, tort, strict liability, negligence or otherwise, for any and all losses, costs, charges, claims, demands, fees, expenses or damages of any nature or kind arising out of, connected with or resulting from (i) the use of the Software Tool by you, your affiliates, employees, staff, faculty, students, agents or (ii) relating in any way to this EULA.

In consideration of MGH providing access to the Software Tool free of charge, you agree not to bring any claim, lawsuit, or action (“Claim”) for any damages, costs, liabilities, settlement amounts and/or expenses (including attorneys’ fees) against MGH or its affiliates, trustees, officers, employees, staff members, agents or contractors arising out of or related to your use of the Software Tool.

  1. No Other Rights. You do not have the right to use the name, trademark, service mark, logo or other identifying characteristics of MGH or any of its affiliates or employees. All rights not expressly granted herein are reserved by MGH.

MGH may terminate your access to and use of the Software Tool at any time, with or without notice, for any reason or for no reason at all.

  1. Governing Law. The construction and performance of this EULA will be governed by the laws of the Commonwealth of Massachusetts, without regard to conflicts of laws principles.

  2. Entire Agreement. This EULA sets forth all of the covenants, provisions, agreements, conditions, and understandings between the parties regarding the subject matter herein, and there are no covenants, promises, agreements, conditions, or understandings, either oral or written, between them other than those set forth herein.

Should you have any concerns regarding this EULA contact us at licensing@edilytics.com.