Capture-C

design.Capture

Description

The image below shows a schematic of how the Capture pipeline designs oligos adjacent to the first restriction site of a specified restriction enzyme (DpnII in this example), on the left- and right-hand sides. In this case the user has supplied viewpoint coordinates at chr2:5500000-5500001 and chr5:63223000-63223001.

Figure (_images/capture_oligo_gen.png): Schematic of oligo design by Capture

When the restriction fragment is shorter than twice the oligo length, the designed oligos can overlap, up to within 1bp of each other. If the viewpoint coordinate lies in a fragment shorter than the specified oligo length, no oligos are generated for that fragment. If the fragment length exactly equals the oligo length, only one oligo is generated.
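The length rules above can be sketched in a few lines of Python. This is an illustration only, not the Capture implementation: the function name `oligo_coords` and the 0-based, half-open coordinate convention are assumptions made for the example.

```python
def oligo_coords(frag_start, frag_end, oligo_len=70):
    """Return (start, stop) pairs for the oligos on one restriction fragment.

    Hypothetical helper: coordinates are 0-based and half-open, which is an
    assumption of this sketch rather than something taken from Capture.
    """
    frag_len = frag_end - frag_start
    if frag_len < oligo_len:
        return []  # fragment too short: no oligos
    if frag_len == oligo_len:
        return [(frag_start, frag_end)]  # one oligo covers the whole fragment
    left = (frag_start, frag_start + oligo_len)   # oligo at the left-hand end
    right = (frag_end - oligo_len, frag_end)      # oligo at the right-hand end
    return [left, right]  # the two overlap when frag_len < 2 * oligo_len


print(oligo_coords(0, 200))  # two separate oligos
print(oligo_coords(0, 100))  # two overlapping oligos
print(oligo_coords(0, 70))   # a single oligo
print(oligo_coords(0, 50))   # no oligos
```

With a 71bp fragment and 70bp oligos, the two oligos in this sketch start 1bp apart, which corresponds to the most extreme overlap described above.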

Note

For full functionality, Capture should be run from the command line in order to test the efficiency of the generated oligos. This involves a pipeline that incorporates methods from the Tools class.

Usage

When run from the command line, oligo capture takes the following parameters:

-h, --help

(flag) Show this help message and exit

-f <reference fasta>, --fasta <reference fasta>

(str) The path to the reference genome fasta

-g <genome>, --genome <genome>

({mm9, mm10, hg18, hg19, hg38}) The name of the genome build

-b <bed file>, --bed <bed file>

(str) The path to the bed file containing the capture viewpoint coordinates

-o <oligo length>, --oligo <oligo length>

(int, optional) The length (bp) of the oligos to design, default=70

-e <enzyme>, --enzyme <enzyme>

({DpnII, NlaIII, HindIII}, optional) Name of the restriction enzyme to be used for fragment digestion, default=DpnII

-s <STAR index>, --star_index <STAR index>

(str) The path to the STAR index directory; omit this option if running with BLAT (--blat)

--blat

(flag) Detect off-target binding using BLAT instead of STAR

Examples

Below are examples of running the Capture pipeline in different scenarios:

50bp oligos for NlaIII fragments in hg19 build, using STAR to check off-target binding
$ python -m oligo -cfg ./config.txt capture -f ~/hg19/Sequence/genome.fa -g hg19 -b viewpoints.bed -o 50 -e NlaIII -s ~/hg19/STAR/
70bp oligos for HindIII fragments in mm10 build, using BLAT to check off-target binding
$ python -m oligo -cfg ./config.txt capture -f ~/mm10/Sequence/genome.fa -g mm10 -b mouse_viewpoints.bed -e HindIII --blat
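If the pipeline is launched from a script rather than typed by hand, the same command lines can be assembled programmatically. The sketch below uses only the flags documented above. The helper name `build_capture_cmd` and its keyword arguments are hypothetical, and the check that exactly one of a STAR index or `--blat` is given is an assumption about how the options are meant to be combined.

```python
import shlex


def build_capture_cmd(fasta, genome, bed, config="./config.txt",
                      oligo=None, enzyme=None, star_index=None, blat=False):
    """Assemble the argument list for the documented `capture` subcommand.

    Hypothetical helper; it only arranges the flags listed under Usage.
    """
    # Assumption: exactly one of -s/--star_index or --blat should be supplied.
    if blat == (star_index is not None):
        raise ValueError("supply either a STAR index or blat=True, not both or neither")
    cmd = ["python", "-m", "oligo", "-cfg", config, "capture",
           "-f", fasta, "-g", genome, "-b", bed]
    if oligo is not None:
        cmd += ["-o", str(oligo)]   # omitted -> tool default (70)
    if enzyme is not None:
        cmd += ["-e", enzyme]       # omitted -> tool default (DpnII)
    cmd += ["--blat"] if blat else ["-s", star_index]
    return cmd


cmd = build_capture_cmd("hg19/genome.fa", "hg19", "viewpoints.bed",
                        oligo=50, enzyme="NlaIII", star_index="hg19/STAR/")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the pipeline.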

Specifics

Bed file (-b, --bed)

A 4-column, tab-delimited bed file containing the coordinates and names of the viewpoints you want to capture from. This must be in the format chr, start, stop, viewpoint_name (the file should not have a header row). Typically, the coordinates each span 1bp as shown below:

chr7    20205      20206      geneX
chr8    1310000    1310001    geneY

Caution

Names in the last column should be unique so that the oligos can be unambiguously linked back to a named viewpoint.
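A bed file can be checked against these requirements before running the pipeline. The following is a minimal sketch, assuming nothing about how Capture itself parses the file; `read_viewpoints` is a hypothetical helper that enforces the 4-column, tab-delimited layout and the unique-name rule.

```python
import csv
import io


def read_viewpoints(handle):
    """Parse a 4-column, tab-delimited, header-less bed file of viewpoints.

    Hypothetical helper: returns {name: (chrom, start, stop)} and raises
    ValueError on a malformed row or a repeated viewpoint name.
    """
    viewpoints = {}
    for n, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {n}: expected 4 columns, found {len(row)}")
        chrom, start, stop, name = row
        if name in viewpoints:
            raise ValueError(f"line {n}: duplicate viewpoint name {name!r}")
        viewpoints[name] = (chrom, int(start), int(stop))
    return viewpoints


example = io.StringIO("chr7\t20205\t20206\tgeneX\n"
                      "chr8\t1310000\t1310001\tgeneY\n")
print(read_viewpoints(example))
```

For a file on disk, pass an open file object instead, for example `with open("viewpoints.bed", newline="") as fh: read_viewpoints(fh)`.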

Restriction enzyme (-e, --enzyme)

The restriction enzyme being used in the Capture-C experiment. This determines the recognition sequence used to define the fragment boundaries and hence the starts and ends of the oligos. The current version supports DpnII (GATC), NlaIII (CATG) and HindIII (AAGCTT). If this option is omitted, DpnII will be used by default.
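To illustrate how a recognition sequence defines the fragment around a viewpoint, the sketch below searches a sequence string for the nearest site on either side of a position. It is not taken from the Capture source: `find_fragment` is a hypothetical helper, and it assumes for illustration that the fragment runs from the start of the upstream site to the end of the downstream site.

```python
# Recognition sequences listed in the documentation above.
SITES = {"DpnII": "GATC", "NlaIII": "CATG", "HindIII": "AAGCTT"}


def find_fragment(seq, pos, enzyme="DpnII"):
    """Return (start, stop) of the fragment around 0-based position `pos`.

    Hypothetical helper. Assumption: the fragment includes both flanking
    recognition sites. Returns None if a site is missing on either side.
    A viewpoint lying inside a recognition site is not handled specially.
    """
    site = SITES[enzyme]
    seq = seq.upper()
    left = seq.rfind(site, 0, pos + 1)  # nearest site ending at or before pos
    right = seq.find(site, pos)         # nearest site starting at or after pos
    if left == -1 or right == -1:
        return None
    return left, right + len(site)


seq = "AAGATCTTTTTTGATCAA"
frag = find_fragment(seq, 8)
print(frag, seq[frag[0]:frag[1]])
```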

STAR (-s, --star_index) or BLAT (--blat)

To check for off-target binding, either the sequence aligner STAR or the BLAST-like Alignment Tool (BLAT) can be used. By default, STAR is used, unless design.py Capture is run with the --blat flag. Since BLAT is more widely used to detect off-target binding events, some users may prefer it. However, BLAT can be particularly slow for large designs, especially for the human reference genomes. STAR’s exceptional speed is better suited to designs with >500 viewpoints (1000 oligos). If the --blat flag is not set, the path to the STAR index must be supplied after the -s (or --star_index) flag.

API

As well as being run as a full pipeline from the command line, the design classes have been written so that the individual methods can easily be run in a Python shell. The Capture pipeline uses methods from design.Capture. The following examples show the order in which the class methods are called:

Create an instance of the Capture class
>>> from oligo.design import Capture
>>> c = Capture(genome='mm9', fa='mm9_genome.fa', config_path="config.txt")
Loading reference fasta file...
    ...complete
Generate oligos and write to fasta file
>>> c.gen_oligos(bed='capture_sites.bed').write_oligos()
Generating oligos...
    ...complete.
Oligos stored in the oligo_seqs attribute
Wrote oligos to oligo_seqs.fa
Check for repeats in oligo sequences and align oligos to genome
>>> c.detect_repeats().align_to_genome(s_idx='/mm9/STAR')
Checking for repeat sequences in oligos, with RepeatMasker...
    ...complete. Output written to oligo_seqs.fa.out
Aligning oligos to the genome, with STAR...
    ...complete. Output written to oligos_Aligned.out.sam
Calculate longest repeat length and number of off-target alignments for each oligo; write to oligo_info.txt
>>> c.extract_repeats().calculate_density().write_oligo_info()
Repeat scores calculated
Density scores calculated
Oligo information written to oligo_info.txt

See design.Capture for more detailed information
