QuickStart Guide
Installation
To use pdp you will need to install it on your machine (laptop, desktop, server, or cluster). Installation is easiest when using one of the two popular Python package managers:
2. PyPI
pdp is available via the PyPI package manager for Python:
pip install pdp
Tip
pdp can also be installed directly from source. More detailed installation instructions can be found on the Installation Guide page.
pdp Walkthrough for qPCR diagnostic design
An example qPCR primer set design using pdp is provided as a walkthrough below. The general procedure for any pdp analysis is:
- Create a configuration file that describes which input sequences will be used for design, the paths to the sequence files, and labels describing which diagnostic classes or groups the sequences belong to
- Fix the input sequences (if necessary) by stitching multiple contigs together and removing ambiguity symbols
- Design primers to each input sequence
- Deduplicate primer designs (this reduces computation time)
- Screen primers against a BLAST or DIAMOND database of off-target sequences
- Perform in silico cross-hybridisation of primers against each input sequence
- Classify each primer set by specificity to the input sequence group labels, to generate candidate diagnostic primer sets
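The steps above correspond to a short chain of pdp subcommands, each taking the JSON file written by the previous step as its input. As a rough sketch, the Python snippet below assembles the commands used in this walkthrough. The helper names `build_commands` and `run_pipeline` are illustrative and not part of pdp, the paths are the walkthrough paths relative to the repository root, and the validation check is left out because it does not produce a file for the next step.

```python
import shlex
import subprocess

WORKDIR = "tests/walkthrough"  # walkthrough directory in the repository


def build_commands(workdir=WORKDIR):
    """Return the walkthrough's pdp commands, in order, as argument lists."""
    w = workdir
    return [
        ["pdp", "config", "--fix_sequences", f"{w}/fixed.json", f"{w}/pectoconf.tab"],
        ["pdp", "eprimer3", "--outdir", f"{w}/eprimer3",
         f"{w}/fixed.json", f"{w}/with_primers.json"],
        ["pdp", "dedupe", "--dedupedir", f"{w}/deduped",
         f"{w}/with_primers.json", f"{w}/deduped_primers.json"],
        ["pdp", "blastscreen", "--db", f"{w}/blastdb/e_coli_screen.fna",
         "--outdir", f"{w}/blastn",
         f"{w}/deduped_primers.json", f"{w}/screened.json"],
        ["pdp", "primersearch", "--outdir", f"{w}/primersearch",
         f"{w}/screened.json", f"{w}/primersearch.json"],
        ["pdp", "classify", f"{w}/primersearch.json", f"{w}/classify"],
    ]


def run_pipeline(workdir=WORKDIR):
    """Run each step in turn, stopping at the first failure (requires pdp)."""
    for cmd in build_commands(workdir):
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Print the commands without running them.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Calling `run_pipeline()` would execute the steps for real; it is not called here so that the snippet can be inspected without pdp installed.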
Tip
To see options available for the pdp program, or any subcommand, use the -h or --help option, e.g.:
pdp --help
pdp config -h
1. Create configuration file
We will work with three bacterial genomes in the tests/walkthrough/sequences subdirectory of the find_differential_primers repository. These genomes represent three different Pectobacterium species, and are provided as .fasta files containing complete genome sequences. We can see these files using the ls command:
$ ls tests/walkthrough/sequences/
GCF_000011605.1.fasta GCF_000291725.1.fasta GCF_000749845.1.fasta
To use these files in our analysis, we need to construct a configuration file. In this quick walkthrough we will use a pre-prepared configuration file at the location tests/walkthrough/pectoconf.tab. The contents are shown below (note the extensive commenting, with lines preceded by a hash/octothorpe).
$ cat tests/walkthrough/pectoconf.tab
# Pectobacterium genomes downloaded from GenBank/NCBI; genomovars inferred from ANIm
# Annotated Pba: genomovar 1
Pba_SCRI1043 Pectobacterium,atrosepticum_NCBI,gv1 tests/walkthrough/sequences/GCF_000011605.1.fasta -
# Annotated Pwa: genomovars 2, 3
Pwa_CFBP_3304 Pectobacterium,wasabiae_NCBI,gv2 tests/walkthrough/sequences/GCF_000291725.1.fasta -
# Annotated Pb : genomovar 7
Pbe_NCPPB_2795 Pectobacterium,betavasculorum_NCBI,gv7 tests/walkthrough/sequences/GCF_000749845.1.fasta -
The first line describing an input sequence tells us that its name is Pba_SCRI1043, that it belongs to classes/groups Pectobacterium, atrosepticum_NCBI, and gv1, and that the sequence’s FASTA file can be found at tests/walkthrough/sequences/GCF_000011605.1.fasta. There are no features associated with the sequence.
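To make the layout of a configuration line concrete, here is a minimal Python sketch that reads lines in the format shown above. It is not pdp's own parser: the `ConfigEntry` type and `parse_config` function are hypothetical names, and the sketch assumes four whitespace-separated columns (name, comma-separated groups, FASTA path, features), no spaces within paths, and that `-` in the last column means "no features file".

```python
from typing import List, NamedTuple, Optional


class ConfigEntry(NamedTuple):
    name: str                # sequence name, e.g. Pba_SCRI1043
    groups: List[str]        # classes/groups the sequence belongs to
    seqfile: str             # path to the FASTA file
    features: Optional[str]  # path to a features file, or None if "-"


def parse_config(text: str) -> List[ConfigEntry]:
    """Parse configuration text, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes exactly four columns per data line.
        name, groups, seqfile, features = line.split()
        entries.append(
            ConfigEntry(
                name,
                groups.split(","),
                seqfile,
                None if features == "-" else features,
            )
        )
    return entries


EXAMPLE = """\
# Annotated Pba: genomovar 1
Pba_SCRI1043\tPectobacterium,atrosepticum_NCBI,gv1\ttests/walkthrough/sequences/GCF_000011605.1.fasta\t-
"""

entry = parse_config(EXAMPLE)[0]
print(entry.name, entry.groups, entry.features)
```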
Validate the configuration file
To confirm that the configuration file can be used in the rest of the design process, use the command pdp config --validate on that file:
$ pdp config --validate tests/walkthrough/pectoconf.tab
WARNING: Validation problems
Pbe_NCPPB_2795 requires stitch (tests/walkthrough/sequences/GCF_000749845.1.fasta)
Pwa_CFBP_3304 requires stitch (tests/walkthrough/sequences/GCF_000291725.1.fasta)
Pwa_CFBP_3304 has non-N ambiguities (tests/walkthrough/sequences/GCF_000291725.1.fasta)
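2. Prepare the input sequences
Attention
To generate diagnostic primers and metabarcoding markers, the input sequences must each be “stitched” so that there is only a single contiguous sequence corresponding to each input file. Also, any IUPAC ambiguity symbols other than N (e.g. W, Y) must be replaced with N.
The pdp config --fix_sequences command applies both fixes and, in this walkthrough, writes an updated configuration file to tests/walkthrough/fixed.json: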
pdp config --fix_sequences tests/walkthrough/fixed.json tests/walkthrough/pectoconf.tab
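The two fixes can be illustrated with a small, self-contained Python sketch. This is not pdp's implementation: the function names are hypothetical, the spacer placed between contigs is an arbitrary run of N chosen for illustration (pdp's own spacer may differ), and the sketch uppercases the sequence, which would discard any soft-masking.

```python
import re
from typing import List, Tuple

SPACER = "N" * 20  # hypothetical spacer between contigs, for illustration only


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def stitch(records: List[Tuple[str, str]], spacer: str = SPACER) -> str:
    """Join all contigs into one sequence, separated by the spacer."""
    return spacer.join(seq for _, seq in records)


def replace_ambiguities(seq: str) -> str:
    """Replace every symbol other than A, C, G, T or N with N."""
    return re.sub(r"[^ACGTN]", "N", seq.upper())


FASTA = ">contig1\nACGTW\n>contig2\nYACG\n"
stitched = stitch(read_fasta(FASTA), spacer="NNN")
print(stitched)                       # two contigs joined by the spacer
print(replace_ambiguities(stitched))  # W and Y replaced with N
```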
3. Design primers to each input sequence
Now we can design primer sets against each input sequence, using the EMBOSS package ePrimer3.
pdp eprimer3 --outdir tests/walkthrough/eprimer3 \
tests/walkthrough/fixed.json \
tests/walkthrough/with_primers.json
The new tests/walkthrough/eprimer3 directory now contains files describing primers designed to each input sequence, and corresponding JSON files describing the primer sets.
$ tree tests/walkthrough/eprimer3/
tests/walkthrough/eprimer3/
├── GCF_000011605.1.eprimer3
├── GCF_000011605.1_named.eprimer3
├── GCF_000011605.1_named.json
├── GCF_000291725.1_concat_noambig.eprimer3
├── GCF_000291725.1_concat_noambig_named.eprimer3
├── GCF_000291725.1_concat_noambig_named.json
├── GCF_000749845.1_concat.eprimer3
├── GCF_000749845.1_concat_named.eprimer3
└── GCF_000749845.1_concat_named.json
4. Deduplicate primer sets (optional)
Attention
This step is recommended, but not necessary, when designing diagnostic primer sets.
When designing primers to groups of closely-related genomes, it is common to obtain a large number of identical primer sets that originate from different genomes. We only need to test one of these redundant primer sets to know whether it may be diagnostically useful, so we can remove duplicates with the pdp dedupe command:
pdp dedupe --dedupedir tests/walkthrough/deduped \
tests/walkthrough/with_primers.json \
tests/walkthrough/deduped_primers.json
The complete set of nonredundant primers is written to tests/walkthrough/deduped, and a new JSON configuration file recording only the deduplicated primers for each input sequence is written to deduped_primers.json.
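The idea behind deduplication can be sketched in a few lines of Python. This is only an illustration, not the logic pdp uses: the record layout (dictionaries with `name`, `forward` and `reverse` keys) is hypothetical, and two primer sets are treated as duplicates when their forward and reverse primer sequences are identical.

```python
from typing import Dict, Iterable, List

PrimerSet = Dict[str, str]  # hypothetical record with name, forward, reverse


def dedupe(primer_sets: Iterable[PrimerSet]) -> List[PrimerSet]:
    """Keep the first primer set seen for each (forward, reverse) pair."""
    seen = set()
    unique = []
    for ps in primer_sets:
        key = (ps["forward"].upper(), ps["reverse"].upper())
        if key not in seen:
            seen.add(key)
            unique.append(ps)
    return unique


# Made-up primer sets designed from two closely-related genomes.
designed = [
    {"name": "genomeA_primer_1", "forward": "ACGTACGTAC", "reverse": "TTGACCGTAA"},
    {"name": "genomeB_primer_1", "forward": "ACGTACGTAC", "reverse": "TTGACCGTAA"},
    {"name": "genomeB_primer_2", "forward": "GGCATTACGA", "reverse": "CCATGAGTTC"},
]

for ps in dedupe(designed):
    print(ps["name"])
```

Only one representative of each identical pair is retained, so later steps test fewer candidates.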
5. Screen primers against a local sequence database (optional)
Attention
This step is recommended, but not necessary, when designing diagnostic primer sets.
Prescreening the primers we have just designed against a local database of off-target sequences allows us to remove primer sets that do not specifically amplify our input sequences without having to perform computationally costly in silico cross-hybridisation.
pdp blastscreen --db tests/walkthrough/blastdb/e_coli_screen.fna \
--outdir tests/walkthrough/blastn \
tests/walkthrough/deduped_primers.json \
tests/walkthrough/screened.json
6. Perform in silico cross-hybridisation
This is the critical step in determining the predicted diagnostic specificity of the candidate primer sets. Each candidate primer set is tested in turn against all the input sequences to determine whether it has the potential to amplify that sequence. This is the most computationally demanding step of the analysis.
pdp primersearch \
--outdir tests/walkthrough/primersearch \
tests/walkthrough/screened.json \
tests/walkthrough/primersearch.json
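To give a feel for what "testing a primer set against a sequence" means, the sketch below uses a deliberately simplified model: a primer pair is predicted to amplify a template if the forward primer and the reverse complement of the reverse primer both occur exactly, in that order and within a maximum product length, on either strand. This is an assumption made for illustration and is not how pdp performs the search; real tools also allow mismatches. The function names and the `max_product` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplifies(forward: str, reverse: str, template: str, max_product: int = 1000) -> bool:
    """Exact-match check for a predicted amplicon on either strand."""
    forward = forward.upper()
    target = revcomp(reverse)  # reverse primer site as read on the same strand
    for strand in (template.upper(), revcomp(template)):
        start = strand.find(forward)
        while start != -1:
            # Nearest downstream site for the reverse primer.
            end = strand.find(target, start + len(forward))
            if end != -1 and end + len(target) - start <= max_product:
                return True
            start = strand.find(forward, start + 1)
    return False


# Made-up sequences: one carries both primer sites, the other does not.
sequences = {
    "on_target": "TTTT" + "ACGTAC" + "GGGGGGGG" + "TTCAGA" + "CCCC",
    "off_target": "TTTTGGGGCCCC",
}
fwd, rev = "ACGTAC", revcomp("TTCAGA")

for name, seq in sequences.items():
    print(name, amplifies(fwd, rev, seq))
```

Running every candidate primer set against every input sequence in this way is what makes the step expensive: the work grows with the number of primer sets multiplied by the number of sequences.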
7. Classify primer sets by specificity
The final step in determining qPCR primer set specificity is to analyse the in silico hybridisation results to determine which primer sets amplify exactly the members of each class/group defined in the initial configuration file.
pdp classify \
tests/walkthrough/primersearch.json \
tests/walkthrough/classify
The output directory contains .json and .ePrimer3 format files for each set of candidate primers that were determined to be specific to a class/group named in the initial configuration file, and two summary files (results.json and summary.tab):
$ tree tests/walkthrough/classify/
tests/walkthrough/classify/
├── Pectobacterium_primers.ePrimer3
├── Pectobacterium_primers.json
├── atrosepticum_NCBI_primers.ePrimer3
├── atrosepticum_NCBI_primers.json
├── betavasculorum_NCBI_primers.ePrimer3
├── betavasculorum_NCBI_primers.json
├── gv1_primers.ePrimer3
├── gv1_primers.json
├── gv2_primers.ePrimer3
├── gv2_primers.json
├── gv7_primers.ePrimer3
├── gv7_primers.json
├── results.json
├── summary.tab
├── wasabiae_NCBI_primers.ePrimer3
└── wasabiae_NCBI_primers.json
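The classification rule for this step, that a primer set is specific to a group when it amplifies exactly the members of that group, can be written as a set comparison. The Python sketch below is an illustration rather than pdp's code: the group memberships are taken from the walkthrough configuration file, while the primer names and amplification results are made up.

```python
from typing import Dict, List, Set

# Group membership taken from the walkthrough configuration file.
GROUPS: Dict[str, Set[str]] = {
    "Pectobacterium": {"Pba_SCRI1043", "Pwa_CFBP_3304", "Pbe_NCPPB_2795"},
    "atrosepticum_NCBI": {"Pba_SCRI1043"},
    "gv1": {"Pba_SCRI1043"},
    "wasabiae_NCBI": {"Pwa_CFBP_3304"},
    "gv2": {"Pwa_CFBP_3304"},
    "betavasculorum_NCBI": {"Pbe_NCPPB_2795"},
    "gv7": {"Pbe_NCPPB_2795"},
}

# Hypothetical cross-hybridisation results: primer set -> sequences amplified.
AMPLIFIED: Dict[str, Set[str]] = {
    "primer_A": {"Pba_SCRI1043"},
    "primer_B": {"Pba_SCRI1043", "Pwa_CFBP_3304", "Pbe_NCPPB_2795"},
    "primer_C": {"Pba_SCRI1043", "Pwa_CFBP_3304"},
}


def classify(amplified: Dict[str, Set[str]],
             groups: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each group to the primer sets that amplify exactly its members."""
    return {
        group: sorted(p for p, targets in amplified.items() if targets == members)
        for group, members in groups.items()
    }


for group, primers in classify(AMPLIFIED, GROUPS).items():
    print(group, len(primers), primers)
```

In this toy data `primer_C` is not assigned to any group because no group consists of exactly those two sequences. Groups with identical membership, such as atrosepticum_NCBI and gv1, necessarily receive the same primer sets under this rule.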
The summary.tab file is a tab-separated plain text file that reports how many primer sets were determined to be potentially diagnostic for each input class, and gives the path to the JSON file describing those primer sets:
$ cat tests/walkthrough/classify/summary.tab
Group NumPrimers Primers
Pectobacterium 4 tests/walkthrough/classify/Pectobacterium_primers.json
atrosepticum_NCBI 1 tests/walkthrough/classify/atrosepticum_NCBI_primers.json
betavasculorum_NCBI 2 tests/walkthrough/classify/betavasculorum_NCBI_primers.json
gv1 1 tests/walkthrough/classify/gv1_primers.json
gv2 2 tests/walkthrough/classify/gv2_primers.json
gv7 2 tests/walkthrough/classify/gv7_primers.json
wasabiae_NCBI 2 tests/walkthrough/classify/wasabiae_NCBI_primers.json
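Because summary.tab is tab-separated, it can be read with Python's standard csv module. The sketch below assumes the column headers are exactly those shown above (Group, NumPrimers, Primers); the `read_summary` helper is a hypothetical name, and a few rows from the listing are embedded as a string so the snippet runs without the walkthrough output.

```python
import csv
import io
from typing import Dict, TextIO

# A few rows copied from the listing above, with tabs as separators.
SUMMARY = (
    "Group\tNumPrimers\tPrimers\n"
    "Pectobacterium\t4\ttests/walkthrough/classify/Pectobacterium_primers.json\n"
    "atrosepticum_NCBI\t1\ttests/walkthrough/classify/atrosepticum_NCBI_primers.json\n"
    "gv7\t2\ttests/walkthrough/classify/gv7_primers.json\n"
)


def read_summary(handle: TextIO) -> Dict[str, int]:
    """Return a mapping of group name to number of candidate primer sets."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Group"]: int(row["NumPrimers"]) for row in reader}


counts = read_summary(io.StringIO(SUMMARY))
for group, n in counts.items():
    print(f"{group}: {n} candidate primer set(s)")
```

To read the real file, pass an open file handle instead, for example `open("tests/walkthrough/classify/summary.tab", newline="")`.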