Q1: What is Champuru?
Champuru is an interactive, user-friendly web software
that facilitates the deconvolution of mixed chromatogram sequence data
in the simplest case of a mixture of two sequences of unequal lengths.
It takes as input two strings of characters describing the forward and
reverse chromatograms as obtained by direct sequencing, and returns,
most often after several iterations aiming at correcting basecalling
errors, the two sequences present in the mixture. This provides a
cheap, fast and reliable alternative to cloning that works best
whenever sequences differ in length only by a small insertion/deletion.
Q2: What does "Champuru" mean?
In the dialect of Okinawa, "champuru" means "to mix up";
it is also the name of a popular local dish, a sautéed mixture of
diverse ingredients.
Q3: How does it work?
After the user hits the "Submit" button on the web form,
Champuru starts by computing, for all possible alignments of the two
sequences provided, a compatibility score reflecting the number of
compatible positions in the alignment: for instance, W (A or T) is
compatible with A, T, K (G or T), D (A, T or G), but not with S (C or
G). Champuru then finds the two best alignments, and combines the two
sequences in each alignment into strict consensus sequences (discarding
nucleotide positions outside the zone of overlap). Once this computation
is complete, a web page is generated that displays the three best
compatibility scores obtained, along with the two consensus sequences.
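The scoring step described above can be sketched as follows. This is a minimal illustration, not Champuru's actual code: the function names, the offset convention (forward[i] is aligned with reverse[i + offset]) and the use of a plain count of compatible positions as the score are all assumptions made for the example.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = {code: frozenset(bases) for code, bases in IUPAC.items()}


def compatible(x, y):
    """Two codes are compatible when they share at least one base."""
    return bool(BASES[x] & BASES[y])


def score_offsets(forward, reverse):
    """Return {offset: number of compatible positions} for every overlap.

    Assumed convention: forward[i] is aligned with reverse[i + offset].
    """
    scores = {}
    for offset in range(-(len(forward) - 1), len(reverse)):
        scores[offset] = sum(
            compatible(forward[i], reverse[i + offset])
            for i in range(len(forward))
            if 0 <= i + offset < len(reverse)
        )
    return scores


def best_offsets(forward, reverse, n=3):
    """Return the n highest-scoring (offset, score) pairs."""
    scores = score_offsets(forward, reverse)
    return sorted(scores.items(), key=lambda item: -item[1])[:n]
```

With this table, compatible("W", "K") is true because both codes include T, whereas compatible("W", "S") is false because W (A or T) and S (C or G) share no base.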
Along with each compatibility score, the user is provided
with a number representing the offset between the forward and reverse
sequences in the corresponding alignment, expressed as the position of
the first base of the forward chromatogram relative to the first
base of the reverse chromatogram. If incompatibilities between the
forward and reverse base sequences are detected, they are represented
as underscores ("_") in the consensus sequences and the user is invited
to check their input data for basecalling errors (most often undetected
double peaks) before resubmitting corrected sequences. As a final
verification step, Champuru simulates the result of sequencing
directly, in the forward and reverse directions, a mixture of the two
consensus sequences it has reconstructed, and compares it with the
actual forward and reverse input sequences. If some peaks represented
in the input sequences are not found in the superposition of the two
consensus sequences, the user is again invited to check their input data
(usually for spurious double-base callings of single peaks, but also
sometimes for undetected double peaks in other positions).
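The consensus and verification steps can be illustrated with a short sketch. It is a hedged approximation rather than Champuru's implementation: the function names are hypothetical, the offset and shift parameters follow an assumed convention (position i of the first sequence faces position i + offset of the second), and the superposition step assumes that no underscore remains in the consensus sequences.

```python
# IUPAC nucleotide codes and the sets of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = {code: frozenset(bases) for code, bases in IUPAC.items()}
CODES = {bases: code for code, bases in BASES.items()}


def strict_consensus(forward, reverse, offset):
    """Intersect forward[i] with reverse[i + offset] over the overlap.

    Positions whose base sets do not intersect are written as "_".
    """
    out = []
    for i, f in enumerate(forward):
        j = i + offset
        if 0 <= j < len(reverse):
            shared = BASES[f] & BASES[reverse[j]]
            out.append(CODES[shared] if shared else "_")
    return "".join(out)


def superpose(seq_a, seq_b, shift):
    """Overlay seq_a[i] on seq_b[i + shift], as in a mixed chromatogram."""
    out = []
    for i, a in enumerate(seq_a):
        j = i + shift
        if 0 <= j < len(seq_b):
            out.append(CODES[BASES[a] | BASES[seq_b[j]]])
    return "".join(out)


def unexplained_positions(observed, simulated):
    """Indices where the observed code has a base absent from the simulation."""
    return [
        i for i, (o, s) in enumerate(zip(observed, simulated))
        if not BASES[o] <= BASES[s]
    ]
```

For example, strict_consensus("WA", "SA", 0) gives "_A", since W and S share no base, and unexplained_positions("AYR", "AYG") flags index 2, where the observed R (A or G) contains an A that the simulated G does not account for.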
Q4: How are sequences entered and/or exported?
Champuru takes as input two strings of characters
describing the forward and reverse chromatograms as obtained by direct
sequencing. It is essential, however, that double peaks be represented
by their corresponding one-letter codes. The one-letter codes for three
nucleotides (such as H for A, C or T) are also accepted to represent
possible triple peaks. Most chromatogram analysis programs, such as
Phred, Sequencing Analysis 5.2 (Applied Biosystems) and CEQ 2000XL DNA
Analysis System (Beckman Coulter), are capable of detecting and
analyzing double peaks, and commercial sequencing companies routinely
provide such reanalyzed sequences upon request. Moreover, sequence
alignment programs such as Sequencher (Gene Codes) can also be used to
call secondary peaks and produce chromatogram descriptions that meet
Champuru's requirements. As all automatic basecalling programs make
mistakes, especially when dealing with long stretches of double peaks,
it is a good idea to check visually for missed peaks and other
errors at this stage; to ensure efficient haplotype reconstruction,
low-quality trace data found at the beginning and/or at the end of a
chromatogram should also be discarded.
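As an illustration of the encoding requirement, the sketch below converts per-position peak calls into one-letter codes and lists any characters that fall outside the one-, two- and three-base codes mentioned above. The function names are hypothetical, and the four-base code N is deliberately left out because this FAQ does not say whether Champuru accepts it.

```python
# One-letter codes for one, two and three bases (N is omitted on purpose).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def encode_position(peaks):
    """Return the one-letter code for the bases seen at one position."""
    return CODE_FOR[frozenset(peaks.upper())]


def encode_trace(peaks_per_position):
    """Encode a list such as ["A", "AT", "GC"] as a string of codes."""
    return "".join(encode_position(p) for p in peaks_per_position)


def invalid_characters(sequence):
    """Return the sorted characters that are not in the code table."""
    return sorted({ch for ch in sequence.upper() if ch not in IUPAC})
```

Here encode_trace(["A", "AT", "GC", "TCA"]) returns "AWSH", and invalid_characters("ACGTX-") returns ["-", "X"].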
Input sequences should be copy/pasted into the
corresponding fields of the web interface (example sequences are
provided on the webpage). As most users will prefer to use Champuru
while displaying chromatogram alignments in another program, the
default option considers that the reverse sequence is entered as it is
when aligned with the forward sequence, i.e. as the reverse complement
of the sequence obtained from direct sequencing. If this is not the
case, the checkbox "Reverse-complement reverse sequence" should be
ticked.
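For readers who prefer to prepare the reverse sequence themselves, reverse-complementing a sequence that contains ambiguity codes can be sketched as below. The function name is an assumption for the example; the complement table follows the standard IUPAC pairings (for instance K, meaning G or T, becomes M, meaning A or C).

```python
# Each IUPAC code is mapped to the code for its complementary bases.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(sequence):
    """Complement every code, then reverse the string."""
    return sequence.upper().translate(COMPLEMENT)[::-1]
```

For example, reverse_complement("AAKC") returns "GMTT".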
Output sequences can be exported by copying and pasting
them from the webpage into other applications, such as alignment
programs. Alternatively, a FASTA file containing both output sequences
is automatically generated to facilitate downstream analyses.
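If the sequences are copied by hand instead, an equivalent two-record FASTA file is easy to produce. The sketch below only shows the general FASTA layout; the record names and the line width are hypothetical and need not match the file that Champuru generates.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        # Split the sequence into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical record names, for illustration only.
    print(to_fasta([("haplotype_1", "ACGTACGT"), ("haplotype_2", "ACGTTACGT")]))
```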
Q5: Are there any known issues?
Champuru works best when dealing with mixtures of
sequences that are closely similar to each other, such as homologous
copies of a gene whose lengths differ only by a small
insertion/deletion. When sequences are very divergent, differences in
electrophoretic mobility among bases may cause misalignment of the
peaks or even a complete phase shift of one or several nucleotides. In
such situations, the input chromatograms cannot be summarized as strings
of IUB code letters and Champuru cannot be used.
Q6: Champuru keeps telling me that there are some incompatible positions but I cannot find them. What can I do?
To locate incompatible positions, use the "Find" command of your web
browser and look for underscore ("_") on the Champuru output webpage
(this is the character that Champuru inserts in the consensus whenever
there is an incompatibility between the two sequences it is trying to
align). Then copy the sequence adjacent to the incompatibility and
paste it in the "Find bases" window of Sequencher in order to be taken
directly to the corresponding positions in the chromatograms (you may
have to tick the "Any ambiguous base" option in the "Find bases" window
in order for Sequencher to be able to locate the consensus bases).
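The same search can be done outside the browser once a consensus sequence has been copied. The following sketch, with a hypothetical function name and flank length, reports the 1-based position of each underscore together with the flanking bases that can be pasted into a "Find bases" dialog.

```python
def incompatibility_contexts(consensus, flank=10):
    """Return (position, left flank, right flank) for every underscore.

    Positions are 1-based; flanks hold up to `flank` characters each.
    """
    hits = []
    for index, character in enumerate(consensus):
        if character == "_":
            left = consensus[max(0, index - flank):index]
            right = consensus[index + 1:index + 1 + flank]
            hits.append((index + 1, left, right))
    return hits
```

For instance, incompatibility_contexts("ACGT_WKAC", flank=3) returns [(5, "CGT", "WKA")].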
Q7: How should I cite Champuru?
Flot et al. (2006)
Phase determination from direct sequencing of length-variable DNA regions. Molecular Ecology Notes 6 (3), 627-630
Flot (2007)
Champuru 1.0: a computer software for unraveling mixtures of two DNA sequences of unequal lengths. Molecular Ecology Notes 7 (6), 974-977
For instance: Phase determination in length-variant heterozygotes
was performed by direct sequencing (Flot et al., 2006) with the help of
Champuru 2 (Flot, 2007; Spöri & Flot, in prep.; available online at https://eeg-ebe.github.io/Champuru/).