Champuru

Flot (2007) Champuru 1.0: a computer software for unraveling mixtures of two DNA sequences of unequal lengths. Molecular Ecology Notes 7 (6), 974-977

Frequently Asked Questions

Q1: What is Champuru?
Champuru is an interactive, user-friendly web software that facilitates the deconvolution of mixed chromatogram sequence data in the simplest case of a mixture of two sequences of unequal lengths. It takes as input two strings of characters describing the forward and reverse chromatograms as obtained by direct sequencing, and returns, most often after several iterations aiming at correcting basecalling errors, the two sequences present in the mixture. This provides a cheap, fast and reliable alternative to cloning that works best whenever sequences differ in length only by a small insertion/deletion.
Q2: What does "Champuru" mean?
In the dialect of Okinawa, "champuru" means "to mix up"; it is also the name of a popular local dish, a sautéed mixture of diverse ingredients.
Q3: How does it work?
After the user clicks the "Submit" button on the web form, Champuru computes, for every possible alignment of the two input sequences, a compatibility score reflecting the number of compatible positions in that alignment: for instance, W (A or T) is compatible with A, T, K (G or T) and D (A, T or G), but not with S (C or G). Champuru then finds the two best alignments and combines the two sequences in each alignment into a strict consensus sequence (discarding nucleotide positions outside the zone of overlap). It then generates a web page that displays the three best compatibility scores obtained, along with the two consensus sequences.
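The scoring step can be illustrated with a minimal Python sketch. It assumes that two codes are compatible when the sets of bases they stand for share at least one base, and that the score is a plain count of compatible positions; the score actually computed by Champuru may differ. The names `compatible` and `score_offsets`, and the sign convention for the offset, are hypothetical.

```python
# Standard one-letter nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = {code: frozenset(bases) for code, bases in IUPAC.items()}


def compatible(x, y):
    """Two codes are compatible when they share at least one base."""
    return bool(BASES[x] & BASES[y])


def score_offsets(forward, reverse):
    """Return {offset: number of compatible positions} for every overlap.

    Assumed convention (not taken from Champuru): `offset` is the index
    in `reverse` at which forward[0] is placed; it is negative when the
    forward sequence starts before the reverse sequence.
    """
    scores = {}
    for offset in range(-(len(forward) - 1), len(reverse)):
        count = 0
        for i, f in enumerate(forward):
            j = i + offset
            if 0 <= j < len(reverse) and compatible(f, reverse[j]):
                count += 1
        scores[offset] = count
    return scores
```

Sorting the returned dictionary by score, highest first, gives the candidate alignments in the order in which they would be examined.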

Along with each compatibility score, the user is provided with a number representing the offset between the forward and reverse sequences in the corresponding alignment, expressed as the position of the first base of the forward chromatogram relative to the first base of the reverse chromatogram. If incompatibilities between the forward and reverse base sequences are detected, they are represented as underscores ("_") in the consensus sequences and the user is invited to check their input data for basecalling errors (most often undetected double peaks) before resubmitting corrected sequences.
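As an illustration of how a strict consensus with underscores might be built, here is a minimal Python sketch. It assumes that the consensus at each position is the set of bases shared by the forward and reverse codes, written back as a one-letter code, with "_" when nothing is shared. The function name `strict_consensus` and the offset convention (index in the reverse sequence where the first forward base is placed) are assumptions, not Champuru's own code.

```python
# Standard one-letter nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = {code: frozenset(bases) for code, bases in IUPAC.items()}
# Reverse lookup: set of bases -> one-letter code.
CODE_OF = {bases: code for code, bases in BASES.items()}


def strict_consensus(forward, reverse, offset):
    """Consensus over the zone of overlap for one alignment.

    Positions outside the overlap are discarded; positions where the two
    codes share no base are written as "_".
    """
    out = []
    for i, f in enumerate(forward):
        j = i + offset
        if 0 <= j < len(reverse):
            shared = BASES[f] & BASES[reverse[j]]
            out.append(CODE_OF.get(shared, "_"))
    return "".join(out)
```

For example, `strict_consensus("WAS", "ATC", 0)` flags the middle position, where A and T have no base in common.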
As a final verification step, Champuru simulates the result of directly sequencing, in the forward and reverse directions, a mixture of the two consensus sequences it has reconstructed, and compares it with the actual forward and reverse input sequences. If some peaks represented in the input sequences are not found in the superposition of the two consensus sequences, the user is again invited to check their input data (usually for spurious double-base calls of single peaks, but sometimes also for undetected double peaks at other positions).
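The verification idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the two sequences are superposed position by position from their first base, both are free of underscores, and a peak counts as unexplained when the observed code contains a base absent from the simulated one. How Champuru handles offsets and trimming in this step is not described here, and the names `superpose` and `unexplained_peaks` are hypothetical.

```python
# Standard one-letter nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = {code: frozenset(bases) for code, bases in IUPAC.items()}
# Reverse lookup: set of bases -> one-letter code.
CODE_OF = {bases: code for code, bases in BASES.items()}


def superpose(seq1, seq2):
    """Simulate a mixed read: union of the bases at each shared position."""
    n = min(len(seq1), len(seq2))
    return "".join(CODE_OF[BASES[seq1[i]] | BASES[seq2[i]]] for i in range(n))


def unexplained_peaks(observed, simulated):
    """0-based positions where `observed` shows a base missing from `simulated`."""
    return [
        i
        for i, (o, s) in enumerate(zip(observed, simulated))
        if not BASES[o] <= BASES[s]
    ]
```

An empty list from `unexplained_peaks` means every peak in the observed sequence is accounted for by the simulated mixture over the compared positions.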
Q4: How are sequences entered and/or exported?
Champuru takes as input two strings of characters describing the forward and reverse chromatograms as obtained by direct sequencing. It is essential, however, that double peaks be represented by their corresponding one-letter codes. The one-letter codes for three nucleotides (such as H for A, C or T) are also accepted to represent possible triple peaks. Most chromatogram analysis programs, such as Phred, Sequencing Analysis 5.2 (Applied Biosystems) and CEQ 2000XL DNA Analysis System (Beckman Coulter), are capable of detecting and analyzing double peaks, and commercial sequencing companies routinely provide such reanalyzed sequences upon request. Moreover, sequence alignment programs such as Sequencher (Gene Codes) can also be used to call secondary peaks and produce chromatogram descriptions that meet Champuru's requirements. As all automatic basecalling programs make mistakes, especially when dealing with long stretches of double peaks, it is a good idea to check visually for missed peaks and other errors at this stage; to ensure efficient haplotype reconstruction, low-quality trace data found at the beginning and/or at the end of a chromatogram should also be discarded.

Input sequences should be copied and pasted into the corresponding fields of the web interface (example sequences are provided on the webpage). As most users will prefer to use Champuru while displaying chromatogram alignments in another program, the default option assumes that the reverse sequence is entered as it appears when aligned with the forward sequence, i.e. as the reverse complement of the sequence obtained from direct sequencing. If this is not the case, the checkbox "Reverse-complement reverse sequence" should be ticked.
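For readers who want to prepare the reverse sequence themselves, the following Python sketch shows one way to reverse-complement a sequence that contains one-letter ambiguity codes. It relies on the standard complement pairs (for example R with Y, K with M, B with V, D with H); the function name `reverse_complement` is illustrative and is not part of Champuru.

```python
# Each code on the first line is complemented by the code below it.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(seq):
    """Complement every code, then reverse the order of the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick way to check that no character was lost.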

Output sequences can be exported by copying and pasting them from the webpage into other applications, such as alignment software. Alternatively, a FASTA file containing both output sequences is automatically generated to facilitate downstream analyses.
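Sequences copied by hand can also be turned into FASTA text with a few lines of Python. The sketch below is only an illustration of the format (a ">" header line followed by the sequence, wrapped at a fixed width); the function name `to_fasta`, the record names and the line width are assumptions and do not reflect the file that Champuru generates.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical record names and sequences, for illustration only.
example = to_fasta([("haplotype_1", "ACGTACGT"), ("haplotype_2", "ACGTTCGT")])
```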
Q5: Are there any known issues?
Champuru works best when dealing with mixtures of sequences that are closely similar to each other, such as homologous copies of a gene whose lengths differ only by a small insertion/deletion. When sequences are very divergent, differences in electrophoretic mobility among bases may cause misalignment of the peaks or even a complete phase shift of one or several nucleotides. In such situations, the input chromatograms cannot be summarized as strings of IUB code letters and Champuru cannot be used.
Q6: Champuru keeps telling me that there are some incompatible positions but I cannot find them. What can I do?
To locate incompatible positions, use the "Find" command of your web browser to search for the underscore character ("_") on the Champuru output webpage (this is the character that Champuru inserts in the consensus whenever there is an incompatibility between the two sequences it is trying to align). Then copy the sequence adjacent to the incompatibility and paste it into the "Find bases" window of Sequencher to be taken directly to the corresponding positions in the chromatograms (you may have to tick the "Any ambiguous base" option in the "Find bases" window for Sequencher to be able to locate the consensus bases).
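If a consensus sequence has been copied out of the webpage, the same search can be done with a short Python sketch that lists each underscore together with its flanking bases, ready to paste into a "Find bases" search. The function name `incompatible_positions` and the default flank length are hypothetical choices.

```python
import re


def incompatible_positions(consensus, flank=10):
    """Return (1-based position, left flank, right flank) for each "_"."""
    hits = []
    for match in re.finditer("_", consensus):
        pos = match.start()
        left = consensus[max(0, pos - flank):pos]
        right = consensus[pos + 1:pos + 1 + flank]
        hits.append((pos + 1, left, right))
    return hits
```

Flanks are shorter than `flank` when an underscore lies close to either end of the sequence.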
Q7: How should I cite Champuru?
Flot et al. (2006) Phase determination from direct sequencing of length-variable DNA regions. Molecular Ecology Notes 6 (3), 627-630
Flot (2007) Champuru 1.0: a computer software for unraveling mixtures of two DNA sequences of unequal lengths. Molecular Ecology Notes 7 (6), 974-977

For instance: Phase determination in length variant heterozygotes was performed by direct sequencing (Flot et al., 2006) with the help of Champuru 2 (Flot, 2007) (Spöri & Flot, in prep.; available online at https://eeg-ebe.github.io/Champuru/).