Built around LDC's champollion sentence aligner kernel, the Champollion Tool Kit (CTK) aims to provide ready-to-use parallel text sentence alignment tools for as many language pairs as possible.
Champollion depends heavily on lexical information, but uses sentence length information as well. A translation lexicon is required. Past experiments indicate that champollion's performance improves as the translation lexicon becomes larger.
All source code is written in Perl.
Contributions are very welcome, especially the following:
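To illustrate how lexical and length information can be combined, here is a minimal sketch in Python. It is not champollion's actual scoring function, and it is not taken from the CTK source. The function names (`lexical_overlap`, `length_ratio`, `similarity`), the toy lexicon format, and the way the two signals are multiplied together are all assumptions made for illustration.

```python
def lexical_overlap(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens that have at least one lexicon
    translation present in the target sentence (hypothetical measure)."""
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    matched = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_set)
    return matched / len(src_tokens)


def length_ratio(src_tokens, tgt_tokens):
    """Shorter length divided by longer length, so the value lies in [0, 1]."""
    a, b = len(src_tokens), len(tgt_tokens)
    if a == 0 or b == 0:
        return 0.0
    return min(a, b) / max(a, b)


def similarity(src_tokens, tgt_tokens, lexicon):
    """Illustrative score: lexical evidence scaled by a length penalty."""
    return lexical_overlap(src_tokens, tgt_tokens, lexicon) * length_ratio(
        src_tokens, tgt_tokens
    )


# Toy lexicons (assumed format: source word -> set of target words).
small_lexicon = {"chat": {"cat"}, "noir": {"black"}}
large_lexicon = {"le": {"the"}, "chat": {"cat"}, "noir": {"black"}}

src = ["le", "chat", "noir"]
tgt = ["the", "black", "cat"]

small_score = similarity(src, tgt, small_lexicon)
large_score = similarity(src, tgt, large_lexicon)
print(small_score, large_score)
```

In this toy setup the larger lexicon covers more of the source sentence, so the true translation pair scores higher. This matches the observation above that a larger translation lexicon helps, although a real aligner would also have to search over candidate sentence groupings.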
Please be aware that although champollion is designed for aligning noisy parallel text (text with deletions and insertions), it is not capable of aligning comparable text.
[2005-08-25] CTK 1.1 released!
[2004-07-01] CTK 1.0 released!
All software is available from: http://sourceforge.net/projects/champollion/.
All software is licensed under the OSI-approved GNU General Public License. Please contact us if you would like the software under another license.
Minimal documentation is available in the README file in the distribution package. Full documentation is coming soon!
Xiaoyi Ma. Champollion: A Robust Parallel Text Sentence Aligner. LREC 2006: Fifth International Conference on Language Resources and Evaluation, Genova, Italy, 2006.
CTK-related efforts are based at the Linguistic Data Consortium at the University of Pennsylvania. The research and development of CTK is funded by the TIDES Machine Translation Project.