CTK: Champollion Tool Kit

Built around LDC's champollion sentence aligner kernel, Champollion Tool Kit (CTK) aims to provide ready-to-use parallel text sentence alignment tools for as many language pairs as possible.

Champollion depends heavily on lexical information, but uses sentence length information as well. A translation lexicon is required. Past experiments indicate that champollion's performance improves as the translation lexicon becomes larger.
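
The interplay between lexical and length information can be illustrated with a toy score. This is only a sketch and not champollion's actual scoring function: the function names (lexical_overlap, length_penalty, similarity), the multiplicative combination, and the tiny French-English lexicon are all assumptions made for illustration. It is written in Python for brevity, although CTK itself is written in Perl.

```python
# Toy illustration only: NOT champollion's real scoring function.
# All names and the way the two signals are combined are hypothetical.

def lexical_overlap(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens with at least one lexicon translation in the target."""
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    matched = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_set)
    return matched / len(src_tokens)


def length_penalty(src_tokens, tgt_tokens):
    """Ratio of the shorter segment length to the longer one, in [0, 1]."""
    longer = max(len(src_tokens), len(tgt_tokens))
    if longer == 0:
        return 1.0
    return min(len(src_tokens), len(tgt_tokens)) / longer


def similarity(src_tokens, tgt_tokens, lexicon):
    """Combine lexical evidence with length evidence (assumed: simple product)."""
    return lexical_overlap(src_tokens, tgt_tokens, lexicon) * length_penalty(src_tokens, tgt_tokens)


# Hypothetical two-entry translation lexicon: source word -> set of translations.
small_lexicon = {"chat": {"cat"}, "noir": {"black"}}
# The same lexicon with one more entry.
larger_lexicon = {**small_lexicon, "le": {"the"}}

src = ["le", "chat", "noir"]
good_tgt = ["the", "black", "cat"]
bad_tgt = ["it", "rained", "all", "day", "yesterday", "afternoon"]

print(similarity(src, good_tgt, small_lexicon))
print(similarity(src, good_tgt, larger_lexicon))
print(similarity(src, bad_tgt, larger_lexicon))
```

In this sketch, adding a single entry to the lexicon raises the score of the correct sentence pair, which mirrors in a very simplified way the observation that a larger translation lexicon helps.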

All source code is written in Perl.

Your contributions are very welcome, especially the following:

Please be aware that although champollion is designed for aligning noisy parallel text (text with deletions and insertions), it is not capable of aligning comparable text.

What's New

[2005-08-25] CTK 1.1 released!

[2004-07-01] CTK 1.0 released!


All software is available from: http://sourceforge.net/projects/champollion/.

All software is licensed under the OSI-approved GNU General Public License. Please contact us if you would like the software under another license.

Available Language Pairs

Content of Distributions

Mailing Lists


  • champollion-announce - sign up for CTK announcements (moderated, low volume)
  • champollion-devel - send any questions and bug reports to this list (unmoderated)


    Minimal documentation is available in the README file in the distribution package. Full documentation is coming soon!

    Research Papers

  • Linguistic Data Consortium

    CTK-related efforts are based at the Linguistic Data Consortium at the University of Pennsylvania. The research and development of CTK is funded by the TIDES Machine Translation Project.

    Who is Champollion anyway?