The Wayback Machine - https://web.archive.org/web/20220122171525/https://github.com/bitextor/
@bitextor

Bitextor Team

Translation memories generator

Pinned

  1. bitextor Public

    Bitextor generates translation memories from multilingual websites

    Python · 199 stars · 40 forks

  2. Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

    Python · 92 stars · 18 forks

  3. bifixer Public

    Tool to fix bitexts and tag near-duplicates for removal

    Python · 12 stars · 1 fork

  4. biroamer Public

    Utility that will help you to ROAM (Random Omit Anonymize and Mix) your parallel corpus.

    Python · 5 stars · 2 forks

  5. PDF parser and converter to HTML

    Java · 50 stars · 13 forks

  6. Extracts plain text, language identification and more metadata from WARC records

    C++ · 3 stars · 1 fork
