The Wayback Machine - https://web.archive.org/web/20200916162445/https://github.com/jfilter

Highlights

  • Arctic Code Vault Contributor
  • Pro

Organizations

@code-for-magdeburg @codeforberlin @open-data-potsdam @beyondopen

Pinned

  1. 🧹 Python package for text cleaning

    Python · 226 stars · 11 forks

  2. 🗂 Split folders with files (e.g., images) into training, validation and test (dataset) folders

    Python · 158 stars · 33 forks

  3. 👩‍🏫 Pre-trained German language model with sub-word tokenization for ULMFiT

    Jupyter Notebook · 13 stars

  4. 📑 Scripts to repair, verify, OCR and compress PDFs (among other tasks)

    Shell · 14 stars · 2 forks

  5. 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML · 30 stars

  6. 📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python · 11 stars

1,023 contributions in the last year
