The Wayback Machine - https://web.archive.org/web/20200602205655/https://github.com/tesseract-ocr/tesseract/issues/2363

Two-column document with ordered lists lose numbers #2363

Open
IdiosApps opened this issue Apr 1, 2019 · 0 comments

@IdiosApps IdiosApps commented Apr 1, 2019

Environment

  • Tesseract Version: 4.0.0
  • Platform: Ubuntu 16.04 Xenial

Current Behavior:

In a two-column image, ordered/unordered lists of growing length affect recognition of the other column and of the bullet points. This was run with --psm 1 and -l eng.
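
For anyone trying to reproduce this, here is a minimal sketch of the invocation described above, wrapped in Python. It assumes the `tesseract` binary is on the PATH; the function names and the image file name used in the usage note are hypothetical and not from the original report.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm=1, lang="eng"):
    """Return the argv list for a Tesseract run that writes text to stdout."""
    # "stdout" as the output base makes Tesseract print the text instead of
    # writing a .txt file. --psm 1 is automatic page segmentation with OSD.
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def run_ocr(image_path):
    """Run Tesseract on image_path; return its text, or None if not installed."""
    if shutil.which("tesseract") is None:
        return None  # binary not available on this machine
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_ocr("input_2_columns_ol.png")` (file name assumed) should return the same text as running the command by hand, which makes it easier to compare the two inputs below.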

Input 1:
input_2_columns_ol
tessDebug_ol

And a slightly different Input 2:
inout_2_columns_ul_ol
tessDebug_ul_ol

Expected Behavior:

Tesseract should segment the text into two columns, and:

  1. identify all the bullet-point numbers (in both columns),
  2. recognize the text on lines that contain very little text (maybe too sparse for recognition?). It seems that at least 4 characters are needed on a line (but by that logic, the two-line bullet 1. under section 5. should be readable).
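
One rough way to check the first expectation against OCR output is to look for which list numbers survive in the recognized text. The sketch below is only an illustration: the helper names are hypothetical, it only recognizes markers of the form `N.` at the start of a line, and it does not tell nested lists or columns apart.

```python
import re

# A list marker is taken to be digits followed by a period at the start of a
# line, then whitespace or end of line (an assumption about the layout).
MARKER = re.compile(r"^\s*(\d+)\.(?:\s|$)")


def found_list_numbers(ocr_text):
    """Return the list numbers that appear as line-leading markers, in order."""
    numbers = []
    for line in ocr_text.splitlines():
        match = MARKER.match(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def missing_list_numbers(ocr_text, expected):
    """Return the expected numbers that never appear as a marker."""
    found = set(found_list_numbers(ocr_text))
    return [n for n in expected if n not in found]
```

For example, `missing_list_numbers(text, range(1, 6))` lists which of the markers 1 to 5 were dropped, so the results for the two inputs can be compared directly.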

Suggested Fix:

I don't have a suggestion for this.

@IdiosApps IdiosApps changed the title Two-column document with ordered lists lose number info Two-column document with ordered lists lose numbers Apr 23, 2019