I'm trying to loop over a bunch of PDF files, get each file's character count, and divide it by 5. The output should look something like this:

PDF1.pdf
400

PDF2.pdf
1000

This assumes PDF1.pdf has 2000 characters and PDF2.pdf has 5000 characters. This is what I'm currently doing:

for %%f in (*.pdf) do (
    echo %%~nf.pdf
    pdftotext %%~nf.pdf -enc UTF-8 - | wc -m
)

What I need help with is capturing the value from wc -m, dividing it by 5, and echoing the result.

I've tried various things such as SET /A total=(wc -m) / 5 but nothing really seems to work out.

1 Answer

for %%f in (*.pdf) do (
 echo %%~nf.pdf
 for /f %%c in ('pdftotext %%~nf.pdf -enc UTF-8 - ^| wc -m') do set /a words=%%c / 5
 call echo %%words%%
)

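should do the trick. Single-quote the command that yields the character count, and note the caret that escapes the pipe. Assign the required value to words and use the call echo trick to produce the result: inside a parenthesized block, %words% would be expanded when the block is parsed, before set /a has run, so call echo %%words%% is used to force a second expansion at run time.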

There are other possibilities...
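
One such possibility, given that the comments mention running the commands from Git Bash, is to do the whole thing in bash. The following is only a minimal sketch under that assumption (a bash shell with pdftotext and wc on the PATH); the function names est_words and report_pdfs are hypothetical and not part of the original answer.

```shell
# est_words: hypothetical helper, character count / 5 using integer division.
est_words() {
  local chars=$1
  echo $(( chars / 5 ))
}

# report_pdfs: hypothetical helper that prints each PDF's name followed by
# its estimated word count. Assumes pdftotext and wc are available.
report_pdfs() {
  local f chars
  for f in *.pdf; do
    [ -e "$f" ] || continue   # skip the literal pattern when no PDFs match
    chars=$(pdftotext -enc UTF-8 "$f" - | wc -m)
    echo "$f"
    est_words "$chars"
  done
}
```

Calling report_pdfs in a directory of PDFs would print each file name and its count on separate lines. Command substitution with $(...) captures the output of wc -m directly, so no for /f or call echo workaround is involved.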


With decimals...

 for /f %%c in ('pdftotext %%~nf.pdf -enc UTF-8 - ^| wc -m') do set /a words=%%c*2
 call echo %%words:~0,-1%%.%%words:~-1%%

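Multiplying by 2 gives ten times the required value (c*2 = c/5*10) while staying in integer arithmetic. Then show everything but the last character, a dot, and the last character.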
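
To make the arithmetic concrete, here is the same one-decimal trick as a small shell sketch, using the count of 145557 quoted in the comments. The function name one_decimal is hypothetical, and falling back to a leading 0 for single-digit results is an added assumption rather than something the batch version does.

```shell
# one_decimal: hypothetical helper that prints chars/5 with one decimal place
# using only integer arithmetic.
one_decimal() {
  local tenths=$(( $1 * 2 ))      # chars*2 equals (chars/5)*10
  local whole=${tenths%?}         # everything but the last character
  local frac=${tenths#"$whole"}   # the last character
  echo "${whole:-0}.${frac}"      # use 0 when there is no whole part
}

one_decimal 145557   # prints 29111.4
```

Here 145557 * 2 = 291114, which splits into 29111 and 4, giving 29111.4.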


6 Comments

Hell yeah! That did it. Any clue how I could get it out with decimals? Right now it rounds it down.
The decimals didn't really work. I got 14100.6. when I should get about 29.1114 in this case.
Er, which case? What was %%c to display that? Batch is actually limited to integers. How do you get 29.1114 by dividing an integer by 5? It would mean that wc is returning 145.557 words.
145557 words sounds about right, yeah. I tagged this post with bash as well, as I'm using Git Bash to run the commands; otherwise pdftotext and wc won't work.
145557 gave me 29111.4 using the method I've shown. What thousands/decimal separators are you using?