I found this helpful post on how to extract the text from a DOCX file, and I wanted to turn it into a little shell script. My attempt is as follows:

#!/bin/sh

if [[ $# -eq 0 ]]; then
    echo "pass in a docx file to get the text within"
    exit 1
fi

text="$(unzip -p $1 word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g')"
echo $text

However, this does not print the result as expected.

Any suggestions?

3 Comments
  • What result did you expect, and what result did you get? Commented Jul 14, 2021 at 0:58
  • You should run this through shellcheck.net first. Commented Jul 14, 2021 at 0:59
  • shellcheck.net figured it out. Needed to put quotes around $1 Commented Jul 14, 2021 at 1:06

1 Answer

Thanks to shellcheck.net, I found that I needed to put quotes around the $1. The final script, as approved by shellcheck, is:

#!/bin/sh

if [ $# -eq 0 ]; then
    echo "pass in a docx file to get the text within"
    exit 1
fi

text=$(unzip -p "$1" word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g')
echo "$text"
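
Besides quoting $1, the final script also quotes $text when echoing it. Here is a minimal sketch of what that changes, using a made-up two-line value in place of real unzip output (the sample text and the variable names unquoted_lines and quoted_lines are assumptions for illustration, not part of the original script):

```shell
#!/bin/sh

# Sample two-line value standing in for the extracted document text (assumed data).
text=$(printf 'first paragraph\nsecond paragraph')

# Unquoted: the shell splits $text on whitespace (including the newline),
# and echo joins the resulting words with single spaces on one line.
unquoted_lines=$(echo $text | wc -l | tr -d ' ')

# Quoted: the value is passed to echo as a single argument, newline intact.
quoted_lines=$(echo "$text" | wc -l | tr -d ' ')

echo "unquoted echo printed $unquoted_lines line(s)"
echo "quoted echo printed $quoted_lines line(s)"
```

With the unquoted form the paragraph breaks that sed inserted would be flattened into spaces, which may be part of why the first attempt did not print as expected.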

1 Comment

Good job solving the issue. Both command substitution and process substitution run in subshells, each with its own separate environment. As such, you must quote inside them just as you would quote any command on the command line.
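
To illustrate the point about quoting inside command substitution, here is a rough sketch. count_args is a hypothetical helper that only reports how many arguments it received, and the filename is an assumed example containing a space:

```shell
#!/bin/sh

# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
    echo "$#"
}

file="my report.docx"   # assumed example filename containing a space

# Inside $( ... ) an unquoted expansion is still word-split,
# so the helper sees two arguments: "my" and "report.docx".
unquoted=$(count_args $file)

# Quoted, the filename stays a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted argument(s)"
echo "quoted: $quoted argument(s)"
```

The same thing happens to unzip -p $1 when the path contains spaces: unzip would receive several separate words instead of one filename.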
