
I have a text file, a.txt:

A
a
B
b
Ş
ş

I tried this command and got the wrong output:

$ uniq -ic a.txt 
      2     A
      2     B
      1     Ş
      1     ş

How can I solve this non-ASCII character problem with uniq?

Here is my full code:

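show_authors() {
    local id=0
    # Dump the author column, then count duplicates case-insensitively.
    # (uniq only compares adjacent lines, so the list is sorted first.)
    sqlite3 "$db_file" "SELECT author FROM books;" > /tmp/.list.txt
    sort /tmp/.list.txt | uniq -ic > /tmp/.listed.txt
    # Each line of the uniq -c output is "<count> <author>".
    while IFS=" " read -r count author
    do
        printf '<a href="#%s">%s</a> (%s), \n' "$id" "$author" "$count"
        id=$((id + 1))
    done < /tmp/.listed.txt
}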

cat <<EOT
<div id="author">
$(show_authors)
</div>
EOT

My code works correctly in the shell, but not in a CGI-Bash subshell.

2 Answers


You might need to change the locale (if you haven't already), at least for that command. In an en_US locale, Ş and ş are not treated as the uppercase and lowercase forms of the same letter.

Setting LC_ALL=tr_TR selects a Turkish locale in which Ş is the uppercase form of ş, so the two will be compared correctly.

But the command might still not work if your text is UTF-8 encoded. In that case, convert to a single-byte encoding that uniq can handle, do the processing, and then convert back to UTF-8.

So, if this does not work:

$ LC_ALL=tr_TR uniq -ic a.txt

You can try:

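$ iconv -f UTF-8 -t ISO-8859-9 < a.txt | LC_ALL=tr_TR tr '[:upper:]' '[:lower:]' | uniq -c | iconv -f ISO-8859-9 -t UTF-8

The pipeline converts from UTF-8 (multibyte) to ISO-8859-9 (single byte; on glibc systems this is the character set of the plain tr_TR locale), changes everything to lowercase with tr running in the Turkish locale, calls uniq, and then converts back to UTF-8. The locale is set on tr because that is the stage that does the case mapping, and the iconv encoding has to match that locale's character set; otherwise tr will not recognize Ş as the uppercase form of ş.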
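An environment assignment written in front of a command (VAR=value command) goes only into that one command's environment, not into the other stages of the pipeline. So in a pipeline like the one above, the locale has to be set on the stage that relies on it (tr), or exported beforehand. The sketch below demonstrates the scoping rule; DEMO_LOCALE and demo_prefix_scope are hypothetical names used only for illustration.

```bash
#!/bin/bash
# Shows that a VAR=value prefix reaches only the command it precedes,
# not the other stages of the same pipeline.
# DEMO_LOCALE is a hypothetical variable used only for this illustration.
demo_prefix_scope() {
    unset DEMO_LOCALE
    DEMO_LOCALE=tr_TR sh -c 'echo "first stage sees: ${DEMO_LOCALE:-unset}"' |
        sh -c 'cat; echo "second stage sees: ${DEMO_LOCALE:-unset}"'
}

demo_prefix_scope
# prints:
# first stage sees: tr_TR
# second stage sees: unset
```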

I know there are other languages and locales using Ş, but I had to choose one of them to write the answer. Yours might be different.

  • Yes, I use the Turkish (tr) locale, but the web server uses en_US. I am writing a script for CGI-Bash, and I must not convert uppercase to lowercase, because the strings must be indexed exactly as they are. I need to ignore case (uniq -i) with non-ASCII characters. LC_ALL=tr_TR doesn't work for me, and LC_ALL=tr_TR iconv -f UTF-8 -t ISO-8859-3 < a.txt | tr '[:upper:]' '[:lower:]' | uniq -c | iconv -f ISO-8859-3 -t UTF-8 gives a wrong listing. Commented May 15, 2020 at 7:29
  • I added the full code. Could you look at it and help me solve my problem? Commented May 15, 2020 at 7:46
  • I changed LC_ALL to tr_TR or UTF-8, but the CGI-Bash shell still gives me LC_ALL=C output. Commented May 15, 2020 at 16:04
  • You can run export LC_ALL=tr_TR so that the change applies to your whole bash session. About the extended code: on this site you should post a separate question for each new issue. If my answer solves the original problem, good, accept it and then open a new question about the other specific problem (in your case, it seems to be how to call a command from a function). That keeps things discoverable for the next person with the same issue. Commented May 15, 2020 at 16:20
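Applied to the CGI case, the export has to happen inside the CGI script itself: web servers typically start CGI programs with a reduced environment, so the locale of an interactive login shell is not inherited (which would match the LC_ALL=C behaviour reported in the comments). The following is a minimal sketch of the first lines of such a script, assuming a tr_TR.UTF-8 locale is installed on the server (check `locale -a`); the locale name and header are assumptions to adapt.

```bash
#!/bin/bash
# Hypothetical first lines of the CGI script.
# Set the locale explicitly instead of relying on the web server's environment.
# Assumption: tr_TR.UTF-8 is installed (check the output of `locale -a`).
export LC_ALL=tr_TR.UTF-8

# CGI response header, followed by the blank line that ends the headers.
printf 'Content-Type: text/html; charset=UTF-8\r\n\r\n'

# ... the rest of the script (show_authors and the HTML output) follows here.
```

Because LC_ALL is exported, every later command in the script (sort, uniq, tr, and so on) sees it, unlike a prefix on a single command.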

How about something like:
cat file.txt | iconv | uniq -i

with the iconv options left for you to fill in, and the iconv step placed before sort or uniq.

  • It doesn't help me, because my code already works correctly on the command line; it only gives wrong output when I run it in the CGI-Bash shell. Commented May 15, 2020 at 15:59
  • CGI-Bash gives me LC_ALL=C output. Commented May 15, 2020 at 16:03
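Under LC_ALL=C, the sort step in the question's show_authors affects even plain ASCII: the C locale orders all uppercase letters before all lowercase ones, so A and a are no longer adjacent when uniq -i sees them, and uniq only compares adjacent lines. The sketch below reproduces this with a hypothetical ASCII sample (count_in_c_locale is an illustrative name), and shows that sort -f, which folds lowercase to uppercase while sorting, keeps case variants adjacent. This is one possible contributing factor only; it does not make non-ASCII letters like Ş and ş fold together, which still depends on the locale.

```bash
#!/bin/bash
# Reproduces the question's `sort | uniq -ic` step under the C locale.
# The input is a hypothetical ASCII sample, so no extra locale is needed.
# Any arguments (e.g. -f) are passed through to sort.
count_in_c_locale() {
    printf 'A\na\nB\nb\n' | LC_ALL=C sort "$@" | LC_ALL=C uniq -ic
}

echo "sort:"
count_in_c_locale      # order A B a b: no adjacent case variants, every count is 1
echo "sort -f:"
count_in_c_locale -f   # order A a B b: uniq -i merges them, counts are 2
```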
