Remove accents from characters

Question

I'm quite certain this has been asked and answered before; however, I cannot find the answer to my specific use case.

I've got this file with accented characters in it:

>  ~ cat file
ë
ê
Ý,text
Ò
É

How would I convert them to their respective non-accented letters? So the outcome would be something along the lines of:

> ~ convert file out.txt
> ~ cat out.txt
e
e
Y,text
O
E

Note that the actual file itself contains more characters.

Those look like accented letters to me (en.wikipedia.org/wiki/Diacritic). Of course, if you need to change some other symbols to letters too, by some rule, then that's different.
– ilkkachu, Commented Jan 29, 2021 at 15:27
Would you change ü to ue (German equivalents) or plain u? Even in English, how would you expect to map æ?
– Chris Davies, Commented Jan 29, 2021 at 16:11
Your first example is not an accent but a diaeresis. Do you want to convert those, too? Your question is self-contradictory in that regard.
– Jörg W Mittag, Commented Jan 30, 2021 at 12:53

steeldriver · Accepted Answer · 2021-01-29 15:17:05Z

24

You can try iconv with the //TRANSLIT (transliteration) option.

For example, given

$ cat file
ë
ê
Ý,text
Ò
É

then

$ iconv -t ASCII//TRANSLIT file
e
e
Y,text
O
E
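
Since the comments below report that //TRANSLIT output differs between iconv implementations, here is a rough, platform-independent sketch of the same idea in Python. It is not what iconv does internally; it assumes that "removing accents" means decomposing each character (NFD) and dropping nonspacing combining marks. The function name strip_accents is made up for illustration.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose each character (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (Mark, nonspacing) covers combining accents such as
    # the diaeresis, circumflex, acute and grave used in the question.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# The sample file from the question, written with escapes: ë ê Ý,text Ò É
sample = "\u00eb\n\u00ea\n\u00dd,text\n\u00d2\n\u00c9"
print(strip_accents(sample))
```

Letters that have no canonical decomposition, such as æ, pass through unchanged with this approach, so they would need an explicit mapping of their own.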

Doesn't work on my Mac, but works on CentOS 8. Thanks!
– Kevin C, Commented Jan 29, 2021 at 15:27
@KevinC I wonder if it would work on the Mac if you added an appropriate -f value to specify the input encoding? Perhaps obtained using the file command on your input file?
– steeldriver, Commented Jan 29, 2021 at 15:32
Do you have the same iconv on both systems?
– ctrl-alt-delor, Commented Jan 29, 2021 at 15:42
Very useful answer, thanks! But I am disappointed when I look at the iconv(1) man page. It does not say anything about ascii//TRANSLIT, and iconv --list does not mention TRANSLIT. How can one find all these options for encodings?
– phs, Commented Mar 6, 2022 at 9:31
I don't get the same behavior: echo 'ÉÀîéàç' | iconv -f UTF-8 -t ASCII//TRANSLIT returns 'E`A^i'e`ac. My problem here is that I don't want the quotes in the output (I know I could pipe it through tr -d, but that would also remove the actual quotes from the original text).
– Christophe Priieur, Commented Mar 7, 2023 at 10:02


chexum · Accepted Answer · 2021-01-29 15:31:03Z

8

The GNU recode package is very useful to convert between character encodings, and it has a special case that does exactly this with the "flat" encoding:

recode -f utf8..flat <textin.txt >flattext.out

jubilatious1 · Accepted Answer · 2025-06-30 09:01:17Z

Using Raku (formerly known as Perl_6)

Raku performs NFC normalization by default (on everything except file names). If you want to remove accents, you need to decompose the characters first, which means using either the NFD or the NFKD method:

~$ echo 'été à la plage' | \
   raku -ne 'NFKD($_).map(*.chr.subst(:global, /\c[COMBINING ACUTE ACCENT]/, "")).join.put ;'
ete à la plage

...and...

~$ echo 'été à la plage' | \
   raku -ne 'NFKD($_).map(*.chr.subst(:global, /\c[COMBINING GRAVE ACCENT]/, "")).join.put ;'
été a la plage

...all together...

~$ echo 'été à la plage' | \
   raku -ne 'NFKD($_).map(*.chr.subst(:global, /\c[COMBINING ACUTE ACCENT] | \c[COMBINING GRAVE ACCENT]/, "")).join.put ;'
ete a la plage
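
For comparison, the same selective removal can be sketched in Python's standard library. This is only an approximation of the Raku one-liners above: remove_marks is a hypothetical helper name, and the final NFC step is an assumption meant to mimic Raku recomposing the remaining characters when it prints them.

```python
import unicodedata

def remove_marks(text, mark_names):
    """Drop only the named combining marks, keeping all other accents."""
    # Look up each mark by its Unicode name, e.g. "COMBINING ACUTE ACCENT".
    marks = {unicodedata.lookup(name) for name in mark_names}
    # NFKD mirrors the Raku calls above; NFD would work for this sample too.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    # Recompose so the accents that were kept become single characters again.
    return unicodedata.normalize("NFC", kept)

text = "\u00e9t\u00e9 \u00e0 la plage"  # 'été à la plage'
print(remove_marks(text, ["COMBINING ACUTE ACCENT"]))
print(remove_marks(text, ["COMBINING GRAVE ACCENT"]))
print(remove_marks(text, ["COMBINING ACUTE ACCENT", "COMBINING GRAVE ACCENT"]))
```

Passing a list of names keeps the choice of which accents to strip explicit, which is the same trade-off the Raku regex alternation makes.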

Maybe the issue is you need to know what accents are added to your text? You can compare NFC normalization to NFD decomposition below:

...NFC():

~$ echo 'ëêÝÒÉ' | \
   raku -ne 'NFC($_).map( *.uniname).join(" | ").put for .comb;'
LATIN SMALL LETTER E WITH DIAERESIS
LATIN SMALL LETTER E WITH CIRCUMFLEX
LATIN CAPITAL LETTER Y WITH ACUTE
LATIN CAPITAL LETTER O WITH GRAVE
LATIN CAPITAL LETTER E WITH ACUTE

...NFD():

~$ echo 'ëêÝÒÉ' | \
   raku -ne 'NFD($_).map( *.uniname).join(" | ").put for .comb;'
LATIN SMALL LETTER E | COMBINING DIAERESIS
LATIN SMALL LETTER E | COMBINING CIRCUMFLEX ACCENT
LATIN CAPITAL LETTER Y | COMBINING ACUTE ACCENT
LATIN CAPITAL LETTER O | COMBINING GRAVE ACCENT
LATIN CAPITAL LETTER E | COMBINING ACUTE ACCENT
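
If Raku is not at hand, a similar inspection can be sketched with Python's unicodedata module. The helper name code_point_names is invented for this illustration; it assumes the input characters are given one code point at a time, as in the sample string.

```python
import unicodedata

def code_point_names(text, form):
    """Return one line per input character, naming its code points in `form`."""
    lines = []
    for ch in text:
        normalized = unicodedata.normalize(form, ch)
        # A precomposed letter yields one name under NFC and two under NFD.
        lines.append(" | ".join(unicodedata.name(c) for c in normalized))
    return lines

sample = "\u00eb\u00ea\u00dd\u00d2\u00c9"  # 'ëêÝÒÉ'
for form in ("NFC", "NFD"):
    print(form)
    for line in code_point_names(sample, form):
        print(line)
```

The names printed under NFD are the ones to pass to a removal step when only certain marks should be stripped.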

https://docs.raku.org/language/unicode
https://docs.raku.org/type/Uni
https://raku.org
