python remove accents

Python -- Posted on June 27, 2022

This Python snippet uses the standard-library 'unicodedata' module to compare two Greek strings, stra and str2. Before the comparison, it normalizes str2, removes its accent, and lowercases both strings.

Let's break down the code step by step:

  1. import unicodedata as ud: This line imports Python's built-in 'unicodedata' module under the alias 'ud'.

  2. stra = 'ΑΘΗΝΑ': This line initializes the variable stra with the Greek string 'ΑΘΗΝΑ'.

  3. str2 = 'Αθήνα': This line initializes the variable str2 with another Greek string 'Αθήνα'.

  4. d = {ord('\N{COMBINING ACUTE ACCENT}'):None}: This line creates a translation table d with a single entry. The key is the code point of the 'COMBINING ACUTE ACCENT' character (U+0301), and the value is None, which tells str.translate() to delete that character.

  5. str2 = ud.normalize('NFD', str2).translate(d).lower(): This line applies three operations to str2 in turn:
     a. ud.normalize('NFD', str2): This applies canonical decomposition (NFD), which splits each accented character into its base letter followed by a combining mark. 'Αθήνα' looks unchanged, but 'ή' (U+03AE) is now two code points: 'η' (U+03B7) followed by U+0301.
     b. .translate(d): This deletes every character listed in d, so the combining acute accent disappears and the string becomes 'Αθηνα'.
     c. .lower(): This converts the string to lowercase, turning the capital 'Α' into 'α', so str2 ends up as 'αθηνα'.

  6. print(stra.lower() == str2): This line lowercases stra, compares it with the already-processed str2, and prints True if the two strings are identical or False otherwise.
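To see what each step does to str2, a short trace can print the intermediate values. This is an illustrative sketch: the variable names decomposed, stripped, and lowered are hypothetical and do not appear in the original snippet.

```python
import unicodedata

str2 = 'Αθήνα'

# (a) NFD splits 'ή' into 'η' followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = unicodedata.normalize('NFD', str2)
print(len(str2), len(decomposed))  # 5 6
print([f'U+{ord(c):04X}' for c in decomposed])
# ['U+0391', 'U+03B8', 'U+03B7', 'U+0301', 'U+03BD', 'U+03B1']

# (b) translate() deletes every code point mapped to None
d = {ord('\N{COMBINING ACUTE ACCENT}'): None}
stripped = decomposed.translate(d)
print(stripped)  # Αθηνα

# (c) lower() turns the capital alpha into a small alpha
lowered = stripped.lower()
print(lowered)  # αθηνα

# Final comparison from step 6
print('ΑΘΗΝΑ'.lower() == lowered)  # True
```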

Given that stra is 'ΑΘΗΝΑ' and str2 is 'Αθήνα', the comparison prints True. stra has no accents, so lowercasing alone turns it into 'αθηνα', while str2 becomes 'αθηνα' after accent removal and lowercasing.

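import unicodedata as ud

stra = 'ΑΘΗΝΑ'
str2 = 'Αθήνα'
# Translation table: delete COMBINING ACUTE ACCENT (U+0301)
d = {ord('\N{COMBINING ACUTE ACCENT}'): None}
# Decompose accented letters, drop the accent, then lowercase
str2 = ud.normalize('NFD', str2).translate(d).lower()
print(stra.lower() == str2)  # True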
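The snippet removes only the acute accent (U+0301), so letters carrying other marks, such as the dialytika in 'ϊ' (U+0308), keep them after translate(). A more general sketch, assuming you want to drop every combining mark, filters the NFD output with unicodedata.combining() and compares with str.casefold(), which is intended for caseless matching. The helper names strip_accents and same_word are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove every combining mark (acute, dialytika, grave, ...) from text."""
    # NFD separates base letters from their combining marks
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns a non-zero canonical class for combining marks
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


def same_word(a: str, b: str) -> bool:
    """Compare two strings, ignoring case and accents on both sides."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(same_word('ΑΘΗΝΑ', 'Αθήνα'))  # True
print(strip_accents('προϊόν'))      # προιον
print(strip_accents('café'))        # cafe

# The single-entry table from the snippet leaves the dialytika behind
d = {ord('\N{COMBINING ACUTE ACCENT}'): None}
print('\u0308' in unicodedata.normalize('NFD', 'προϊόν').translate(d))  # True
```

Dropping all combining marks is aggressive: in some scripts these marks are not optional accents, so the narrower translation table from the original snippet may be the better choice there.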
