ASCII (American Standard Code for Information Interchange) is a character encoding standard that represents 128 characters using 7 bits. These 128 characters include uppercase and lowercase letters, digits, punctuation marks, and control characters.
While this is the technical definition (source: Google), let’s understand why ASCII is important in Natural Language Processing (NLP).
The Problem
Think of two situations:
- Converting a number to binary
- Converting text to binary
Converting numbers to binary is pretty straightforward:
5 in binary = 101
100 in binary = 1100100
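As a quick check of the two conversions above, here is a minimal sketch using Python's built-in `format()`. The helper name `to_binary` is made up for this illustration.

```python
def to_binary(n: int) -> str:
    """Return the binary digits of a non-negative integer as a string."""
    # The "b" format spec renders an integer in base 2 without the "0b" prefix.
    return format(n, "b")


print(to_binary(5))    # 101
print(to_binary(100))  # 1100100
```

The built-in `bin()` would also work, but it adds a `0b` prefix to the result.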
But converting text to binary involves an extra step:
- First, convert each character to a number (using encoding like ASCII)
- Then, convert that number to binary
This is exactly what happens in NLP and programming: instead of assigning numbers to characters ourselves, we use ASCII, a standardized character encoding.
Python Code Example:
print(ord('A'))
print(ord('a'))
print(ord('1'))
print(ord(' '))
print(chr(65))
Output:
65
97
49
32
A
Here:
ord() returns the numeric code of a character (its Unicode code point, which is the same as the ASCII value for ASCII characters)
chr() returns the character for a given numeric code
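To see that the two functions undo each other, and where ASCII stops, here is a small sketch. The helper name `is_ascii_char` is hypothetical. It relies on the fact that ASCII codes run from 0 to 127.

```python
def is_ascii_char(char: str) -> bool:
    """Return True if a single character has a code in the ASCII range 0-127."""
    return ord(char) < 128


for char in "Az9 ":
    code = ord(char)
    # chr() reverses ord(): we get the original character back.
    assert chr(code) == char
    print(repr(char), code, is_ascii_char(char))

# A character outside ASCII still has a code, but it is above 127.
print(ord("\u00e9"), is_ascii_char("\u00e9"))  # 233 False
```

For whole strings, Python 3.7 and later also offer the built-in `str.isascii()` method.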
Now try to explain what's happening here:
name = "John"
ascii_values = [ord(char) for char in name]
print(ascii_values)
Output:
[74, 111, 104, 110]
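The two steps from earlier (character to number, then number to binary) can be combined into one function. This is only a sketch: the name `text_to_binary` is made up for illustration, and padding each code to 7 bits is a choice that follows from ASCII being a 7-bit encoding.

```python
def text_to_binary(text: str) -> list[str]:
    """Return one 7-bit binary string per ASCII character in text."""
    bits = []
    for char in text:
        code = ord(char)  # step 1: character -> number
        if code > 127:
            raise ValueError(f"{char!r} is not an ASCII character")
        bits.append(format(code, "07b"))  # step 2: number -> 7-bit binary
    return bits


print(text_to_binary("Hi"))  # ['1001000', '1101001']
```

Characters outside the ASCII range raise a `ValueError` here, because they cannot be represented in 7 bits.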