corpus utf-8 encoding using python

Asked 8 years, 6 months ago

Modified 8 years, 6 months ago

Viewed 143 times

I am using the NLTK movie_reviews corpus for sentiment analysis. I replaced the corpus's existing text files with Arabic-language text files, but now I can't read or print them; I get an encoding error.

My code:

import nltk
from nltk.corpus import movie_reviews

documents = []

for category in movie_reviews.categories():
    for fileid in movie_reviews.fileids(category):
        documents.append([movie_reviews.words(fileid), category])

print(documents[0])

I have this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

edited May 1, 2017 at 6:41

m0nhawk

asked Apr 30, 2017 at 22:51

Karim

Possible duplicate of Python: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

– DYZ
Commented Apr 30, 2017 at 22:57

I can solve the problem for a single text file by specifying its path and changing the encoding to UTF-8, but I couldn't do the same with the corpus. Could you give me some suggestions?

– Karim
Commented Apr 30, 2017 at 23:00

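For the corpus case, one possible approach is to build a separate reader with an explicit encoding instead of going through the bundled `movie_reviews` loader, which appears to be configured for ASCII text. The sketch below is only an illustration under assumptions: the helper name `load_documents` and the folder `~/arabic_reviews` are hypothetical, and the files are assumed to be UTF-8 and arranged in `pos/` and `neg/` subfolders like the original corpus.

```python
import os


def load_documents(root, encoding="utf-8"):
    """Return [(words, category), ...] for a pos/neg corpus under `root`."""
    # Imported here so the sketch only needs NLTK when it is actually called.
    from nltk.corpus.reader import CategorizedPlaintextCorpusReader

    reader = CategorizedPlaintextCorpusReader(
        root,
        r"(?!\.).*\.txt",             # every non-hidden .txt file
        cat_pattern=r"(neg|pos)/.*",  # category = top-level folder name
        encoding=encoding,            # decode as UTF-8 instead of ASCII
    )
    return [
        (list(reader.words(fileid)), category)
        for category in reader.categories()
        for fileid in reader.fileids(category)
    ]


# Hypothetical location of the Arabic files; adjust to the real folder.
corpus_root = os.path.expanduser("~/arabic_reviews")
if os.path.isdir(corpus_root):
    documents = load_documents(corpus_root)
    print(documents[0])
```

Keeping the Arabic files in their own folder, rather than overwriting the files inside `nltk_data`, also leaves the original English corpus usable.
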
Is this an NLTK thing? Can you post the full stack trace? That looks like a byte-order mark (BOM) of the kind Microsoft tools write, which suggests that it's a problem at the point where a file is opened.

– tdelaney
Commented Apr 30, 2017 at 23:25

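The BOM observation can be checked with the standard library alone. This is a minimal sketch, assuming the Arabic files were saved as UTF-8 with a leading BOM; the sample bytes and the helper `try_decode` are made up for illustration and do not come from the asker's files.

```python
import codecs

# Hypothetical sample: Arabic text saved as UTF-8 with a leading BOM.
raw = codecs.BOM_UTF8 + "مرحبا".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


print(hex(raw[0]))                   # first byte of the UTF-8 BOM
print(try_decode(raw, "ascii"))      # fails on the very first byte
print(try_decode(raw, "utf-8-sig"))  # decodes and drops the BOM
```

Note that even without a BOM the ASCII codec would still fail on the first Arabic character, so the encoding used to read the files has to change either way.
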
Yes, I import movie_reviews as a corpus from NLTK.

– Karim
Commented Apr 30, 2017 at 23:42

No answers yet :(

– Karim
Commented May 1, 2017 at 13:11
