I need to do a lot of work with PowerPoint files, and I use win32com to handle the job. I'm Taiwanese, and my OS is Windows 7 64-bit, Traditional Chinese edition. Some of the file names contain Chinese characters, but some characters are not allowed in file names on Windows.

For example: '\xe8\xaa\xb2\x0b\xe6\xb0\xb4'

The string above contains a character that is invalid in a Windows file name. How can I remove the invalid characters? '\xe8\xaa\xb2\x0b\xe6\xb0\xb4' is a string; if I print it, a strange symbol shows up in my console, but I don't know which character is the strange one.
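One way to find out which character is the strange one is to decode the bytes and print each character's code point and Unicode name. The sketch below assumes the bytes are UTF-8 (the example decodes cleanly that way); `describe_chars` is a hypothetical helper name, not part of any library.

```python
import unicodedata


def describe_chars(text):
    """Return (character, code point, Unicode name or category) for each character."""
    rows = []
    for ch in text:
        # Control characters such as '\x0b' have no Unicode name,
        # so fall back to their general category (e.g. 'Cc').
        name = unicodedata.name(ch, '<' + unicodedata.category(ch) + '>')
        rows.append((ch, 'U+%04X' % ord(ch), name))
    return rows


raw = b'\xe8\xaa\xb2\x0b\xe6\xb0\xb4'
for ch, cp, name in describe_chars(raw.decode('utf-8')):
    print(repr(ch), cp, name)
```

The three rows come out as '課' (U+8AB2), '\x0b' (U+000B, category Cc, a control character) and '水' (U+6C34), so the vertical tab in the middle is the symbol that prints oddly.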

Thank you very much in advance!

2 Answers

Try this:

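# Windows reserves these DOS device names, with or without an extension (e.g. AUX.txt).
dosnames = {'CON', 'PRN', 'AUX', 'NUL',
            'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
            'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'}
invalid = r'<>:"/\|?*'  # characters Windows forbids in file names

def clean_filename(string):
    # Keep every character except the forbidden ones and control characters (code points 0-31).
    final = ''.join(char for char in string if char not in invalid and ord(char) > 31)
    if final.split('.')[0].upper() in dosnames:
        raise ValueError('final string is a DOS name!')
    if final.replace('.', '') == '':
        raise ValueError('final string is all periods!')
    return final

# The bytes from the question, decoded as UTF-8: '課', a vertical tab ('\x0b'), '水'
print(clean_filename(b'\xe8\xaa\xb2\x0b\xe6\xb0\xb4'.decode('utf-8')))

This removes the characters Windows forbids and the control characters (code points 0-31, including the vertical tab), keeps non-ASCII characters such as Chinese, and raises an error for DOS names and for names made up only of periods. For the example from the question, it prints '課水'.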

2 Comments

I see two problems with this answer: 1) it won't remove characters in the range \x00-\x1F, which are not allowed; 2) it will remove characters above \x7F, which are allowed. If you run your program on the example the OP provided, it will return '\x0b', which is a vertical tab and is not allowed in a file name.
No problem, that's kind of what this site's for. I just happened to see the question first :)

It fails because of the \x0b byte, a vertical tab, which is not allowed in a file name on Windows.

You may use any Unicode character in a file name on Windows, except:

  • < > : " / \ | ? *
  • Characters whose code points are 0-31 (below the ASCII space)
  • Any other character that the target file system does not allow (for example, trailing periods or spaces)
  • Any of the DOS device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9 (also avoid them with an extension, such as AUX.txt)
  • A file name made up entirely of periods
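The rules above can be checked before saving a file. The sketch below is a hypothetical helper, `is_valid_windows_filename`, written in Python to match the other answer's code. It assumes trailing periods and spaces are the only "other" file-system restriction and does not check path length or any other limit, so it is an approximation of the Windows rules, not a complete implementation.

```python
# Characters that Windows forbids anywhere in a file name.
INVALID_CHARS = set('<>:"/\\|?*')

# Reserved DOS device names: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.
DOS_NAMES = ({'CON', 'PRN', 'AUX', 'NUL'}
             | {'COM%d' % i for i in range(1, 10)}
             | {'LPT%d' % i for i in range(1, 10)})


def is_valid_windows_filename(name):
    """Check one file name (not a full path) against the rules listed above."""
    if name.strip('.') == '':
        return False  # empty, or made up entirely of periods
    if any(ch in INVALID_CHARS or ord(ch) < 32 for ch in name):
        return False  # forbidden punctuation or control character (code points 0-31)
    if name[-1] in '. ':
        return False  # trailing period or space (assumed file-system restriction)
    if name.split('.')[0].upper() in DOS_NAMES:
        return False  # reserved device name, with or without an extension
    return True


for candidate in ['課\x0b水.ppt', '課水.ppt', 'aux.txt', '...']:
    print(repr(candidate), is_valid_windows_filename(candidate))
```

Only '課水.ppt' passes: the first name contains the vertical tab, 'aux.txt' is a reserved device name with an extension, and '...' is all periods. Checking the whole name before calling win32com keeps the validation separate from any cleanup step.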
