The Wayback Machine - https://web.archive.org/web/20210121093105/https://github.com/turicas/rows/issues/248
Document file-magic dependencies in different operating systems #248

Open
rjdp opened this issue Sep 29, 2017 · 10 comments

@rjdp rjdp commented Sep 29, 2017

import magic
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rajdeepsharma/.virtualenvs/charenc/lib/python2.7/site-packages/magic.py", line 61, in <module>
    _open = _libraries['magic'].magic_open
  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 379, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 384, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(RTLD_DEFAULT, magic_open): symbol not found

Author

@rjdp rjdp commented Sep 29, 2017

OK, I did not have libmagic installed. I ran brew install libmagic and now it works 👍. Maybe installing file-magic should have errored out saying that the libmagic dependency is missing. Thanks.
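
A minimal sketch of the kind of early check suggested here, assuming Python 3 and assuming the shared library can be located under the name "magic" with ctypes.util.find_library. The helper names libmagic_available and import_magic_or_explain are hypothetical; they are not part of file-magic or rows.

```python
import ctypes.util


def libmagic_available():
    """Return True if a shared library named "magic" can be located."""
    # find_library returns a library name or path, or None when nothing is found.
    return ctypes.util.find_library("magic") is not None


def import_magic_or_explain():
    """Import the magic module, turning low-level failures into a clear ImportError."""
    if not libmagic_available():
        raise ImportError(
            "libmagic was not found; install it first "
            "(for example 'brew install libmagic' on macOS)"
        )
    try:
        import magic
    except (AttributeError, OSError) as exc:
        # AttributeError is the exception shown in the traceback in this issue.
        raise ImportError("libmagic could not be loaded: {}".format(exc))
    return magic
```

Calling import_magic_or_explain() instead of a bare import magic would report the missing system dependency up front rather than failing on a symbol lookup.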

@rjdp rjdp closed this Sep 29, 2017
Author

@rjdp rjdp commented Sep 29, 2017

file-magic doesn't seem to be as accurate as chardet for one of my CSV files.
Output from file-magic: FileMagic(mime_type='text/plain', encoding='unknown-8bit', name='Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators')

Output from chardet: {'confidence': 0.8184309481349372, 'language': 'Greek', 'encoding': 'Windows-1253'}

chardet's output was very helpful for decoding the file.
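
A minimal sketch of how a chardet-style result like the one above could be used to decode the file, assuming Python 3. The function name decode_with_detection, the 0.5 confidence threshold, and the UTF-8 fallback are all assumptions made for illustration, and the sample bytes are a stand-in because the actual CSV is not available.

```python
def decode_with_detection(data, detection, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a chardet-style result dict, with a lossy fallback."""
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding and confidence >= min_confidence:
        try:
            return data.decode(encoding), encoding
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or bytes that are invalid for the guessed encoding.
            pass
    # Replace undecodable bytes instead of raising.
    return data.decode(fallback, errors="replace"), fallback


# The detection dict is the chardet output quoted in this comment.
detection = {"confidence": 0.8184309481349372, "language": "Greek", "encoding": "Windows-1253"}
sample = "Καλημέρα".encode("cp1253")  # stand-in for the real file contents
text, used = decode_with_detection(sample, detection)
```

A label such as 'unknown-8bit' is not a Python codec name, so a result like the file-magic one above would take the fallback path in this sketch.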

Owner

@turicas turicas commented Oct 1, 2017

@rjdp, for text files chardet may be better at detecting the encoding, but there are some cases in which chardet detects the wrong encoding; for this reason I prefer to use file-magic. Note that currently only the command-line interface uses file-magic to automatically detect file types.
I'm going to reopen this issue because the documentation for this part is missing.

@turicas turicas reopened this Oct 1, 2017
@turicas turicas changed the title from "import magic" errors out on macOS to "Document file-magic dependencies in different operating systems" Oct 1, 2017
@turicas turicas added the docs label Oct 1, 2017
Owner

@turicas turicas commented Oct 1, 2017

@cuducos, could you please check if information provided here solves your problem also?

Author

@rjdp rjdp commented Oct 1, 2017

@turicas, do you have cases where chardet predicts wrongly but file-magic predicts correctly? Knowing them would be helpful for me. For reference, I am currently using cchardet, which seems to be ~40x faster than chardet because it uses a C extension.

Author

@rjdp rjdp commented Oct 1, 2017

Oh, I understand now: my program expects only text CSV files, so I am safe, I guess. @turicas, since rows will also deal with text CSVs, wouldn't it be better to use cchardet instead of file?

Owner

@turicas turicas commented Oct 1, 2017

@rjdp, you can see it in this video I made (it's in Portuguese, but you can follow the code): https://youtu.be/BTMj5bDXByc?t=412 (look at the code history as well).
I still need to figure out the best options for detection (I was thinking of creating a library to deal with it, which would use chardet/cchardet and file-magic, each in the cases where it works best).
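
A rough sketch of what such a combined detector might look like, assuming Python 3. This is not code from rows or from any existing library: the names detect_encoding, available_detectors, and UNUSABLE are hypothetical, and trying file-magic first and the chardet family second is only one possible ordering. It assumes the magic module is file-magic (which provides detect_from_content) and that cchardet and chardet expose detect() returning a dict with an 'encoding' key.

```python
# Answers that say nothing useful about how to decode the bytes.
UNUSABLE = {None, "", "binary", "unknown-8bit"}


def detect_encoding(data, detectors):
    """Return (detector_name, encoding) from the first detector with a usable answer."""
    for name, detector in detectors:
        try:
            encoding = detector(data)
        except Exception:
            # A failing backend should not stop the chain.
            continue
        if encoding not in UNUSABLE:
            return name, encoding
    return None, None


def available_detectors():
    """Build the detector list from whichever optional libraries import cleanly."""
    detectors = []
    try:
        import magic  # may raise AttributeError when libmagic is missing
        if hasattr(magic, "detect_from_content"):
            detectors.append(
                ("file-magic", lambda d: magic.detect_from_content(d).encoding)
            )
    except (ImportError, AttributeError, OSError):
        pass
    for module_name in ("cchardet", "chardet"):
        try:
            module = __import__(module_name)
        except ImportError:
            continue
        # Bind the module as a default argument so each lambda keeps its own backend.
        detectors.append((module_name, lambda d, m=module: m.detect(d)["encoding"]))
    return detectors
```

With a list like this, detect_encoding(data, available_detectors()) would fall through to cchardet or chardet whenever file-magic only reports something like 'unknown-8bit'.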

Author

@rjdp rjdp commented Oct 1, 2017

@turicas, for text files, when does chardet/cchardet fail to predict correctly while file-magic predicts correctly?

Owner

@turicas turicas commented Oct 1, 2017

@rjdp, I don't have this information in detail for you. Look at the example in the video, check the MIME types supported by file, and run your own tests. ;)

Contributor

@cuducos cuducos commented Oct 12, 2017

@cuducos, could you please check if information provided here solves your problem also?

Yep, it did. I mean… I had to create a bare virtualenv (no site packages) in addition to brew install libmagic, but this fixed the issue.
