Feature Request: Python Executables for training, encoding and decoding #385
Comments
|
Isn't it enough to use the Python API? It supports all the features of spm_train and spm_encode.
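For reference, here is a minimal sketch of how the two command-line tools might map onto the Python API. The function names `train_flags` and `train_and_encode` are hypothetical helpers introduced only for illustration, and any paths or vocabulary sizes passed to them are the caller's own; the sketch assumes the `sentencepiece` package is installed.

```python
def train_flags(input_path, model_prefix, vocab_size):
    """Build an spm_train-style flag string (hypothetical helper)."""
    return (
        f"--input={input_path} "
        f"--model_prefix={model_prefix} "
        f"--vocab_size={vocab_size}"
    )


def train_and_encode(input_path, model_prefix, vocab_size, text):
    """Train a model, then split `text` into pieces (illustrative only)."""
    # Imported lazily so train_flags works without the package installed.
    import sentencepiece as spm

    # Roughly what `spm_train --input=... --model_prefix=... --vocab_size=...` does.
    spm.SentencePieceTrainer.Train(train_flags(input_path, model_prefix, vocab_size))

    # Roughly what `spm_encode --model=<prefix>.model --output_format=piece` does.
    sp = spm.SentencePieceProcessor()
    sp.Load(f"{model_prefix}.model")
    return sp.EncodeAsPieces(text)
```

The trainer takes the same flags as the binary, passed as one string, so the mapping is mostly a matter of knowing which class to call.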
|
|
That is also useful. I am just saying that a Python executable would make the library even more accessible, because there would be no need to check carefully which functions in https://github.com/google/sentencepiece/blob/master/python/sentencepiece.py are equivalent to a call to, for instance,
|
spm_{train,encode} are pure C++ programs (binaries), so we should not release them as Python packages. If we really want to have them, we would need to re-implement these tools in Python on top of the sentencepiece Python wrapper.
|
Thank you for your answer - and feel free to close the issue if this is not feasible or reasonable.
|
If we decide to release these command-line tools, we want to introduce py_spm_train or py_spm_encode, which are re-implementations of the C++ programs and are installed under the Python site-packages directory. We don't have enough bandwidth to work on it. Any contributions are appreciated.
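As a starting point for anyone who wants to contribute, here is a rough sketch of what a `py_spm_encode` entry point could look like. It is not an existing tool. The helper names (`build_parser`, `encode_stream`, `main`) are assumptions, and only two of the `spm_encode` flags (`--model` and `--output_format`) are mirrored.

```python
import argparse
import sys


def build_parser():
    """Argument parser mirroring a small subset of spm_encode's flags."""
    parser = argparse.ArgumentParser(prog="py_spm_encode")
    parser.add_argument("--model", required=True, help="path to a trained .model file")
    parser.add_argument("--output_format", choices=["piece", "id"], default="piece")
    return parser


def encode_stream(encode_line, lines, out):
    """Encode each input line and write the tokens space-separated."""
    for line in lines:
        tokens = encode_line(line.rstrip("\n"))
        out.write(" ".join(str(token) for token in tokens) + "\n")


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported here so the parser can be used without the package installed.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.Load(args.model)
    if args.output_format == "piece":
        encode_line = sp.EncodeAsPieces
    else:
        encode_line = sp.EncodeAsIds
    encode_stream(encode_line, sys.stdin, sys.stdout)
```

If such a module were shipped with the package, `main` could be registered as a `console_scripts` entry point so that pip places a `py_spm_encode` command on the user's path.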

It would be helpful if installing the Python package via pip also installed Python executables. What I mean is that the following should be possible:
This would make the package much more usable out of the box.
Compiling the source code leads to such binaries, but has many dependencies that are trickier to install.
Thanks for considering!