Better built-in lexers #668
Comments
|
If you feel it would help, go ahead; after all, it is a wiki. But I'm not sure how many people actually read these pages. For discussion, an issue (like this one) is probably more helpful. Maybe we could add a general page regarding syntax highlighting (similar to the README in There are some known problems/limitations of the current approach to syntax highlighting (e.g. #110, #603). Eventually I would like to address these, but performance and accuracy need to be weighed against the introduced complexity. |
|
If you think an issue is better, then I'm happy with that. I'll try to get acquainted with how LPeg works when I have some time and start working on the issues I mentioned initially. |
|
Have you considered tree-sitter? While there is a lot of infrastructure around them, the parsers themselves are pure C, and designed for integration in text editors. Tree-sitter parsers are probably the highest quality syntax highlighting parsers available today, and are only going to get better because they are maintained by the creators of Atom. I haven't seen any performance benchmarks yet, but I fully expect tree-sitter parsers to outperform all other incremental parsers of comparable quality eventually. Whether they are faster than vis' current lexers is hard to predict, but then those lexers are quite primitive and don't even come close to the accuracy that tree-sitter provides. Having parsing of such quality in vis would also make new editing commands possible through accurate scope detection. See the examples in this Atom PR. Thus far tree-sitter parsers exist only for the most popular languages, but integrating those seems like a potentially huge win for vis: High-quality parsers that are maintained by someone else. |
|
Disclaimer/Warning: All of this is my opinion and is As Far As I Know. I’m also quite bad at phrasing this kind of stuff but I don’t want to ignore things and I don’t want things to escalate.
The current keystrokes and regexes are already making it possible, I don’t really see something new here…
LPeg lexers are coming from/shared with Scintillua. |
|
A few misunderstandings there:
Tree-sitter offers a DOM-style interface, not DOM-style output. That is purely an API convention, and it is very convenient to use. The interface is specifically designed for the needs of syntax highlighting in editors and has nothing to do with the "web DOM" (or with terminals, for that matter).
Tree-sitter uses a compiler written in C++ that processes JSON input to generate a pure C parser without any non-C dependencies. The compiler and the JSON file are not needed for parsing, and don't have to be shipped with the editor.
Every single tree-sitter grammar is versioned, tagged and unit tested. The compiler currently doesn't have versioned releases, but the language bindings that depend on the compiler are themselves versioned, and each pins a specific commit for its tree-sitter submodule.
Tree-sitter has nothing whatsoever to do with the "webapp universe". It is used by the Atom editor (which some people might consider a "webapp") as a dependency via Node.js bindings that are independent of the core parser and compiler. There are also bindings for Haskell and Ruby, and of course the native C API. Using tree-sitter in vis doesn't bring vis any closer to the webapp universe than hosting its code on GitHub does.
The whole reason this issue even exists is because vis/Scintillua parsers are of poor quality and do not even recognize lots of standard syntax. If structured text editing is to be reliable, quality syntax parsing is required first, and I don't ever see this happening with the kind of grammars that vis uses now. |
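To make the "DOM-style interface" and the link to structured editing a bit more concrete, here is a minimal Python sketch. It does not use tree-sitter itself: the `Node` class, the `innermost` helper and the hand-built tree are all hypothetical stand-ins for what a parser would hand back, and only illustrate how an editor might ask "which syntax node is the cursor in?".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical syntax node: a kind plus a half-open byte range."""
    kind: str
    start: int  # inclusive offset
    end: int    # exclusive offset
    children: List["Node"] = field(default_factory=list)


def innermost(node: Node, pos: int) -> Optional[Node]:
    """Return the deepest node whose range contains pos, or None."""
    if not (node.start <= pos < node.end):
        return None
    for child in node.children:
        hit = innermost(child, pos)
        if hit is not None:
            return hit
    return node


# Hand-built tree for the text below (assumed, not produced by a parser).
source = 'f(a, "b")'
tree = Node("call", 0, 9, [
    Node("identifier", 0, 1),
    Node("arguments", 1, 9, [
        Node("identifier", 2, 3),
        Node("string", 5, 8),
    ]),
])

node = innermost(tree, 6)  # cursor on the letter b
print(node.kind, repr(source[node.start:node.end]))
```

A command such as "select the enclosing string" or "select the argument list" then reduces to walking this kind of tree, which is the sort of scope detection that regex-based lexers cannot offer reliably.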
|
I'm skeptical of whether tree-sitter would work for vis. Using parsers written in C would require either recompilation every time you want to add a new language, or dynamic loading, and neither seems appealing to me. If you want the benefits provided by tree-sitter, perhaps writing a Lua plugin to add support for it would be better. That way, it's optional and people who don't want to use it don't have to. |
|
Finally had a chance to work on this. I've patched the Bash lexer to support variable expansion inside heredocs and double-quoted strings. Next I'll add highlighting for I'll submit a PR when I'm happy with it. |
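For readers unfamiliar with the problem, the idea of highlighting variable expansions inside a double-quoted string can be sketched as below. This is an illustrative Python sketch only, not the LPeg patch mentioned above; the function name `split_double_quoted` and the token names are assumptions, and it deliberately ignores command substitution, positional parameters and nested braces.

```python
import re
from typing import List, Tuple

# A backslash escape, a ${...} expansion (no nesting), or a plain $name.
# The escape alternative comes first so that "\$" is never seen as a variable.
_PART = re.compile(r'\\.|\$\{[^}]*\}|\$[A-Za-z_][A-Za-z0-9_]*', re.DOTALL)


def split_double_quoted(body: str) -> List[Tuple[str, str]]:
    """Split the inside of a double-quoted string into string/variable tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for match in _PART.finditer(body):
        text = match.group(0)
        if text.startswith("\\"):
            continue  # escaped character stays part of the surrounding string
        if match.start() > pos:
            tokens.append(("string", body[pos:match.start()]))
        tokens.append(("variable", text))
        pos = match.end()
    if pos < len(body):
        tokens.append(("string", body[pos:]))
    return tokens


for token in split_double_quoted("Hello $USER, home is ${HOME}!"):
    print(token)
```

The same splitting would apply to the body of an unquoted heredoc, where expansions are also active.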
|
I would strongly caution against working around n issues rather than addressing the root problem. You don't want to end up in a situation where the effort invested in moving from a 20% solution to a 40% or 60% solution leads a project to stick with it instead of moving to a 90% alternative. The problem with C-based parsers is, however, a legitimate concern. One option is to offer "core" lexers powered by tree-sitter and compiled into vis, with fallback / user-defined Lua-based lexers so a recompile wouldn't be necessary, but that's not ideal. What would be cool is if it were possible to use LLVM to compile that parser to Lua, but some googling does not reveal any attempts at making such an endeavour generally possible (and it would significantly complicate the build toolchain, unless a third-party project maintaining a Lua-compiled version of all tree-sitter parsers were used, which would definitively address that issue). Given that lexers aren't "core" functionality, I don't think ruling out dynamic loading without some hefty consideration is the way to go, but still, users having to compile C code each time they want to add a lexer is a burden... |
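The "core lexers with a user-defined fallback" option could look roughly like the following. This is a hypothetical Python sketch of the lookup order only: `LexerRegistry` and its methods are invented for illustration and do not correspond to anything in vis, and the lambdas stand in for real lexers.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[str, str]
Lexer = Callable[[str], List[Token]]


class LexerRegistry:
    """Hypothetical registry: compiled-in lexers first, user lexers as fallback."""

    def __init__(self) -> None:
        self._core: Dict[str, Lexer] = {}
        self._user: Dict[str, Lexer] = {}

    def register_core(self, lang: str, lexer: Lexer) -> None:
        self._core[lang] = lexer

    def register_user(self, lang: str, lexer: Lexer) -> None:
        self._user[lang] = lexer

    def lookup(self, lang: str) -> Optional[Tuple[str, Lexer]]:
        """Return (origin, lexer), preferring core lexers, or None if unknown."""
        if lang in self._core:
            return ("core", self._core[lang])
        if lang in self._user:
            return ("user", self._user[lang])
        return None


registry = LexerRegistry()
registry.register_core("c", lambda text: [("code", text)])
registry.register_user("c", lambda text: [("text", text)])
registry.register_user("toml", lambda text: [("text", text)])

print(registry.lookup("c")[0])     # core lexer wins when both exist
print(registry.lookup("toml")[0])  # falls back to the user-defined lexer
```

Whether core lexers should take priority, or whether a user-defined lexer should be allowed to override them, is a design choice this sketch does not settle.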
|
|
That PR is pretty old; neovim/neovim#11724 is a better reference. But yes, https://github.com/neovim/neovim/blob/master/src/nvim/lua/treesitter.c is relatively self-contained, except for directly reading neovim buffers in one function. |


The syntax highlighting lexers are currently missing support for various things (e.g. labels in C, proper string highlighting for shell scripts, etc.).
I'm wondering whether it would be worth creating a wiki page with a list, so that people who want to contribute can look there.