The Wayback Machine - https://web.archive.org/web/20200626095127/https://github.com/martanne/vis/issues/668
Better built-in lexers #668

Open
vktec opened this issue Feb 10, 2018 · 10 comments


@vktec vktec commented Feb 10, 2018

The syntax highlighting lexers are currently missing support for various things (e.g. labels in C, proper string highlighting for shell scripts, etc.).
I'm wondering whether it would be worth creating a wiki page with a list, so that people who want to contribute can look there.

Owner

@martanne martanne commented Feb 11, 2018

If you feel it would help, go ahead; after all, it is a wiki. But I'm not sure how many people actually read these pages. For discussion, an issue (like this one) is probably more helpful. Maybe we could add a general page regarding syntax highlighting (similar to the README in lua/lexers). It could then also feature/reference a list of known issues.

There are some known problems/limitations of the current approach to syntax highlighting (e.g. #110, #603). Eventually I would like to address these, but performance and accuracy need to be weighed against the introduced complexity.

Author

@vktec vktec commented Feb 11, 2018

If you think an issue is better, then I'm happy with that. I'll try to get acquainted with how LPeg works when I have some time and start working on the issues I mentioned initially.

Contributor

@p-e-w p-e-w commented Feb 18, 2018

Have you considered tree-sitter?

While there is a lot of infrastructure around them, the parsers themselves are pure C, and designed for integration in text editors. Tree-sitter parsers are probably the highest quality syntax highlighting parsers available today, and are only going to get better because they are maintained by the creators of Atom.

I haven't seen any performance benchmarks yet, but I fully expect tree-sitter parsers to outperform all other incremental parsers of comparable quality eventually. Whether they are faster than vis' current lexers is hard to predict, but then those lexers are quite primitive and don't even come close to the accuracy that tree-sitter provides.

Having parsing of such quality in vis would also make new editing commands possible through accurate scope detection. See the examples in this Atom PR.

Thus far tree-sitter parsers exist only for the most popular languages, but integrating those seems like a potentially huge win for vis: High-quality parsers that are maintained by someone else.


@lanodan lanodan commented Feb 19, 2018

Disclaimer/Warning: All of this is my opinion and is As Far As I Know. I’m also quite bad at phrasing this kind of stuff but I don’t want to ignore things and I don’t want things to escalate.

> Have you considered tree-sitter?

  • The ~advertised output is DOM-style, while we need something for Terminals
  • Another language (C++, JSON, …)
  • I’m not sure efficiency is that much needed
  • There seem to be no releases or tags!
  • This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

> new editing commands possible through accurate scope detection.

The current keystrokes and regexes are already making it possible, I don’t really see something new here…

> maintained by someone else.

LPeg lexers are coming from/shared with Scintillua.

Contributor

@p-e-w p-e-w commented Feb 19, 2018

A few misunderstandings there:

> The ~advertised output is DOM-style, while we need something for Terminals

Tree-sitter offers a DOM-style interface, not DOM-style output. That is purely an API convention, and it is very convenient to use. The interface is specifically designed for the needs of syntax highlighting in editors and has nothing to do with the "web DOM" (or with terminals, for that matter).
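To make the interface/output distinction concrete, here is a minimal sketch in Python. It does not use the real tree-sitter API: the `Node` class, the hand-built tree, and the `STYLES` colour table are all hypothetical stand-ins. The point is only that a tree you navigate through parent/child accessors can be flattened into byte ranges and rendered for a terminal.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical syntax-tree node: a type name plus a half-open byte range."""
    type: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def leaf_spans(node: Node) -> Iterator[Tuple[int, int, str]]:
    """Walk the tree depth-first and yield (start, end, type) for each leaf."""
    if not node.children:
        yield (node.start, node.end, node.type)
        return
    for child in node.children:
        yield from leaf_spans(child)


# Hypothetical mapping from node types to ANSI colour codes.
STYLES = {"identifier": "36", "operator": "33", "string": "32"}


def render(source: str, root: Node) -> str:
    """Turn leaf spans into a string with ANSI escapes for a terminal."""
    out, pos = [], 0
    for start, end, kind in leaf_spans(root):
        out.append(source[pos:start])  # text between leaves stays unstyled
        out.append(f"\x1b[{STYLES.get(kind, '0')}m{source[start:end]}\x1b[0m")
        pos = end
    out.append(source[pos:])
    return "".join(out)


# Hand-built tree for the text below; a real parser would produce it.
source = 'x = "hi"'
tree = Node("assignment", 0, 8, [
    Node("identifier", 0, 1),
    Node("operator", 2, 3),
    Node("string", 4, 8),
])
spans = list(leaf_spans(tree))
print(render(source, tree))
```

The tree walk knows nothing about terminals; only `render` does, which is why the shape of the API says little about where the highlighted text ends up.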

> Another language (C++, JSON, …)

Tree-sitter uses a compiler written in C++ that processes JSON input to generate a pure C parser without any non-C dependencies. The compiler and the JSON file are not needed for parsing, and don't have to be shipped with the editor.

> There seem to be no releases or tags!

Every single tree-sitter grammar is versioned, tagged and unit tested. The compiler currently doesn't have versioned releases, but the language bindings that depend on the compiler are themselves versioned, and pin a specific commit for their tree-sitter submodule.

> This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

Tree-sitter has nothing whatsoever to do with the "webapp universe". It is used by the Atom editor (which some people might consider a "webapp") as a dependency via Node.js bindings that are independent of the core parser and compiler. There are also bindings for Haskell and Ruby, and of course the native C API.

Using tree-sitter in vis doesn't bring vis any closer to the webapp universe than hosting its code on GitHub does.

> The current keystrokes and regexes are already making it possible, I don’t really see something new here…

The whole reason this issue even exists is that vis/Scintillua parsers are of poor quality and do not even recognize lots of standard syntax. If structured text editing is to be reliable, quality syntax parsing is required first, and I don't ever see this happening with the kind of grammars that vis uses now.

Author

@vktec vktec commented Feb 19, 2018

I'm skeptical of whether tree-sitter would work for vis. Using parsers written in C would require either recompilation every time you want to add a new language or the use of dynamic loading, neither of which seems appealing to me.

If you want the benefits provided by tree-sitter, perhaps writing a Lua plugin to add support for it would be better. That way, it's optional and people who don't want to use it don't have to.

Author

@vktec vktec commented Feb 19, 2018

Finally had a chance to work on this. I've patched the Bash lexer to support variable expansion inside heredocs and double-quoted strings. Next I'll add highlighting for $() and correct the highlighting for ``, which is currently highlighted like a single-quoted string.
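For readers unfamiliar with the problem, the following Python sketch shows the kind of matching involved. It is not the LPeg patch mentioned above, only an illustration under simplifying assumptions: the `EXPANSION` pattern and the `expansions` helper are made up for this example, and nested `${...}` or `$(...)` forms are not handled.

```python
import re
from typing import List, Tuple

# Simplified patterns: $name, ${...} without nesting, $(...) without nesting.
EXPANSION = re.compile(r"""
    \\.                                   # backslash escape: consume and ignore
  | (?P<var>\$[A-Za-z_][A-Za-z0-9_]*)     # $name
  | (?P<brace>\$\{[^}]*\})                # ${name} and similar
  | (?P<cmd>\$\([^()]*\))                 # $(command)
""", re.VERBOSE | re.DOTALL)


def expansions(body: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) for each expansion in a double-quoted body."""
    found = []
    for m in EXPANSION.finditer(body):
        if m.lastgroup:  # None when only the escape alternative matched
            found.append((m.start(), m.end(), m.lastgroup))
    return found


# Contents of a double-quoted string; the escaped \$HOME must not be reported.
body = r'Hello $USER, \$HOME is ${HOME} on $(uname -s)'
for start, end, kind in expansions(body):
    print(kind, body[start:end])
```

A lexer would apply this only inside double-quoted strings and unquoted heredocs; single-quoted strings get no expansion, so their contents stay plain string text.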

I'll submit a PR when I'm happy with it.


@mqudsi mqudsi commented Apr 26, 2018

I would strongly caution against working around n issues rather than addressing the root problem. You don't want to end up in a situation where the amount of effort invested in moving from a 20% solution to a 40% or 60% solution leads a project to stick to the former instead of moving to a 90% alternative.

The problem with C-based parsers is, however, a legitimate concern. One option is to offer "core" lexers powered by tree-sitter and compiled into vis, with fallback / user-defined Lua-based lexers so a recompile wouldn't be necessary, but that's not ideal. What would be cool is if it were possible to use LLVM to compile that parser to Lua, but some googling does not reveal any attempts at making such an endeavour generally possible (and it would significantly complicate the build toolchain, unless a 3rd party project maintaining a Lua-compiled version of all tree-sitter parsers were used, which would definitively address that issue).

Given that lexers aren't "core" functionality, I don't think ruling out dynamic loading without some hefty consideration is the way to go, but still, users having to compile C code each time they want to add a lexer is a burden...
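The "core plus fallback" arrangement described above can be sketched as a two-tier lookup. This is a Python illustration of the idea only; `HighlighterRegistry` and the two toy highlighters are hypothetical, and nothing here reflects how vis actually selects lexers.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[int, int, str]            # (start, end, style name)
Highlighter = Callable[[str], List[Token]]


class HighlighterRegistry:
    """Prefer a compiled-in parser; otherwise use a user-supplied lexer."""

    def __init__(self) -> None:
        self._core: Dict[str, Highlighter] = {}      # stands in for built-in parsers
        self._fallback: Dict[str, Highlighter] = {}  # stands in for script lexers

    def register_core(self, lang: str, fn: Highlighter) -> None:
        self._core[lang] = fn

    def register_fallback(self, lang: str, fn: Highlighter) -> None:
        self._fallback[lang] = fn

    def lookup(self, lang: str) -> Optional[Tuple[str, Highlighter]]:
        """Return (tier, highlighter), or None if the language is unknown."""
        if lang in self._core:
            return ("core", self._core[lang])
        if lang in self._fallback:
            return ("fallback", self._fallback[lang])
        return None


# Toy highlighters that mark the whole text with a single style.
def whole_text(style: str) -> Highlighter:
    return lambda text: [(0, len(text), style)]


registry = HighlighterRegistry()
registry.register_core("c", whole_text("core-c"))
registry.register_fallback("c", whole_text("script-c"))
registry.register_fallback("toml", whole_text("script-toml"))
```

With this layout, adding a language at runtime only touches the fallback tier, so no recompile is needed; the cost is that quality differs between the two tiers.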


@mcepl mcepl commented Feb 29, 2020

  1. tree-sitter is being added to neovim (neovim/neovim#9219 and many other issues/PRs)

  2. I wonder whether @bfredl actually wrote Lua bindings for tree-sitter and whether they could be used by vis.


@bfredl bfredl commented Feb 29, 2020

That PR is pretty old; neovim/neovim#11724 is a better reference. But yes, https://github.com/neovim/neovim/blob/master/src/nvim/lua/treesitter.c is relatively self-contained, except for directly reading neovim buffers in one function.
