I think it would be great to have a more interactive terminal that speeds up some actions by making the output of programs interactive: converting elements of the text into clickable widgets such as simple links or context menus.

Examples:

  1. Integration with git:

git branch

, which is converted to clickable branch names. Clicking on any of these switches to the given branch.

  2. Integration with standard commands:

ls -al

, and then clicking on a directory name will ls that directory. Right-clicking will show a context menu with "cd to...", "ls the directory...", etc.

Is there any existing solution that supports such scripting and can create a layer of UI elements on top of the terminal's text pane?

1 Answer

Once or twice I've seen a video about a terminal emulator (TE) that offers features like these. I can't recall its name, and I'm fairly certain that I haven't heard about it since. Probably for a good reason.

It's utterly hopeless to implement this behavior in the TE. I'd like to demonstrate this by asking some questions. These are rhetorical questions meant to show the nature of the problems; I'm not expecting answers. To every answer I could probably ask dozens of follow-up questions of a similar nature.


Let's first assume that somehow the TE knows that a piece of output is from ls or ls -l or the like. How would it parse it?

Take a look at how many formatting options ls has. The long listing can add or remove some of the columns. There are multiple date formats, including your own custom, potentially two utterly different formats for recent vs. old files. There are multiple ways to denote special characters in filenames, including ways that are ambiguous to parse.
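
A minimal sketch of the date-format problem alone, in Python. The helper name naive_parse_name and both sample lines are hypothetical: they're written by hand to resemble the default long listing and the --time-style=long-iso variant, not captured from a real run. The parser assumes eight whitespace-separated columns before the file name, which holds for the first line but not for the second.

```python
def naive_parse_name(line: str) -> str:
    """Hypothetical parser: assumes 8 columns precede the file name."""
    fields = line.split(None, 8)
    return fields[8] if len(fields) > 8 else ""


# Hand-written sample lines (assumptions, not real output).
default_line = "drwxr-xr-x 2 alice users 4096 Jun 15 09:22 my dir"
iso_line = "drwxr-xr-x 2 alice users 4096 2023-06-15 09:22 my dir"

print(naive_parse_name(default_line))  # my dir
print(naive_parse_name(iso_line))      # dir
```

The same directory yields two different "file names" depending on a switch the TE never sees, and a file name containing spaces makes the column count ambiguous in the other direction too.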

Sure, you might try to teach the parser all these formats, but will it know which one to apply? How? Would it look at the command line that was executed, e.g. find that prompt$ ls -l line and work it out from there? No, it couldn't know, because ls is often aliased to include some command line switches (other utilities can similarly be controlled via environment variables, which again are not seen by the TE).

Maybe the user even has multiple ls implementations installed with different output formats. Maybe they execute ~/bin/foo-ls; what should the TE do with its output?

Many users have aliases like l or ll standing for ls -l or something similar. How would the TE recognize if this is the case?

Maybe the user executes ls $dir or ls $(some-command-that-prints-a-directory-name). How would the TE know which directory's listing it sees?


git branch for me brings up the less pager, temporarily hiding everything else that was visible in the terminal. I'm sure it can be configured to just print the name of the branches (without invoking less), or less can be told to immediately quit if and only if the contents fit on one page. If inside less, you can start to scroll the contents horizontally, long branch names can partially disappear, or you can invoke the help page and then get back to viewing the actual contents. How could the TE keep track in all these cases and know which words on the screen are branch names?

How would you recognize git aliases, such as git br, if the user has configured such an alias? Would the TE look into and parse git's config files, which then wouldn't work over ssh?


How would you know which command is being executed? The closest guess is to parse the prompt (which might result in wrong guesses, as I've demonstrated), but also... how do you locate the prompt? Users, presumably especially the ones asking for fancy features like you do, tend to heavily customize their prompt. So how do you locate the command that's being executed? You have to enter the territory of parsing a heavily and uncontrollably customizable prompt. (Well, here iTerm2's shell integration introduces a nice solution via explicit escape sequences that could be copied.)
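
To illustrate what such explicit escape sequences buy you, here is a rough Python sketch. It assumes the OSC 133 convention (A = prompt start, B = command start, C = command output start, D = command finished), which as far as I know is what FinalTerm-style shell integration, including iTerm2's, is built on. The function name split_marked_stream and the sample stream are made up for the example.

```python
import re

# OSC 133 marks, terminated by BEL (\x07) or ST (ESC \).
# Assumed meaning: A = prompt start, B = command start,
# C = command output start, D = command finished (optionally with a status).
OSC133 = re.compile(r"\x1b\]133;([ABCD])[^\x07\x1b]*(?:\x07|\x1b\\)")


def split_marked_stream(stream: str):
    """Return (mark, text following that mark) pairs."""
    matches = list(OSC133.finditer(stream))
    parts = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stream)
        parts.append((m.group(1), stream[m.end():end]))
    return parts


# Hypothetical stream as a cooperating shell might emit it.
sample = (
    "\x1b]133;A\x07fancy-prompt$ "
    "\x1b]133;B\x07ls -l\n"
    "\x1b]133;C\x07total 0\n"
    "\x1b]133;D;0\x07"
)
for mark, text in split_marked_stream(sample):
    print(mark, repr(text))
```

With the marks in place the prompt can look like anything, because the TE no longer has to guess where it ends. It still only learns the command line as typed, not what aliases or variables expand to.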

Also, how to know what shell is being used? Different shells have different grammars for more complex command lines.


How to handle more complex commands? E.g. if you execute ls; git branch, how do you tell from the output where the file listing ends and the git branch listing begins?

How to handle directory changes, e.g. how to track what directories a command such as cd foo; ls; cd ..; git branch operates on? Taking into account $CDPATH, and cd possibly being aliased to cd -P, and so on?


Now, assume that you somehow overcome all these problems, and the TE presents a dropdown box for certain words. The user clicks, and the TE wishes to perform an action.

How to do that?

There's no proper way for a TE to send an instruction to the application (e.g. shell) inside. It might synthesize keypresses, e.g. pretend that the user typed ls -l that-particular-directory followed by an Enter, but this only works if the shell is waiting for a command to be entered and its command line is currently empty. If another interactive app is running, or if some partial command is already entered, or if you've terminated the ssh connection where that-particular-directory was printed, etc., then synthesizing these keypresses might break things big time. How would the TE know whether it's currently safe to inject those keypresses?
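
For what it's worth, here's a sketch of the most conservative check I can think of, in Python. It assumes the shell cooperates by emitting OSC 133 style prompt marks (an assumption, nothing guarantees this), and the names TerminalState and safe_to_inject are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TerminalState:
    last_mark: str          # most recent prompt mark seen: "A", "B", "C" or "D"
    text_after_mark: str    # text that appeared since that mark
    alternate_screen: bool  # True while a full-screen app (less, vim) is active


def safe_to_inject(state: TerminalState) -> bool:
    """Conservative guess: only at an empty command line on the normal screen."""
    if state.alternate_screen:
        return False
    return state.last_mark == "B" and state.text_after_mark == ""


print(safe_to_inject(TerminalState("B", "", False)))      # True
print(safe_to_inject(TerminalState("B", "git ", False)))  # False
print(safe_to_inject(TerminalState("C", "", False)))      # False
```

Even when this returns True, it says nothing about whether the clicked text still refers to the same host or working directory, so the ssh problem above remains.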

Or would the TE itself execute the desired command (without the shell in between)? Then where should it insert the output on the screen, and wouldn't it be confusing that there's no prompt and command line belonging to it? Also it couldn't work across ssh.


The above questions just barely scratch the surface of how things can go wrong.

Terminal emulation and apps on top of that have been evolving for maybe like 40-50 years now (I'm not sure exactly), and this kind of advanced usability has never been a goal. For this goal, everything would have needed to be done completely differently.

Also there's the developer resource side of the story. How many utilities are there that would need such treatment? You mentioned ls and git; I guess you could come up with many, many more. How many modes of operation do these utilities have? These two each have an enormous number. So are we talking about hundreds, thousands, maybe even tens of thousands of command output formats that the TE would need to be able to parse, decide what actions to offer, and decide how to perform those actions (taking into account circumstances such as the working directory of the shell having changed since)?

Who is going to implement this? Knowing that by the very nature of things, this is all just a bunch of complex heuristics that, no matter how hard you try, are probably going to break more often than work correctly? Knowing that parsing is one of the areas of computer programming that sucks the most, especially when the output wasn't designed to be parsed, let alone when it's not unambiguously parseable? Who's going to design and implement this in a way that's fairly easily extensible by the user, so that users can teach it their frequent aliases and such so that they work as expected? Who's going to maintain this and keep it up to date with newer versions and newer features of all these utilities? Who's going to address the inevitable never-ending stream of bug reports? It's doomed to fail big time.


Forget this approach. Get used to using TEs and TE-based applications the way they work currently.

Or look for alternative solutions.

Maybe you want to use a graphical git frontend for your work.

If you frequently need a file listing and then the listing of a subdirectory, then maybe you're looking for Midnight Commander or a graphical file manager.

Learn to touch type, so that it becomes much faster for you to tell the computer whatever you want to tell it, and more importantly, so that it becomes a fully automatic, unconscious task not taking away any focus from what you're actually trying to achieve. Make typing just as natural as speaking.

  • +1 for a good answer. Alongside the already existing user interfaces, you can create your own aliases and small shell scripts to perform actions that you need often (and you can use short, yet unique, names to make them easy to type). Commented Jun 15, 2023 at 9:22
  • Additionally, it seems the wanted features can largely be provided by tab completion. Commented Jun 15, 2023 at 16:47
  • @egmond these are all good questions. For sure the solution should be smart enough to understand the scope of the execution. There are a lot of obstacles to be solved if we want first-class support for some app, but for some basic stuff it may be just a set of regexps mapped to the command to run. As to the "where to run the command" question: the simplest would be to disable the interactive widgets if the terminal is not in prompt mode (yes, not a very trivial task, but solvable). As a last resort, that can be a problem to be solved by ML. IMO, none of the issues listed above are hard blockers. Commented Jun 15, 2023 at 21:56
  • The best you can achieve is to invest a giant (as in, like, really giant) amount of work and get something that sometimes works, and by its very nature, can hardly be improved to work correctly a little bit more frequently. I, for one, would hate every second of having to work towards such a goal. Also, at many places the way to overcome the obstacles is to invent new stuff (e.g. helping escape sequences to be emitted by the shell): maybe easy if you do it just for yourself, but extremely hard if you want to get that accepted mainstream and ship for others out of the box... Commented Jun 16, 2023 at 5:02
  • Terminal emulation is sort of an "exact" science. Not that there aren't differences, disagreements, bugs etc., but if a TE does something, you can pretty much count on it doing this 100% correctly. We've designed OSC 8 hyperlinking in a way that's 100% reliable. I've improved gnome-terminal's URL autodetection, spent one week on it, it's extremely complicated, works very well, but isn't 100% correct -- by its very nature, parsing a text flow for URLs isn't an exact science, there's no 100% correct solution, and just addressing all of the already reported bugs at once took me one week. Commented Jun 16, 2023 at 5:07
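
For reference, the OSC 8 hyperlinking mentioned in the last comment is reliable precisely because the program that prints the text marks the link itself, so the TE has nothing to guess. A minimal Python sketch follows; the function name osc8_link and the example URL are made up, and the optional params field of the sequence is left empty.

```python
def osc8_link(uri: str, text: str) -> str:
    """Wrap text in an OSC 8 hyperlink: ESC ] 8 ; params ; URI ST text ESC ] 8 ; ; ST."""
    esc = "\x1b"
    st = "\x1b\\"  # String Terminator
    return f"{esc}]8;;{uri}{st}{text}{esc}]8;;{st}"


# In a terminal that supports OSC 8, this prints "example" as a clickable link.
print(osc8_link("https://example.com/", "example"))
```

Terminals that don't support OSC 8 are expected to ignore the sequence and show just the text.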