Cosmo Myzrail Gorynych aka CoMiGo

Posted on Jun 19

Designing your own node-based visual programming language

#webdev #programming #ux #nocode

Original post with additional dev commentary can be found on my blog: comigo.games/nodes

Visual node-based programming is great! In some cases. Poorly made node-based languages only hinder your performance compared to text-based ones, but overall they are much easier to learn and simpler for a bystander to comprehend. These two advantages are closely linked, but the latter is what makes visual languages attractive and empowering for those who never even considered coding as something they needed. So it's not only about the learning curve but about the "aha" moment, too.

The holy grail of node-based languages for me is Blueprints from Unreal Engine. You can get many good ideas from it, and it was a heavy inspiration when I was making a new programming language for my tiny game engine called Whimsy. But Whimsy is also much smaller in scope than most engines you know and is targeted at more casual creators, so I made some simplifications and tried out ideas to make a distinct, easy-to-use language. Here I describe how I made the language and which design considerations you should take into account if you want to create something similar.

But, why even make a node-based language? Why not just use an existing text-based language?

The common reason for making any custom language is that you can create a domain-specific one. Existing mainstream languages are usually generalist ones, with features and complex syntax that your project may simply not need. And they may lack features, too. Making a visual programming language for your project is about making a tool that fits it perfectly: maybe just for you and your in-house tools, maybe for your users to create content. Both cases are worth at least trying.

First of all: we should separate block-based and node-based languages, because…

…the difference is not just graphical, but also in the way they work and what they allow you to code.

Block-based, Scratch-like languages use what I call "linear grammar", which is how every text-based language works. Every top-level script has only one entry point (an event, a function declaration, or just the beginning of a file), and the code compiles (if it is not directly interpreted) to another language from top to bottom, with minor adjustments, as a 1-to-1 match. In a compiled block-based language, each block corresponds to a precise line of code, to the point where an AST of the compiled code can be converted back to blocks.

Node-based languages defy that. While we still can convert a command to a line of code, the programs as a whole are structured differently.

Firstly, several entry nodes can point to the same node, or several entry points can lead into overlapping subgraphs. This means that if we are to compile this code, we should at least find these common nodes and pull them out into functions, or just duplicate the code. (But duplication is terrible for bundle size and will slow compilation.)
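Finding those common nodes can be sketched as a reachability check. This is a minimal illustration, not how any particular engine does it: the `Graph` shape and the `reachableFrom` and `findSharedNodes` helpers are hypothetical names made up for this example.

```typescript
// Hypothetical graph shape: each node lists the IDs of the nodes it can execute next.
type Graph = Record<string, string[]>;

// Collects every node reachable from one entry point (the entry itself included).
function reachableFrom(graph: Graph, entry: string): Set<string> {
    const seen = new Set<string>();
    const stack = [entry];
    while (stack.length > 0) {
        const id = stack.pop()!;
        if (seen.has(id)) continue; // Also keeps the walk finite when the graph has cycles
        seen.add(id);
        stack.push(...(graph[id] ?? []));
    }
    return seen;
}

// Returns the IDs of nodes that more than one entry point can reach.
// These are the candidates for pulling out into functions.
function findSharedNodes(graph: Graph, entries: string[]): string[] {
    const hits: Record<string, number> = {};
    for (const entry of entries) {
        reachableFrom(graph, entry).forEach(id => {
            hits[id] = (hits[id] ?? 0) + 1;
        });
    }
    return Object.keys(hits).filter(id => hits[id] > 1);
}

const graph: Graph = {
    onClick: ['playSound'],
    onKeyPress: ['playSound'],
    playSound: ['addScore'],
    addScore: [],
};
const shared = findSharedNodes(graph, ['onClick', 'onKeyPress']);
// `shared` contains "playSound" and "addScore"
```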

And secondly, these programming languages allow cyclic structures, unless you explicitly forbid that. (Which can feel "stiff" and goes against the idea of node-based languages as directed graphs.) But we have loops in text-based languages, so what's the deal, right? It's the same task as pulling out co-referenced subgraphs into a function!

Until your users make intersecting cycles.

Intersecting cycles are impossible to write directly in a text-based language. At least, they're not possible nowadays: labels and code jumps became a bad practice as software grew more complicated, since they are error-prone and also hard for compilers to handle.

Most people who have only used text-based programming don't even realize that visual scripting languages can do more. Which is understandable: if you have coded everything in, say, C++ before, how is visual scripting different, eh? You simply never thought such things were possible. It's like using JS generators or Effection's Operations for the first time.

So, unless you disallow intersecting cycles, you will need a more complex approach to implementing the runtime for this language. Converting it all to a text-based language is still possible, just more complex. This brings us to the first step of designing your language: the runtime.

But before that, a smol terminology explainer!

Here are the terms I will often use when describing node structure:

Anyways,

The runtime

There are two ways you could make a node-based language work in your project as a programming language:

You can interpret it;
Or you can translate it to a language your app's runtime already understands.

The first approach is also called a virtual machine approach, or even an automaton in the "cellular automata" sense. It is about traversing nodes and executing them as they are represented in your editor: as a graph. Every graph node stores its inputs, its method identifier, and what executes next. There is also a huge dictionary that maps each possible method ID to its implementation.
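The interpretation approach can be sketched in a few lines. Everything here is an assumption made for illustration: the `GraphNode` and `Context` shapes, the `implementations` dictionary, and the `run` walker are hypothetical, and data edges are left out to keep the example short.

```typescript
// Hypothetical node shape for a walker-style interpreter.
interface GraphNode {
    method: string;                 // Key into the implementations dictionary
    inputs: Record<string, number>; // Constant inputs only, to keep the sketch short
    next?: string;                  // ID of the node to execute afterwards
}
interface Context { score: number; log: string[]; }
type Implementation = (inputs: Record<string, number>, ctx: Context) => void;

// The dictionary that maps each method ID to its implementation.
const implementations: Record<string, Implementation> = {
    addScore: (inputs, ctx) => { ctx.score += inputs.amount; },
    report: (_inputs, ctx) => { ctx.log.push(`Score: ${ctx.score}`); },
};

// Walks the graph from an entry node by following `next` links.
// The step limit is a simple guard against endless cycles.
function run(nodes: Record<string, GraphNode>, entry: string, ctx: Context, maxSteps = 1000): void {
    let current: string | undefined = entry;
    for (let step = 0; current !== undefined && step < maxSteps; step++) {
        const node: GraphNode | undefined = nodes[current];
        if (!node) throw new Error(`Unknown node "${current}"`);
        const implementation = implementations[node.method];
        if (!implementation) throw new Error(`Unknown method "${node.method}"`);
        implementation(node.inputs, ctx);
        current = node.next;
    }
}

const ctx: Context = { score: 0, log: [] };
run({
    start: { method: 'addScore', inputs: { amount: 5 }, next: 'bonus' },
    bonus: { method: 'addScore', inputs: { amount: 10 }, next: 'end' },
    end: { method: 'report', inputs: {} },
}, 'start', ctx);
// ctx.score is now 15 and ctx.log holds one line
```

Note that the walker never inspects the shape of the graph: a node whose `next` points back to an earlier node simply keeps running until the step limit is hit.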

The second approach is about converting the graphs to a known programming language. To JS, for example.

Though the first approach is usually simpler to make, it has one big flaw: subpar performance. Despite the seemingly perfect linked graph, using an automaton prevents compilers (including JIT compilers for scripting languages) from optimizing your code. For example, they can't optimize a for-if-else structure to be as fast as possible if the for, the if, and the actual commands with their value getters are dismembered and spread across a map. This can be crucial. Scratch and TurboWarp are an example of an interpreted and a compiled implementation: mostly the same engine, but the speed difference is measured in orders of magnitude.

The exception to these rules is when the nodes themselves are so heavy that the overhead of the VM and the lack of optimization across several nodes become insignificant. For example, long before I made Whimsy and Catnip, I released FilterJS, which is an app for procedural generation of textures/images and for automated image filtering, and most nodes performed HTML canvas operations or WebGL rendering. One node could have 300 lines of code in it with calls to various graphics APIs. The difference between interpretation and translation there would be imperceptible.

For Whimsy, I went with the interpretation approach, as it is a tiny engine that won't have computation-heavy scenarios running in it, and I would rather cut corners than die at my workstation coding a full-blown compiler. A bonus of going the interpretation way is that I don't need to worry about how the node graph is formed, whether it has intersections or other funny structures: the runtime simply follows what you created.

How everything flows

Most of the time, with categories of execution threads and data edges, you will get one-to-many connections:

Any number of nodes can execute the same node after they run, but each block links to only one "next" node;
A data block's output can be connected to any number of inputs, but every data input must have just one data source to eliminate ambiguity.

There is an exception for the first rule: when you're making a multi-threaded language.

With that, we have a back-and-forth structure: execution flows from start to end, and data gets retrieved from nodes that consume data to the very source of it. This will be reflected in your compiler/interpreter, and also how you store edges in users' project files.
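One way to make the storage format enforce these rules is to key each edge by the side that must stay unique. This is a sketch under my own assumptions: the `EdgeStore` shape and the `"node.pin"` string keys are hypothetical, not a format any editor prescribes.

```typescript
// Hypothetical edge storage that mirrors the one-to-many rules:
// an execution output stores its single target, a data input stores its single source.
interface EdgeStore {
    execNext: Record<string, string>;   // "sourceNode" -> "targetNode"
    dataSource: Record<string, string>; // "targetNode.inputPin" -> "sourceNode.outputPin"
}

function connectExec(store: EdgeStore, from: string, to: string): void {
    store.execNext[from] = to; // Replaces any previous "next" node
}
function connectData(store: EdgeStore, fromPin: string, toPin: string): void {
    store.dataSource[toPin] = fromPin; // Replaces any previous source
}

const store: EdgeStore = { execNext: {}, dataSource: {} };
connectExec(store, 'onStart', 'move');
connectExec(store, 'onCollide', 'move');             // Many nodes may lead to "move"
connectExec(store, 'onStart', 'jump');               // But "onStart" keeps only one "next"
connectData(store, 'getSpeed.value', 'move.speed');
connectData(store, 'getSpeed.value', 'jump.height'); // One output feeds many inputs
connectData(store, 'random.value', 'move.speed');    // But an input keeps only one source
```

With this layout, the interpreter reads `execNext` forward while running commands and reads `dataSource` backward while resolving inputs, which matches the back-and-forth structure described above.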

But this is what applies to imperative visual languages. With image processing graphs, material editors, and, in some cases, AI workflow constructors, the execution flow may seem reversed. This is because the entry point is the result of your graph execution, which makes any nodes not connected to this entry irrelevant.
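This "pull" model can be sketched as a recursive evaluation that starts at the result node. The `DataNode` shape and the `evaluate` function are hypothetical, and the sketch assumes the graph has no cycles.

```typescript
// Hypothetical data-only graph: each node names an operation and the nodes that feed it.
interface DataNode {
    op: 'constant' | 'add' | 'multiply';
    value?: number;   // Used by "constant" nodes
    inputs: string[]; // IDs of the input nodes
}

// Evaluates the graph backwards from the requested node and caches finished nodes.
// `visited` records which nodes were computed. Assumes an acyclic graph.
function evaluate(
    nodes: Record<string, DataNode>,
    id: string,
    cache: Record<string, number> = {},
    visited: string[] = []
): number {
    if (id in cache) return cache[id];
    const node = nodes[id];
    if (!node) throw new Error(`Unknown node "${id}"`);
    visited.push(id);
    const args = node.inputs.map(input => evaluate(nodes, input, cache, visited));
    const result =
        node.op === 'constant' ? (node.value ?? 0) :
        node.op === 'add' ? args.reduce((a, b) => a + b, 0) :
        args.reduce((a, b) => a * b, 1);
    cache[id] = result;
    return result;
}

const nodes: Record<string, DataNode> = {
    a: { op: 'constant', value: 2, inputs: [] },
    b: { op: 'constant', value: 3, inputs: [] },
    sum: { op: 'add', inputs: ['a', 'b'] },
    result: { op: 'multiply', inputs: ['sum', 'a'] },
    orphan: { op: 'add', inputs: ['b', 'b'] }, // Not connected to the result
};
const visited: string[] = [];
const output = evaluate(nodes, 'result', {}, visited);
// output is (2 + 3) * 2 = 10, and "orphan" is never computed
```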

Handling data, and pure functions

One thing you might need to consider is how to handle computed values. There can be quick commands like "get hero instance" or "number of enemies", but there can be more complex methods like "get closest enemy" or "get path to a location". With text-based languages, readability is often the deciding factor in whether you create a variable or just inline a call. When designing a language, you should think about how your language solves this from the performance point of view, as users will not have any power to change that themselves. Individual calls can be inlined, for sure, and in most other cases you can find the closest common parent and automatically set a variable there.

But what if functions are not pure? What if your script is asynchronous (e.g. has timers) and you need to cache or temporarily store a value somewhere? Or should it maybe get recalculated, like a randomizer function?

A good approach, used by Blueprints and the Unreal Engine team in general, is that some blocks are not pure computing blocks and must be put in an execution flow, meaning that they have not only data outputs but also an input and an output for the execution thread. Such blocks tell their users that once a block executes, its return values stay constant.
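The difference between the two kinds of blocks can be sketched like this. The `PureNode` and `ExecutedNode` shapes are hypothetical and only loosely modeled on the idea above; this is not Unreal Engine's implementation.

```typescript
// Hypothetical split: "pure" nodes are recomputed on every read,
// "executed" nodes store their output once the execution thread reaches them.
interface PureNode { kind: 'pure'; compute: () => number; }
interface ExecutedNode { kind: 'executed'; run: () => number; output?: number; }
type ValueNode = PureNode | ExecutedNode;

// Called by the execution flow when it reaches the node.
function execute(node: ExecutedNode): void {
    node.output = node.run(); // The value stays frozen until the node executes again
}

// Called whenever another node needs this node's value.
function read(node: ValueNode): number {
    if (node.kind === 'pure') return node.compute();
    if (node.output === undefined) throw new Error('Node was read before it executed');
    return node.output;
}

let counter = 0;
const nextNumber = () => ++counter; // Stand-in for an impure source, like a randomizer
const pureNode: PureNode = { kind: 'pure', compute: nextNumber };
const executedNode: ExecutedNode = { kind: 'executed', run: nextNumber };

execute(executedNode);                 // Runs once, stores 1
const firstRead = read(executedNode);  // 1
const secondRead = read(executedNode); // Still 1
const pureA = read(pureNode);          // 2: recomputed on every read
const pureB = read(pureNode);          // 3
```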

Going async

Unreal Engine's Blueprints have timers, animation curves, callbacks for long tasks like "wait till my actor comes to this location", and it is cool. Imagine writing it in C++ instead—each such call would be a lambda function at best or a whole separate execution system at worst. Still, there are tools you can use even if you don't want to or can't support asynchronous tasks in your node-based language.

First, there are additional events. In YoYo (now Opera) Game Maker, for example, you can wind up one of 12 built-in timers, and each of these timers has an event you can fill in, which you essentially use like a callback.

Blueprints have event delegators: you can bind any subgraph to specific commands that run async commands, or just need a callback that would be neater if placed outside of the rest of the flow. Again, these are callbacks technically, but you tie them to specific nodes.

Either way, every async block must be marked as such. It can be an icon, or it can be a rule that node names are extremely transparent about it: "Wait for X", "Do this and wait". This is important as it highlights which data will become stale or even invalid: a simple example is trying to heal a mob after a timer passes, during which the mob has died. And while it may be trivial to you, it never is for beginners, and I can say that with confidence as an IT teacher.
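The stale-data problem from the mob example can be reproduced with a tiny fake timer. The `Timers` class and the `Mob` shape are hypothetical stand-ins for whatever your engine provides.

```typescript
interface Mob { hp: number; alive: boolean; }

// A tiny stand-in for engine timers: callbacks run when `tick` reaches their time.
class Timers {
    private queue: { at: number; callback: () => void }[] = [];
    private now = 0;
    wait(delay: number, callback: () => void): void {
        this.queue.push({ at: this.now + delay, callback });
    }
    tick(elapsed: number): void {
        this.now += elapsed;
        const due = this.queue.filter(timer => timer.at <= this.now);
        this.queue = this.queue.filter(timer => timer.at > this.now);
        due.forEach(timer => timer.callback());
    }
}

const timers = new Timers();
const mob: Mob = { hp: 10, alive: true };

// "Wait 3 seconds, then heal": the mob may have changed by the time this runs.
timers.wait(3, () => {
    if (!mob.alive) return; // Without this check, a dead mob would get healed
    mob.hp += 5;
});

mob.hp = 0;
mob.alive = false; // The mob dies while the timer is still pending
timers.tick(3);
// mob.hp is still 0
```

A runtime could insert such validity checks automatically after every async block, which is one more reason to mark those blocks explicitly in their declarations.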

The functionality

How sophisticated your language is depends only on your whim, honestly. Even most automation solutions like Make or n8n make do with just simple connections between node bubbles, with edges only signifying the execution flow. They balance it out with complex node editors, and values from other blocks are used in templates, much like how you would write a Nunjucks-templated HTML page. A more gamedev-related example is Twine.

But most of the time, you will need at least two edge types: for execution flow and for data. And for nodes themselves, you could have various "classes", too.

Here is a more or less extensive list of what you can have in your node representation:

Block types:
- Entry points: events, triggers, function entries or even descriptors with arguments, or just the entry point as used in Twine.
- Regular "command" blocks that fit in execution flow.
- "Computed" blocks that are not included in the execution flow but data of which is used.
- Blocks that are both commands put in execution flow and export data.
- Explicit type conversion or casting blocks.
- Helpers to make connections neater. (Aka "Connectors".)
Edges:
- Execution edges! Oh wow! These define the order of block execution.
- Data edges. Transfer of variables and returned data.
- Data converting edges.
- Other links like event delegator bindings.

Now, nodes themselves can have various features:

They can all have just one output pin, or they can have multiple;
They can have output/input pins defined by a user. (For example, a switch block that allows you to specify multiple cases.)
They most likely will have input fields in themselves, for specifying constant values. Textboxes, input fields, dropdowns, or more complex input widgets like color or 2D vector pickers.
If a node has input pins for data, these pins should probably show an input for a constant value if they are not connected, so you don't need two blocks for a "varA * 50" command.

Also,

Do you need a "wildcard" data type? (Like any in TypeScript.) You probably do.
Are there additional affinities where the language accepts several data types besides the main one? Like accepting Integer in a Float slot.
How will you handle complex data structures and arrays? Will a user make them and prepopulate them in the node editor?
You also need to display your node library. Maybe toolbar buttons. How will the scripting panel look as a whole? Does a user get enough screen space for the nodes canvas?
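The questions about wildcard types and affinities boil down to a single connection check. Here is one possible answer as a sketch; the `PinType` names and the `affinities` table are hypothetical.

```typescript
type PinType = 'integer' | 'float' | 'string' | 'boolean' | 'wildcard';

// Which extra source types each target type accepts besides itself.
const affinities: Partial<Record<PinType, PinType[]>> = {
    float: ['integer'], // An Integer output fits into a Float slot, but not vice versa
};

// Decides whether an output of type `source` may connect to an input of type `target`.
function canConnect(source: PinType, target: PinType): boolean {
    if (source === target) return true;
    if (source === 'wildcard' || target === 'wildcard') return true;
    return (affinities[target] ?? []).indexOf(source) !== -1;
}

const intToFloat = canConnect('integer', 'float');      // true
const floatToInt = canConnect('float', 'integer');      // false
const anyToString = canConnect('wildcard', 'string');   // true
const boolToString = canConnect('boolean', 'string');   // false
```

Keeping the rules in a data table rather than in code means the editor, the validator, and the runtime can all share one source of truth.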

Rich tools are great, but… you also need to code them. Still, planning what you will potentially need will make life easier when the time comes to expand the functionality. You can keep things simple, as long as you don't hard-code them and leave room for incremental growth.

The nodes

You will need a library of nodes; you will need to write their declarations of what will be shown on a screen. Most of the time, you will have two options:

You can design nodes directly by using your engine's or framework's tools. For example, with SvelteFlow, I could describe each node as a separate Svelte component and use tags like <Handle> directly. In Unreal Engine, to… make Blueprints in Blueprints, I could subclass a Widget and add other components I would premake in a WYSIWYG widget editor.
But you could instead describe nodes in a more abstract way, as plain data structures, and then make one or several master-components that can render any node given its declaration.

With Whimsy, I'm writing node declarations as TypeScript files with mostly JSON-like structures apart from i18n requests. This gives several advantages:

Making these nodes is just writing text, which is accessible for AI tools. Even with sensible defaults, declarations have tons of boilerplate code, and there are many similar blocks—AI tools can save your time significantly here.
You get TypeScript checks as you type, and depending on how strict you are with your types, you can get a pretty good validation before you even apply your changes.
With node declarations as objects in arrays of known structures, it is pretty easy to write custom linters that consider both the declarations and the runtime implementation at the same time, so you don't misuse configuration options or make a typo in a data input name while writing code for the runtime.
As my declarations are separated from their representation, I remain flexible on what I actually use for nodes. Though I picked SvelteFlow and am happy with it, you should consider such risks as migrations to a different engine/framework (or updating to a newer version), end of support for the node library, or even license termination.
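To make the declaration-plus-linter idea concrete, here is a minimal sketch. The `NodeDeclaration` shape, the `runtimeInputs` map, and `lintLibrary` are all hypothetical and much simpler than what a real engine would need; they are not Whimsy's actual structures.

```typescript
// Hypothetical declaration shape; real declarations carry more (widgets, i18n keys, and so on).
interface PinDeclaration { key: string; type: 'number' | 'string' | 'boolean'; }
interface NodeDeclaration {
    id: string;
    name: string;
    inputs: PinDeclaration[];
    outputs: PinDeclaration[];
}

const library: NodeDeclaration[] = [{
    id: 'math.multiply',
    name: 'Multiply',
    inputs: [{ key: 'a', type: 'number' }, { key: 'b', type: 'number' }],
    outputs: [{ key: 'result', type: 'number' }],
}, {
    id: 'text.length',
    name: 'Text length',
    inputs: [{ key: 'text', type: 'string' }],
    outputs: [{ key: 'length', type: 'number' }],
}];

// Input names that each runtime implementation reads, keyed by node ID.
// A real linter could extract these from the runtime source instead.
const runtimeInputs: Record<string, string[]> = {
    'math.multiply': ['a', 'b'],
    'text.length': ['txt'], // Deliberate typo: the declaration calls this input "text"
};

// Cross-checks declarations against the runtime and reports mismatches.
function lintLibrary(declarations: NodeDeclaration[], used: Record<string, string[]>): string[] {
    const problems: string[] = [];
    for (const declaration of declarations) {
        const declared = declaration.inputs.map(pin => pin.key);
        const read = used[declaration.id];
        if (!read) {
            problems.push(`${declaration.id}: no runtime implementation`);
            continue;
        }
        for (const key of read) {
            if (declared.indexOf(key) === -1) {
                problems.push(`${declaration.id}: runtime reads unknown input "${key}"`);
            }
        }
    }
    return problems;
}

const problems = lintLibrary(library, runtimeInputs);
// One problem is reported, for the "txt" typo in text.length
```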

The downside is that my master-component is quite large and thus harder to maintain. Still, having one master-component for every node ensures consistency in styling and behavior.

By making nodes in your framework or engine directly, you can still keep everything declarative, and you can make simpler solutions for specific features and not worry about systems that would support all. But this is vendor-locking, and also harder to lint.

The user experience

The difference between a horrible and a great visual programming language is often determined by the presence of a searchbox. Yep. Searchboxes are a must.

There are also various handy things you could implement:

UI translations for the blocks;
Disconnecting from nodes in one click;
Node search when dragging from an input to an empty void;
Automatic connection when you place one block near another;
Automated addition of data type converters when you, say, connect a number to a string input;
Automatic linkage of blocks when you delete a middle block in a sequence, so when you remove "B" from "A->B->C" chain it becomes "A->C";
Drag-n-drop palette and click-to-add;
Hotkeys that add blocks when pressed;
Undo/redo states;
Copy+Paste;
Duplicate a node with right-click;
Dark/light theme;
Block groups, collapsible subgraphs, declaring reusable functions;
Annotations (comments for nodes and floating sticky notes);
Easy access for documentation and quick tips on hover;
Color-coding blocks by a user;
Panels and toolbars you can show/hide and tweak their size;
And probably much more!
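Some of these helpers are only a few lines of graph manipulation. For example, the automatic re-linking of an "A->B->C" chain when "B" is deleted can be sketched as follows; the `ExecEdge` shape and `removeAndRelink` are hypothetical names.

```typescript
interface ExecEdge { from: string; to: string; }

// Removes a node from the execution edges and bridges its neighbours,
// but only when the removed node had exactly one "next" node to bridge to.
function removeAndRelink(edges: ExecEdge[], removed: string): ExecEdge[] {
    // Self-loops on the removed node are ignored when looking for neighbours
    const incoming = edges.filter(edge => edge.to === removed && edge.from !== removed);
    const outgoing = edges.filter(edge => edge.from === removed && edge.to !== removed);
    const kept = edges.filter(edge => edge.from !== removed && edge.to !== removed);
    if (outgoing.length !== 1) return kept; // Nothing, or too many options, to bridge to
    const target = outgoing[0].to;
    const bridges = incoming.map(edge => ({ from: edge.from, to: target }));
    return kept.concat(bridges);
}

const chain: ExecEdge[] = [
    { from: 'A', to: 'B' },
    { from: 'B', to: 'C' },
];
const relinked = removeAndRelink(chain, 'B');
// relinked is [{ from: 'A', to: 'C' }]
```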

Did I implement every feature from this list for Whimsy? Hell no. But every one I did implement made writing scripts more intuitive, faster, and more enjoyable. Initially Whimsy only had a sidebar with a block library from which you could drag-n-drop blocks onto a canvas. With the next update, you may never need to touch it thanks to a new context-aware search window. This makes showing the library optional, so users will also get a toggle to hide it, which saves a lot of screen space on touch devices.

The user data and variable inputs

Unless you're making the most basic programming language (which was the case for the first Whimsy release), your language will have user-provided data and variables. While most user-provided data can be entered through widgets inside the node that write to the node's data dictionary, variables are more complicated than that.
There are two major complications (in your codebase and cognitive space) that arise when variables are added:

Variables are usually of different types;
Most nodes that have data inputs, when these inputs are not connected, should show a widget for a constant input so a user can input a static value.

There are workarounds for that:

The well-known Scratch handles variables with one universal data type that accepts both strings and numbers. Besides this data type, there is also a boolean type that is incompatible with other values. You can go a similar way by converting the input data to the needed type when a node executes, but this bloats the code. You could instead define a hidden expected type in the nodes' declarations and delegate the data conversion to your compiler or interpreter. Or you could make conversions explicit, as nodes or special edges.
Instead of constant inputs for empty pins, you can make separate nodes whose task is outputting individual constant values. This is how it was done in FilterJS, and it is also done in Unreal Engine's material editor, even though its existing Blueprints node system had support for constant inputs.
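Delegating conversion to the interpreter can be as small as one function that runs on every input before a node executes. This sketch assumes three hypothetical pin types and a lenient policy where unparseable numbers become 0; your language may prefer to report an error instead.

```typescript
type ExpectedType = 'number' | 'string' | 'boolean';
type Value = number | string | boolean;

// Converts a loosely typed value to the type a pin declares as expected.
// Lenient by design: values that can't be parsed as numbers become 0.
function coerce(value: Value, expected: ExpectedType): Value {
    if (expected === 'number') {
        const parsed = Number(value);
        return isNaN(parsed) ? 0 : parsed;
    }
    if (expected === 'string') {
        return String(value);
    }
    return Boolean(value);
}

const parsedNumber = coerce('42', 'number');  // 42
const fallback = coerce('abc', 'number');     // 0
const asText = coerce(7, 'string');           // "7"
const emptyIsFalse = coerce('', 'boolean');   // false
```

The node implementations then never convert anything themselves: the interpreter calls `coerce` with the expected type from the declaration and passes the result in.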

But okay, if you're going full-throttle, if you're making your own Blueprints™, there is a nuance that is easy to overlook and that took me several iterations to get working perfectly and neatly in my codebase: you will need blocks whose inputs and outputs are of a variable type.

These blocks range from the very basic blocks your language has to more specialized or QoL features:

When you write to a variable, the input type must match the type of this variable. Same goes with reading a variable: the output matches the variable's type.
Say you have a selector block similar to a ternary operator: if a condition is true, output input A; otherwise, output input B. This is a very handy block that doesn't need to be in the execution flow as it's pure data, and can simplify users' scripts.
Some blocks are nearly identical and would cause less clutter if there was just one block for every applicable data type: you can sum, multiply and subtract numbers, but also vectors. What if you also distinguish between integers and floats?

This means that the node can change its widgets, inputs and outputs depending on each other. How do you describe that?

Initially I went the extra-explicit way where variable widgets could add an input and/or an output handle, and these handles would change their type to match the variable's one. But this meant that I defined input and output handles in two places, which complicated the code in markup, node editing logic, exporter, and linters. Too much.

The way I settled on is simple, which is great:

The type of a pin can be not just a constant string ('number', 'boolean', 'string'), but a function that receives current node data (written values) and returns a computed type. As you as a node designer always know what is stored in a node, you can derive that data type based on which variable is picked, or which data type a user has selected.

I have a utility function that simplifies this in markup and is used both for i18n and pin types: if you pass it a function, that function will be called with the given rest parameters; otherwise the value will be returned as is. It is also strictly typed, checking the provided function's arguments against the possible type of T. The passed value must be either a non-function value of type T or a function that returns T.

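// Resolves a value that may be given either directly or as a function that computes it.
const uncompute = <T, Rest extends any[]>(value: ((...rest: Rest) => T) | T, ...rest: Rest): T => {
    if (value instanceof Function) {
        // Computed value: call it with the provided arguments
        return value.apply(value, rest);
    }
    // Plain value: return it as is
    return value;
};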

And then I use it in the form of uncompute(node.name), uncompute(handle.type, data). Thus my markup stays clean, and linters stay concise. And you can make it more complex and aware of connected nodes by passing more data into this type-getter function.
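Here is how a computed pin type might look next to a constant one. The `uncompute` function is repeated from above so that the sketch is self-contained; the `Handle` and `NodeData` shapes, the `variableTypes` table, and the 'void' fallback are hypothetical illustrations.

```typescript
// The utility from above, repeated so that this sketch runs on its own.
const uncompute = <T, Rest extends any[]>(value: ((...rest: Rest) => T) | T, ...rest: Rest): T => {
    if (value instanceof Function) {
        return value.apply(value, rest);
    }
    return value;
};

// Hypothetical shapes for illustration.
type PinType = 'number' | 'boolean' | 'string' | 'void';
interface NodeData { variable?: string; }
interface Handle {
    key: string;
    type: PinType | ((data: NodeData) => PinType); // Constant or computed
}

// Types of the variables declared in a project.
const variableTypes: Record<string, PinType> = { score: 'number', playerName: 'string' };

const staticHandle: Handle = { key: 'condition', type: 'boolean' };
const dynamicHandle: Handle = {
    key: 'value',
    // The pin takes the type of the picked variable, or 'void' if none is picked
    type: data => (data.variable ? (variableTypes[data.variable] ?? 'void') : 'void'),
};

const constantType = uncompute(staticHandle.type, { variable: 'score' });  // 'boolean'
const scoreType = uncompute(dynamicHandle.type, { variable: 'score' });    // 'number'
const unsetType = uncompute(dynamicHandle.type, {});                       // 'void'
```

The calling code doesn't need to know which kind of declaration it got, which is what keeps the markup and the linters free of branching.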

The error handling

Implementation of error reporting is usually heavily dependent on your app, but here are some hints from my combined experience with Whimsy, Catnip, and FilterJS:

Prevent errors early in the graph itself. Forbid invalid connections. Always clean up references to now deleted entities. Mark unfilled pins and empty constant inputs. When saving/running a project, point the user to these graph layout errors. The fewer errors a user makes before running a graph, the less debugging is needed in the first place.
Run sanity checks before executing the user's code. Check if any input fields don't have default values and are still empty, and route users to these nodes.
Listen for errors in the whole resulting code. If you went with the interpretation route, you should try/catch every node your interpreter walks over, in your main walker function.
Keep all the identifying information about nodes when running code: what executes the code, in which event or trigger, which block ID is being run right now. When an error is thrown, use this information to route a user to the exact problematic place. You can clean this out for production, but the identifiers will be of great help during development.
Make errors human. Default error messages can be shown, too, but remember that you can subclass errors and/or affix additional data to them (depending on your language). Provide sensible messages for each error kind with possible solutions—they are very important for beginners. And show ye old stack trace for advanced users.
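The try/catch, identifying information, and error subclass hints combine naturally. The following is a sketch under my own assumptions: `NodeError`, `Step`, and `runEvent` are hypothetical names, and a real walker would follow graph edges instead of iterating over a flat list.

```typescript
// An error subclass that carries the information needed to route a user to the problem.
class NodeError extends Error {
    constructor(message: string, public nodeId: string, public eventName: string, public original: unknown) {
        super(message);
        this.name = 'NodeError';
        // Keeps `instanceof NodeError` working when compiled to older JS targets
        Object.setPrototypeOf(this, NodeError.prototype);
    }
}

interface Step { id: string; run: () => void; }

// Runs the steps of one event, wrapping any failure with the block ID and event name.
function runEvent(eventName: string, steps: Step[]): void {
    for (const step of steps) {
        try {
            step.run();
        } catch (error) {
            const reason = error instanceof Error ? error.message : String(error);
            throw new NodeError(
                `Block "${step.id}" failed in event "${eventName}": ${reason}`,
                step.id,
                eventName,
                error // The original error and its stack trace stay available for advanced users
            );
        }
    }
}

let caught: unknown;
try {
    runEvent('onStart', [
        { id: 'block-1', run: () => {} },
        { id: 'block-2', run: () => { throw new Error('Variable is not set'); } },
    ]);
} catch (error) {
    caught = error;
}
// `caught` is a NodeError with nodeId "block-2" and eventName "onStart"
```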

It is usually easy to do all of that if you're making, say, a game engine, as the game loop itself has the information about what currently executes and in which entity. Putting try/catch guards inside it is smart, as it lets you catch the error, copy and enrich the existing error information, and let it bubble up so it is caught by your IDE or custom error handler, depending on whether your engine architecture allows it and whether the game runs inside an editor. In use cases that don't have this convenience, you may need to plan your exporter or runtime ahead so that it does provide the information.

In Whimsy, there is a pretty aggressive scenario cleanup function that executes on all scenarios when a project gets updated to a newer version, and when a user removes a variable. Remember: my nodes can change pin types based on project and node data, so there's no simple way to ensure everything stays correct unless you recheck the whole graph. This function preserves only valid edges and nodes; invalid nodes may appear for users in the future in case I remove or change nodes in the library. When assets are removed, the scenarios are changed to reset the relevant asset inputs to the default "none" value. And when a node changes its inputs or outputs based on a selected variable, but doesn't have one selected, its pins get a special "void" type that doesn't connect to anything useful and is also easy to spot.
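The edge part of such a cleanup pass can be sketched as a filter over recomputed pin types. This is not Whimsy's actual function: the `DataEdge` shape, the `"node.pin"` keys, and the strict type equality rule are assumptions made for the example.

```typescript
interface PinRef { node: string; pin: string; }
interface DataEdge { from: PinRef; to: PinRef; }

// Keeps only the edges whose pins still exist and still have matching, non-void types.
// `pinTypes` maps "node.pin" to the pin's current type; a missing key means
// that the node or the pin no longer exists.
function cleanupEdges(edges: DataEdge[], pinTypes: Record<string, string>): DataEdge[] {
    return edges.filter(edge => {
        const sourceType = pinTypes[`${edge.from.node}.${edge.from.pin}`];
        const targetType = pinTypes[`${edge.to.node}.${edge.to.pin}`];
        if (sourceType === undefined || targetType === undefined) return false;
        if (sourceType === 'void' || targetType === 'void') return false;
        return sourceType === targetType;
    });
}

const edges: DataEdge[] = [
    { from: { node: 'getScore', pin: 'value' }, to: { node: 'add', pin: 'a' } },
    { from: { node: 'getName', pin: 'value' }, to: { node: 'add', pin: 'b' } },
    { from: { node: 'deleted', pin: 'value' }, to: { node: 'print', pin: 'text' } },
];
// Pin types recomputed after a user removed the variable that "getName" pointed to
const pinTypes: Record<string, string> = {
    'getScore.value': 'number',
    'getName.value': 'void',
    'add.a': 'number',
    'add.b': 'number',
    'print.text': 'string',
};
const validEdges = cleanupEdges(edges, pinTypes);
// Only the getScore -> add.a edge survives
```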

Here is how Whimsy reacts when an error does slip through, for example, when a user hasn't set a required variable in a block. A custom message is shown with a "Go to problem" button, which opens the affected actor and focuses on the problematic node. To not overwhelm the users, only minimal information is shown initially in the error message, with the stack trace and IDs hidden under a <details> tag.

The innovations ✨

Your language can be a copycat, and I can't blame that, really; there are enough examples of great node-based programming languages, and it would be foolish to ignore them. Still, maybe there is room for improvement? Can you think of something new and potentially cool?

With Whimsy, I tried to make node programming more 2D. In most editors, though you can put nodes anywhere, they all form a horizontal or vertical tree: either inputs are on the left and outputs are on the right, or they are on the top and bottom sides. Blender, Blueprints, ComfyUI all follow this pattern. With Whimsy, I tried to make an interface where execution flow goes from left to right, but data flows on a perpendicular axis, coming from below and connecting to nodes' bottom edges. Though it sounds cool, there is a subtle complication with it: if an input pin is not connected, it usually shows a field for a constant value instead. With a horizontal layout, the input fields can be shown directly next to the input pins and be paired together, which is very intuitive and also easier to implement. With a mixed layout, pins get separated from their fields, and if you have pins A, B, and C, and B and C are not connected, you need to pay attention to which is which so as not to mix them up with A. There can be visual helpers, but they won't be as effective as a closely knit, semantically logical horizontal layout.

Still, the "2D graph" approach improves on the visual aspect of node-based programming. It improves the readability of a graph because the two-axis layout shows where data flows at a glance, and data flows are also separated from commands in a way. To make it clearer which pin corresponds to which constant input field, I color-code them both based on their type and add labels.

The planning phase

It is wise to make a sketch in Figma or Lunacy of how your programming language will look, or at least draw it on paper. The sizing, the colors, how elements inside a node are arranged. You can think of typical or more exotic scenarios of what will be written in it, or you can show how some code you recently wrote would have looked in a node-based layout. This alone will show if your ideas and planned functionality are enough and will work together, and mockups are always easier to change than the real code.

Think of data structures: how the nodes are described in the node library, how they are stored with all the edge information. How all this gets combined in a compiler/interpreter. Keep things simple but flexible.

I've mentioned that I had to rewrite my approach to variables and dynamic types; looking back, I would also unite execution and data pins into two arrays (input and output pins), and also remove automagic with input/output exec pins—all this, right now, makes validators more complicated than needed. (if (!declaration.autoFlow || declaration.autoFlow === 'both' || declaration.autoFlow === 'first') is what is needed to determine whether there is a regular output pin for the execution thread. Dreadful.)

I also happened to make nodes' pin names include their type to distinguish the types and enforce connection rules. The name is templated as ${type}-${key}, and by splitting the string at the hyphen into two elements you can derive both back. This happened somewhere during the first attempts to make dynamic pin types. This is obviously neither error-proof nor performant, and maybeee I'll rewrite it to read the type from a declaration. For some reason I skipped the most obvious and sane solution initially. It would work very well with unified input/output pins, too.
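To illustrate why this naming scheme is fragile, here is a sketch of a slightly safer parser next to the naive split. The `parsePinName` helper and the example pin name are hypothetical, and the safer version still assumes that type names never contain a hyphen.

```typescript
// Splits a `${type}-${key}` pin name at the first hyphen only,
// so that keys may contain hyphens. Type names still may not.
function parsePinName(name: string): { type: string; key: string } {
    const separator = name.indexOf('-');
    if (separator === -1) throw new Error(`Pin name "${name}" has no type prefix`);
    return { type: name.slice(0, separator), key: name.slice(separator + 1) };
}

// A plain split breaks as soon as a key contains a hyphen:
const naive = 'number-max-speed'.split('-');   // ['number', 'max', 'speed']
const parsed = parsePinName('number-max-speed'); // { type: 'number', key: 'max-speed' }
```

Reading the type from the node's declaration, as suggested above, removes the need for parsing altogether.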

The tools, and the code

Once you've planned out what you want to make and what features you're gonna include, you have at least subconscious criteria to decide which tools and/or frameworks to use. For me, it was SvelteFlow: it has tons of practical examples of QoL features besides the basic functionality, and most of this QoL stuff went beyond a plain framework reference: the examples were written in userland code, which allowed me to add them quickly and tailor them for my engine.

You can also go more low-level. For example, in FilterJS, I made my own node framework, and it wasn't a terrible experience, if I still remember anything about that 2019 project. I used Riot.js v3 components for the nodes and CanvasRenderingContext2D for rendering edges behind them.

Conclusion

Once you know how a thing is done, what's left is allocating time to do it. Obviously programming all this is outside of this post's scope, but I hope that with it you will have a better understanding of how node-based languages work and what the pitfalls and solutions are. And maybe you will actually make your own node-based language and impress me and others with it. Have fun and happy coding to you!
