Background
I am working on a project to display and manipulate 3D puzzle cubes (e.g. Rubik's Cubes).
As the next step in the project, I want to be able to parse algorithm notation and then execute those actions. I currently have this supported, but I've been playing around with a way to make it more type-safe.
Ohm provides typings for TypeScript, but in a handful of places just uses any or generic Record<string, any> types. This isn't bad per-se, but I wouldn't mind a little bit of extra type-safety. This is most noticable when using Semantic Actions, which are Ohm's way of translating a parse tree into actions via visitor pattern.
There are three layers to this process (in my implementation at least):
- The parts of my code where I call
semantics(matchResult).<action>(). The semantics adapter object that is returned bysemantics(matchResult)is basically just aRecord<string, () => any>, so any consumers of this section of code just have to know (a) what semantic actions are available and (b) what their return types should be - The parts of my code where I create the
Action<T>objects to define the semantics for a given action. Each rule + label combination gets a single function, that basically has the signature(...subNodes: ohm.Node[]) => any, where eachohm.Nodehas access to the sameRecord<string, () => any>object that is on the adapter. - The middle layer between these two where I actually construct a semantics object that uses the actions defined in #2, that will then be used by #1.
Goals
My goal of the rewrite was to satisfy the following, in decreasing order of importance:
- Consumers of the semantic actions (i.e. anything calling
adapter.<action>()) should be prevented from invoking actions that don't exist, or mis-typing the return types of those actions. - Consumers of the semantic actions should not have to use excessively complex generic types to safely use the adapters; they should just be able to use something like
MySemanticAdapter<T>orMySpecificSemanticAdapter - Implementers of semantic actions should be able to write each individual
${rule}_${label}action in a way that has a unique set of strongly-typed inputs per node, and such that each node can have a distinct return type when callingnode.<action>() - Implementers of semantic actions will need to use generic types, but as much of the cruft as possible should be abstracted from them.
Approach
semantics.ts
import { MatchResult, Node as OhmParseNode, Semantics } from "ohm-js";
/**
* Type-safe semantics object that can take actions on a parse result
* @template TOperations Mapping between operation names and their return types
* @template TAttributes Mapping between attribute names and their value types
*/
export type _SemanticExecutor<
TOperations extends Record<string, unknown>,
TAttributes extends Record<string, unknown>,
> = Semantics & {
(result: MatchResult): SemanticAdapter<TOperations, TAttributes>;
};
type SemanticAdapter<
TOperations extends Record<string, unknown>,
TAttributes extends Record<string, unknown>,
> = {
[TOperationKey in keyof TOperations]: () => TOperations[TOperationKey];
} & {
[TAttributeKey in keyof TAttributes]: TAttributes[TAttributeKey];
};
/**
* Type-safe semantic action parsing node that requires strongly-typed children
* and operations
* @template TCtorName The name of the parse node that applies here
* @template TOperationName The name of the operation
* @template TOperationReturnType The return type of the operation
* @template TChildrenType The child nodes that are expected to be had
*/
export type _SemanticParseNode<
TCtorName extends string,
TOperationName extends string,
TOperationReturnType,
TChildrenType extends OhmParseNode[],
> =
// this looks useless, but actually serves to make this distributive,
// i.e. a given _SemanticParseNode with a union TChildrenType will result in a
// union of possible result types. This means that _GetFrameworkExecuteCallback
// will yield a union type as well. This allows us to have multiple labels per
// node constructor, and have each satisfy a different callback signature
TChildrenType extends OhmParseNode[]
? OhmParseNode & {
ctorName: TCtorName;
children: TChildrenType;
} & {
[Key in TOperationName]: () => TOperationReturnType;
}
: never;
/**
* Assert that a node is the strongly-typed node we think it is
* @param node The node that should be strongly typed
* @param ctorName The name of the constructor of this node
*/
export function _assertNodeIsStronglyTyped<
TNode extends _SemanticParseNode<string, string, unknown, OhmParseNode[]>,
>(node: OhmParseNode, ctorName: TNode["ctorName"]): asserts node is TNode {
if (node.ctorName !== ctorName) {
throw new Error(`Exepcted a ${ctorName} node, but got a ${node.ctorName}`);
}
}
/**
* A terminal iteration node
* @template TChildren The type of node this iteration should have as children
*/
export type _Iter<TChildren extends OhmParseNode> = _SemanticParseNode<
"_iter",
string,
unknown,
TChildren[]
>;
/**
* A terminal node (i.e. one that is just a literal)
*/
export type _Terminal = OhmParseNode;
/**
* Get the appropriate callback type for this particular semantic action's ActionDict
* @template TSemanticParseNode The semantic action node type
*/
export type _FrameworkVisitorCallback<TSemanticParseNode> =
TSemanticParseNode extends _SemanticParseNode<
string,
string,
infer TReturnType,
infer TChildrenType
>
? TChildrenType extends OhmParseNode[]
? (...args: TChildrenType) => TReturnType
: never
: never;
For goals #1 and #2, the most important feature here is the _SemanticExecutor. I've chosen to name it with a leading underscore because I don't intend for this type to be directly exposed to callers, but rather than whoever uses it should probably provide a clean wrapper around this, i.e.
export type CubeExecutor = _SemanticExecutor<{ eval: CubeMove[] }, {}>;
For goals #3 and #4, the most important features are the _SemanticParseNode, _assertNodeIsStronglyTyped, and _FrameworkVisitorCallback.
_SemanticParseNode is what wraps the ohm.Node object in a type-safe fashion, _assertNodeIsStronglyTyped is there is make sure the grammar matches the actions, and _FrameworkVisitorCallback is used to make sure that the semantic actions we provide correctly match one of the signatures of the node. You may note that there are a few places we use extends to ensure that we get distributive conditional types. Without that, we would get this as our signature:
(...args: Node[] | [Node, Node] | [Node, Node, Node]) => number
When what we really wanted was this:
| (...args: Node[]) => number
| (...args: [Node, Node]) => number
| (...args: [Node, Node, Node]) => number
This allows me to write distinct actions for every ${rule}_${label} pair, which is the more idiomatic way in Ohm.
In order to test this I created a simplified grammar of cube moves; it supports one-or-more space-separated face rotations, ignoring cube size, internal slices, or whole-cube rotations.
semantics.spec.ts
const cubeGrammar = String.raw`
Algorithm {
steps = move (space+ move)* space*
move = face "'" -- antiClockwise
| face "2" -- doubleTurn
| face -- clockWise
face = "F" -- front
| "R" -- right
| "L" -- left
| "B" -- back
| "U" -- up
| "D" -- down
}
`;
type CubeExecutor = _SemanticExecutor<{ eval: CubeMove[] }, {}>;
enum CubeFace {
Front,
Right,
Left,
Back,
Top,
Bottom,
}
enum Rotation {
Clockwise,
HalfTurn,
AntiClockwise,
}
type CubeMove = {
face: CubeFace;
rotation: Rotation;
};
The implementation of the semantic action then looks like this:
semantics.spec.ts
const steps: _FrameworkVisitorCallback<StepsNode> = (
firstMove: MoveNode,
_spaces: _Iter<_Terminal>,
restMoves: _Iter<MoveNode>,
_trailSpaces: _Terminal,
) => {
_assertNodeIsStronglyTyped(firstMove, "move");
_assertNodeIsStronglyTyped(restMoves, "_iter");
return [
firstMove.eval(),
...restMoves.asIteration().children.map((child: ohm.Node | MoveNode) => {
_assertNodeIsStronglyTyped<MoveNode>(child, "move");
return child.eval();
}),
];
};
const move_antiClockwise: _FrameworkVisitorCallback<MoveNode> = (
face: FaceNode,
_antiClockwiseInd: _Terminal,
) => {
_assertNodeIsStronglyTyped(face, "face");
return { face: face.eval(), rotation: Rotation.AntiClockwise };
};
const move_doubleTurn: _FrameworkVisitorCallback<MoveNode> = (
face: FaceNode,
_doubleTurnInd: _Terminal,
) => {
_assertNodeIsStronglyTyped(face, "face");
return { face: face.eval(), rotation: Rotation.HalfTurn };
};
const move_clockWise: _FrameworkVisitorCallback<MoveNode> = (
face: FaceNode,
) => {
_assertNodeIsStronglyTyped(face, "face");
return { face: face.eval(), rotation: Rotation.Clockwise };
};
function getFaceAction(face: CubeFace): _FrameworkVisitorCallback<FaceNode> {
return (_face: _Terminal) => face;
}
type StepsNode = _SemanticParseNode<
"steps",
"eval",
CubeMove[],
[MoveNode, _Iter<_Terminal>, _Iter<MoveNode>, _Terminal]
>;
type MoveNode = _SemanticParseNode<
"move",
"eval",
CubeMove,
[FaceNode, _Terminal] | [FaceNode]
>;
type FaceNode = _SemanticParseNode<"face", "eval", CubeFace, [_Terminal]>;
Note that each rule has a strongly typed return value, and one or more expected possible children values, each of which corresponds to a label and callback. When implementing the callbacks for the action dictionary, each of those should satisfy _FrameworkVisitorCallback<T>.
Lastly, we get the actual semantics object like so. This is the part that is the ugliest - we're jamming any in a few spots to smooth over the more strongly typed actions and executor. I would expect to always have a function like this that is what actually gets exported (along with the semantics type) for consumption elsewhere.
semantics.spec.ts
function getCubeExecuteSemantics(parser: ohm.Grammar): CubeExecutor {
return parser.createSemantics().addOperation<any>("eval", {
steps,
move_antiClockwise,
move_doubleTurn,
move_clockWise,
face_front: getFaceAction(CubeFace.Front),
face_right: getFaceAction(CubeFace.Right),
face_left: getFaceAction(CubeFace.Left),
face_back: getFaceAction(CubeFace.Back),
face_up: getFaceAction(CubeFace.Top),
face_down: getFaceAction(CubeFace.Bottom),
} as any) as CubeExecutor;
}
I've also written unit tests both for functionality as well as type-safety (I'm using vitest and fast-check for testing):
semantics.spec.ts
import fc from "fast-check";
import * as ohm from "ohm-js";
import { assertType, describe, expect, expectTypeOf, it } from "vitest";
import {
_FrameworkVisitorCallback,
_Iter,
_SemanticExecutor,
_SemanticParseNode,
_Terminal,
_assertNodeIsStronglyTyped,
} from "./semantics";
describe("Semantics", () => {
const cubeParser = ohm.grammar(cubeGrammar);
const cubeSemantics = getCubeExecuteSemantics(cubeParser);
const match = cubeParser.match("F R' B2");
const adapter = cubeSemantics(match);
describe("type safety", () => {
it("should strongly type the executor", () => {
let callback = () => [
{ face: CubeFace.Front, rotation: Rotation.Clockwise } as CubeMove,
];
expect("eval" in adapter).toBeTruthy();
expectTypeOf(adapter.eval).toMatchTypeOf(callback);
assertType<() => CubeMove[]>(adapter.eval);
});
it("should strongly type actions", () => {
assertType<_FrameworkVisitorCallback<FaceNode>>(
(_face: _Terminal): CubeFace => {
return CubeFace.Front;
},
);
assertType<_FrameworkVisitorCallback<StepsNode>>(
// @ts-expect-error Types of property length are incompatible. Type `4` is not assignable to type `1`.
(_face: _Terminal): CubeFace => {
return CubeFace.Front;
},
);
});
it("should strongly type node callbacks", () => {
const a: MoveNode = {
eval: () => ({
face: CubeFace.Front,
rotation: Rotation.AntiClockwise,
}),
} as MoveNode;
assertType<MoveNode["eval"]>(() => ({
face: CubeFace.Front,
rotation: Rotation.AntiClockwise,
}));
// @ts-expect-error Type `CubeMove` is not assignable to type `number`
const b: number = a.eval();
});
});
it("should work", () => {
const result = cubeSemantics(cubeParser.match("F R' B2")).eval();
expect(result[0]).toStrictEqual({
face: CubeFace.Front,
rotation: Rotation.Clockwise,
});
expect(result[1]).toStrictEqual({
face: CubeFace.Right,
rotation: Rotation.AntiClockwise,
});
expect(result[2]).toStrictEqual({
face: CubeFace.Back,
rotation: Rotation.HalfTurn,
});
});
it("should pass PBT", () =>
fc.assert(
fc.property(
fc.array(
fc.tuple(
fc.constantFrom("F", "R", "B", "L", "U", "D"),
fc.constantFrom("", "'", "2"),
fc.integer({ min: 1, max: 10 }),
),
{ minLength: 1 },
),
steps => {
const toParse = steps
.map(
([face, rotation, spaces]) =>
`${face}${rotation}${" ".repeat(spaces)}`,
)
.join("");
const result = cubeParser.match(toParse);
expect(result.succeeded()).toBeTruthy();
const adapter = cubeSemantics(result);
const moves = adapter.eval();
expect(moves.length).toBe(steps.length);
for (const move of moves) {
expect("face" in move).toBeTruthy();
expect("rotation" in move).toBeTruthy();
}
},
),
));
});
Known Limitations
This works for my use-case (non-recursive rules). Unfortunately, it doesn't work for basic recursive examples (such as the Ohm Tutorial)
const mathGrammar = String.raw`
Math {
Exp = AddExp
AddExp = AddExp "+" MulExp -- plus
| AddExp "-" MulExp -- minus
| MulExp
MulExp = MulExp "*" number -- times
| MulExp "/" number -- div
| number
number = digit+
}
`;
type MathExecutor = _SemanticExecutor<{ eval: number }, {}>;
function getMathExecutor(parser: ohm.Grammar): MathExecutor {
return parser.createSemantics().addOperation<number>("eval", {
AddExp_plus: get_AddExp("+"),
AddExp_minus: get_AddExp("-"),
MulExp_times: get_MulExp("*"),
MulExp_div: get_MulExp("/"),
number: numberCB as ohm.Action<number>,
}) as MathExecutor;
}
function get_AddExp(
operator: "+" | "-",
): _FrameworkVisitorCallback<AddExpNode> {
switch (operator) {
case "+":
return (addExp: AddExpNode, _op: _Terminal, mulExp: MulExpNode) => {
const l = addExp.eval();
const r = mulExp.eval();
return l + r;
};
case "-":
return (addExp: AddExpNode, _op: _Terminal, mulExp: MulExpNode) => {
const l = addExp.eval();
const r = mulExp.eval();
return l - r;
};
}
}
function get_MulExp(
operator: "*" | "/",
): _FrameworkVisitorCallback<MulExpNode> {
switch (operator) {
case "*":
return (mulExp: MulExpNode, _op: _Terminal, number: NumberNode) => {
const l = mulExp.eval();
const r = number.eval();
return l * r;
};
case "/":
return (mulExp: MulExpNode, _op: _Terminal, number: NumberNode) => {
const l = mulExp.eval();
const r = number.eval();
return l / r;
};
}
}
const numberCB: _FrameworkVisitorCallback<NumberNode> = (
digits: _Iter<_Terminal>,
) => {
_assertNodeIsStronglyTyped<_Iter<_Terminal>>(digits, "_iter");
return Number.parseInt(digits.asIteration().children.join(""));
};
type ExpNode = _SemanticParseNode<"Exp", "eval", number, [AddExpNode]>;
type AddExpNode = _SemanticParseNode<
"AddExp",
"eval",
number,
[AddExpNode, _Terminal, MulExpNode] | [MulExpNode]
>;
type MulExpNode = _SemanticParseNode<
"MulExp",
"eval",
number,
[MulExpNode, _Terminal, NumberNode] | [NumberNode]
>;
type NumberNode = _SemanticParseNode<
"number",
"eval",
number,
[_Iter<_Terminal>]
>;
This gives type errors on AddExpNode and MulExpNode:
Error (TS4110) | Tuple type arguments circularly reference themselves.
This is fine for my use-case, but if a reviewer happens to find a way to handle these types of grammars that would be awesome.
Review Scope
This review will encompass my utility types and functions I wrote to support having a strongly typed set of semantic actions. It also includes some unit tests that use a simpler grammar than my full one for the purposes of testing my general semantic action wrappers.
This does not include the full grammar and semantic actions for the cube algorithms. I do not believe this counts as sample or incomplete code, because in this case the semantic action wrappers are complete code on their own. I'm not looking for a review of my full grammar and semantic actions at this time, although if you wanted to poke around the GitHub repo I suppose you're free to do so. If you disagree that this question is considered on-topic please leave a comment explaining why.
The known limitation above is a heads-up to reviewers. I do not consider my implementation to be incomplete or incorrect, because it is not a requirement that this work for recursive grammars. Please do not vote to close on "non-functional code" because of this limitation (or leave a comment explaining why you think this is non-functional).