Wrap Ohm.js to provide more type-safety when writing and using semantic adapters

Question

Background

I am working on a project to display and manipulate 3D puzzle cubes (e.g. Rubik's Cubes).

As the next step in the project, I want to be able to parse algorithm notation and then execute those actions. I currently have this supported, but I've been playing around with a way to make it more type-safe.

Ohm provides typings for TypeScript, but in a handful of places just uses any or generic Record<string, any> types. This isn't bad per-se, but I wouldn't mind a little bit of extra type-safety. This is most noticable when using Semantic Actions, which are Ohm's way of translating a parse tree into actions via visitor pattern.

There are three layers to this process (in my implementation at least):

The parts of my code where I call semantics(matchResult).<action>(). The semantics adapter object that is returned by semantics(matchResult) is basically just a Record<string, () => any>, so any consumers of this section of code just have to know (a) what semantic actions are available and (b) what their return types should be
The parts of my code where I create the Action<T> objects to define the semantics for a given action. Each rule + label combination gets a single function, that basically has the signature (...subNodes: ohm.Node[]) => any, where each ohm.Node has access to the same Record<string, () => any> object that is on the adapter.
The middle layer between these two where I actually construct a semantics object that uses the actions defined in #2, that will then be used by #1.

Goals

My goal of the rewrite was to satisfy the following, in decreasing order of importance:

Consumers of the semantic actions (i.e. anything calling adapter.<action>()) should be prevented from invoking actions that don't exist, or mis-typing the return types of those actions.
Consumers of the semantic actions should not have to use excessively complex generic types to safely use the adapters; they should just be able to use something like MySemanticAdapter<T> or MySpecificSemanticAdapter
Implementers of semantic actions should be able to write each individual ${rule}_${label} action in a way that has a unique set of strongly-typed inputs per node, and such that each node can have a distinct return type when calling node.<action>()
Implementers of semantic actions will need to use generic types, but as much of the cruft as possible should be abstracted from them.

Approach

semantics.ts

import { MatchResult, Node as OhmParseNode, Semantics } from "ohm-js";

/**
 * Type-safe semantics object that can take actions on a parse result
 * @template TOperations Mapping between operation names and their return types
 * @template TAttributes Mapping between attribute names and their value types
 */
export type _SemanticExecutor<
  TOperations extends Record<string, unknown>,
  TAttributes extends Record<string, unknown>,
> = Semantics & {
  (result: MatchResult): SemanticAdapter<TOperations, TAttributes>;
};

type SemanticAdapter<
  TOperations extends Record<string, unknown>,
  TAttributes extends Record<string, unknown>,
> = {
  [TOperationKey in keyof TOperations]: () => TOperations[TOperationKey];
} & {
  [TAttributeKey in keyof TAttributes]: TAttributes[TAttributeKey];
};

/**
 * Type-safe semantic action parsing node that requires strongly-typed children
 * and operations
 * @template TCtorName The name of the parse node that applies here
 * @template TOperationName The name of the operation
 * @template TOperationReturnType The return type of the operation
 * @template TChildrenType The child nodes that are expected to be had
 */
export type _SemanticParseNode<
  TCtorName extends string,
  TOperationName extends string,
  TOperationReturnType,
  TChildrenType extends OhmParseNode[],
> =
  // this looks useless, but actually serves to make this distributive,
  // i.e. a given _SemanticParseNode with a union TChildrenType will result in a
  // union of possible result types. This means that _GetFrameworkExecuteCallback
  // will yield a union type as well. This allows us to have multiple labels per
  // node constructor, and have each satisfy a different callback signature
  TChildrenType extends OhmParseNode[]
    ? OhmParseNode & {
        ctorName: TCtorName;
        children: TChildrenType;
      } & {
        [Key in TOperationName]: () => TOperationReturnType;
      }
    : never;

/**
 * Assert that a node is the strongly-typed node we think it is
 * @param node The node that should be strongly typed
 * @param ctorName The name of the constructor of this node
 */
export function _assertNodeIsStronglyTyped<
  TNode extends _SemanticParseNode<string, string, unknown, OhmParseNode[]>,
>(node: OhmParseNode, ctorName: TNode["ctorName"]): asserts node is TNode {
  if (node.ctorName !== ctorName) {
    throw new Error(`Exepcted a ${ctorName} node, but got a ${node.ctorName}`);
  }
}

/**
 * A terminal iteration node
 * @template TChildren The type of node this iteration should have as children
 */
export type _Iter<TChildren extends OhmParseNode> = _SemanticParseNode<
  "_iter",
  string,
  unknown,
  TChildren[]
>;

/**
 * A terminal node (i.e. one that is just a literal)
 */
export type _Terminal = OhmParseNode;

/**
 * Get the appropriate callback type for this particular semantic action's ActionDict
 * @template TSemanticParseNode The semantic action node type
 */
export type _FrameworkVisitorCallback<TSemanticParseNode> =
  TSemanticParseNode extends _SemanticParseNode<
    string,
    string,
    infer TReturnType,
    infer TChildrenType
  >
    ? TChildrenType extends OhmParseNode[]
      ? (...args: TChildrenType) => TReturnType
      : never
    : never;

For goals #1 and #2, the most important feature here is the _SemanticExecutor. I've chosen to name it with a leading underscore because I don't intend for this type to be directly exposed to callers, but rather than whoever uses it should probably provide a clean wrapper around this, i.e.

export type CubeExecutor = _SemanticExecutor<{ eval: CubeMove[] }, {}>;

For goals #3 and #4, the most important features are the _SemanticParseNode, _assertNodeIsStronglyTyped, and _FrameworkVisitorCallback.

_SemanticParseNode is what wraps the ohm.Node object in a type-safe fashion, _assertNodeIsStronglyTyped is there is make sure the grammar matches the actions, and _FrameworkVisitorCallback is used to make sure that the semantic actions we provide correctly match one of the signatures of the node. You may note that there are a few places we use extends to ensure that we get distributive conditional types. Without that, we would get this as our signature:

(...args: Node[] | [Node, Node] | [Node, Node, Node]) => number

When what we really wanted was this:

| (...args: Node[]) => number 
| (...args: [Node, Node]) => number 
| (...args: [Node, Node, Node]) => number

This allows me to write distinct actions for every ${rule}_${label} pair, which is the more idiomatic way in Ohm.

In order to test this I created a simplified grammar of cube moves; it supports one-or-more space-separated face rotations, ignoring cube size, internal slices, or whole-cube rotations.

semantics.spec.ts

const cubeGrammar = String.raw`
  Algorithm {
    steps = move (space+ move)* space*

    move = face "'" -- antiClockwise
         | face "2" -- doubleTurn
         | face     -- clockWise

    face = "F"    -- front
         | "R"    -- right
         | "L"    -- left
         | "B"    -- back
         | "U"    -- up
         | "D"    -- down
  }
`;

type CubeExecutor = _SemanticExecutor<{ eval: CubeMove[] }, {}>;

enum CubeFace {
  Front,
  Right,
  Left,
  Back,
  Top,
  Bottom,
}

enum Rotation {
  Clockwise,
  HalfTurn,
  AntiClockwise,
}

type CubeMove = {
  face: CubeFace;
  rotation: Rotation;
};

The implementation of the semantic action then looks like this:

semantics.spec.ts

const steps: _FrameworkVisitorCallback<StepsNode> = (
  firstMove: MoveNode,
  _spaces: _Iter<_Terminal>,
  restMoves: _Iter<MoveNode>,
  _trailSpaces: _Terminal,
) => {
  _assertNodeIsStronglyTyped(firstMove, "move");
  _assertNodeIsStronglyTyped(restMoves, "_iter");
  return [
    firstMove.eval(),
    ...restMoves.asIteration().children.map((child: ohm.Node | MoveNode) => {
      _assertNodeIsStronglyTyped<MoveNode>(child, "move");
      return child.eval();
    }),
  ];
};

const move_antiClockwise: _FrameworkVisitorCallback<MoveNode> = (
  face: FaceNode,
  _antiClockwiseInd: _Terminal,
) => {
  _assertNodeIsStronglyTyped(face, "face");
  return { face: face.eval(), rotation: Rotation.AntiClockwise };
};
const move_doubleTurn: _FrameworkVisitorCallback<MoveNode> = (
  face: FaceNode,
  _doubleTurnInd: _Terminal,
) => {
  _assertNodeIsStronglyTyped(face, "face");
  return { face: face.eval(), rotation: Rotation.HalfTurn };
};
const move_clockWise: _FrameworkVisitorCallback<MoveNode> = (
  face: FaceNode,
) => {
  _assertNodeIsStronglyTyped(face, "face");
  return { face: face.eval(), rotation: Rotation.Clockwise };
};

function getFaceAction(face: CubeFace): _FrameworkVisitorCallback<FaceNode> {
  return (_face: _Terminal) => face;
}

type StepsNode = _SemanticParseNode<
  "steps",
  "eval",
  CubeMove[],
  [MoveNode, _Iter<_Terminal>, _Iter<MoveNode>, _Terminal]
>;
type MoveNode = _SemanticParseNode<
  "move",
  "eval",
  CubeMove,
  [FaceNode, _Terminal] | [FaceNode]
>;
type FaceNode = _SemanticParseNode<"face", "eval", CubeFace, [_Terminal]>;

Note that each rule has a strongly typed return value, and one or more expected possible children values, each of which corresponds to a label and callback. When implementing the callbacks for the action dictionary, each of those should satisfy _FrameworkVisitorCallback<T>.

Lastly, we get the actual semantics object like so. This is the part that is the ugliest - we're jamming any in a few spots to smooth over the more strongly typed actions and executor. I would expect to always have a function like this that is what actually gets exported (along with the semantics type) for consumption elsewhere.

semantics.spec.ts

function getCubeExecuteSemantics(parser: ohm.Grammar): CubeExecutor {
  return parser.createSemantics().addOperation<any>("eval", {
    steps,
    move_antiClockwise,
    move_doubleTurn,
    move_clockWise,
    face_front: getFaceAction(CubeFace.Front),
    face_right: getFaceAction(CubeFace.Right),
    face_left: getFaceAction(CubeFace.Left),
    face_back: getFaceAction(CubeFace.Back),
    face_up: getFaceAction(CubeFace.Top),
    face_down: getFaceAction(CubeFace.Bottom),
  } as any) as CubeExecutor;
}

I've also written unit tests both for functionality as well as type-safety (I'm using vitest and fast-check for testing):

semantics.spec.ts

import fc from "fast-check";
import * as ohm from "ohm-js";
import { assertType, describe, expect, expectTypeOf, it } from "vitest";
import {
  _FrameworkVisitorCallback,
  _Iter,
  _SemanticExecutor,
  _SemanticParseNode,
  _Terminal,
  _assertNodeIsStronglyTyped,
} from "./semantics";

describe("Semantics", () => {
  const cubeParser = ohm.grammar(cubeGrammar);
  const cubeSemantics = getCubeExecuteSemantics(cubeParser);
  const match = cubeParser.match("F R' B2");
  const adapter = cubeSemantics(match);

  describe("type safety", () => {
    it("should strongly type the executor", () => {
      let callback = () => [
        { face: CubeFace.Front, rotation: Rotation.Clockwise } as CubeMove,
      ];
      expect("eval" in adapter).toBeTruthy();
      expectTypeOf(adapter.eval).toMatchTypeOf(callback);
      assertType<() => CubeMove[]>(adapter.eval);
    });
    it("should strongly type actions", () => {
      assertType<_FrameworkVisitorCallback<FaceNode>>(
        (_face: _Terminal): CubeFace => {
          return CubeFace.Front;
        },
      );

      assertType<_FrameworkVisitorCallback<StepsNode>>(
        // @ts-expect-error Types of property length are incompatible. Type `4` is not assignable to type `1`.
        (_face: _Terminal): CubeFace => {
          return CubeFace.Front;
        },
      );
    });
    it("should strongly type node callbacks", () => {
      const a: MoveNode = {
        eval: () => ({
          face: CubeFace.Front,
          rotation: Rotation.AntiClockwise,
        }),
      } as MoveNode;
      assertType<MoveNode["eval"]>(() => ({
        face: CubeFace.Front,
        rotation: Rotation.AntiClockwise,
      }));

      // @ts-expect-error Type `CubeMove` is not assignable to type `number`
      const b: number = a.eval();
    });
  });

  it("should work", () => {
    const result = cubeSemantics(cubeParser.match("F R' B2")).eval();
    expect(result[0]).toStrictEqual({
      face: CubeFace.Front,
      rotation: Rotation.Clockwise,
    });
    expect(result[1]).toStrictEqual({
      face: CubeFace.Right,
      rotation: Rotation.AntiClockwise,
    });
    expect(result[2]).toStrictEqual({
      face: CubeFace.Back,
      rotation: Rotation.HalfTurn,
    });
  });

  it("should pass PBT", () =>
    fc.assert(
      fc.property(
        fc.array(
          fc.tuple(
            fc.constantFrom("F", "R", "B", "L", "U", "D"),
            fc.constantFrom("", "'", "2"),
            fc.integer({ min: 1, max: 10 }),
          ),
          { minLength: 1 },
        ),
        steps => {
          const toParse = steps
            .map(
              ([face, rotation, spaces]) =>
                `${face}${rotation}${" ".repeat(spaces)}`,
            )
            .join("");

          const result = cubeParser.match(toParse);
          expect(result.succeeded()).toBeTruthy();
          const adapter = cubeSemantics(result);
          const moves = adapter.eval();
          expect(moves.length).toBe(steps.length);
          for (const move of moves) {
            expect("face" in move).toBeTruthy();
            expect("rotation" in move).toBeTruthy();
          }
        },
      ),
    ));
});

Known Limitations

This works for my use-case (non-recursive rules). Unfortunately, it doesn't work for basic recursive examples (such as the Ohm Tutorial)


const mathGrammar = String.raw`
  Math {
    Exp = AddExp

    AddExp = AddExp "+" MulExp  -- plus
           | AddExp "-" MulExp  -- minus
           | MulExp

    MulExp = MulExp "*" number  -- times
           | MulExp "/" number  -- div
           | number

    number = digit+
  }
`;

type MathExecutor = _SemanticExecutor<{ eval: number }, {}>;

function getMathExecutor(parser: ohm.Grammar): MathExecutor {
  return parser.createSemantics().addOperation<number>("eval", {
    AddExp_plus: get_AddExp("+"),
    AddExp_minus: get_AddExp("-"),
    MulExp_times: get_MulExp("*"),
    MulExp_div: get_MulExp("/"),
    number: numberCB as ohm.Action<number>,
  }) as MathExecutor;
}

function get_AddExp(
  operator: "+" | "-",
): _FrameworkVisitorCallback<AddExpNode> {
  switch (operator) {
    case "+":
      return (addExp: AddExpNode, _op: _Terminal, mulExp: MulExpNode) => {
        const l = addExp.eval();
        const r = mulExp.eval();
        return l + r;
      };
    case "-":
      return (addExp: AddExpNode, _op: _Terminal, mulExp: MulExpNode) => {
        const l = addExp.eval();
        const r = mulExp.eval();
        return l - r;
      };
  }
}
function get_MulExp(
  operator: "*" | "/",
): _FrameworkVisitorCallback<MulExpNode> {
  switch (operator) {
    case "*":
      return (mulExp: MulExpNode, _op: _Terminal, number: NumberNode) => {
        const l = mulExp.eval();
        const r = number.eval();
        return l * r;
      };
    case "/":
      return (mulExp: MulExpNode, _op: _Terminal, number: NumberNode) => {
        const l = mulExp.eval();
        const r = number.eval();
        return l / r;
      };
  }
}
const numberCB: _FrameworkVisitorCallback<NumberNode> = (
  digits: _Iter<_Terminal>,
) => {
  _assertNodeIsStronglyTyped<_Iter<_Terminal>>(digits, "_iter");
  return Number.parseInt(digits.asIteration().children.join(""));
};

type ExpNode = _SemanticParseNode<"Exp", "eval", number, [AddExpNode]>;
type AddExpNode = _SemanticParseNode<
  "AddExp",
  "eval",
  number,
  [AddExpNode, _Terminal, MulExpNode] | [MulExpNode]
>;
type MulExpNode = _SemanticParseNode<
  "MulExp",
  "eval",
  number,
  [MulExpNode, _Terminal, NumberNode] | [NumberNode]
>;
type NumberNode = _SemanticParseNode<
  "number",
  "eval",
  number,
  [_Iter<_Terminal>]
>;

This gives type errors on AddExpNode and MulExpNode:

Error (TS4110) | Tuple type arguments circularly reference themselves.

This is fine for my use-case, but if a reviewer happens to find a way to handle these types of grammars that would be awesome.

Review Scope

This review will encompass my utility types and functions I wrote to support having a strongly typed set of semantic actions. It also includes some unit tests that use a simpler grammar than my full one for the purposes of testing my general semantic action wrappers.

This does not include the full grammar and semantic actions for the cube algorithms. I do not believe this counts as sample or incomplete code, because in this case the semantic action wrappers are complete code on their own. I'm not looking for a review of my full grammar and semantic actions at this time, although if you wanted to poke around the GitHub repo I suppose you're free to do so. If you disagree that this question is considered on-topic please leave a comment explaining why.

The known limitation above is a heads-up to reviewers. I do not consider my implementation to be incomplete or incorrect, because it is not a requirement that this work for recursive grammars. Please do not vote to close on "non-functional code" because of this limitation (or leave a comment explaining why you think this is non-functional).

Stack Exchange Network

Wrap Ohm.js to provide more type-safety when writing and using semantic adapters

Background

Goals

Approach

Known Limitations

Review Scope

0

You must log in to answer this question.

Hot Network Questions

Wrap Ohm.js to provide more type-safety when writing and using semantic adapters

Background

Goals

Approach

Known Limitations

Review Scope

0

You must log in to answer this question.

Related

Hot Network Questions