When using the VA-API hstack filter, your video frames have to be in VRAM. Normally, with the default software decoder, decoded frames aren't uploaded to the GPU's memory/VRAM; they sit in the computer's RAM, so VA-API can't access them from the GPU, and the filter won't work.
There are two ways of doing this. The first, and possibly the "easiest", is to use the hwupload filter to upload the video streams into GPU memory, after which you can run the hstack_vaapi filter on them.
So your filter_complex might look a little something like this: -filter_complex "[0:v]format=nv12,hwupload[0v];[1:v]format=nv12,hwupload[1v];[0v][1v]hstack_vaapi", where [0:v] represents the video stream in the first video (left.mp4), and [1:v] the video stream in the second video (right.mp4). (Note that filters within a chain are separated by commas, but separate chains in the filter graph are separated by semicolons.)
format=nv12 converts the video streams to the NV12 pixel format, which is one of the formats that GPUs tend to support in hardware, and [0v] and [1v] are labels for the outputs of those chains. You need hwupload because the converted frames live in system RAM, and the GPU can only operate on frames in its own VRAM (hence, upload).
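Put together, the whole command might look something like this. This is a sketch: the input filenames, the render node path (/dev/dri/renderD128), and the h264_vaapi encoder choice are assumptions you'd adjust for your own system.

```shell
# Upload software-decoded frames to the GPU, then stack them side by side.
# -init_hw_device creates the VA-API device; -filter_hw_device makes it
# available to hwupload in the filter graph.
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -filter_hw_device va \
  -i left.mp4 -i right.mp4 \
  -filter_complex "[0:v]format=nv12,hwupload[0v];[1:v]format=nv12,hwupload[1v];[0v][1v]hstack_vaapi" \
  -c:v h264_vaapi out.mp4
```

Note that hwupload needs a hardware device to upload to, which is why the -init_hw_device/-filter_hw_device pair is there; without them, ffmpeg may have no VA-API context for the filter graph.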
The other way is to decode using the GPU/hardware acceleration via VA-API (-hwaccel vaapi), which does more or less the same thing in the background. You can then tell ffmpeg to keep those decoded frames in GPU memory in VA-API's internal format with -hwaccel_output_format vaapi, and they can be fed straight into hstack_vaapi.
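A sketch of that variant follows; again, the filenames and the h264_vaapi encoder are placeholders. The key detail is that -hwaccel and -hwaccel_output_format are per-input options, so they go before each -i.

```shell
# Decode both inputs on the GPU and keep the frames there, so no
# hwupload is needed before hstack_vaapi.
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i left.mp4 \
       -hwaccel vaapi -hwaccel_output_format vaapi -i right.mp4 \
  -filter_complex "[0:v][1:v]hstack_vaapi" \
  -c:v h264_vaapi out.mp4
```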
Something to note, though, is that the hardware-accelerated hstack filters are incredibly picky. You have to make sure that the video streams being fed into hstack_vaapi, for instance, have matching metadata (colourspace, time base, etc.), or else it won't work, failing with an oft-inexplicable error like Impossible to convert between the formats supplied by the filter 'graph 0 input from stream 1:0' and the filter 'auto_scale_0'.
This happens because ffmpeg auto-inserts a scale filter to convert the video streams so that they match (if you have it spit out verbose output, it will mention auto-inserting filter 'auto_scale_0' between the filter 'graph 0 input from stream 1:0' and the filter 'Parsed_hstack_vaapi_0'). This is all fine and dandy when you're using the software filter, since it can take the converted software frames (the ones in RAM) and use them, but the hardware/GPU-accelerated filter can't use those frames, so ffmpeg's auto-conversion will fail.
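One way to head this off is to normalize the streams in software before uploading, so both chains hand hstack_vaapi identically-formatted frames. This is only a sketch: the 720 height, 30 fps, device path, and filenames are placeholder assumptions to adjust for your inputs.

```shell
# Force both inputs to the same height, pixel format, frame rate, and
# time base before hwupload, so hstack_vaapi sees matching streams.
# scale=-2:720 picks a width that keeps the aspect ratio (and stays even);
# settb=AVTB sets both streams to ffmpeg's standard AV_TIME_BASE.
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -filter_hw_device va \
  -i left.mp4 -i right.mp4 \
  -filter_complex "[0:v]scale=-2:720,fps=30,settb=AVTB,format=nv12,hwupload[0v];[1:v]scale=-2:720,fps=30,settb=AVTB,format=nv12,hwupload[1v];[0v][1v]hstack_vaapi" \
  -c:v h264_vaapi out.mp4
```

Doing the conversion in software before the upload means the auto-inserted conversion never has to touch hardware frames, which is exactly the situation the error message above complains about.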