\$\begingroup\$

Questions

  • Are there edge-cases that I've missed?
  • Do HTTP response header values ever contain JSON-like data?
  • Any style pointers related to code readability?
  • Are there other/better Vim (version 8 or higher) built-ins for what I'm attempting to do?

Story time

Currently I'm working on integrating an API with Vim (classy flavored Vim, not Neovim), but the API, or the proxy I wrote, seems to intermittently return malformed HTTP responses. In an ideal world I'd split on \r\n\r\n to separate the headers from the body of each HTTP response, but this timeline isn't ideal 🤷
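
For contrast, the ideal-world split could be sketched roughly as follows. SplitHTTPResponse is a hypothetical name used only for illustration (it is not part of the plugin), and the sketch assumes a well-formed response where a blank line separates headers from body.

```vim
" Hypothetical helper, not part of the plugin; assumes a well-formed response
function! SplitHTTPResponse(response) abort
  let l:separator = "\r\n\r\n"
  let l:boundary = stridx(a:response, l:separator)
  if l:boundary == -1
    " No blank line found, so treat the whole input as body
    return { 'headers': '', 'body': a:response }
  endif

  " Everything before the blank line is the status line plus headers
  let l:headers = strpart(a:response, 0, l:boundary)
  let l:body = strpart(a:response, l:boundary + len(l:separator))
  return { 'headers': l:headers, 'body': l:body }
endfunction
```

None of the malformed examples below contain that separator, so this approach would hand back the whole input as the body.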

Here are a few examples of the most offensively malformed HTTP responses:

  • Status line joined to body, and no headers

    HTTP/1.1 200 OK{ "key": "value" }

  • No status line, and body joined to headers

    Content-Type: application/json{ "key": "value" }

  • No status line or headers, and dictionaries are touching

    { "key": "value" }{ "key": "value" }

  • Dictionaries are touching, with nested data or other fun stuff

    { "key": "\"val}{}ue\"" }{ "nested": { "dict": [419, 68] } }

... and for clarity, here's what an ideal HTTP response would look like:

HTTP/1.1 200 OK
Server: SimpleHTTP/0.6 Python/3.12.6
Date: Sat, 28 Sep 2024 23:29:00 GMT
Content-Type: application/json

{ "key": "value" }
{ "key": "\"val}{}ue\"" }
{ "nested": { "dict": [419, 68] } }

Notes about input

So far I've yet to find a simple RegExp that also handles nested dictionary data structures, and/or escaped double quotes, and/or curly braces within double quotes.

And the more complex RegExp attempts that satisfy some of the parsing requirements all fail with certain inputs and, more importantly, do not spark joy.

Currently I don't need to worry about the API sending a list as a response, e.g. nothing like:

[{ "key": "value" }]

... But lists within dictionary values are expected.


Current solution

So far I've written a character-by-character scanning loop that tracks whether it is inside a string and how deeply curly braces are nested, and that I believe handles escaped quotes correctly. When it detects a whole top-level dictionary, it calls out to json_decode:

function! ExtractJSONDicts(data) abort
  let l:dictionary_list = []

  let l:index = 0
  let l:slice_start = 0
  let l:inside_string = v:false
  let l:escape_count = 0
  let l:curly_depth = 0
  while l:index < len(a:data)
    " Byte-wise indexing; every character tested below is single-byte ASCII
    let l:character = a:data[l:index]

    if l:inside_string
      if l:character ==# '\'
        let l:escape_count += 1
      else
        " A quote preceded by an even number of backslashes ends the string
        if l:character ==# '"' && l:escape_count % 2 == 0
          let l:inside_string = v:false
        endif

        let l:escape_count = 0
      endif
    elseif l:character ==# '"'
      let l:inside_string = v:true
    elseif l:character ==# '{'
      " Remember where a top-level dictionary starts, skipping any prefix
      if l:curly_depth == 0
        let l:slice_start = l:index
      endif
      let l:curly_depth += 1
    elseif l:character ==# '}'
      let l:curly_depth -= 1
      if l:curly_depth == 0
        call add(l:dictionary_list, json_decode(a:data[l:slice_start : l:index]))
      endif
    endif

    let l:index += 1
  endwhile

  return l:dictionary_list
endfunction
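
As a quick sanity check, the four malformed examples from above can be fed through the function using Vim's built-in assert_equal(). This is only a sketch: it assumes ExtractJSONDicts() has already been sourced, and the variable name nested_input is made up for illustration.

```vim
" Assumes ExtractJSONDicts() from above has already been sourced
let v:errors = []

" Status line joined to body, and no headers
call assert_equal([{ 'key': 'value' }], ExtractJSONDicts('HTTP/1.1 200 OK{ "key": "value" }'))

" No status line, and body joined to headers
call assert_equal([{ 'key': 'value' }], ExtractJSONDicts('Content-Type: application/json{ "key": "value" }'))

" No status line or headers, and dictionaries are touching
call assert_equal([{ 'key': 'value' }, { 'key': 'value' }], ExtractJSONDicts('{ "key": "value" }{ "key": "value" }'))

" Escaped quotes, braces inside a string, and nested data
let nested_input = '{ "key": "\"val}{}ue\"" }{ "nested": { "dict": [419, 68] } }'
call assert_equal([{ 'key': '"val}{}ue"' }, { 'nested': { 'dict': [419, 68] } }], ExtractJSONDicts(nested_input))

" An empty list means every assertion passed
echo v:errors
```

Single-quoted Vim strings keep backslashes literal, so the inputs can be pasted exactly as they appear on the wire.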

For the brevity of this post, here is where the unit tests can be double-checked.


Related documentation

  • :help channel-raw
  • :help channel-open-options

For curious readers, here is a permalink to the source code of the callback parser function, in the context of the plugin that I'm working on.

\$\endgroup\$
