Revisions to Yet another C++ JSON parser


edited Jun 29, 2014 at 20:13


edited Jun 29, 2014 at 15:17

Jerry Coffin


answered Jun 29, 2014 at 15:04


I've hesitated for quite a while, but decided there are a few points worth commenting on.

The first and most obvious is that (by design) this simply doesn't do much. It can log the types of statements encountered in the JSON you give to it, but that's all. To be of much real use, you'd normally want to create something at least vaguely AST-like, storing the JSON input data as a set of nodes with attributes describing the data in the nodes.
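To give a rough idea of what "vaguely AST-like" might mean, here is a minimal sketch. The names `Node`, `makeNumber`, `makeArray` and `countNodes` are made up for illustration and are not part of the code under review; a real design would probably want something tighter than one struct with a field per kind.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node (an assumption, not the reviewed parser's type).
struct Node {
    enum class Kind { Null, Bool, Number, String, Array, Object };

    Kind kind = Kind::Null;
    bool boolean = false;          // used when kind == Bool
    double number = 0.0;           // used when kind == Number
    std::string text;              // used when kind == String
    std::vector<std::string> keys; // member names when kind == Object
    std::vector<Node> children;    // elements (Array) or member values (Object)
};

// Convenience constructor for a number leaf.
inline Node makeNumber(double v) {
    Node n;
    n.kind = Node::Kind::Number;
    n.number = v;
    return n;
}

// Convenience constructor for an array node.
inline Node makeArray(std::vector<Node> items) {
    Node n;
    n.kind = Node::Kind::Array;
    n.children = std::move(items);
    return n;
}

// Example of what a client can do once the data is stored:
// count every node in the tree, including the root.
inline std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const Node& child : n.children) total += countNodes(child);
    return total;
}
```

With something like this, the grammar actions would build nodes instead of (or in addition to) logging, and the caller gets a tree it can walk afterwards.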

Alternatively, you could build some sort of call-back style framework where the parser called a specific function when each type of input was detected, and it would be up to the client code to decide how to store the data (and which data to store).
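A call-back style interface could look roughly like the following sketch. `JsonHandler`, `NumberCollector` and `replaySampleArray` are hypothetical names; the last one only stands in for the parser by replaying the events it might raise for `[1, 2, 3, 4]`.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical event interface: the parser would call these as it
// recognizes each construct, and the client decides what to keep.
struct JsonHandler {
    virtual ~JsonHandler() = default;
    virtual void onNumber(double) {}
    virtual void onString(const std::string&) {}
    virtual void onArrayStart() {}
    virtual void onArrayEnd() {}
};

// A client that only cares about numbers and ignores everything else.
struct NumberCollector : JsonHandler {
    std::vector<double> numbers;
    void onNumber(double v) override { numbers.push_back(v); }
};

// Stand-in for the parser: replays the events for the input [1, 2, 3, 4].
inline void replaySampleArray(JsonHandler& h) {
    h.onArrayStart();
    for (double v : {1.0, 2.0, 3.0, 4.0}) h.onNumber(v);
    h.onArrayEnd();
}
```

The default no-op implementations mean client code only overrides the events it actually cares about.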

As it stands right now, however, the most it can produce is a description of the overall structure of the input file. If that's really all you want, I think it's worth considering whether you're trying to produce output to be read by people or by the machine for further processing.

If you want human-readable output, I think I'd prefer that somewhat more processing be done on the data before it was printed out. For example, as it stands now, an array like [1, 2, 3, 4] would produce a line of output for the JSON array, another for the value-list, and another for each item in the array.

If I'm going to read the file, I think I'd generally prefer those log entries coalesced into something like Array (4 Numbers). Given that the intent is for this to parse JSON, it's probably sufficient to specify "JSON" in one place in the output, rather than prefixing every single line of output with "JSON". In short, as it stands right now, this produces output that's extremely verbose, leading to very low information density, so the reader needs to read and digest a great deal of output to understand even a relatively small amount of input. In fact, I'm pretty sure it would usually be easier to read the input file directly.
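As a sketch of that kind of coalescing, assuming the parser has already collected the kinds of an array's elements: `Kind` and `summarizeArray` are names I've invented here, and the handling of mixed arrays is just one possible choice.

```cpp
#include <string>
#include <vector>

// Hypothetical element kinds; a real parser would have more.
enum class Kind { Number, String };

// Summarize an array's element kinds as one line, e.g. "Array (4 Numbers)".
// Falls back to "Array (N items)" for empty or mixed-kind arrays.
// (No attempt is made to singularize "1 Numbers".)
inline std::string summarizeArray(const std::vector<Kind>& elems) {
    const std::string count = std::to_string(elems.size());
    bool uniform = !elems.empty();
    for (Kind k : elems) {
        if (k != elems.front()) uniform = false;
    }
    if (!uniform) return "Array (" + count + " items)";
    const std::string name =
        elems.front() == Kind::Number ? "Numbers" : "Strings";
    return "Array (" + count + " " + name + ")";
}
```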

If your primary intent is to produce output for the computer to read for further processing, it's less necessary to go to extra work to coalesce the information, but still useful to keep the information compact. Given the small number of possibilities in a JSON file, I'd probably assign a single letter to each of the strings the parser can now produce, and just write those out. Alternatively, do some coalescing here as well, so repetitions of the same pattern are signaled by the pattern (in parentheses if it's more than one letter) followed by a number in brackets. For example, a map of 4 string/number pairs followed by an array of 6 numbers might come out something like: M(SN)[4]AN[6]. This is still somewhat human readable (if necessary), and a lot quicker for a parser on the receiving end to sort out (not to mention just being a lot less data to store, transmit, etc.)¹
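To make the compact form a bit more concrete, here is a sketch of a greedy coalescer over a flat string of one-letter codes. The function name `coalesce` and the tie-breaking rule are my own assumptions. It does nothing about nesting, and being greedy it won't always find the shortest encoding.

```cpp
#include <cstddef>
#include <string>

// Collapse immediate repetitions in a string of one-letter codes.
// A repeated single letter becomes X[n]; a repeated longer pattern
// becomes (XY)[n]. Unrepeated letters are copied through unchanged.
inline std::string coalesce(const std::string& codes) {
    std::string out;
    std::size_t i = 0;
    while (i < codes.size()) {
        std::size_t bestLen = 1, bestReps = 1;
        const std::size_t remaining = codes.size() - i;
        // Try every pattern length that could repeat at least twice.
        for (std::size_t len = 1; len <= remaining / 2; ++len) {
            std::size_t reps = 1;
            while (i + (reps + 1) * len <= codes.size() &&
                   codes.compare(i + reps * len, len, codes, i, len) == 0) {
                ++reps;
            }
            // Prefer the candidate covering the most input;
            // on a tie, the shorter pattern found earlier is kept.
            if (reps >= 2 && reps * len > bestReps * bestLen) {
                bestLen = len;
                bestReps = reps;
            }
        }
        if (bestReps == 1) {
            out += codes[i];
        } else if (bestLen == 1) {
            out += codes[i];
            out += "[" + std::to_string(bestReps) + "]";
        } else {
            out += "(" + codes.substr(i, bestLen) + ")[" +
                   std::to_string(bestReps) + "]";
        }
        i += bestLen * bestReps;
    }
    return out;
}
```

Fed the uncoalesced codes for the example above, `coalesce("MSNSNSNSNANNNNNN")` gives `M(SN)[4]AN[6]`.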

As far as the style of code itself goes, I really have only a few comments, and pretty minor ones at that.

Given a list of alternatives like:

 production : 
            | Alternative1
            | Alternative2

I prefer to add a semicolon to signal the end of the list:

 production : 
            | Alternative1
            | Alternative2
            ;

This is at a large scale, and may even be out of your control, but I found this code:
```
 try
 {
     yyFlexLexer         lexer(&std::cin, &std::cout);
     yy::JsonParser      parser(lexer);

     std::cout << (parser.parse() == 0 ? "OK" : "FAIL") << "\n";
 }
 catch (std::exception const& e)
 {
     std::cout << "Exception: " << e.what() << "\n";
 }
```
...somewhat ugly. Combining return values and exceptions like this, and having to respond to both makes it seem rather...disjointed. I'd rather see one style adopted throughout, so you can depend on failure always being signaled by throwing an exception or else that it's always signaled by the return value. As it stands right now, we not only end up with this code being fairly ugly, but we also end up with rather uneven error reporting--errors reported via exception may have fairly detailed error messages, but those reported via return value all produce an identical (and probably unhelpful) "FAIL".
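One way to get down to a single style is to wrap the status-returning call so it throws. The sketch below assumes only what the snippet above shows, namely that `parse()` returns 0 on success. `parseOrThrow` and `FakeParser` are hypothetical names, and `FakeParser` exists only to exercise the wrapper.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical wrapper: turns a non-zero status from a parse()-style
// call into an exception, so callers handle one failure channel only.
template <typename Parser>
void parseOrThrow(Parser& parser) {
    const int status = parser.parse();
    if (status != 0) {
        throw std::runtime_error("parse failed with status " +
                                 std::to_string(status));
    }
}

// Stand-in for the generated parser, used only to exercise the wrapper.
struct FakeParser {
    int result;
    int parse() { return result; }
};
```

The calling code then needs just the `try`/`catch`, and every failure arrives with a message. A more useful message would need the parser's error callback to record what went wrong, which this sketch doesn't attempt.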

1. I haven't tried to work through all the details of ensuring that nested structures would remain unambiguous, but at least offhand it doesn't seem terribly difficult.
