
I have a string that has been URL-encoded once:

$encodedJson = "%5B%7B%0A%09%22base%22%3A%20%7B%0A%09%09%22url%22%3A%20%22abc.com%22%2C%0A%09%09%22referrer%22%3A%20%22xyz.com%22%0A%09%7D%0A%7D%2C%20%7B%0A%09%22client%22%3A%20%7B%0A%09%09%22Pixel%22%3A%20false%2C%0A%09%09%22screen%22%3A%20%221680x1050%22%0A%09%7D%0A%7D%5D"

If I decode it with the following call, I get the decoded JSON as an array:

$decodedJsonArray = json_decode(rawurldecode($encodedJson), true);

Then print_r($decodedJsonArray); gives me the desired output:

Array
(
    [0] => Array
        (
            [base] => Array
                (
                    [url] => abc.com
                    [referrer] => xyz.com
                )

        )

    [1] => Array
        (
            [client] => Array
                (
                    [Pixel] => 
                    [screen] => 1680x1050
                )

        )

)

Now, let's say I have a string that has been URL-encoded multiple times:

$encodedJson = "%25255B%25257B%25250A%252509%252522base%252522%25253A%252520%25257B%25250A%252509%252509%252522url%252522%25253A%252520%252522abc.com%252522%25252C%25250A%252509%252509%252522referrer%252522%25253A%252520%252522xyz.com%252522%25250A%252509%25257D%25250A%25257D%25252C%252520%25257B%25250A%252509%252522client%252522%25253A%252520%25257B%25250A%252509%252509%252522Pixel%252522%25253A%252520false%25252C%25250A%252509%252509%252522screen%252522%25253A%252520%2525221680x1050%252522%25250A%252509%25257D%25250A%25257D%25255D"

This string is URL-encoded three times. I want to get the same JSON array as before, and I am trying to write a function similar to the following:

function recursiveJsonDecode($encodedJson) {
    if (isJson($encodedJson)) {
        return $encodedJson;
    } else {
        $decodedJsonArray = json_decode(rawurldecode($encodedJson), true);
        return $decodedJsonArray;
    }
}

But it's not working. Any help would be much appreciated.

  • What exactly is not working in the second example? Show us the output you get. Commented Nov 29, 2017 at 15:42
  • I didn't know PHP had a function called isJson. Commented Nov 29, 2017 at 15:43
  • A very important feature of a recursive function is that it calls itself. Commented Nov 29, 2017 at 15:43
  • @DontPanic a function that doesn't is called a deaf-mute recursive function Commented Nov 29, 2017 at 15:45
  • This is just a theoretical exercise, I assume? In reality you shouldn't be dealing with data that was encoded more than once to begin with. That would be a reason to reject the data, IMHO, and tell whoever is sending it to you to get their stuff in order. Commented Nov 29, 2017 at 15:47

4 Answers


Shipping & Post Office Supplies | USPS.com - Postal Store

Ordering shipping supplies is cute because it's the only time you can ever receive a box full of nothing but boxes!

When you receive your boxes in the mail, what do you do with them, though? I remove only the outermost packaging and set my boxes on a shelf; maybe I'll use them to send things out later. Someone who writes a recursive JSON decoder might do something different, though: they might open every one of those boxes and be sad to find out they received nothing!

"I opened every single box and I never found my order's contents!" the recursive JSON decoder laments.


Don't decode it just because you can

There's no way to determine from the string alone whether it is JSON-encoded (or how many times). Because of this, it's not the consumer's job to decide whether to parse or not.

Take, for example, the JSON string "5". Is it a single-encoded string '5'?

json_encode("5");
// => '"5"'

or is it a double-encoded integer 5?

json_encode(json_encode(5));
// => '"5"'

If you're looking at only the JSON-encoded result, there's no way to tell, but 5 (int) and "5" (string) are as different as [5] or {value: 5}: they're completely different types. The JSON consumer must know how many times the value has been encoded. That's not complicated, as you should avoid double-encoding in the first place.
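To make this concrete, the two encodings can be compared byte for byte in PHP. This is a minimal sketch; `$once` and `$twice` are just illustrative names:

```php
<?php
// The string "5", JSON-encoded once
$once = json_encode("5");
// The integer 5, JSON-encoded twice
$twice = json_encode(json_encode(5));

var_dump($once === $twice);   // bool(true): identical bytes, nothing to tell them apart
var_dump(json_decode($once)); // string(1) "5"
```

Because the encoded forms are identical, only out-of-band knowledge of the layer count can decide whether the right answer is the string "5" or the integer 5.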


When we decode JSON, we only do it once

json_decode('"5"');
// => "5"

Your recursive function would effectively do this:

json_decode(json_decode('"5"'));
// => 5

Only one of those is a valid answer. This is the flaw in all the isJson functions built around error-checking a decode operation: people trick themselves into thinking that just because a string can be decoded, it was JSON in the first place.
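Here is the kind of error-checking helper being described. PHP has no built-in `isJson`, so this version is hypothetical, but it is the usual pattern. Note that it answers "yes" at every layer, so it gives a recursive decoder no signal about when to stop:

```php
<?php
// Hypothetical helper: "is JSON" here just means "json_decode reported no error"
function isJson(string $s): bool {
    json_decode($s);
    return json_last_error() === JSON_ERROR_NONE;
}

$layer1 = '"5"';                // a JSON string literal
$layer2 = json_decode($layer1); // the string "5", which is also valid JSON (a number)

var_dump(isJson($layer1));      // bool(true)
var_dump(isJson($layer2));      // bool(true), so a recursive decoder keeps going
var_dump(json_decode($layer2)); // int(5): the string has silently become a number
```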

Returning to our USPS example, it would mean you only stop opening boxes once you encounter a thing that cannot be opened – I just keep opening boxes and once I discover they're all empty, I'm stuck wondering where my order contents are.

This idea that you can somehow detect when to stop decoding is broken from the start. In this example, watch what happens when I have a simple form submission and a recursive JSON decoder is used to process it...

If I fill in a form with "[]" as my name, and you then run a recursive JSON decoder on the submitted form data, you will end up with

$formData == [ "name" => [] ] // name is an array, wups!

Whereas a non-recursive JSON decoder would keep the name as a string

$formData == [ "name" => "[]" ] // name is a string, as the user typed

Just because you can parse it doesn't mean you should.
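A short PHP sketch of the form example. `decodeUntilInvalid` is a hypothetical "keep decoding while it parses" function of the kind being warned against, and `$submitted` stands in for the already-decoded form data:

```php
<?php
// Hypothetical recursive decoder: keeps JSON-decoding for as long as it parses (the pattern to avoid)
function decodeUntilInvalid($value) {
    if (!is_string($value)) {
        return $value;                 // arrays, numbers, etc. cannot be decoded further
    }
    $decoded = json_decode($value, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return $value;                 // no longer parses: stop here
    }
    return decodeUntilInvalid($decoded);
}

// The user typed the literal text [] into the "name" field
$submitted = ["name" => "[]"];

$recursive = ["name" => decodeUntilInvalid($submitted["name"])];
var_dump($recursive["name"]);  // an empty array: the user's text is gone
var_dump($submitted["name"]);  // string(2) "[]": what the user actually typed
```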


Once a string has been double- or triple-encoded (URL-encoded, JSON-encoded, or whatever-encoded), the only way to reverse it is to decode it exactly the same number of times.
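In PHP that means the layer count is agreed on with whoever produces the data, and the consumer applies each decoder exactly that many times. A minimal sketch; `decodeLayers` and its `$layers` parameter are hypothetical names, and the count is assumed to be known in advance rather than guessed from the string:

```php
<?php
// Undo exactly $layers rounds of rawurlencode, then JSON-decode exactly once.
// JSON_THROW_ON_ERROR (PHP 7.3+) turns a wrong layer count into an exception instead of a silent null.
function decodeLayers(string $encoded, int $layers) {
    for ($i = 0; $i < $layers; $i++) {
        $encoded = rawurldecode($encoded);
    }
    return json_decode($encoded, true, 512, JSON_THROW_ON_ERROR);
}

// A small document URL-encoded three times, like the question's input
$json = '{"url": "abc.com"}';
$encoded = rawurlencode(rawurlencode(rawurlencode($json)));

print_r(decodeLayers($encoded, 3));
```

Passing the wrong count (say 2 here) fails loudly with a `JsonException` rather than quietly producing a different value.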


6 Comments

Then why are all of his code samples in PHP, and there's a PHP tag on his question?
The question is pretty vague. I took it to be about url encoding and not specifically about JSON.
The question is not vague IMO: the person is asking how to recursively decode JSON and shows a function with their attempt to do so.
@naomik I think the first line of your answer is wrong, see this SO post for how to detect JSON data: stackoverflow.com/q/6041741/3088508
David, just because a string can be JSON-decoded does not mean it's JSON.

You can treat URL decoding as a fixed-point operation:

function fixedPointDecode($string) {
    $decoded = urldecode($string);
    // Strict comparison: != would compare numeric-looking strings as numbers
    while ($decoded !== $string) {
        $string = $decoded;
        $decoded = urldecode($string);
    }
    return $decoded;
}

The idea is that once urldecode no longer changes the string, it is fully decoded.

Then you can do:

 json_decode(fixedPointDecode($string));

Note: I have not found any indication that there are URL-encoded strings that do not converge to a fixed point, but I'm curious if anyone else has.
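As a self-contained check (repeating `fixedPointDecode` so the snippet runs on its own), the fixed point recovers a triple-encoded document, but it also over-decodes any value whose real text contains a percent sequence. The device ID `frt%235` below is the hypothetical one from the comments:

```php
<?php
function fixedPointDecode(string $string): string {
    $decoded = urldecode($string);
    // Keep decoding until another pass changes nothing
    while ($decoded !== $string) {
        $string = $decoded;
        $decoded = urldecode($string);
    }
    return $decoded;
}

// A JSON document URL-encoded three times: the fixed point finds it
$json = '{"url": "abc.com"}';
$encoded = rawurlencode(rawurlencode(rawurlencode($json)));
var_dump(fixedPointDecode($encoded) === $json); // bool(true)

// A device ID whose real value is "frt%235", correctly URL-encoded once
$deviceId = "frt%235";
$sent = rawurlencode($deviceId);                // "frt%25235"
var_dump(rawurldecode($sent));                  // string(7) "frt%235": one decode is right
var_dump(fixedPointDecode($sent));              // string(5) "frt#5": one decode too many
```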

6 Comments

The only problem with this is that sometimes you will encounter a valid value that also appears to be encoded. Say I have a device ID of "frt%235" (that's the actual device ID); if I let a program decide when to stop decoding, it would change that to frt#5, which is invalid in this case.
If you change the while to while (json_decode($decoded) == null), then this might work, but in that case the function will loop forever if there's no valid JSON string hidden in the input. I recommend you solve the problem at its source by ensuring the string is only urlencoded once.
But that's the point: the consumer can't control the contents of the data. It's broken to rely on null/error checks as a signal for whether a string should be parsed.
Well, that's partially true, but this is also what you get for choosing buggy providers. Getting a string that's urlencoded multiple times should merit a bug report to the provider, and the provider should not claim that it's working as intended. In no world does double-encoding the entire string make any sense.
@naomik If you know how many times it's encoded in advance then that's a much better strategy than a fixed point operation, definitely.

json_decode returns null if the input is not valid JSON, as the manual says:

NULL is returned if the json cannot be decoded or if the encoded data is deeper than the recursion limit.

So just test it:

while(($decodedJsonArray = json_decode($encodedJson, true)) === null) {
    $encodedJson = rawurldecode($encodedJson);
}

print_r($decodedJsonArray);

To use your isJson function:

while(!isJson($encodedJson)) {
    $encodedJson = rawurldecode($encodedJson);
}
$decodedJsonArray = json_decode($encodedJson, true);

print_r($decodedJsonArray);
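One caveat with the `=== null` test: the JSON text `null` decodes to `null` without any error, and since `rawurldecode("null")` is still `"null"`, the first loop never ends. The same happens for any input that never becomes valid JSON. A hedged variant that checks `json_last_error()` instead and gives up once URL-decoding stops making progress (the name `decodeUntilJson` is made up for this sketch):

```php
<?php
function decodeUntilJson(string $encodedJson) {
    while (true) {
        $decoded = json_decode($encodedJson, true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $decoded;  // valid JSON, including the literal null
        }
        $next = rawurldecode($encodedJson);
        if ($next === $encodedJson) {
            // Nothing left to URL-decode and still not JSON: stop instead of looping forever
            throw new InvalidArgumentException("Input never became valid JSON");
        }
        $encodedJson = $next;
    }
}

var_dump(decodeUntilJson("null"));  // NULL, where the === null loop would spin forever
```

This only removes the infinite loop; it still stops at the first layer that happens to parse, so it does not address the ambiguity described in the first answer.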

4 Comments

Wow, that's a lot cleaner than my recursive function. Have an upvote!
To go with some random downvote :(
Hi, as I commented on David's answer, your function also works if I have a string like $encodedJson = "%25etcetc"; but it doesn't work when I read the string from a file, i.e. $encodedJson = file_get_contents("test.txt"); or $encodedJson = file_get_contents("test.json");. Any idea why?
AbraCadavar, you just ordered 1 case of Priority Mail Shoe Box from USPS.com – how many boxes will you open when you receive your order?

Calling rawurldecode(rawurldecode(rawurldecode($encodedJson))) reveals that your string is actually rawurlencoded 3 times, not json_encoded 3 times, so I made the recursive function rawurldecode it on every call until json_decode succeeds:

$encodedJson = "%25255B%25257B%25250A%252509%252522base%252522%25253A%252520%25257B%25250A%252509%252509%252522url%252522%25253A%252520%252522abc.com%252522%25252C%25250A%252509%252509%252522referrer%252522%25253A%252520%252522xyz.com%252522%25250A%252509%25257D%25250A%25257D%25252C%252520%25257B%25250A%252509%252522client%252522%25253A%252520%25257B%25250A%252509%252509%252522Pixel%252522%25253A%252520false%25252C%25250A%252509%252509%252522screen%252522%25253A%252520%2525221680x1050%252522%25250A%252509%25257D%25250A%25257D%25255D";

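function recursiveJsonDecode ($inJson) {
    // true: return associative arrays, matching the question's desired output
    $outputArr = json_decode($inJson, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        return $outputArr;
    }
    $decoded = rawurldecode($inJson);
    if ($decoded === $inJson) {
        // Nothing left to URL-decode and still not JSON: give up instead of recursing forever
        return null;
    }
    return recursiveJsonDecode($decoded);
}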

print_r(recursiveJsonDecode($encodedJson));

eval.in demo
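The "three URL-encoding layers, one JSON layer" diagnosis can be checked by peeling one `rawurldecode` layer at a time and noting when `json_decode` first succeeds. A small sketch using a shorter stand-in document built the same way, rather than the full string from the question:

```php
<?php
// Stand-in for the question's data: JSON, then rawurlencode applied three times
$encodedJson = rawurlencode(rawurlencode(rawurlencode('[{"base": {"url": "abc.com"}}]')));

$current = $encodedJson;
for ($layer = 0; $layer <= 5; $layer++) {
    json_decode($current, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        echo "valid JSON after $layer rawurldecode call(s)\n";
        break;
    }
    $current = rawurldecode($current);
}
```

As the first answer warns, this only tells you where `json_decode` first succeeds, not how many layers the producer actually applied; it is a debugging aid, not a decoding strategy.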

7 Comments

This cannot work when applied to any generic datum.
Given the OP's input data, I'd say it would work, as long as the data was mangled the same way his input array was (json_encoded first, then rawurlencoded). See here: eval.in/910031
Hi, your function works if I have a string like $encodedJson = "%25etcetc"; but it doesn't work when I read the string from a file, i.e. $encodedJson = file_get_contents("test.txt"); or $encodedJson = file_get_contents("test.json");. Any idea why?
@IqbalNazir It's working for me when I get $encodedJson from an array, but because I don't actually know what's in your test.txt file, I'm just assuming it's what you posted in your answer. See this eval.in: eval.in/910041 Please post what's in your test.txt file to a service like www.pastebin.com if my assumption is incorrect.
Thanks, man. It's actually working. I had " " symbols in my text file; after removing them, it works.