Timeline for Get a hash from XML

Current License: CC BY-SA 4.0

29 events
when what by license comment
S May 11, 2024 at 3:32 vote accept gdonega
May 10, 2024 at 20:53 answer added Michael Kay timeline score: 4
May 9, 2024 at 20:19 answer added Stephen C. Steel timeline score: 4
May 9, 2024 at 18:39 comment added gdonega I've updated with the canonicalized result!
May 9, 2024 at 18:36 history edited gdonega CC BY-SA 4.0 added 543 characters in body
May 8, 2024 at 16:01 comment added JimmyJames @gdonega Sorry, did you see differences in the canonical output? If so, it would be helpful to future readers to show that.
May 8, 2024 at 15:01 comment added gdonega Ohh! I see! That DOES make a lot of sense! And I can see why C# keeps the whitespace! Thanks for the explanation!
May 8, 2024 at 13:51 comment added JimmyJames I could have missed something but the only thing that I would expect to change when you canonicalize that document is that the <e3> element will be changed to: <e3 id="E3" xml:base="foo"></e3> See here.
May 8, 2024 at 13:46 comment added JimmyJames I just looked at the canonical spec and it does consider any whitespace within an element to be significant. That means all the line endings as well as indentation in your example doc. I see in the comments history that you tried stripping it out (sorry, missed that before) but my experience says you should always do that.
May 8, 2024 at 13:24 comment added JimmyJames @gdonega I would suggest trying a couple things: 1. print out the canonical forms and look for differences. 2. Strip out all unnecessary whitespace.
May 8, 2024 at 11:43 comment added gdonega In this case, the differences are line breaks, unnecessary whitespace, etc. The logical equivalence is correct.
May 8, 2024 at 7:20 comment added Bart van Ingen Schenau @gdonega, logical equivalence of two XML documents does not need to imply identical byte streams. Compare the canonicalized string that you feed into the hash algorithm and see if you can explain the differences in the byte stream. Be also on the lookout for things like a byte-order mark at the start of the byte stream.
May 8, 2024 at 4:30 comment added Cort Ammon I'd just like to say that this demonstrates the outrageous complexity of XML. In any other data format, the canonical format is the canonical format. There's exactly one, and everybody does it the same. With XML, even using the same algorithm, C14N, algorithms still find ambiguity.
May 8, 2024 at 2:29 comment added gdonega I've updated the question with my code and my results!
May 8, 2024 at 2:27 history edited gdonega CC BY-SA 4.0 Adding Java and C# code
May 8, 2024 at 1:47 comment added gdonega Alright! I'll update my question...
May 7, 2024 at 21:35 comment added Michael Kay It would be helpful if you could be more precise. Please show us an input document that is canonicalized differently by Java and C#, and show us exactly how you invoked the canonicalization process in each case. They SHOULD produce exactly the same output, byte for byte, and if they don't, then we should be able to find out why.
May 7, 2024 at 20:32 review Close votes
May 12, 2024 at 3:03
May 7, 2024 at 20:00 vote accept gdonega
S May 11, 2024 at 3:32
May 7, 2024 at 19:45 comment added gdonega Fair enough... well, I will try to get this hash in some other way then... Thanks a lot!
May 7, 2024 at 19:43 comment added Greg Burghardt Honestly, that's the problem. You need to use the same lib to generate the canonical form - and quite possibly the same version.
May 7, 2024 at 19:39 comment added gdonega Yeah, it is probably an unusual case (LoL). The problem is that C# has one lib and Java has another one. So... each of them uses one "interpretation" of the W3C standard.
May 7, 2024 at 19:35 comment added gdonega Thanks for your comment! Yeah! I've tried! One of the 2 tests that I made was a "clean" (no unnecessary white-spaces) single line. The other one had multiple lines, tabs, and unnecessary white-spaces.
May 7, 2024 at 19:34 comment added Greg Burghardt I don't know if you can get the same hash when you have two different libs generating the canonical form. You might need to know which lib generated the canonical forms and use the same lib to compare them by regenerating the hash? The answer might be to simply use your own chosen lib to take two non-canonical XML files, generate a canonical version of each and then hash those outputs.
May 7, 2024 at 19:31 comment added Greg Burghardt The Wikipedia article about canonical XML states that, "According to the W3C, if two XML documents have the same canonical form, then the two documents are logically equivalent within the given application context (except for limitations regarding a few unusual cases)." I wonder if you just happen to be experiencing one of these "unusual cases"?
May 7, 2024 at 19:29 comment added Greg Burghardt And <![CDATA[ islands could be another point of contention.
May 7, 2024 at 19:29 comment added Greg Burghardt This is an interesting question. Canonical XML, on its surface, seems like it should eliminate inconsistencies, like white-space differences. Have you tried opening the two XML files in a diff tool to see what the differences are? Even differences in line endings could doom a trustable hash of the two files.
S May 7, 2024 at 19:21 review First questions
May 7, 2024 at 20:03
S May 7, 2024 at 19:21 history asked gdonega CC BY-SA 4.0
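
Several comments above suggest the same diagnostic: compare the canonicalized bytes that each side feeds into the hash, and watch for a byte-order mark or line-ending differences. Below is a minimal Java sketch of that check. It does not perform C14N itself; the two byte arrays in `main` are hypothetical stand-ins for the output of whatever canonicalizer each platform uses, and the class and method names (`CanonicalBytesCheck`, `sha256Hex`, `hasUtf8Bom`, `firstDifference`) are made up for illustration. SHA-256 is also an assumption, since the timeline does not say which hash is in use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CanonicalBytesCheck {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** True if the bytes start with the UTF-8 byte-order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /**
     * Returns the index of the first differing byte, or -1 if the arrays are
     * identical. If one array is a prefix of the other, returns the shorter length.
     */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        // Hypothetical canonical outputs; replace with the real bytes from each platform.
        byte[] first = "<a><b></b></a>".getBytes(StandardCharsets.UTF_8);
        byte[] second = "\uFEFF<a><b></b></a>".getBytes(StandardCharsets.UTF_8);

        System.out.println("first  has BOM: " + hasUtf8Bom(first));
        System.out.println("second has BOM: " + hasUtf8Bom(second));
        System.out.println("first differing byte index: " + firstDifference(first, second));
        System.out.println("first  SHA-256: " + sha256Hex(first));
        System.out.println("second SHA-256: " + sha256Hex(second));
    }
}
```

Hashing the raw bytes rather than a decoded string is deliberate: a BOM or a stray `\r` is invisible in most string comparisons but changes the digest. Once the first differing index is known, dumping a few bytes around it from both outputs usually shows whether the difference is whitespace, encoding, or something the canonicalizer actually did differently.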