Timeline for Get a hash from XML

Current License: CC BY-SA 4.0

29 events

when	what		by	license	comment
S May 11, 2024 at 3:32	vote	accept	gdonega
May 10, 2024 at 20:53	answer	added	Michael Kay		timeline score: 4
May 9, 2024 at 20:19	answer	added	Stephen C. Steel		timeline score: 4
May 9, 2024 at 18:39	comment	added	gdonega		I've updated with the canonicalized result!
May 9, 2024 at 18:36	history	edited	gdonega	CC BY-SA 4.0	added 543 characters in body
May 8, 2024 at 16:01	comment	added	JimmyJames		@gdonega Sorry, did you see differences in the canonical output? If so, it would be helpful to future readers to show that.
May 8, 2024 at 15:01	comment	added	gdonega		Ohh! I see! That DOES make a lot of sense! And I can see why c# keeps the whitespace! Thanks for the explanation!
May 8, 2024 at 13:51	comment	added	JimmyJames		I could have missed something but the only thing that I would expect to change when you canonicalize that document is that the `<e3>` element will be changed to: `<e3 id="E3" xml:base="foo"></e3>` See here.
May 8, 2024 at 13:46	comment	added	JimmyJames		I just looked at the canonical spec and it does consider any whitespace within an element to be significant. That means all the line endings as well as indentation in your example doc. I see in the comments history that you tried stripping it out (sorry, missed that before) but my experience says you should always do that.
May 8, 2024 at 13:24	comment	added	JimmyJames		@gdonega I would suggest trying a couple things: 1. print out the canonical forms and look for differences. 2. Strip out all unnecessary whitespace.
May 8, 2024 at 11:43	comment	added	gdonega		In this case, the differences are line breaks, unnecessary whitespace, etc. The logical equivalence is correct.
May 8, 2024 at 7:20	comment	added	Bart van Ingen Schenau		@gdonega, logical equivalence of two XML documents does not need to imply identical byte streams. Compare the canonicalized string that you feed into the hash algorithm and see if you can explain the differences in the byte stream. Be also on the lookout for things like a byte-order mark at the start of the byte stream.
May 8, 2024 at 4:30	comment	added	Cort Ammon		I'd just like to say that this demonstrates the outrageous complexity of XML. In any other data format, the canonical format is the canonical format. There's exactly one, and everybody does it the same. With XML, even using the same algorithm, C14N, implementations still find ambiguity.
May 8, 2024 at 2:29	comment	added	gdonega		I've updated the question with my code and my results!
May 8, 2024 at 2:27	history	edited	gdonega	CC BY-SA 4.0	Adding java and c# code
May 8, 2024 at 1:47	comment	added	gdonega		Alright! I'll update my question...
May 7, 2024 at 21:35	comment	added	Michael Kay		It would be helpful if you could be more precise. Please show us an input document that is canonicalized differently by Java and C#, and show us exactly how you invoked the canonicalization process in each case. They SHOULD produce exactly the same output, byte for byte, and if they don't, then we should be able to find out why.
May 7, 2024 at 20:32 to May 12, 2024 at 3:03	review	Close votes
May 7, 2024 at 20:00 to May 11, 2024 at 3:32	vote	accept	gdonega
May 7, 2024 at 19:45	comment	added	gdonega		Fair enough... well, I will try to get this hash in some other way then... Thanks a lot!
May 7, 2024 at 19:43	comment	added	Greg Burghardt		Honestly, that's the problem. You need to use the same lib to generate the canonical form - and quite possibly the same version.
May 7, 2024 at 19:39	comment	added	gdonega		Yeah, it is probably an unusual case (LoL). The problem is that the c# has one lib and java has another one. So... each of them uses one "interpretation" of the W3C standard.
May 7, 2024 at 19:35	comment	added	gdonega		Thanks for your comment! Yeah! I've tried! One of the 2 tests that I made was a "clean" (no unnecessary white-spaces) single line. The other one had multiple lines, tabs, and unnecessary white-spaces.
May 7, 2024 at 19:34	comment	added	Greg Burghardt		I don't know if you can get the same hash when you have two different libs generating the canonical form. You might need to know which lib generated the canonical forms and use the same lib to compare them by regenerating the hash? The answer might be to simply use your own chosen lib to take two non-canonical XML files, generate a canonical version of each and then hash those outputs.
May 7, 2024 at 19:31	comment	added	Greg Burghardt		The wikipedia article about canonical XML states that, "According to the W3C, if two XML documents have the same canonical form, then the two documents are logically equivalent within the given application context (except for limitations regarding a few unusual cases)*." I wonder if you just happen to be experiencing one of these "unusual cases"?
May 7, 2024 at 19:29	comment	added	Greg Burghardt		And `<![CDATA[` islands could be another point of contention.
May 7, 2024 at 19:29	comment	added	Greg Burghardt		This is an interesting question. Canonical XML, on its surface, seems like it should eliminate inconsistencies, like white-space differences. Have you tried opening the two XML files in a diff tool to see what the differences are? Even differences in line endings could doom a trustable hash of the two files.
S May 7, 2024 at 19:21 to May 7, 2024 at 20:03	review	First questions
S May 7, 2024 at 19:21	history	asked	gdonega	CC BY-SA 4.0
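
The debugging steps suggested in the comments (print the canonical forms, strip unnecessary whitespace, watch for a byte-order mark) can be illustrated with a minimal sketch. It is not the asker's Java or C# code: it assumes Python's standard-library `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0 and may differ from the canonicalization variant used in the question. The helper name `c14n_sha256` and the two sample documents are hypothetical, loosely modeled on the `<e3>` element mentioned above.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def c14n_sha256(xml_text: str, strip_whitespace: bool = False) -> str:
    """Canonicalize xml_text (C14N 2.0) and hash its UTF-8 bytes with SHA-256.

    Hypothetical helper for illustration only.
    """
    canonical = canonicalize(xml_text, strip_text=strip_whitespace)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two logically equivalent documents that differ only in layout (assumed samples).
compact = '<doc><e3 id="E3"/></doc>'
indented = '<doc>\n    <e3 id="E3"/>\n</doc>'

# Step 1 from the comments: print the canonical forms and look for differences.
print(repr(canonicalize(compact)))
print(repr(canonicalize(indented)))

# Whitespace between elements survives canonicalization, so the digests differ.
print(c14n_sha256(compact) == c14n_sha256(indented))

# Step 2: strip the unnecessary whitespace on both sides before hashing.
print(
    c14n_sha256(compact, strip_whitespace=True)
    == c14n_sha256(indented, strip_whitespace=True)
)

# A byte-order mark changes the hashed bytes even when the text is identical.
text = canonicalize(compact)
with_bom = hashlib.sha256(text.encode("utf-8-sig")).hexdigest()
without_bom = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(with_bom == without_bom)
```

The same idea should carry over to other stacks: compare the canonical byte streams before comparing hashes, and make sure both sides apply the same whitespace policy and the same encoding.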
