I need to get the same hash of an XML document in any language.
I tried to compute the XML's canonical form and then hash it.
But what I found is that canonicalization is not a "fixed standard": every library and language I have worked with implements it differently, so I never get the SAME hash.
So my question is: is there a way to get a reliable hash of the same canonical XML?
Edit
I'm using this XML as an example:
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org" xml:base="something/else">
<e1>
<e2 xmlns="" xml:id="abc" xml:base="bar/">
<e3 id="E3" xml:base="foo"/>
</e2>
</e1>
</doc>
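The indentation in this example is more than formatting. A parser that preserves whitespace keeps each newline-plus-indent run between tags as a text node, and canonical XML serializes text nodes as they are, so whether those nodes survive parsing changes the canonical bytes. A minimal JDK-only sketch (no Santuario; the class `WhitespaceNodes` and its helpers are illustrative names, not from the question) that parses the same string with `DocumentBuilderFactory` and counts the whitespace-only text nodes:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceNodes {

    // Recursively counts text nodes whose content is only whitespace.
    static int countWhitespaceTextNodes(Node node) {
        int count = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().isBlank()) {
                count++;
            }
            count += countWhitespaceTextNodes(child);
        }
        return count;
    }

    // Parses with the JDK's default DOM parser, which keeps whitespace-only text nodes
    // (there is no DTD here, so none of them is treated as ignorable).
    static int parseAndCount(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return countWhitespaceTextNodes(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String stringXml = "<doc xmlns=\"http://www.ietf.org\" xmlns:w3c=\"http://www.w3.org\" xml:base=\"something/else\">\n <e1>\n <e2 xmlns=\"\" xml:id=\"abc\" xml:base=\"bar/\">\n <e3 id=\"E3\" xml:base=\"foo\"/>\n </e2>\n </e1>\n</doc>";
        // Two whitespace-only text nodes inside each of doc, e1 and e2.
        System.out.println(parseAndCount(stringXml)); // prints 6
    }
}
```

If a library drops these six nodes before canonicalizing, its output cannot match one that keeps them.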
To get the canonical form, I used:
In C# (.NET 8):
string stringXml = "<doc xmlns=\"http://www.ietf.org\" xmlns:w3c=\"http://www.w3.org\" xml:base=\"something/else\">\n <e1>\n <e2 xmlns=\"\" xml:id=\"abc\" xml:base=\"bar/\">\n <e3 id=\"E3\" xml:base=\"foo\"/>\n </e2>\n </e1>\n</doc>";
System.Security.Cryptography.Xml.XmlDsigC14NWithCommentsTransform c14n = new();
System.Xml.XmlDocument documentXml = new();
documentXml.LoadXml(stringXml);
c14n.LoadInput(documentXml);
Stream stream = (Stream)c14n.GetOutput(typeof(Stream));
string result = new StreamReader(stream).ReadToEnd();
using var hash = System.Security.Cryptography.SHA256.Create();
var byteArray = hash.ComputeHash(System.Text.Encoding.UTF8.GetBytes(result));
string sha256hex = Convert.ToHexString(byteArray);
Console.WriteLine(sha256hex);
- The sha256hex result was:
4716238DE66819B69981AE1BD3943451D0EADEEA001583D27CDFDC4255484CB6
- The canonicalized result was:
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org" xml:base="something/else"><e1><e2 xmlns="" xml:base="bar/" xml:id="abc"><e3 id="E3" xml:base="foo"></e3></e2></e1></doc>
In Java (version 21),
with Apache Santuario (org.apache.xml.security) for canonicalization and commons-codec (version 1.17.0) only for hashing:
String stringXml= "<doc xmlns=\"http://www.ietf.org\" xmlns:w3c=\"http://www.w3.org\" xml:base=\"something/else\">\n <e1>\n <e2 xmlns=\"\" xml:id=\"abc\" xml:base=\"bar/\">\n <e3 id=\"E3\" xml:base=\"foo\"/>\n </e2>\n </e1>\n</doc>";
org.apache.xml.security.Init.init();
org.apache.xml.security.c14n.Canonicalizer c14n = org.apache.xml.security.c14n.Canonicalizer.getInstance(org.apache.xml.security.c14n.Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS);
java.io.ByteArrayOutputStream stream = new java.io.ByteArrayOutputStream();
c14n.canonicalize(stringXml.getBytes(java.nio.charset.StandardCharsets.UTF_8), stream, false); // explicit UTF-8; false = secureValidation off
String result = stream.toString(java.nio.charset.StandardCharsets.UTF_8);
String sha256hex = org.apache.commons.codec.digest.DigestUtils.sha256Hex(result);
System.out.println(sha256hex);
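The hashing step only diverges if the bytes differ, but the hex formatting differs between the two programs: `DigestUtils.sha256Hex(String)` hashes the UTF-8 bytes and returns lowercase hex, while .NET's `Convert.ToHexString` returns uppercase. Even identical digests therefore need a case-insensitive comparison. A JDK-only sketch of the same hashing step, assuming Java 17+ for `java.util.HexFormat` (the class `CanonicalHash` and its methods are illustrative names):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

class CanonicalHash {

    // SHA-256 over the UTF-8 bytes of the canonical form, as lowercase hex
    // (the same output format as commons-codec's DigestUtils.sha256Hex(String)).
    static String sha256Hex(String canonicalXml) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonicalXml.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Compares hex digests produced by different languages regardless of letter case.
    static boolean sameDigest(String hexA, String hexB) {
        return hexA.equalsIgnoreCase(hexB);
    }

    public static void main(String[] args) {
        String hex = sha256Hex("abc");
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(hex);
        // Same digest written in the uppercase style of Convert.ToHexString.
        System.out.println(sameDigest(hex, hex.toUpperCase())); // prints true
    }
}
```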
- The sha256hex result was:
dea874fbbe21f9e27e521cfddf61aa54bc1b0b18692e3105455eeca24beea1f6
- The canonicalized result was:
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org" xml:base="something/else">
<e1>
<e2 xmlns="" xml:base="bar/" xml:id="abc">
<e3 id="E3" xml:base="foo"></e3>
</e2>
</e1>
</doc>
In both outputs, the empty <e3 id="E3" xml:base="foo"/> element is written as <e3 id="E3" xml:base="foo"></e3>. See here.
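Comparing the two canonicalized results, the markup itself matches: same namespace declarations, same attribute order, same <e3></e3> expansion. The visible difference is the newline-and-indent text between tags, which the Java output keeps and the C# output does not. That is consistent with System.Xml.XmlDocument discarding insignificant whitespace on load while its PreserveWhitespace property is false (the default); if so, setting documentXml.PreserveWhitespace = true before LoadXml would presumably be the corresponding C# change. The JDK-only sketch below is a diagnostic under that assumption, not a fix: it removes whitespace-only runs between tags from the Java output (which would also destroy significant whitespace in mixed content) and checks that the result equals the C# output. It assumes the Java output reproduces the "\n " indentation of the input string; the class name is illustrative.

```java
class CompareCanonicalOutputs {

    // C# output as shown in the question (no whitespace between tags).
    static final String CSHARP_C14N =
        "<doc xmlns=\"http://www.ietf.org\" xmlns:w3c=\"http://www.w3.org\" xml:base=\"something/else\">"
        + "<e1><e2 xmlns=\"\" xml:base=\"bar/\" xml:id=\"abc\"><e3 id=\"E3\" xml:base=\"foo\"></e3></e2></e1></doc>";

    // Java output, ASSUMING it keeps the "\n " indentation of the input string.
    static final String JAVA_C14N =
        "<doc xmlns=\"http://www.ietf.org\" xmlns:w3c=\"http://www.w3.org\" xml:base=\"something/else\">\n"
        + " <e1>\n <e2 xmlns=\"\" xml:base=\"bar/\" xml:id=\"abc\">\n <e3 id=\"E3\" xml:base=\"foo\"></e3>\n </e2>\n </e1>\n</doc>";

    // Diagnostic only: drops whitespace-only runs between a '>' and the next '<'.
    static String stripInterTagWhitespace(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        System.out.println(JAVA_C14N.equals(CSHARP_C14N)); // prints false
        // Equal strings give equal UTF-8 bytes and therefore equal SHA-256 digests.
        System.out.println(stripInterTagWhitespace(JAVA_C14N).equals(CSHARP_C14N)); // prints true
    }
}
```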