
I currently have this XML as a single-line string:

<?xml version="1.0" encoding="ISO-8859-1"?>  <entity>  <id>8624</id>  <name>Test_Report.csv</name>  <startDate>24/05/2021 9:15 am</startDate>  <level>info</level>  </entity>

I did:

message = message.replaceAll("\\s+","\\n");

This gives me:

<?xml version="1.0"\nencoding="ISO-8859-1"?>\n<entity>\n<id>8624</id>\n  <name>Test_Report.csv</name>\n<startDate>24/05/2021\n9:15 am</startDate>\n<level>info</level>\n </entity>

I want the output to keep the spaces inside the XML element data untouched and look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<entity>
<id>8624</id>
<name>Test_Report.csv</name>
<startDate>24/05/2021 9:15 am</startDate>
<level>info</level>
</entity>
  • That would break the inside of startDate as well, though. Commented May 24, 2021 at 9:03
  • Please update the question to say 1) whether you want to replace just spaces (" ") or all whitespace ("\s"), and 2) whether you want to keep the spaces within the XML element data untouched, as your example suggests. Commented May 24, 2021 at 9:13

3 Answers

2
message = message.replaceAll("(?<=>)\\s*(?=<)","\n");
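
For readers unfamiliar with the syntax: (?<=>) is a lookbehind that requires a preceding > and (?=<) is a lookahead that requires a following <, so only the whitespace between two tags is matched and replaced. A minimal sketch of that idea follows; the class and method names are made up for illustration, and the sample string is a shortened version of the one in the question.

```java
class LookaroundDemo {
    // Replaces the whitespace that sits directly between '>' and '<' with a newline.
    // The lookbehind and lookahead only assert their neighbours; they consume nothing,
    // so the tags themselves are left as they are.
    static String splitTags(String xml) {
        return xml.replaceAll("(?<=>)\\s*(?=<)", "\n");
    }

    public static void main(String[] args) {
        String message = "<entity>  <startDate>24/05/2021 9:15 am</startDate>  </entity>";
        // The space inside startDate is not preceded by '>', so it is not touched.
        System.out.println(splitTags(message));
    }
}
```

Because the whitespace inside the element text is not bracketed by > and <, the date and time stay on one line.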

1

The regex should match only the whitespace between a > and the following < character, so it looks like this: \>\s*?\< (it also works when there is no whitespace at all between the >< of adjacent XML tags).

The replacement string is then >\n<

Note that the second parameter of String.replaceAll is a replacement string, not a regular expression, so the newline is written as a plain \n with no extra backslash. The replacement is not treated entirely literally, however. From the Javadoc:

backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences, and backslashes are used to escape literal characters in the replacement string.
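
To illustrate the quoted caveat, here is a small sketch showing how a $ in the replacement is read as a group reference, and how Matcher.quoteReplacement can be used when the replacement should be inserted verbatim. The class and helper names are hypothetical and the inputs are invented for the example.

```java
import java.util.regex.Matcher;

class ReplacementEscapes {
    // Escapes '$' and '\' in the replacement so it is inserted as plain text.
    static String replaceLiterally(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Here "$1" refers to capture group 1, so the matched "-" is inserted.
        System.out.println("a-b".replaceAll("(-)", "[$1]"));        // prints a[-]b
        // After quoting, "$1" is just two ordinary characters.
        System.out.println(replaceLiterally("a-b", "(-)", "[$1]")); // prints a[$1]b
    }
}
```

The replacement >\n< used in this answer contains neither a $ nor a backslash at runtime (the \n is already a newline character by then), so it needs no quoting.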

message = message.replaceAll("\\>\\s*?\\<", ">\n<");

You can run it online here - https://www.mycompiler.io/view/1Bm8MzU


If your XML string is already formatted (i.e. indented with spaces), you can keep the indentation by capturing the whitespace in a group, \>(\s*?)\<, and inserting a newline before that group with the replacement >\n$1<.

message = message.replaceAll("\\>(\\s*?)\\<", ">\n$1<");

You can run it online here - https://www.mycompiler.io/view/5zaV7tf
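
In case the link is unavailable, a self-contained sketch of the indentation-preserving variant could look like this. The class and method names are invented for illustration, the input is a shortened form of the string from the question, and the pattern is written without the optional backslashes before > and <.

```java
class KeepIndentation {
    // Captures the whitespace between '>' and '<' in group 1 and re-emits it
    // after the inserted newline, so existing indentation is kept.
    static String breakKeepingSpaces(String xml) {
        return xml.replaceAll(">(\\s*?)<", ">\n$1<");
    }

    public static void main(String[] args) {
        String message = "<entity>  <id>8624</id>  <name>Test_Report.csv</name>  </entity>";
        // Each tag after the first starts on its own line, still preceded by its two spaces.
        System.out.println(breakKeepingSpaces(message));
    }
}
```

Note that if the captured whitespace already contains line breaks, they are kept as well, so an input that is already split across lines would end up with extra blank lines.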


1
String message = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>  <entity>  <id>8624</id>  <name>Test_Report.csv</name>  <startDate>24/05/2021 9:15 am</startDate>  <level>info</level>  </entity>";
String result = message.replaceAll(">\\s*<", ">\n<");
System.out.println(result);

This matches every run of zero or more whitespace characters located between a > and a < character and replaces each match with >\n<, outputting:

<?xml version="1.0" encoding="ISO-8859-1"?>
<entity>
<id>8624</id>
<name>Test_Report.csv</name>
<startDate>24/05/2021 9:15 am</startDate>
<level>info</level>
</entity>
