5
\$\begingroup\$

I was concatenating a lot of strings lately, usually the results of std::format:

std::string str;
str += std::format("the answer is {}\n", 42);
str += std::format("what is the {}?\n", "question");

I got tired of temporary strings and switched to std::format_to:

std::string str;
std::format_to(std::back_inserter(str), "the answer is {}\n", 42);
std::format_to(std::back_inserter(str), "what is the {}?\n", "question");

But now it was horribly verbose, so I created a helper:

auto make_append_format(std::string& str)
{
  return [&str]<class... TArgs>(const std::format_string<TArgs...> fmt, TArgs&&... args)
  { return std::vformat_to(std::back_inserter(str), fmt.get(), std::make_format_args(args...)); };
}

And now the code looks a bit neater:

std::string str;
const auto append_format = make_append_format(str);
append_format("the answer is {}\n", 42);
append_format("what is the {}?\n", "question");

Any input on style and potential issues with make_append_format? Should I forward the parameter pack?

\$\endgroup\$
1
  • \$\begingroup\$ You could modify make_append_format to take a second parameter for the inserter, defaulting to std::back_inserter, in case you want to use it with other insertion iterators later. It would look something like this: template<typename Func=decltype(std::back_inserter<std::string>)> auto make_append_format(std::string& str, Func f = std::back_inserter). It's not necessary or anything, though, and there isn't really a cleaner way to do it since auto parameters don't play nicely with default arguments; it'd just make it a bit more reusable if you ever run into the same situation again. \$\endgroup\$ Commented Sep 2 at 22:05

5 Answers

11
\$\begingroup\$

Why are you concatenating many formatted strings?

I can think of a few typical reasons, e.g. generating a web page or performing some kind of logging. In those cases it makes more sense to use an (output) stream, as you may not want to buffer everything in memory in the first place. You can just generate the formatted text and write it out to a socket, file or any other sink. In that case you can use the << operator for formatting. Or you could keep formatting small portions of the text using format and then put them into the output stream - generally there is no need to micro-optimize at that level.

In other cases you may e.g. have a destination in a GUI with its own "sink".


Personally I don't see any intermediate variable in your first code block that just uses the += operator and format. This is a well known method and to me it is much more readable than some self-invented method. So if there is any advantage of using such a method, it is usually offset against the next developer having to get to grips with your code.


In this case I would recommend sticking with the basics and, if applicable, streaming. Streams can still end up in a string-sink, and you can implement any buffering or other strategies that may be required (though, in C++, that can also be done with std::string itself).

To me, implementing this kind of low-level function just to get rid of a bit of syntax isn't worth it.

\$\endgroup\$
5
  • 8
    \$\begingroup\$ The OP's first version doesn't have any temporary named variables, but it has temporary std::string objects which get created and then destroyed after being operands to +=. If memory is allocated and freed for them, this is potentially inefficient. Not that this invalidates your whole answer, but there is a real point here if we're concerned about efficiency. \$\endgroup\$ Commented Sep 2 at 0:45
  • \$\begingroup\$ @PeterCordes Thanks, understand. std::back_inserter(str) does avoid temporary string values and keeps everything in one string instance, but without reserving space you’ll still pay for reallocation and buffer copies as it grows. So I'd say that streaming would still be a good idea. \$\endgroup\$ Commented Sep 2 at 13:01
  • 1
    \$\begingroup\$ You have a good point that OP should use std::stringstream, not std::string, for repeated concatenation. That in no way requires switching from format_to to <<, nor would that switch be an improvement. \$\endgroup\$ Commented Sep 3 at 21:47
  • \$\begingroup\$ @BenVoigt Yeah, which is why I updated to include " Or you could keep formatting small portions of the text using format and then put them into the output stream - generally there is no need to micro-optimize at that level." because I'm generally not in favor of formatting with << myself. Feel free to edit the answer if you know a way to improve that part of it. \$\endgroup\$ Commented Sep 4 at 11:28
  • 2
    \$\begingroup\$ @MaartenBodewes: I interpreted that sentence in your answer to mean std::format into a temporary string, then append that temporary to the stream. But you can use std::format_to directly into the stream. The format vs operator<< is not so much about performance as about usability -- for localization into different languages, the ability of a format string to control the order of placements is very helpful. \$\endgroup\$ Commented Sep 4 at 14:47
5
\$\begingroup\$

Overall I agree with Maarten, but here is another issue that you might face.

I have a different concern than what you expect. This way of handling string concatenation is error-prone. You can easily fall into a trap if multiple strings are built at the same time. Right now, you might not need it, but in the future it could really cause issues. Keeping variable name within the generation is far safer. This will also remove the need for binding and turn your function into a simple forwarder where the parameter would be turned from string to iterator.

\$\endgroup\$
2
  • 3
    \$\begingroup\$ "Keeping variable name within the generation is far safer" -> Just rename the forwarder to match the variable being wrapped, const auto format_into_str = make_append_format(str); \$\endgroup\$ Commented Sep 3 at 21:50
  • \$\begingroup\$ That is not checked by the compiler. \$\endgroup\$ Commented Sep 5 at 6:11
5
\$\begingroup\$

I'll answer the specific question:

Should I forward the parameter pack?

I'd say that you're doing the right thing here; it's the same pattern as std::format():

template<class... Args>
std::string std::format(std::format_string<Args...> fmt, Args&&... args)
{
    return std::vformat(fmt.get(), std::make_format_args(args...));
}
\$\endgroup\$
3
\$\begingroup\$

You wrote:

std::string str;
const auto append_format = make_append_format(str);
append_format("the answer is {}\n", 42);
append_format("what is the {}?\n", "question");

I don't see much difference between that and:

std::string str;
auto out = std::back_inserter(str);
out = std::format_to(out, "the answer is {}\n", 42);
out = std::format_to(out, "what is the {}?\n", "question");

Note that you could even omit the out = part and the code would still work — although I'll stop short of recommending that you omit it, because I think it's convenient and useful that the idiom out = std::format_to(out, ...) Does The Right Thing for other kinds of iterators too. For example, you might switch to a core-language array and pointers:

char buffer[100];
auto out = buffer;
out = std::format_to(out, "the answer is {}\n", 42);
out = std::format_to(out, "what is the {}?\n", "question");
*out = '\0';

So your make_append_format helper doesn't need to exist at all; but, FWIW, here's what I'd say about its implementation:

Pretty sure it could be constexpr (and certainly should be constexpr in C++26).

The lambda is way too arcane. You have:

return [&str]<class... TArgs>(const std::format_string<TArgs...> fmt, TArgs&&... args)
{ return std::vformat_to(std::back_inserter(str), fmt.get(), std::make_format_args(args...)); };

you should just write:

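return [&](const auto& fmt, const auto&... args) {
  // fmt is deduced as a plain string here (so no .get()), which means the
  // format string is checked at run time rather than at compile time
  return std::vformat_to(std::back_inserter(str), fmt, std::make_format_args(args...));
};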

This gets rid of all those angle-brackets, and also the scary [&str] capture on the lambda. (Although maybe it's justified in this case, because you're returning the captureful lambda to your caller: if someone accidentally changed (std::string& str) into (std::string str), your code would keep compiling but stop working, and it would have undefined behavior. So writing the unusually verbose [&str] might arguably be a way to keep the reader on their toes here.)

But, as I said, it'd be better not to have this function at all.

\$\endgroup\$
2
\$\begingroup\$

If you know how long the string will be, or at least an upper bound, you could first allocate the string, filling it with null characters, then write each segment to the end. You might use std::format_to_n with the address of the first null character as the buffer and the remaining number of characters as the maximum size. After appending each substring, you can take the number of characters written, add it to the index of the first null character in the buffer, and subtract it from the remaining size of the destination buffer. When you’re done, you can resize the string to its correct length, then shrink_to_fit.

If you allocate enough space up front, you will not need reallocation that possibly copies the entire array, and each portion of the string will be written in place, rather than allocating a temporary buffer, writing to that and then copying from it, or else adding each character with push_back. If necessary, you could add more zero bytes to the end of the std::string when you detect there are too few left, then recalculate the address because it may have changed.

\$\endgroup\$
3
  • 1
    \$\begingroup\$ That's a lot of work to avoid simply calling str.reserve(expected_size). Don't manually keep track of the insertion index when std::string already does that. \$\endgroup\$ Commented Sep 3 at 21:51
  • \$\begingroup\$ @BenVoigt That’s unfortunately not the case, as of last I checked. If you append, you’re double-buffering, writing to a temporary std::string and then copying to the final destination. If you reserve the destination buffer, you still might need to reallocate the temporary strings and copy them. If you use a back_inserter as your output iterator, you have to push characters at the end one at a time in a sub-optimal way. You really do optimize the concatenation by pre-allocating the buffer with resize rather than reserve and tracking how much of the padding you’ve used. \$\endgroup\$ Commented Sep 3 at 22:56
  • \$\begingroup\$ However, the minor gains might or might not be worth the extra complexity. \$\endgroup\$ Commented Sep 4 at 22:25
