69

I've recently been reading about immutable strings ("Why can't strings be mutable in Java and .NET?" and "Why .NET String is immutable?") as well as some material about why D chose immutable strings. There seem to be many advantages.

  • trivially thread safe
  • more secure
  • more memory efficient in most use cases.
  • cheap substrings (tokenizing and slicing)

Not to mention that most newer languages have immutable strings: D 2.0, Java, C#, Python, etc.

Would C++ benefit from immutable strings?

Is it possible to implement an immutable string class in C++ (or C++0x) that would have all of these advantages?


update:

There are two attempts at immutable strings: const_string and fix_str. Neither has been updated in half a decade. Are they even used? Why didn't const_string ever make it into Boost?

4
  • 43
    A very elaborate and convincing argument you made there, BlueRaja. Commented May 26, 2010 at 21:04
  • 6
    Well, BlueRaja didn't actually make an argument, as you've all so clearly pointed out. But he might be right, in that C++ is perhaps too much of a hybrid language for purist attempts at an immutable string to find a home. This says more about the C++ culture than the language itself, of course. Commented Jun 1, 2010 at 16:05
  • 5
    Objection! Ruby's string is not immutable! Commented May 5, 2011 at 9:42
  • 1
    They have not been updated since 2005, but there aren’t many bugs reported, so I think it’s fine to use. Commented Feb 5, 2018 at 15:23

12 Answers

55

I found that most people in this thread do not really understand what an immutable_string is. It is not only about constness. The real power of immutable_string is in performance (even in single-threaded programs) and memory usage.

Imagine that all strings are immutable, and that every string is implemented like this:

class string {
    char* _head ;
    size_t _len ;
} ;

How can we implement a sub-str operation? We don't need to copy any char. All we have to do is assign the _head and the _len. Then the sub-string shares the same memory segment with the source string.

Of course, we cannot really implement an immutable_string with only these two data members. A real implementation might need a reference-counted (or flyweight) memory block, like this:

class immutable_string {
    boost::flyweight<std::string> _s ;
    const char* _head ;
    size_t _len ;
} ;

Both the memory and the performance would be better than the traditional string in most cases, especially when you know what you are doing.
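As a rough illustration of the idea above, here is a minimal sketch that swaps the flyweight for a plain std::shared_ptr to a buffer that is never modified after construction. The member names (buf_, head_, len_) and the view() helper are assumptions made for this sketch, not part of const_string or fix_str, and it needs C++17 for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch: a shared, never-modified buffer plus a (head, len) window.
class immutable_string {
public:
    // Takes ownership of the characters; the buffer is read-only from here on.
    explicit immutable_string(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))),
          head_(buf_->data()),
          len_(buf_->size()) {}

    // O(1) substring: no characters are copied, only the window changes.
    immutable_string substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        if (count > len_ - pos) count = len_ - pos;  // clamp to the end
        return immutable_string(buf_, head_ + pos, count);
    }

    std::string_view view() const { return std::string_view(head_, len_); }
    const char* data() const { return head_; }
    std::size_t size() const { return len_; }

private:
    immutable_string(std::shared_ptr<const std::string> buf,
                     const char* head, std::size_t len)
        : buf_(std::move(buf)), head_(head), len_(len) {}

    std::shared_ptr<const std::string> buf_;  // keeps the characters alive
    const char* head_;                        // start of this string's window
    std::size_t len_;                         // length of this string's window
};
```

With this sketch, immutable_string("hello world").substr(6, 5) refers to the same memory as the source string and only adjusts the pointer and the length. The price is one reference-count update per substring, and a short slice keeps the whole original buffer alive.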

Of course C++ can benefit from an immutable string, and it would be nice to have one. I have checked the boost::const_string and the fix_str mentioned by Cubbi. Those should be what I am talking about.


26

As an opinion:

  • Yes, I'd quite like an immutable string library for C++.
  • No, I would not like std::string to be immutable.

Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.

4 Comments

The closest I've come to immutable strings in C++ was a "span" class that has two const pointers, one for the begin and one for the end. It did not manage memory, but did support the usual utility functions (find, etc). As a result, it turned out to be very useful for parsing.
@StevenSudit: Many large projects have that, though it's commonly called stringref or similar.
@MooingDuck That's true. Google calls its version StringPiece.
There is also std::string_view in C++17; it appears to be const-only.
9

My conclusion is that C++ does not require the immutable pattern because it has const semantics.

In Java, if you have a Person class and you return the String name of the person with the getName() method, your only protection is the immutable pattern. If it were not there, you would have to clone() your strings all night and day (as you have to do with data members that are not typical value objects but still need to be protected).

In C++ you have const std::string& getName() const. So you can write SomeFunction(person.getName()) where it is like void SomeFunction(const std::string& subject).

  • No copy happened
  • If anyone wants to copy he is free to do so
  • Technique applies to all data types, not just strings
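A minimal sketch of the pattern described above. The Person class body and the return value of SomeFunction are assumptions added for illustration; only the two signatures come from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Returns a read-only reference to the member; no copy is made.
    const std::string& getName() const { return name_; }

private:
    std::string name_;
};

// Accepts the name by const reference, so it cannot modify the caller's string.
// (Returning the length is just a placeholder body for this sketch.)
std::size_t SomeFunction(const std::string& subject) {
    return subject.size();
}
```

Calling SomeFunction(person.getName()) passes the stored string without copying it. A caller who wants an independent copy can still write std::string copy = person.getName(); and modify that copy freely.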

1 Comment

Correction! Immutable strings can be useful in multi-threaded programs, as they add zero overhead for handling concurrency. And most of the time you do not edit your strings; you simply replace them.
3

You're certainly not the only person who thought that. In fact, there is the const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into Boost in mind. And here's a slightly newer library, fix_str by Roland Pibinger. I'm not sure how tricky full string interning at run time would be, but most of the advantages are achievable when necessary.


3

I don't think there's a definitive answer here. It's subjective, if not because of personal taste then at least because of the type of code one most often deals with. (Still, a valuable question.)

Immutable strings are great when memory is cheap—this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)

You can create an immutable string class in C++, and you can make it largely compatible with std::string—but you will still lose when comparing to a built-in string class with dedicated optimizations and language features.

std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.

6 Comments

If std::string is the best standard string, but you use it very rarely, because of its too many drawbacks, what DO you use?
CString (please don't kill me) because of >10 years of accrued libraries, better native API interop (including wchar_t / char conversions). Back then, the well-defined copy-on-write would also be an advantage over std::string's lack of performance guarantees.
@Mike Both the MFC and the ATL versions, since they are source-code-compatible but are two distinct implementations that don't match. It's a major WTF to always have an "ATL" and an "MFC" version of libraries.
Immutable strings go back at least to the late 1970s if not before; I don't think memory was particularly cheap then. In Applesoft BASIC, Commodore BASIC, or many other implementations, each element in a string array would hold a two-byte pointer and a one-byte length; the string data itself would be stored in a pool with no other overhead. A statement like A$(4)=A$(6) would merely copy the length and the pointer; it would not have to copy any data. Microsoft's garbage-collection algorithm was not well implemented (dog slow), but it was possible for code to determine when a GC cycle...
...would be coming soon and use some PEEKs and POKEs to add "generations" to the garbage collector, or use a third-party GC which was more efficient. Although it was not uncommon for programs to store fixed collections of strings with an overhead of one byte per string or--for ASCII strings--zero (use bit 7 of each byte to say if it's the last), I don't think one could have a mutable array of mixed-length strings with less overhead. The problem with immutable strings is that a GC has to be able to find them, which would be workable in some languages, but not so well in C++.
2
const std::string

There you go. A string literal is also immutable, unless you want to get into undefined behavior.

Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
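To illustrate the smart-pointer idea from the edit above, here is a minimal sketch. The StringRef alias and the make_string helper are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical alias: a rebindable handle to a string that cannot be modified
// through the handle.
using StringRef = std::shared_ptr<const std::string>;

// Hypothetical helper: allocates the string once; the result converts
// implicitly from shared_ptr<std::string> to shared_ptr<const std::string>.
StringRef make_string(const char* text) {
    return std::make_shared<std::string>(text);
}
```

A StringRef can be pointed at a new string with plain assignment, much like a string variable in Python: after StringRef b = a; a = make_string("world");, b still refers to the old, unchanged string. Copying a handle only bumps a reference count; the characters themselves are never copied.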

6 Comments

You'd need more than that, e.g. you'd want std::string::replace to return a modified copy rather than cause a compile error.
@peterchen -> const std::string orig; const std::string copy = std::string(orig).replace(...); - what would an immutable string do that's better?
IMHO assignment of a new string is mutating a string, and from what I remember of API's that had such a construct this is how they took it too. What you want really does sound more like an assignable reference and it seems to me that something like a smart pointer would be a better answer to that than making a const string that's assignable. I also do find const std::string vars useful from time to time so I'd have to beg to differ there.
It's not the correct interface for an immutable object, it's two statements instead of one, it's an implementation detail leaking to the calling code? --- An object should make the right thing easy, the wrong thing hard (or impossible). Do I need to put a "don't show this string to other threads" comment between the copy and the replace, and afterwards a "now you can"? --- I agree that const std::string is a close approximation, but without some of the benefits.
@Peter: It might be nice if the language supported two types of replace: The current one and replaced, where the latter operates on a const reference and returns a copy that has the replacements. The latter might be able to avoid copying everything twice. However, so long as we lack such a function, we're stuck with Noah's work-around, which is a reasonable alternative. The better answer would be full support for an immutable variant of std::string.
1

Immutable strings are great if, whenever it's necessary to create a new string, the memory manager will always be able to determine the whereabouts of every string reference. On most platforms, language support for such an ability could be provided at relatively modest cost, but on platforms without such language support built in, it's much harder.

If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, it would be necessary for the string allocator to be able to walk the stack to find all string references; the only execution-time cost of that would be requiring a consistent function-call approach [e.g. not using tail calls, and having every non-leaf function maintain a frame pointer]. Each memory area allocated with new would need to have a bit to indicate whether it contained any strings and those that do contain strings would need to have an index to a memory-layout descriptor, but those costs would be pretty slight.

If a GC weren't able to walk the stack, then it would be necessary to have code use handles rather than pointers, create string handles when local variables come into scope, and destroy the handles when they go out of scope. That means much greater overhead.


0

Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.

8 Comments

I would not call copy-on-write strings immutable. Immutable strings are a subset of COW strings. That is, everything an immutable string can do a COW string can do as well, but the reverse is not true. It's these extra abilities that make COW strings suck in concurrent environments.
And the advantage to thread safety is completely gone once you throw COW in the mix (you need to lock, either explicitly or inside the library itself) whenever you are performing a write to ensure thread safety.
@Caspin - true but if you are going to have immutable strings you might as well make efficient use of them with COW
@iconiK: That is the reason for the comment '(... or inside the library itself)'. The thing is that locking is required, and it can be a costly operation. The fact that it is hidden from the user means that there are fewer chances of doing it wrong in user code, but it does not take away the costs. Compare that with Java's immutable strings: you can copy references and know they will never be changed, and you can create modifications with almost no cost at all (allocations in a generational GC are fast, about 10 CPU instructions).
Copy-on-write, as such, doesn't actually require locks; it just means that actions that appear to modify an instance actually point it to a new buffer, leaving the original alone. Replacing a pointer is almost always atomic. The hidden cost is in managing the lifespan of the original, which is usually done by reference counting. Even with interlocked operations, this counting is expensive, which is why std::string implementations have indeed moved away from it. In GC'd languages, like C#, this is a non-issue, so we have immutable strings, though without COW semantics.
0

constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...

2 Comments

@Steven Maybe we are talking about different things when we say "value semantics". C# strings are always handled through a transparent level of indirection (reference semantics), whereas C++ strings are not (value semantics).
Maybe. In C#, actual value types (such as int) inherit from System.ValueType and are passed as copies, while reference types are passed by reference and (normally) compared by reference. While C# strings are references, they have value semantics in that they're immutable and are compared by content, not address. In C++, a std::string is a value, but it contains a reference (pointer, actually) to a mutable buffer. Therefore, passing a copy of a C++ string invokes the copy constructor to duplicate the buffer, whereas passing a const reference avoids the overhead. I hope that's clearer.
0

Would C++ benefit from immutable strings? Probably not much.

An immutable string is not the same as a read-only string. Immutability guarantees that no change to the observable state of the string may occur outside of what your own code can affect, to the point that if you take any code passing std::string by value, you could just replace it with such an immutable string pointer and everything will work (you cannot distinguish passing such a string by value from passing by reference).

C++ guarantees this only when you create a const object right from the beginning ‒ adding const to an existing object does not make it immutable, and you can remove it via const_cast any time without any issues. This basically means that you cannot make a type immutable in C++, only an object:

const std::string str("hello");
// or
const std::string &str = *new const std::string("hello");

There are actually two guarantees at play here ‒ you cannot modify str via const_cast<std::string&>(str) (undefined behaviour when modifying const object), and you cannot modify the character data via const_cast<char*>(str.data()) (undefined behaviour for data() const). You could theoretically get const_cast<std::string&>(str).data(), and modifying that might be fine for a trivial std::string implementation, but the standard std::string might be optimized so that it stores the character data (up to a size) in the object itself, thus you still risk modifying the object.

How do you pass an immutable object around? You can't.

void f(const std::string &str);

This is no longer an immutable object, but a read-only object ‒ you cannot modify it through str, but it can still change at any time.
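The difference between read-only and immutable can be demonstrated with a small sketch. The changes_under_reader helper is a hypothetical name introduced for this example; it holds only a const reference, yet still observes a change made by a caller who owns a non-const handle to the same object.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reads the string through a const reference, runs a
// caller-supplied action, then checks whether the string it can only read
// has changed underneath it.
template <typename Mutator>
bool changes_under_reader(const std::string& str, Mutator mutate) {
    const std::string before = str;  // snapshot taken through the const reference
    mutate();                        // someone else modifies the original object
    return str != before;            // the "const" view now shows different content
}
```

For example, with std::string s = "hello";, the call changes_under_reader(s, [&s] { s += " world"; }) returns true: the function never wrote through str, but the object it refers to changed anyway.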

To have an actually immutable string, you need to encode this in the type system somehow. The best you can do is to wrap it in another object where it is immutable, like const std::tuple<const std::string> &str ‒ there is no tuple variance that would permit getting this reference from a mutable std::tuple<std::string>, and so, when created, the string is already a const object.

Lastly, you also need guarantees about the lifetime of such an object, since even an immutable object can be deleted. That is thankfully not so complicated ‒ std::shared_ptr<const std::tuple<const std::string>> gives you all the guarantees ‒ once you observe its value, it should stay constant forever. This is pretty much the basis of what languages with immutable strings give you.
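A minimal sketch of that construction follows. The ImmutableString alias and the make_immutable and value_of helpers are hypothetical names introduced for this example; only the underlying type comes from the paragraph above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical alias for the type described in the text.
using ImmutableString = std::shared_ptr<const std::tuple<const std::string>>;

// Hypothetical factory: the string element is created as a const object,
// so modifying it later (even via const_cast) would be undefined behaviour.
ImmutableString make_immutable(std::string text) {
    return std::make_shared<std::tuple<const std::string>>(std::move(text));
}

// Hypothetical accessor: read-only access to the characters.
const std::string& value_of(const ImmutableString& s) {
    return std::get<0>(*s);
}
```

Copies of an ImmutableString share one allocation, and the shared_ptr keeps the string alive for as long as any copy exists, so a value observed through one handle stays the same for the lifetime of that handle.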

Now, to reason about the answer, how would C++ benefit from using such a string? Are there any places where read-only strings are stored, without giving you the control over the specific type? There are not that many I could come up with:

  • std::exception ‒ constructing it from an error message needs to copy the string to preserve it, but with an immutable string pointer, it could just store that and return in what(). However, you can create your own exception types that do that.
  • std::locale ‒ likewise, constructing from a locale name could just store the immutable string inside, without copying. However, locale names are not that frequently used, and are short enough to make small string optimization kick in for most of cases.
  • std::messages::get() could be a primary target for immutable strings, as could any other "string catalogs" ‒ retrieval could happen relatively often, and the strings are moderately long and constant enough. Nothing stops you from adding a caching layer to this object, however.

I encourage you to find more such examples. Indeed, there are many more cases where strings are accepted and processed, or generated and returned, than simply stored.

That being said, the situation in user code is drastically different ‒ once you store user records or configurations, you become more in need of such a type. So while the C++ language itself might not benefit from it, your code definitely would! Such a type needs to:

  • Be easily constructible from a string literal without copying, creating an immutable string with permanent lifetime.
  • Be move-constructible from std::string, taking ownership of its character data.
  • Be constructible without copying from other objects with lifetime management that exhibit immutability, such as std::shared_ptr<std::array<const char, N>> or the aforementioned tuple.
  • Be constructible by value from std::string, char iterators and the usual stuff, copying the data to its internal memory.
  • Be "unsafe-constructible" without copying from things like std::shared_ptr<std::span<char>>, where immutability is not guaranteed by the type. This option needs to be clearly marked as dangerous, and modifying the underlying data in any way during the lifetime of the result should cause undefined behaviour.
  • Interface well with existing std::shared_ptr-based code, preferably be handled as std::shared_ptr itself.
  • Allow allocation-less slicing, producing views into its data with the same lifetime.
  • Include compatibility with C strings in the form of const char *data() const for retrieval. Note that sliced strings may be missing the '\0' character at the end, and so there needs to be a way to detect this having happened and return std::shared_ptr<const char*> instead (either aliased to the original data, or to a copy thereof with '\0' at the end).

This interplay between std::shared_ptr and std::string makes such a type definitely preferable to be defined by the standard rather than user code, as some situations could be handled much better without unnecessary allocations or indirection, and the immutability guarantee enables important optimizations.


-1

Strings are mutable in Ruby.

$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil

  • trivially thread safe

I would tend to forget safety arguments. If you want to be thread-safe, lock it, or don't touch it. C++ is not a convenient language, have your own conventions.

  • more secure

No. As soon as you have pointer arithmetic and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.

  • more memory efficient in most use cases.

Unless you implement CPU-intensive mechanisms, I don't see how.

  • cheap substrings (tokenizing and slicing)

That would be one very good point. Could be done by referring to a string with backreferences, where modifications to a string would cause a copy. Tokenizing and slicing become free, mutations become expensive.


-5

C++ strings are thread safe. All immutable objects are guaranteed to be thread safe, but Java's StringBuffer is mutable like a C++ string is, and both of them are thread safe. Why worry about speed? Declare your method or function parameters with the const keyword to tell the compiler that the string will be immutable in that scope. Also, a string object could be made immutable on demand: when you append other strings to the main string, you keep a list of strings, and they are joined together only at the point where you actually need the whole string.

To my knowledge, immutable and mutable objects operate at the same speed, except for their methods, which is a matter of pros and cons. Constant primitives and variable primitives move at different speeds because, at the machine level, variables are assigned to a register or a memory location, which requires a few binary operations, while constants are labels that don't require any of those and are thus faster (or less work is done). This works only for primitives and not for objects.
