I don't think this requires changing your ToString interface itself; rather, you can change the protocol by which the language invokes the to_string method. That is, don't just unconditionally call .to_string() when a string conversion is needed.
Here's a simple solution which could work either for a single-threaded language, or for a language like Java which allows synchronising (i.e. reentrant locking) on an object. Since your language is garbage-collected, you already have overhead memory per object, so add one extra bit to this overhead for an "is being stringified" flag. The protocol is then: when an implicit string conversion should take place,
- If the flag is already set on this object, then return '...'.
- Otherwise, set the flag, invoke to_string(), unset the flag, and then use the result.
In code, the implementation would look like this (where str is the function called either explicitly or implicitly when a string conversion is needed):
    def str(obj):
        if obj.is_stringifying:
            return '...'
        else:
            obj.is_stringifying = True
            try:
                return obj.to_string()
            finally:
                obj.is_stringifying = False
In case it's not clear, the .is_stringifying flag is internal to the language; it is not a "real" attribute, and user code isn't supposed to see it. It's part of the object header.
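To see the protocol in action, here is a minimal sketch with a cyclic structure. The `Node` class is hypothetical, the conversion function is named `stringify` only to avoid shadowing Python's built-in `str`, and an ordinary attribute stands in for the header bit, which real user code would not be able to see.

```python
class Node:
    """Hypothetical user type used only for illustration."""

    def __init__(self, name):
        self.name = name
        self.children = []
        # Stand-in for the header bit; in the real design this is not a user-visible attribute.
        self.is_stringifying = False

    def to_string(self):
        # Children are converted through the guarded function, never via child.to_string().
        inner = ', '.join(stringify(c) for c in self.children)
        return f'{self.name}[{inner}]'


def stringify(obj):
    """The guarded conversion from the answer, under a different name."""
    if obj.is_stringifying:
        return '...'
    obj.is_stringifying = True
    try:
        return obj.to_string()
    finally:
        obj.is_stringifying = False


a = Node('a')
b = Node('b')
a.children.append(b)
b.children.append(a)  # cycle: a -> b -> a
a.children.append(a)  # direct self-reference

result = stringify(a)
print(result)  # a[b[...], ...]
```

Both back-references to `a` are cut off with '...', and every flag is cleared again once the outermost call returns.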
The '...' string could be made an overrideable property on the interface, or a separate method, if you want to give users more control over the string representation of recursive objects. But in this case, there should be a default implementation for the vast majority of cases where this either doesn't matter or can't happen anyway.
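One way the overrideable placeholder might look, as a rough sketch: the method name `recursion_placeholder`, the `ToString` base class, and the `Box` and `Tree` types are all made up for this example, and `stringify` again stands in for the language's conversion function.

```python
class ToString:
    """Hypothetical interface with a default placeholder for recursive references."""

    is_stringifying = False  # stand-in for the header bit

    def recursion_placeholder(self):
        # Default used by the vast majority of types.
        return '...'

    def to_string(self):
        raise NotImplementedError


def stringify(obj):
    if obj.is_stringifying:
        return obj.recursion_placeholder()
    obj.is_stringifying = True
    try:
        return obj.to_string()
    finally:
        obj.is_stringifying = False


class Box(ToString):
    """Relies on the default placeholder."""

    def __init__(self):
        self.item = None

    def to_string(self):
        return 'Box(' + stringify(self.item) + ')'


class Tree(ToString):
    """Overrides the placeholder to say which node the cycle leads back to."""

    def __init__(self, label):
        self.label = label
        self.kids = []

    def recursion_placeholder(self):
        return f'<cycle to {self.label}>'

    def to_string(self):
        return self.label + '(' + ' '.join(stringify(k) for k in self.kids) + ')'


box = Box()
box.item = box
tree = Tree('root')
tree.kids.append(tree)

box_text = stringify(box)    # Box(...)
tree_text = stringify(tree)  # root(<cycle to root>)
```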
Note that this can still suffer from infinite recursion if someone ever invokes .to_string() directly in their own implementation of .to_string(); users instead must call the str function for explicit conversions, or rely on implicit conversions (e.g. when the object is concatenated with a string, or included in a string template). Python deals with this by naming the method __str__, using the convention that methods whose names begin and end with two underscores are part of a language protocol and shouldn't normally be called directly from user code.
That said, the above does exhibit some weird behaviour in edge cases. If a user puts some other code inside to_string which calls out of the "stringifying" code into the rest of the application, then that code in the application will not observe the correct stringifying behaviour for all objects.
This isn't necessarily a design problem, since it only happens if the ToString implementation is incorrect; you can also get incorrect behaviour from e.g. the standard library's sort() function if you provide an incorrect comparator implementation. But it is a limitation, and using the object header also means to_string() has to be a synchronised method if the program is multi-threaded, which isn't great.
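The edge case can be reproduced with a small sketch. Everything here is hypothetical: `Account` is a user type whose to_string calls out to an `audit` function in the application, and `stringify` stands in for the language's conversion function with an ordinary attribute playing the role of the header bit.

```python
log = []


def audit(obj):
    # Application code that is not part of stringification,
    # but happens to format the object it is given.
    log.append('audit saw ' + stringify(obj))


class Account:
    def __init__(self, owner):
        self.owner = owner
        self.is_stringifying = False  # stand-in for the header bit

    def to_string(self):
        audit(self)  # calls out of the stringifying code into the application
        return f'Account({self.owner})'


def stringify(obj):
    if obj.is_stringifying:
        return '...'
    obj.is_stringifying = True
    try:
        return obj.to_string()
    finally:
        obj.is_stringifying = False


acct = Account('ann')
result = stringify(acct)
print(result)  # Account(ann)
print(log)     # ['audit saw ...']
```

The outer conversion is correct, but `audit` only ever sees '...', because the flag on `acct` is still set while it runs.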
If you want to address this, here's an alternative approach which does change the ToString interface: the idea is that instead of returning a string, you write the object's string representation to a "writer" object. Then the writer object itself can hold the state of which objects are currently being stringified. (The writer can't safely be passed between threads, but concurrent string conversions in different threads would each have their own writer.)
Here we assume that the ToString interface's to_string method accepts a writer object. Then the protocol for string conversion is that a writer is created, the object's to_string method is invoked passing that writer object, and then the writer's result is used.
    class Stringifier:
        def __init__(self):
            self._stringifying = set()
            self.out = ''

        def write_str(self, s):
            self.out += s

        def write_obj(self, obj):
            # a stable, unique id per object
            # could be its address as an integer, if the GC won't move it
            obj_id = id(obj)
            if obj_id in self._stringifying:
                self.write_str('...')
            else:
                self._stringifying.add(obj_id)
                try:
                    obj.to_string(self)
                finally:
                    self._stringifying.remove(obj_id)

    def str(obj):
        writer = Stringifier()
        writer.write_obj(obj)
        return writer.out
This somewhat resembles how Rust's Display trait works, except with the added check to avoid self-recursion. But from the user's side, the implementation for a user-defined type looks the same as for implementing Display.
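From the user's side, an implementation might look like the following sketch. The `Num` and `Pair` types are invented for illustration, the `Stringifier` is repeated so the example runs on its own, and the entry point is called `stringify` rather than `str` to avoid shadowing the Python built-in.

```python
class Stringifier:
    def __init__(self):
        self._stringifying = set()  # ids of objects currently being written
        self.out = ''

    def write_str(self, s):
        self.out += s

    def write_obj(self, obj):
        obj_id = id(obj)
        if obj_id in self._stringifying:
            self.write_str('...')
        else:
            self._stringifying.add(obj_id)
            try:
                obj.to_string(self)
            finally:
                self._stringifying.remove(obj_id)


def stringify(obj):
    writer = Stringifier()
    writer.write_obj(obj)
    return writer.out


class Num:
    """Hypothetical leaf type."""

    def __init__(self, value):
        self.value = value

    def to_string(self, writer):
        writer.write_str(repr(self.value))


class Pair:
    """Hypothetical container type; nested objects always go through write_obj."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def to_string(self, writer):
        writer.write_str('(')
        writer.write_obj(self.left)
        writer.write_str(', ')
        writer.write_obj(self.right)
        writer.write_str(')')


one = Num(1)
shared = stringify(Pair(one, one))  # (1, 1)

cyclic_pair = Pair(one, None)
cyclic_pair.right = cyclic_pair
cyclic = stringify(cyclic_pair)     # (1, ...)
```

Note that the same object appearing twice without a cycle is written out in full both times; only an object that is still in the middle of being written gets replaced by '...'.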
This kind of interface (where you pass a writer around) can also be more efficient than recursive string concatenation, since it doesn't have to build a lot of temporary string objects in the process. Additionally, if the string is meant to be written to e.g. console output or a file, then an appropriate writer object can be provided such that the object's string representation never has to exist in memory all at once.
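As a sketch of the streaming variant, the writer below forwards each piece to a file-like sink instead of accumulating a string. The names `StreamStringifier`, `print_obj`, and `Chain` are assumptions made for this example; the sink only needs a `write` method, so `sys.stdout`, an open file, or an `io.StringIO` all work.

```python
import io
import sys


class StreamStringifier:
    """Hypothetical writer that streams pieces to a sink as they are produced."""

    def __init__(self, sink):
        self._sink = sink
        self._stringifying = set()

    def write_str(self, s):
        self._sink.write(s)

    def write_obj(self, obj):
        obj_id = id(obj)
        if obj_id in self._stringifying:
            self.write_str('...')
        else:
            self._stringifying.add(obj_id)
            try:
                obj.to_string(self)
            finally:
                self._stringifying.remove(obj_id)


def print_obj(obj, sink=sys.stdout):
    # The full representation is never assembled as one string here.
    StreamStringifier(sink).write_obj(obj)
    sink.write('\n')


class Chain:
    """Hypothetical linked-list node."""

    def __init__(self, value):
        self.value = value
        self.next = None

    def to_string(self, writer):
        writer.write_str(f'{self.value} -> ')
        if self.next is None:
            writer.write_str('end')
        else:
            writer.write_obj(self.next)


a = Chain(1)
b = Chain(2)
a.next = b
b.next = a  # cycle: 1 -> 2 -> 1

buf = io.StringIO()
print_obj(a, buf)
streamed = buf.getvalue()  # '1 -> 2 -> ...\n'
```

Calling `print_obj(a)` with the default sink writes the same text straight to the console.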