9
$\begingroup$

Suppose a language has a rust-style trait system but is garbage collected and most types are reference types by default.

I could create a trait like this:

trait ToString {
    fn to_string(self) -> String;
}

Typical implementation would be something like:

impl ToString for List<T: ToString> {
    fn to_string(self) -> String { "[" + self.map(|i: T|i.to_string()).join(',') + "]" }
}

But if I then do:

let mut a = []; // [] creates a mutable vector, there are no arrays
a.push(a); // Type of a is decided based on what is added to it
print!(a);

It will create an infinite loop printing the list forever.

In Python, if you do this you'll get:

a = []
a.append(a)
print(a)
# [[...]]

It will detect the cycle and print [...] instead of the full list if the list was already in the process of being printed.

How can I modify my ToString interface to allow types to detect if they are already being printed and prevent an infinite loop for recursive data structures. Ideally it would work for arbitrary data structures and not just lists, while also keeping it ergonomic to implement the interface.

$\endgroup$
6
  • $\begingroup$ You might also want to check out how Racket does it with mpairs. It displays more info about the structure of the cycle(s). For example, if you have (define a (mlist 1 2 3 4)) (set-mcar! (mcdr (mcdr a)) (mcdr a)) (set-mcdr! (mcdr (mcdr a)) a) then when you print a you get #0=(mcons 1 #1=(mcons 2 (mcons #1# #0#))) $\endgroup$ Commented Sep 1, 2023 at 13:43
  • $\begingroup$ @DavidYoung FYI, Racket adopted that from Common Lisp. $\endgroup$ Commented Sep 1, 2023 at 19:10
  • $\begingroup$ If your language is statically typed, how does a.push(a) even work? What is the inferred type? $\endgroup$ Commented Sep 3, 2023 at 12:17
  • $\begingroup$ A list of lists of list of lists... @xigoi $\endgroup$ Commented Sep 3, 2023 at 12:39
  • $\begingroup$ @mousetail In that case, couldn't you just detect at compile time that the type is self-referential and use a stringifying function that doesn't descend? $\endgroup$ Commented Sep 3, 2023 at 18:55

1 Answer 1

8
$\begingroup$

I don't think this requires changing your ToString interface itself, rather you can change the protocol by which the language invokes the to_string method. That is, don't just unconditionally call .to_string() when a string conversion is needed.

Here's a simple solution which could work either for a single-threaded language, or for a language like Java which allows synchronising (i.e. reentrant locking) on an object. Since your language is garbage-collected, you already have overhead memory per object, so add one extra bit to this overhead for an "is being stringified" flag. The protocol is then: when an implicit string conversion should take place,

  • If the flag is already set on this object, then return '...'.
  • Otherwise, set the flag, invoke to_string(), unset the flag, and then use the result.

In code, the implementation would look like this (where str is the function called either explicitly or implicitly when a string conversion is needed):

def str(obj):
    if obj.is_stringifying:
        return '...'
    else:
        obj.is_stringifying = True
        try:
            return obj.to_string()
        finally:
            obj.is_stringifying = False

In case it's not clear, the .is_stringifying flag is internal to the language, it is not a "real" attribute and user code isn't supposed to see it. It's part of the object header.

The '...' string could be made an overrideable property on the interface, or a separate method, if you want to give users more control over the string representation of recursive objects. But in this case, there should be a default implementation for the vast majority of cases where this either doesn't matter or can't happen anyway.

Note that this can still suffer from infinite recursion if someone ever invokes .to_string() directly in their own implementation of .to_string(); users instead must call the str function for explicit conversions, or rely on implicit conversions (e.g. when the object is concatenated with a string, or included in a string template). Python deals with this by naming the method __str__, using the convention that methods beginning with two underscores are part of a language protocol and shouldn't be called directly from user code.


That said, the above does exhibit some weird behaviour in edge cases. If a user puts some other code inside to_string which calls out of the "stringifying" code into the rest of the application, then that code in the application will not observe the correct stringifying behaviour for all objects.

This isn't necessarily a design problem, since it only happens if the ToString implementation is incorrect; you can also get incorrect behaviour from e.g. the standard library's sort() function if you provide an incorrect comparator implementation. But it is a limitation, and using the object header also means to_string() has to be a synchronised method if the program is multi-threaded, which isn't great.

If you want to address this, here's an alternative approach which does change the ToString interface: the idea is that instead of returning a string, you write the object's string representation to a "writer" object. Then the writer object itself can hold the state of which objects are currently being stringified. (The writer can't safely be passed between threads, but concurrent string conversions in different threads would each have their own writer.)

Here we assume that the ToString interface's to_string method accepts a writer object. Then the protocol for string conversion is that a writer is created, the object's to_string method is invoked passing that writer object, and then the writer's result is used.

class Stringifier:
    def __init__(self):
        self._stringifying = set()
        self.out = ''
    
    def write_str(self, s):
        self.out += s
    
    def write_obj(self, obj):
        # a stable, unique id per object
        # could be its address as an integer, if the GC won't move it
        obj_id = id(obj)
        if obj_id in self._stringifying:
            self.write_str('...')
        else:
            self._stringifying.add(obj_id)
            try:
                obj.to_string(self)
            finally:
                self._stringifying.remove(obj_id)

def str(obj):
    writer = Stringifier()
    writer.write_obj(obj)
    return writer.out

This somewhat resembles how Rust's Display trait works, except with the added check to avoid self-recursion. But from the user's side, the implementation for a user-defined type looks the same as for implementing Display.

This kind of interface (where you pass a writer around) can also be more efficient than recursive string concatenation, since it doesn't have to build a lot of temporary string objects in the process. Additionally, if the string is meant to be written to e.g. console output or a file, then an appropriate writer object can be provided such that the object's string representation never has to exist in memory all at once.

$\endgroup$
6
  • $\begingroup$ The question asks for a suitable interface, i.e. a type or protocol, but what probably is needed is what you provide: an implementation that detects recursion because it is a runtime property and unlikely to be expressed in a type. $\endgroup$ Commented Sep 1, 2023 at 15:23
  • 1
    $\begingroup$ I think this may work not very well in multithreaded programs… $\endgroup$ Commented Sep 1, 2023 at 15:34
  • $\begingroup$ @user3840170 Good point, if the language has threads then a simple flag in the object header would require synchronising on the object when you stringify it. I've edited to mention this. $\endgroup$ Commented Sep 1, 2023 at 15:55
  • $\begingroup$ The stringifier is reminiscent of Swift's TextOutputStream/TextOutputStreamable, though by the nature of those protocols they can't detect recursion -- they only have an equivalent to write_str, not write_obj. $\endgroup$ Commented Sep 2, 2023 at 3:39
  • $\begingroup$ Instead of a flag on the object header, which allows only one such method and requires messing with the internal object representation, you could just use a global (Weak)Set that is filled with the objects being stringified. By emptying it only after the initial to_string call, you could not just implement loop detection but also avoid printing the same object multiple times. $\endgroup$ Commented Sep 2, 2023 at 5:24

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.