A simple machine that enumerates strings

Question

I can get ChatGPT to write a Python program that does various enumerations for me, but I'm trying to understand this from an abstract, theoretical viewpoint.

The first thing that occurs to me in trying to model this idea of "construct a bunch of things" is, there is some mechanism or system that takes "building blocks", puts them somewhere, and when the construction is finished, puts it somewhere else (or records it), so that we are creating a list / repository of the things we have constructed. While this is happening, the mechanism / actor needs to be following certain rules so that it knows which kinds of constructions it should do, what it should do next, when it's done, etc.

If we start with the "building blocks" being some symbols, and we just want to enumerate strings of those symbols, I can add a little detail to my mental model of how this goes about, as follows (before trying to make it fully formalized).

The "builder" / "building agent" should choose one of the symbols, and put it somewhere as the first element of the string. It can keep doing this, choosing another element and adding it to the next position of the string it is building. The builder also needs to decide when a string is finished.

This initial, simple model seems to hold that the builder is endowed with some kind of "choice"-making ability. The choices could be fully random, or guided by some kind of internal judgment or decision-making procedure.

Even in this minimal model, one thing we might realize is that the agent needs access to multiple copies of the same symbol; otherwise it could only use each symbol once. So we can change the envisioned scenario slightly. There could be a sort of "vending machine" that can create any of the symbols on-demand, so that the agent requests any of the symbols it decides it wants, and then moves it over to where the string is being built.

If we introduce the idea of creating new copies or instances of a given symbol, then the model doesn't need a single copy of each symbol sitting around somewhere for the agent to choose from initially; unless that is how the agent initially learns what symbols are available. Instead, we can imagine the agent already knows, or is told in advance, what symbols are available, and that it can go request a copy of any of them, at any time, from the vending machine.

If the agent's choices are genuinely random, I assume there is some mathematical way of determining what the expected distribution of strings would end up being. If it's 50-50 whether the agent will go on to select another symbol or complete the string, then it will tend to produce shorter strings over longer ones.

If we let this system run for a while, we'll end up with some smattering of strings. Now, let's add in that we want the agent to be comprehensive and to construct every distinct possible string made of those symbols.

Again, favoring a simple and intuitive model before we increase the formality, the agent can keep track of which strings it has produced so far, ensuring not to waste time creating the same one over again, and also to check and see which ones it hasn't made yet, and go for those.

If we do not establish an upper limit on how long a string should be, avoiding duplicate strings doesn't make that big of a difference, and the result will still be the agent randomly selecting which strings to make, and after some time, having made kind of a random selection of strings. That said, given that the agent needs to keep track, we can add in that it is either looking at the collection of strings it has placed somewhere, in the finished-strings area, and is able to know and recognize "I've already made that string, better not repeat the exact same choices if my current selection seems close to that one"; or it can be recording the information somewhere, like on a piece of paper, or just committed to memory.

I will also refer to the agent / builder as a / the "machine".

Now, in order to allow the machine to be comprehensive, we could set a limit on the length of the string. The machine now needs to be able to recognize the length of any given string while it is building it, so that it can decide "the string cannot be any longer; I have to finish this one here and put it in the place where finished constructions go."

If the machine still chooses randomly, but it can check if a string has already been made and knows to avoid making that string again, then we can imagine that it consistently does build the complete set of strings. The order in which it builds them will be random, and it will change each time we run this process again. But we can imagine seeing the set of strings laid out in a certain order, where over time the holes / gaps get filled in, until the whole set is completed. In a way, I think this is kind of a proof that every string will be generated: as long as the agent is capable of picking the next string to be one it hasn't made yet, it will necessarily eventually reach a point where there aren't any strings left to make.

Now maybe to tighten this model a bit, we need to start thinking in more detail how such an agent / machine "knows" which strings it has or hasn't made yet, and also how it "chooses" which next symbol to get. Maybe we can reduce those seeming faculties of "knowledge" and "choice" to simpler, mechanistic actions.

Here might be just one way to do this. The machine might have a certain value it can read from, dictating its behavior. The value is either "go on" or "stop now". The machine can also look at 2 strings and decide if they are the same string or not. The machine can also ask "is this string within the current length limit?"

So maybe it does something like this. At the beginning of a construction, the value is set to "go on". The machine selects a symbol at random, and then checks "is this string so far the same as any of the strings I already made?" and "Is this string within the length limit?" If the string is not the same as an existing one, it can choose to finish it and add it to the finished constructions, or it can choose to go on if the string would still be within the length limit. If the string is the same as an existing one but the machine could go on and be within the length limit, it can choose to go on.

Now that the model is getting a little more precise, there are more finicky technical questions emerging. The above model should eventually produce all the correct strings, but we don't know in advance how long it will take, since it is choosing randomly and might randomly skip over a certain string many times, for no reason. Also, the above model is not guaranteed to never accidentally produce the same string twice. If the model knows that its current string is not new, but that it can go on and add another symbol, it might do so; but there might not be any new strings to create at any higher length. The machine would be forced to acknowledge it cannot produce a new strings, and discard the current string and start over. In order to avoid that problem, maybe we can let the machine "look ahead". If it realizes that this is the last opportunity to publish a unique string, it publishes it.

The next aspect of adding more technical precision is moving away from random choice. Random choice is something that is offered in a language like Python, but maybe we are trying to think about how we could build a very rudimentary machine from scratch that can enumerate the strings. The only method I currently know of is that since a very simple machine can't really "make arbitrary choices" (I think?), we have to choose for it. So, we put the symbols in an order, and we have the machine always choose the firstmost available symbol. Similarly, we put the desired string lengths in some arbitrary order, and we have the machine always choose the firstmost length available.

This changes the behavior of the machine and makes it much more deterministic. If the symbols are a, b, and c, and their order is first a, second b, third c; and the maximum string length is 3 symbols, and the preferred string length order is first 1, second 2, third 3, then the machine will operate roughly as follows:

The most preferred symbol is a. Can I choose that symbol and still be able to create a unique string? Let's say I do. I have a hypothetical string "a". Let me check the strings I've built so far. Do any match "a"? There are no strings. Is that hypothetical string's length within the length limit? It is. Ok, I construct string "a". It is valid, so I publish it and start over.
The most preferred symbol is a. Can I choose that symbol and still be able to create a unique string? Let's say I do. I have a hypothetical string "a". Let me check the strings I've built so far. Do any match "a"? There is one string. It matches "a". I cannot publish the string "a". If I go on, will that string be within the length limit? Yes, it will. If I go on, will there be unique strings for me to make? Let's imagine I do. I will add a symbol to my hypothetical string. The most preferred symbol is "a". Does this match any strings I have made so far? No, it doesn't. Good, I can publish this string. I publish it and start over.

And so on.

Now I am trying to make this closer to completely rigorous by imagining setting up a very simple machine that can actually do this. I think of "writing" as "changing the state of some stateful object". I think of "reading" as "taking an input into some function".

The machine is provided with the first symbol as an input. It then receives the first symbol from the first string in the published strings area. The function returns True if they match, False if they do not.

I'm gonna stop here, these are some new ideas for me, but what I'm realizing now is I don't yet see a clear intuitive way for the machine to be able to count the length of strings, but I'm sure I'll figure that out next.

I was wondering if anyone could comment on this, give me further ideas, explain connections to established ideas, etc.

I think I see how to track length. Each symbol can also do double duty as a representation of length. We may need a unique symbol for each allowable length. I.e., a = 1, b = 2, etc. In order to count the length of a string, the machine checks "is there a next symbol"? and if so, rewrites the "length" symbol, wherever it's stored, to the next symbol in the order. This is how we can turn the decision procedure of how long a string is into checking equality of the length-representation symbol and the allowed-length symbol. — Julius Hamilton
– Julius Hamilton, Commented Sep 28 at 13:36
You appear to have reinvented grammars and nondeterministic automata. The field of study of such devices is called formal language theory. It got going in the 1950s (although similar things were being done much earlier) and the literature is large. Many different types of grammars and automata have been invented and studied. — reinierpost
– reinierpost, Commented Sep 28 at 20:35

Stack Exchange Network

A simple machine that enumerates strings

0

Hot Network Questions

A simple machine that enumerates strings

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Related

Hot Network Questions