One way to represent this kind of (computable) recursive function mathematically comes from domain theory. I will end up describing something called a Scott domain in this answer, named after Dana Scott.
The key idea is to define an ordering for the gadgets in our language (natural numbers and functions) in terms of how much information content they have. We can then use this ordering to define what a recursive function means. In particular, we will use a certain kind of least upper bound (aka supremum) with respect to the ordering. We will also restrict what kind of set we're allowed to take the least upper bound of (this special kind of set is called a chain).
Representing nontermination
Also, what about infinite loops? It isn't immediately obvious what nontermination should mean mathematically [...]
I'll actually start by addressing this point. This is the crucial part. Once this is resolved, the other things fall into place.
The language described in the question just uses natural numbers as its data type (and functions on natural numbers). We might consider thinking about this data type as the set of natural numbers $\mathbb{N}$.
However, some computations might not terminate. If we have some expression 3 * someFn(7), it might turn out that someFn(7) does not terminate and, therefore, that entire expression does not terminate.
So, let's extend the set of naturals with an extra element to represent nontermination which we will write ⊥ (pronounced "bottom"). Let's call this new set $\mathbb{N}_\bot = \mathbb{N} \cup \{\bot\}$.
Ordering by information content
Now, here is a key step. We can give this new set an ordering. This ordering will be all about information content. Intuitively, given two elements $x, y \in \mathbb{N}_\bot$, $x$ is "less than or equal to" $y$ if $x$ has "at most as much information as" $y$. You can also think of it as "$x$ approximates $y$." We will write this $x \sqsubseteq y$.
$\bot$, representing nontermination, gives you the least amount of information.
Our new ordering $\sqsubseteq$ on $\mathbb{N}_\bot$ will be far simpler than the usual $\le$ ordering on numbers.
It will be defined in two parts. For any element $x \in \mathbb{N}_\bot$, we have:
- $\bot \sqsubseteq x$
- $x \sqsubseteq x$
So, $x$ "approximates" $y$ if either $x = \bot$ or $x = y$. In particular, notice that even though $3 \le 4$, it is not the case that $3 \sqsubseteq 4$ and also not the case that $4 \sqsubseteq 3$. They have incompatible information, so they are not comparable under the $\sqsubseteq$ ordering.
We can draw a diagram to describe this:
$$
\begin{matrix}
  0 & & 1 &  & 2 & \cdots \\
  &\huge\diagdown & \huge| & \huge\diagup \\
  && \bot
\end{matrix}
$$
(Imagine there are similar connections for all other natural numbers.)
If $x$ is connected to $y$ and $x$ is drawn lower in the diagram, this means that $x$ is "less than" $y$. This kind of diagram is called a Hasse diagram.
Ordering functions by information content
Given this, we can already start to talk about functions with nontermination. Generally, we can think of functions in our language as being elements of the set of mathematical functions $\mathbb{N}_\bot \to \mathbb{N}_\bot$.
(A brief aside: If the programming language has a call-by-value evaluation order, then we only consider strict functions from that set. A function $f$ is called "strict" if it satisfies the equation $f(\bot) = \bot$.)
We can create an ordering for the set $\mathbb{N}_\bot \to \mathbb{N}_\bot$ by using the $\mathbb{N}_\bot$ ordering pointwise. In particular, given two functions $f, g : \mathbb{N}_\bot \to \mathbb{N}_\bot$:
- $f \sqsubseteq g$ if for every x in $\mathbb{N}_\bot$ we have $f(x) \sqsubseteq g(x)$
So if $s$ is a function that just immediately returns 0 for every input and $t$ is a function that goes into an infinite loop when its input is 5 but otherwise immediately returns 0, then we have $t \sqsubseteq s$.
Our original ordering was a bit boring, mostly coinciding with equality, but this one starts to get a bit more interesting. We can also continue this process for higher-order functions, though this is outside the scope of this answer. If you do continue this process, you end up with a fairly intricate structure.
Comparable things
How does this actually help us with the original question? Well, this leads us directly to the concepts we need to represent recursion. There are two related concepts we need and then we can talk about recursion.
The first concept lets us restrict ourselves to only comparable collections.
Now that we have these orders, we can talk "chains." What is a chain? A chain is any subset $A$ of some ordered set (like a subset of $\mathbb{N}_\bot$ or a subset of $\mathbb{N}_\bot \to \mathbb{N}_\bot$) where every pair of elements $x, y \in A$ are comparable. That is, for every $x$ and $y$ in $A$, it is either the case that $x \sqsubseteq y$ or that $y \sqsubseteq x$.
In the case of $\mathbb{N}_\bot$, the chains are pretty simple. The empty set is a chain (vacuously). $\{\bot\}$ is a chain and $\{\bot, 7\}$ is a chain. On the other hand, $\{\bot, 7, 8\}$ is not a chain since 7 and 8 are not comparable under the ordering $\sqsubseteq$.
The chains for $\mathbb{N}_\bot \to \mathbb{N}_\bot$ are more interesting. Many functions are uncomparable, just as in the $\mathbb{N}_\bot$ case, however there are also many functions that are comparable.
Those chains of functions will be very useful for us. We will look at such a chain right now. Also, this leads into the second concept we need: the least upper bound of a chain.
Recursion
Let's return to our factorial function. If we look at some input-output pairs, we know that we will find:
$$
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
f(x) & \bot & 1 & 1 & 2 & 6 & \cdots
\end{array}
$$
(An aside: The set of all input-output pairs of a function is called "the graph" of the function)
But what if we consider a function $g_0$ that gives the factorial only for 0 and then, for larger numbers, goes into an infinite loop? Specifically:
$$
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
g_0(x) & \bot & 1 & \bot & \bot & \bot & \cdots
\end{array}
$$
We can also consider a function $g_1$ that gives the factorial for 0 and 1, but then goes into an infinite loop for larger numbers:
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
g_1(x) & \bot & 1 & 1 & \bot & \bot & \cdots
\end{array}
We can continue this pattern for any natural number $n$ to get $g_n$ that gives the "correct" answer for every $x \le n$ and goes into an infinite loop for $x \gt n$.
These $g$'s form a chain, in fact an infinite chain: $g_0 \sqsubseteq g_1 \sqsubseteq g_2 \sqsubseteq g_3 \sqsubseteq \cdots$
Call the set of these functions $G = \{g_0, g_1, g_2, \ldots\}$.
We also see that, for any natural number $n$, we have $g_n \sqsubseteq f$ (where $f$ is our factorial function). To put it another way, $f$ is an upper bound of the set $G$.
It just so happens that $f$ is the unique upper bound of $G$ but in general there could be many upper bounds. I will return to this shortly. For now, it suffices to say we always want the least upper bound.
The least upper bound of a set $A$ with respect to the order $\sqsubseteq$ is written $\sqcup\,A$. If $A$ is a chain, then $\sqcup\,A$ must exist in our ordering. (An aside: This is because our ordering is chain-complete, but it is not necessary to understand the details of that here.)
We say that $z$ is a "least upper bound" of the chain $A$ if the following two conditions hold:
- $z$ is an upper bound of $A$
- If $x$ is any upper bound of $A$, then $z \sqsubseteq x$
$G$ is a chain. So, $\sqcup$ exists and we can say that $f = \sqcup \,G$.
Though it might look different, this has similarities to finding a limit in calculus. In fact, you can reframe the calculus concept of a limit in terms of directed sets (which are a generalization of chains)! [1] This is outside the scope of this answer, however.
Finding a chain
How exactly did we end up with the chain of functions G? Here is an alternate presentation. Consider the function:
$$
h(k)(x) = \left\{ \begin{array}{ll}
1 & \mathrm{if}\ x = 0 \\
x \cdot k(x - 1) & \mathrm{otherwise}
\end{array}
\right.
$$
This is obtained from our original function by adding a new argument $k$ and replacing recursive calls with calls to $k$.
Now we can see that $g_0 = h(\bot)$. In fact, we can use $h$ and $\bot$ for all of the $g$'s:
- $g_0 = h(\bot)$
- $g_1 = h(h(\bot))$
- $g_2 = h(h(h(\bot)))$
- ...
So, for each $g_i$ we have $g_i = h(h(\ldots(h(⊥))\ldots))$ where there are exactly $i+1$ applications of $h$. We can also write this as $g_i = h^{i+1}(\bot)$. What does this give us?
We can now write $G$ as $G = \{h^n(\bot) : n \in \mathbb{N}\}$.
We might now write
- $f = \sqcup\,\{h^n(\bot) : n \in \mathbb{N}\}$
or alternatively
- $f = \sqcup_{n \in \mathbb{N}}\,h^n(\bot)$
Previously, I had just verbally described the chain $G$. Now we have given specific mathematical definitions for everything involved.
Least fixed points: the meaning of recursive functions
We can in fact see that $f$ is the least fixed point of $h$.
First, what is a fixed point?
- $a$ is called a "fixed point" of $f$ if $f(a) = a$
- $z$ is a least fixed point of $f$ if it is a fixed point of $f$ and, for any other fixed point $b$ of $f$, we have $z \sqsubseteq b$.
So, firstly, we can see that $h(f) = f$. Therefore $f$ is a fixed point of $h$. We are just plugging in $f$ for the where the original recursive call went. This recovers the original function.
Furthermore, $f$ is the least fixed point of $h$. This is more important in the case of a function which does not terminate for some of its inputs. In that case, we would need to be careful to choose the least fixed point to make sure that we end up with $\bot$ when $f$ does not terminate.
This is what characterizes a recursive function! You take the recursive function defined in your programming language and create a new, non-recursive function like this (we called this new function $h$):
- Include an additional argument (we called this argument $k$ when we made $h$)
- Replace all recursive calls with applications of that argument (we replaced the $f(x-1)$ call with $k(x-1)$ when we defined $h$).
Then the denotation of your original function is the least fixed point of this new, non-recursive function. The least fixed point of such a function always exists and it's always unique!
Tying up a loose end: multiple upper bounds
In our example, we only had one upper bound. But I also said that, in general, there could be multiple upper bounds. Furthermore, we always want the least upper bound.
What is an example of a chain of functions that has multiple upper bounds? Well, say we only have the first four $g$'s: $P = \{g_0, g_1, g_2, g_3\}$. This has many upper bounds. In fact, not only is $f$ an upper bound, but so is every $g_i$ where $i \ge 3$. It has other upper bounds beyond those, as well.
Where might such a chain arise? Let's say we started with a function that gave the factorial for only inputs less than or equal to 3 and otherwise went into an infinite loop. In a shorthand syntax for the language, we might write:
p(0) := 1
p(1) := 1
p(2) := 2
p(3) := 6
p(x) := p(x) if x > 3
In this case, $p$ is an upper bound of $P$. But, as I said, there are many upper bounds for $P$! In fact, $p$ turns out to be the unique least upper bound of $P$, since it gives at most as much information as the other upper bounds. Never more information than the other upper bounds of $P$.
If we do that same trick from above and write
$$
u(k)(x) = \left\{ \begin{array}{ll}
f(x) & \mathrm{if}\ x \le 3 \\
k(x) & \mathrm{otherwise}
\end{array}
\right.
$$
then we can see that the least fixed point of $u$ is $p$. However, $u$ has more fixed points.
Another fixed point of $u$ is the function which gives $f(x)$ whenever $x \le 3$ but is then always gives 7 whenever $x \gt 3$. But this is not the least fixed point, since that function has more information than $p$. There is only one least fixed point of $u$ and that is $p$.
[1] See Limits: A New Approach to Real Analysis by Alan F. Beardon
Further reading
Papers:
- Domains for Denotational Semantics by Dana Scott
- A type-theoretical alternative to ISWIM, CUCH, OWHY by Dana Scott
- Toward a mathematical semantics for computer languages by Dana Scott and Christopher Strachey
Book chapters:
- Formal Semantics of Programming Languages (chapters 5, 8, 9, 10, 12) by Glynn Winskel
- Semantics of Programming Languages: Structures and Techniques (chapter 4) by Carl A. Gunter
- Domain-Theoretic Foundations of Functional Programming by Thomas Streicher
- Introduction to Lattices and Order by Brian A. Davey and Hilary Priestley. This is an introduction to order theory more broadly. They also do talk about programming language semantics specifically (for example, in Ch. 9).