The language described in the question just uses natural numbers as its data type (and functions on natural numbers). We might consider thinking about this data type as the set of natural numbers N$\mathbb{N}$.
So, let's extend the set of naturals with an extra element to represent nontermination which we will write ⊥ (pronounced "bottom"). Let's call this new set N⊥ (= N ∪ {⊥})$\mathbb{N}_\bot = \mathbb{N} \cup \{\bot\}$.
Now, here is a key step. We can give this new set an ordering. This ordering will be all about information content. Intuitively, given two elements x and y of N⊥$x, y \in \mathbb{N}_\bot$, x$x$ is "less than or equal to" y$y$ if x$x$ has "at most as much information as" y$y$. You can also think of it as "x"$x$ approximates y$y$." We will write this x ⊑ y$x \sqsubseteq y$.
⊥$\bot$, representing nontermination, gives you the least amount of information.
Our new ordering ⊑$\sqsubseteq$ on N⊥$\mathbb{N}_\bot$ will be far simpler than the usual ≤$\le$ ordering on numbers.
It will be defined in two parts. For any element x of N⊥$x \in \mathbb{N}_\bot$, we have:
- ⊥ ⊑ x$\bot \sqsubseteq x$
- x ⊑ x$x \sqsubseteq x$
So, x$x$ "approximates" y$y$ if either x is ⊥$x = \bot$ or x = y$x = y$. In particular, notice that even though 3 ≤ 4$3 \le 4$, it is not the case that 3 ⊑ 4$3 \sqsubseteq 4$ and also not the case that 4 ⊑ 3$4 \sqsubseteq 3$. They have incompatible information, so they are not comparable under the ⊑$\sqsubseteq$ ordering.
┌─┐ ┌─┐ ┌─┐
│0│ │1│ │2│ ...
└┬┘ └┬┘ └┬┘
└───┐│┌───┘
┌┴┴┴┐
│ ⊥ │
└───┘
$$
\begin{matrix}
0 & & 1 & & 2 & \cdots \\
&\huge\diagdown & \huge| & \huge\diagup \\
&& \bot
\end{matrix}
$$
If x$x$ is connected to y$y$ and x$x$ is drawn lower in the diagram, this means that x$x$ is "less than" y$y$. This kind of diagram is called a Hasse diagram (which I've written in kind of a strange Unicode way, since we don't have LaTeX support here).
Given this, we can already start to talk about functions with nontermination. Generally, we can think of functions in our language as being elements of the set of mathematical functions N⊥ → N⊥$\mathbb{N}_\bot \to \mathbb{N}_\bot$.
(A brief aside: If the programming language has a call-by-value evaluation order, then we only consider strict functions from that set. A function f$f$ is called "strict" if it satisfies the equation f(⊥) = ⊥$f(\bot) = \bot$.)
We can create an ordering for the set N⊥ → N⊥$\mathbb{N}_\bot \to \mathbb{N}_\bot$ by using the N⊥$\mathbb{N}_\bot$ ordering pointwise. In particular, given two functions f, g : N⊥ → N⊥$f, g : \mathbb{N}_\bot \to \mathbb{N}_\bot$:
- f ⊑ g$f \sqsubseteq g$ if for every x in N⊥$\mathbb{N}_\bot$ we have f(x) ⊑ g(x)$f(x) \sqsubseteq g(x)$
So if s$s$ is a function that just immediately returns 0 for every input and t$t$ is a function that goes into an infinite loop when its input is 5 but otherwise immediately returns 0, then we have t ⊑ s$t \sqsubseteq s$.
Now that we have these orders, we can talk "chains." What is a chain? A chain is any subset A$A$ of some ordered set (like a subset of N⊥$\mathbb{N}_\bot$ or a subset of N⊥ → N⊥$\mathbb{N}_\bot \to \mathbb{N}_\bot$) where every pair of elements x, y in A$x, y \in A$ are comparable. That is, for every x$x$ and y$y$ in A$A$, it is either the case that x ⊑ y$x \sqsubseteq y$ or that y ⊑ x$y \sqsubseteq x$.
In the case of N⊥$\mathbb{N}_\bot$, the chains are pretty simple. The empty set is a chain (vacuously). {⊥}$\{\bot\}$ is a chain and {⊥, 7}$\{\bot, 7\}$ is a chain. On the other hand, {⊥, 7, 8}$\{\bot, 7, 8\}$ is not a chain since 7 and 8 are not comparable under the ordering ⊑$\sqsubseteq$.
The chains for N⊥ → N⊥$\mathbb{N}_\bot \to \mathbb{N}_\bot$ are more interesting. Many functions are uncomparable, just as in the N⊥$\mathbb{N}_\bot$ case, however there are also many functions that are comparable.
x | f(x)
--+-----
⊥ | ⊥
0 | 1
1 | 1
2 | 2
3 | 6
...
$$
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
f(x) & \bot & 1 & 1 & 2 & 6 & \cdots
\end{array}
$$
But what if we consider a function g_0$g_0$ that gives the factorial only for 0 and then, for larger numbers, goes into an infinite loop? Specifically:
x | g_0(x)
--+-----
⊥ | ⊥
0 | 1
1 | ⊥
2 | ⊥
3 | ⊥
...
$$
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
g_0(x) & \bot & 1 & \bot & \bot & \bot & \cdots
\end{array}
$$
We can also consider a function g_1$g_1$ that gives the factorial for 0 and 1, but then goes into an infinite loop for larger numbers:
x | g_1(x)
--+-----
⊥ | ⊥
0 | 1
1 | 1
2 | ⊥
3 | ⊥
...
\begin{array} {c|cccccc}
x & \bot & 0 & 1 & 2 & 3 & \cdots \\ \hline
g_1(x) & \bot & 1 & 1 & \bot & \bot & \cdots
\end{array}
We can continue this pattern for any natural number N$n$ to get g_N$g_n$ that gives the "correct" answer for every x ≤ N$x \le n$ and goes into an infinite loop for x > N$x \gt n$.
These g's$g$'s form a chain, in fact an infinite chain: g_0 ⊑ g_1 ⊑ g_2 ⊑ g_3 ⊑ ...$g_0 \sqsubseteq g_1 \sqsubseteq g_2 \sqsubseteq g_3 \sqsubseteq \cdots$
Call the set of these functions G = {g_0, g_1, g_2, ...}$G = \{g_0, g_1, g_2, \ldots\}$.
We also see that, for any natural number n$n$, we have g_n ⊑ f$g_n \sqsubseteq f$ (where f$f$ is our factorial function). To put it another way, f$f$ is an upper bound of the set G$G$.
It just so happens that f$f$ is the unique upper bound of G$G$ but in general there could be many upper bounds. I will return to this shortly. For now, it suffices to say we always want the least upper bound.
The least upper bound of a set A$A$ with respect to the order ⊑$\sqsubseteq$ is written ⊔A. [1]$\sqcup\,A$. If A$A$ is a chain, then ⊔A$\sqcup\,A$ must exist in our ordering. (An aside: This is because our ordering is chain-complete, but it is not necessary to understand the details of that here.)
We say that z$z$ is a "least upper bound" of the chain A$A$ if the following two conditions hold:
- z$z$ is an upper bound of A$A$
- If x$x$ is any upper bound of A$A$, then z ⊑ x$z \sqsubseteq x$
G$G$ is a chain. So, ⊔G$\sqcup$ exists and we can say that f = ⊔G$f = \sqcup \,G$.
Though it might look different, this has similarities to finding a limit in calculus. In fact, you can reframe the calculus concept of a limit in terms of directed sets (which are a generalization of chains)! [2][1] This is outside the scope of this answer, however.
{ 1 if x = 0
h(k)(x) = {
{ x * k(x-1) otherwise
$$
h(k)(x) = \left\{ \begin{array}{ll}
1 & \mathrm{if}\ x = 0 \\
x \cdot k(x - 1) & \mathrm{otherwise}
\end{array}
\right.
$$
This is obtained from our original function by adding a new argument k$k$ and replacing recursive calls with calls to k$k$.
Now we can see that g_0 = h(⊥)$g_0 = h(\bot)$. In fact, we can use h$h$ and ⊥$\bot$ for all of the g's$g$'s:
- g_0 = h(⊥)$g_0 = h(\bot)$
- g_1 = h(h(⊥))$g_1 = h(h(\bot))$
- g_2 = h(h(h(⊥)))$g_2 = h(h(h(\bot)))$
- ...
So, for each g_i$g_i$ we have g_i = h(h(...(h(⊥))...))$g_i = h(h(\ldots(h(⊥))\ldots))$ where there are exactly i+1$i+1$ applications of h$h$. We can also write this as g_i = hi+1(⊥)$g_i = h^{i+1}(\bot)$. What does this give us?
We can now write G as G = {hn(⊥) | n ∈$G$ as N}$G = \{h^n(\bot) : n \in \mathbb{N}\}$.
- f = ⊔{hn(⊥) | n ∈ N}$f = \sqcup\,\{h^n(\bot) : n \in \mathbb{N}\}$
- f = ⊔n ∈ N hn(⊥)$f = \sqcup_{n \in \mathbb{N}}\,h^n(\bot)$
Previously, I had just verbally described the chain G$G$. Now we have given specific mathematical definitions for everything involved.
We can in fact see that f$f$ is the least fixed point of h$h$.
a$a$ is called a "fixed point" of f$f$ if f(a) = a$f(a) = a$
z$z$ is a least fixed point of f$f$ if it is a fixed point of f$f$ and, for b any other fixed point $b$ of f$f$, we have z ⊑ b$z \sqsubset b$.
So, firstly, we can see that h(f) = f$h(f) = f$. Therefore f$f$ is a fixed point of h$h$. We are just plugging in f$f$ for the where the original recursive call went. This recovers the original function.
Furthermore, f$f$ is the least fixed point of h$h$. This is more important in the case of a function which does not terminate for some of its inputs. In that case, we would need to be careful to choose the least fixed point to make sure that we end up with ⊥$\bot$ when f$f$ does not terminate.
This is what characterizes a recursive function! You take the recursive function defined in your programming language and create a new, non-recursive function like this (we called this new function h$h$):
- Include an additional argument (we called this argument
k$k$ when we made h$h$)
- Replace all recursive calls with applications of that argument (we replaced the
f(x-1)$f(x-1)$ call with k(x-1)$k(x-1)$ when we defined h$h$).
What is an example of a chain of functions that has multiple upper bounds? Well, say we only have the first four g's$g$'s: P = {g_0, g_1, g_2, g_3}$P = \{g_0, g_1, g_2, g_3\}$. This has many upper bounds. In fact, not only is f$f$ an upper bound, but so is every g_i$g_i$ where i ≥ 3$i \ge 3$. It has other upper bounds beyond those, as well.
In this case, p$p$ is an upper bound of P$P$. But, as I said, there are many upper bounds for P$P$! In fact, p$p$ turns out to be the unique least upper bound of P$P$, since it gives at most as much information as the other upper bounds. Never more information than the other upper bounds of P$P$.
{ f(x) if x ≤ 3
u(k)(x) = {
{ k(x) otherwise
$$
u(k)(x) = \left\{ \begin{array}{ll}
f(x) & \mathrm{if}\ x \le 3 \\
k(x) & \mathrm{otherwise}
\end{array}
\right.
$$
then we can see that the least fixed point of u$u$ is p$p$. However, u$u$ has more fixed points.
Another fixed point of u$u$ is the function which gives f(x)$f(x)$ whenever x ≤ 3$x \le 3$ but is then always gives 7 whenever x > 3$x \gt 3$. But this is not the least fixed point, since that function has more information than p$p$. There is only one least fixed point of u$u$ and that is p$p$.
[1] The ⊔ should be bigger, but I'm restricted by StackExchange formatting.
[2][1] See Limits: A New Approach to Real Analysis by Alan F. Beardon