DEV Community

Pascal Lleonart
How to use the gradient method to find the extrema of a two-variable function in Python?

The gradient method is used to calculate the maximum (or minimum) of a function near a given point.

The following diagram illustrates this process for a single-variable function in two dimensions:

example in two dimensions

In this diagram the black line represents the graph of the function f. The points A, B, C and D (in red) are points on this graph (with coordinates (x_A, f(x_A)), etc.).

S1 and S2 (in blue) are also points on the graph, and they are two local maxima.

The green arrows show the path taken by the gradient method: it finds the maximum near the given point by following the slope!

What are we going to do?

We are going to study the following two-variable function and find its extrema for x, y \in [-1.2, 1.2] thanks to the gradient method.

f: \R \times \R \longrightarrow \R, (x, y) \longmapsto \sin( \frac{1}{2} x^2 - \frac{1}{4} y^2 + 3 ) \cos( 2x + 1 - e^y )

The graph of f for x, y \in [-1.2, 1.2]

First of all, let's get this image on your computer!

We'll use matplotlib to draw the graphs, so first you have to install it.

Then we have to define some functions.

Define f

To define f, you'll need the sin, cos and exp functions from numpy (which will be installed alongside matplotlib).

Answer
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

Display the graph

We can use these lines to compute a grid of points within the requested interval:

X = np.linspace(-1.2, 1.2, 1000)
Y = np.linspace(-1.2, 1.2, 1000)
X, Y = np.meshgrid(X,Y)
Z = f(X, Y)

Add these lines to display the graph:

import matplotlib.pyplot as plt

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X, Y, Z, cmap=plt.cm.coolwarm)
plt.show()

Define the derivative functions (math incoming)

The gradient method requires the derivative of the relevant function. You can choose to calculate the derivative by hand and then implement it in Python, or you can use the definition of the derivative to save time.

Let f : \R \longrightarrow \R be a differentiable function. We have:

f' : \R \longrightarrow \R, f'(x) = \lim\limits_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}

For a two-variable function f : \R \times \R \longrightarrow \R, we in fact have two different derivatives: the partial derivatives with respect to each variable, i.e.:

\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}

And with these, we can get the gradient of the function at a given point:

\nabla f (x_0, y_0) = \Big( \frac{\partial f}{\partial x}(x_0, y_0), \frac{\partial f}{\partial y} (x_0, y_0) \Big)

To calculate these, we must select one of the variables and calculate the derivative of the function while treating the other variable as a constant.

E.g.:

f(x, y) = x^2 + xy + 3x
  1. First we calculate \frac{\partial f}{\partial x}:

    We treat y as a constant, so that we can consider g(x) = x^2 + xy + 3x:

    g'(x) = 2x + y + 3
    So we have \frac{\partial f}{\partial x} = 2x + y + 3
  2. Then we calculate \frac{\partial f}{\partial y}:
    We treat x as a constant, so that we can consider h(y) = x^2 + xy + 3x:

    h'(y) = 0 + x + 0
    So we have \frac{\partial f}{\partial y} = x
  3. Then we can get the gradient, etc.
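As a quick sanity check, we can compare these hand-computed partial derivatives with finite-difference approximations (the variable names and the test point (2, 1) are my own choices):

```python
def f(x, y):
    # the example function f(x, y) = x^2 + x*y + 3*x
    return x**2 + x*y + 3*x

h = 1e-6  # small step for the finite-difference approximation
x0, y0 = 2.0, 1.0

df_dx = (f(x0 + h, y0) - f(x0, y0)) / h  # should be close to 2*x0 + y0 + 3 = 8
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h  # should be close to x0 = 2

print(df_dx, df_dy)
```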

Implementing the two Python functions may be tricky; you may need to use lambdas.

You can use d to calculate the derivative of a single-variable function and d2 for two-variable functions.

d2 should return, for given f, x0, y0, the value \nabla f (x_0, y_0).

Answer
# derivative of a single variable function
def d(f, x0: float) -> float:
    h = 1/10000  # enough for our usage
    return (f(x0 + h) - f(x0)) / h

# derivative of a two variable function
def d2(f, x0: float, y0: float):
    return ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y)), y0) )

I will break down d2:

  1. This function takes a two-variable function as a parameter, as well as a pair of coordinates (here, the function f and the pair (x0, y0)).
  2. lambda x: f(x, y0) is a function in which the y value is set as a constant (y0 in this case). You can think of it as the g function from the previous example.
  3. d((lambda x: f(x, y0)), x0) will calculate the derivative value of the "g" function at x0.
  4. The same process applies to the second member of the returned tuple.
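To check that d2 behaves as described, we can reuse the example function from earlier, whose gradient we computed by hand (this check is mine, not part of the original exercise):

```python
# derivative of a single variable function
def d(f, x0: float) -> float:
    h = 1/10000  # enough for our usage
    return (f(x0 + h) - f(x0)) / h

# derivative of a two variable function
def d2(f, x0: float, y0: float):
    return ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y), y0)) ) if False else ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y)), y0) )

# f(x, y) = x^2 + x*y + 3*x has gradient (2x + y + 3, x),
# so at (2, 1) we expect a result close to (8, 2)
grad = d2(lambda x, y: x**2 + x*y + 3*x, 2.0, 1.0)
print(grad)
```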

Now let's talk about the gradient method!

Let's visualize the main principle using another two-dimensional example (a single-variable function).

Given point A, we must find the nearest maximum point.

We have A, a point on the graph of f

First we need to define what we mean by "near points". In our case, we will consider points that are at a distance alpha from A.

We implemented functions to calculate derivatives, and with good reason! We can now calculate the derivative of the function at A.

We have drawn the tangent line to f passing through A

Now that we have the slope, we can identify the point in the "points near A" that "continues" this slope (in this case, point B).

We find B; in this case it's the left one because the tangent line is decreasing

We can find the point B using this formula:

x_B = x_A + \alpha f'(x_A), y_B = f(x_B)

Then we repeat this process until we stop moving (we will discuss this in more detail later).

Note: to find the minimum of the function, replace alpha by -alpha.

About the choice of alpha

Our alpha should be neither too big nor too small.

If it is too big, the algorithm may miss nearby local maxima; if it is too small, the algorithm will be too slow.

There isn't a foolproof method for finding a good alpha, but you can experiment with different values and then choose the best one.

In our case, \alpha = 0.01.

And what about two-variable functions?

That's the case where we need our gradient.

In this case, we have two variables and two derivatives, and we are in three dimensions. Therefore, we can use this formula to find B:

x_B = x_A + \alpha \frac{\partial f}{\partial x}(x_A, y_A), y_B = y_A + \alpha \frac{\partial f}{\partial y}(x_A, y_A), z_B = f(x_B, y_B)
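Here is what a single step of that formula looks like in code, reusing the d2 function from above (the starting point (0.5, 0.5) is an arbitrary choice of mine):

```python
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

def d(f, x0):
    h = 1/10000
    return (f(x0 + h) - f(x0)) / h

def d2(f, x0, y0):
    return (d(lambda x: f(x, y0), x0), d(lambda y: f(x0, y), y0))

alpha = 0.01
xA, yA = 0.5, 0.5           # an arbitrary starting point A
dfdx, dfdy = d2(f, xA, yA)  # gradient of f at A

xB = xA + alpha * dfdx      # one step "uphill"
yB = yA + alpha * dfdy
zB = f(xB, yB)

print(xB, yB, zB)  # B is slightly higher on the surface than A
```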

You now have all the cards in your hand!

Now we'll use all we've learnt!

We can see that, by definition, this algorithm finds the local maximum near the starting point, rather than the global maximum of the function.

To find the global maximum, we can try on different starting points.

We could select n points at random and run the algorithm on them, but how many points should we generate to be sure that we have found the maximum?

Another approach is to grid the graph with points and run the algorithm on all of them.

Implementing the algorithm for one point

We need to store the x and y coordinates of the last and penultimate points calculated, so we can check whether we have stopped moving.

To do so, we need an absolute value function and an epsilon constant to determine whether two points are close enough to be considered a maximum (in our case I chose epsilon = 0.01).

Answer
alpha = 0.01    # step size
epsilon = 0.01  # stagnation threshold

x0 = [0, x] # we suppose x, y are defined
y0 = [0, y]
z0 = f(x0[1], y0[1])

while abs(x0[0] - x0[1]) > epsilon and abs(y0[0] - y0[1]) > epsilon:
    nablaf = d2(f, x0[1], y0[1])
    x0[0] = x0[1] # we're storing the last two points inside a two-element list
    y0[0] = y0[1]
    x0[1] = x0[0] + alpha * nablaf[0] # change the plus to a minus if you want a minimum
    y0[1] = y0[0] + alpha * nablaf[1]
    z0 = f(x0[1], y0[1])

Implementing the algorithm for all the starting points

We can generate a grid of points like this:

XT = np.linspace(-1.2, 1.2, 20)
YT = np.linspace(-1.2, 1.2, 20)
maxs = [0,0,0] # will be useful for storing the maximum

Then it's just a basic nested for loop with a condition at the end of each iteration!
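That nested loop can be sketched as follows, reusing the single-point code from the previous section (this assembly is mine, so treat it as one possible implementation):

```python
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

def d(f, x0):
    h = 1/10000
    return (f(x0 + h) - f(x0)) / h

def d2(f, x0, y0):
    return (d(lambda x: f(x, y0), x0), d(lambda y: f(x0, y), y0))

alpha = 0.01
epsilon = 0.01

XT = np.linspace(-1.2, 1.2, 20)
YT = np.linspace(-1.2, 1.2, 20)
maxs = [0, 0, 0]  # (x, y, z) of the best maximum found so far

for x in XT:
    for y in YT:
        # run the gradient method starting from (x, y)
        x0, y0 = [0, x], [0, y]
        z0 = f(x0[1], y0[1])
        while abs(x0[0] - x0[1]) > epsilon and abs(y0[0] - y0[1]) > epsilon:
            nablaf = d2(f, x0[1], y0[1])
            x0[0], y0[0] = x0[1], y0[1]
            x0[1] = x0[0] + alpha * nablaf[0]
            y0[1] = y0[0] + alpha * nablaf[1]
            z0 = f(x0[1], y0[1])
        # the condition at the end of each iteration: keep the highest point
        if z0 > maxs[2]:
            maxs = [x0[1], y0[1], z0]

print(maxs)
```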

If you want to see where your maximum point is, you can plot it on the 3D axes (before calling plt.show()):

plt.plot(maxs[0], maxs[1], maxs[2], 'ro')

At the end we have (-1.2120435693285236, 0.4416799874265309, 0.5109070812327089)
The maximum is drawn in red

Thanks for reading my article!

If you have any questions, ask them in the comments or on my socials!

github
bluesky
linkedin
