DEV Community

Pascal Lleonart
How to use the gradient method to find the extrema of a two-variable function in Python?

The gradient method is used to calculate the maximum (or minimum) of a function near a given point.

The following diagram illustrates this process for a single-variable function in two dimensions:

example in two dimensions

In this diagram the black line represents the graph of the function f. The points A, B, C and D (in red) are points on this graph (with coordinates (x_A, f(x_A)), etc.).

S1 and S2 (in blue) are also points on the graph, and they are two local maxima.

The green arrows show the path taken by the gradient method: it finds the maximum near the given point by following the slope!

What are we going to do?

We are going to study the following two-variable function and find its extrema for x, y \in [-1.2, 1.2] thanks to the gradient method.

f: \R \times \R \longrightarrow \R, (x, y) \longmapsto \sin( \frac{1}{2} x^2 - \frac{1}{4} y^2 + 3 ) \cos( 2x + 1 - e^y )

The graph of f for x, y \in [-1.2, 1.2]

First of all, let's get this image on your computer!

We'll use matplotlib to draw the graphs, so first you have to install it.

Then we have to define some functions.

Define f

To define f, you'll need the sin, cos and exp functions from numpy (which will be installed alongside matplotlib).

Answer
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

Display the graph

We can use these lines to compute a grid of points within the requested interval:

X = np.linspace(-1.2, 1.2, 1000)
Y = np.linspace(-1.2, 1.2, 1000)
X, Y = np.meshgrid(X,Y)
Z = f(X, Y)

Add these lines to display the graph:

import matplotlib.pyplot as plt

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X, Y, Z, cmap=plt.cm.coolwarm)
plt.show()

Define the derivative functions (math incoming)

The gradient method requires the derivative of the relevant function. You can choose to calculate the derivative by hand and then implement it in Python, or you can use the definition of the derivative to save time.

Let f : \R \longrightarrow \R be a differentiable function. We have:

f' : \R \longrightarrow \R, f'(x) = \lim\limits_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}

For a two-variable function f : \R \times \R \longrightarrow \R, we in fact have two different derivatives: the partial derivatives with respect to each variable, i.e.:

\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}

And with these, we can get the gradient of the function at a given point:

\nabla f (x_0, y_0) = \Big( \frac{\partial f}{\partial x}(x_0, y_0), \frac{\partial f}{\partial y} (x_0, y_0) \Big)

To calculate these, we must select one of the variables and calculate the derivative of the function while treating the other variable as a constant.

E.g.:

f(x, y) = x^2 + xy + 3x
  1. First we calculate \frac{\partial f}{\partial x}:

    We treat y as a constant, so that we can consider g(x) = x^2 + xy + 3x:

    g'(x) = 2x + y + 3
    So we have \frac{\partial f}{\partial x} = 2x + y + 3
  2. Then we calculate \frac{\partial f}{\partial y}:
    We treat x as a constant, so that we can consider h(y) = x^2 + xy + 3x:

    h'(y) = 0 + x + 0
    So we have \frac{\partial f}{\partial y} = x
  3. Then we can get the gradient, etc.
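As a quick sanity check, we can compare these hand-computed partial derivatives with finite-difference approximations (the variable names and the test point (2, 1) are my own choices):

```python
def f(x, y):
    # the example function f(x, y) = x^2 + x*y + 3*x
    return x**2 + x*y + 3*x

h = 1e-6  # small step for the finite-difference approximation
x0, y0 = 2.0, 1.0

df_dx = (f(x0 + h, y0) - f(x0, y0)) / h  # should be close to 2*x0 + y0 + 3 = 8
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h  # should be close to x0 = 2

print(df_dx, df_dy)
```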

Implementing the two Python functions may be tricky; you may need to use lambdas.

You can use d to calculate the derivative of a single-variable function and d2 for two-variable functions.

d2 should return, for given f, x0, y0, the value \nabla f (x_0, y_0).

Answer
# derivative of a single variable function
def d(f, x0: float) -> float:
    h = 1/10000  # enough for our usage
    return (f(x0 + h) - f(x0)) / h

# derivative of a two variable function
def d2(f, x0: float, y0: float):
    return ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y)), y0) )

I will break down d2:

  1. This function takes a two-variable function as a parameter, as well as a pair of coordinates (here, the function f and the pair (x0, y0)).
  2. lambda x: f(x, y0) is a function in which the y value is set as a constant (y0 in this case). You can think of it as the g function from the previous example.
  3. d((lambda x: f(x, y0)), x0) will calculate the derivative value of the "g" function at x0.
  4. The same process applies to the second member of the returned tuple.
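To check that d2 behaves as described, we can reuse the example function from earlier, whose gradient we computed by hand (this check is mine, not part of the original exercise):

```python
# derivative of a single variable function
def d(f, x0: float) -> float:
    h = 1/10000  # enough for our usage
    return (f(x0 + h) - f(x0)) / h

# derivative of a two variable function
def d2(f, x0: float, y0: float):
    return ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y), y0)) ) if False else ( d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y)), y0) )

# f(x, y) = x^2 + x*y + 3*x has gradient (2x + y + 3, x),
# so at (2, 1) we expect a result close to (8, 2)
grad = d2(lambda x, y: x**2 + x*y + 3*x, 2.0, 1.0)
print(grad)
```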

Now let's talk about the gradient method!

Let's visualize the main principle using another two-dimensional example (a single-variable function).

Given point A, we must find the nearest maximum point.

We have A, a point on the graph of f

First we need to define what we mean by "near points". In our case, we will consider points that are at a distance alpha from A.

We implemented functions to calculate derivatives, and with good reason! We can now calculate the derivative of the function at A.

We have drawn the tangent line to f passing through A

Now that we have the slope, we can identify the point in the "points near A" that "continues" this slope (in this case, point B).

We find B; in this case it's the left one because the tangent line is decreasing

We can find the point B using this formula:

x_B = x_A + \alpha f'(x_A), y_B = f(x_B)

Then we repeat this process until we stop moving (we will discuss this in more detail later).

Note: to find the minimum of the function, replace alpha by -alpha.

About the choice of alpha

Our alpha should be neither too big nor too small.

If it is too big, the algorithm may miss nearby local maxima; if it is too small, the algorithm will be too slow.

There isn't a foolproof method for finding a good alpha, but you can experiment with different values and then choose the best one.

In our case, \alpha = 0.01.

And what about two-variable functions?

That's the case where we need our gradient.

In this case, we have two variables and two derivatives, and we are in three dimensions. Therefore, we can use this formula to find B:

x_B = x_A + \alpha \frac{\partial f}{\partial x}(x_A, y_A), y_B = y_A + \alpha \frac{\partial f}{\partial y}(x_A, y_A), z_B = f(x_B, y_B)
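Here is what a single step of that formula looks like in code, reusing the d2 function from above (the starting point (0.5, 0.5) is an arbitrary choice of mine):

```python
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

def d(f, x0):
    h = 1/10000
    return (f(x0 + h) - f(x0)) / h

def d2(f, x0, y0):
    return (d(lambda x: f(x, y0), x0), d(lambda y: f(x0, y), y0))

alpha = 0.01
xA, yA = 0.5, 0.5           # an arbitrary starting point A
dfdx, dfdy = d2(f, xA, yA)  # gradient of f at A

xB = xA + alpha * dfdx      # one step "uphill"
yB = yA + alpha * dfdy
zB = f(xB, yB)

print(xB, yB, zB)  # B is slightly higher on the surface than A
```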

You now have all the cards in your hand!

Now we'll use all we've learnt!

We can see that, by definition, this algorithm finds the local maximum near the starting point, rather than the global maximum of the function.

To find the global maximum, we can try on different starting points.

We could select n points at random and run the algorithm on them, but how many points should we generate to be sure that we have found the maximum?

Another approach is to grid the graph with points and run the algorithm on all of them.

Implementing the algorithm for one point

We need to store the x and y coordinates of the last and penultimate points calculated, so we can check whether we have stopped moving.

To do so, we need an absolute value function and an epsilon constant to determine whether two points are close enough to be considered a maximum (in our case I chose epsilon = 0.01).

Answer
alpha = 0.01    # step size
epsilon = 0.01  # stagnation threshold

x0 = [0, x] # we suppose x, y are defined
y0 = [0, y]
z0 = f(x0[1], y0[1])

while abs(x0[0] - x0[1]) > epsilon and abs(y0[0] - y0[1]) > epsilon:
    nablaf = d2(f, x0[1], y0[1])
    x0[0] = x0[1] # we're storing the last two points inside a two-element list
    y0[0] = y0[1]
    x0[1] = x0[0] + alpha * nablaf[0] # change the plus to a minus if you want a minimum
    y0[1] = y0[0] + alpha * nablaf[1]
    z0 = f(x0[1], y0[1])

Implementing the algorithm for all the starting points

We can generate a grid of points like this:

XT = np.linspace(-1.2, 1.2, 20)
YT = np.linspace(-1.2, 1.2, 20)
maxs = [0,0,0] # will be useful for storing the maximum

Then it's just a basic nested for loop with a condition at the end of each iteration!
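That nested loop can be sketched as follows, reusing the single-point code from the previous section (this assembly is mine, so treat it as one possible implementation):

```python
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))

def d(f, x0):
    h = 1/10000
    return (f(x0 + h) - f(x0)) / h

def d2(f, x0, y0):
    return (d(lambda x: f(x, y0), x0), d(lambda y: f(x0, y), y0))

alpha = 0.01
epsilon = 0.01

XT = np.linspace(-1.2, 1.2, 20)
YT = np.linspace(-1.2, 1.2, 20)
maxs = [0, 0, 0]  # (x, y, z) of the best maximum found so far

for x in XT:
    for y in YT:
        # run the gradient method starting from (x, y)
        x0, y0 = [0, x], [0, y]
        z0 = f(x0[1], y0[1])
        while abs(x0[0] - x0[1]) > epsilon and abs(y0[0] - y0[1]) > epsilon:
            nablaf = d2(f, x0[1], y0[1])
            x0[0], y0[0] = x0[1], y0[1]
            x0[1] = x0[0] + alpha * nablaf[0]
            y0[1] = y0[0] + alpha * nablaf[1]
            z0 = f(x0[1], y0[1])
        # the condition at the end of each iteration: keep the highest point
        if z0 > maxs[2]:
            maxs = [x0[1], y0[1], z0]

print(maxs)
```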

If you want to see where your maximum point is, you can plot it on the 3D axes (before calling plt.show()):

plt.plot(maxs[0], maxs[1], maxs[2], 'ro')

At the end we have (-1.2120435693285236, 0.4416799874265309, 0.5109070812327089)
The maximum is drawn in red

Thanks for reading my article!

If you have any questions, ask them in the comments or on my socials!

github
bluesky
linkedin
