The gradient method is used to calculate the maximum (or minimum) of a function near a given point.
The following diagram illustrates this process for a single-variable function in two dimensions:
In this diagram the black line represents the graph of the function f. The points A, B, C and D (in red) are points on this graph (with coordinates $(x_A, f(x_A))$, etc.).
S1 and S2 (in blue) are also points on the graph, and they are two local maxima.
The green arrows show the path taken by the gradient method: it finds the maximum near the given point by following the slope!
What are we going to do?
We are going to study the two-variable function $f(x, y) = \sin\left(\frac{x^2}{2} - \frac{y^2}{4} + 3\right)\cos\left(2x + 1 - e^y\right)$ and find its extrema on $[-1.2, 1.2]^2$ thanks to the gradient method.
First of all, let's get this graph on your computer!
We'll use `matplotlib` to draw the graphs, so first you have to install it.
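If you don't have it yet, the usual way is `pip install matplotlib`.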
Then we have to define some functions.
Define f
To define `f`, you'll need the `sin`, `cos` and `exp` functions defined in `numpy` (which will be installed along with `matplotlib`).
Answer
```python
import numpy as np

def f(x, y):
    return np.sin(1/2 * x**2 - 1/4 * y**2 + 3) * np.cos(2*x + 1 - np.exp(y))
```
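As a quick sanity check (this evaluation is my own addition): at the origin the arguments reduce to $\sin(3)\cos(0)$, so we expect roughly 0.141.

```python
print(f(0.0, 0.0))  # sin(3) * cos(0) = sin(3) ≈ 0.1411
```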
Display the graph
We can use these lines to compute a set of points within the requested interval:
```python
X = np.linspace(-1.2, 1.2, 1000)
Y = np.linspace(-1.2, 1.2, 1000)
X, Y = np.meshgrid(X, Y)
Z = f(X, Y)
```
Add these lines to display the graph:
```python
import matplotlib.pyplot as plt

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X, Y, Z, cmap=plt.cm.coolwarm)
plt.show()
```
Define the derivative functions (math incoming)
The gradient method requires the derivative of the function under study. You can either calculate the derivative by hand and then implement it in Python, or use the definition of the derivative as a numerical approximation to save time.
Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable function. We have:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

For two-variable functions like our $f$, we have in fact two different derivatives: the partial derivatives with respect to each of the variables, i.e.:

$$\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \quad \text{and} \quad \frac{\partial f}{\partial y}(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}$$

And with these, we can get the gradient of the function at a given point:

$$\nabla f(x, y) = \left(\frac{\partial f}{\partial x}(x, y),\ \frac{\partial f}{\partial y}(x, y)\right)$$
To calculate these, we must select one of the variables and calculate the derivative of the function while treating the other variable as a constant.
E.g., to compute the gradient of $f$ at a point $(x_0, y_0)$:

- First we calculate $\frac{\partial f}{\partial x}(x_0, y_0)$: we set $y$ as a constant, so that we can consider the single-variable function $g(x) = f(x, y_0)$, and then $\frac{\partial f}{\partial x}(x_0, y_0) = g'(x_0)$.
- Then we calculate $\frac{\partial f}{\partial y}(x_0, y_0)$: we set $x$ as a constant, so we can consider $h(y) = f(x_0, y)$, and then $\frac{\partial f}{\partial y}(x_0, y_0) = h'(y_0)$.

Then we can get the gradient, etc.
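For instance, with the (hypothetical, simpler than ours) function $f(x, y) = x^2 y$:

$$\frac{\partial f}{\partial x}(x, y) = 2xy, \qquad \frac{\partial f}{\partial y}(x, y) = x^2, \qquad \nabla f(x, y) = (2xy,\ x^2)$$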
Implementing the two Python functions may be tricky; you may need to use lambdas.
You can use `d` to calculate the derivative of a single-variable function and `d2` for two-variable functions. `d2` should return, for given `f`, `x0`, `y0`:

$$\nabla f(x_0, y_0) = \left(\frac{\partial f}{\partial x}(x_0, y_0),\ \frac{\partial f}{\partial y}(x_0, y_0)\right)$$
Answer
```python
# derivative of a single variable function
def d(f, x0: float) -> float:
    h = 1/10000  # small enough for our usage
    return (f(x0 + h) - f(x0)) / h

# gradient of a two variable function
def d2(f, x0: float, y0: float) -> tuple:
    return (d((lambda x: f(x, y0)), x0), d((lambda y: f(x0, y)), y0))
```
I will break down `d2`:

- `d2` takes a function `f` and the pair `(x0, y0)`.
- `lambda x: f(x, y0)` is a function in which the y value is set as a constant (`y0` in this case). You can think of it as the g function from the previous example.
- `d((lambda x: f(x, y0)), x0)` will calculate the derivative value of the "g" function at `x0`.
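Here's a quick usage sketch (the product function is just my own illustration): for $f(x, y) = xy$ the gradient is $(y, x)$, so at $(2, 3)$ we expect roughly $(3, 2)$:

```python
grad = d2(lambda x, y: x * y, 2.0, 3.0)
print(grad)  # ≈ (3.0, 2.0), since the gradient of xy is (y, x)
```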
Now let's talk about the gradient method!
Let's visualize the main principle using another two-dimensional example (a single-variable function).
Given point A, we must find the nearest maximum point.
First we need to define what we mean by "near points". In our case, we will consider points that are a distance `alpha` from A.
We implemented functions to calculate derivatives, and with good reason! We can now calculate the derivative of the function at A.
Now that we have the slope, we can identify the point among the "points near A" that "continues" this slope (in this case, point B).
We can find the point B using this formula:

$$x_B = x_A + \alpha f'(x_A)$$
Then, we will continue until we stop moving (we will discuss this in more detail later).
Note: to find the minimum of the function, replace `alpha` by `-alpha`.
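Here is a minimal sketch of that loop on a toy single-variable function (my own example, reusing the `d` helper from above; the starting point and step size are assumed values):

```python
g = lambda x: -x**2          # toy function with its maximum at x = 0
x, alpha = 1.0, 0.1          # starting point and step size
for _ in range(50):          # fifty gradient steps
    x = x + alpha * d(g, x)  # follow the slope upward
print(x)                     # very close to 0, the maximum
```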
About the choice of `alpha`
Our `alpha` should not be too big or too small.
If it is too big, the algorithm may jump over nearby local maxima, and if it is too small the algorithm will be too slow.
There isn't a foolproof method for finding a good `alpha`, but you can experiment with different values and then choose the best one.
In our case, a small value found by experimentation does the job.
And what about two-variable functions?
That's the case where we need our gradient.
In this case, we have two variables, two derivatives, and we are in three dimensions. Therefore, we can use this formula to find B:

$$(x_B, y_B) = (x_A, y_A) + \alpha \nabla f(x_A, y_A)$$
You now have all the cards in your hand!
Now we'll use all we've learnt!
We can see that, by definition, this algorithm finds the local maximum near the starting point, rather than the global maximum of the function.
To find the global maximum, we can try different starting points.
We could select n points at random and run the algorithm on them, but how many points should we generate to be sure that we have found the maximum?
Another approach is to cover the graph with a grid of points and run the algorithm on all of them.
Implementing the algorithm for one point
We need to store the x and y coordinates of the last and penultimate points calculated, to check whether we are stagnating.
To do so, we need an absolute value function and an `epsilon` constant to determine whether two points are close enough to be considered a maximum (in our case I chose `epsilon = 0.01`).
Answer
```python
x0 = [0, x]  # we suppose x, y, alpha and epsilon are defined
y0 = [0, y]
z0 = f(x0[1], y0[1])
# loop while at least one coordinate is still moving more than epsilon
while abs(x0[0] - x0[1]) > epsilon or abs(y0[0] - y0[1]) > epsilon:
    nablaf = d2(f, x0[1], y0[1])
    x0[0] = x0[1]  # we're storing the last points inside a two-sized list
    y0[0] = y0[1]
    x0[1] = x0[0] + alpha * nablaf[0]  # change the plus to a minus if you want a minimum
    y0[1] = y0[0] + alpha * nablaf[1]
    z0 = f(x0[1], y0[1])
```
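To run this from many starting points, it helps to wrap it in a function. Here is one way to do it (a sketch of my own: the name `find_extremum` and the default values for `alpha` and `epsilon` are assumptions):

```python
def find_extremum(f, x, y, alpha=0.01, epsilon=0.01):
    """Run the gradient method from (x, y); return (x, y, z) once we stagnate."""
    x0, y0 = [x - 2 * epsilon, x], [y - 2 * epsilon, y]  # force a first iteration
    while abs(x0[0] - x0[1]) > epsilon or abs(y0[0] - y0[1]) > epsilon:
        nablaf = d2(f, x0[1], y0[1])
        x0[0], y0[0] = x0[1], y0[1]
        x0[1] = x0[0] + alpha * nablaf[0]
        y0[1] = y0[0] + alpha * nablaf[1]
    return x0[1], y0[1], f(x0[1], y0[1])
```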
Implementing the algorithm for the other points
We can generate a grid of points like this:
```python
XT = np.linspace(-1.2, 1.2, 20)
YT = np.linspace(-1.2, 1.2, 20)
maxs = [0, 0, 0]  # will be useful for storing the maximum
```
Then it's just a basic nested for loop with a condition at the end of each iteration!
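For example, using the `find_extremum` helper sketched above (if you inlined the loop instead, the idea is the same):

```python
for xt in XT:
    for yt in YT:
        x1, y1, z1 = find_extremum(f, xt, yt)
        if z1 > maxs[2]:  # keep the highest point found so far
            maxs = [x1, y1, z1]
```

Note: starting `maxs` at `[0, 0, 0]` works here because the maximum z turns out to be positive.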
If you want to see where your maximum point is, you can plot it on the 3D axes from earlier:

```python
ax.plot([maxs[0]], [maxs[1]], [maxs[2]], 'ro')  # red dot on the 3D axes
```
At the end we have (-1.2120435693285236, 0.4416799874265309, 0.5109070812327089)
Thanks for reading my article!
If you have any questions, ask them in the comments or on my socials!