We know that machine learning exists to create programs that can learn to do something, be it distinguishing between simple cat and dog images or recommend the perfect shows to you on Netflix. There are primarily two ways in which machine learning can be performed: supervised and unsupervised learning. In case you need a primer, you can check out our Deep Dive on those learning models here.
Optimization is the cornerstone of every machine learning algorithm. After all, in order to arrive at the most accurate results possible, every algorithm must rely on some optimization technique to minimize error and increase accuracy. In this Deep Dive, we shall look at one of the most popular and simplest optimization algorithms out there – gradient descent.
What is Gradient Descent?
As an optimization algorithm, gradient descent attempts to find values for the parameters of a function such that they minimize the cost function. So, if we assume there is a function f, with two parameters x and y, an optimization algorithm like gradient descent wants to find the optimal values of x and y for which the value of the function becomes a minimum.
In order to visualize this in action, think of a bowl. Notice how it’s curved and has a base that represents the lowest point. Any random position on the surface of this bowl is the cost of the current values of the parameters. Meanwhile, the base of the bowl is the cost of the best set of parameters, since these parameters give us the lowest point possible on the bowl.
The goal is to continue to try different values for the parameters, evaluate their cost and select new parameters that have a slightly better (lower) cost.
Repeating this process enough times will lead to the bottom of the bowl and you will know the values of the parameters that result in the minimum cost.
The Gradient Descent Procedure
You start off with a set of initial values for all of your parameters. While typically initialize with 0.0, you could also start with very small random values.
The cost of the parameters is then evaluated by plugging them into the associated function and calculating the cost, which would look something like this:
parameter = 0.0
cost = function (parameter)
The derivative of the cost is calculated. The derivative refers to the slope of the function at a given point. We need to know the slope so that we know the direction (sign) to move the parameter values in order to get a lower cost on the next iteration. This is what it looks like, with the derivative termed delta:
delta = derivative(cost)
Now that we have the derivative, we can use it to update the value of the parameter. We can also specify a learning rate parameter (alpha) here to adjust the rate at which the parameters change on each update:
coefficient = coefficient – (alpha * delta)
This process is repeated until the cost is 0.0 (or close enough to it). And that is about it. This is why gradient descent is known as such a simple optimization algorithm.
Tips for implementing gradient descent
For each algorithm, there is always a set of best practices and tricks you can use to get the most out of it. The same holds true for gradient descent. Here are three tips you might want to consider the next time you find yourself working with it:
RAWALPINDI: The chairman of the Punjab Education Foundation, Malik Shoaib Awan, stated on Monday that…
Pakistan has taken a significant step towards addressing sexual violence and abuse with the introduction…
KARACHI: The State Bank of Pakistan (SBP) is anticipating $500 million from the Asian Development…
The Sindh Assembly was informed that over 28,500 employees of the provincial government were unlawfully…
The Monetary Policy Committee (MPC) of the State Bank of Pakistan decided to cut the…
The Securities and Exchange Commission of Pakistan (SECP) is organizing the Pakistan Startup Summit, which…
Leave a Comment