We know that machine learning exists to create programs that can learn to do something, be it distinguishing between simple cat and dog images or recommending the perfect shows to you on Netflix. There are primarily two ways in which machine learning can be performed: supervised and unsupervised learning. In case you need a primer, you can check out our Deep Dive on those learning models here.
Optimization is the cornerstone of every machine learning algorithm. After all, in order to arrive at the most accurate results possible, every algorithm must rely on some optimization technique to minimize error and increase accuracy. In this Deep Dive, we shall look at one of the most popular and simplest optimization algorithms out there – gradient descent.
What is Gradient Descent?
As an optimization algorithm, gradient descent attempts to find the values of a function's parameters that minimize its cost. So, if we assume there is a function f with two parameters x and y, an optimization algorithm like gradient descent searches for the values of x and y at which the value of the function is at a minimum.
In order to visualize this in action, think of a bowl. Notice how it’s curved and has a base that represents the lowest point. Any random position on the surface of this bowl is the cost of the current values of the parameters. Meanwhile, the base of the bowl is the cost of the best set of parameters, since these parameters give us the lowest point possible on the bowl.
The goal is to continue to try different values for the parameters, evaluate their cost and select new parameters that have a slightly better (lower) cost.
Repeating this process enough times will lead to the bottom of the bowl and you will know the values of the parameters that result in the minimum cost.
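The trial-and-error idea above can be sketched in a few lines of Python. This is a hypothetical example, not part of the article: the bowl is represented by f(x, y) = x² + y², we sample random positions on its surface, and we keep whichever position has the lowest cost so far.

```python
import random

# A bowl-shaped cost function: its base (lowest cost) is at x = 0, y = 0.
# f here is an assumed example function, chosen only for illustration.
def f(x, y):
    return x ** 2 + y ** 2

random.seed(42)
best_params = (None, None)
best_cost = float("inf")

# Try different values for the parameters, evaluate their cost,
# and select new parameters whenever they have a lower cost.
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    cost = f(x, y)
    if cost < best_cost:
        best_params, best_cost = (x, y), cost

print(best_params, best_cost)  # the best guess drifts toward the base at (0, 0)
```

Random search like this eventually finds the bottom of the bowl, but gradient descent gets there far more efficiently by using the slope to choose each next step, as described below.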
The Gradient Descent Procedure
You start off with a set of initial values for all of your parameters. Parameters are typically initialized to 0.0, but you could also start with very small random values.
The cost of the parameters is then evaluated by plugging them into the associated function and calculating the cost, which would look something like this:
parameter = 0.0
cost = function(parameter)
The derivative of the cost is calculated. The derivative refers to the slope of the function at a given point. We need to know the slope so that we know the direction (sign) to move the parameter values in order to get a lower cost on the next iteration. This is what it looks like, with the derivative termed delta:
delta = derivative(cost)
Now that we have the derivative, we can use it to update the value of the parameter. We can also specify a learning rate parameter (alpha) here to adjust the rate at which the parameters change on each update:
parameter = parameter - (alpha * delta)
This process is repeated until the cost is 0.0 (or close enough to it). And that is about it. This is why gradient descent is known as such a simple optimization algorithm.
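The steps above can be sketched end to end in Python. This is a minimal illustration, not a production implementation: the cost function f(x) = x² and its derivative 2x are assumptions chosen because the minimum cost is exactly 0.0 at x = 0, which matches the stopping condition described above.

```python
# Assumed example cost function and its hand-derived derivative (slope).
def cost(parameter):
    return parameter ** 2

def derivative(parameter):
    return 2 * parameter

parameter = 4.0  # initial value; a non-zero start so there is something to descend
alpha = 0.1      # learning rate: controls how far each update moves

# Repeat the update until the cost is close enough to 0.0.
for _ in range(100):
    delta = derivative(parameter)          # slope at the current point
    parameter = parameter - alpha * delta  # step in the downhill direction

print(parameter, cost(parameter))  # both approach 0.0
```

Each update here multiplies the parameter by (1 - 2 * alpha), so with alpha = 0.1 the parameter shrinks by 20% per iteration and the cost steadily approaches zero, just as the procedure describes.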
Tips for implementing gradient descent
For each algorithm, there is always a set of best practices and tricks you can use to get the most out of it. The same holds true for gradient descent. Here are three tips you might want to consider the next time you find yourself working with it: