Linear Regression¶
In this note, I am going to train a linear regression model with gradient descent.
It might look pretty simple, but it is very helpful for building up an understanding of the basics of Deep Learning.
Overview¶
This note will cover:
- Prepare the data
- Loss function, chain rule and its derivative
- Code implementation
Prepare the data¶
Here we generate 20 data points with x drawn from a uniform distribution on (0, 1). The true model is y = 2x, and we add Gaussian noise with a standard deviation of 0.2.
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
numpy.random.seed(seed=1)
x = numpy.random.uniform(0, 1, 20)
# real model
def f(x): return x * 2
noise_variance = 0.2 # Scale (standard deviation) of the gaussian noise
# Gaussian noise error for each sample in x
noise = numpy.random.randn(x.shape[0]) * noise_variance
# Create targets t
t = f(x) + noise
# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
# Plot the true function f(x)
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
plt.xlabel('$x$', fontsize=15)
plt.ylabel('$t$', fontsize=15)
plt.ylim([0,2])
plt.title('inputs (x) vs targets (t)')
plt.grid()
plt.legend(loc=2)
plt.show()
Loss function, chain rule and its derivative¶
The model can be described as $y=x \cdot w$. Note that $t$ is our target data, generated with noise. We can then define the squared-error cost as: $$\xi = \sum_{i=1}^{N}\left \| t_{i} - y_{i}\right \|^{2} $$
and our goal is to minimize this loss: $$\underset{w}{\operatorname{argmin}}\sum_{i=1}^{N}\left \| t_{i} - y_{i}\right \|^{2}$$
Gradient descent can be defined as:
$$w(k+1) = w(k) - \Delta w(k)$$
$$\Delta w(k) = \mu\frac{\partial \xi}{\partial w}, \quad \text{where } \mu \text{ is the learning rate}$$
Simply applying the chain rule here:
$$\frac{\partial \xi }{\partial w} = \frac{\partial y}{\partial w} \frac{\partial \xi }{\partial y} $$
We know that $$\frac{\partial \xi_{i} }{\partial y_{i}} = \frac{\partial (t_{i}-y_{i})^{2}}{\partial y_{i}}=2(y_{i}-t_{i})$$
and because $y_{i}=x_{i} \cdot w$: $$\frac{\partial y_{i}}{\partial w}=x_{i}$$
Combining everything, we get: $$\Delta w = \mu \cdot 2\sum_{i=1}^{N}x_{i}(y_{i}-t_{i})$$
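As a quick sanity check with toy numbers (my own example, not from the source tutorial): take a single point $x_{1}=0.5$, $t_{1}=1.2$, initial weight $w(0)=0.1$ and learning rate $\mu=0.1$, so $y_{1}=x_{1}\cdot w(0)=0.05$:
$$\Delta w = \mu \cdot 2\, x_{1}(y_{1}-t_{1}) = 0.1 \cdot 2 \cdot 0.5 \cdot (0.05 - 1.2) = -0.115$$
$$w(1) = w(0) - \Delta w = 0.1 - (-0.115) = 0.215$$
The weight takes a step toward the true value of 2, as expected.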
Code implementation¶
# Define the neural network function y = x * w
def nn(x, w): return x * w
# Define the cost function
def cost(y, t): return ((t - y)**2).sum()
# define the gradient function. Remember that y = nn(x, w) = x * w
def gradient(w, x, t):
return 2 * x * (nn(x, w) - t)
# define the update function delta w
def delta_w(w_k, x, t, learning_rate):
return learning_rate * gradient(w_k, x, t).sum()
# Set the initial weight parameter
w = 0.1
# Set the learning rate
learning_rate = 0.1
# Start performing the gradient descent updates, and print the weights and cost:
nb_of_iterations = 4 # number of gradient descent updates
w_cost = [(w, cost(nn(x, w), t))] # List to store the weight,costs values
for i in range(nb_of_iterations):
dw = delta_w(w, x, t, learning_rate) # Get the delta w update
w = w - dw # Update the current weight parameter
w_cost.append((w, cost(nn(x, w), t))) # Add weight,cost to list
# Print the final w, and cost
for i in range(0, len(w_cost)):
print('w({}): {:.4f} \t cost: {:.4f}'.format(i, w_cost[i][0], w_cost[i][1]))
As you can see, $w$ is moving toward 2 while the cost keeps decreasing.
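As an extra sanity check (my own addition, not part of the source tutorial), we can compare the analytic gradient from the derivation above with a central finite-difference estimate of the cost; w_check and eps below are arbitrary choices for this check.
# Sanity check: compare the analytic gradient with a central finite-difference
# estimate of d(cost)/dw at an arbitrary weight value
w_check = 0.5  # arbitrary weight at which to compare the two gradients
eps = 1e-6     # small finite-difference step
analytic = gradient(w_check, x, t).sum()  # 2 * sum(x * (y - t))
numeric = (cost(nn(x, w_check + eps), t) - cost(nn(x, w_check - eps), t)) / (2 * eps)
print('analytic: {:.6f} \t numeric: {:.6f}'.format(analytic, numeric))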
Check the gradient descent process¶
# Plot the cost vs the given weight w
# Define a vector of weights for which we want to plot the cost
ws = numpy.linspace(0, 4, num=100) # weight values
cost_ws = numpy.vectorize(lambda w: cost(nn(x, w) , t))(ws) # cost for each weight in ws
# Plot the first few gradient descent updates
plt.plot(ws, cost_ws, 'r-') # Plot the error curve
# Plot the updates
for i in range(0, len(w_cost)-2):
w1, c1 = w_cost[i]
w2, c2 = w_cost[i+1]
plt.plot(w1, c1, 'bo')
plt.plot([w1, w2],[c1, c2], 'b-')
plt.text(w1, c1+0.5, '$w({})$'.format(i))
# Show figure
plt.xlabel('$w$', fontsize=15)
plt.ylabel('$\\xi$', fontsize=15)
plt.title('Gradient descent updates plotted on cost function')
plt.grid()
plt.show()
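Since the model has only one parameter, we can also compare the gradient descent result against the closed-form least-squares solution (again my own addition, not from the source tutorial): setting the derivative of the cost to zero gives $w = \sum_{i}x_{i}t_{i} / \sum_{i}x_{i}^{2}$.
# Closed-form least-squares solution for y = x * w:
# d(cost)/dw = -2 * sum(x * (t - x*w)) = 0  =>  w = sum(x*t) / sum(x*x)
w_closed_form = (x * t).sum() / (x * x).sum()
print('closed-form w: {:.4f}'.format(w_closed_form))
print('gradient descent w: {:.4f}'.format(w))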
The sample code in this note comes from peterroelants.github.io, which provides more details on neural networks and deep learning. It is very informative and highly recommended. This note is more of a personal memo.