E-Commerce data with Neural network¶
In this note, I use a neural network to analyze an e-commerce dataset. The data is from the Udemy lecture Deep Learning with Python. The label has multiple classes. The model has 1 hidden layer with 5 hidden units and uses tanh for activation (the code below calls np.tanh). The output layer is activated by softmax.
Process the Data¶
Import the required libraries and the data. Print the first few rows to confirm the data structure.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # Plotting library
from sklearn.utils import shuffle
# Allow matplotlib to plot inside this notebook
%matplotlib inline
# Set the seed of the numpy random number generator so that the result is reproducible
np.random.seed(seed=1)
# check the data first
df = pd.read_csv('../data/ecommerce_data.csv')
df.head()
# 4 unique values for time_of_day
df.time_of_day.unique()
The 2nd and 3rd columns are numeric and need to be normalized. The 1st, 4th and 5th columns are categorical variables. The 5th column, time_of_day, needs to be transformed into 4 one-hot encoded variables. The last column, user_action, is the label. The code below transforms the raw data into the format used for training.
def get_data():
    df = pd.read_csv('../data/ecommerce_data.csv')
    # DataFrame.as_matrix() was removed in pandas 1.0; take a float copy so the in-place edits below are safe
    data = df.to_numpy(dtype=np.float64, copy=True)
    X = data[:, :-1] # last column is label
    Y = data[:, -1]
    # Normalization for 2nd and 3rd columns
    X[:, 1] = (X[:, 1] - X[:, 1].mean())/X[:, 1].std()
    X[:, 2] = (X[:, 2] - X[:, 2].mean())/X[:, 2].std()
    # handle time_of_day
    R, C = X.shape
    # time_of_day (1 column) is replaced by 4 indicator columns, so X2 has C+3 columns
    X2 = np.zeros((R, C+3)) # initialized as zero
    Z = np.zeros((R, 4))
    # row i gets a 1 in the column given by its time_of_day code
    Z[np.arange(R), X[:, C-1].astype(np.int32)] = 1
    # copy data from X except time_of_day
    X2[:, 0:(C-1)] = X[:, 0:(C-1)]
    # add 4 dummy variables for time_of_day
    X2[:, (C-1):(C+3)] = Z
    return X2, Y
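To make the two transformations in get_data concrete, here is a minimal sketch on a made-up 3-row array. The values and the names `toy`, `numeric`, `codes`, `one_hot` and `encoded` are hypothetical and not taken from the dataset. It standardizes one numeric column and expands an integer category column into 4 indicator columns with the same fancy-indexing trick.

```python
import numpy as np

# Hypothetical toy data: column 0 is numeric, column 1 is a category code in {0, 1, 2, 3}
toy = np.array([[1.0, 0.0],
                [2.0, 3.0],
                [3.0, 1.0]])

# Standardize the numeric column to zero mean and unit standard deviation
numeric = (toy[:, 0] - toy[:, 0].mean()) / toy[:, 0].std()

# One-hot encode the category column: row i gets a 1 in column codes[i]
codes = toy[:, 1].astype(np.int32)
one_hot = np.zeros((len(codes), 4))
one_hot[np.arange(len(codes)), codes] = 1

# Stack the standardized column and the 4 indicator columns side by side
encoded = np.column_stack([numeric, one_hot])
print(encoded.shape)  # (3, 5)
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, which is why a single category column becomes 4 columns in the training matrix.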
# Produce multi-class indicator for Y
def y2indicator(y, K):
    N = len(y)
    ind = np.zeros((N, K))
    for i in range(N):
        ind[i, y[i]] = 1
    return ind
Forward Step, Cost Function¶
The forward step involves the softmax and tanh functions. For the mathematical details, see Itetsu Blog: Neural-Network Cost-Function.
We first define the functions that produce predictions and measure their quality, as below.
def softmax(a):
    expA = np.exp(a)
    return expA / expA.sum(axis=1, keepdims=True)
def forward(X, W1, b1, W2, b2):
    Z = np.tanh(X.dot(W1) + b1)
    return softmax(Z.dot(W2) + b2), Z # also return the hidden layer activations to calculate derivatives
def predict(P_Y_given_X):
    return np.argmax(P_Y_given_X, axis=1)
def classification_rate(Y, P):
    return np.mean(Y == P)
def cross_entropy(T, pY):
    return -np.mean(T*np.log(pY))
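One caveat with the softmax above is that np.exp overflows for large activations. A common workaround, shown here as a sketch and not as part of the original lecture code, is to subtract the row-wise maximum before exponentiating. Softmax is unchanged by adding a constant to every entry of a row, so the result is the same. The name `softmax_stable` and the test inputs are assumptions for illustration.

```python
import numpy as np

def softmax_stable(a):
    # Subtracting the row-wise maximum leaves the result unchanged
    # but keeps np.exp from overflowing on large activations
    shifted = a - a.max(axis=1, keepdims=True)
    expA = np.exp(shifted)
    return expA / expA.sum(axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])  # np.exp(1000.0) overflows to inf

p_small = softmax_stable(small)
p_large = softmax_stable(large)

# Both rows differ only by a constant shift, so the probabilities agree
print(p_small)
print(p_large)
```

With standardized inputs and small random weights the plain version is usually fine; the shifted version mainly matters if the activations grow during training.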
The code below prepares the training and test data and initializes the weights for a neural network with 1 hidden layer activated by tanh and an output layer activated by softmax.
# create train data
X, Y = get_data()
X, Y = shuffle(X, Y)
Y = Y.astype(np.int32)
M = 5 # n of hidden units
D = X.shape[1] # n of inputs
K = len(set(Y)) # n of class/ output nodes
# training data
Xtrain = X[:-100]
Ytrain = Y[:-100]
Ytrain_ind = y2indicator(Ytrain, K)
# test/validation data
Xtest = X[-100:]
Ytest = Y[-100:]
Ytest_ind = y2indicator(Ytest, K)
# initialize weight
W1 = np.random.randn(D, M)
b1 = np.zeros(M)
W2 = np.random.randn(M, K)
b2 = np.zeros(K)
Gradient Descent with Backpropagation¶
For the mathematical details, see Itetsu Blog: Neural-Network Backward-propagation.
# start training
train_costs = []
test_costs = []
learning_rate = 0.001
for i in range(10000):
    pYtrain, Ztrain = forward(Xtrain, W1, b1, W2, b2)
    pYtest, Ztest = forward(Xtest, W1, b1, W2, b2)
    ctrain = cross_entropy(Ytrain_ind, pYtrain)
    ctest = cross_entropy(Ytest_ind, pYtest)
    train_costs.append(ctrain)
    test_costs.append(ctest)
    # backpropagate through W2 before updating it, so dZ uses the same weights as the forward pass
    dZ = (pYtrain - Ytrain_ind).dot(W2.T) * (1 - Ztrain*Ztrain)
    W2 -= learning_rate*Ztrain.T.dot(pYtrain - Ytrain_ind)
    b2 -= learning_rate*(pYtrain - Ytrain_ind).sum(axis=0)
    W1 -= learning_rate*Xtrain.T.dot(dZ)
    b1 -= learning_rate*dZ.sum(axis=0)
    if i % 1000 == 0:
        print(i, ctrain, ctest)
print("Final train classification_rate:", classification_rate(Ytrain, predict(pYtrain)))
print("Final test classification_rate:", classification_rate(Ytest, predict(pYtest)))
legend1, = plt.plot(train_costs, label='train cost')
legend2, = plt.plot(test_costs, label='test cost')
plt.legend(handles=[legend1, legend2]) # a single positional list would be read as labels, not handles
plt.show()
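The gradient expressions used in the training loop can be sanity-checked numerically. The sketch below is an illustration and not part of the original lecture: it compares them against central finite differences on a small random problem. The toy sizes, the helper names `total_cross_entropy` and `numeric_grad`, and the max-shifted softmax are assumptions. Note that the update rules are the gradients of the summed cross-entropy, while the notebook's cross_entropy reports np.mean, which differs only by the constant factor N*K. The block reuses names such as X and W1, so it is meant to run on its own.

```python
import numpy as np

def softmax(a):
    expA = np.exp(a - a.max(axis=1, keepdims=True))
    return expA / expA.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    Z = np.tanh(X.dot(W1) + b1)
    return softmax(Z.dot(W2) + b2), Z

def total_cross_entropy(T, pY):
    # Summed (not averaged) cross-entropy, matching the scale of the update rules
    return -np.sum(T * np.log(pY))

rng = np.random.RandomState(0)
N, D, M, K = 6, 3, 4, 3                  # hypothetical toy sizes
X = rng.randn(N, D)
T = np.eye(K)[rng.randint(K, size=N)]    # random one-hot targets
W1, b1 = rng.randn(D, M), np.zeros(M)
W2, b2 = rng.randn(M, K), np.zeros(K)

# Analytic gradients, using the same expressions as the training loop
pY, Z = forward(X, W1, b1, W2, b2)
delta = pY - T
grad_W2 = Z.T.dot(delta)
dZ = delta.dot(W2.T) * (1 - Z * Z)
grad_W1 = X.T.dot(dZ)

def numeric_grad(W, eps=1e-6):
    # Central finite differences, perturbing one entry of W in place at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        c_plus = total_cross_entropy(T, forward(X, W1, b1, W2, b2)[0])
        W[idx] = old - eps
        c_minus = total_cross_entropy(T, forward(X, W1, b1, W2, b2)[0])
        W[idx] = old                     # restore the original weight
        grad[idx] = (c_plus - c_minus) / (2 * eps)
    return grad

err_W1 = np.max(np.abs(numeric_grad(W1) - grad_W1))
err_W2 = np.max(np.abs(numeric_grad(W2) - grad_W2))
print("max abs difference:", err_W1, err_W2)
```

If the two differences are tiny compared with the gradient entries themselves, the backpropagation formulas agree with the cost function.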