05 December 2016

NumPy - Get Started

What is NumPy?

NumPy is the fundamental package for scientific computing with Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. It lets data scientists translate mathematical concepts into code easily. Most importantly, NumPy arrays are fast and economical in both CPU time and memory.

Below are some notes on basic NumPy usage.

In [1]:
import numpy as np
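
As a quick illustration of the "multidimensional array object" mentioned above, here is a minimal sketch that builds a small 2-D array and inspects its basic attributes. The variable name `x` and its values are made up for this example.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (illustrative values only).
x = np.array([[1, 2, 3], [4, 5, 6]])

print(x.ndim)   # number of dimensions: 2
print(x.shape)  # size along each dimension: (2, 3)
print(x.size)   # total number of elements: 6
print(x.dtype)  # element type; the exact integer type depends on the platform
```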

Array vs List

As you can see below, you can apply mathematical operations such as addition, subtraction, power, square root, and logarithm directly to the array A.
A Python list L does not support these. You would have to loop over each element to get the same result, which is much slower.

In [3]:
# Python list
L = [1, 2, 3]
# NumPy array
A = np.array([1, 2, 3])

# Mathematical operations apply to A element-wise. L does not support them.
print(2*A)
print(A**2)
print(np.sqrt(A))
print(np.log(A))
[2 4 6]
[1 4 9]
[ 1.          1.41421356  1.73205081]
[ 0.          0.69314718  1.09861229]
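
To make the contrast concrete, the sketch below shows what the same operators do on a plain list. The names `repeated`, `list_power_failed`, `squared_list`, and `squared_array` are introduced only for this illustration.

```python
import numpy as np

L = [1, 2, 3]
A = np.array([1, 2, 3])

# For a list, * means repetition, not element-wise multiplication.
repeated = 2 * L
print(repeated)  # [1, 2, 3, 1, 2, 3]

# Exponentiation is not defined for lists and raises TypeError.
try:
    L ** 2
    list_power_failed = False
except TypeError:
    list_power_failed = True
print(list_power_failed)  # True

# Squaring a list element by element needs an explicit loop or comprehension,
# while the array does it in one vectorized expression.
squared_list = [x ** 2 for x in L]
squared_array = A ** 2
print(squared_list)            # [1, 4, 9]
print(squared_array.tolist())  # [1, 4, 9]
```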

Dot Product

An important array operation is the dot product, because it is the basis of matrix operations. Below are several ways to compute it.
In the code below, the dot product of $a$ and $b$ is $a\cdot b^{T} = 1\times 3 + 2\times 4 = 11$.

In [29]:
a = np.array([1, 2])
b = np.array([3, 4])

# dot product in different ways: 
np.dot(a, b) # 11
np.inner(a, b) # 11; for 1-D arrays the dot product is also the inner product
a.dot(b) # 11
b.dot(a) # 11
(a*b).sum() # 11

# A plain Python loop gives the same result, but it is much slower on large data.
dot = 0
for i, j in zip(a, b):
    dot += i*j
print(dot) # 11
11
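
As a small check, the sketch below collects the approaches listed above and confirms they agree. It also includes the `@` operator, which is not used in the original cell and requires Python 3.5+ with a NumPy version that supports it. The `results` list is a name introduced for this example.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

# Each entry computes the same dot product in a different way.
results = [
    np.dot(a, b),
    np.inner(a, b),
    a.dot(b),
    (a * b).sum(),
    a @ b,  # matrix-multiplication operator, equivalent for 1-D arrays
]

print(all(r == 11 for r in results))  # True
```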

Matrix

We can also use np.array to create a matrix. Below is code for various matrix operations.

In [64]:
# Note: np.matrix also exists, but np.array is officially recommended.
M = np.array([[1, 2], [3, 4]])
# extract element
M[0][0] # 1 ; this is the same as python list
M[0, 0] # 1

# matrix transpose
M.T

# get shape
M.shape # (2, 2)

# Matrix product (the 2-D form of the dot product)
M.dot(M)
# Caution: for 2-D arrays np.inner(M, M) equals M.dot(M.T), not M.dot(M)
np.inner(M, M)

# inverse matrix
np.linalg.inv(M)

# determinant
np.linalg.det(M)

# diagonal element
np.diag(M) # [1, 4]

# note: given a 1-D input, np.diag returns a diagonal matrix
np.diag([1, 4]) # [[1, 0], [0, 4]]

# trace
np.diag(M).sum()
np.trace(M)

# create various 10x10 matrices
# all zero
Z = np.zeros((10, 10))

# all one
O = np.ones((10, 10))

# random from the uniform distribution on [0, 1)
R = np.random.random((10, 10))

# random from the standard normal distribution N(0, 1)
# Note: randn takes each dimension as a separate argument; the others take a tuple
N = np.random.randn(10, 10)
print(N.mean())
print(N.var())
-0.11553011379
0.991388729704
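
The operations above can be sanity-checked with a few identities. This is a minimal sketch, not part of the original notebook: the seed value, the 1000x1000 sample size, and the tolerances are arbitrary assumptions chosen for illustration. It also shows why a 10x10 sample can have a mean noticeably different from 0: with only 100 draws the sample mean fluctuates by roughly 0.1.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Matrix product: M.dot(M) and M @ M agree.
product = M.dot(M)
print(np.array_equal(product, M @ M))                    # True
print(np.array_equal(product, [[7, 10], [15, 22]]))      # True

# np.inner on 2-D arrays contracts the last axes, so it equals M @ M.T.
inner = np.inner(M, M)
print(np.array_equal(inner, M @ M.T))    # True
print(np.array_equal(inner, product))    # False

# A matrix times its inverse is the identity, up to floating-point error.
identity = M @ np.linalg.inv(M)
print(np.allclose(identity, np.eye(2)))  # True

# Determinant: 1*4 - 2*3 = -2. Trace: 1 + 4 = 5.
det = np.linalg.det(M)
trace = np.trace(M)
print(np.isclose(det, -2.0))  # True
print(trace)                  # 5

# With many more samples, mean and variance settle close to 0 and 1.
np.random.seed(0)  # arbitrary seed, only for reproducibility
big = np.random.randn(1000, 1000)
print(abs(big.mean()) < 0.01)     # True
print(abs(big.var() - 1) < 0.01)  # True
```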