Python Docs

NumPy Tutorial

Introduction to NumPy

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Why NumPy?

  • Powerful N-dimensional array object
  • Broadcasting functions for array operations
  • Tools for integrating C/C++ and Fortran code
  • Linear algebra, Fourier transform, and random number capabilities
  • Much faster than traditional Python lists for numerical operations

Installation

Install NumPy using pip:

pip install numpy

Import NumPy in your Python code:

import numpy as np

NumPy Arrays

Creating Arrays

NumPy arrays can be created in several ways:

From Python Lists

import numpy as np

# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d)  # [1 2 3 4 5]

# 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
# [[1 2 3]
#  [4 5 6]]

# 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr3d.shape)  # (2, 2, 2)

Using Built-in Functions

# Array of zeros
zeros = np.zeros((3, 4))
print(zeros)

# Array of ones
ones = np.ones((2, 3))
print(ones)

# Array with a range of values
range_arr = np.arange(0, 10, 2)
print(range_arr)  # [0 2 4 6 8]

# Array with evenly spaced values
linspace_arr = np.linspace(0, 1, 5)
print(linspace_arr)  # [0.   0.25 0.5  0.75 1.  ]

# Identity matrix
identity = np.eye(3)
print(identity)

# Array filled with a constant value
full_arr = np.full((2, 3), 7)
print(full_arr)

Array Attributes

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)     # (2, 3) - dimensions
print(arr.ndim)      # 2 - number of dimensions
print(arr.size)      # 6 - total number of elements
print(arr.dtype)     # dtype('int64') - data type
print(arr.itemsize)  # 8 - size of each element in bytes

Array Indexing and Slicing

Basic Indexing

arr = np.array([1, 2, 3, 4, 5])

print(arr[0])    # 1 - first element
print(arr[-1])   # 5 - last element
print(arr[1:4])  # [2 3 4] - slice

2D Array Indexing

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[0, 1])      # 2 - element at row 0, col 1
print(arr2d[1])         # [4 5 6] - entire row
print(arr2d[:, 1])      # [2 5 8] - entire column
print(arr2d[0:2, 1:3])  # [[2 3] [5 6]] - subarray

Boolean Indexing

arr = np.array([1, 2, 3, 4, 5])

# Get elements greater than 3
mask = arr > 3
print(arr[mask])  # [4 5]

# One-liner
print(arr[arr > 3])  # [4 5]

Array Operations

Arithmetic Operations

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)    # [5 7 9]
print(a - b)    # [-3 -3 -3]
print(a * b)    # [4 10 18]
print(a / b)    # [0.25 0.4  0.5]
print(a ** 2)   # [1 4 9]

Universal Functions (ufuncs)

arr = np.array([1, 4, 9, 16])

print(np.sqrt(arr))      # [1. 2. 3. 4.]
print(np.exp(arr))       # Exponential
print(np.log(arr))       # Natural logarithm
print(np.sin(arr))       # Sine
print(np.cos(arr))       # Cosine

Aggregation Functions

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(arr))           # 21 - sum of all elements
print(np.sum(arr, axis=0))   # [5 7 9] - sum along columns
print(np.sum(arr, axis=1))   # [6 15] - sum along rows

print(np.mean(arr))          # 3.5 - mean
print(np.std(arr))           # Standard deviation
print(np.min(arr))           # 1 - minimum
print(np.max(arr))           # 6 - maximum
print(np.argmax(arr))        # 5 - index of maximum

Broadcasting

Broadcasting allows NumPy to work with arrays of different shapes during arithmetic operations:

# Scalar and array
arr = np.array([1, 2, 3])
print(arr + 10)  # [11 12 13]

# 1D array and 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
arr1d = np.array([10, 20, 30])
print(arr2d + arr1d)
# [[11 22 33]
#  [14 25 36]]

Array Reshaping

arr = np.arange(12)
print(arr)  # [0 1 2 3 4 5 6 7 8 9 10 11]

# Reshape to 3x4
reshaped = arr.reshape(3, 4)
print(reshaped)
# [[0  1  2  3]
#  [4  5  6  7]
#  [8  9 10 11]]

# Reshape to 2x6
reshaped2 = arr.reshape(2, 6)
print(reshaped2)

# Flatten back to 1D
flattened = reshaped.flatten()
print(flattened)

# Transpose
transposed = reshaped.T
print(transposed.shape)  # (4, 3)

Linear Algebra

# Matrix multiplication
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Dot product
result = np.dot(a, b)
print(result)
# [[19 22]
#  [43 50]]

# Or use @ operator
result = a @ b
print(result)

# Determinant
det = np.linalg.det(a)
print(det)  # -2.0

# Inverse
inv = np.linalg.inv(a)
print(inv)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(a)
print(eigenvalues)
print(eigenvectors)

Random Number Generation

# Random integers
random_ints = np.random.randint(0, 10, size=5)
print(random_ints)

# Random floats between 0 and 1
random_floats = np.random.random(5)
print(random_floats)

# Normal distribution
normal = np.random.normal(loc=0, scale=1, size=5)
print(normal)

# Random choice from array
arr = np.array([1, 2, 3, 4, 5])
choice = np.random.choice(arr, size=3)
print(choice)

# Set seed for reproducibility
np.random.seed(42)
print(np.random.random(3))

Stacking and Splitting Arrays

Stacking

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical stack
vstacked = np.vstack((a, b))
print(vstacked)
# [[1 2 3]
#  [4 5 6]]

# Horizontal stack
hstacked = np.hstack((a, b))
print(hstacked)  # [1 2 3 4 5 6]

# Column stack
colstacked = np.column_stack((a, b))
print(colstacked)
# [[1 4]
#  [2 5]
#  [3 6]]

Splitting

arr = np.arange(12)

# Split into 3 equal parts
split = np.array_split(arr, 3)
print(split)  # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]

# Split 2D array
arr2d = np.arange(12).reshape(3, 4)
vsplit_result = np.vsplit(arr2d, 3)
print(vsplit_result)

Practical Examples

Example 1: Statistical Analysis

# Simulate exam scores
np.random.seed(42)
scores = np.random.normal(75, 10, 100)

print(f"Mean: {np.mean(scores):.2f}")
print(f"Median: {np.median(scores):.2f}")
print(f"Std Dev: {np.std(scores):.2f}")
print(f"Min: {np.min(scores):.2f}")
print(f"Max: {np.max(scores):.2f}")

# Students who passed (>= 60)
passed = scores[scores >= 60]
print(f"Pass rate: {len(passed)/len(scores)*100:.1f}%")

Example 2: Image Processing

# Create a simple 5x5 grayscale image
image = np.random.randint(0, 256, size=(5, 5))
print("Original image:")
print(image)

# Normalize to 0-1 range
normalized = image / 255.0
print("\nNormalized:")
print(normalized)

# Apply threshold
threshold = 128
binary = np.where(image > threshold, 255, 0)
print("\nBinary (threshold=128):")
print(binary)

Best Practices

  • Use vectorized operations: Avoid loops when possible; use NumPy's built-in functions.
  • Preallocate arrays: Create arrays with known size using np.zeros() or np.empty().
  • Use appropriate data types: Specify dtype to save memory (e.g., np.int32 vs np.int64).
  • Avoid copying: Use views instead of copies when possible.
  • Use broadcasting: Leverage broadcasting instead of manual loops.

Summary

NumPy is the foundation of scientific computing in Python. Its powerful array operations, mathematical functions, and linear algebra capabilities make it essential for data analysis, machine learning, and scientific research. Master NumPy to work efficiently with numerical data.