NumPy for Data Science

NumPy provides efficient n-dimensional arrays, vectorized math, and broadcasting. It is the foundation of most data science and scientific computing libraries in Python.

Why NumPy?

Compared to plain Python lists, NumPy arrays offer:

  • Fast vectorized operations (no manual loops needed).
  • Fixed-type homogeneous arrays that are memory efficient.
  • Powerful indexing, slicing, and reshaping capabilities.
  • Broadcasting to apply operations between arrays of different shapes.
  • Built-in linear algebra, random sampling, and statistical functions.
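
As a rough illustration of the first two points, the sketch below compares a list comprehension with the equivalent vectorized expression, then inspects the array's element type and memory footprint. The variable names are illustrative, and the int64 dtype is requested explicitly (an assumption made here) so the byte counts do not depend on the platform.

```python
import numpy as np

values = list(range(5))

# Plain Python: an explicit loop (here, a list comprehension)
squared_list = [v * v for v in values]

# NumPy: one vectorized expression, no Python-level loop
arr = np.array(values, dtype=np.int64)
squared_arr = arr * arr

print(squared_list)        # [0, 1, 4, 9, 16]
print(squared_arr)         # [ 0  1  4  9 16]

# Every element shares one fixed type
print(arr.dtype)           # int64
print(arr.itemsize)        # 8 bytes per element
print(arr.nbytes)          # 40 bytes for 5 elements
```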

Basics

Create arrays, reshape them, and perform vectorized and broadcasting operations.

Example

import numpy as np

np.random.seed(42)
a = np.array([1, 2, 3])
b = np.arange(6).reshape(2, 3)

print(a + 10)           # vectorized
print(b.mean(axis=0))   # column means
print(b * a)            # broadcasting: shapes (2, 3) and (3,)

What this shows:

  • a + 10 adds 10 to every element (vectorized operation).
  • b.mean(axis=0) computes the mean of each column by averaging over the rows (axis 0).
  • b * a multiplies each row of b by a using broadcasting.
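
To make the broadcasting step more concrete, here is a small sketch that reuses a and b from the example and adds a hypothetical column vector named col. It shows how the shapes line up, and what happens when they do not.

```python
import numpy as np

a = np.array([1, 2, 3])            # shape (3,)
b = np.arange(6).reshape(2, 3)     # shape (2, 3)

# Shapes are compared from the trailing dimension backwards;
# a is treated as shape (1, 3) and stretched across both rows of b.
print((b * a).shape)               # (2, 3)
print(b * a)
# [[ 0  2  6]
#  [ 3  8 15]]

# A column of shape (2, 1) is stretched across the columns instead.
col = np.array([[10], [20]])
print(b + col)
# [[10 11 12]
#  [23 24 25]]

# Incompatible shapes raise an error rather than guessing.
try:
    b * np.array([1, 2])           # (2, 3) vs (2,): trailing 3 != 2
except ValueError as exc:
    print("ValueError:", exc)
```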

Linear Algebra for Data Science

NumPy is heavily used in linear models, PCA, and optimization. The following example solves for regression coefficients using the normal equation.

Example

x = np.random.randn(100, 3)
w = np.array([0.2, -0.5, 1.0])
y = x @ w + 0.1

XtX = x.T @ x
Xty = x.T @ y
beta = np.linalg.solve(XtX, Xty)
print(beta)

Explanation:

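  • x is a design matrix with 100 samples and 3 features.
  • w holds the true coefficients, and y = x @ w + 0.1 adds a constant offset of 0.1 with no noise.
  • beta solves the normal equation (XᵀX) beta = Xᵀy, where X is the design matrix x; np.linalg.solve does this without forming the inverse (XᵀX)⁻¹ explicitly.
  • Because x has no column of ones, the model cannot represent the offset, so the printed beta is close to the true w values but not exactly equal to them.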
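
The design matrix in this example has no column for the constant 0.1, so that offset is not estimated. One possible variant, sketched here as an illustration rather than as part of the original example, appends a column of ones and fits with np.linalg.lstsq. The names X_aug and rng, the seed, and the use of np.random.default_rng in place of np.random.seed are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)    # seed chosen arbitrarily
X = rng.standard_normal((100, 3))
w = np.array([0.2, -0.5, 1.0])
y = X @ w + 0.1                    # constant offset of 0.1, no noise

# Append a column of ones so the offset gets its own coefficient
X_aug = np.column_stack([X, np.ones(X.shape[0])])

# Least-squares solve; avoids forming X.T @ X explicitly
beta, residuals, rank, sing_vals = np.linalg.lstsq(X_aug, y, rcond=None)

print(beta)   # first three entries estimate w, the last estimates the offset
```

Because y contains no noise here, the fit should recover w and the offset up to floating-point error. With noisy data the estimates would only be approximate.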

Boolean Indexing & Basic Statistics

NumPy makes it easy to filter data and compute descriptive statistics such as mean, standard deviation, and percentiles.

Example

data = np.random.randn(1000)

print('Mean:', data.mean())
print('Std:', data.std())
print('95th percentile:', np.percentile(data, 95))

# Filter values > 1
high = data[data > 1]
print('Count > 1:', high.size)
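
Filters can also be combined, and a Boolean mask can be counted directly. The following sketch assumes a freshly generated data array (the seed and variable names are illustrative) and shows a few common patterns.

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily
data = rng.standard_normal(1000)

# Combine conditions with & (and) or | (or); the parentheses are required
# because & and | bind more tightly than the comparisons.
within_one = data[(data > -1) & (data < 1)]
tails = data[(data < -2) | (data > 2)]
print('Within (-1, 1):', within_one.size)
print('Beyond +/-2:', tails.size)

# A mask is an array of booleans, so summing it counts the matches
mask = data > 1
print('Count > 1:', mask.sum())
print('Fraction > 1:', mask.mean())

# std() uses ddof=0 (population) by default; ddof=1 gives the sample estimate
print('Sample std:', data.std(ddof=1))
```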

Summary

NumPy is the backbone of numerical computing in Python. Mastering arrays, broadcasting, and linear algebra in NumPy makes it much easier to understand and use higher-level libraries such as pandas, scikit-learn, and TensorFlow.