NumPy for Data Science
NumPy provides efficient n-dimensional arrays, vectorized math, and broadcasting. It is the foundation of most data science and scientific computing libraries in Python.
Why NumPy?
Compared to plain Python lists, NumPy arrays offer:
- Fast vectorized operations (no manual loops needed).
- Fixed-type homogeneous arrays that are memory efficient.
- Powerful indexing, slicing, and reshaping capabilities.
- Broadcasting to apply operations between arrays of different shapes.
- Built-in linear algebra, random sampling, and statistical functions.
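The first two points can be sketched with a small comparison. This assumes a short list of integers chosen purely for illustration; it shows the same computation written as a Python-level loop and as a vectorized NumPy expression, then inspects the array's fixed element type.

```python
import numpy as np

values = [1, 2, 3, 4]  # illustrative data, not from the original text

# Plain Python: an explicit loop (here a list comprehension) over the elements.
squared_list = [v * v for v in values]

# NumPy: one vectorized expression, no Python-level loop.
arr = np.array(values, dtype=np.int64)
squared_arr = arr * arr

print(squared_list)          # [1, 4, 9, 16]
print(squared_arr.tolist())  # [1, 4, 9, 16]

# Every element shares one dtype, so each occupies the same number of bytes.
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 32
```

The dtype is set explicitly here so the byte counts do not depend on the platform's default integer size.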
Basics
Create arrays, reshape them, and perform vectorized and broadcasting operations.
Example
import numpy as np

np.random.seed(42)
a = np.array([1, 2, 3])
b = np.arange(6).reshape(2, 3)
print(a + 10)          # vectorized
print(b.mean(axis=0))  # column means
print(b * a)           # broadcasting (2x3 * 1x3)
What this shows:
- a + 10 adds 10 to every element (vectorized operation).
- b.mean(axis=0) computes the mean of each column.
- b * a multiplies each row of b by a using broadcasting.
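To make the broadcasting rule more concrete, here is a small sketch that contrasts a row-shaped and a column-shaped operand. The array col and its values are assumptions added for illustration, and np.broadcast_shapes requires NumPy 1.20 or later.

```python
import numpy as np

b = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([1, 2, 3])        # shape (3,): stretched across both rows
col = np.array([[10], [20]])     # shape (2, 1): stretched across all columns

print((b * row).tolist())  # [[0, 2, 6], [3, 8, 15]]
print((b + col).tolist())  # [[10, 11, 12], [23, 24, 25]]

# np.broadcast_shapes reports the result shape without computing anything.
print(np.broadcast_shapes(b.shape, col.shape))  # (2, 3)

# Shapes that cannot be aligned raise a ValueError.
try:
    b * np.array([1, 2])
except ValueError as err:
    print("incompatible shapes:", err)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.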
Linear Algebra for Data Science
NumPy is heavily used in linear models, PCA, and optimization. The following example solves for regression coefficients using the normal equation.
Example
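import numpy as np

x = np.random.randn(100, 3)      # design matrix: 100 samples, 3 features
w = np.array([0.2, -0.5, 1.0])   # true coefficients
y = x @ w + 0.1                  # targets with a constant offset

# Normal equation: solve (XᵀX) beta = Xᵀy
XtX = x.T @ x
Xty = x.T @ y
beta = np.linalg.solve(XtX, Xty)
print(beta)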
Explanation:
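- x is a design matrix with 100 samples and 3 features.
- w holds the true coefficients, and y = x @ w + 0.1 adds a constant offset (bias term).
- beta = (XᵀX)⁻¹ Xᵀy is computed with np.linalg.solve, which solves the linear system without forming the inverse explicitly.
- The printed beta should be close to the true w values. Because x has no intercept column, the 0.1 offset is not modeled, so the match is only approximate.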
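One way to account for the constant offset is to add an intercept column and fit with a least-squares routine. The following is a sketch under stated assumptions, not part of the original example: the name x_aug, the use of np.random.default_rng, and the seed value are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
x = rng.standard_normal((100, 3))
w = np.array([0.2, -0.5, 1.0])
y = x @ w + 0.1                  # noise-free, as in the example above

# Append a column of ones so the last coefficient acts as the intercept.
x_aug = np.column_stack([x, np.ones(len(x))])

# lstsq minimizes ||x_aug @ coef - y||^2 without forming x.T @ x explicitly.
coef, residuals, rank, _ = np.linalg.lstsq(x_aug, y, rcond=None)

print(coef[:3])  # approximately [0.2, -0.5, 1.0]
print(coef[3])   # approximately 0.1
```

np.linalg.lstsq is generally a more numerically robust choice than the normal equation when the columns of the design matrix are nearly collinear, at the cost of somewhat more computation.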
Boolean Indexing & Basic Statistics
NumPy makes it easy to filter data and compute descriptive statistics such as mean, standard deviation, and percentiles.
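Before the larger example, the filtering mechanism itself can be sketched on a tiny array. The values below are made up for illustration so that each result can be checked by eye.

```python
import numpy as np

data = np.array([-2.0, -0.5, 0.0, 0.5, 1.5, 3.0])  # illustrative values

mask = data > 0                 # boolean array, same shape as data
print(mask.tolist())            # [False, False, False, True, True, True]
print(data[mask].tolist())      # [0.5, 1.5, 3.0]

# Combine conditions with & and |; the parentheses are required.
inside = data[(data > -1) & (data < 2)]
print(inside.tolist())          # [-0.5, 0.0, 0.5, 1.5]

# Summing a mask counts the True entries; its mean is the fraction.
print(int(mask.sum()), float(mask.mean()))  # 3 0.5
```

Indexing with a boolean mask returns a new array holding the selected elements, so modifying the result does not change data.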
Example
data = np.random.randn(1000)
print('Mean:', data.mean())
print('Std:', data.std())
print('95th percentile:', np.percentile(data, 95))
# Filter values > 1
high = data[data > 1]
print('Count > 1:', high.size)
Summary
NumPy is the backbone of numerical computing in Python. Mastering arrays, broadcasting, and linear algebra in NumPy will make it much easier to understand and use higher-level libraries like Pandas, scikit-learn, and TensorFlow.