Pandas for Data Analysis
Pandas provides high-level tools for cleaning, transforming, analyzing, and exploring data using its two core structures: Series (1D) and DataFrame (2D). It is one of the most widely used libraries for data wrangling in Python.
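A minimal sketch of the two structures, using hypothetical labels and values borrowed from the later example. It shows that a Series pairs values with an index of labels, and that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series is a 1D labeled array: values plus an index of labels
ages = pd.Series([24, 30, 29], index=['Alice', 'Bob', 'Charlie'], name='age')
print(ages['Bob'])  # look up a value by its label -> 30

# A DataFrame is a 2D table whose columns share one row index;
# selecting a single column returns a Series
df = pd.DataFrame(
    {'age': [24, 30, 29], 'city': ['NY', 'LA', 'SF']},
    index=['Alice', 'Bob', 'Charlie'],
)
age_col = df['age']
print(type(age_col).__name__)  # Series
print(ages.ndim, df.ndim)      # 1 2
```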
Why Pandas?
Pandas makes data tasks simple and fast:
- Easy loading of CSV, Excel, JSON, and SQL data
- Powerful filtering and selection
- Fast aggregations (groupby)
- Missing value handling
- Merges & joins
- Reshaping: pivot, melt, stack
- Time series functionality
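Two items from this list, file loading and time series, can be sketched together. This is a minimal example assuming a small hypothetical CSV of sales records held in a string (io.StringIO stands in for a real file path); read_csv with parse_dates and resample are standard pandas calls.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path
csv_text = """date,sales
2024-01-01,10
2024-01-01,5
2024-01-02,7
2024-01-04,3
"""

# parse_dates converts the 'date' column to datetime64 while reading
sales = pd.read_csv(StringIO(csv_text), parse_dates=['date'])

# Index by date, then total the sales in each calendar day;
# a day with no rows (2024-01-03) still appears, with a sum of 0
daily = sales.set_index('date').resample('D')['sales'].sum()
print(daily)
```

Resampling fills in the missing calendar day, which a plain groupby on the date column would not do.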
Creating a DataFrame
A DataFrame is a table-like structure with labeled rows & columns.
Example
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [24, 30, 29],
'city': ['NY', 'LA', 'SF']
})
print(df)
Inspecting Data
Quick overview functions:
Example
df.head()      # first 5 rows
df.tail()      # last 5 rows
df.info()      # column dtypes, non-null counts, memory usage
df.describe()  # summary statistics for numeric columns
df.shape       # (rows, columns) tuple; an attribute, not a method
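The calls above assume df already exists. This self-contained sketch rebuilds the three-row example so the inspection results can be checked; by default describe() only summarizes numeric columns, so here it reports on age alone.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24, 30, 29],
    'city': ['NY', 'LA', 'SF'],
})

print(df.shape)     # (3, 3): 3 rows, 3 columns
print(df.dtypes)    # one dtype per column; 'age' is an integer dtype
df.info()           # prints column names, non-null counts, and dtypes

# describe(): count, mean, std, min, quartiles, and max for numeric columns
stats = df.describe()
print(stats.loc[['count', 'mean', 'max'], 'age'])
```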
Filtering & Selecting Rows
Pandas filters rows with boolean masks or with the query() method.
Examples
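df[df['age'] > 25]          # rows where age > 25 (boolean mask)
df.query('age > 25')        # the same filter written with query()
df[['name', 'city']]        # select columns
df.loc[0:1, 'name':'age']   # label-based; both endpoints included
df.iloc[0:2, 0:2]           # position-based; end position excluded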
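Masks can be combined. A short sketch on the same three-row data: conditions are joined with & (and) or | (or), and each condition needs its own parentheses because these operators bind more tightly than comparisons. query() accepts the equivalent expression as a string, and isin() tests membership in a list.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24, 30, 29],
    'city': ['NY', 'LA', 'SF'],
})

# Combine conditions with & / |; parentheses around each are required
mask = (df['age'] > 25) & (df['city'] != 'LA')
print(df[mask])  # only Charlie matches both conditions

# The same filter expressed with query()
same = df.query("age > 25 and city != 'LA'")

# isin() keeps rows whose value appears in the given list
coastal = df[df['city'].isin(['NY', 'SF'])]

# .loc combines a row mask with a column selection
names = df.loc[df['age'] > 25, 'name']
```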
Handling Missing Values
Detect, fill, or drop missing entries (such as NaN or None).
Example
df.isna().sum()                                   # count missing values per column
df['age'] = df['age'].fillna(df['age'].median())  # fill age gaps with the median
df = df.dropna(subset=['name'])                   # drop rows with no name
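The example DataFrame built earlier has no gaps, so this sketch assumes a small hypothetical dataset with one missing name and one missing age. It runs the same three steps; note that median() skips NaN when computing the fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column
df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'Dana'],
    'age': [24, np.nan, 29, 35],
})

counts = df.isna().sum()  # missing count per column
print(counts)

# Fill numeric gaps with the median of the non-missing ages (29.0)
df['age'] = df['age'].fillna(df['age'].median())

# Drop rows where 'name' is missing; the original index labels are kept
df = df.dropna(subset=['name'])
print(df)
```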
Groupby & Aggregation
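Groupby splits rows into groups by key, applies an aggregation to each group, and combines the results, which makes it useful for summarizing data by category.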
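A small sketch of split-apply-combine on hypothetical sales data where each city appears more than once. It applies several aggregations at once and also uses named aggregation, which lets you choose the output column names; group keys are sorted in the result by default.

```python
import pandas as pd

# Hypothetical sales data with repeated categories
sales = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'SF', 'LA'],
    'amount': [100, 80, 50, 70, 20],
})

# Split rows by city, apply sum/mean/count to 'amount', combine into one table
summary = sales.groupby('city')['amount'].agg(['sum', 'mean', 'count'])
print(summary)

# Named aggregation: output_name=(column, function)
named = sales.groupby('city').agg(total=('amount', 'sum'), orders=('amount', 'count'))
print(named)
```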
Example
df.groupby('city')['age'].mean()   # mean age per city
df.groupby('city').agg({           # a different aggregation per column
'age': 'mean',
'name': 'count'
})
Merging & Joining DataFrames
Combine datasets on shared key columns, similar to SQL joins.
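The how parameter of merge selects the join type. In this sketch the two hypothetical tables share only ids 1 and 2, so the join types give visibly different results; unmatched cells are filled with NaN.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'age': [25, 30, 35]})
right = pd.DataFrame({'id': [1, 2, 4], 'city': ['NY', 'LA', 'SF']})

# how= selects the join type, as in SQL
inner = left.merge(right, on='id', how='inner')  # ids in both tables: 1, 2
left_j = left.merge(right, on='id', how='left')  # every left id: 1, 2, 3
outer = left.merge(right, on='id', how='outer')  # union of ids: 1, 2, 3, 4
print(left_j)  # id 3 has no match, so its city is NaN
```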
Example
df1 = pd.DataFrame({'id':[1,2], 'age':[25,30]})
df2 = pd.DataFrame({'id':[1,2], 'city':['NY','LA']})
merged = df1.merge(df2, on='id')  # inner join by default (how='inner')
print(merged)
Reshaping Data
Reshape datasets using pivot, melt, stack, and unstack.
Examples
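df.pivot(index='city', columns='name', values='age')  # long to wide: one row per city, one column per name
pd.melt(df, id_vars=['name'])                           # wide to long: one row per (name, variable) pair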
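stack and unstack move labels between the columns and the row index. A minimal sketch on a hypothetical city-by-year table; the table deliberately has no missing cells, which keeps the example independent of how stack() treats NaN across pandas versions.

```python
import pandas as pd

# Hypothetical wide table: one row per city, one column per year
wide = pd.DataFrame(
    {'2023': [10, 20], '2024': [15, 25]},
    index=pd.Index(['NY', 'LA'], name='city'),
)
wide.columns.name = 'year'

# stack(): move the column labels into an inner row level,
# giving a Series indexed by (city, year)
long = wide.stack()
print(long)

# unstack(): the inverse, moving the inner level back out to columns
back = long.unstack()
print(back)
```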
Summary
Pandas is essential for data cleaning, wrangling, and exploration. It works closely with NumPy, Matplotlib, and scikit-learn, and together these libraries form the core of the Python data science stack.