Pandas for Data Analysis

Pandas provides high-level tools for cleaning, transforming, analyzing, and exploring data using its two core structures: Series (1D) and DataFrame (2D). It is one of the most widely used libraries for data wrangling in Python.

Why Pandas?

Pandas makes data tasks simple and fast:

  • Easy loading of CSV, Excel, JSON, SQL
  • Powerful filtering and selection
  • Fast aggregations (groupby)
  • Missing value handling
  • Merges & joins
  • Reshaping: pivot, melt, stack
  • Time series functionality

Creating a DataFrame

A DataFrame is a table-like structure with labeled rows & columns.

Example

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24, 30, 29],
    'city': ['NY', 'LA', 'SF']
})

print(df)
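
The other core structure, Series, can be sketched in the same way. This is a minimal illustration, assuming the same sample data as above; the variable names ages and age_column are made up for this example.

```python
import pandas as pd

# A Series is a 1D labeled array; the index labels here are sample names.
ages = pd.Series([24, 30, 29], index=['Alice', 'Bob', 'Charlie'], name='age')

# Each column of a DataFrame is itself a Series.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24, 30, 29],
})
age_column = df['age']

print(type(age_column).__name__)  # Series
print(ages['Bob'])                # 30
```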

Inspecting Data

Quick overview functions:

Example

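df.head()        # first 5 rows (default n=5)
df.tail()        # last 5 rows
df.info()        # column dtypes, non-null counts, memory usage
df.describe()    # summary statistics for numeric columns
df.shape         # (rows, columns) tuple; an attribute, not a method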

Filtering & Selecting Rows

Filter rows with boolean masks or the query method, and select by label with loc or by position with iloc.

Examples

df[df['age'] > 25]               # boolean mask: rows where age > 25
df[['name', 'city']]              # select columns by name
df.loc[0:1, 'name':'age']         # label-based; both slice ends are included
df.iloc[0:2, 0:2]                 # position-based; the end of each slice is excluded
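
The query method mentioned above is not shown in these examples, so here is a small sketch of it next to the equivalent boolean mask. It assumes the same sample DataFrame as earlier; mask_result and query_result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24, 30, 29],
    'city': ['NY', 'LA', 'SF'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask_result = df[(df['age'] > 25) & (df['city'] != 'LA')]

# The same filter written as a query string.
query_result = df.query("age > 25 and city != 'LA'")

print(query_result)
```

Both forms select the same rows; query tends to read better when several conditions are chained.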

Handling Missing Values

Detect, fill or drop missing entries.

Example

df.isna().sum()
df['age'] = df['age'].fillna(df['age'].median())
df = df.dropna(subset=['name'])
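
The sample DataFrame used so far has no gaps, so the calls above have nothing to act on. The sketch below runs the same steps on a small made-up table with one missing name and one missing age.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing name and one missing age.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'Dana'],
    'age': [24, np.nan, 29, 31],
})

# Count missing entries per column.
missing_counts = df.isna().sum()

# Fill the missing age with the median of the observed ages (24, 29, 31).
df['age'] = df['age'].fillna(df['age'].median())

# Drop rows that have no name.
df = df.dropna(subset=['name'])

print(missing_counts)
print(df)
```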

Groupby & Aggregation

Useful for summarizing data by categories.

Example

df.groupby('city')['age'].mean()   # mean age per city
df.groupby('city').agg({           # a different aggregation per column
    'age': 'mean',
    'name': 'count'
})
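
With the dictionary form, the result columns keep their original names ('age', 'name'), which can be misleading once they hold aggregates. One alternative is named aggregation, sketched here on made-up data with two people per city; the output names mean_age and people are chosen for illustration.

```python
import pandas as pd

# Made-up data with repeated cities so that each group has two rows.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [24, 30, 29, 36],
    'city': ['NY', 'LA', 'NY', 'LA'],
})

# Named aggregation: output_column=(input_column, function).
summary = df.groupby('city').agg(
    mean_age=('age', 'mean'),
    people=('name', 'count'),
)

print(summary)
```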

Merging & Joining DataFrames

Combine datasets like SQL joins.

Example

df1 = pd.DataFrame({'id': [1, 2], 'age': [25, 30]})
df2 = pd.DataFrame({'id': [1, 2], 'city': ['NY', 'LA']})

merged = df1.merge(df2, on='id')   # inner join
print(merged)
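
An inner join keeps only ids present in both tables. The sketch below uses made-up tables whose ids only partly overlap to show how the how parameter changes the result, and how indicator=True records where each row came from.

```python
import pandas as pd

# Made-up tables: id 3 exists only on the left, id 4 only on the right.
df1 = pd.DataFrame({'id': [1, 2, 3], 'age': [25, 30, 41]})
df2 = pd.DataFrame({'id': [1, 2, 4], 'city': ['NY', 'LA', 'SF']})

# Left join: keep every row of df1; unmatched cities become NaN.
# indicator=True adds a '_merge' column naming the source of each row.
left = df1.merge(df2, on='id', how='left', indicator=True)

# Outer join: keep ids from either table.
outer = df1.merge(df2, on='id', how='outer')

print(left)
print(outer)
```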

Reshaping Data

Reshape datasets using pivot, melt, stack, and unstack.

Examples

df.pivot(index='city', columns='name', values='age')   # wide: one row per city, one column per name
pd.melt(df, id_vars=['name'])                           # long: one row per (name, variable) pair
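
stack and unstack are listed above but not shown. As a rough sketch, stack moves the column labels into an inner index level, and unstack reverses that. The scores table below is made up for illustration.

```python
import pandas as pd

# Made-up wide table: one row per person, one column per subject.
scores = pd.DataFrame(
    {'art': [82, 88], 'math': [90, 75]},
    index=pd.Index(['Alice', 'Bob'], name='name'),
)

# stack: columns become the inner level of a MultiIndex, giving a long Series.
stacked = scores.stack()

# unstack: move the inner index level back out to columns.
restored = stacked.unstack()

print(stacked)
print(restored)
```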

Summary

Pandas covers the core steps of data cleaning, wrangling, and exploration. It works closely with NumPy, Matplotlib, and scikit-learn, which together form much of the Python data science stack.