Examples of Data Analysis with Pandas: 3 Practical Examples You’ll Actually Use
1. Sales data: examples of data analysis with pandas for reporting
Let’s start with the kind of thing almost every analyst touches: sales data. Sales sits near the top of any list of practical pandas examples because it combines dates, categories, and numeric metrics in a very natural way.
Imagine a CSV exported from your e‑commerce platform:
order_id,order_date,region,product,category,quantity,unit_price,discount,customer_id
1001,2024-01-05,West,Running Shoes,Sports,2,89.99,0.10,501
1002,2024-01-05,East,T-Shirt,Apparel,4,19.99,0.00,502
1003,2024-01-06,West,Socks,Apparel,6,4.99,0.05,503
...
Load and prep the data:
import pandas as pd
orders = pd.read_csv("orders_2024.csv", parse_dates=["order_date"])
## Always a good first sanity check
orders.info()  # info() prints its summary directly and returns None
print(orders.head())
## Compute a few basic metrics
orders["revenue"] = orders["quantity"] * orders["unit_price"] * (1 - orders["discount"])
Grouping and summarizing sales data with pandas
This is one of the best examples of data analysis with pandas because it shows how quickly you can go from raw rows to management-ready summaries.
Daily revenue by region:
daily_region = (
orders
.groupby(["order_date", "region"], as_index=False)["revenue"]
.sum()
.sort_values(["order_date", "region"])
)
Top 5 products by total revenue:
top_products = (
orders
.groupby("product", as_index=False)["revenue"]
.sum()
.sort_values("revenue", ascending=False)
.head(5)
)
Month‑over‑month growth:
orders["year_month"] = orders["order_date"].dt.to_period("M").astype(str)
monthly = orders.groupby("year_month", as_index=False)["revenue"].sum()
monthly["mom_growth"] = monthly["revenue"].pct_change()
Already we’ve hit several real examples of data analysis with pandas:
- Creating derived metrics (revenue, year_month)
- Grouping and aggregating by time and category
- Calculating growth rates for reporting
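The whole pipeline above can be exercised end to end on a few inline rows (hypothetical numbers standing in for orders_2024.csv), which makes it easy to verify the arithmetic by hand:

```python
import pandas as pd

# Hypothetical inline orders standing in for the CSV export
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-05", "2024-02-06"]),
    "region": ["West", "East", "West"],
    "quantity": [2, 4, 6],
    "unit_price": [89.99, 19.99, 4.99],
    "discount": [0.10, 0.00, 0.05],
})

# Derived metric: net revenue per order line
orders["revenue"] = orders["quantity"] * orders["unit_price"] * (1 - orders["discount"])

# Monthly totals and month-over-month growth
orders["year_month"] = orders["order_date"].dt.to_period("M").astype(str)
monthly = orders.groupby("year_month", as_index=False)["revenue"].sum()
monthly["mom_growth"] = monthly["revenue"].pct_change()
print(monthly)
```

With these made-up numbers, January revenue is 161.982 + 79.96 = 241.942, February is 28.443, and mom_growth for February is 28.443 / 241.942 - 1.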
For context on how sales and economic data are typically structured and reported, the U.S. Census Bureau’s retail data pages are a good reality check on what real-world tables look like: https://www.census.gov/retail.html
Adding customer-level metrics: another example of pandas in action
Now add a customer table:
customer_id,signup_date,segment,state
501,2023-11-01,New,CA
502,2022-07-15,Loyal,NY
503,2024-01-02,New,TX
...
customers = pd.read_csv("customers.csv", parse_dates=["signup_date"])
## Join orders with customers
orders_cust = orders.merge(customers, on="customer_id", how="left")
## Revenue by customer segment and region
segment_region = (
orders_cust
.groupby(["segment", "region"], as_index=False)["revenue"]
.sum()
.sort_values("revenue", ascending=False)
)
This gives you a clean example of data analysis with pandas where you:
- Merge multiple datasets
- Build segment-level summaries
- Prepare inputs for dashboards or presentations
If you work in BI, this is probably the closest to your day-to-day reality.
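One sanity check worth bolting onto any left join like this is pandas’ merge indicator, which flags orders whose customer_id has no match in the customer table. A minimal sketch with small hypothetical frames:

```python
import pandas as pd

# Hypothetical minimal orders and customers tables; 999 has no customer record
orders = pd.DataFrame({"order_id": [1001, 1002], "customer_id": [501, 999]})
customers = pd.DataFrame({"customer_id": [501], "segment": ["New"]})

# indicator=True adds a _merge column: 'both' or 'left_only'
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["order_id", "customer_id"]])
```

In real pipelines, a non-empty unmatched frame usually points to stale customer exports or bad keys, and is much cheaper to catch here than in a dashboard.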
2. Customer churn: examples of data analysis with pandas for retention
The second of our three practical examples is customer churn, a favorite in SaaS, telecom, and subscription businesses. You want to know who is leaving, when, and what patterns predict it.
Assume you have a subscription dataset:
customer_id,start_date,end_date,plan,monthly_fee,country
1,2023-01-10,2024-02-01,Basic,19.00,US
2,2022-06-01,,Pro,49.00,US
3,2023-09-15,2023-12-15,Basic,19.00,UK
...
Load and prep:
subs = pd.read_csv("subscriptions.csv", parse_dates=["start_date", "end_date"])
## Define churn flag as of a reference date
as_of = pd.Timestamp("2024-07-01")
subs["is_churned"] = subs["end_date"].notna() & (subs["end_date"] <= as_of)
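On a few hypothetical rows, the flag behaves as you’d expect. Note that the notna() check mainly makes the intent explicit: a comparison against a missing NaT end date already evaluates to False.

```python
import pandas as pd

as_of = pd.Timestamp("2024-07-01")
subs = pd.DataFrame({
    "customer_id": [1, 2, 3],
    # Hypothetical end dates: churned, still open, churning in the future
    "end_date": pd.to_datetime(["2024-02-01", pd.NaT, "2024-12-31"]),
})

# Churned only if an end_date exists AND falls on or before the reference date
subs["is_churned"] = subs["end_date"].notna() & (subs["end_date"] <= as_of)
print(subs["is_churned"].tolist())
```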
Churn rate by month: a very common example of pandas time-series analysis
You can compute churn by the month the customer started or the month they ended. Here’s churn by end month:
churned = subs[subs["is_churned"]].copy()
churned["churn_month"] = churned["end_date"].dt.to_period("M").astype(str)
churn_by_month = (
churned
.groupby("churn_month", as_index=False)["customer_id"]
.nunique()
.rename(columns={"customer_id": "churned_customers"})
)
Now add active customers per month to get a churn rate. One simple approach is to build a monthly panel:
## Create a month range for 2023-2024
month_index = pd.period_range("2023-01", "2024-12", freq="M")
rows = []
for _, row in subs.iterrows():
    start = row["start_date"].to_period("M")
    # Open-ended subscriptions (missing end_date) count as active through as_of
    end_date = row["end_date"] if pd.notna(row["end_date"]) else as_of
    end = end_date.to_period("M")
    active_months = month_index[(month_index >= start) & (month_index <= end)]
    for m in active_months:
        rows.append({
            "customer_id": row["customer_id"],
            "month": str(m),
            "plan": row["plan"],
            "country": row["country"],
        })
panel = pd.DataFrame(rows)
active_by_month = (
panel
.groupby("month", as_index=False)["customer_id"]
.nunique()
.rename(columns={"customer_id": "active_customers"})
)
churn_metrics = churn_by_month.merge(active_by_month, on="month", how="left")
churn_metrics["churn_rate"] = churn_metrics["churned_customers"] / churn_metrics["active_customers"]
This is one of those real examples of data analysis with pandas where you reshape data quite a bit to get a metric leadership actually cares about.
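To convince yourself the panel logic behaves, you can run it on two hypothetical subscriptions, one churned in March and one still open (the guard for a missing end_date assumes open subscriptions count as active through the as-of date):

```python
import pandas as pd

as_of = pd.Timestamp("2024-07-01")
subs = pd.DataFrame({
    "customer_id": [1, 2],
    "start_date": pd.to_datetime(["2024-01-10", "2024-02-01"]),
    "end_date": pd.to_datetime(["2024-03-15", pd.NaT]),  # customer 2 is still active
})

month_index = pd.period_range("2024-01", "2024-12", freq="M")
rows = []
for _, row in subs.iterrows():
    start = row["start_date"].to_period("M")
    end = (row["end_date"] if pd.notna(row["end_date"]) else as_of).to_period("M")
    active = month_index[(month_index >= start) & (month_index <= end)]
    rows.extend({"customer_id": row["customer_id"], "month": str(m)} for m in active)

panel = pd.DataFrame(rows)
active_by_month = panel.groupby("month", as_index=False)["customer_id"].nunique()
print(active_by_month)
```

Customer 1 contributes January through March, customer 2 February through July, so February and March show two active customers and every other month shows one.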
Feature engineering for a churn model
You can push this further by building features for a simple churn model in scikit‑learn. Pandas handles the data wrangling; the model just consumes arrays.
Suppose you have a usage log:
customer_id,event_time,event_type
1,2024-01-03T10:15:00,login
1,2024-01-03T10:16:00,view_page
2,2024-01-05T09:00:00,login
...
usage = pd.read_csv("usage_logs.csv", parse_dates=["event_time"])
usage["event_date"] = usage["event_time"].dt.date
## Daily activity per customer
usage_daily = (
usage
.groupby(["customer_id", "event_date"], as_index=False)["event_type"]
.count()
.rename(columns={"event_type": "events_per_day"})
)
## Aggregate to customer-level features
usage_features = (
usage_daily
.groupby("customer_id", as_index=False)
.agg(
avg_events_per_day=("events_per_day", "mean"),
active_days=("event_date", "nunique"),
)
)
## Combine with subs
model_data = subs.merge(usage_features, on="customer_id", how="left")
Now you have a tidy table with churn labels and behavioral features. This is a textbook example of data analysis with pandas feeding into machine learning.
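The hand-off itself is mundane: pick feature columns, decide what a missing value means, and convert to arrays. A minimal sketch with hypothetical rows (any scikit-learn estimator accepts X and y in this shape):

```python
import pandas as pd

# Hypothetical merged table: churn labels plus usage features
model_data = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "is_churned": [True, False, True],
    "avg_events_per_day": [1.5, 4.0, None],  # missing usage -> no logged activity
    "active_days": [3, 40, None],
})

features = ["avg_events_per_day", "active_days"]

# Customers with no usage rows get zeros rather than NaN
X = model_data[features].fillna(0).to_numpy(dtype=float)
y = model_data["is_churned"].astype(int).to_numpy()
print(X.shape, y)
```

The fillna(0) choice encodes an assumption, that absent usage logs mean zero activity; in a real project that decision deserves a comment of its own.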
If you want to compare your churn metrics or data practices to industry research, the U.S. Federal Communications Commission and other agencies often publish churn and subscription statistics; for example: https://www.fcc.gov/economics-analytics
3. Public-health style time series: real-world data patterns
The third of our three practical examples borrows from public-health style analysis. Public data is messy, but it’s also where pandas shines.
Imagine you’re working with a simplified daily case count dataset inspired by public sources (for real data, agencies like the CDC publish detailed time series: https://data.cdc.gov/):
date,state,new_cases,new_hospitalizations,new_deaths
2024-01-01,CA,120,5,0
2024-01-01,NY,85,4,1
2024-01-02,CA,140,6,0
...
Load and clean:
health = pd.read_csv("health_daily.csv", parse_dates=["date"])
## Handle obvious data issues
health = health.dropna(subset=["state", "date"]) # drop rows missing key identifiers
for col in ["new_cases", "new_hospitalizations", "new_deaths"]:
    health[col] = health[col].clip(lower=0)  # no negative counts
Rolling averages and incidence rates
Public-health analysts care about trends more than noisy day-to-day swings, and a 7‑day rolling average is the standard pandas tool for surfacing them.
health = health.sort_values(["state", "date"])
health["cases_7d_avg"] = (
health
.groupby("state")["new_cases"]
.transform(lambda s: s.rolling(window=7, min_periods=1).mean())
)
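The min_periods=1 argument matters at the start of each state’s series: the first few values average over however many observations exist so far, rather than coming back as NaN. A tiny hypothetical series makes the behavior visible:

```python
import pandas as pd

# Eight hypothetical daily counts
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# With min_periods=1, early windows shrink instead of producing NaN
avg = s.rolling(window=7, min_periods=1).mean()
print(avg.tolist())
```

The first value is just 10.0, the second is the mean of two observations, and only from the seventh value onward is it a true 7-day average.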
If you have a population table:
state,population
CA,38970000
NY,19450000
...
pop = pd.read_csv("state_population.csv")
health_pop = health.merge(pop, on="state", how="left")
health_pop["cases_per_100k"] = health_pop["new_cases"] / health_pop["population"] * 100_000
This is a clean example of data analysis with pandas where you:
- Combine surveillance-style data with demographic data
- Normalize metrics (per 100,000 people) for better comparisons
- Use rolling windows to smooth volatile series
For more background on why public-health metrics are often expressed per 100,000 population, the CDC’s data resources are helpful: https://www.cdc.gov/datastatistics/index.html
Comparing regions and detecting spikes
You can use pandas to find states with unusual spikes relative to their own history.
## Compute baseline as 28-day rolling median
health_pop["cases_28d_median"] = (
health_pop
.groupby("state")["new_cases"]
.transform(lambda s: s.rolling(window=28, min_periods=7).median())
)
health_pop["spike_ratio"] = health_pop["new_cases"] / (health_pop["cases_28d_median"] + 1e-6)
## Flag potential spikes where today's cases are > 3x recent median
spikes = health_pop[health_pop["spike_ratio"] > 3]
This pattern — calculate a baseline, then flag outliers — shows up in fraud detection, anomaly detection in IoT data, and more. It’s another one of those real examples of data analysis with pandas that generalizes well beyond health.
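Here is the same baseline-and-ratio idea in miniature, with one deliberately injected spike in a hypothetical count series:

```python
import pandas as pd

# Hypothetical daily counts with a spike at position 7
cases = pd.Series([10, 12, 11, 9, 10, 11, 10, 60, 12, 10])

# Baseline: rolling median; epsilon guards against division by zero
baseline = cases.rolling(window=7, min_periods=3).median()
spike_ratio = cases / (baseline + 1e-6)

# Flag days running well above their own recent history
spikes = spike_ratio[spike_ratio > 3]
print(spikes)
```

Only the injected value of 60 clears the 3x-baseline threshold; the ordinary day-to-day wobble does not.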
Pulling it together: more real-world examples pandas handles well
So far we’ve walked through three practical pandas examples: sales, churn, and public health. In day-to-day work, you’ll see variations on these themes everywhere. A few more quick scenarios pandas handles nicely:
- Marketing attribution: joining ad impressions, clicks, and conversions across channels, then computing cost per acquisition and return on ad spend.
- Finance and budgeting: comparing actuals vs. budget by department and month, building variance tables, and computing year-to-date metrics.
- Operations and logistics: analyzing delivery times, on‑time performance, and bottlenecks by route or warehouse, using groupby and quantiles.
- Survey analysis: cleaning Likert-scale survey data, handling missing responses, and computing satisfaction scores by demographic group.
- Healthcare analytics: summarizing patient encounters, readmission rates, and length-of-stay statistics by facility or diagnosis (for an idea of how healthcare data is structured, see examples from NIH: https://www.nlm.nih.gov/).
These are not toy problems. They’re exactly the kinds of examples of data analysis with pandas that show up in analytics, data science, and even traditional BI roles.
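As a concrete instance of the operations-and-logistics bullet above, groupby plus quantile gives per-route delivery percentiles in a couple of lines (route names and hours here are made up):

```python
import pandas as pd

# Hypothetical delivery times in hours, by route
deliveries = pd.DataFrame({
    "route": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "hours": [10, 12, 11, 30, 5, 6, 5, 7],
})

# Median and 90th-percentile delivery time per route
stats = deliveries.groupby("route")["hours"].quantile([0.5, 0.9]).unstack()
print(stats)
```

The 90th percentile is often more useful than the mean here, because a single 30-hour delivery drags the average up while the median stays honest.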
FAQ: common questions about pandas data analysis examples
What are some real examples of data analysis with pandas in business?
Common business examples of data analysis with pandas include:
- Building monthly revenue and margin reports from transaction data
- Tracking customer churn and retention by cohort
- Analyzing marketing campaign performance by channel and creative
- Forecasting inventory needs based on historical demand patterns
All of these rely on the same core pandas patterns: cleaning, joining, grouping, and aggregating.
Can you give an example of using pandas with machine learning?
Yes. One popular example of this is using pandas to prepare a churn dataset: you join subscription records with product usage logs, engineer features like avg_events_per_day or days_since_last_login, and then pass the resulting DataFrame to a scikit‑learn model such as LogisticRegression. Pandas does the data wrangling; scikit‑learn does the modeling.
Are these examples of data analysis with pandas suitable for beginners?
They are approachable if you already know basic Python syntax. Each of the three practical examples here uses operations you’ll see constantly in real work: read_csv, merge, groupby, agg, rolling, and simple arithmetic. If you can get comfortable with those, you can handle a surprising amount of real-world analysis.
Where can I find real datasets to practice these pandas examples?
You can practice these real examples using:
- Public health datasets from the CDC: https://data.cdc.gov/
- Educational and social datasets from U.S. government portals: https://data.gov/
- Open research datasets shared by universities like Harvard: https://dataverse.harvard.edu/
Download a CSV, open a notebook, and try to recreate one of the examples of data analysis with pandas from this article using that real data.
In practice, the gap between toy tutorials and real work is all about messy joins, odd edge cases, and business-specific metrics. Working through these three practical examples will put you much closer to how analysts and data scientists actually use pandas in 2024 and 2025.