Guide

Advanced Data Transformation And CleaningDeep dive

Create a Pivot Table from Excel with Pandas

Create a pivot table from an Excel file with pandas: read the workbook, aggregate with pivot_table using index, columns, values and aggfunc, then export the result.

To create a pivot table from an Excel file with pandas, read the workbook with pd.read_excel(), summarize it with pd.pivot_table(), and write the result back with to_excel(). The recipe below is fully runnable: the first block writes a sample workbook so the read has something to open.

Mapping source columns to pivot_table arguments Region maps to the index argument, Month maps to columns, Revenue maps to values, and the chosen aggfunc such as sum combines the values within each cell. Source columns Region Month Revenue aggfunc="sum" pivot_table() index="Region" columns="Month" values="Revenue" aggfunc="sum"

Create a sample workbook

Python
import pandas as pd

df_seed = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Sales_Rep": ["Ana", "Ben", "Cara", "Dan", "Eve", "Finn"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
    "Units": [120, 90, 80, 75, 60, 55],
})
df_seed.to_excel("source_data.xlsx", sheet_name="Q1", index=False)

Build and export the pivot

Python
# 1. Load the workbook
df = pd.read_excel("source_data.xlsx", engine="openpyxl")

# 2. Build the pivot table
pivot = pd.pivot_table(
    df,
    values=["Revenue", "Units"],
    index=["Region", "Sales_Rep"],
    columns="Month",
    aggfunc={"Revenue": "sum", "Units": "mean"},
    fill_value=0,
    margins=True,
    margins_name="Grand Total",
)

# 3. Export
pivot.to_excel("report_pivot.xlsx", sheet_name="Q1_Summary")
print(pivot)
  • values — the metric columns to aggregate.
  • index — the column(s) that become the pivot's rows.
  • columns — the column whose values spread across the pivot's columns.
  • aggfunc — how to combine rows that fall in the same cell. A dict applies a different function per metric.
  • fill_value — what to put in cells with no matching rows (here 0 instead of NaN).
  • margins=True — adds a totals row and column labeled by margins_name.

Map Excel pivot fields to pandas

Excel pivot UIpandas argument
Rowsindex
Columnscolumns
Valuesvalues
Summarize Values Byaggfunc
Grand Totalsmargins=True
Empty cell replacementfill_value
Report Filterdf.query(...) / df.loc[...] before pivoting

Filter before you pivot

There is no separate "filter" argument — slice the DataFrame first, then pivot the subset:

Python
north_south = df[df["Region"].isin(["North", "South"])]
filtered_pivot = pd.pivot_table(
    north_south, values="Revenue", index="Region",
    columns="Month", aggfunc="sum", fill_value=0,
)
print(filtered_pivot)

Apply Excel number formatting on export

to_excel() writes raw values without cell formats. Use the xlsxwriter engine to format a currency column in the output file:

Python
flat = filtered_pivot.reset_index()
with pd.ExcelWriter("formatted_pivot.xlsx", engine="xlsxwriter") as writer:
    flat.to_excel(writer, sheet_name="Report", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Report"]
    money_fmt = workbook.add_format({"num_format": "$#,##0.00"})
    # Columns B onward hold the revenue figures
    worksheet.set_column(1, len(flat.columns) - 1, 14, money_fmt)
print("Wrote formatted_pivot.xlsx")

Troubleshooting

ValueError: Index contains duplicate entries, cannot reshape — this comes from DataFrame.pivot(), not pivot_table(). pivot() cannot collapse duplicate index/column pairs; pivot_table() can, because it aggregates. Use pivot_table with an explicit aggfunc, or deduplicate first: df = df.drop_duplicates(subset=["Region", "Month"]).

ModuleNotFoundError: No module named 'openpyxl' — pandas needs an engine for .xlsx. Install it: pip install openpyxl xlsxwriter.

Header lookups fail (KeyError) — Excel exports often add trailing whitespace. Normalize headers before pivoting:

Python
df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
print(df.columns.tolist())

Categorical columns show empty combinations — when index/columns is a categorical dtype, pandas 3.0 defaults to observed=True, showing only combinations present in the data. Pass observed=False to include every category level.

Frequently asked questions

When should I use pivot_table() instead of pivot()? Use pivot_table() whenever a row/column combination can repeat, because it aggregates duplicates via aggfunc. pivot() only reshapes and raises ValueError: Index contains duplicate entries if any index/column pair occurs more than once.

Can I apply a different aggregation to each value column? Yes. Pass aggfunc a dictionary keyed by column, like aggfunc={"Revenue": "sum", "Units": "mean"}. Each metric is then summarized with its own function in a single call.

Why are my totals rows and columns missing? You need margins=True; without it pandas omits the grand totals entirely. Set margins_name to control the label that marks the totals row and column.

How do I filter the source like an Excel Report Filter? There is no filter argument — slice the DataFrame before pivoting, e.g. df[df["Region"].isin(["North", "South"])], then pass that subset to pivot_table().

Why does my output have no cell formatting?to_excel() writes raw values only. Use pd.ExcelWriter with engine="xlsxwriter", then apply a format via worksheet.set_column() with a num_format to style currency or other columns.

Where to go next