Guide

Advanced Data Transformation And CleaningDeep dive

Fill Missing Values in Excel with Pandas fillna

Q: Why does fillna() leave some blank cells untouched?

Those cells are empty strings, not NaN, so fillna() skips them. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading, and avoid keep_default_na=False on import unless you map blanks yourself.

Q: How do I fill each column with a different value in one call?

Pass a dictionary, like df.fillna({"Revenue": df["Revenue"].median(), "Status": "Pending"}). This is the safest default because it never forces one column's fill onto another dtype.

Q: When should I use interpolate() instead of fillna()?

Use interpolate() for a continuous numeric trend, where it estimates a gap from its neighbors rather than repeating a single number. A scalar fillna or forward fill is better for categorical or non-trending data.

Q: Why did a numeric fill turn my column into text?

A column mixing numbers and text is object dtype, so filling with a number can coerce values to strings. Select numerics first with df.select_dtypes(include="number").columns and fill only those.

Q: Do my formulas survive to_excel()?

No — to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet.

Fill missing values in an Excel file with pandas fillna: normalize blanks to NaN, then apply a scalar, a per-column dictionary, forward/back fill, or interpolation, and export.

To fill missing values in an Excel workbook with pandas, load the file with pd.read_excel(), turn Excel blanks into real NaN, apply DataFrame.fillna() (or ffill/bfill/interpolate), and write the result with df.to_excel(). The examples below run in order and share a namespace; the first block writes a sample workbook.

Create a sample workbook

Python

import pandas as pd

seed = pd.DataFrame({
    "Region": ["North", "South", "West", "East"],
    "Revenue": [12000, None, 9800, None],
    "Status": ["Closed", None, "Open", None],
    "Date": ["2024-01-01", None, "2024-01-03", "2024-01-04"],
})
seed.to_excel("monthly_report.xlsx", index=False)

Load and normalize blanks

Excel frequently exports an empty cell as an empty string, which pandas keeps as valid text — fillna() skips it. Convert whitespace-only cells to NaN first so every gap is fillable:

Python

df = pd.read_excel("monthly_report.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
print(df.isna().sum())

Fill with a per-column dictionary

A dictionary fills each column with a type-appropriate value in one call. This is the safest default because it never forces one column's fill onto another:

Python

fill_map = {
    "Revenue": df["Revenue"].median(),
    "Status": "Pending",
}
df = df.fillna(fill_map)
print(df)

Fill strategies at a glance

Strategy	Syntax	Best for
Global scalar	`df.fillna(0)`	A single uniform default
Per-column dict	`df.fillna({"A": 0, "B": "n/a"})`	Mixed dtypes, categorical defaults
Forward / back fill	`df.ffill()` / `df.bfill()`	Ordered logs, time series
Interpolation	`df.interpolate()`	Continuous numeric trends

Forward and back fill for ordered data

For time series, carry the last known value forward, then backfill any leading gap. In pandas 3.0 the method= argument to fillna() was removed — use the dedicated ffill() and bfill() methods:

Python

df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df = df.sort_values("Date")
df["Date"] = df["Date"].ffill().bfill()
print(df[["Region", "Date"]])

Interpolate numeric trends

When a numeric column represents a smooth trend, interpolate() estimates gaps from neighboring values instead of repeating one number:

Python

series = pd.Series([10.0, None, None, 40.0])
print(series.interpolate(method="linear"))

Export the filled workbook

Write without the index so the output matches the input layout:

Python

df.to_excel("monthly_report_filled.xlsx", index=False, engine="openpyxl")
print("Wrote monthly_report_filled.xlsx")

Troubleshooting

A numeric fill turns the column into text. A column mixing numbers and text is object dtype; filling it with a number can coerce values to strings. Isolate numerics first:

Python

num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(0)
print(df.dtypes)

fillna() leaves blanks untouched. They are empty strings, not NaN. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading. Also avoid keep_default_na=False on import unless you map blanks yourself.

Formula cells lose their formulas on export. to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet wholesale.

Validation checklist — confirm df.isna().sum() is zero where it must be, check df.dtypes to ensure numeric columns stayed numeric, and open the exported file to confirm no #VALUE! cells.

Frequently asked questions

Why does fillna() leave some blank cells untouched? Those cells are empty strings, not NaN, so fillna() skips them. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading, and avoid keep_default_na=False on import unless you map blanks yourself.

How do I fill each column with a different value in one call? Pass a dictionary, like df.fillna({"Revenue": df["Revenue"].median(), "Status": "Pending"}). This is the safest default because it never forces one column's fill onto another dtype.

When should I use interpolate() instead of fillna()? Use interpolate() for a continuous numeric trend, where it estimates a gap from its neighbors rather than repeating a single number. A scalar fillna or forward fill is better for categorical or non-trending data.

Why did a numeric fill turn my column into text? A column mixing numbers and text is object dtype, so filling with a number can coerce values to strings. Select numerics first with df.select_dtypes(include="number").columns and fill only those.

Do my formulas survive to_excel()? No — to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet.

Where to go next

For profiling, classifying, and validating gaps end to end, see Handling Missing Data in Excel Reports.
For the broader cleaning toolkit, see Advanced Data Transformation and Cleaning.
Joins are a common source of new NaN — see Merging and Joining Excel DataFrames.