Pandas Data AnalysisCheat Sheet

Master data manipulation and analysis with comprehensive Pandas commands and shortcuts

Back to Cheat Sheets

Data Manipulation

Analysis Ready

Performance Tips

1
DataFrame Creation & Loading

Read CSV File

Load data from CSV file into DataFrame

pd.read_csv("file.csv")

basicdata-loadingfile-io

Example:

df = pd.read_csv("data.csv", index_col=0, parse_dates=True)

Quick Tip:

Use parse_dates=True for automatic date parsing

✅ Best Practice:

DO: Specify dtypes for better performance and memory usage, use chunksize for large files, set index_col when appropriate

❌ Common Mistake:

DON'T: Load entire large files into memory without chunking, ignore encoding issues (use encoding="utf-8"), forget to handle missing values with na_values parameter

Create DataFrame

Create DataFrame from dictionary, list, or array

pd.DataFrame(data, columns=cols)

basiccreationbasics

Example:

df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})

Quick Tip:

Can create from dict, list of lists, or numpy arrays

✅ Best Practice:

DO: Use dictionaries for column-oriented data, specify index when meaningful, set appropriate dtypes during creation

❌ Common Mistake:

DON'T: Create DataFrames in loops (use pd.concat instead), mix data types without considering memory usage, forget to set meaningful column names

Preview Data

View first or last n rows of DataFrame

df.head(n) / df.tail(n)

basicexplorationpreview

Example:

df.head(10) # First 10 rows df.tail(5) # Last 5 rows

Quick Tip:

Default n=5, use df.sample(n) for random rows

✅ Best Practice:

DO: Use head() for initial data inspection, combine with info() and describe() for comprehensive overview, use sample() for random inspection of large datasets

❌ Common Mistake:

DON'T: Rely only on head() for data understanding (may miss patterns), use large n values that clutter output, forget that head/tail may not be representative of entire dataset

Data Overview

Get DataFrame structure and statistical summary

df.info() / df.describe()

basicexplorationanalysis

Example:

df.info() # Data types, null counts df.describe() # Statistical summary

Quick Tip:

Use df.describe(include="all") for all columns

✅ Best Practice:

DO: Use info() to check data types and missing values, describe(include="all") for comprehensive statistics, check memory usage with info(memory_usage="deep")

❌ Common Mistake:

DON'T: Skip data type verification before analysis, ignore missing value counts, use describe() without understanding what statistics are meaningful for your data

DataFrame Properties

Get DataFrame dimensions and structure info

df.shape / df.columns / df.index

basicexplorationinfo

Example:

print(f"Shape: {df.shape}") # (rows, columns) print(f"Columns: {df.columns.tolist()}")

Quick Tip:

df.dtypes shows data types of each column

✅ Best Practice:

DO: Check shape before operations, use .columns.tolist() for column names as list, understand index vs columns distinction

❌ Common Mistake:

DON'T: Confuse shape with size, ignore index information, assume column order is preserved without checking

Read Excel File

Load data from Excel file into DataFrame

pd.read_excel("file.xlsx", sheet_name="Sheet1")

basicdata-loadingfile-io

Example:

df = pd.read_excel("data.xlsx", sheet_name=0, header=1)

Quick Tip:

Use sheet_name=None to read all sheets into dict

✅ Best Practice:

DO: Specify sheet_name explicitly, handle multiple sheets with sheet_name=None, set appropriate header row

❌ Common Mistake:

DON'T: Assume first sheet is what you want, ignore header parameter, forget to handle merged cells in Excel files

Save to CSV

Save DataFrame to CSV file

df.to_csv("output.csv", index=False)

basicexportfile-io

Example:

df.to_csv("results.csv", index=False, encoding="utf-8")

Quick Tip:

Use index=False to avoid saving row numbers

✅ Best Practice:

DO: Use index=False unless index is meaningful, specify encoding explicitly, handle special characters with proper encoding

❌ Common Mistake:

DON'T: Save index when not needed, ignore encoding issues, forget to handle file paths with spaces or special characters

2
Data Selection & Indexing

Select Columns

Select single column or multiple columns

df["column"] / df[["col1", "col2"]]

basicselectionbasics

Example:

name = df["name"] # Single column (Series) subset = df[["name", "age"]] # Multiple columns (DataFrame)

Quick Tip:

Single brackets return Series, double brackets return DataFrame

✅ Best Practice:

DO: Use double brackets when you need DataFrame output, check if columns exist before selection, use .loc for more explicit selection

❌ Common Mistake:

DON'T: Confuse Series vs DataFrame output, select non-existent columns without error handling, use chained selection that can cause SettingWithCopyWarning

Label & Position Indexing

Select data by labels (loc) or positions (iloc)

df.loc[rows, cols] / df.iloc[rows, cols]

intermediateselectionindexing

Example:

df.loc[0:5, "name":"age"] # By labels df.iloc[0:5, 1:3] # By positions

Quick Tip:

loc is inclusive of end, iloc is exclusive

✅ Best Practice:

DO: Use .loc for label-based selection, .iloc for position-based selection, understand inclusive vs exclusive behavior

❌ Common Mistake:

DON'T: Mix up loc and iloc behavior, use chained indexing instead of .loc/.iloc, forget that loc includes both endpoints while iloc excludes the end

Boolean Filtering

Filter rows based on conditions

df[df["column"] > value]

basicfilteringconditions

Example:

adults = df[df["age"] > 18] high_earners = df[df["salary"] >= 50000]

Quick Tip:

Use & (and), | (or), ~ (not) for multiple conditions

✅ Best Practice:

DO: Use parentheses for complex conditions, combine conditions with & and |, use .query() for complex string-based filtering

❌ Common Mistake:

DON'T: Use "and"/"or" instead of "&"/"|" for DataFrame conditions, forget parentheses in complex conditions, chain boolean operations without parentheses

Query Method

Filter data using string expressions

df.query("condition")

intermediatefilteringquery

Example:

df.query("age > 25 and salary < 60000") df.query("name.str.contains('John')")

Quick Tip:

Value Membership

Filter rows where column values are in a list

df[df["column"].isin(values)]

basicfilteringmembership

Example:

df[df["department"].isin(["Sales", "Marketing"])] df[df["id"].isin([1, 5, 10])]

Quick Tip:

Use ~df["col"].isin(values) for NOT IN

✅ Best Practice:

DO: Use for filtering with multiple values, combine with ~ for exclusion, use with lists or sets for membership testing

❌ Common Mistake:

DON'T: Use multiple OR conditions instead of isin(), forget ~ for negation, use with very large lists without considering performance

Unique Values

Get unique values or count of unique values

df["column"].unique() / df["column"].nunique()

basicexplorationanalysis

Example:

unique_depts = df["department"].unique() num_unique = df["department"].nunique()

Quick Tip:

Use value_counts() to get frequency of each unique value

✅ Best Practice:

DO: Use nunique() for quick counts, unique() to see actual values, combine with value_counts() for frequency analysis

❌ Common Mistake:

DON'T: Use len(df["col"].unique()) instead of nunique(), ignore NaN handling in unique counts, use unique() on large text columns without sampling

Placement Materials | Pandas Data Analysis Cheat Sheet

Pandas Data AnalysisCheat Sheet

1DataFrame Creation & Loading

Read CSV File

Create DataFrame

Preview Data

Data Overview

DataFrame Properties

Read Excel File

Save to CSV

2Data Selection & Indexing

Select Columns

Label & Position Indexing

Boolean Filtering

Query Method

Value Membership

Unique Values

1
DataFrame Creation & Loading

2
Data Selection & Indexing