
Pandas Data Analysis Cheat Sheet

Master data manipulation and analysis with comprehensive Pandas commands and shortcuts


1. DataFrame Creation & Loading

Read CSV File

Load data from CSV file into DataFrame

pd.read_csv("file.csv")
Tags: basic, data-loading, file-io
Example:
df = pd.read_csv("data.csv", index_col=0, parse_dates=True)
Quick Tip:
Use parse_dates=True for automatic date parsing
✅ Best Practice:
DO: Specify dtypes for better performance and memory usage, use chunksize for large files, set index_col when appropriate
❌ Common Mistake:
DON'T: Load entire large files into memory without chunking, ignore encoding issues (use encoding="utf-8"), forget to handle missing values with na_values parameter
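Sketch:
A minimal example of the chunked, typed read pattern above; the file name, column names, and dtypes are illustrative assumptions.
import pandas as pd
dtypes = {"id": "int32", "amount": "float32"}
reader = pd.read_csv("large_data.csv", dtype=dtypes, na_values=["NA", ""], chunksize=100_000)
total = sum(chunk["amount"].sum() for chunk in reader)   # aggregate chunk by chunk instead of loading everything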

Create DataFrame

Create DataFrame from dictionary, list, or array

pd.DataFrame(data, columns=cols)
Tags: basic, creation, basics
Example:
df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
Quick Tip:
Can create from dict, list of lists, or numpy arrays
✅ Best Practice:
DO: Use dictionaries for column-oriented data, specify index when meaningful, set appropriate dtypes during creation
❌ Common Mistake:
DON'T: Create DataFrames in loops (use pd.concat instead), mix data types without considering memory usage, forget to set meaningful column names
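Sketch:
A small example of the "build once, not in a loop" point above, using made-up row data.
rows = [{"A": i, "B": i * 10} for i in range(3)]   # accumulate plain dicts first
df = pd.DataFrame(rows, columns=["A", "B"])        # construct the DataFrame a single time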

Preview Data

View first or last n rows of DataFrame

df.head(n) / df.tail(n)
Tags: basic, exploration, preview
Example:
df.head(10)   # First 10 rows
df.tail(5)    # Last 5 rows
Quick Tip:
Default n=5, use df.sample(n) for random rows
✅ Best Practice:
DO: Use head() for initial data inspection, combine with info() and describe() for comprehensive overview, use sample() for random inspection of large datasets
❌ Common Mistake:
DON'T: Rely only on head() for data understanding (may miss patterns), use large n values that clutter output, forget that head/tail may not be representative of entire dataset
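Sketch:
A quick first-look combination, assuming df from the examples above; the row counts and seed are arbitrary.
df.head(10)                     # rows from the top of the file
df.sample(5, random_state=42)   # reproducible random rows from anywhere in the data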

Data Overview

Get DataFrame structure and statistical summary

df.info() / df.describe()
Tags: basic, exploration, analysis
Example:
df.info()       # Data types, null counts
df.describe()   # Statistical summary
Quick Tip:
Use df.describe(include="all") for all columns
✅ Best Practice:
DO: Use info() to check data types and missing values, describe(include="all") for comprehensive statistics, check memory usage with info(memory_usage="deep")
❌ Common Mistake:
DON'T: Skip data type verification before analysis, ignore missing value counts, use describe() without understanding what statistics are meaningful for your data
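Sketch:
A short example of the deeper overview calls mentioned above.
df.info(memory_usage="deep")   # exact memory usage, including object/string columns
df.describe(include="all")     # summary statistics for numeric and non-numeric columns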

DataFrame Properties

Get DataFrame dimensions and structure info

df.shape / df.columns / df.index
Tags: basic, exploration, info
Example:
print(f"Shape: {df.shape}") # (rows, columns) print(f"Columns: {df.columns.tolist()}")
Quick Tip:
df.dtypes shows data types of each column
✅ Best Practice:
DO: Check shape before operations, use .columns.tolist() for column names as list, understand index vs columns distinction
❌ Common Mistake:
DON'T: Confuse shape with size, ignore index information, assume column order is preserved without checking
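Sketch:
A pre-flight structure check; the expected column names are hypothetical.
n_rows, n_cols = df.shape
expected = {"name", "age"}
missing = expected - set(df.columns)   # empty set means all expected columns are present
print(df.dtypes)                       # one dtype per column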

Read Excel File

Load data from Excel file into DataFrame

pd.read_excel("file.xlsx", sheet_name="Sheet1")
Tags: basic, data-loading, file-io
Example:
df = pd.read_excel("data.xlsx", sheet_name=0, header=1)
Quick Tip:
Use sheet_name=None to read all sheets into dict
✅ Best Practice:
DO: Specify sheet_name explicitly, handle multiple sheets with sheet_name=None, set appropriate header row
❌ Common Mistake:
DON'T: Assume first sheet is what you want, ignore header parameter, forget to handle merged cells in Excel files
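Sketch:
A minimal example of reading every sheet at once, assuming an .xlsx file readable by the installed engine (e.g. openpyxl) and sheets that share columns.
sheets = pd.read_excel("data.xlsx", sheet_name=None)        # dict of {sheet name: DataFrame}
combined = pd.concat(sheets.values(), ignore_index=True)    # stack the sheets into one DataFrame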

Save to CSV

Save DataFrame to CSV file

df.to_csv("output.csv", index=False)
Tags: basic, export, file-io
Example:
df.to_csv("results.csv", index=False, encoding="utf-8")
Quick Tip:
Use index=False to avoid saving row numbers
✅ Best Practice:
DO: Use index=False unless index is meaningful, specify encoding explicitly, handle special characters with proper encoding
❌ Common Mistake:
DON'T: Save index when not needed, ignore encoding issues, forget to handle file paths with spaces or special characters
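Sketch:
A small export example; the output path and encoding choice are assumptions.
df.to_csv("results.csv", index=False, encoding="utf-8-sig")   # utf-8-sig keeps non-ASCII characters readable when opened in Excel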

2. Data Selection & Indexing

Select Columns

Select single column or multiple columns

df["column"] / df[["col1", "col2"]]
Tags: basic, selection, basics
Example:
name = df["name"] # Single column (Series) subset = df[["name", "age"]] # Multiple columns (DataFrame)
Quick Tip:
Single brackets return Series, double brackets return DataFrame
✅ Best Practice:
DO: Use double brackets when you need DataFrame output, check if columns exist before selection, use .loc for more explicit selection
❌ Common Mistake:
DON'T: Confuse Series vs DataFrame output, select non-existent columns without error handling, use chained selection that can cause SettingWithCopyWarning
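Sketch:
A defensive, explicit column selection; the wanted column names are hypothetical.
wanted = ["name", "age"]
present = [c for c in wanted if c in df.columns]   # guard against missing columns
subset = df.loc[:, present]                        # explicit .loc selection always returns a DataFrame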

Label & Position Indexing

Select data by labels (loc) or positions (iloc)

df.loc[rows, cols] / df.iloc[rows, cols]
Tags: intermediate, selection, indexing
Example:
df.loc[0:5, "name":"age"]   # By labels
df.iloc[0:5, 1:3]           # By positions
Quick Tip:
loc is inclusive of end, iloc is exclusive
✅ Best Practice:
DO: Use .loc for label-based selection, .iloc for position-based selection, understand inclusive vs exclusive behavior
❌ Common Mistake:
DON'T: Mix up loc and iloc behavior, use chained indexing instead of .loc/.iloc, forget that loc includes both endpoints while iloc excludes the end
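Sketch:
Assigning through a single .loc call instead of chained indexing; the column names are assumed.
df.loc[df["age"] >= 18, "status"] = "adult"   # one .loc call; chained df[df["age"] >= 18]["status"] = ... would trigger SettingWithCopyWarning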

Boolean Filtering

Filter rows based on conditions

df[df["column"] > value]
Tags: basic, filtering, conditions
Example:
adults = df[df["age"] > 18] high_earners = df[df["salary"] >= 50000]
Quick Tip:
Use & (and), | (or), ~ (not) for multiple conditions
✅ Best Practice:
DO: Use parentheses for complex conditions, combine conditions with & and |, use .query() for complex string-based filtering
❌ Common Mistake:
DON'T: Use "and"/"or" instead of "&"/"|" for DataFrame conditions, forget parentheses in complex conditions, chain boolean operations without parentheses

Query Method

Filter data using string expressions

df.query("condition")
Tags: intermediate, filtering, query
Example:
df.query("age > 25 and salary < 60000") df.query("name.str.contains('John')")
Quick Tip:
More readable for complex conditions
✅ Best Practice:
DO: Use for complex conditions, reference external variables with @, combine with string methods for text filtering
❌ Common Mistake:
DON'T: Use for simple conditions (boolean indexing is faster), forget @ for external variables, ignore performance implications for large datasets
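Sketch:
Referencing an external Python variable inside query() with @; the variable and column names are assumed.
min_age = 25
df.query("age > @min_age and department == 'Sales'")   # @min_age pulls in the surrounding Python variable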

Value Membership

Filter rows where column values are in a list

df[df["column"].isin(values)]
Tags: basic, filtering, membership
Example:
df[df["department"].isin(["Sales", "Marketing"])] df[df["id"].isin([1, 5, 10])]
Quick Tip:
Use ~df["col"].isin(values) for NOT IN
✅ Best Practice:
DO: Use for filtering with multiple values, combine with ~ for exclusion, use with lists or sets for membership testing
❌ Common Mistake:
DON'T: Use multiple OR conditions instead of isin(), forget ~ for negation, use with very large lists without considering performance
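Sketch:
isin() with negation; the department values are hypothetical.
keep = ["Sales", "Marketing"]
in_scope = df[df["department"].isin(keep)]
out_of_scope = df[~df["department"].isin(keep)]   # ~ gives the NOT IN equivalent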

Unique Values

Get unique values or count of unique values

df["column"].unique() / df["column"].nunique()
Tags: basic, exploration, analysis
Example:
unique_depts = df["department"].unique()
num_unique = df["department"].nunique()
Quick Tip:
Use value_counts() to get frequency of each unique value
✅ Best Practice:
DO: Use nunique() for quick counts, unique() to see actual values, combine with value_counts() for frequency analysis
❌ Common Mistake:
DON'T: Use len(df["col"].unique()) instead of nunique(), ignore NaN handling in unique counts, use unique() on large text columns without sampling
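Sketch:
Frequency counts alongside unique counts, keeping NaN visible.
df["department"].value_counts(dropna=False)   # frequency of each value, including NaN
df["department"].nunique(dropna=False)        # unique count, treating NaN as a value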