Frequently Asked Questions About Code With Antonio Pandas-Matplotlib Data Viz Skill
21 answers covering everything from basics to advanced usage.
// Basics
What is shorthand notation in matplotlib and how does it work?
Shorthand notation is a compact string argument passed to plt.plot() that sets color, marker, and line style in one parameter. For example, 'b.-' means blue color, dot marker, solid line. 'r.-' means red with dots and a line. This keeps your plot calls concise and avoids separate color, marker, and linestyle keyword arguments. It's documented in the pyplot.plot() format string section.
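As a rough illustration of the format string described above, the sketch below draws one line with the shorthand and a second with the equivalent keyword arguments. The data values and variable names are made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, used only for illustration.
years = [2018, 2019, 2020, 2021, 2022]
sales = [12, 15, 11, 18, 21]
costs = [8, 9, 10, 12, 13]

fig, ax = plt.subplots()

# Shorthand: 'b.-' = blue color, point marker, solid line.
(line_short,) = ax.plot(years, sales, "b.-", label="sales")

# The same kind of styling spelled out with keyword arguments, in red.
(line_long,) = ax.plot(
    years, costs, color="r", marker=".", linestyle="-", label="costs"
)

ax.legend()
```

Both calls produce a dotted-marker solid line; the shorthand simply packs the three settings into one string.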
What is the same-directory convention for CSV files in Python?
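The same-directory convention means saving your CSV data file in the same folder as your Python script or Jupyter notebook. pd.read_csv('filename.csv') can then find the file by its bare filename, provided Python is running from that folder: relative paths resolve against the current working directory, which is normally the notebook's folder in Jupyter and the folder you launched Python from for a script. If you must use a subfolder, use an explicit relative path like 'data/filename.csv'. Following the convention avoids FileNotFoundError, a very common beginner issue with data loading.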
What is patch_artist in matplotlib boxplots?
patch_artist is a boolean parameter you pass to plt.boxplot(). When set to True, each box is drawn as a filled Patch object instead of an outline made of Line2D objects. This is required if you want to use set_facecolor() to change the fill color of boxes. Without patch_artist=True, the boxes are Line2D objects, which have no set_facecolor() method, so the call raises AttributeError and the boxes stay unfilled. Always include it when customizing box-and-whisker chart colors.
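To see what patch_artist changes, the following sketch compares the objects that boxplot() returns with and without it. The sample values are hypothetical; the point is only the type of the entries in the 'boxes' list.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import PathPatch

data = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]  # hypothetical sample values

fig, (ax_lines, ax_patches) = plt.subplots(1, 2)

outline = ax_lines.boxplot(data)                     # default: patch_artist=False
filled = ax_patches.boxplot(data, patch_artist=True)

# Without patch_artist the boxes are plain lines; with it they are patches.
outline_is_line = isinstance(outline["boxes"][0], Line2D)
filled_is_patch = isinstance(filled["boxes"][0], PathPatch)

# Line2D objects do not offer set_facecolor(), so only patches can be filled.
line_has_facecolor = hasattr(outline["boxes"][0], "set_facecolor")

filled["boxes"][0].set_facecolor("#4287f5")
```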
What does the explode parameter do in matplotlib pie charts?
The explode parameter takes a list of float offsets, one per slice, that moves each slice away from the center of the pie. Each offset is a fraction of the pie's radius. For example, explode=[0.1, 0, 0, 0, 0.1] pushes the first and last slices outward by 0.1 of the radius. Use this to visually emphasize specific slices or to prevent label crowding on small slivers. A value of 0 keeps a slice flush with the center.
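A minimal sketch of the explode list from the example above, using invented labels and sizes. The wedges are read back from the axes so the offset of each slice can be inspected.

```python
import math

import matplotlib.pyplot as plt

# Hypothetical categories and sizes, for illustration only.
labels = ["A", "B", "C", "D", "E"]
sizes = [40, 25, 20, 10, 5]
explode = [0.1, 0, 0, 0, 0.1]  # first and last slices pulled outward

fig, ax = plt.subplots(figsize=(7, 7))
ax.pie(sizes, labels=labels, explode=explode)
ax.set_title("Exploded first and last slices")

# Each slice is a wedge patch; its center moves away from (0, 0) when exploded.
wedges = list(ax.patches)
offsets = [round(math.hypot(*w.center), 3) for w in wedges]
```

Here offsets holds the distance of each wedge's center from the middle of the pie, so exploded slices show a non-zero value.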
// How To
How do I save a matplotlib chart as a high-resolution PNG?
Call plt.savefig('filename.png', dpi=300) after your plot calls and before plt.show(). In a script, the figure can be discarded once the plt.show() window is closed, so saving afterwards may produce a blank image. The dpi=300 parameter produces print-quality output suitable for reports and publications. The file extension controls the format: use .png for lossless output or .jpg for smaller files. Note that the function name is savefig (one word), not save_fig. The file is written to your current working directory unless the filename includes a path.
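The save step might look like the sketch below. The output location (the system temp folder) and the filename are assumptions chosen so the example does not write into your project; in practice you would pass your own filename.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# Hypothetical output path; replace with your own filename.
out_path = Path(tempfile.gettempdir()) / "example_chart.png"

plt.figure(figsize=(8, 5))
plt.plot([2018, 2019, 2020, 2021], [3, 5, 4, 6], "b.-")
plt.title("Example chart")

# Save first; call plt.show() afterwards if you also want a window.
plt.savefig(out_path, dpi=300)
plt.close()
```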
How do I plot all columns in a pandas DataFrame without plotting the index column?
Iterate over the DataFrame column names and skip the column you use for the x-axis (here 'Year') with an explicit if-check: for col in df.columns: if col != 'Year': plt.plot(df['Year'], df[col], label=col). Without this guard, the loop also plots the Year column against itself as a data series, producing a meaningless diagonal line on your chart.
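Written out over several lines, the guarded loop could look like this. The DataFrame contents and country columns are invented for the example; only the 'Year' column name comes from the answer above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one x-axis column plus three series.
df = pd.DataFrame(
    {
        "Year": [2000, 2001, 2002, 2003],
        "USA": [1.5, 1.6, 1.4, 1.7],
        "Canada": [1.9, 2.0, 1.8, 2.1],
        "South Korea": [1.3, 1.2, 1.4, 1.5],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))

for col in df.columns:
    if col != "Year":  # skip the x-axis column
        ax.plot(df["Year"], df[col], marker=".", label=col)

ax.set_xlabel("Year")
ax.legend()

plotted = [line.get_label() for line in ax.get_lines()]
```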
How do I create a pie chart from categorical data in a pandas DataFrame?
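Count the rows in each category with (df['col'] == 'Value').sum(), or with df.loc[df['col'] == 'Value'].shape[0]. The older idiom df.loc[...].count()[0] relies on positional indexing into a label-indexed Series, which recent pandas releases deprecate. Collect the counts into a list and the category names into a labels list. Call plt.pie(counts, labels=labels, autopct='%1.2f%%', colors=['#hex1','#hex2']). Use the explode parameter to separate small slices and pctdistance=0.8 to position the percentage labels inside the slices, near the outer edge. Always add plt.title().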
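A small sketch of the counting-then-plotting steps, assuming a hypothetical 'Preferred Foot' column with two categories. It counts rows with a boolean mask and .sum(); the column name, values, and hex colors are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column, for illustration only.
df = pd.DataFrame(
    {"Preferred Foot": ["Left", "Right", "Right", "Left", "Right", "Right"]}
)

labels = ["Left", "Right"]
# A boolean mask sums to the number of matching rows.
counts = [int((df["Preferred Foot"] == name).sum()) for name in labels]

fig, ax = plt.subplots(figsize=(7, 7))
ax.pie(
    counts,
    labels=labels,
    autopct="%1.2f%%",   # two decimal places plus a literal percent sign
    pctdistance=0.8,     # percentage text placed at 80% of the radius
    colors=["#abcdef", "#aabbcc"],
)
ax.set_title("Foot preference")
```

If you need counts for every category at once, df['Preferred Foot'].value_counts() returns them in a single call.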
How do I change the global style of all my matplotlib charts?
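Call plt.style.use('ggplot') at the top of your script or notebook cell before any plot calls. This changes the color palette, grid lines, and typography of every figure created afterwards. Other popular styles include 'fivethirtyeight', 'dark_background', and the seaborn-inspired styles, which are named 'seaborn-v0_8' (and variants) from Matplotlib 3.6 onward; the old 'seaborn' name no longer works in current releases. Run print(plt.style.available) to list the styles your installation provides. Be aware that style changes persist across all subsequent cells in a Jupyter notebook. Reset with plt.style.use('default') if you need the original look for a specific chart.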
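The persistence described above can be observed through plt.rcParams. The sketch below also uses plt.style.context(), which is not mentioned in the original answer; it is offered as one way to apply a style to a single chart without affecting later ones.

```python
import matplotlib.pyplot as plt

# Global switch: stays active until you change it again.
plt.style.use("ggplot")
ggplot_grid = plt.rcParams["axes.grid"]    # ggplot enables grid lines

plt.style.use("default")
default_grid = plt.rcParams["axes.grid"]   # the default style has no grid

# Temporary switch: the style applies only inside the with-block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3], "b.-")

grid_after_block = plt.rcParams["axes.grid"]
```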
How do I compare distributions across multiple groups visually?
Use a box-and-whisker chart. Select each group from your DataFrame with a single .loc call, df.loc[df['Group']=='Name', 'Metric'], which avoids chained indexing. Pass all groups as a list to plt.boxplot([group1, group2, group3], labels=['A','B','C'], patch_artist=True). In Matplotlib 3.9 and later the labels argument is named tick_labels. The chart shows each group's median, interquartile range, whiskers, and outliers side by side, making distribution differences immediately visible without overlapping histograms.
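As a sketch of the grouping step, the example below builds three groups from an invented DataFrame. To stay independent of the Matplotlib version, it sets the group names on the x-axis with set_xticks/set_xticklabels rather than through a boxplot() argument; that is a choice made for this example, not the only option.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: 'Group' and 'Metric' mirror the names used above.
df = pd.DataFrame(
    {
        "Group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
        "Metric": [70, 72, 75, 78, 90, 60, 65, 66, 70, 71, 80, 82, 85, 88, 95],
    }
)

names = ["A", "B", "C"]
groups = [df.loc[df["Group"] == name, "Metric"] for name in names]

fig, ax = plt.subplots(figsize=(6, 8))
bp = ax.boxplot(groups, patch_artist=True)

# Boxes are drawn at x = 1, 2, 3; label those positions with the group names.
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(names)
ax.set_ylabel("Metric")

median_a = float(bp["medians"][0].get_ydata()[0])
```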
// Troubleshooting
Why does my pandas column with numbers fail when I try to do math on it?
Your column likely contains string values with unit suffixes like '175lbs' or '6ft'. Pandas reads these as object (string) dtype. You must remove the suffix and cast to a numeric type before any math or plotting. Use a type-guarded list comprehension: [int(x.replace('lbs', '')) if isinstance(x, str) else x for x in df['col']]. Avoid x.strip('lbs') for this: strip() treats its argument as a set of characters to remove from both ends, not as a suffix, so it only works by coincidence. Reassign the result back to the column, then verify with df['col'].dtype.
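A short sketch of the cleanup, assuming a hypothetical 'Weight' column that mixes suffixed strings with a value that is already numeric.

```python
import pandas as pd

# Hypothetical column mixing suffixed strings and a plain number.
df = pd.DataFrame({"Weight": ["175lbs", "160lbs", 182, "154lbs"]})
dtype_before = str(df["Weight"].dtype)

# Remove the unit only from string entries; leave numeric entries untouched.
df["Weight"] = [
    int(x.replace("lbs", "")) if isinstance(x, str) else x
    for x in df["Weight"]
]

mean_weight = df["Weight"].mean()  # math works once the column is numeric
```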
Why won't my box plot face colors show up in matplotlib?
You most likely left out patch_artist=True in your plt.boxplot() call. Without this parameter, boxes are rendered as line-only outlines (Line2D objects), which cannot be filled; calling set_facecolor() on them raises AttributeError. Add patch_artist=True, keep the return value (bp = plt.boxplot(...)), then iterate over the returned box objects: for box in bp['boxes']: box.set_facecolor('#hexcolor'). This is a very common box plot customization mistake.
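One way to give each box its own fill is to pair the returned boxes with a list of colors. The data and hex codes below are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Hypothetical groups and colors, for illustration only.
data = [[1, 2, 3, 4, 5], [2, 3, 5, 8, 9], [4, 5, 6, 7, 12]]
colors = ["#4287f5", "#f5a142", "#42f57b"]

fig, ax = plt.subplots(figsize=(6, 8))
bp = ax.boxplot(data, patch_artist=True)  # required for filled boxes

for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)
    box.set_edgecolor("black")

# Read the colors back to confirm they were applied.
applied = [to_hex(box.get_facecolor()) for box in bp["boxes"]]
```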
Why does pd.read_csv fail even though my file exists?
The most common cause is that the CSV is not in the directory Python is running from, even if it sits next to your script. Run import os; print(os.getcwd()) to see your current working directory. Either move the CSV to that directory or use a relative path like 'data/filename.csv'. Another cause is a typo in the filename. Filenames and extensions are case-sensitive on Linux; default macOS and Windows file systems ignore case, so code that works there can still fail when moved to a Linux machine.
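The checks above can be bundled into a small helper. load_csv is a hypothetical function written for this example, not part of pandas; it reports which folder was searched and which CSV files were found there.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv(filename, folder=None):
    """Read a CSV, or raise an error naming the folder that was searched."""
    base = Path(folder) if folder is not None else Path.cwd()
    path = base / filename
    if not path.exists():
        nearby = sorted(p.name for p in base.glob("*.csv"))
        raise FileNotFoundError(
            f"{path} not found. CSV files in {base}: {nearby}"
        )
    return pd.read_csv(path)


# Demonstration in a throwaway folder with a hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fifa_data.csv").write_text("Name,Overall\nA,90\nB,85\n")

    df = load_csv("fifa_data.csv", folder=tmp)

    try:
        load_csv("missing.csv", folder=tmp)
        message = ""
    except FileNotFoundError as err:
        message = str(err)
```

The error message lists the CSV files that do exist in the folder, which makes typos and wrong-directory problems easier to spot.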
Why do my matplotlib tick labels overlap on a time-series chart?
Overlap usually appears when there is one tick label per data point, for example when the x values are strings or when you pass the whole column to plt.xticks(). With many entries the labels run into each other. Fix this by slicing your ticks: plt.xticks(df['Year'][::3]) shows every third year. Alternatively, create a custom range with range(start, end, step). You can also rotate labels with plt.xticks(rotation=45) as a supplementary fix, but slicing is the primary solution.
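A sketch of the slicing fix on an invented 20-year series. The column names and values are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly series, for illustration only.
df = pd.DataFrame(
    {
        "Year": range(1990, 2010),
        "Price": [1.0 + 0.1 * i for i in range(20)],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(df["Year"], df["Price"], "b.-")

# Keep every third year as a tick instead of all twenty.
ticks = df["Year"][::3].tolist()
ax.set_xticks(ticks)

# Optional: rotate the remaining labels for extra room.
ax.tick_params(axis="x", labelrotation=45)
```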
How do I handle multi-word column names with spaces in pandas?
Always use bracket notation: df['South Korea'], not dot notation (df.South Korea). Dot notation only works when the column name is a valid Python identifier, so it breaks for names that contain spaces or hyphens or that start with a digit. For example, df.South Korea is a SyntaxError, and df.my-col is parsed as df.my minus col. Inspect df.columns first to see exact column names including whitespace and capitalization, then copy them verbatim into bracket notation.
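The sketch below shows bracket access on a multi-word column and one way to deal with stray whitespace in headers. The leading space in ' USA' is deliberately invented to illustrate why inspecting df.columns helps.

```python
import pandas as pd

# Hypothetical data; note the accidental leading space in ' USA'.
df = pd.DataFrame(
    {"Year": [2000, 2001], "South Korea": [1.2, 1.3], " USA": [1.0, 1.1]}
)

columns_before = list(df.columns)  # reveals the exact names, spaces included

korea = df["South Korea"]          # bracket notation handles the space

# Optionally normalize headers by trimming surrounding whitespace.
df.columns = df.columns.str.strip()
usa = df["USA"]
```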
// Comparisons
How does this pandas-matplotlib skill compare to using Seaborn?
Seaborn is built on top of matplotlib and provides higher-level functions that produce attractive statistical charts with less code. However, this pandas-matplotlib workflow gives you granular control over every element — tick spacing, individual box colors, label placement, and export DPI. Seaborn is better for fast exploratory analysis; this skill is better when you need precise control for publication-quality output or when you want to deeply understand what each line of plotting code does.
How does this skill compare to using Plotly for interactive charts?
Plotly produces interactive, zoomable, hoverable charts ideal for web dashboards, while this pandas-matplotlib workflow produces static images ideal for reports, academic papers, and presentations. Matplotlib gives finer control over print output and DPI. Plotly requires more setup for static export. Choose this skill when your deliverable is a PNG or PDF; choose Plotly when your audience interacts with charts in a browser.
// Advanced
Can I use this workflow in a Jupyter notebook or only in a Python script?
This workflow works in both Jupyter notebooks and standalone Python scripts. In Jupyter, add %matplotlib inline at the top to render charts directly below cells. Be aware that plt.style.use() persists across cells in a notebook session — if you switch styles for one chart, explicitly reset to 'default' before the next. In scripts, plt.show() opens a GUI window; plt.savefig() is more commonly used for headless output.
How do I add a custom value to my pandas Series tick list for forecasting?
The + operator does not concatenate a pandas Series with a plain Python list. Pandas treats it as element-wise arithmetic, so it either adds the values pairwise (when the lengths match) or raises an error (when they do not). First convert the Series to a list with .tolist(), then append your custom values: ticks = df['Year'].tolist() + [2025, 2026]. Pass this combined list to plt.xticks(ticks). This is a common edge case when extending a chart's axis to include projected future years.
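The difference between the two meanings of + can be seen in a few lines. The years and values are invented for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data ending in 2024; 2025 and 2026 are forecast years.
df = pd.DataFrame({"Year": [2021, 2022, 2023, 2024], "Value": [10, 12, 15, 19]})

# Series + list of equal length is arithmetic, not concatenation.
shifted = (df["Year"] + [1, 1, 1, 1]).tolist()

# Convert to a list first, then + concatenates as expected.
ticks = df["Year"].tolist() + [2025, 2026]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(df["Year"], df["Value"], "b.-")
ax.set_xticks(ticks)
```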
How do I zoom into a specific range of a histogram?
After plotting your histogram with plt.hist(df['col'], bins=bins), call plt.xlim(lower, upper) to restrict the visible x-axis range. For example, plt.xlim(80, 100) zooms into the high-value tail. The histogram bars outside this range are hidden but still calculated. For the cleanest result, choose bin edges that line up with the limits of the visible range.
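A minimal sketch of the zoom, using made-up scores and bin edges. It also checks that the bins outside the visible window were still counted.

```python
import matplotlib.pyplot as plt

# Hypothetical scores and bin edges, for illustration only.
scores = [45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 83, 85, 88, 90, 92, 95, 97]
bins = list(range(40, 101, 10))  # edges at 40, 50, ..., 100

fig, ax = plt.subplots()
counts, edges, bars = ax.hist(scores, bins=bins)

# Show only the 80 to 100 range; the other bars still exist off-screen.
ax.set_xlim(80, 100)

total_counted = int(counts.sum())
visible_limits = tuple(float(v) for v in ax.get_xlim())
```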
What's the best figure size for different chart types in matplotlib?
For time-series line charts, (8, 5) works well as a landscape default. For box-and-whisker plots where vertical spread matters, make height greater than width — try (6, 8). For pie charts, a square ratio like (7, 7) prevents oval distortion. Always set plt.figure(figsize=(w, h)) before your plot call, not after. Adjust based on how many series or groups you're comparing.
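The suggested sizes can be tried out as follows. The plotted values are placeholders; the only point is that plt.figure(figsize=...) comes before the plot call it applies to.

```python
import matplotlib.pyplot as plt

# Landscape for a time-series line chart.
line_fig = plt.figure(figsize=(8, 5))
plt.plot([2019, 2020, 2021, 2022], [3, 4, 6, 5], "b.-")

# Taller than wide for box-and-whisker plots.
box_fig = plt.figure(figsize=(6, 8))
plt.boxplot([[1, 2, 3, 4, 5], [2, 3, 5, 8, 9]])

# Square for a pie chart.
pie_fig = plt.figure(figsize=(7, 7))
plt.pie([50, 30, 20])

sizes = [f.get_size_inches().tolist() for f in (line_fig, box_fig, pie_fig)]
```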
Should I use plt.style.use('ggplot') or manually set colors for each element?
Start with plt.style.use('ggplot') or another named style to set a coherent global palette and typography. Then override individual element colors only where needed using hex codes like '#4287f5'. This follows the 'style before color' principle: the global style handles most of the aesthetics, and you fine-tune the rest. Applying individual colors without a base style tends to produce inconsistent results and takes longer.