How Do Data Journalists Build Publication Charts from CSV?
For Journalism and data journalism professionals · Based on Code With Antonio Pandas-Matplotlib Data Viz Skill
// TL;DR
Data journalists need charts that communicate findings clearly to a general audience while maintaining accuracy. This pandas-matplotlib workflow ensures every chart has proper titles, axis labels, legends, and readable tick marks: the non-negotiables of journalistic data visualization. The Google-augmented documentation approach helps you solve styling problems fast on deadline. Export at 300 DPI for print, or at 150 DPI for web-only use when file size matters. The workflow is especially useful for comparing groups (box plots), showing trends (line graphs), and displaying proportional breakdowns (pie charts) in public-interest datasets.
Why Do Data Journalists Need a Structured Chart Workflow?
Deadline pressure leads to unlabeled axes, misleading scales, and charts that confuse readers instead of informing them. A structured workflow prevents these errors by turning labeling, tick-mark alignment, and styling into non-skippable steps rather than afterthoughts.
The Label Everything principle is especially critical in journalism: a chart without axis labels or a title is unpublishable. Every chart produced with this workflow includes `plt.title()`, `plt.xlabel()`, `plt.ylabel()`, and `plt.legend()` as mandatory steps, not optional polish.
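One way to make the labeling steps harder to skip is a small pre-publication check. The sketch below is an illustration, not part of the original workflow: `missing_labels` is a hypothetical helper, and the numbers are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def missing_labels(ax):
    """Return the names of the labeling elements this Axes still lacks."""
    missing = []
    if not ax.get_title():
        missing.append("title")
    if not ax.get_xlabel():
        missing.append("xlabel")
    if not ax.get_ylabel():
        missing.append("ylabel")
    if ax.get_legend() is None:
        missing.append("legend")
    return missing


# Invented figures, for illustration only
years = [2019, 2020, 2021, 2022]
permits = [310, 275, 340, 365]

plt.figure()
plt.plot(years, permits, label="Permits issued")
unlabeled = missing_labels(plt.gca())  # everything is still missing here

plt.title("Building permits issued per year (sample data)")
plt.xlabel("Year")
plt.ylabel("Permits")
plt.legend()
labeled = missing_labels(plt.gca())  # nothing left to add

print(unlabeled)
print(labeled)
```

Calling a check like this just before `plt.savefig()` turns a forgotten label into a visible failure instead of a published mistake.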
How Do I Go from a Public Dataset CSV to a Publication-Ready Chart?
Follow the 12-step workflow, condensed here into six stages for journalistic needs:
1. Load and inspect: Use `pd.read_csv('dataset.csv')` and `df.head()`. Verify column names are what you expect — they're case-sensitive.
2. Clean: Government and NGO datasets frequently have unit suffixes or mixed types. Use type-guarded list comprehensions to strip suffixes and cast to numeric.
3. Choose chart type based on your story angle: Are you showing a trend over time (line graph)? A distribution (histogram)? A proportional breakdown (pie chart)? A group comparison (box plot)?
4. Build the minimal plot first: Get the correct shape on screen with `plt.show()` before adding any styling. This catches data errors early.
5. Style for readability: Use `plt.style.use('ggplot')` for a clean baseline. Set figure size to match your publication's column width. Fix tick marks to prevent overlap.
6. Export: `plt.savefig('chart.png', dpi=300)` for print; `dpi=150` for web-only if file size matters.
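The stages above can be sketched as one short script. This is a minimal illustration under stated assumptions: the rows are invented and loaded from an in-memory string instead of a real `dataset.csv`, the `Year` and `Budget` column names are hypothetical, and the chart is written to the system temp directory. The style is set before the figure is created because matplotlib applies style settings when the figure and its elements are built.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows standing in for pd.read_csv('dataset.csv')
csv_text = '''Year,Budget
2019,"$1,200"
2020,"$1,350"
2021,"$1,500"
2022,"$1,425"
'''

# 1. Load and inspect
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.columns.tolist())

# 2. Clean: strip currency formatting, guarded so numeric cells pass through
df["Budget"] = [
    float(x.replace("$", "").replace(",", "")) if isinstance(x, str) else x
    for x in df["Budget"]
]

# 3. Chart type: a trend over time, so a line graph
# 4-5. Plot and style (call plt.show() here when working interactively)
plt.style.use("ggplot")
plt.figure(figsize=(6, 4))  # assumed column size, in inches
plt.plot(df["Year"], df["Budget"], marker="o", label="Budget")
plt.title("Budget by year (sample data)")
plt.xlabel("Year")
plt.ylabel("Budget (USD)")
plt.xticks(df["Year"])  # one tick per year, no fractional years
plt.legend()

# 6. Export at print resolution
out_path = os.path.join(tempfile.gettempdir(), "chart.png")
plt.savefig(out_path, dpi=300)
plt.close()
```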
How Do I Ensure Accuracy and Avoid Misleading Readers?
The Tick Mark Sanity principle guards against one of the most common data journalism errors: distorted scales. Explicitly set `plt.xticks()` and `plt.yticks()` to match your actual data intervals rather than accepting matplotlib's automatic ticks on important axes. The defaults can skip values (for example, labeling only every other year), and an automatically chosen axis range can stretch a small change into the visual impression of a trend that isn't there.
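As a rough sketch of explicit ticks, the example below pins one tick to each year and fixes the y-axis to even steps from zero. The values are invented, and the added `plt.ylim()` call is an assumption on top of the tick calls named above, included so the visible range is also a deliberate choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented values, for illustration only
years = [2015, 2016, 2017, 2018, 2019, 2020]
cases = [42, 45, 44, 47, 46, 48]

plt.figure(figsize=(6, 4))
plt.plot(years, cases, marker="o", label="Reported cases")

# One tick per year, so no year is skipped or shown as a fraction
plt.xticks(years)

# Even steps of 10 starting at zero, so a change of a few units stays in proportion
plt.yticks(list(range(0, 61, 10)))
plt.ylim(0, 60)

plt.title("Reported cases per year (sample data)")
plt.xlabel("Year")
plt.ylabel("Cases")
plt.legend()

x_ticks = [int(t) for t in plt.gca().get_xticks()]
y_ticks = [int(t) for t in plt.gca().get_yticks()]
print(x_ticks)
print(y_ticks)
```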
For pie charts, always include `autopct='%1.2f%%'` to show exact percentages rather than relying on readers to estimate slice sizes by eye. For histograms, align tick labels with bin edges using `plt.xticks(bins)` so readers can see exactly where each bar starts and ends.
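Both settings can be sketched in a few lines. The category names, shares, ages, and bin edges below are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented data, for illustration only
groups = ["Housing", "Transit", "Parks"]
shares = [45, 30, 25]
ages = [22, 25, 31, 34, 38, 41, 45, 52, 58, 63]
bins = [20, 30, 40, 50, 60, 70]

# Pie chart: autopct prints the exact percentage inside each slice
plt.figure(figsize=(5, 5))
plt.pie(shares, labels=groups, autopct="%1.2f%%")
plt.title("Share of spending by category (sample data)")
pct_labels = [t.get_text() for t in plt.gca().texts if t.get_text().endswith("%")]

# Histogram: ticks placed on the bin edges so each bar's range is readable
plt.figure(figsize=(6, 4))
counts, edges, _ = plt.hist(ages, bins=bins, edgecolor="black")
plt.xticks(bins)
plt.title("Respondents by age group (sample data)")
plt.xlabel("Age")
plt.ylabel("Respondents")

print(pct_labels)
print(list(counts))
```

Passing the same `bins` list to both `plt.hist()` and `plt.xticks()` keeps the bars and the tick labels from drifting apart if the bin edges change later.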
When comparing groups with box-and-whisker charts, include the `labels` parameter so each box is clearly identified (Matplotlib 3.9 renamed this parameter to `tick_labels`, and the old name is deprecated there). Use `patch_artist=True` and distinct colors to differentiate groups visually.
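A minimal box plot sketch follows, with invented commute times for three hypothetical districts. Because the keyword for box labels differs between Matplotlib versions, this sketch names the boxes with `plt.xticks()` instead. That substitution is an assumption made for portability, not the method described above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented commute times (minutes) for three hypothetical districts
names = ["North", "Central", "South"]
data = [
    [22, 25, 27, 30, 31, 35, 40],
    [15, 18, 20, 21, 24, 26, 29],
    [28, 33, 35, 38, 41, 44, 52],
]
colors = ["#1b9e77", "#d95f02", "#7570b3"]

plt.figure(figsize=(6, 4))

# patch_artist=True makes each box a filled patch that can take a face color
box = plt.boxplot(data, patch_artist=True)
for patch, color in zip(box["boxes"], colors):
    patch.set_facecolor(color)

# Boxes sit at positions 1..n by default, so label those positions
plt.xticks([1, 2, 3], names)
plt.title("Commute time by district (sample data)")
plt.xlabel("District")
plt.ylabel("Commute time (minutes)")

tick_text = [t.get_text() for t in plt.gca().get_xticklabels()]
print(tick_text)
```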
How Do I Handle Messy Government or NGO Dataset Columns?
Public datasets frequently contain columns with unit suffixes or currency formatting (e.g., '175lbs', '$1,200'), inconsistent capitalization, or mixed types. Before any visualization:
- Inspect dtypes with `df.dtypes`
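- Strip suffixes: `[int(x.removesuffix('lbs')) if isinstance(x, str) else x for x in df['col']]` (`removesuffix` needs Python 3.9+; `x.strip('lbs')` would instead remove any leading or trailing `l`, `b`, or `s` characters, not just the suffix)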
- Verify column names exactly with `df.columns` — copy-paste rather than typing from memory
- Use bracket notation for multi-word column names: `df['GDP per capita']`, never `df.GDP per capita`
Spending five minutes on data cleaning prevents charts that silently misrepresent values because matplotlib plotted string data as categorical labels instead of numeric values.
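The cleaning steps above might look like this in practice. The frame is built inline with invented values so the mixed types are visible; the column names and the `strip_suffix` helper are hypothetical.

```python
import pandas as pd

# Invented rows: one column mixes strings and numbers, the other has currency formatting
df = pd.DataFrame({
    "Weight": ["175lbs", "160lbs", 182],
    "Monthly Rent": ["$1,200", "$950", "$1,475"],
})

print(df.dtypes)            # non-numeric dtypes here mean the columns need cleaning
print(df.columns.tolist())  # copy names from this output rather than retyping them


def strip_suffix(value, suffix):
    """Remove a trailing unit from string cells; pass numeric cells through unchanged."""
    if isinstance(value, str):
        if value.endswith(suffix):
            value = value[: -len(suffix)]
        return int(value)
    return value


df["Weight"] = [strip_suffix(x, "lbs") for x in df["Weight"]]

# Bracket notation is required because the column name contains a space
df["Monthly Rent"] = [
    float(x.replace("$", "").replace(",", "")) if isinstance(x, str) else x
    for x in df["Monthly Rent"]
]

print(df.dtypes)  # both columns should now be numeric
```

Slicing off the suffix with `endswith` removes only that exact ending, and it also runs on Python versions older than 3.9.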
Next step: Download a public interest CSV dataset (Census data, WHO statistics, election results), apply all 12 workflow steps, and produce one chart of each type. Review each chart with the question: could a reader understand this without any additional context?
// FREQUENTLY ASKED QUESTIONS
What DPI should I use for charts going into online articles vs. print?
Use `dpi=300` for print publications; this is the standard for professional printing. For web-only articles, `dpi=150` produces crisp visuals at smaller file sizes. Set this in `plt.savefig('chart.png', dpi=300)`. The file extension (`.png` for lossless quality, `.jpg` for smaller size) controls the output format automatically.
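To see what the DPI setting changes, note that the saved image's pixel size is the figure size in inches multiplied by the DPI. The sketch below saves the same placeholder chart to memory at both settings and reads the dimensions back. `saved_pixel_size` is a hypothetical helper, and the 6 by 4 inch figure size is an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def saved_pixel_size(dpi, figsize=(6, 4)):
    """Save a small chart to memory at the given DPI and return (width, height) in pixels."""
    plt.figure(figsize=figsize)
    plt.plot([1, 2, 3], [2, 4, 3])  # placeholder data
    buf = io.BytesIO()
    plt.savefig(buf, format="png", dpi=dpi)
    plt.close()
    buf.seek(0)
    height, width = plt.imread(buf, format="png").shape[:2]
    return width, height


print_size = saved_pixel_size(300)  # 6 in x 300 = 1800 px wide, 4 in x 300 = 1200 px tall
web_size = saved_pixel_size(150)    # 6 in x 150 = 900 px wide, 4 in x 150 = 600 px tall
print(print_size, web_size)
```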
How do I avoid making a misleading chart as a journalist?
Always explicitly set tick marks to match real data intervals. Never rely on matplotlib's defaults, which can compress or skip values and create false visual impressions. Include exact percentage labels on pie charts with `autopct='%1.2f%%'`. Start y-axes at zero for bar charts. Label every axis. These practices are built into the workflow as mandatory steps, not optional polish.
Can I style charts to match my publication's visual guidelines?
Yes. Start with `plt.style.use('ggplot')` or another base style, then override specific elements with your publication's hex color codes. Set figure dimensions with `plt.figure(figsize=(w, h))` to match column widths. Use consistent font sizes by passing `fontsize` alongside the text, as in `plt.title(label, fontsize=18)` and `plt.xlabel(label, fontsize=12)`. Save the complete styling as a reusable script template for all future stories.
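A reusable template could be as small as the sketch below. The hex colors, figure size, and the `start_chart` helper are hypothetical placeholders to replace with your publication's own values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical house style; swap in your publication's actual values
HOUSE_COLORS = ["#005f73", "#ee9b00", "#9b2226"]
COLUMN_SIZE = (6.5, 4)  # width, height in inches
TITLE_SIZE = 18
LABEL_SIZE = 12


def start_chart(title, xlabel, ylabel):
    """Create a figure with the house style applied and return it."""
    plt.style.use("ggplot")
    fig = plt.figure(figsize=COLUMN_SIZE)
    # Lines drawn afterwards cycle through the house colors in order
    plt.gca().set_prop_cycle(color=HOUSE_COLORS)
    plt.title(title, fontsize=TITLE_SIZE)
    plt.xlabel(xlabel, fontsize=LABEL_SIZE)
    plt.ylabel(ylabel, fontsize=LABEL_SIZE)
    return fig


# Usage with invented numbers
fig = start_chart("Library visits per year (sample data)", "Year", "Visits")
line = plt.plot([2020, 2021, 2022], [120, 150, 170], label="Visits")[0]
plt.xticks([2020, 2021, 2022])
plt.legend()
```

Keeping the style values as named constants at the top means a redesign only requires changing them in one place.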