SOS9028 - Data visualization

Course content

This course is for anyone who wants to learn how to produce, refine, and present effective visualizations generated from datasets, summary tables, or the output of statistical models.

The effective use of graphs and charts is an important way to explore data for yourself and to communicate your ideas and results to others.

Being able to produce effective plots from data is also the best way to develop an eye for reading and understanding visualizations made by others, whether presented in academia, business, policy, or the media.

This seminar provides an intensive, hands-on introduction to the principles and practice of data visualization. We will begin with an overview of some basic principles. We will focus not just on the aesthetic aspects of good plots, but also on how their effectiveness is rooted in the way we perceive properties like length, absolute and relative size, orientation, shape, and color. Students will learn how to produce and refine plots using ggplot2, a powerful, versatile, and widely used visualization library for R. It implements a "grammar of graphics" that gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation.
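To give a flavor of what a "grammar of graphics" means in practice, here is a minimal sketch. It is not taken from the course materials: it assumes the ggplot2 package is installed and uses `mpg`, an example dataset that ships with ggplot2. The choice of variables is purely illustrative.

```r
library(ggplot2)

# mpg is an example dataset bundled with ggplot2 (an assumption of this
# sketch, not a dataset named in the course description).
# aes() declares the mapping from data attributes to graphical properties:
# engine size -> x position, highway mileage -> y position, car class -> color.
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()  # the geom decides how the mapped data is drawn

print(p)
```

Swapping `geom_point()` for another geom changes how the data is drawn, while the mapping declared in `aes()` stays the same.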

Through a series of worked examples and exercises, students will learn how to build plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. Topics covered include plotting continuous and categorical variables; layering information on graphics; faceting grouped data to produce effective "small multiple" plots; transforming data to easily produce visual summaries on the graph such as trend lines, linear fits, error ranges, and boxplots; and creating maps, together with simpler alternatives to maps for country- or state-level data. We will also cover cases where we are not working directly with a dataset but rather with estimates from a statistical model. Using these tools, we will then explore the practical process of refining plots to accomplish common tasks such as highlighting key features of the data, labeling particular points, annotating plots, and changing their overall appearance. Finally, we will examine some strategies for presenting graphical results in different formats (such as in print, online, or in slides) and to different sorts of audiences.
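As a rough illustration of building a plot piece by piece, the sketch below layers points, adds a linear fit with an error range, and facets the result into small multiples. It is an assumption-laden example rather than a course exercise: it again uses the `mpg` dataset bundled with ggplot2, and the variables and labels are chosen only for demonstration.

```r
library(ggplot2)

# Start with the data and the basic mapping (mpg ships with ggplot2;
# it is used here only as a stand-in dataset).
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: the raw observations, slightly transparent
  geom_point(alpha = 0.5) +
  # Layer 2: a linear fit with its standard-error band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  # Split into one panel per drive type ("small multiples")
  facet_wrap(~ drv) +
  # Refine the labels for presentation
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon")

print(p)
```

Each `+` adds one piece, so any line can be removed or replaced without rewriting the rest of the plot.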

The course is taught by Kieran Healy, Associate Professor in Sociology and the Kenan Institute for Ethics at Duke University. His research interests are in economic sociology, the sociology of culture, the sociology of organizations, and social theory. He is the author of Last Best Gifts: Altruism and the Market for Human Blood and Organs. His current focus is on the moral order of market society, the effect of quantification on the emergence and stabilization of social categories, and the link between these two topics.

Learning outcome

At the end of the course, participants will:

- Understand the basic principles behind effective data visualization

- Have a practical sense for why some graphs and figures work well while others may fail to inform or actively mislead

- Know how to create a wide range of plots in R using ggplot2

- Know how to refine plots for effective presentation


PhD students at the Department of Sociology and Human Geography register for the course in Studentweb.

Participants outside the Department of Sociology and Human Geography shall fill out this application form.

The application deadline is 16th July 2017!


Formal prerequisite knowledge

Students should have some basic familiarity with elementary statistical concepts. Some knowledge of R and RStudio will be helpful, but is not required.


Room: PC-lab 035, Harriet Holters Building (in the basement)


Wednesday August 16th

9.00-11.30: Session 1. Course Overview and Supervised Lab Time.

               Getting oriented to R, RStudio, and RMarkdown.

               Make your first graph.

11.30-12.30: Lunch

12.30-14.30: Session 2. Lecture and Discussion.

               Reading: Tufte; Cleveland; Ware; Few

               Looking at Data: Good Graphs and Bad

               Perception and Data Visualization

               Visual Tasks and Decoding Graphs

               Problems of Honesty and Good Judgement

14.30-15.00: Coffee break

15.00-17.00: Session 3. Lecture and Supervised Lab Time

               Reading: Healy; Grolemund & Wickham

               Core ggplot concepts

               Tidy data

               Data Mappings and Aesthetics

               Geoms and plot types


Thursday August 17th

9.00-11.30: Session 1. Lecture and Supervised Lab Time

               Reading: Healy; Grolemund & Wickham              

               Grouping, Faceting, and Transforming Data

               Small Multiples

               Data Transformations via Geoms and Pipelines

11.30-12.30: Lunch

12.30-14.30: Session 2. Lecture and Supervised Lab Time

               Working with geoms

               Writing and drawing on plots

               Scales, guides, and themes         

14.30-15.00: Coffee break

15.00-17.00: Session 3. Lecture and Supervised Lab Time

               Working with Models

               Getting model-based graphics right

               Model objects

               Generating predictions

               Using Broom

               Marginal effects

               Other tools


Friday August 18th

9.00-11.30: Session 1. Lecture and Supervised Lab Time

               Choropleth Maps

               Small Multiple Maps

               Is your Data Really Spatial?

11.30-12.30: Lunch

12.30-14.30: Session 2. Lecture and Supervised Lab Time

               Refining Plots

               Color and Color Layering

               Working with Themes   

14.30-15.00: Coffee break

15.00-17.00: Session 3. Supervised Lab Time

               Case Studies: Redrawing bad graphs


Reading list

  • Readings will be supplied in PDF form by the instructor
  • Kieran Healy. 2017. Data Visualization for Social Science. (Draft.)
  • William S. Cleveland. 1993. Visualizing Data. Hobart Press.
  • Stephen Few. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
  • Garrett Grolemund and Hadley Wickham. 2016. R for Data Science. O'Reilly Media.
  • Jeffrey Heer and Michael Bostock. 2010. "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '10, New York.
  • Edward Tufte. 1983. The Visual Display of Quantitative Information. Graphics Press.
  • Colin Ware. 2008. Visual Thinking for Design. Morgan Kaufmann.
  • Leland Wilkinson. 2005. The Grammar of Graphics. Springer.


Students will be assessed on (1) Attendance and active participation, (2) Completion of exercises during course time, and (3) A final paper that, using the tools of reproducible research covered in the course, produces effective visualizations from a data set of interest to the student and agreed upon with the instructor in advance of submission.

The paper is to be submitted by 1st October 2017 to

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Facts about this course

Autumn 2017

16-18th August 2017