Descriptive Statistics and Visualization

Overview

In this module, you will learn about some basic tools for describing and exploring data. We will begin with a discussion of the data matrix that political scientists use to organize and present data in table form. We then move on to describing data with descriptive statistics. Measures of central tendency like the mode, median, and mean can be used to describe the typical case in a data set. Measures of dispersion like the range, variance, and standard deviation can be used to describe how the data are distributed around the typical case. We then discuss how you can present data graphically using boxplots, bar charts, histograms, and more. Together, these tools will help you describe data clearly to an audience and better understand the data presentations you encounter.
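
As a rough illustration of the measures named above, the sketch below computes them with Python's standard statistics module. The turnout values and the variable names are invented for this example and do not come from any course dataset. Note that statistics.variance and statistics.stdev use the sample (n - 1) formulas; statistics.pvariance and statistics.pstdev are the population versions.

```python
import statistics

# Hypothetical turnout percentages for seven districts (made-up values).
turnout = [48, 52, 55, 55, 58, 62, 90]

# Measures of central tendency: the "typical" case.
mode = statistics.mode(turnout)      # most frequent value
median = statistics.median(turnout)  # middle value once sorted
mean = statistics.mean(turnout)      # arithmetic average

# Measures of dispersion: how spread out the data are.
value_range = max(turnout) - min(turnout)
variance = statistics.variance(turnout)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(turnout)      # square root of the sample variance

print("mode:", mode, "median:", median, "mean:", mean)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```

In this made-up sample, the single high value (90) pulls the mean above the median, which is one reason to report more than one measure of central tendency.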

Learning Objectives

By the end of this module, students will be able to:

  1. Understand how to compile a data matrix and summarize large batches of data.
  2. Describe data with measures of central tendency and dispersion.
  3. Explain how to graph data for presentation and exploration.

Descriptive Statistics and Visualization

This lecture introduces the basics of organizing, describing, and presenting quantitative data. It provides an overview of the tools available to manage and describe quantitative data, including measures of central tendency, measures of dispersion, probability distributions, and different types of graphs.

Assignment 1

Using the attached dataset extracted from the 2023 Human Development Index, answer the following questions. There is no word count; answer with a single number, several numbers, or word(s), as each question requires.

It is important that you use the attached dataset rather than downloading the original from the HDI website. I have cleaned up the columns in the attached dataset, so the columns referenced in the questions below may differ from those in the original data.

Make sure to download the linked Excel file, rather than viewing the dataset in the browser. If you only view it in the browser, you will not be able to see the column headers (e.g., C through F for question 1).

Dataset: HDI2023_dataset.xlsx

  1. Calculate the mean and median for columns C through F (individually).
  2. Which measure of central tendency would be appropriate for the variable “Country”?
  3. What is the range of column G, Gross national income (GNI) per capita?
  4. What country has the highest GNI per capita in the sample?
  5. What country has the lowest life expectancy in the sample (Life expectancy at birth)?
  6. Given the mean calculated in (1) for Expected Years of Schooling, which observation has the largest deviation from the mean?
  7. Which country has the largest difference between expected years of schooling and mean years of schooling?
  8. What country has seen the greatest decline in HDI score from 2022 to 2023?
  9. What country has a GNI per capita closest to the sample’s mean?
  10. What is the median GNI per capita of the sample?
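
The kinds of calculations these questions ask for can be sketched on a toy sample. The example below is only an illustration in Python: the country labels and values are hypothetical and are not taken from HDI2023_dataset.xlsx, and the same steps can be carried out with spreadsheet functions instead.

```python
import statistics

# Hypothetical sample: labels and values are made up, not from the HDI dataset.
gni = {
    "Country A": 12000,
    "Country B": 46000,
    "Country C": 3000,
    "Country D": 23000,
}

mean_gni = statistics.mean(gni.values())
median_gni = statistics.median(gni.values())

# Observations with the highest and lowest values, and the range between them.
highest = max(gni, key=gni.get)
lowest = min(gni, key=gni.get)
value_range = gni[highest] - gni[lowest]

# Deviation from the mean: the absolute distance of each observation from the mean.
farthest = max(gni, key=lambda c: abs(gni[c] - mean_gni))
closest = min(gni, key=lambda c: abs(gni[c] - mean_gni))

print("mean:", mean_gni, "median:", median_gni, "range:", value_range)
print("highest:", highest, "lowest:", lowest)
print("farthest from mean:", farthest, "closest to mean:", closest)
```

The distance of a single observation from the mean is called a deviation. Variance summarizes the squared deviations of the whole sample, so it describes the distribution rather than any one case.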

Assignment 2

Data is everywhere: in news articles, polls, reports, and social media. Learning how to organize, summarize, and visualize data helps you see patterns, understand trends, and communicate findings clearly.

  • Graphs can make data more understandable, but some work better than others depending on the information. If you were trying to show voter turnout across different states, would you use a bar chart, histogram, or boxplot? Why?
  • Find a chart, graph, or table in a news article (or online report) and share it. Was the data presented clearly? What would you change to make it easier to interpret?