Building Data Visualization Tools

How to Work with Maps


The content of this blog post is based on examples, notes and experiments related to the material presented in the “Building Data Visualization Tools” module of the “Mastering Software Development in R” Specialization (Coursera) created by Johns Hopkins University [1].

Required Packages

  • ggplot2, a system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.
  • gridExtra, provides a number of user-level functions to work with “grid” graphics.
  • dplyr, a tool for working with data-frame-like objects, both in memory and out of memory.
  • viridis, the viridis color palette.
  • ggmap, a collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps).

# If necessary, install a package with
# install.packages("packageName")
# ggplot2::map_data() also requires the maps package: install.packages("maps")

# Load packages
library(ggplot2)
library(gridExtra)
library(dplyr)
library(viridis)
library(ggmap)

Data

The ggplot2 package provides access to datasets with geographic information: the ggplot2::map_data() function retrieves map data from the maps package and returns it as a data frame suitable for plotting with ggplot2 (see ?map_data for more information).

Specifically, the italy dataset [2] is used for some of the examples below. Please note that this dataset was prepared around 1989, so it is out of date, especially the information about provinces (see ?maps::italy).

# Get the italy map data (from the maps package) via ggplot2::map_data()
# Keep only the provinces "Bergamo", "Como", "Lecco", "Milano" and "Varese"
# and sort by group and then by order (ascending)
italy_map <- ggplot2::map_data(map = "italy")
italy_map_subset <- italy_map %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)
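Because the dataset predates later changes to Italian provinces, it can be worth checking which of the requested names actually matched before plotting. The sketch below is an illustrative check, not part of the original notes. It assumes ggplot2, dplyr and maps are installed and uses only base R setdiff() and the dplyr verbs filter(), arrange(), group_by(), summarise(), n_distinct() and n(). The variable names missing_provinces and province_summary are hypothetical.

```r
# Illustrative sketch: assumes ggplot2, dplyr and maps are installed
library(ggplot2)
library(dplyr)

provinces <- c("Bergamo", "Como", "Lecco", "Milano", "Varese")

italy_map_subset <- ggplot2::map_data(map = "italy") %>%
  filter(region %in% provinces) %>%
  arrange(group, order)

# Requested province names that do not appear in the dataset (if any)
missing_provinces <- setdiff(provinces, unique(italy_map_subset$region))
print(missing_provinces)

# Number of polygons (distinct groups) and points per matched province
province_summary <- italy_map_subset %>%
  group_by(region) %>%
  summarise(n_polygons = n_distinct(group), n_points = n())
print(province_summary)
```

An empty missing_provinces means every requested name was found. Any name it lists simply produces no rows in italy_map_subset rather than an error.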

Each observation (row) in the data frame describes a geographical point together with some extra information:

  • long & lat, longitude and latitude of the geographical point
  • group, an identifier of the polygon the point belongs to
    • a map can be made of several polygons (e.g. one polygon for the mainland and one for each island, one polygon for each state, …)
  • order, the position of the point within its group
    • it defines how the points belonging to the same group are connected to draw the polygon
  • region, the name of the province (Italy) or state (USA)
  • subregion, an optional finer subdivision of the region (empty in the rows shown below)

head(italy_map, 3)
##       long      lat group order        region subregion
## 1 11.83295 46.50011     1     1 Bolzano-Bozen      
## 2 11.81089 46.52784     1     2 Bolzano-Bozen      
## 3 11.73068 46.51890     1     3 Bolzano-Bozen      
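The role of group and order can be illustrated without downloading any map data. The sketch below builds a small hypothetical data frame with the same columns as the map_data() output (the coordinates are invented and describe no real province), shuffles its rows, and restores the drawing order with base R order(), which here does the same job as dplyr::arrange(group, order).

```r
# Hypothetical toy data with the same columns as the map_data() output.
# The coordinates are invented and do not describe any real province.
toy_map <- data.frame(
  long   = c(9.0, 9.5, 9.5, 9.0, 10.0, 10.5, 10.25),
  lat    = c(45.0, 45.0, 45.5, 45.5, 45.0, 45.0, 45.5),
  group  = c(1, 1, 1, 1, 2, 2, 2),
  order  = 1:7,
  region = c("A", "A", "A", "A", "B", "B", "B")
)

# Simulate data stored in an arbitrary row order
set.seed(42)
shuffled <- toy_map[sample(nrow(toy_map)), ]

# Base R equivalent of dplyr::arrange(group, order):
# sort by polygon first, then by drawing order within each polygon
restored <- shuffled[order(shuffled$group, shuffled$order), ]

# Number of vertices that make up each polygon
vertices_per_polygon <- table(restored$group)
print(vertices_per_polygon)
```

In this toy example region "A" is a single four-vertex polygon (a square) and region "B" a three-vertex polygon (a triangle). A real province may be split into several groups.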

How to work with maps

Having spatial information in the data makes it possible to map the data or, in other words, to visualize the information contained in the data in a geographical context. R offers different ways to map data, from simple plots that use longitude/latitude as x/y coordinates to more complex spatial data objects (e.g. shapefiles).

Mapping with ggplot2 package

The most basic way to create maps with your data is to use ggplot2: create a ggplot object and then add a geom that maps longitude to the x aesthetic and latitude to the y aesthetic [4] [5]. This simple approach can be used to:

  • create maps of geographical areas (states, countries, etc.)
  • map locations as points, lines, etc.
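Both uses can be combined in a single plot. The following is a minimal sketch, assuming ggplot2, dplyr and maps are installed: geom_polygon() draws each group as a filled area, while geom_point() adds one location per province from a separate data frame. The label positions are a hypothetical convenience computed as the mean of each province's vertices, which is only a rough stand-in for a true area-weighted centroid. coord_quickmap() is used to give the map a sensible aspect ratio.

```r
# Minimal sketch: assumes ggplot2, dplyr and maps are installed
library(ggplot2)
library(dplyr)

italy_map_subset <- ggplot2::map_data(map = "italy") %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)

# Rough location per province: the mean of its vertices.
# This is NOT a true (area-weighted) centroid, just a quick approximation.
province_labels <- italy_map_subset %>%
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat))

province_plot <- ggplot(italy_map_subset,
                        aes(x = long, y = lat, group = group, fill = region)) +
  # Areas: one filled polygon per group, vertices joined in data order
  geom_polygon(color = "white") +
  # Locations: one point per province, taken from a separate data frame
  geom_point(data = province_labels, aes(x = long, y = lat),
             inherit.aes = FALSE) +
  # Quick approximation of a map projection (sets the aspect ratio)
  coord_quickmap()

print(province_plot)
```

The group aesthetic is what keeps ggplot2 from joining the outline of one polygon to the next. inherit.aes = FALSE stops the point layer from expecting the group and fill columns that province_labels does not have.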

Create a map showing the “Bergamo”, “Como”, “Varese” and “Milano” provinces in Italy using simple points…

When plotting simple points, the geom_point() function is used. In this case the group and order of the points do not matter, because each point is drawn independently of the others.

italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_point(aes(color = region))
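Unlike geom_point(), geoms that connect points depend on the row order. As an illustrative sketch (not from the original notes, and assuming ggplot2, dplyr, gridExtra and maps are installed), the comparison below draws the province outlines with geom_path() twice: once from the data arranged by group and order, and once after shuffling the rows, which should turn the outlines into a tangle of lines. gridExtra::grid.arrange(), loaded at the top of the post, places the two plots side by side.

```r
# Illustrative sketch: assumes ggplot2, dplyr, gridExtra and maps are installed
library(ggplot2)
library(dplyr)
library(gridExtra)

italy_map_subset <- ggplot2::map_data(map = "italy") %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)

# The same rows in a random order
set.seed(1)
shuffled <- italy_map_subset[sample(nrow(italy_map_subset)), ]

# geom_path() connects the points of each group in the order in which
# they appear in the data, so here the row order matters
p_ordered <- ggplot(italy_map_subset, aes(x = long, y = lat, group = group)) +
  geom_path() +
  coord_quickmap() +
  ggtitle("Arranged by group and order")

p_shuffled <- ggplot(shuffled, aes(x = long, y = lat, group = group)) +
  geom_path() +
  coord_quickmap() +
  ggtitle("Shuffled rows")

# Side-by-side comparison
grid.arrange(p_ordered, p_shuffled, ncol = 2)
```

This is why the subset was sorted with arrange(group, order) even though the point map above does not need it.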

Author: Pier Lorenzo Paracchini

He is a generalist with a passion for people, data and technology. He has a Master of Science in Electronic Engineering from the Politecnico di Milano and works as an enthusiast developer with a data scientist twist in the software innovation sector at Statoil. His journey in data science and machine learning started in 2014.
