1 Examples of the use of the raster package to read and analyze raster data sets

1.1 Preliminaries

The following files will be used here, and are available on the SFTP server, in the folders /nc_files and /Rmaps:

  • ne_110m_admin_0_countries.shp — world coastline and country borders
  • ne_110m_coastline.shp— world coastlines
  • treecov.nc — global UMD tree-cover data
  • cru_ts4.02.1961.1990.tmp.ltm.nc — CRU monthly temperature time series, long-term means
  • cru_ts4.02.1901.2017.tmp.anm.nc— CRU monthly temperature time series (1901-2017)
  • NOAA_ERSSTv5_sst.mnmean.nc — NOAA SST monthly time series (1854 - 2019)
  • NOAA_ERSSTv5_sst.mon.ltm.1981-2010.nc— NOAA SST time series, long-term means
  • HadCRUT.4.6.0.0.median.nc — HadCRUtv4 — Hadley Centre/CRU merged temperature anomalies, 1850-2019

Download or copy the files to a “data folder” like "/Users/bartlein/Projects/ESSD/data/nc_files".

The following examples use functions from the raster and rasterVis packages to read and display a few data sets. The datasets used here happen to be stored as netCDF files, but the raster package can also read and write other formats that raster data are commonly stored in. The raster package implements three basic objects or classes: 1) a rasterLayer, which is a single 2-D, x-y array of values (e.g. a DEM), 2) a rasterStack, a set of individual co-registered (i.e. on the same grid) rasterLayer objects (usually different variables), and 3) a rasterBrick, representing a set of layers that make up 3-D data set of a particular variable, such as a climate variable observed over time, or a set of bands in a multi- or hyperspectral image. Individual layers can be assembled into a stack, and can also be formed by extracting a 2-D slice from a raster brick.

Load the appropriate packages (installing them first if they’re not present, i.e. install.packages("maptools"):

For subsequent use, read the world country-outlines shape file:

1.2 Reading a netCDF file using the raster package

Read a single-variable netCDF dataset describing global tree cover using the raster() function, which operates on a filename. Individual characteristics of the data can be displayed using some utility functions.

## File /Users/bartlein/Projects/ESSD/data/nc_files/treecov.nc (NC_FORMAT_CLASSIC):
## 
##      1 variables (excluding dimension variables):
##         float treecov[lon,lat]   
##             name: treecov
##             long_name: treecov
##             units: 1
##             _FillValue: -9
## 
##      2 dimensions:
##         lon  Size:720
##             standard_name: longitude
##             long_name: longitude
##             units: degrees_east
##             axis: X
##         lat  Size:360
##             standard_name: latitude
##             long_name: latitude
##             units: degrees_north
##             axis: Y
## 
##     3 global attributes:
##         source: UMD Tree Cover Data resampled to 0.5-degrees
##         data: gl-latlong-8km-landcover.bsq.gz
##         history: P.J. Bartlein, 20 Feb 2008
## class      : RasterLayer 
## dimensions : 360, 720, 259200  (nrow, ncol, ncell)
## resolution : 0.5, 0.5  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/bartlein/Projects/ESSD/data/nc_files/treecov.nc 
## names      : treecov 
## zvar       : treecov
## [1] "/Users/bartlein/Projects/ESSD/data/nc_files/treecov.nc"
## [2] "TRUE"                                                  
## [3] "FALSE"

In the above example, the raster() function opens the netCDF file, and creates a raster object (tree). Printing this object using the print() function displays the netCDF metadata (or “attributes”), while simply “typing” the object name displays the attributes of the raster object. The functions filename(), hasValues() and inMemory() display the filename of the object, whether it currently contains values, and whether it is stored in memory, or still resides on disk. In this case, the data in the file /Users/bartlein/Projects/ESSD/data/nc_files/treecov.nc still resides on disk (inMemory(tree) = FALSE).

The range of tree (tree cover) can be gotten with the cellStats() function. Values less than 0.0 indicate barren land-surfac cover, and these values can be recoded to be equal to 0.0.

## [1] -2 80

Next, a basic plot of the data can be generated using the levelplot() function. (The rasterVis package implements a number of Lattice-type plots for raster data sets.)

A better map can be genrated by setting a rasterTheme (in this case a progressive color scale of green from the RColorBrewer palette), generating the levelplot() object, and then adding a layer to the plot consisting of the countru outlines (world.shp).

Notice in the code above the specification of the layer() function as latticeExtra::layer(). This distinguishes betwee the layer function in the the latticeExtra package from that in ggplot2.

Here is another version, with outlines, a better color scale, and “marginal plots”.

[Back to top]

2 rasterVis-style small-multiple maps

Small-multiple maps consist of an array of maps, each with the same cooridinate system, that can be used, for example, to plot long-term mean values for individual maps. Here, the long-term means of CRU TS 4.02 near-surface air temperature data will be plotted. First, read in another shapefile, with only coastlines.

Set the path, file, and variable names for the CRU data:

Read the CRU data as a raster stack:

## class      : RasterStack 
## dimensions : 360, 720, 259200, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.5, 0.5  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## names      : X1976.01.16, X1976.02.15, X1976.03.16, X1976.04.16, X1976.05.16, X1976.06.16, X1976.07.16, X1976.08.16, X1976.09.16, X1976.10.16, X1976.11.16, X1976.12.16

Replace the cryptic names in the raster stack with the names of months:

Create a rasterVis lattice-style levelplot. The plot will be rather large and time-consuming to plot, and so a more efficient way to produce it is to write it directly to a .png file. This is accomplished by braketing the necessary code for the levelplot with code to open a .png file:

Here’s what the image looks like:

3 Time-Series plots

A basic way of visualizing data in, for example, a 3-D netCDF file, with time being the third dimension, is to plot time series of individual grid points or areal averages. Here we’ll plot the CRU temperature data for the grid point closest to Eugene, both as a single time series with the monthly values in consecutive order, and in a multi-panel plot, with the data for each month of the year in a differnt panel. Begin by setting the path, file, and variable names:

Open the netCDF file

## File /Users/bartlein/Projects/ESSD/data/nc_files/cru_ts4.02.1901.2017.tmp.anm.nc (NC_FORMAT_NETCDF4):
## 
##      1 variables (excluding dimension variables):
##         float tmp_anm[lon,lat,time]   (Contiguous storage)  
##             units: degrees Celsius
##             _FillValue: 9.96920996838687e+36
##             long_name: near-surface air temperature anomalies
##             base_period: 1961 - 1990
## 
##      3 dimensions:
##         lon  Size:720
##             units: degrees_east
##             long_name: lon
##             axis: X
##         lat  Size:360
##             units: degrees_north
##             long_name: lat
##             axis: Y
##         time  Size:1404
##             units: days since 1900-1-1
##             long_name: time
##             axis: T
## 
##     6 global attributes:
##         title: CRU TS4.02 Mean Temperature
##         institution: Data held at British Atmospheric Data Centre, RAL, UK.
##         source: Run ID = 1811131722. Data generated from:tmp.1804231108.dtb
##         references: Information on the data is available at http://badc.nerc.ac.uk/data/cru/
##         history: P.J. Bartlein, Thu May 16 08:57:21 2019
##         Conventions: CF-1.4

Get longitude, latitude and time:

## [1] -179.75 -179.25 -178.75 -178.25 -177.75 -177.25
## [1] -89.75 -89.25 -88.75 -88.25 -87.75 -87.25
## [1]  720  360 1404
## $hasatt
## [1] TRUE
## 
## $value
## [1] "days since 1900-1-1"

Get the array of time series data:

## [1]  720  360 1404

Convert the time variable from the CF-Conventions “time-since” format to a more conventional form:

## [1] 380 410 439 470 500 531
## [1] 42930 42961 42992 43022 43053 43083
## [1] "1901-01-16 UTC" "1901-02-15 UTC" "1901-03-16 UTC" "1901-04-16 UTC" "1901-05-16 UTC" "1901-06-16 UTC"
## [1] "2017-07-16 UTC" "2017-08-16 UTC" "2017-09-16 UTC" "2017-10-16 UTC" "2017-11-16 UTC" "2017-12-16 UTC"

Get the beginning and ending year of the time series:

## [1] 1901 2017  117   12

Generate a decimal year value for plotting consecutive values:

## [1] 1901.000 1901.083 1901.167 1901.250 1901.333 1901.417
## [1] 2017.500 2017.583 2017.667 2017.750 2017.833 2017.917

Generate year and month values for creating the dataframe that will be used to get multi-panel plots:

## [1] 1901 1901 1901 1901 1901 1901
## [1] 2017 2017 2017 2017 2017 2017
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun"
## [1] "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
##  Factor w/ 12 levels "Dec","Jan","Feb",..: 2 3 4 5 6 7 8 9 10 11 ...

Note that month is a factor, and without intervention, the monthly panels will plot in the default order for a factor: alphabetical. This would be a little counterintuitive to interpret, but the factor order can be reset using the code fragment:
month <- factor(month, levels=month_names), as was done above.

Now find the grid cell closest to Eugene (approximatly 123.1167 W and 44.08333 N). This is done by using the which.min() function:

## [1]  114.00 -123.25  269.00   44.25

3.1 Time-series plot of consecutive values

The data in the netCDF file are arranged in consecutive order, and so this is straightforward:

3.2 Multi-panel plot of monthly time series data

Now make a dataframe consisting of nt values of the YrMn variable, the temperature time series tmp_ts, and the year and month variables generated above:

## 'data.frame':    1404 obs. of  4 variables:
##  $ YrMn  : num  1901 1901 1901 1901 1901 ...
##  $ year  : int  1901 1901 1901 1901 1901 1901 1901 1901 1901 1901 ...
##  $ month : Factor w/ 12 levels "Dec","Jan","Feb",..: 2 3 4 5 6 7 8 9 10 11 ...
##  $ tmp_ts: num  -0.713 -0.903 -0.637 -1.267 -0.293 ...
##       YrMn year month     tmp_ts
## 1 1901.000 1901   Jan -0.7133333
## 2 1901.083 1901   Feb -0.9033332
## 3 1901.167 1901   Mar -0.6366666
## 4 1901.250 1901   Apr -1.2666662
## 5 1901.333 1901   May -0.2933328
## 6 1901.417 1901   Jun -2.3033333
##          YrMn year month     tmp_ts
## 1399 2017.500 2017   Jul  1.6066670
## 1400 2017.583 2017   Aug  2.5000007
## 1401 2017.667 2017   Sep  1.6066672
## 1402 2017.750 2017   Oct -0.2266665
## 1403 2017.833 2017   Nov  0.7766663
## 1404 2017.917 2017   Dec -0.4133331

Now, construct a ggplot2 multi-panel (e.g. “faceted”) plot. The plot will be constructed with a lowess-smoother line in the background, overplotted by the annual values of each month in a separate panel.

The plot seems quite wide relative to the vertical space occupied by each monthly time series, which makes it difficult to notice any trends in the daa. The aspect of the ratio can be adjusted, guided by the Bill Cleveland’s notion of “banking to 45 (degrees)”. The function bank_slopes() in the package ggthemes can find the aspect ratio that optimizes that notion for a single time series (here, the values for December):

## [1] 0.04973474

A good first-guess aspect ratio can then be found by multiplying this value by the number of panels. However, experience shows that multiplying by the half the number of panels provides a more pleasing result:

[Back to top]

4 Hovmöller and horizon plots

The rasterVis package provides a couple of interesting Lattice-type plots that can be used to visualize 3-D data (usually a function of latitude, longitude and time). The Hovmöller plot is a 2-D time/space diagram, where, for example, zonal (E-W) or meridional (N-S) averages are plotted relative to time. The horizon plot plots multiple time series (here average values for individual latitudinal zones) in a way that allows common trends to be visualized.

The data set used here is the NOAA Extended Sea-Surface Temperature (SST) data. There is considerable set up to do first, which illustrates some of the necessary data wrangling the goes on before a visualization can be constructed. First the “observed” monthly values are read, then the long-term means, and then anomalies are calculated. The data are read using the ncdf4 packages, which allows more flexibility in reading the data than does the raster package, and the resulting array is converted to a raster brick before plotting.

4.1 Set up

Read the SST data:

Get longitude, latitudes and time:

## [1]  0  2  4  6  8 10
## [1] 88 86 84 82 80 78
## [1]  180   89 1983
## $hasatt
## [1] TRUE
## 
## $value
## [1] "days since 1800-1-1 00:00:00"

Note that longitudes run from 0 to 360 degrees E, while latitudes run from 90 N to 90 S. The the N to S layout of latitudes (and of the data) will require various flipping of data arrays later.

Convert time:

## [1] 19723 19754 19782 19813 19843 19874
## [1] 79896 79927 79957 79988 80019 80047
## [1] "1854-01-01 UTC" "1854-02-01 UTC" "1854-03-01 UTC" "1854-04-01 UTC" "1854-05-01 UTC" "1854-06-01 UTC"
## [1] "2018-10-01 UTC" "2018-11-01 UTC" "2018-12-01 UTC" "2019-01-01 UTC" "2019-02-01 UTC" "2019-03-01 UTC"

Now get the SST data

## [1]  180   89 1983

Close the netCDF file:

Flip the latitudes and data so that the data are layed out with the origin in the “lower left” of the map:

##  [1] -88 -86 -84 -82 -80 -78 -76 -74 -72 -70 -68 -66 -64 -62 -60 -58 -56 -54 -52 -50 -48 -46 -44 -42 -40
## [26] -38 -36 -34 -32 -30 -28 -26 -24 -22 -20 -18 -16 -14 -12 -10  -8  -6  -4  -2   0   2   4   6   8  10
## [51]  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60
## [76]  62  64  66  68  70  72  74  76  78  80  82  84  86  88

Note that in many cases, it’s desireable to have longitudes run from -180 to 180. This could be done by using the following code, but in the current example, this is NOT done to facilitate looking at plots focused on the Pacific.

Get a levelplot to check that the data has been read correctly:

Inspection of the time range of the data reveals that there are few months from 2019 in the data array. Get rid of these

## [1]  180   89 1980
## [1] "1854-01-01 UTC" "1854-02-01 UTC" "1854-03-01 UTC" "1854-04-01 UTC" "1854-05-01 UTC" "1854-06-01 UTC"
## [1] "2018-07-01 UTC" "2018-08-01 UTC" "2018-09-01 UTC" "2018-10-01 UTC" "2018-11-01 UTC" "2018-12-01 UTC"

Now get the long-term means (which have the same longitudes and latitudes as the monthly values, so no need to get those).:

## [1] 12
## $hasatt
## [1] TRUE
## 
## $value
## [1] "days since 1800-1-1 00:00:00"
## [1] 180  89  12

Flip the data (but not the latitudes; they are already in the right order):

Again, the data could also be rotated, but that’s not done here:

Map the data to check that it’s been read in ok:

Get the beginning and ending years

## [1] 1854 2018  165   12

Now get the anomalies:

Map the data to check the anomalies:

4.2 Hovmöller

The first Hovmöller plot attempts to reproduce the NOAA ESRL PSD Hovmöller plot of tropical Pacific temperatures between 3.5 S and 3.5 N along a transect from the Indian ocean to the eastern Pacific [View Plot]. Set up for selecting those data by getting the longitude j and latitude k indices of the bounding box:

## [1]  21  40 141 280  43  -4  47   4

Now create a raster brick from the 3-D array of anomalies, selecting data from the Pacific region just defined. Also just get data from 1989 onwards:

## class      : RasterBrick 
## dimensions : 5, 121, 605, 360  (nrow, ncol, ncell, nlayers)
## resolution : 1.983471, 1.6  (x, y)
## extent     : 40, 280, -4, 4  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      :      layer.1,      layer.2,      layer.3,      layer.4,      layer.5,      layer.6,      layer.7,      layer.8,      layer.9,     layer.10,     layer.11,     layer.12,     layer.13,     layer.14,     layer.15, ... 
## min values : -2.369419098, -2.215007782, -1.634376526, -1.275438309, -1.187150955, -1.170444489, -1.187540054, -0.972883224, -0.894659042, -0.723871231, -0.934144974, -0.985055923, -0.846725464, -0.291933060, -0.687570572, ... 
## max values :   0.40519524,   0.28303909,   0.34576988,   0.30479622,   0.36996269,   0.72554016,   0.79590607,   0.76388931,   0.67235374,   0.84916687,   0.34784508,   0.56470108,   0.53544807,   0.82624435,   0.86936188, ...

Replace the layer names with appropriate date labels:

## class      : RasterBrick 
## dimensions : 5, 121, 605, 360  (nrow, ncol, ncell, nlayers)
## resolution : 1.983471, 1.6  (x, y)
## extent     : 40, 280, -4, 4  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      :     Jan.1989,     Feb.1989,     Mar.1989,     Apr.1989,     May.1989,     Jun.1989,     Jul.1989,     Aug.1989,     Sep.1989,     Oct.1989,     Nov.1989,     Dec.1989,     Jan.1990,     Feb.1990,     Mar.1990, ... 
## min values : -2.369419098, -2.215007782, -1.634376526, -1.275438309, -1.187150955, -1.170444489, -1.187540054, -0.972883224, -0.894659042, -0.723871231, -0.934144974, -0.985055923, -0.846725464, -0.291933060, -0.687570572, ... 
## max values :   0.40519524,   0.28303909,   0.34576988,   0.30479622,   0.36996269,   0.72554016,   0.79590607,   0.76388931,   0.67235374,   0.84916687,   0.34784508,   0.56470108,   0.53544807,   0.82624435,   0.86936188, ... 
## time       : Jan 1989, Dec 2018 (min, max)

Here’s the Hovmöller plot:

Note that data for the globe could have been selected as follows

4.3 Horizon plot

The horizon plot plots multiple time series (here average values for individual latitudinal zones) in a way that allows common trends to be visualized. For this example, select a more latitudinally extensive chunk of data:

## [1]  90 178 146 290  30 -30  60  30
## class      : RasterBrick 
## dimensions : 31, 57, 1767, 360  (nrow, ncol, ncell, nlayers)
## resolution : 1.964912, 1.935484  (x, y)
## extent     : 178, 290, -30, 30  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,    layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,   layer.12,   layer.13,   layer.14,   layer.15, ... 
## min values : -2.4305210, -2.2150078, -1.6343765, -1.2754383, -1.3796940, -1.4751682, -1.1875401, -0.9724255, -0.9224682, -0.9408112, -0.9465122, -1.0817566, -1.1374607, -2.2438545, -1.4876442, ... 
## max values :  1.7494316,  1.8365631,  1.4213772,  1.4168472,  1.2032204,  0.8245010,  1.2394390,  1.0155792,  0.7823792,  0.4477158,  0.5992851,  0.8868752,  0.7261028,  0.8262444,  0.9126644, ...
## class      : RasterBrick 
## dimensions : 31, 57, 1767, 360  (nrow, ncol, ncell, nlayers)
## resolution : 1.964912, 1.935484  (x, y)
## extent     : 178, 290, -30, 30  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      :   Jan.1989,   Feb.1989,   Mar.1989,   Apr.1989,   May.1989,   Jun.1989,   Jul.1989,   Aug.1989,   Sep.1989,   Oct.1989,   Nov.1989,   Dec.1989,   Jan.1990,   Feb.1990,   Mar.1990, ... 
## min values : -2.4305210, -2.2150078, -1.6343765, -1.2754383, -1.3796940, -1.4751682, -1.1875401, -0.9724255, -0.9224682, -0.9408112, -0.9465122, -1.0817566, -1.1374607, -2.2438545, -1.4876442, ... 
## max values :  1.7494316,  1.8365631,  1.4213772,  1.4168472,  1.2032204,  0.8245010,  1.2394390,  1.0155792,  0.7823792,  0.4477158,  0.5992851,  0.8868752,  0.7261028,  0.8262444,  0.9126644, ... 
## time       : Jan 1989, Dec 2018 (min, max)

Here’s the horizon plot:

4.4 Hovmöller plots – a second example

This second example shows how a Hovmöller plot can be written directly to a .png file, which may be useful when it takes a while to generate. The data here are a combined set of land and ocean temperature anomalies from 1850 through the beginning of 2019.

Use the brick() function in raster to read the data.

## class      : RasterBrick 
## dimensions : 36, 72, 2592, 2030  (nrow, ncol, ncell, nlayers)
## resolution : 5, 5  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/bartlein/Projects/ESSD/data/nc_files/HadCRUT.4.6.0.0.median.nc 
## names      : X1850.01.16, X1850.02.15, X1850.03.16, X1850.04.16, X1850.05.16, X1850.06.16, X1850.07.16, X1850.08.16, X1850.09.16, X1850.10.16, X1850.11.16, X1850.12.16, X1851.01.16, X1851.02.15, X1851.03.16, ... 
## Date       : 1850-01-16, 2019-02-15 (min, max)
## varname    : temperature_anomaly
## [1] "/Users/bartlein/Projects/ESSD/data/nc_files/HadCRUT.4.6.0.0.median.nc"
## [2] "TRUE"                                                                 
## [3] "FALSE"

As before, replace the layer names (dates) with something more appropriate:

Produce the plot, and send it to a .png file:

Here’s what it looks like: