CRAN Spatial taskview: https://cran.r-project.org/web/views/Spatial.html
The bulk of the geospatial/GISci analysis tools are contained in the following packages:
maptools
reading and writing spatial data, particularly shapefiles
sp
manipulating data in one of the Spatial Data classes
rgdal
R “bindings” to GDAL (the Geospatial Data Abstraction Library)
rgeos
R interface to the GEOS “geometry engine” (overlays, etc.)
The book (ASDAR2): R.S. Bivand, E. Pebesma, V. Gómez-Rubio (2013) Applied Spatial Data Analysis with R, 2nd Ed., Springer.
Here’s a .pdf: http://link.springer.com/content/pdf/10.1007%2F978-1-4614-7618-4.pdf (must be on UO Network)
There is also an R package maps
that includes a world database, and methods for plotting with the R core graphics.
There are several related “classes” of spatial data in R, each consisting of either the specific spatial coordinate or geometry data alone, or the coordinate or geometry data plus an associated data frame:
SpatialPoints
and SpatialPointsDataFrame
SpatialLines
and SpatialLinesDataFrame
SpatialPolygons
and SpatialPolygonsDataFrame
SpatialPixels
and SpatialPixelsDataFrame
SpatialGrid
and SpatialGridDataFrame
SpatialMultiPoints
and SpatialMultiPointsDataFrame
The names of the classes pretty much describe the kind of information they contain. One way to look at the landscape of geospatial data analysis in R is that maptools
and rgdal
cover reading and writing the spatial data classes, sp
handles plotting, conversions and manipulations (including projections with spTransform()
) and rgeos
handles geospatial analysis tasks.
Many of the functions in these packages will be exercised below, but these examples are by no means exhaustive.
Here is a very simple example of reading and plotting a couple of shapefiles, which illustrates the use of some of the maptools
and sp
functions:
Load the sp
and rgdal
packages:
When maptools
or sp
is loaded, it also checks for the presence of the rgeos
package.
Read two shapefiles, one of Oregon county outlines, the other of climate-station locations:
# read county outlines
otl_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/OR/"
otl_name <- "orotl.shp"
otl_file <- paste(otl_path, otl_name, sep="")
orotl_shp <- readOGR(otl_file)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/bartlein/Projects/ESSD/data/shp_files/OR/orotl.shp", layer: "orotl"
## with 36 features
## It has 1 fields
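Incidentally, instead of paste(..., sep="") with a trailing slash kept in the path string, base R's file.path() inserts the separator itself. A minimal sketch (the directory below is made up for illustration):

```r
# file.path() joins path components with the separator
shp_dir <- "/Users/me/data/shp_files/OR"   # hypothetical directory
shp_file <- file.path(shp_dir, "orotl.shp")
shp_file  # "/Users/me/data/shp_files/OR/orotl.shp"
```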
Check what kind of spatial data class this shapefile is, using class(orotl_shp):
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -124.55840 -116.46944
## y 41.98779 46.23626
## Is projected: NA
## proj4string : [NA]
## Data attributes:
## NAME
## Baker : 1
## Benton : 1
## Clackamas: 1
## Clatsop : 1
## Columbia : 1
## Coos : 1
## (Other) :30
Note that the shapefile is a SpatialPolygonsDataFrame
that in addition to the geometry of the counties (the outlines) also contains data (the names). Plot the county outline shapefile:
Now get the climate-station data:
# read stations
orstations_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/OR/"
orstations_name <- "orstations.shp"
orstations_file <- paste(orstations_path, orstations_name, sep="")
orstations_shp <- readOGR(orstations_file)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/bartlein/Projects/ESSD/data/shp_files/OR/orstations.shp", layer: "orstations"
## with 92 features
## It has 12 fields
## Integer64 fields read as strings: station elevation pjan pjul pann
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## coords.x1 -124.567 -116.967
## coords.x2 42.050 46.150
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Number of points: 92
## Data attributes:
## station latitude longitude elevation tjan tjul
## 350197 : 1 Min. :42.05 Min. :-124.6 142 : 2 Min. :-6.000 Min. :12.20
## 350265 : 1 1st Qu.:43.60 1st Qu.:-123.2 18 : 2 1st Qu.:-1.225 1st Qu.:17.30
## 350304 : 1 Median :44.46 Median :-121.7 2 : 2 Median : 0.750 Median :19.20
## 350328 : 1 Mean :44.32 Mean :-121.3 24 : 2 Mean : 1.417 Mean :19.01
## 350412 : 1 3rd Qu.:45.28 3rd Qu.:-119.6 3 : 2 3rd Qu.: 4.150 3rd Qu.:20.20
## 350417 : 1 Max. :46.15 Max. :-117.0 6 : 2 Max. : 8.400 Max. :26.40
## (Other):86 (Other):80
## tann pjan pjul pann Sta
## Min. : 3.200 36 : 4 9 :12 289 : 3 MLB : 2
## 1st Qu.: 8.700 41 : 4 8 :11 256 : 2 ANT : 1
## Median :10.400 178 : 3 6 :10 267 : 2 ARL : 1
## Mean : 9.885 40 : 3 12 : 8 458 : 2 ASH : 1
## 3rd Qu.:11.200 58 : 3 5 : 8 1025 : 1 AST : 1
## Max. :12.600 151 : 2 7 : 8 1031 : 1 BAN : 1
## (Other):73 (Other):35 (Other):81 (Other):85
## Name
## ANTELOPE 1 N USA-OR : 1
## ARLINGTON USA-OR : 1
## ASHLAND 1 N USA-OR : 1
## ASTORIA WSO //R USA-O: 1
## BAKER FAA AIRPORT USA-OR : 1
## BAKER KBKR USA-OR : 1
## (Other) :86
Note that this shapefile also contains some climate data for each station, as well as the station names and locations.
Plot the station data:
# plot the station data on top of the outline map
plot(orotl_shp)
points(orstations_shp, pch=16, cex=0.7, col="blue")
More examples of simple maps can be found in the materials for Lecture 5 in GEOG 4/595: http://geog.uoregon.edu/bartlein/courses/geog495/lectures/lec05.html
This example demonstrates how to extract data from a raster for a set of target points contained in a .csv file. In this case, the raster is the Ramankutty and Foley potential natural vegetation data set (https://www.nelson.wisc.edu/sage/data-and-models/global-land-use/index.php), and the target points are the Global Charcoal Database (GCDv3) site locations (http://www.gpwg.paleofire.org).
## Loading required package: lattice
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
Use the raster
package to read the vegetation data
# read potential natural vegetation data sage_veg30.nc:
vegtype_path <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
vegtype_name <- "sage_veg30.nc"
vegtype_file <- paste(vegtype_path, vegtype_name, sep="")
vegtype <- raster(vegtype_file, varname="vegtype")
vegtype
## class : RasterLayer
## dimensions : 360, 720, 259200 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : /Users/bartlein/Projects/ESSD/data/nc_files/sage_veg30.nc
## names : vegetation.type
## zvar : vegtype
Plot the vegetation data using a rasterVis
levelplot:
mapTheme <- rasterTheme(region=rev(brewer.pal(8,"Greens")))
levelplot(vegtype, margin=F, par.settings=mapTheme,
main="Potential Natural Vegetation")
Read the charcoal data locations:
# read GCDv3 sites
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "v3i_nsa_globe.csv"
csv_file <- paste(csv_path, csv_name, sep="")
gcdv3 <- read.csv(csv_file)
str(gcdv3)
## 'data.frame': 703 obs. of 6 variables:
## $ Site_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Lat : num 44.7 44.9 45.7 45.9 46.3 ...
## $ Lon : num -111 -110 -115 -114 -115 ...
## $ Elev : num 2530 1884 2250 2300 1770 ...
## $ depo_context: Factor w/ 12 levels "BOSE","BUOR",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ Site_Name : Factor w/ 703 levels "17940 core","7-M",..: 130 605 75 33 219 530 179 62 55 97 ...
In order to use the extract()
function from raster
, the target points must be turned into a SpatialPoints
data set.
# turn into SpatialPoints
gcdv3_coords <- cbind(gcdv3$Lon, gcdv3$Lat)
gcdv3_pts <- SpatialPoints(coords=gcdv3_coords, proj4string = CRS("+proj=longlat +datum=WGS84"))
class(gcdv3_pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
Plot the target points on top of the source map:
plt <- levelplot(vegtype, margin=F, par.settings=mapTheme,
main="Potential Natural Vegetation")
plt + layer(sp.points(gcdv3_pts, col="blue", pch=16, cex=0.5))
Now extract the data for the target points:
# extract data from the raster at the target points
gcdv3_vegtype <- extract(vegtype, gcdv3_pts, method="simple")
class(gcdv3_vegtype)
## [1] "numeric"
head(gcdv3_vegtype)
## [1] 6 6 6 6 6 6
Make a dataframe of the extracted data that could be saved as a .csv file, and plot it:
pts <- data.frame(gcdv3$Lon, gcdv3$Lat, gcdv3_vegtype)
names(pts) <- c("Lon", "Lat", "vegtype")
head(pts, 10)
## Lon Lat vegtype
## 1 -110.6161 44.6628 6
## 2 -110.3467 44.9183 6
## 3 -114.9867 45.7044 6
## 4 -114.2619 45.8919 6
## 5 -114.6503 46.3206 6
## 6 -113.4403 45.8406 6
## 7 -114.3592 48.1658 6
## 8 -123.4600 42.0200 4
## 9 -122.5575 41.3467 6
## 10 -122.4954 41.2075 4
plotclr <- rev(brewer.pal(8,"Greens"))
plotclr <- c("#AAAAAA", plotclr)
cutpts <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
color_class <- findInterval(gcdv3_vegtype, cutpts)
plot(pts$Lon, pts$Lat, col=plotclr[color_class+1], pch=16)
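findInterval() returns, for each value, the index of the interval it falls in, which is what makes it useful for mapping continuous values to discrete color classes. A small self-contained check (the values here are made up; the cut points match those above):

```r
# findInterval() maps each value to the index of the interval containing it
cutpts <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
vals <- c(0.5, 3.9, 6.0, 15.2)
findInterval(vals, cutpts)  # 1 2 4 8
```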
Plot the extracted data at the target points on top of the source map. If the extraction is successful, the target-point colors should disappear against the background.
plt <- levelplot(vegtype, margin=F, par.settings=mapTheme,
main="Potential Natural Vegetation")
plotclr <- rev(brewer.pal(8,"Greens"))
cutpts <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
color_class <- findInterval(gcdv3_vegtype, cutpts)
plt + layer(sp.points(gcdv3_pts, col=plotclr[color_class], pch=16, cex=0.6)) +
layer(sp.points(gcdv3_pts, col="black", pch=1, cex=0.6))
Here’s a second example of extracting values from an array, by referencing the specific cell or array element that a target point falls in. In this example, a netCDF file of bioclimatic variables is read using ncdf4
and the values of mtco
(mean temperature of the coldest month) are extracted.
# set path and filename
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "cru10min30_bio.nc"
ncfname <- paste(ncpath, ncname, sep="")
## File /Users/bartlein/Projects/ESSD/data/nc_files/cru10min30_bio.nc (NC_FORMAT_CLASSIC):
##
## 19 variables (excluding dimension variables):
## float climatology_bounds[nv,time]
## float gdd0[lon,lat]
## name: gdd0
## long_name: Growing-degree days, 0C base
## units: degdays
## _FillValue: -99
## float gdd5[lon,lat]
## name: gdd5
## long_name: Growing-degree days, 5C base
## units: degdays
## _FillValue: -99
## float chill[lon,lat]
## name: chill
## long_name: Number of days below 5C
## units: days
## _FillValue: -99
## float mtco[lon,lat]
## name: mtco
## long_name: Mean temperature coldest month
## units: C
## _FillValue: -99
## float mtwa[lon,lat]
## name: mtwa
## long_name: Mean temperature warmest month
## units: C
## _FillValue: -99
## float mipt[lon,lat]
## name: mipt
## long_name: Priestley-Taylor (alpha) parameter (AE/PE)
## units: ratio
## _FillValue: -99
## float aaetpt[lon,lat]
## name: aaetpt
## long_name: Actual evapotranspiration (AE)
## units: mm
## _FillValue: -99
## float apetpt[lon,lat]
## name: apetpt
## long_name: Potential evapotranspiration (PE)
## units: mm
## _FillValue: -99
## float miptever[lon,lat]
## name: miptever
## long_name: Alpha -- evergreen assimilation period
## units: ratio
## _FillValue: -99
## float aaetever[lon,lat]
## name: aaetever
## long_name: AE -- evergreen assimilation period
## units: mm
## _FillValue: -99
## float apetever[lon,lat]
## name: apetever
## long_name: PE -- evergreen assimilation period
## units: mm
## _FillValue: -99
## float miptdecd[lon,lat]
## name: miptdecd
## long_name: Alpha -- deciduous ssimilation period
## units: ratio
## _FillValue: -99
## float aaetdecd[lon,lat]
## name: aaetdecd
## long_name: AE -- deciduous assimilation period
## units: mm
## _FillValue: -99
## float apetdecd[lon,lat]
## name: apetdecd
## long_name: PE -- deciduous assimilation period
## units: mm
## _FillValue: -99
## float totsnpt[lon,lat]
## name: totsnpt
## long_name: Total snow (from surface water balance)
## units: mm
## _FillValue: -99
## float apr[lon,lat]
## name: apr
## long_name: Annual precipitation (mm)
## units: mm
## _FillValue: -99
## float pjanpann[lon,lat]
## name: pjanpann
## long_name: January/Annual precipitation ratio
## units: ratio
## _FillValue: -99
## float pjulpann[lon,lat]
## name: pjulpann
## long_name: July/Annual precipitation ratio
## units: ratio
## _FillValue: -99
##
## 4 dimensions:
## lon Size:720
## standard_name: longitude
## long_name: longitude
## units: degrees_east
## axis: X
## lat Size:360
## standard_name: latitude
## long_name: latitude
## units: degrees_north
## axis: Y
## time Size:12
## standard_name: time
## long_name: time
## units: days since 1900-01-01 00:00:00.0 -0:00
## axis: T
## calendar: standard
## climatology: climatology_bounds
## nv Size:2
##
## 3 global attributes:
## source: calculated using Sarah Shafer's version of bioclim.f
## data: e:\Projects\cru\work\cl_2.0_regrid\regions\cru10min30\cru10min30_bio.dat
## history: P.J. Bartlein, 6 Dec 2007 from 2 Dec 2006 data
## [1] -179.75 -179.25 -178.75 -178.25 -177.75 -177.25
## [1] -89.75 -89.25 -88.75 -88.25 -87.75 -87.25
## [1] 720 360
# get the mtco data
mtco <- ncvar_get(ncin,"mtco")
dlname <- ncatt_get(ncin,"mtco","long_name")
dunits <- ncatt_get(ncin,"mtco","units")
fillvalue <- ncatt_get(ncin,"mtco","_FillValue")
dim(mtco)
## [1] 720 360
Close the netCDF file using the nc_close() function (i.e. nc_close(ncin)).
Plot the control data.
# levelplot of the slice
grid <- expand.grid(lon=lon, lat=lat)
cutpts <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
levelplot(mtco ~ lon * lat, data=grid, at=cutpts, cuts=11, pretty=T,
col.regions=(rev(brewer.pal(10,"RdBu"))))
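The formula interface to levelplot() expects a data frame with one row per grid cell; expand.grid() builds exactly that, with its first argument varying fastest, the same order in which R unrolls a lon x lat array. A small check with made-up coordinates:

```r
# expand.grid() builds the one-row-per-cell data frame that levelplot() expects
lon <- c(0, 0.5, 1)
lat <- c(10, 10.5)
grid <- expand.grid(lon=lon, lat=lat)
nrow(grid)     # 6 -- one row per cell
grid$lon[1:3]  # 0 0.5 1 -- lon varies fastest, matching as.vector() on a lon x lat array
```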
Now extract the data for the target points:
Get the indices (j’s and k’s) of the grid cell that each target point falls in. This code is basically the same as that used in reshaping a “short” data frame into an array: the function defined and executed within sapply()
finds the column (j
) and row (k
) of the control-data array that each target point falls in, and together the j
’s and k
’s identify the control-data grid cell containing each individual target point.
j <- sapply(gcdv3$Lon, function(x) which.min(abs(lon-x)))
k <- sapply(gcdv3$Lat, function(x) which.min(abs(lat-x)))
head(cbind(j,k)); tail(cbind(j,k))
## j k
## [1,] 139 270
## [2,] 140 270
## [3,] 131 272
## [4,] 132 272
## [5,] 131 273
## [6,] 134 272
## j k
## [698,] 219 218
## [699,] 219 218
## [700,] 115 271
## [701,] 114 269
## [702,] 115 269
## [703,] 123 277
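The which.min(abs(lon - x)) idiom simply finds the index of the nearest grid-cell center. A self-contained check, using the 0.5-degree cell centers of the cru10min30 grid and the first target longitude from above:

```r
# nearest-gridpoint lookup: which.min() over absolute differences
lon <- seq(-179.75, 179.75, by=0.5)     # 0.5-degree cell centers
j <- which.min(abs(lon - (-110.6161)))  # first GCDv3 target longitude
j       # 139 -- matches the first row of cbind(j,k) above
lon[j]  # -110.75, the nearest cell center
```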
Get the data for each j, k combination. The way to do this is to convert the mtco
array to a vector, and then calculate an index jk
for each target value:
mtco_vec <- as.vector(mtco)
jk <- (k-1)*nlon + j
gcdv3_mtco <- mtco_vec[jk]
head(cbind(j,k,jk,gcdv3_mtco,lon[j],lat[k]))
## j k jk gcdv3_mtco
## [1,] 139 270 193819 -11.4 -110.75 44.75
## [2,] 140 270 193820 -10.7 -110.25 44.75
## [3,] 131 272 195251 -6.3 -114.75 45.75
## [4,] 132 272 195252 -6.4 -114.25 45.75
## [5,] 131 273 195971 -5.8 -114.75 46.25
## [6,] 134 272 195254 -6.6 -113.25 45.75
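The jk index works because R stores arrays in column-major order: element [j, k] of an nlon x nlat array sits at position (k-1)*nlon + j of the equivalent vector. A minimal check with a made-up 3 x 2 matrix:

```r
# column-major storage: m[j, k] equals as.vector(m)[(k-1)*nrow(m) + j]
m <- matrix(1:6, nrow=3, ncol=2)
v <- as.vector(m)
j <- 2; k <- 2
v[(k - 1) * nrow(m) + j]  # 5, the same as m[2, 2]
```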
gcdv3_mtco[is.na(gcdv3_mtco)] <- -99
pts <- data.frame(gcdv3$Lon, gcdv3$Lat, gcdv3_mtco)
names(pts) <- c("Lon", "Lat", "mtco")
head(pts, 20)
## Lon Lat mtco
## 1 -110.6161 44.66280 -11.4
## 2 -110.3467 44.91830 -10.7
## 3 -114.9867 45.70440 -6.3
## 4 -114.2619 45.89190 -6.4
## 5 -114.6503 46.32060 -5.8
## 6 -113.4403 45.84060 -6.6
## 7 -114.3592 48.16580 -6.3
## 8 -123.4600 42.02000 3.3
## 9 -122.5575 41.34670 -0.6
## 10 -122.4954 41.20750 0.5
## 11 -122.5778 41.38360 -0.6
## 12 -122.5092 41.19130 -0.6
## 13 -110.1700 44.28000 -11.2
## 14 -123.5792 45.82420 4.4
## 15 -123.9067 46.10060 4.9
## 16 -119.9767 33.95639 -99.0
## 17 -127.0500 65.21667 -27.9
## 18 -108.1500 37.41667 -4.6
## 19 -105.5000 37.55000 -6.9
## 20 -112.2167 36.71667 -2.7
Plot the extracted values of mtco
. To do this, the colors for plotting the different levels of mtco
are generated from an RColorBrewer
palette, and augmented by gray ("#AAAAAA"
) to handle missing values (e.g. from marine charcoal records, or from sites “off” the cru10min30 grid).
plotclr <- rev(brewer.pal(10,"RdBu"))
plotclr <- c("#AAAAAA", plotclr)
cutpts <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
color_class <- findInterval(gcdv3_mtco, cutpts)
plot(gcdv3$Lon, gcdv3$Lat, col=plotclr[color_class+1], pch=16)
A common problem that arises in dealing with spatial data is the “clipping” or “trimming” problem, in which one wants to know whether one or more points lie within the outline of a particular polygon or, in the case of multiple polygons, which polygon an individual point lies in. More formally, this is known as the “point-in-polygon” problem. Here the process of clipping the na10km_v2 climate grid to a tree-species shapefile, in particular that for Picea mariana (black spruce), is demonstrated. What we want is a list of climate data set points that lie within the range of P. mariana.
## rgeos version: 0.4-3, (SVN revision 595)
## GEOS runtime version: 3.6.1-CAPI-1.10.1
## Linking to sp version: 1.3-1
## Polygon checking: TRUE
Read the shapefile for Picea mariana
# read the shape file for Picea Mariana
shp_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/USGS_pp1650/"
shp_name <- "picemari.shp"
shp_file <- paste(shp_path, shp_name, sep="")
picea_shp <- readOGR(shp_file)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/bartlein/Projects/ESSD/data/shp_files/USGS_pp1650/picemari.shp", layer: "picemari"
## with 1585 features
## It has 5 fields
## Integer64 fields read as strings: PICEMARI_ PICEMARI_I
Add a coordinate reference system to the shapefile, and plot it:
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
Read the na10km_v2 points as a .csv file (for illustration; in practice it would be more efficient to read them from a netCDF file).
# read the na10km_v2 points (as a .csv file)
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "na10km_v2.csv"
csv_file <- paste(csv_path, csv_name, sep="")
na10km_v2 <- read.csv(csv_file)
str(na10km_v2)
## 'data.frame': 277910 obs. of 7 variables:
## $ x : num 2690000 2700000 2710000 2720000 2730000 2740000 2750000 2760000 2770000 2780000 ...
## $ y : num -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 ...
## $ lon : num -77.3 -77.2 -77.1 -77 -77 ...
## $ lat : num 5.12 5.1 5.08 5.05 5.03 ...
## $ mask : int 1 1 1 1 1 1 1 1 1 1 ...
## $ etopo1: int 67 61 67 26 43 56 30 57 134 652 ...
## $ srtm30: int 78 39 52 22 79 49 32 50 110 318 ...
For convenience in display (because the na10km_v2 grid includes Europe and wraps around the dateline), reduce it to only the points west of 45W:
Make a SpatialPointsDataFrame
of the na10km_v2 data-point locations, and plot it:
# make a SpatialPointsDataFrame
na10km_v2_coords <- cbind(na10km_v2$lon, na10km_v2$lat)
na10km_v2_pts <- SpatialPoints(coords=na10km_v2_coords, proj4string = CRS("+proj=longlat +datum=WGS84"))
class(na10km_v2_pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
Now overlay the points onto the polygons. Note that the input shapefile was a SpatialPolygonsDataFrame
object, so convert it to a simpler SpatialPolygons
object first. The over()
function from the sp
package takes a little while to run.
## [1] "SpatialPolygons"
## attr(,"package")
## [1] "sp"
## [1] "integer"
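Under the hood, over() relies on point-in-polygon tests. The classic even-odd (ray-casting) rule can be sketched in a few lines of base R; this is illustration only, using a made-up unit-square polygon, not the actual GEOS implementation:

```r
# even-odd (ray-casting) point-in-polygon test -- a minimal sketch
pip <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in 1:n) {
    # does a horizontal ray from the point cross edge (j, i)?
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]))
      inside <- !inside
    j <- i
  }
  inside
}
vx <- c(0, 1, 1, 0); vy <- c(0, 0, 1, 1)  # unit square
pip(0.5, 0.5, vx, vy)  # TRUE  -- inside
pip(1.5, 0.5, vx, vy)  # FALSE -- outside
```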
Make a dataframe from the “trimmed” points and plot it on top of the na10km_v2 points:
picea_pts2 <- data.frame(na10km_v2_coords[!is.na(picea_pts), ] )
plot(na10km_v2_pts, pch=16, cex=0.4)
points(picea_pts2, pch=16, cex=0.1, col="green")
Another common task is to grid or rasterize a set of point data, creating a gridded data set of counts, averages, minima, maxima, etc. This can be illustrated using the FPA-FOD daily fire-start data set: Spatial wildfire occurrence data for the United States, 1992-2013/Fire Program Analysis Fire-Occurrence Database [FPA_FOD_20150323] (3rd Edition) (Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27) – http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/. The idea here is to calculate the number and average area of the fires that occurred in 0.5-degree grid cells covering the conterminous U.S.
Read the fire-start data:
# read the FPA-FOD fire-start data
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "fpafod_1992-2013.csv"
csv_file <- paste(csv_path, csv_name, sep="")
fpafod <- read.csv(csv_file) # takes a while
str(fpafod)
## 'data.frame': 1727476 obs. of 14 variables:
## $ datasource : Factor w/ 1 level "fpafod": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 40 38.9 39 38.6 38.6 ...
## $ longitude : num -121 -120 -121 -120 -120 ...
## $ year : int 2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
## $ mon : int 2 5 5 6 6 6 7 3 3 7 ...
## $ day : int 2 12 31 28 28 30 1 8 15 1 ...
## $ daynum : int 33 133 152 180 180 182 183 67 74 183 ...
## $ area_ha : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
## $ cause_original: int 9 1 5 1 1 1 1 5 5 1 ...
## $ cause1 : int 2 1 2 1 1 1 1 2 2 1 ...
## $ cause2 : int 8 1 5 1 1 1 1 5 5 1 ...
## $ stateprov : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ agency : Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
Convert the fire-start data to a SpatialPointsDataFrame:
fpafod_coords <- cbind(fpafod$longitude, fpafod$latitude)
fpafod_pts <- SpatialPointsDataFrame(coords=fpafod_coords, data=data.frame(fpafod$area_ha))
names(fpafod_pts) <- "area_ha"
Create a couple of empty rasters to hold the rasterized data:
# create (empty) rasters
cell_size <- 0.5
lon_min <- -128.0; lon_max <- -65.0; lat_min <- 25.5; lat_max <- 50.5
ncols <- ((lon_max - lon_min)/cell_size)+1; nrows <- ((lat_max - lat_min)/cell_size)+1
us_fire_counts <- raster(nrows=nrows, ncols=ncols, xmn=lon_min, xmx=lon_max, ymn=lat_min, ymx=lat_max, res=0.5, crs="+proj=longlat +datum=WGS84")
us_fire_counts
## class : RasterLayer
## dimensions : 50, 126, 6300 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : -128, -65, 25.5, 50.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
us_fire_area <- raster(nrows=nrows, ncols=ncols, xmn=lon_min, xmx=lon_max, ymn=lat_min, ymx=lat_max, res=0.5, crs="+proj=longlat +datum=WGS84")
us_fire_area
## class : RasterLayer
## dimensions : 50, 126, 6300 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : -128, -65, 25.5, 50.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Now rasterize the data, first as counts (the number of fires in each grid cell), and then as the average size of the fires in each grid cell. Also plot the data (both on a log10 scale):
## class : RasterLayer
## dimensions : 50, 126, 6300 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : -128, -65, 25.5, 50.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 1, 8755 (min, max)
## class : RasterBrick
## dimensions : 50, 126, 6300, 2 (nrow, ncol, ncell, nlayers)
## resolution : 0.5, 0.5 (x, y)
## extent : -128, -65, 25.5, 50.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : ID, area_ha
## min values : 1.29922e+05, 4.04686e-03
## max values : 1677519.00, 18210.87
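The counts produced by rasterize() amount to a 2-d binning of the points. The same tallying can be sketched in base R with cut() and table(), using synthetic points over the same 0.5-degree grid (made-up data, for illustration only):

```r
# 2-d binning of points into 0.5-degree cells with cut() and table()
set.seed(42)
lon_pts <- runif(1000, -128, -65)
lat_pts <- runif(1000, 25.5, 50.5)
lon_breaks <- seq(-128, -65, by=0.5)   # 127 edges -> 126 cells
lat_breaks <- seq(25.5, 50.5, by=0.5)  # 51 edges -> 50 cells
counts <- table(cut(lon_pts, lon_breaks), cut(lat_pts, lat_breaks))
dim(counts)  # 126 50
sum(counts)  # 1000 -- every point falls in exactly one cell
```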
Write the two rasterized data sets out as variables in a netCDF data set. Create some variables, and replace the R NAs
with netCDF fill values:
# make necessary vectors and arrays
lon <- seq(lon_min+0.25, lon_max-0.25, by=cell_size)
lat <- seq(lat_max-0.25, lat_min+0.25, by=-1*cell_size)
print(c(length(lon), length(lat)))
## [1] 126 50
fillvalue <- 1e32
us_fire_counts2 <- t(as.matrix(us_fire_counts$layer, nrows=ncols, ncols=nrows))
dim(us_fire_counts2)
## [1] 126 50
us_fire_counts2[is.na(us_fire_counts2)] <- fillvalue
us_fire_area2 <- t(as.matrix(us_fire_area$area_ha, nrows=ncols, ncols=nrows))
dim(us_fire_area2)
## [1] 126 50
Write out a netCDF file:
# write out a netCDF file
library(ncdf4)
# path and file name, set dname
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "us_fires.nc"
ncfname <- paste(ncpath, ncname, sep="")
# create and write the netCDF file -- ncdf4 version
# define dimensions
londim <- ncdim_def("lon","degrees_east",as.double(lon))
latdim <- ncdim_def("lat","degrees_north",as.double(lat))
# define variables
dname <- "fpafod_counts"
dlname <- "Number of fires, 1992-2013"
v1_def <- ncvar_def(dname,"1",list(londim,latdim),fillvalue,dlname,prec="single")
dname <- "fpafod_mean_area"
dlname <- "Average Fire Size, 1992-2013"
v2_def <- ncvar_def(dname,"ha",list(londim,latdim),fillvalue,dlname,prec="single")
# create netCDF file and put arrays
ncout <- nc_create(ncfname,list(v1_def, v2_def),force_v4=TRUE)
# put variables
ncvar_put(ncout,v1_def,us_fire_counts2)
ncvar_put(ncout,v2_def,us_fire_area2)
# put additional attributes into dimension and data variables
ncatt_put(ncout,"lon","axis","X")
ncatt_put(ncout,"lat","axis","Y")
# add global attributes
ncatt_put(ncout,0,"title","FPA-FOD Fires")
ncatt_put(ncout,0,"institution","USFS")
ncatt_put(ncout,0,"source","http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/")
ncatt_put(ncout,0,"references", "Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27")
history <- paste("P.J. Bartlein", date(), sep=", ")
ncatt_put(ncout,0,"history",history)
ncatt_put(ncout,0,"Conventions","CF-1.6")
# Get a summary of the created file:
ncout
## File /Users/bartlein/Projects/ESSD/data/nc_files/us_fires.nc (NC_FORMAT_NETCDF4):
##
## 2 variables (excluding dimension variables):
## float fpafod_counts[lon,lat] (Contiguous storage)
## units: 1
## _FillValue: 1.00000003318135e+32
## long_name: Number of fires, 1992-2013
## float fpafod_mean_area[lon,lat] (Contiguous storage)
## units: ha
## _FillValue: 1.00000003318135e+32
## long_name: Average Fire Size, 1992-2013
##
## 2 dimensions:
## lon Size:126
## units: degrees_east
## long_name: lon
## axis: X
## lat Size:50
## units: degrees_north
## long_name: lat
## axis: Y
##
## 6 global attributes:
## title: FPA-FOD Fires
## institution: USFS
## source: http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/
## references: Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27
## history: P.J. Bartlein, Mon May 20 22:46:40 2019
## Conventions: CF-1.6
Another common task involves transferring values from one gridded data set to another. When the grids are identical, this is trivial, but when the “target” grid differs from the source or “control” grid, interpolation is required (as it also is when the target points are irregularly distributed). The most widely used method for interpolation within a control grid is bilinear interpolation, which involves finding the control grid points that surround a particular target point, and then simultaneously (linearly) interpolating in the x- and y-directions. The method is implemented in the raster
package, and a relatively fast version is implemented in the fields
package.
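The arithmetic of bilinear interpolation is simple enough to sketch directly in base R, for a single made-up cell with corner values z00 … z11 on the unit square (interp.surface() in fields does this, vectorized, for every target point at once):

```r
# bilinear interpolation within one grid cell (unit square, made-up corner values)
z00 <- 1; z10 <- 2; z01 <- 3; z11 <- 4
bilin <- function(x, y) {
  (1 - x) * (1 - y) * z00 + x * (1 - y) * z10 +
  (1 - x) * y * z01 + x * y * z11
}
bilin(0, 0)      # 1 -- reproduces the corner values exactly
bilin(0.5, 0.5)  # 2.5 -- the mean of the four corners
```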
The example here uses a lower-resolution version of the ETOPO1
global DEM as the source file, and the locations of the (land-only) points in the na10km_v2 grid as the targets. These are read in here from a .csv file, but they also could have come from a netCDF file.
Load the appropriate packages.
Read the etopo1 netCDF file. This particular file is one in which the original data have been aggregated (by averaging) to six minutes, or one-tenth of a degree (to speed up the execution of the examples). In practice, one would work with the original higher-resolution data. Do some setup, open the file, and list its contents.
# set path and filename
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "etopo1_ig_06min.nc"
ncfname <- paste(ncpath, ncname, sep="")
dname <- "elev"
## File /Users/bartlein/Projects/ESSD/data/nc_files/etopo1_ig_06min.nc (NC_FORMAT_CLASSIC):
##
## 1 variables (excluding dimension variables):
## short elev[lon,lat]
## long_name: elevation
## units: m
## valid_min: -10360
## valid_max: 6365
## scale_factor: 1
## add_offset: 0
## _FillValue: -32768
## source: z in etopo1_ig.nc -- averaged
##
## 2 dimensions:
## lon Size:3600
## standard_name: longitude
## long_name: longitude
## units: degrees_east
## axis: X
## lat Size:1800
## standard_name: latitude
## long_name: latitude
## units: degrees_north
## axis: Y
##
## 7 global attributes:
## title: 1-Minute Gridded Global Relief Data (ETOPO1)
## data: grid-registered data -- ETOPO1_Ice_g.grd
## institution: http://www.ngdc.noaa.gov/ngdc.html
## source: http://www.ngdc.noaa.gov/mgg/global/global.html
## history: This file created by P.J. Bartlein, 28 Nov 2008
## comment: average over 7x7 1-min cells
## Conventions: CF-1.0
Get the “control” longitudes and latitudes.
## [1] -179.95 -179.85 -179.75 -179.65 -179.55 -179.45
## [1] -89.95 -89.85 -89.75 -89.65 -89.55 -89.45
## [1] 3600 1800
Now read the elevations, and various attributes:
# get elevations
etopo1_array <- ncvar_get(ncin,dname)
dlname <- ncatt_get(ncin,dname,"long_name")
dunits <- ncatt_get(ncin,dname,"units")
fillvalue <- ncatt_get(ncin,dname,"_FillValue")
dsource <- ncatt_get(ncin, "elev", "source")
dim(etopo1_array)
## [1] 3600 1800
# get global attributes
title <- ncatt_get(ncin,0,"title")
institution <- ncatt_get(ncin,0,"institution")
datasource <- ncatt_get(ncin,0,"source")
history <- ncatt_get(ncin,0,"history")
Conventions <- ncatt_get(ncin,0,"Conventions")
Close the netCDF file using the nc_close() function (i.e. nc_close(ncin)).
Produce a quick map to check that the data make sense. (Always do this!)
# levelplot of elevations
grid <- expand.grid(lon=lon, lat=lat)
cutpts <- c(-7000, -6000, -4000, -2000, 0, 500, 1000, 1500, 2000, 3000, 4000, 5000)
levelplot(etopo1_array ~ lon * lat, data=grid, at=cutpts, cuts=11, pretty=T,
col.regions=topo.colors(12))
Open and read the .csv file containing the “target” points.
# read na10km_v2 grid-point locations -- land-points only
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "na10km_v2_pts.csv"
csv_file <- paste(csv_path, csv_name, sep="")
na10km_v2 <- read.csv(csv_file)
str(na10km_v2)
## 'data.frame': 277910 obs. of 5 variables:
## $ x : num 2690000 2700000 2710000 2720000 2730000 2740000 2750000 2760000 2770000 2780000 ...
## $ y : num -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 ...
## $ lon : num -77.3 -77.2 -77.1 -77 -77 ...
## $ lat : num 5.12 5.1 5.08 5.05 5.03 ...
## $ mask: int 1 1 1 1 1 1 1 1 1 1 ...
Get the number of target points:
## [1] 277910
Set up to do the interpolation. Generate x’s and y’s for the target grid. In practice, these might be read in from a netCDF file. Here we need them to define an array that will hold the interpolated values.
# set dimensions of output array, and define "marginal" x- and y-values
nx <- 1078; ny <- 900
x <- seq(-5770000, 5000000, by=10000)
y <- seq(-4510000, 4480000, by=10000)
# define a vector and array that will contain the interpolated values
interp_var <- rep(NA, ntarg)
interp_mat <- array(NA, dim=c(nx, ny))
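It is worth verifying that the marginal vectors actually match the declared dimensions before filling the array; seq() makes the count easy to get wrong by one:

```r
# sanity check: marginal x- and y-vectors must match nx and ny
x <- seq(-5770000, 5000000, by=10000)
y <- seq(-4510000, 4480000, by=10000)
length(x)  # 1078 -- must equal nx
length(y)  # 900  -- must equal ny
```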
Get the array subscripts for each point. This is similar to the work done in reshaping a “short” data frame to an array, as in Week 4.
# get array subscripts for each target point
j <- sapply(na10km_v2$x, function(c) which.min(abs(x-c)))
k <- sapply(na10km_v2$y, function(c) which.min(abs(y-c)))
head(cbind(j,k,na10km_v2$lon,na10km_v2$lat))
## j k
## [1,] 847 1 -77.29202 5.124395
## [2,] 848 1 -77.20858 5.099891
## [3,] 849 1 -77.12515 5.075297
## [4,] 850 1 -77.04173 5.050615
## [5,] 851 1 -76.95832 5.025843
## [6,] 852 1 -76.87492 5.000983
## j k
## [277905,] 1073 900 4.212445 47.09974
## [277906,] 1074 900 4.238897 47.00791
## [277907,] 1075 900 4.265386 46.91605
## [277908,] 1076 900 4.291913 46.82415
## [277909,] 1077 900 4.318477 46.73222
## [277910,] 1078 900 4.345078 46.64025
Now do bilinear interpolation using the fields
package:
# bilinear interpolation from fields package
control_dat <- list(x=lon, y=lat, z=etopo1_array)
interp_var <- interp.surface(control_dat, cbind(na10km_v2$lon,na10km_v2$lat))
# put interpolated values into a 2-d array
interp_mat[cbind(j,k)] <- interp_var[1:ntarg]
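Assigning interp_mat[cbind(j,k)] <- … uses R's matrix-subscript indexing: a two-column matrix of (row, column) pairs addresses one element per row of the subscript matrix. A small base-R check:

```r
# indexing a matrix with a two-column (row, column) subscript matrix
m <- matrix(0, nrow=4, ncol=3)
j <- c(1, 2, 4); k <- c(1, 3, 2)
m[cbind(j, k)] <- c(10, 20, 30)
m[2, 3]  # 20 -- each value landed at its (j, k) cell
```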
Get a quick map:
grid <- expand.grid(x=x, y=y)
cutpts <- c(-7000, -6000, -4000, -2000, 0, 500, 1000, 1500, 2000, 3000, 4000, 5000)
levelplot(interp_mat ~ x * y, data=grid, at=cutpts, cuts=11, pretty=T,
col.regions=topo.colors(12))
At this point, interp_mat
could be written out as a variable in a netCDF file (along with dimension and attribute data). Also make a data frame of the interpolated data, which could be written out as a .csv file:
# make a dataframe of the interpolated elevations
out_df <- data.frame(cbind(na10km_v2$x, na10km_v2$y, na10km_v2$lon, na10km_v2$lat, interp_var))
names(out_df) <- c("x", "y", "lon", "lat", "elev")
head(out_df); tail(out_df)
## x y lon lat elev
## 1 2690000 -4510000 -77.29202 5.124395 64.91837
## 2 2700000 -4510000 -77.20858 5.099891 87.74857
## 3 2710000 -4510000 -77.12515 5.075297 69.43258
## 4 2720000 -4510000 -77.04173 5.050615 52.84844
## 5 2730000 -4510000 -76.95832 5.025843 54.49635
## 6 2740000 -4510000 -76.87492 5.000983 54.86496
## x y lon lat elev
## 277905 4950000 4480000 4.212445 47.09974 471.2087
## 277906 4960000 4480000 4.238897 47.00791 383.3458
## 277907 4970000 4480000 4.265386 46.91605 381.3808
## 277908 4980000 4480000 4.291913 46.82415 404.7478
## 277909 4990000 4480000 4.318477 46.73222 345.8679
## 277910 5000000 4480000 4.345078 46.64025 313.1289