The aim of this course is to review some of the developments in the analysis and visualization of Earth-System Science (ESS) data using the R language and data-analysis environment. ESS data sets are generally large (in terms of both the number of attributes and the number of data points), and are therefore frequently used as examples of “big data”. They are often well organized, in the sense of being represented as raster “slices” or “bricks” with dimensions like longitude, latitude and time, but can also be instances of traditional “rectangular” data sets, where the rows represent individual locations and the columns represent variables or attributes.
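The two layouts described above can be illustrated with a small base-R sketch. The grid, the variable names (`brick`, `rect`) and the random values are all made up for illustration; a real ESS data set would be far larger and would be read from a file rather than generated.

```r
# Hypothetical coordinates for a tiny grid (assumed values, not real data)
lon  <- c(-120, -110, -100, -90)
lat  <- c(30, 40, 50)
time <- c(1, 2)

# A "brick": a 3-D array indexed by longitude, latitude and time
set.seed(1)
brick <- array(rnorm(length(lon) * length(lat) * length(time)),
               dim = c(length(lon), length(lat), length(time)))

# A "slice": all locations at the first time step
slice1 <- brick[, , 1]

# The same values as a "rectangular" data set:
# one row per location, one column per time step.
# expand.grid() varies lon fastest, which matches R's column-major array order.
grid   <- expand.grid(lon = lon, lat = lat)
values <- matrix(brick, nrow = length(lon) * length(lat), ncol = length(time))
colnames(values) <- paste0("t", time)
rect <- data.frame(grid, values)

head(rect)
```

The reshaping relies only on the storage order of the array, so no values are copied out of place: row `i` of `rect` holds the time series for the `i`-th grid cell.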
Concurrently with the development of such large data sets, the tools for analyzing them have proliferated. These tools include special-purpose packages designed for particular kinds of data (like NCL for analyzing and mapping weather and climate data), general-purpose languages and environments like Matlab, Python and R, and traditional programming languages like Fortran or C. Among these options, R easily has the best-developed set of data-analytical, visualization, and statistical tools (with thousands of individual packages available), and also has the necessary tools for reading and writing ESS data in their various “native” forms.
The goal of this course is to describe the nature of ESS data and data-set formats, the tools for reading and writing such data, and the procedures for visualizing and analyzing the data. In addition, the general ideas of “reproducible research” will be discussed and put to use in developing individual projects that explore some particular data sets.
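As a rough illustration of the reading-and-writing step, the following sketch writes a small synthetic array to a netCDF file with the `ncdf4` package and reads it back. The variable name `tmp`, the units, the grid spacing and the random values are assumptions made for the example; they do not describe any particular data set used in the course.

```r
library(ncdf4)

# Hypothetical 10-degree grid and 12 monthly time steps (assumed values)
lon  <- seq(-175, 175, by = 10)
lat  <- seq(-85, 85, by = 10)
time <- as.numeric(1:12)

# Synthetic "temperature" values, for illustration only
set.seed(42)
tmp_array <- array(runif(length(lon) * length(lat) * length(time), -30, 30),
                   dim = c(length(lon), length(lat), length(time)))

# Define the dimensions and the variable
londim  <- ncdim_def("lon", "degrees_east", lon)
latdim  <- ncdim_def("lat", "degrees_north", lat)
timedim <- ncdim_def("time", "months since 2000-01-01", time)
tmp_def <- ncvar_def("tmp", "degC", list(londim, latdim, timedim),
                     missval = -9999, longname = "synthetic air temperature",
                     prec = "float")

# Write the file to a temporary location
nc_path <- tempfile(fileext = ".nc")
ncout <- nc_create(nc_path, tmp_def)
ncvar_put(ncout, tmp_def, tmp_array)
nc_close(ncout)

# Read the data and some metadata back in
ncin      <- nc_open(nc_path)
tmp_in    <- ncvar_get(ncin, "tmp")
lon_in    <- ncvar_get(ncin, "lon")
tmp_units <- ncatt_get(ncin, "tmp", "units")$value
nc_close(ncin)

dim(tmp_in)
```

Because the variable is stored with `prec = "float"`, the values read back agree with the originals only to single precision.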
The specific topics that will be examined include:
… (ncdf4 and raster)
These topics implement and document a particular data-analysis “design pattern” that involves
… (ncdf4, rhdf5, raster, or an ODBC relational database package)

| Week | Topic | Tasks |
|---|---|---|
| 1 | Introduction and infrastructure | Install R, RStudio, GitHub account, etc. |
| 2 | R for data visualization and analysis | Simple data analyses with R |
| 3 | Earth-system science data | Markdown authoring |
| 4 | Data input and output (ncdf4 and raster) | Project data-set selection |
| 5 | Geospatial analyses and mapping in R | |
| 6 | R Markdown and a project web site | |
| 7 | Visualization | Project progress report |
| 8 | Visualization of high-dimensional data | |
| 9 | Multivariate analyses | |
| 10 | Other R packages and project discussion | Project presentation and discussion |
Setting up an effective and efficient environment for data analysis (i.e. a “tool chain”) can be as much of a time-waster as a time-saver. We will describe and use a basic set of tools, including:
… (knitr, rmarkdown)
Student effort in the course will involve
Some examples of analyses and documentation follow:
Here’s an example of an R Markdown HTML page describing the analysis of the Global Charcoal Database (GCD):
and a simpler, one-HTML-document description of a particular analysis comparing two approaches for curve-fitting can be found at:
Here’s a link to a web page describing the development of a daily fire-start data set for the U.S. and Canada:
This is a link to a “supplemental file” accompanying an article on biomass-burning contribution to climate-carbon cycle feedback, created as a Word document (and converted to a .pdf):
… and here’s a link to the article:
Also, this web page, as well as that for GEOG 4/595 Geographic Data Analysis, provides an example of web pages created using R, RStudio and R Markdown:
As to the two elements of participating in collaborative research, the first bullet (asking questions) is the more important of the two.