NOTE:  This page has been revised for the 2024 version of the course, but there may be some additional edits.  

1 Course objectives

The aim of this course is to review some of the developments in the visualization and analysis of Earth-System Science (ESS) data using the R language and data-analysis environment. ESS data sets are generally large (in terms of both the number of attributes or variables and number of data points), and are therefore frequently used as examples of “big data”. They are often well organized, in the sense of being represented as raster “slices” or “bricks” with dimensions like longitude, latitude and time, but can also be instances of traditional “rectangular” data sets, where the rows represent individual locations and the columns variables or attributes.

Concurrently with the development of such large data sets, the tools for analyzing them have proliferated. These tools include special-purpose packages designed for visualizing particular data sets (like NCL, the NCAR Command Language, for analyzing and mapping weather and climate data), individual programming languages and environments like Matlab, Python and R, as well as traditional programming languages like Fortran or C. Among these options, R easily has the best-developed set of data-analytical, visualization, and statistical tools (with thousands of individual packages available), and also has the necessary tools for reading and writing ESS data in their various “native” forms. R also has extensive geospatial analysis “built in”.

The goal of this course is to describe the nature of the ESS data and data-set formats, the tools for reading and writing such data, and the procedures for visualizing and analyzing the data. In addition, the general ideas of “reproducible research” will be discussed and put to use in developing individual projects that explore some particular data sets.

2 Topics covered

The specific topics that will be examined include:

These topics implement and document a particular data-analysis “design pattern” that involves

  1. data input (using, for example, ncdf4, rhdf5, raster, terra, or an ODBC relational database package)
  2. recasting the raster brick input data into a rectangular data frame
  3. analysis and visualization
  4. recasting a “results” data frame back to a raster
  5. data output, using the same packages as in step 1
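The recasting steps in this pattern can be sketched in base R. This is a minimal illustration, not the course's own code: the data are simulated rather than read from a file (in practice step 1 would use a package such as ncdf4 or terra), step 5 is omitted, and all object names (`brick`, `cells`, `mean_slice`, and so on) are hypothetical.

```r
# Step 1 (stand-in for data input): simulate a small raster "brick"
# with dimensions longitude x latitude x time. These values are made up;
# a real brick would be read from a file.
set.seed(1)
nlon <- 4; nlat <- 3; nt <- 12
lon <- seq(-125, -110, length.out = nlon)
lat <- seq(40, 50, length.out = nlat)
brick <- array(rnorm(nlon * nlat * nt, mean = 10, sd = 2),
               dim = c(nlon, nlat, nt))

# Step 2: recast the brick into a rectangular data frame, with one row
# per grid cell and one column per time step. R stores arrays with the
# first index (longitude) varying fastest, which matches the row order
# produced by expand.grid().
cells <- expand.grid(lon = lon, lat = lat)
values <- matrix(as.vector(brick), nrow = nlon * nlat, ncol = nt)
colnames(values) <- paste0("t", seq_len(nt))
df <- data.frame(cells, values)

# Step 3: analysis, here simply the mean over time for each grid cell
df$mean_t <- rowMeans(df[, colnames(values)])

# Step 4: recast the "results" column back to a lon x lat raster slice
mean_slice <- matrix(df$mean_t, nrow = nlon, ncol = nlat,
                     dimnames = list(lon = as.character(lon),
                                     lat = as.character(lat)))

# Check: the round trip agrees with computing directly on the brick
round_trip_ok <- isTRUE(all.equal(unname(mean_slice),
                                  apply(brick, c(1, 2), mean)))
print(round_trip_ok)
```

The point of the round trip is that most of R's statistical functions expect rectangular input, so the analysis in step 3 can use them directly, while the result in step 4 is back in a shape that can be mapped or written out.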

3 Schedule

Topic/Lecture                                      Tasks
   1:  Introduction and infrastructure               Install R, RStudio 
   2:  Using R for data visualization and analysis   Using R exercise
   3:  Earth-system science data                                     
   4:  netCDF in R                                   Install netCDF and Panoply      
   5:  terra and netCDF in R                         File transfer  
   6:  Plots (1)                                     Text editor  
   7:  Plots (2)                                     Project dataset selection
   8:  Maps (1)                                      Markdown and RMarkdown
   9:  Maps (2)                                      Pandoc  
  10:  Geospatial analysis in R                      GitHub and UO pages.uoregon.edu  
  11:  Correlation and regression                    Local web pages
  12:  Other predictive models  
  13:  Principal components analysis  
  14:  Singular value decomposition  
  15:  High-resolution and high-dimension data  
  16:  Multivariate methods  
  17:  Time-series analysis  
  18:  Other languages  
  19:  Project presentations  
  20:  Project presentations  
Other Topics  
       sf, terra, and stars
       HDF in R

Setting up an effective and efficient environment for data analysis (i.e. a “tool chain”) can be as much of a time-waster as a time-saver. We will describe and use a basic set of tools, including:

4 Project and Tasks

Student effort in the course will involve:

The tasks should not require major effort, and aren’t really gradable (except for doneness). The project will require more effort, ranging from a project that involves getting data, doing some basic visualizations, and “publishing” the results (text and images) on a simple web page, to something more elaborate, involving some advanced statistical analyses, and a more involved publication such as the course web page, or some other RMarkdown product.

It will be up to individual students how far or how deeply they want to go. A general principle that’s worth following here is that a simpler story told well is better than a complicated story told in a half-assed fashion. Because the end product will be publicly available, it can readily contribute to a portfolio, and experience shows that including a URL to a nice-looking product on a resume or job application letter pays off.

Collaboration would be fine, with the development of an agreed-upon “Author Responsibility” document ahead of time.

To summarize, the project will involve:

5 Grading

As will become apparent (and see the examples below), there are various “levels” of Projects that can be attempted, ranging from a simple Markdown “Notebook” hosted on pages.uoregon.edu to a multi-page Markdown web site with an accompanying GitHub code repository and web site. A complete and tidy project at any level should be the goal to get “full marks” (e.g. an A- or B-level grade for the course). However, there is a difference in the minimum effort between the course levels:

GEOG 490: Completion of the Tasks in a timely fashion, and a Project consisting of an RMarkdown Notebook or web page hosted on pages.uoregon.edu.

GEOG 590: Completion of the Tasks in a timely fashion, and a Project consisting of an RMarkdown web page or web site, accompanied by a GitHub repository, hosted either on pages.uoregon.edu or GitHub.

6 Examples

Some examples of analyses and documentation follow:

Here are two examples of simple RMarkdown products:

Here’s an example of an R Markdown HTML page describing the analysis of the Global Charcoal Database (GCD):

and a simpler, one-HTML-document description of a particular analysis comparing two approaches for curve-fitting can be found at:

Here’s a link to a web page describing the development of a daily fire-start data set for the U.S. and Canada:

This is a link to a “supplemental file” accompanying an article on biomass-burning contribution to climate-carbon cycle feedback, created as a Word document (and converted to a .pdf):

… and here’s a link to the article:

Here are some links to a dissertation chapter published by a former student in this course, Adriana Uscanga, that illustrates the application of R for analyzing a geospatial data set, and shows another typical example of a publication “package”, including:

Publication packages like this, which combine a traditional paper with code and data, are now the norm in scientific publication, because they allow a reader to reproduce the results presented in the paper. This encourages collaboration, accelerates the pace of research, and contributes to overall “quality assurance”.

Also, this web page, as well as that for GEOG 4/595 Geographic Data Analysis, provides examples of web pages created using R, RStudio and RMarkdown:

7 Getting help

One thing that will become immediately apparent is that R produces cryptic error messages. Because it’s hardly ever the case that you’ll be typing new code into an empty document, many errors arise from simple editing mistakes or typos. Simply Googling the error message will usually resolve an issue quickly, and often will take you to one of the following links:

ChatGPT is another option for getting some ideas on error messages, and for getting simple code fragments, but it can easily get off track (Garbage In -> Garbage Out).
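As a small illustration of the kind of error a typo produces, the sketch below misspells an object name on purpose and captures the resulting message as text, which is the string one would paste into a search. The object names (`temps_c`, `temp_c`, `msg`) are made up for this example, and the exact wording of the message depends on the R version and language settings.

```r
# A vector of (made-up) temperatures
temps_c <- c(12.1, 13.4, 11.8)

# Deliberate typo: 'temp_c' instead of 'temps_c'. tryCatch() intercepts
# the error so the script keeps running, and conditionMessage() extracts
# the message text from the error object.
msg <- tryCatch(
  mean(temp_c),
  error = function(e) conditionMessage(e)
)

# The message names the object R could not find, which usually points
# straight at the typo
print(msg)
```

In interactive use the same message simply appears in the console; wrapping the call in `tryCatch()` is only needed here to show the message without stopping the script.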

8 Other stuff

Text: No textbook; .pdfs and URLs will be posted on the course web page. See the Other > Readings tab.

Web pages:

Grading: See above.

Incompletes: There is a new, less flexible policy for Incompletes: [https://provost.uoregon.edu/grades-incompletes-policy]. Do not use the possibility of an “automatic” incomplete as a time-management tool at the end of the term.

Attendance (Student): You really should attend class. The last half hour of class each day (3:20 - 3:50pm) will be devoted to debugging and other non-lecture-type activities.

Attendance (Instructor): UO has announced a pilot “Floating Holiday” for employees (not students, Ah-ha-ha–sorry) [https://hr.uoregon.edu/floating-holiday-pilot-2023-24]. I would generally claim the day of Perihelion for its climatic and Earth-system science significance, but Perihelion occurred before classes started this year (Jan 2nd). Maybe Imbolc or St. Brigid’s day–I’ll let you know.

Communicating with instructors: Pat Bartlein, OH at 2-3pm Wednesdays (via Zoom), or by email. GE: by email.

Technical Requirements: You will need access to the internet, to UO’s Canvas site, and to a browser at a minimum, and ideally a personal computer you have administrative rights on. If you have questions about accessing and using Canvas, visit the Canvas Support for Students page. Canvas and Technology Support is also available by phone or live chat, Monday-Sunday, 6 a.m. to 12 a.m. (541-346-4357 or livehelp.uoregon.edu). If you face Internet access challenges, computer labs are open for students on campus. To learn more about options, visit Information Services’ Internet Access Resources.

Other topics, including classroom behavior: Computers are welcome, for note taking, viewing course-related material, and trying out code. Using a computer, tablet, or phone during class for anything else would be unprofessional.

Standard University Policies and Sources for Help:

The support provided by the following may be useful:

Covid Containment Plan: [https://provost.uoregon.edu/covid-containment-plan-classes]