NOTE: This page has been revised for Winter 2021, but may undergo further edits.

**Geography 4/595: Geographic Data Analysis**

**Winter 2021**

**Exercise 5: Some data wrangling and matrix algebra**

**Finish by Friday, February 12**

**1. Introduction**

The first aim of this exercise is to illustrate the idea of “data wrangling”, the reshaping or restructuring of input data into the “tidy” form (a rectangular data set) with variables in columns and observations or cases in rows. The second part of the exercise consists of a few examples that illustrate the features and application of matrix algebra.

**2. Data and packages**

The concept of data wrangling, or the reshaping of a non-rectangular data set into a rectangular one, can be illustrated using a small sample of monthly climate data for Eugene. These data are not currently part of the `geog495.RData` workspace file (because they may be read in different ways, as data frames or “tibbles”), but they can be downloaded here:

- EugeneClim-short.csv – Three years (2013-2015) of monthly climate data for Eugene;
- EugeneClim-short-alt-tvars.csv – Temperature data for those three years in an alternative format that is not tidy (i.e. columns are months (or observations) and rows are variables); and
- EugeneClim-short-alt-pvars.csv – Precipitation data, also in an alternative format.

The full data set can be downloaded from here: EugeneClim.csv

(Download these to your current working directory, which can be found using `getwd()`.)

Also install the `tidyverse` package, which in turn installs a number of individual packages that are used in reshaping data.

```
# install the "tidyverse" suite of packages
install.packages("tidyverse")
# library
library(tidyverse)
```

Read in a typical “tidy” `.csv` file that has variables in columns and observations in rows. This can be done in the usual way using the `read.csv()` function, which creates a standard data frame, or, if the `readr` package has been loaded by `library(tidyverse)`, using the `read_csv()` function, which creates a “tibble”. The data here are a “short” three-year-long subset of Eugene monthly climate data.

```
# read a .csv file using the `readr` package
csv_file <- "/Users/bartlein/Documents/geog495/data/csv/EugeneClim-short.csv"
eugclim <- read_csv(csv_file)
eugclim
```

(In the above, you would substitute the path to your working directory.)
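The difference between the two readers can be seen on a tiny file. The following is a minimal sketch, assuming the `readr` package is installed; the temporary file and its values are made up for illustration and are not the course data.

```r
# illustrative only: a small made-up CSV written to a temporary file
library(readr)

tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("year,mon,tavg",
             "2013,1,35.1",
             "2013,2,42.3",
             "2013,3,47.8"), tmp_csv)

# base R: returns a standard data frame
df <- read.csv(tmp_csv)
class(df)

# readr: returns a tibble (which is also a data frame)
tb <- suppressMessages(read_csv(tmp_csv))
class(tb)

# both objects hold the same values
all.equal(df$tavg, tb$tavg)
```

Printing `df` and `tb` at the console shows the practical difference: the tibble reports its dimensions and column types along with the data.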

Produce a few plots to look at the time series of individual variables, and to look at the annual cycle of each. The first use of `plot()` below plots the time series of monthly average temperature (`tavg`), while the second illustrates what the annual cycle looks like.

```
# time-series plot
plot(eugclim$tavg ~ eugclim$yrmn, type="o", pch=16, xaxp=c(2013, 2016, 3))
```

```
# by month
plot(eugclim$mon, eugclim$tavg, pch=16, xaxp=c(1, 12, 1))
```

Repeat the plots for some other variables, in particular `prcp` (monthly total precipitation).

(See https://pjbartlein.github.io/GeogDataAnalysis/lec08.html#variables for a listing of variables.)

Q1. Describe the annual cycles of the temperature and moisture-related variables. When during the year is it colder and when is it warmer, and when is it wetter and when is it drier?

**3. Transforming (reshaping) an alternatively shaped table of data**

An alternative layout for the data table (of just the precipitation-related variables) has the data arranged with variables in rows and months in columns. Read those data in:

```
# alternative layout of precipitation data
csv_file <- "/Users/bartlein/Documents/geog495/data/csv/EugeneClim-short-alt-pvars.csv"
eugclim_alt <- read_csv(csv_file)
eugclim_alt
```

Q2: Describe the different forms of the two tables (`eugclim` and `eugclim_alt`). Can you think of a way to produce a time-series plot of precipitation using the data in `eugclim_alt`? (If so, show the code for doing that, and if not, why not?)

Now use the `gather()` and `spread()` functions from the `tidyr` package to reshape the data. This is done here in two steps:

```
# reshape by gathering and spreading
# 1) gather
eugclim_alt2 <- gather(eugclim_alt, `1`:`12`, key="month", value="cases")
eugclim_alt2$month <- as.integer(eugclim_alt2$month)
eugclim_alt2
```

```
# 2) spread
eugclim_alt3 <- spread(eugclim_alt2, key="param", value=cases)
eugclim_alt3
```
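In recent versions of `tidyr`, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, although the older functions still work. The same two-step reshaping can be sketched with the newer functions as follows. This is a minimal illustration on a made-up three-month table (the `toy_*` names, the `snow` variable, and all values are hypothetical), not the course data.

```r
library(tidyr)

# made-up "untidy" table: variables in rows, months in columns
toy_alt <- tibble::tibble(
  year  = c(2013, 2013),
  param = c("prcp", "snow"),
  `1`   = c(2.5, 0.0),
  `2`   = c(1.8, 0.4),
  `3`   = c(3.1, 0.0)
)

# 1) move the month columns into rows (the role played by gather())
toy_long <- pivot_longer(toy_alt, cols = `1`:`3`,
                         names_to = "month", values_to = "cases")
toy_long$month <- as.integer(toy_long$month)

# 2) move the variables from rows into columns (the role played by spread())
toy_tidy <- pivot_wider(toy_long, names_from = param, values_from = cases)
toy_tidy
```

Comparing `toy_alt`, `toy_long`, and `toy_tidy` at the console shows how the number of rows and columns changes at each step.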

Plot the reshaped data (`eugclim_alt3`) to verify that they have indeed been reshaped correctly.

```
# plot the reshaped data
eugclim_alt3$yrmn <- eugclim_alt3$year + (as.integer(eugclim_alt3$month)-1)/12
plot(eugclim_alt3$prcp ~ eugclim_alt3$yrmn, type="o", pch=16, col="blue", xaxp=c(2013, 2016, 3))
```
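The `yrmn` variable computed in the plotting code is a decimal year: the year plus the fraction of the year that has elapsed at the start of each month. A small sketch with made-up year and month values shows the arithmetic.

```r
# made-up years and months, for illustration only
year  <- c(2013, 2013, 2014)
month <- c(1, 7, 12)

# decimal year: January maps to the start of the year, July to mid-year
yrmn <- year + (month - 1) / 12
yrmn
```

Because the values increase steadily from month to month, they can serve as the time axis of a time-series plot.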

Q3: Compare `eugclim_alt2` and `eugclim_alt3`. What did the application of `gather()` do in creating `eugclim_alt2`, and what did the application of `spread()` do in creating `eugclim_alt3`?

Q4: What is the benefit of reshaping the data in R as opposed to simply doing that in Excel or a text editor?

**4. A little matrix algebra**

Create three matrices, **A**, **B**, and **C**:

```
# create three matrices
# default fill method: byrow = FALSE
A <- matrix(c(6, 9, 12, 13, 21, 5), nrow=3, ncol=2)
A
```

```
# same elements, but byrow = TRUE
B <- matrix(c(6, 9, 12, 13, 21, 5), nrow=3, ncol=2, byrow=TRUE)
B
```

```
# a third matrix
C <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3)
C
```

Q5: Describe the shapes of the three matrices. (Note that the `dim()` function applied to a matrix (e.g. `dim(A)`) displays the number of rows and the number of columns in the matrix.)

*Matrix addition*

Add **A** and **B**:

```
# matrix addition
F <- A + B
F
```

Now try adding **A** and **C**:

`G <- A + C`

Q6: What happened? Can **A** and **C** be added? Why not? (Again, the `dim()` function might be useful.)

*Matrix multiplication*

Matrix multiplication (as distinct from element-by-element multiplication) produces a new matrix whose elements are sums of squares and cross products of the elements of matrices being multiplied (see matrix.pdf). Matrix multiplication uses `%*%` as the operator.
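How an individual element of a matrix product arises can be sketched with two small made-up matrices (the names `X`, `Y`, and `P` are hypothetical and are not used elsewhere in the exercise).

```r
# two small made-up matrices: X is 2 x 3, Y is 3 x 2
X <- matrix(1:6, nrow = 2, ncol = 3)
Y <- matrix(c(1, 0, 2, 1, 3, 0), nrow = 3, ncol = 2)

# matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
P <- X %*% Y
dim(P)

# element [1, 2] is the sum of cross products of row 1 of X and column 2 of Y
sum(X[1, ] * Y[, 2])
P[1, 2]

# element-by-element multiplication uses `*` and needs identical shapes
X * X
```

The product is defined here because the number of columns of `X` equals the number of rows of `Y`.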

“Postmultiply” the matrix **C** by **A**:

```
# matrix multiplication
Q <- C %*% A
Q
```

… and try to postmultiply **A** by **B** (e.g. `T <- A %*% B`).

Q7: What happens here? What are the dimensions of **C**? What does the message `non-conformable arguments` imply about the shapes of **A** and **B**?

*Matrix inversion*

To illustrate matrix inversion (i.e. the matrix algebra version of scalar division), a realistic matrix can be used, in this case the correlation matrix of the temperature variables in the `orstationc` data set:

```
# a realistic matrix, orstationc temperature-variable correlation matrix
R <- cor(cbind(orstationc$tjan, orstationc$tjul, orstationc$tann))
R
```

Get the inverse of **R**:

```
# matrix inversion
Rinv <- solve(R)
Rinv
```

One property of the inverse matrix is that when it is pre- or postmultiplied by the original matrix, the identity matrix, **I**, should be produced.
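This property can be sketched on a small made-up matrix (the names `M` and `Minv` are hypothetical, and the matrix is not the `orstationc` correlation matrix). Because the inverse is computed numerically, the products may differ from the identity matrix by rounding error, so a comparison with a tolerance is one reasonable way to check.

```r
# a small made-up symmetric matrix, for illustration only
M <- matrix(c(1.0, 0.5,
              0.5, 1.0), nrow = 2, ncol = 2)
Minv <- solve(M)

# both products should be the 2 x 2 identity matrix, up to rounding error
M %*% Minv
Minv %*% M

# compare with diag(2) using a tolerance rather than exact equality
all.equal(M %*% Minv, diag(2))
```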

Q8: Check to see if `Rinv` is indeed the inverse of `R`. (Show the results of the check.)