Lab 4: Time Series Analysis in R

Introduction

In Lab 3, you learned how to use GIS data to extract values from remote sensing images in R. You also learned how to use functions to complete repetitive tasks more efficiently. In this lab, you will apply these skills to perform a time series analysis of remote sensing images. Time series analysis examines how the value of a remote sensing product changes over time at a given location. It is widely used to monitor ecological disturbances, such as logging, wildfires, and drought. Time series analysis can reveal the magnitude of disturbance and how long it takes for an ecosystem to return to its pre-disturbance state.

Your Assignment:

  1. Open RStudio.
  2. As you read through this document, type each of the examples into the console of your RStudio window. Press enter to execute each line of code.
  3. Your console should produce the same outputs as the examples shown here. If it does not produce the same output, make sure that you typed the line of code correctly.
  4. Complete the tasks on your answer sheet when directed to do so.

Review

As you work through your lab assignments, it is important that you stop and think about each example and examine the syntax closely. By examining the syntax and typing the examples into the console, you are practicing your programming skills. This practice will benefit you when you are working on your creative project and you don’t have step-by-step instructions to guide you. Don’t hesitate to refer to previous labs if you forget how to complete a task or use a function. Function help pages are also valuable sources of information. Before we start working on a time series analysis, let’s take a few minutes to practice the skills that you have learned so far.

Vectorized Functions

First, I want to show you a trick that will help you complete Task #1. Sometimes you need to calculate several different values using the same function. For example, you might need to calculate the square roots of 4, 16, and 64. You could call the sqrt() function three different times with three different input values.

## [1] 2
## [1] 4
## [1] 8

But there’s a more efficient way to calculate these values. Many functions will accept a vector of input values and repeat the calculation for each value in the vector. The function will return a vector of output values, where the first value in the output vector corresponds with the first value in the input vector, and so on. These functions are called vectorized functions. They enable you to perform multiple calculations with one line of code. Take a look at the example below.

## [1] 2 4 8

Not all functions are vectorizable. Often, a function can be vectorized if it accepts a single input value and returns a single output value. When you can use them, vectorized functions will keep your code tidy and may speed up your analysis.

Complete task #1 on your answer sheet now.

Preparing Your Workspace

Any time you open RStudio, you need to prepare your workspace before you begin your analysis. This includes setting your working directory and loading any necessary libraries. If you’re using a Windows operating system, make sure that the slashes in the file path are facing the right direction. We’ll be using the raster library again in this assignment.

After you set your working directory and load any necessary libraries, it is a good idea to load all of the data that you will need for your analysis. In the assignment, we’ll be using five different Landsat scenes. The images are very similar to the image that you used in Labs 2 and 3, but they were collected on different dates.

Whenever you recieve a remote sensing data set, it is crucial to understand what pre-processing steps have already been applied to the data. The images that you will be using were collected by the Landsat Thematic Mapper sensor on the following dates:

  • June 21, 2007
  • July 25, 2008
  • July 12, 2009
  • July 15, 2010
  • July 18, 2011

Each image has six bands with a 30 meter spatial resolution, ordered from shortest to longest wavelength (blue to SWIR). The Thematic Mapper thermal band (sensor band 6) is not included in the files. The images were orthorectified and corrected to surface reflectance by the U.S. Geological Survey. The images have a scale factor of 10,000, so a value of 10,000 indicates 100% reflectance. I masked out all erroneous pixel values that might have been caused by reflectance retrieval errors or sensor saturation. As a result, you will notice that some pixels located over the ocean do not contain reflectance data. In order to reduce the file size, I cropped the images using a shapefile from the U.S. Census Bureau. The images cover the same extent as the image that you used in Labs 2 and 3. UCSB is located near the center of the images.

Let’s go ahead and load the images as RasterBricks. Let’s also rescale the images to a scale factor of 100 and give the images more convenient band names.

Let’s plot the 2007 Landsat scene to make sure that it loaded correctly.

That looks good. Let’s also load the two shapefiles as SpatialPolygonsDataFrames.

Your computer should have enough random access memory (RAM) to load all of these files at once. If you ever work with large image files, or if you work on a computer with limited memory, you might need to process your images one at a time so your computer doesn’t run out of memory.

You can see how much memory an object is using in the environment pane. In the top right of the environment pane, click on “List” and change it to “Grid”. The size column indicates the total memory used by each object. If you want to delete an object from memory, you can use the rm() function. This will free up the memory used by the object.

Complete task #2 on your answer sheet now.

Raster and Vector Attributes

R objects have attributes that affect how they behave when you analyze them. You already have experience accessing and editing the attributes of R objects. In Lab 3, you edited the column names of your reflectance data frame. Column names are an attribute of a data frame. Different types of objects have different types of attributes depending on how the object is used and what type of data is stores.

RasterLayers, RasterBricks, and RasterStacks are collectively referred to as Raster* objects. Raster* objects are the most commonly used raster data format in R. Spatial* objects are the most commonly used vector data format. In the previous example, you loaded the shapefiles as SpatialPolygonsDataFrames. There are also several other types of Spatial* objects, such as SpatialPoints. Together, Raster* and Spatial* objects are the primary classes of geospatial data in R.

Raster* objects and Spatial* objects have two attributes that georeference the data sets. The coordinate reference system (CRS) attribute defines an object’s map projection and coordinate system. The extent attribute defines an object’s geographic location on Earth’s surface. The extent coordinates are reported in terms of the coordinate system.

A quick way to summarize the attributes of a Raster* or Spatial* object is to type its name into the R console. Let’s examine the attributes of the 2007 Landsat scene.

## class      : RasterBrick 
## dimensions : 773, 2007, 1551411, 6  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 203175, 263385, 3802755, 3825945  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      :  blue, green,   red,   NIR, SWIR1, SWIR2 
## min values :  1.38,  1.44,  0.78,  0.60,  0.15,  0.20 
## max values : 32.87, 42.26, 46.83, 61.98, 54.97, 61.12

The CRS field provides several details about the map projection and coordinate system. The extent field indicates the geographic extent of the image reported in terms of the coordinate system. Georeferenced Spatial* objects have the same attribute fields. Let’s examine the metadata of the Gap polygon.

## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : 6499.944, 18266.5, -395404, -389380.4  (xmin, xmax, ymin, ymax)
## crs         : +proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
## variables   : 18
## names       : OBJECTID, YEAR_, STATE, AGENCY, UNIT_ID, FIRE_NAME,  INC_NUM, ALARM_DATE,  CONT_DATE, CAUSE, COMMENTS, REPORT_AC, GIS_ACRES, C_METHOD, OBJECTIVE, ... 
## value       :    16324,  2008,    CA,    USF,     LPF,       GAP, 00000020, 2008/07/01, 2008/07/17,     9,       NA,      9443,   9544.73,        2,         1, ...

You can also extract individual attributes from an object. This enables you to edit the attribute and copy an attribute from one object to another. The crs() function extracts the coordinate reference system from a georeferenced object.

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0

The crs() function returns a string of text in what is known as the PROJ format. PROJ is an open source framework for definining coordinate reference systems that is used by several software packages.

The PROJ string defines the basic attributes of an object’s coordinate reference system. In this case, the PROJ string defines the file’s map projection (Universal Transverse Mercator Zone 11), datum (World Geodetic System 1984), units (meters), and ellipsoid (World Geodetic System 1984).

One of the most common coordinate reference systems is unprojected latitude and longitude data. GPS receivers generally collect data in this format. The PROJ string for unprojected latitude and longitude data looks like this:

The EPSG Geodetic Parameter Dataset is another centralized database of coordinate reference systems. EPSG stands for the European Petroleum Survey Group, which is the organization that originally created the database. The EPSG database gives each coordinate reference system a unique identification number known as an EPSG code. QGIS uses EPSG codes. R also accepts EPSG codes, but PROJ strings are easier to use and interpret.

The extent() function returns the spatial extent of a georeferenced object. The extent attribute contains four values, which indicate the minimum and maximum X and Y coordinates. Let’s examine the extent of the 2007 Landsat scene.

## class      : Extent 
## xmin       : 203175 
## xmax       : 263385 
## ymin       : 3802755 
## ymax       : 3825945

In order to extract image values using a polygon, your raster and vector data need to have the same coordinate reference system. Let’s examine the PROJ strings for the 2007 Landsat scene and the Gap polygon to determine if they have the same coordinate reference system.

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
## CRS arguments:
##  +proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0
## +y_0=-4000000 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0

Oh no! The image and the polygon do not have the same coordinate reference system. In order to extract image values using the polygon, you need to reproject one of the files.

The projectRaster() function can be used to reproject Raster* objects. The projectRaster() function is part of the raster library, so it will not be available unless you install and load the raster library. The spTransform() function can be used to reproject Spatial* objects. The spTransform() function is part of the sp library. Fortunately, functions from the raster library rely on functions from the sp library, so the sp library is automatically loaded whenever you load the raster library. This is known as a dependency because one library depends on another library to function properly.

The coordinate reference system for the Landsat image is UTM Zone 11N. This map projection is widely used for data near Santa Barbara, so let’s reproject the polygon. Examine the help page for the spTransform() function to see if you can figure out what arguments it requires.

The spTransform() function only requires two arguments: the Spatial* object to be reprojected and a CRS object that defines the new coordinate system. The CRS object is the value created by the crs() function, which is described above. Fortunately, you don’t have to create the CRS object from scratch. You can simply extract it from the RasterBrick that contains the Landsat image. Let’s go ahead and reproject the Gap polygon using the CRS from the RasterBrick.

The map projection for the new SpatialPolygonsDataFrame should match the map projection for the Landsat scene.

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0

That looks good. Now that the raster and vector data have the same map projection, you should be able to plot them together. The plot() function can also be used to plot Spatial* objects. The “col” argument sets the fill color and the “border” argument sets the outline color. The “lwd” argument sets the outline width. If you want to plot multiple objects in the same plot, use the argument “add = TRUE” for any objects after the first object.

Let’s plot the 2007 Landsat scene as a true color image and then plot the reprojected Gap polygon on top of it. Setting “col” to NA will create a hollow polygon with no fill.

Masking and Cropping

In Lab 3, you used the extract() function to extract raster values from different regions of interest. You examined the values and then hypothesized about the land cover at each site. Another way to complete this task is to create an entirely new RasterBrick that only contains the reflectance values for the region of interest. You can use a polygon to mask an image, which removes all of the raster values outside the region of interest.

The raster library contains a mask() function to achieve this. Examine the help page for the mask() function to see if you can figure out what arguments it requires.

The mask() function requires two arguments: a Raster* object and a Spatial* object to define the mask boundaries. There are several other optional arguments, but let’s keep the default settings for now. Any pixels outside of the mask will be changed to NA values. The new object will also be a RasterBrick.

Let’s mask the 2007 Landsat image using the reprojected Gap polygon. Masking can be slow if you are using large data sets. The process is completed when the > symbol reappears in the R console.

Let’s plot the new RasterBrick as a false color composite to see how it looks.

The result looks fine, but there is a lot of empty white space around the Gap polygon. The white space contains pixels with NA values. It appears that the extent of the image did not change. Let’s examine the extent objects for the two images to test that hypothesis.

## class      : Extent 
## xmin       : 203175 
## xmax       : 263385 
## ymin       : 3802755 
## ymax       : 3825945
## class      : Extent 
## xmin       : 203175 
## xmax       : 263385 
## ymin       : 3802755 
## ymax       : 3825945

Our hypothesis was correct – the image was masked using the Gap polygon but the extent did not change. We don’t need all of those NA values around the edge of the image. Let’s change the extent of the masked image so it matches the extent of the Gap polygon. You can do this using the crop() function. Examine the help page for the crop() function to see what arguments it requires.

The crop() function requires two arguments: a Raster* object and an extent object. Extent objects are the values created by the extent() function, which we just used. You can extract the extent object from the Gap polygon instead of creating one from scratch.

Let’s crop the masked image to remove most of the NA values around the edge of the image. Then, let’s plot the new RasterBrick as a false color composite to see how it looks.

That looks good. The new image only contains the reflectance values for your study site, and there are far fewer NA values around the edge of the image.

Now that you have created the image you need, you can delete the earlier version of the image from memory. This will free up the memory used by that object. Deleting unneeded objects helps your computer run smoothly and ensures that it does not run out of memory.

You can use the getValues() function to create a matrix with all of the values from your new image. This will convert the image data to tabular format (i.e., a spreadsheet), which is compatible with many R functions.

You can use the View() function to view the new matrix. There are still some NA values around the edge of the image, so you might need to scroll down a bit to view the valid reflectance values.

Those values seem reasonable.

Ultimately, the extract() function and the mask() function achieve similar things. They subset a remote sensing data set using a Spatial* object, so you can analyze the pixel values for your study site(s). Which method you choose will depend on your analysis and personal preference. If your original image is much larger than your study sites, the mask() function can drastically reduce the file size and speed up processing. If you prefer the simplicty of the extract() function instead of creating new RasterBricks, you may opt for that method instead.

Complete task #3 on your answer sheet now.

Time Series Analysis

Hopefully you have figured out that the two shapefiles represent fire scars. The Gap Fire burned 9,443 acres north of Goleta in July 2008. The Jesusita Fire burned 8,733 acres north of Santa Barbara in May 2009. By analyzing how the reflectance values of the fire scars change over time, you can quantify the severity of the fires and monitor vegetation regrowth.

Control Site

When you are examining an ecological disturbance, it can be helpful to have a control site that was not affected by the disturbance. This provides a baseline that enables you to assess ecosystem recovery. Let’s create a third study site that is not located within the fire scars, but has a similar landscape position.

Complete task #4 on your answer sheet now.

Normalized Difference Vegetation Index

The Normalized Difference Vegetation Index (NDVI) is sensitive to vegetation cover and vegetation health, so it is useful for monitoring post-fire vegetation regrowth. The NDVI function that we created in Lab 3 will be helpful here.

Let’s examine how the mean NDVI of the Gap Fire scar changed over time from 2007 to 2011. The workflow involves four steps:

  1. Extract the reflectance values for the fire scar
  2. Calculate the mean red reflectance for all pixels in the fire scar
  3. Calculate the mean NIR reflectance for all pixels in the fire scar
  4. Calculate the mean NDVI using the mean red reflectance and the mean NIR reflectance

We’ll need to replicate that process for each image. This is a great time to use a function. You have a defined workflow that you need to repeat multiple times. Creating a function to execute the workflow will reduce some of the repitition. It will also decrease the likelihood of making errors.

Let’s create the function now. Examine the syntax closely and make sure that you understand how the function works. You will probably need to create a function like this when you are working on your creative project.

Notice that the new function calls the NDVI function that we created above. It is perfectly okay for one function to call another function in its underlying code. I also added the optional argument na.rm = TRUE to the mean() function. NA values will cause the mean() function to fail, so the na.rm argument removes any NA values before it performs the mean calculation.

Let’s test the new function to make sure it works.

## [1] 0.5316895

The range of valid NDVI values is -1 to 1, so that value is reasonable. It suggests that the study site is covered by moderately healthy vegetation.

We need to replicate that analysis for each Landsat image in order to determine how the mean NDVI value changes over time. That’s a relatively easy task now that we have created a function to do the hard work.

Let’s compile our data into a new data frame. First, we’ll create a vector with the years and a vector with the NDVI values. Then, we’ll use the vectors to build a data frame.

Let’s examine the data frame to make sure that it looks okay.

Finally, let’s plot the results. If you have a function with many arguments, you can put each argument on a separate line. That helps keep your code tidy and easy to read.

Beautiful! You have conducted a robust scientific analysis of vegetation regrowth after the Gap Fire. Your data clearly demonstrates that it took approximately three years for the Gap Fire scar to recover to pre-fire conditions. Time series analysis is becoming increasingly common in remote sensing research, especially as it becomes easier to access and analyze large amounts of remote sensing data. Performing this type of analysis on a different set of study sites would make an excellent creative project.

Complete task #5 on your answer sheet now.

\(~\)