fortedata: Data Collection, Preparation, and Management

Overview

At the core of FoRTE is the collection of heterogeneous data from many different instruments, each measuring different environmental variables and requiring its own approach. Below we outline the collection and preparation procedures for all data products in fortedata.

Each data product listed has a Data Preparation section that includes:

the location of raw data in Google Drive
the format and structure of the raw data
any pre-processing of data
an outline of the process needed to move from raw data to package-ready data

Canopy Structural Traits/Terrestrial Lidar

Data Preparation

CST data are located in /data/lidar on Google Drive. The raw data from the PCL instrument are two-column .csv files containing raw distance-from-instrument values (i.e. lidar pulse returns) and intensity values for each measured transect. These data are processed externally using forestr in R. The output for each transect consists of several files: 1) a two-row .csv containing a header row of CST metric names and a second row containing the values for those metrics; 2) an adjusted leaf area/vegetation area hit grid matrix; 3) a three-column x, z, VAI file; 4) a summary matrix file containing x, z, and column-specific values (NEED TO ADJUST); and 5) a hit grid plot of VAI.

The first of these files, the two-row CST metrics file for each transect, is collated into a single file in which each row represents one transect. This is the source canopy_structural_traits.csv file in /inst/extdata/ from which fd_canopy_structure draws.
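The collation step described above can be sketched in base R. This is only an illustration, not the script used to build the package: the helper name collate_cst, the file names, and the three demo columns are assumptions, and the demo writes made-up files to a temporary directory instead of reading real forestr output.

```r
# Hypothetical helper (not part of fortedata or forestr): collate
# per-transect CST metric files into one table, one row per transect.
# Assumes every file has one header row and one row of values,
# with identical column names across files.
collate_cst <- function(paths) {
  per_transect <- lapply(paths, function(p) {
    read.csv(p, stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames on top of each other
  do.call(rbind, per_transect)
}

# Demonstration with two made-up transect files in a temp directory
tmp <- tempfile("cst_demo")
dir.create(tmp)
write.csv(data.frame(subplot_id = "C01E", year = 2019L, rugosity = 11.06),
          file.path(tmp, "transect_a.csv"), row.names = FALSE)
write.csv(data.frame(subplot_id = "C01W", year = 2019L, rugosity = 9.46),
          file.path(tmp, "transect_b.csv"), row.names = FALSE)

files <- sort(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
cst_demo <- collate_cst(files)
cst_demo
```

Because rbind requires matching column names, a transect file with a missing or misspelled metric stops the collation with an error instead of silently producing a misaligned table.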

# read the collated CST metrics file
cst <- read_csv_file("canopy_structural_traits.csv")

# show the structure of the data
str(cst)
## tibble[,53] [195 x 53] (S3: tbl_df/tbl/data.frame)
##  $ subplot_id        : chr [1:195] "C01E" "C01E" "C01W" "C01W" ...
##  $ year              : int [1:195] 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
##  $ mean.height.mean  : num [1:195] 9.75 10.3 9.57 8.05 9.61 ...
##  $ height.2          : num [1:195] 4.87 4.37 4.31 4.3 5.97 ...
##  $ mean.height.median: num [1:195] 8.89 9.52 10.32 7.6 7.7 ...
##  $ mean.height.var   : num [1:195] 23.7 19.1 18.6 18.5 35.6 ...
##  $ mean.height.rms   : num [1:195] 10.9 11.2 10.5 9.1 11.3 ...
##  $ transect.length   : int [1:195] 40 40 40 40 40 40 40 20 40 10 ...
##  $ can.max.ht        : num [1:195] 22.6 23.3 21.5 21.1 25.3 ...
##  $ moch              : num [1:195] 14.7 15.2 13.8 13.1 16.7 ...
##  $ can.max.ht.median : num [1:195] 16 17 14.9 13.9 20.1 ...
##  $ vai.mean          : num [1:195] 5.98 6.17 6.27 5.68 5.93 ...
##  $ vai.sd            : num [1:195] 2.81 2.61 2.45 2.69 2.31 ...
##  $ vai.median        : num [1:195] 7.72 7.71 7.82 7.2 6.56 ...
##  $ vai.column.max    : int [1:195] 8 8 8 8 8 8 8 8 8 8 ...
##  $ vai.max.ht.mean   : num [1:195] 10.18 9.93 9.3 7.72 9.82 ...
##  $ vai.max.ht.sd     : num [1:195] 6.43 5.67 5.5 5.64 7.83 ...
##  $ vai.max.ht.median : num [1:195] 11 10 8 5.5 8 9 10 9 4 4 ...
##  $ vai.max           : num [1:195] 7 7.43 8 6.6 7 ...
##  $ vai.mean.peak     : num [1:195] 2.95 3.13 3.6 3.19 3.03 ...
##  $ vai.peak.sd       : num [1:195] 1.85 1.92 2.11 1.76 1.62 ...
##  $ vai.peak.median   : num [1:195] 2.86 3.21 3.44 3.42 2.96 ...
##  $ deep.gaps         : int [1:195] 2 0 1 0 0 0 0 0 0 0 ...
##  $ deep.gap.fraction : num [1:195] 0.05 0 0.025 0 0 0 0 0 0 0 ...
##  $ porosity          : num [1:195] 0.696 0.669 0.733 0.714 0.716 ...
##  $ std.std           : num [1:195] 264 115 205 260 793 ...
##  $ mean.std          : num [1:195] 11.89 8.29 10.73 9.95 18.44 ...
##  $ rugosity          : num [1:195] 11.06 6.82 9.46 12.69 21.29 ...
##  $ top.rugosity      : num [1:195] 5.79 6.07 5.46 4.92 7.46 ...
##  $ mean.return.ht    : num [1:195] 7.1 7.98 7.53 5.54 7.18 ...
##  $ sd.return.ht      : num [1:195] 4.1 4.55 4.9 4.46 6.05 ...
##  $ median.ht         : num [1:195] 6.67 6.84 6.2 3.28 5.53 ...
##  $ sky.fraction      : num [1:195] 18.72 11.9 8.75 14.02 9.85 ...
##  $ cover.fraction    : num [1:195] 81.3 88.1 91.3 86 90.1 ...
##  $ max.ht            : num [1:195] 22.6 23.3 21.5 21.5 26.7 ...
##  $ scan.density      : num [1:195] 2962 2638 2324 2992 2244 ...
##  $ rumple            : num [1:195] 6.2 5 5.2 4.33 5.15 ...
##  $ clumping.index    : num [1:195] 0.93 0.929 0.944 0.875 0.9 ...
##  $ enl               : num [1:195] 18.6 19.2 18 15 18.3 ...
##  $ fhd               : num [1:195] 2.83 2.81 2.85 2.5 2.71 ...
##  $ gini              : num [1:195] 1.82 1.62 1.68 1.38 1.53 ...
##  $ mean.intensity    : num [1:195] 55 57 56.3 50 51.1 ...
##  $ median.intensity  : int [1:195] 57 59 58 49 52 54 54 57 47 45 ...
##  $ sd.intensity      : num [1:195] 14 13.7 13.3 12.4 12.9 ...
##  $ max.intensity     : num [1:195] 100 108 104 94 94 ...
##  $ min.intensity     : num [1:195] 6 6 0.612 6 6 6 6 6 3 2 ...
##  $ skew.intensity    : num [1:195] -0.22053 -0.50636 -0.3968 0.00147 -0.3191 ...
##  $ kurtosis.intensity: num [1:195] 2.93 3.41 3.52 3.14 2.98 ...
##  $ p10               : num [1:195] 2.06 2.91 1.96 1.76 2.23 ...
##  $ p25               : num [1:195] 3.33 3.94 3.11 2.36 2.88 ...
##  $ p50               : num [1:195] 6.67 6.84 6.2 3.28 5.53 ...
##  $ p75               : num [1:195] 10.3 11.21 11.46 9.59 8.06 ...
##  $ p90               : num [1:195] 12.5 14.2 14.8 12.2 19.9 ...

General Data Format and Preparation

This section summarizes general formatting and preparation guidelines for all data shared and uploaded to fortedata.

Naming files and folders

Names should be lowercase with "_" separating words.
File Type: CSV (comma delimited)
Filenames: “data_type_year.csv” (e.g. “canopy_dendroband_2019.csv”)
Folder Names: “data_type” (e.g. “canopy_dendroband”)
Include a README.txt file in every folder (see README.txt file structure)
Each year should have one data file (but does this make sense for instrument data?)
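As a minimal sketch of the naming convention above, the two helpers below build and check filenames of the form data_type_year.csv. Both function names are hypothetical and are not part of fortedata. The regular expression is one reading of the rules, assuming a four-digit year and only lowercase letters and digits within each word.

```r
# Hypothetical helpers, not part of fortedata.

# Build a filename of the form "data_type_year.csv"
make_data_filename <- function(data_type, year) {
  # Lowercase, then collapse runs of non-alphanumeric characters to "_"
  stem <- gsub("[^a-z0-9]+", "_", tolower(data_type))
  # Drop any leading or trailing underscores left by the substitution
  stem <- gsub("^_+|_+$", "", stem)
  paste0(stem, "_", year, ".csv")
}

# Check a filename: lowercase words separated by "_",
# then a four-digit year, then the .csv extension
is_valid_data_filename <- function(filename) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*_[0-9]{4}\\.csv$", filename)
}

make_data_filename("Canopy Dendroband", 2019)
is_valid_data_filename("canopy_dendroband_2019.csv")
is_valid_data_filename("Canopy-Dendroband2019.CSV")
```

The first call returns "canopy_dendroband_2019.csv", which passes the check; the last filename fails because of its capital letters, hyphen, and missing "_" before the year.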

General File Structure and Format

Each row is one observation
Column names should:

Be lowercase with "_" separating words
Include units if quantitative (e.g. “dbh_cm”; flux units excluded)
Be added to forte_table_metadata.csv (if not already listed)

Required columns:

subplot_id – replicate (“A-”), plot (“-01-”), and subplot (“-E”) combined (e.g. “A01E”); files SHOULD NOT include separate replicate, plot, and subplot columns
date – date of data collection (YYYY-MM-DD)
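The column rules above lend themselves to an automated check before a file is added to the package. The function below is a hedged sketch and not an existing fortedata function: check_forte_table is a hypothetical name, and the subplot_id pattern (one capital letter, two digits, one capital letter) is an assumption inferred from examples such as “A01E” and “C01W”.

```r
# Hypothetical check, not part of fortedata. Returns a character
# vector of problems; an empty vector means the table passed.
check_forte_table <- function(df) {
  problems <- character(0)
  cols <- names(df)

  # Column names: lowercase words separated by "_"
  bad_names <- cols[!grepl("^[a-z0-9]+(_[a-z0-9]+)*$", cols)]
  if (length(bad_names) > 0) {
    problems <- c(problems, paste("badly named columns:",
                                  paste(bad_names, collapse = ", ")))
  }

  # Required columns
  missing_cols <- setdiff(c("subplot_id", "date"), cols)
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing columns:",
                                  paste(missing_cols, collapse = ", ")))
  }

  # subplot_id: assumed pattern, e.g. "A01E"
  if ("subplot_id" %in% cols &&
      !all(grepl("^[A-Z][0-9]{2}[A-Z]$", df$subplot_id))) {
    problems <- c(problems, "subplot_id not in the form 'A01E'")
  }

  # date: YYYY-MM-DD text that also parses as a real calendar date
  if ("date" %in% cols) {
    d <- as.character(df$date)
    ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", d) &
      !is.na(as.Date(d, format = "%Y-%m-%d"))
    if (!all(ok)) problems <- c(problems, "date not in YYYY-MM-DD format")
  }

  problems
}

# Made-up example tables
good <- data.frame(subplot_id = c("A01E", "C01W"),
                   date = c("2019-07-15", "2019-07-16"),
                   dbh_cm = c(23.1, 18.4))
bad <- data.frame(SubplotID = "A01E", date = "07/15/2019", DBH_cm = 23.1)

check_forte_table(good)
check_forte_table(bad)
```

The function collects every problem it finds instead of stopping at the first, so a contributor can fix a file in one pass. It does not verify that units appear in quantitative column names or that each column is listed in forte_table_metadata.csv; those checks would need the metadata table itself.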

# call forte_table_metadata
fd_metadata()
## # A tibble: 240 x 5
##    table     field     description                                  class  units
##    <chr>     <chr>     <chr>                                        <chr>  <chr>
##  1 fd_inven~ subplot_~ <NA>                                         chara~ <NA> 
##  2 fd_inven~ tag       Tree tag ID number                           integ~ <NA> 
##  3 fd_inven~ species   Species code from the USDA Plants Database   chara~ <NA> 
##  4 fd_inven~ dbh_cm    Bole diameter at 1.37 m                      numer~ cm   
##  5 fd_inven~ health_s~ Live (L), moribund (M), or dead (D)          chara~ <NA> 
##  6 fd_inven~ canopy_s~ Overstory dominant (OD), overstory submissi~ chara~ <NA> 
##  7 fd_inven~ date      Date of measurement                          date   <NA> 
##  8 fd_inven~ notes     <NA>                                         chara~ <NA> 
##  9 fd_inven~ replicate (from plots table)                           chara~ <NA> 
## 10 fd_inven~ plot      (from plots table)                           integ~ <NA> 
## # ... with 230 more rows

FoRTE Team

2021-05-16
