Load eBird data files — read.ebd • skimmr

Reads a .txt eBird data file and creates a data frame from it, with cases corresponding to lines (rows) and variables to fields (columns) in the file.

The most commonly used types of eBird data files are the eBird Basic Dataset (EBD; which may contain three subtypes of files) and the My Data download (which contains all data associated with a specific eBird account). The two differ in their download file type, column naming format, available columns, etc.

read.ebd and read.mydata import the EBD and My Data files, respectively. Since the EBD contains many columns, not all of which may be required for a given use case, cols_sel can be used to import only a subset of them. To see the list of all column names to choose from, run read.ebd(ebd_path, cols_print_only = TRUE).
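The idea of importing only some columns can be sketched in base R. This is a minimal illustration, not the package's own implementation: it assumes a tab-separated file with EBD-style headers and uses the colClasses argument of utils::read.delim() to skip unwanted columns. The file contents and the variable names (all_cols, keep, classes, subset_dat) are hypothetical.

```r
# Hypothetical three-column, tab-separated file with EBD-style headers
tf <- tempfile(fileext = ".txt")
writeLines(c("SAMPLING EVENT IDENTIFIER\tCOMMON NAME\tOBSERVATION COUNT",
             "S0000001\tIndian Peafowl\t2"), tf)

# Read the header plus one row just to list the available column names
all_cols <- names(utils::read.delim(tf, nrows = 1, quote = ""))

# Keep two columns; colClasses "NULL" tells read.delim() to skip a column,
# while NA lets it guess the type as usual
keep <- c("SAMPLING.EVENT.IDENTIFIER", "COMMON.NAME")
classes <- ifelse(all_cols %in% keep, NA, "NULL")

subset_dat <- utils::read.delim(tf, colClasses = classes, quote = "")
names(subset_dat)
```

Skipping columns at read time avoids holding unneeded fields in memory, which matters for large EBD downloads.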

This function is a wrapper around utils::read.delim(), which is considerably faster than the readr::read_delim() used in auk::read_ebd(). Moreover, unlike the latter, which uses snake case for column names, this function uses uppercase names separated by periods.
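The uppercase, period-separated style is consistent with how utils::read.delim() handles headers by default. As a small sketch, assuming the raw file has uppercase headers with spaces between words (the file below is made up for illustration), the default check.names = TRUE turns each space into a period:

```r
# Hypothetical tab-separated file whose headers contain spaces
tf <- tempfile(fileext = ".txt")
writeLines(c("SAMPLING EVENT IDENTIFIER\tCOMMON NAME",
             "S0000001\tIndian Peafowl"), tf)

# read.delim() defaults: header = TRUE, sep = "\t", check.names = TRUE;
# check.names passes the headers through make.names()
dat <- utils::read.delim(tf, quote = "")
names(dat)
#> [1] "SAMPLING.EVENT.IDENTIFIER" "COMMON.NAME"
```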

Usage

read.ebd(path, cols_sel = "all", cols_print_only = FALSE)

read.mydata(
  path = "MyEBirdData.csv",
  cols_sel = "all",
  cols_print_only = FALSE,
  cols_style_ebd = FALSE
)

Arguments

path: character; the path to the downloaded EBD .txt file (read.ebd) or My Data .csv file (read.mydata)
cols_sel: character; vector of column names to be imported from the dataset, or "all" (default) to import every column
cols_print_only: logical; whether or not to only print the full set of column names
cols_style_ebd: logical; if TRUE, change column names in My Data to uppercase and separated by period (COLUMN.STYLE), as in read.ebd(); defaults to FALSE
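To illustrate the COLUMN.STYLE naming that cols_style_ebd refers to, here is a rough base R sketch. It is not the package's conversion code, and the two My Data header names are assumed for the example; it only shows the uppercase, period-separated pattern.

```r
# Assumed examples of My Data-style headers (title case, spaces)
mydata_names <- c("Submission ID", "Common Name")

# make.names() replaces spaces with periods; toupper() uppercases the result
ebd_style <- toupper(make.names(mydata_names))
ebd_style
#> [1] "SUBMISSION.ID" "COMMON.NAME"
```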

Value

A data frame (when cols_print_only = FALSE), or a character vector of column names (when cols_print_only = TRUE)

Examples

# to see list of column names before choosing
test1 <- data.frame(SAMPLING.EVENT.IDENTIFIER = "S0000001", COMMON.NAME = "Indian Peafowl")
tf <- tempfile()
write.table(test1, file = tf, col.names = TRUE, row.names = FALSE, sep = "\t",
            quote = FALSE) # quote = TRUE would wrap the column names in quotes
read.ebd(tf, cols_print_only = TRUE)
#> [1] "SAMPLING.EVENT.IDENTIFIER" "COMMON.NAME"              

# select columns and import data
read.ebd(tf, cols_sel = c("SAMPLING.EVENT.IDENTIFIER", "COMMON.NAME"))
#>   SAMPLING.EVENT.IDENTIFIER    COMMON.NAME
#> 1                  S0000001 Indian Peafowl