This function splits a data set of polygons into two data sets, one for training and one for testing. The split is generally based on the response variable so that the training and testing sets have similar class distributions. Optionally, a second grouping variable can be used to additionally stratify the split by that group. The function works on categorical and numerical data. A numeric response is discretized into categories with the cut function, and users are required to specify the value breaks.
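As an illustration of the discretization step, the following minimal sketch shows how base R's cut turns a numeric vector into categories. The vector yield, the break points, and the labels are made up for this example and are not part of the package.

```r
# Hypothetical numeric response values (illustrative only).
yield <- c(0.4, 1.2, 2.7, 3.1, 4.8, 5.0)

# Breaks are chosen by the user. include.lowest = TRUE keeps a value equal
# to the lowest break (here 0) in the first interval instead of turning it
# into NA.
yield_class <- cut(
  yield,
  breaks = c(0, 2, 4, 6),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)

# Number of observations per category.
table(yield_class)
```

Arguments such as breaks and labels are the kind of parameters that would be handed on to cut when the response is numeric.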

train_split(
  aoi,
  idcol,
  response,
  group = NULL,
  p,
  seed = round(runif(1, 1L, 1000L)),
  verbose = TRUE,
  ...
)

Arguments

aoi

An sf object with polygons that are to be split into training and test sets. The object should contain one column that uniquely identifies the polygons, a column with a response variable, and optionally an additional grouping variable used for stratification.

idcol

A character vector indicating the name of the column that uniquely identifies the polygons.

response

A character vector indicating the name of the column that contains the response variable, which can be either numeric or character/factor. If it is numeric, the arguments that cut needs to discretize the value range into categories must be supplied via the ... argument.

group

An optional character identifying the name of a column used for stratification of the training-test split. The values of that column must be characters or factors, since grouping by numeric variables is currently not supported.

p

A numeric between 0 and 1 indicating the fraction of polygons that enter the training set.

seed

A numeric value used to ensure reproducibility of the split.

verbose

A logical indicating the level of verbosity.

...

Additional parameters used by cut to discretize the value range of a numeric response variable into categories.

Value

The original sf object with an added column named "split" that indicates whether each polygon belongs to the training or testing data set.
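To make the idea of a class-stratified split concrete, here is a minimal base R sketch. It is not the package's implementation: it uses a plain data frame named polys in place of an sf object, and the column names, the class labels, and the "train"/"test" values written to the split column are assumptions made for illustration.

```r
# Hypothetical attribute table standing in for the sf object.
polys <- data.frame(
  id = 1:12,
  class = rep(c("forest", "crop", "water"), each = 4)
)

p <- 0.75     # fraction of polygons per class entering the training set
set.seed(42)  # fixed seed so the split is reproducible

# Start with every polygon in the test set, then draw the training
# polygons separately within each class.
polys$split <- "test"
for (cls in unique(polys$class)) {
  idx <- which(polys$class == cls)
  n_train <- round(p * length(idx))
  # Sample positions within idx to avoid the sample(x) pitfall that
  # occurs when idx has length one.
  train_idx <- idx[sample.int(length(idx), n_train)]
  polys$split[train_idx] <- "train"
}

# Cross-tabulate class against split to check the class distribution.
table(polys$class, polys$split)
```

Because the sampling is done within each class, both sets keep the same relative class frequencies, which is the property the function description aims for.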

Author

Darius Görgen (MapTailor Geospatial Consulting GbR) info@maptailor.net
Maintainer: MAPME-Initiative contact@mapme-initiative.org
Contact Person: Dr. Johannes Schielein
Copyright: MAPME-Initiative
License: GPL-3