The geometr package generalises the way of interacting with spatial and other geometric objects by providing functions that access and modify data components in the same manner across classes.
geometr provides a data structure (of class geom) that represents the different data components in a truly tidy manner, which makes it possible to generate geometric objects that are easily accessible and play well with other tidy tools.
One could argue that spatial objects are merely a special case of geometric objects, where the coordinates of points refer to real locations on the surface of the earth instead of some virtual (cartesian) coordinate system.
Geometric and spatial objects typically contain a collection of points that outline a geometric shape, or feature.
A feature in geometr is defined as a set of points that form no more than one unit of one of the types point, line, polygon or grid.
In contrast to the simple features standard, geometr has no multi-* features; sets of features that belong together beyond their geometric connectedness are instead assigned a common group.
A geom is primarily made up of three tables that contain information on points (their coordinates), features and groups. The tables are related via feature and group IDs (fid and gid, respectively) and can be provided with additional attributes (more on this in the chapter on the attributes of a geom).
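As a rough illustration, the three tables and their relation via fid and gid can be mimicked with plain data frames. This is a sketch under assumptions: the exact columns of the feature and group tables below are hypothetical, and the example only shows how two features that the simple features standard would store as one multipolygon can instead share a common group.

```r
# Hypothetical point table: two triangles (three vertices each), each a separate feature.
points <- data.frame(
  fid = c(1, 1, 1, 2, 2, 2),
  x   = c(0, 1, 0, 5, 6, 5),
  y   = c(0, 0, 1, 5, 5, 6)
)

# Hypothetical feature table: both features are assigned to the same group (gid)
# instead of being combined into a single multi-polygon feature.
features <- data.frame(fid = c(1, 2), gid = c(1, 1))

# Hypothetical group table carrying an additional attribute.
groups <- data.frame(gid = 1, name = "islands")

# Relating the tables via fid and gid yields one row per point with all attributes.
full <- merge(merge(points, features, by = "fid"), groups, by = "gid")
full
```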
This vignette first outlines in detail how geometr improves interoperability, then describes the data structure of a geom and how different feature types are cast into one another, and finally shows how to visualise geometric objects.
Interoperable software is designed to easily exchange information with other software, which can be achieved by providing the output of functionally similar operations in a common arrangement or format, thereby standardising access to the data. This principle applies not only to software written in different programming languages, but also to packages within the R ecosystem. R is an open-source environment, which means that no single package or class will ever be the sole source of a particular data structure, and this is also the case for spatial and other geometric data.
Interoperable data is data that has a common arrangement and uses the same terminology, ideally resulting in semantic interoperability. As an example, consider the extent of a geometric object. An extent reports the minimum and maximum values of all dimensions an object resides in. There are, however, several ways in which even this simple information can be reported, for example as a vector or as a table, and with or without names. Moreover, different workflows store the same information at different positions or under different names, e.g., the minimum value of the x dimension is not always the first element and is not always called ‘xmin’.
The following code chunk exemplifies this with several functions, all currently considered standard in R, that derive an extent from specific spatial objects:
```r
library(sf)
library(sp)
library(raster)

nc_sf <- st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/home/se87kuhe/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> CRS: 4267
st_bbox(nc_sf)
#>      xmin      ymin      xmax      ymax
#> -84.32385  33.88199 -75.45698  36.58965

nc_sp <- as_Spatial(nc_sf)
bbox(nc_sp)
#>         min       max
#> x -84.32385 -75.45698
#> y  33.88199  36.58965

ras <- raster(system.file("external/test.grd", package = "raster"))
extent(ras)
#> class      : Extent
#> xmin       : 178400
#> xmax       : 181600
#> ymin       : 329400
#> ymax       : 334000
```
st_bbox() provides the information as a named vector that presents first the minimum and then the maximum values of both dimensions, bbox() provides a table with minimum and maximum values in columns, and extent() provides the information in an S4 object that presents first the x and then the y values.
Neither the data structures nor the names or positions of the information are identical.
For a human user, the structure of this information might not matter, because in most cases we intuitively recognise where to find which information in such a simple data structure.
In the above case, it is easy to recognise how the combination of column and row names (of bbox()) corresponds to the already combined names (of st_bbox()).
However, software only has this ability to recognise information from its context if it is explicitly programmed into it.
Think, for example, of a new custom function that is designed to extract and process information from an arbitrary spatial input, i.e., without knowing in advance what spatial class the user will provide.
This would require extensive code logic to handle all possible input formats, further complicated by classes that may only become available in the future.
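To make this concrete, the following sketch shows what such a function would have to do for just two of the layouts above. The function name extentTable() is hypothetical, and the inputs are plain R objects that mimic the named vector returned by st_bbox() and the matrix returned by bbox(), so that the example runs without sf or sp.

```r
# Hypothetical helper: convert differently arranged extents into one 2 x 2 table
# (first row minimum, second row maximum; one column per dimension).
extentTable <- function(ext) {
  if (is.matrix(ext) && all(c("min", "max") %in% colnames(ext))) {
    # sp-style layout: dimensions in rows, min/max in columns
    data.frame(x = unname(ext["x", c("min", "max")]),
               y = unname(ext["y", c("min", "max")]))
  } else if (is.numeric(ext) && all(c("xmin", "ymin", "xmax", "ymax") %in% names(ext))) {
    # sf-style layout: a named vector with combined names
    data.frame(x = unname(ext[c("xmin", "xmax")]),
               y = unname(ext[c("ymin", "ymax")]))
  } else {
    # every further class, including future ones, needs its own branch here
    stop("unknown extent layout")
  }
}

sfLike <- c(xmin = -84.32385, ymin = 33.88199, xmax = -75.45698, ymax = 36.58965)
spLike <- matrix(c(-84.32385, 33.88199, -75.45698, 36.58965), nrow = 2,
                 dimnames = list(c("x", "y"), c("min", "max")))

identical(extentTable(sfLike), extentTable(spLike))
#> [1] TRUE
```

Each additional class that is to be supported requires yet another branch, which is exactly the maintenance burden described above.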
geometr improves interoperability in R for geometric and thus spatial classes by following the Bioconductor standard for S4 classes.
Here, getters and setters are used as accessor functions and as the pathway to extract or modify information of a given data structure.
geometr thus provides getters that return information in an identical arrangement from a wide range of classes, and likewise setters that modify different classes in the same way, even though those classes typically require differently formatted input, arguments and functions.
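The accessor pattern itself can be sketched with base R's methods package. The classes pointSet and rasterLike and the generic extentOf() below are hypothetical stand-ins, not geometr's classes or API; they only show how one generic with class-specific methods can return the same arrangement regardless of how each input stores its data.

```r
library(methods)

# Two hypothetical classes that store their spatial information differently.
setClass("pointSet", slots = c(coords = "matrix"))
setClass("rasterLike", slots = c(xmin = "numeric", xmax = "numeric",
                                 ymin = "numeric", ymax = "numeric"))

# One hypothetical generic getter ...
setGeneric("extentOf", function(x) standardGeneric("extentOf"))

# ... with one method per class, each returning the same 2 x 2 layout.
setMethod("extentOf", "pointSet", function(x) {
  data.frame(x = range(x@coords[, 1]), y = range(x@coords[, 2]))
})
setMethod("extentOf", "rasterLike", function(x) {
  data.frame(x = c(x@xmin, x@xmax), y = c(x@ymin, x@ymax))
})

pts <- new("pointSet", coords = cbind(c(1, 4, 2), c(3, 0, 5)))
rst <- new("rasterLike", xmin = 0, xmax = 10, ymin = 0, ymax = 20)

extentOf(pts)
extentOf(rst)
```

Code that consumes extentOf() never needs to know which class it received; supporting a new class means adding one method rather than touching every consumer.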
The following code chunk shows how different input classes yield the same output object.
```r
library(geometr)

myInput <- nc_sf
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1 -84.3  33.9
#> 2 -75.5  36.6

myInput <- nc_sp
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1 -84.3  33.9
#> 2 -75.5  36.6

myInput <- ras
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>        x      y
#>    <dbl>  <dbl>
#> 1 178400 329400
#> 2 181600 334000
```
The output of the getters provided by geometr is, as in the example above, a tidy table (a tibble).
This ensures that the information retrieved with getters is compatible with a tidy workflow, and that a custom function processing geometric information needs merely a single line of code to extract this information from a potentially wide range of distinct classes.
geometr comes with the S4 class geom, a geometric (spatial) class that has been developed primarily for interoperability and easy access.
All objects of this class are structurally the same: no slots are removed or added when an object is modified, and all properties are labelled with the same terms in every object of that class.
This holds for objects representing point (and grid), line or polygon features, for objects that contain a single feature or several features, and for objects that are either merely geometric or indeed spatial/geographic because they contain a coordinate reference system (crs).
Moreover, a geom contains only direct information, i.e., information that can't be derived from its other information. The extent, for example, is not stored, because it consists merely of the minimum and maximum coordinates that make up the geometry.
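Because the extent is only the range of the coordinates, it can be recomputed on demand from the point table. The following minimal sketch uses a hypothetical point table with the columns fid, x and y; one plausible reason for not storing the extent is that a stored copy could go out of sync when points are modified.

```r
# Hypothetical point table of a single triangle (feature 1).
pts <- data.frame(fid = c(1, 1, 1),
                  x   = c(2, 7, 4),
                  y   = c(1, 3, 9))

# The extent is fully determined by the coordinates: minimum in the first row,
# maximum in the second row, one column per dimension.
ext <- data.frame(x = range(pts$x), y = range(pts$y))
ext
#>   x y
#> 1 2 1
#> 2 7 9
```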
As its backbone, a geom contains three slots that hold the tables of points, features and groups.
Each of these slots is a named list that contains as many tables as there are layers in the geom.
The exact values stored in those tables are explained in Tab. 3.1, along with the other slots of a geom.
Tab. 3.1: The slots of a geom.

| Content of the slot |
|---|
| the type of how the … |
| the coordinates in x and y dimension and the ID of the feature the point is part of (fid) |
| the feature ID (fid) … |
| the group ID (gid) … |
| the coordinates of a rectangular polygon that outlines the "enclosing area" of the geom |
| depending on crs and use case, the coordinates of points can be documented as … |
| the coordinate reference system, currently in proj4 notation; if no crs has been set, this is shown as 'cartesian' |
| all of the functions of … |
A geom of type grid is a special case of a point geom in that it is made up of a systematically distributed lattice of points, thereby resembling a raster.
A geom of type grid contains in its @point slot merely a table with the minimum and maximum values and the cell size for the x and y dimensions, whereas a geom of type point, line or polygon explicitly contains the coordinates of all points that make up its features.
When using the getter getPoints(), this slot is “unpacked” into a form that is interoperable with other geoms, i.e., a table with the feature ID (fid) and the x and y coordinates of each point.
```r
gtGeoms$grid$categorical@point
#> # A tibble: 3 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     0     0
#> 2    60    56
#> 3     1     1

getPoints(x = gtGeoms$grid$categorical)
#> # A tibble: 3,360 x 3
#>      fid     x     y
#>    <int> <dbl> <dbl>
#>  1     1   0.5   0.5
#>  2     2   1.5   0.5
#>  3     3   2.5   0.5
#>  4     4   3.5   0.5
#>  5     5   4.5   0.5
#>  6     6   5.5   0.5
#>  7     7   6.5   0.5
#>  8     8   7.5   0.5
#>  9     9   8.5   0.5
#> 10    10   9.5   0.5
#> # … with 3,350 more rows
```
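Why three rows (minimum, maximum, cell size) suffice to describe all 3,360 points can be sketched in base R. The function unpackGrid() is hypothetical and not geometr's implementation; it assumes, as the output above suggests, that points sit at cell centres and that x varies fastest.

```r
# Hypothetical sketch of unpacking a grid description into explicit points.
# Rows of 'gridDef': 1 = minimum, 2 = maximum, 3 = cell size.
unpackGrid <- function(gridDef) {
  xs <- seq(from = gridDef$x[1] + gridDef$x[3] / 2,
            to   = gridDef$x[2] - gridDef$x[3] / 2, by = gridDef$x[3])
  ys <- seq(from = gridDef$y[1] + gridDef$y[3] / 2,
            to   = gridDef$y[2] - gridDef$y[3] / 2, by = gridDef$y[3])
  cells <- expand.grid(x = xs, y = ys)  # x varies fastest, as in getPoints() above
  data.frame(fid = seq_len(nrow(cells)), x = cells$x, y = cells$y)
}

gridDef <- data.frame(x = c(0, 60, 1), y = c(0, 56, 1))
pts <- unpackGrid(gridDef)
nrow(pts)
#> [1] 3360
head(pts, 3)
#>   fid   x   y
#> 1   1 0.5 0.5
#> 2   2 1.5 0.5
#> 3   3 2.5 0.5
```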
In contrast to Raster* objects of the raster package, the values in a grid geom are run-length encoded if that results in a smaller object, which is often the case for rasters with categorical values.
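Run-length encoding stores each run of identical values only once, together with the length of the run, which corresponds to the val and len columns of the @feature slot shown below. The following base R sketch applies rle() and inverse.rle()-style unpacking to the first three runs of that output (31 repeated 2 times, 47 repeated 10 times, 44 repeated 7 times). The helper encodeIfSmaller() is hypothetical, and comparing the number of stored numbers is a simplifying assumption for "results in a smaller object", not necessarily how geometr measures size.

```r
# First runs of the categorical values (see the val/len columns of the @feature slot).
values <- rep(c(31L, 47L, 44L), times = c(2, 10, 7))

# Hypothetical helper: keep the run-length encoding only if it stores fewer numbers
# (one value plus one length per run) than the plain vector.
encodeIfSmaller <- function(v) {
  r <- rle(v)
  if (2 * length(r$lengths) < length(v)) {
    data.frame(val = r$values, len = r$lengths)
  } else {
    data.frame(values = v)
  }
}

enc <- encodeIfSmaller(values)
enc
#>   val len
#> 1  31   2
#> 2  47  10
#> 3  44   7

# Unpacking: repeat each value by its run length and number the cells as features.
decoded <- data.frame(fid = seq_len(sum(enc$len)), values = rep(enc$val, enc$len))
identical(decoded$values, values)
#> [1] TRUE
```

For a vector without repeated neighbours, such as 1:5, the encoding would need more numbers than the original, so the sketch keeps the plain values instead.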
As with points, the getter getFeatures() unpacks the @feature slot into its interoperable form.
```r
gtGeoms$grid$categorical@feature
#> $categorical
#> # A tibble: 726 x 2
#>      val   len
#>    <int> <int>
#>  1    31     2
#>  2    47    10
#>  3    44     7
#>  4    21    27
#>  5    27    14
#>  6    31     1
#>  7    47    11
#>  8    44     8
#>  9    21     5
#> 10    41     4
#> # … with 716 more rows

getFeatures(x = gtGeoms$grid$categorical)
#> # A tibble: 3,360 x 3
#>      fid   gid values
```