A LAS class is a representation in R of a las file that aims to respect as closely as possible the official LAS specification that describes the file format. “As closely as possible” means that, due to R internal limitations, it is not possible to represent a las file exactly as it should be represented. Additionally, some aspects of the las format specifications are not supported in lidR. Still, the contents of a
LAS object must reflect the fact that it is a representation of a standardized format, so some restrictions are imposed on users.
readLAS reads one or several .las or .laz file(s) to build a LAS object.
LASfile <- system.file("extdata", "Megaplot.laz", package="lidR") las <- readLAS(LASfile) print(las) #> class : LAS (v1.2 format 1) #> memory : 6.2 Mb #> extent : 684766.4, 684993.3, 5017773, 5018007 (xmin, xmax, ymin, ymax) #> coord. ref. : NAD83 / UTM zone 17N #> area : 53112.69 m² #> points : 81.6 thousand points #> density : 1.54 points/m²
LAS object is composed of four slots:
@bbox, and inherits
Spatial from package
data of a LAS object contains a
data.table with the data read from .las or .laz file(s). The columns of the table are named after the LAS specification version 1.4. Each name is reserved and is associated with a given type:
ScannerChannel(int) (point format >= 6)
Overlap_flag(bool) (point format >= 6)
ScanAngleRank(int) (point format < 6)
ScanAngle(dbl) (point format >= 6)
Here we can already see some deviations from the official las format specifications. For example, the attribute ‘Classification’ should be an
unsigned char stored on 8 bits. However, the R language does not support this data type and consequently this attribute is stored in a 32-bit signed
int. One can read the official las specifications to figure out the other deviations from the original file format induced by the fact that R only has 32-bit signed integers and 64-bit signed decimal numbers.
LAS object contains a slot
@header that represents the header of the las file. The header is stored in a
LASheader object. A
LASheader object contains two slots:
@PHB for the public header block and
@VLR for the variable length records. Both slots are lists labelled according to the las file format specification. See public documentation of las file format for more information about las headers. Users should never normally have to worry about the header as long as they use functions from lidR. Everything is managed internally to ensure that objects are valid. However, users still need to know that the contents of the header are important, especially when writing
LAS objects into las or laz files.
print(las@header) #> File signature: LASF #> File source ID: 0 #> Global encoding: #> - GPS Time Type: GPS Week Time #> - Synthetic Return Numbers: no #> - Well Know Text: CRS is GeoTIFF #> - Aggregate Model: false #> Project ID - GUID: 00000000-0000-0000-0000-000000000000 #> Version: 1.2 #> System identifier: LAStools (c) by rapidlasso GmbH #> Generating software: las2las (version 171231) #> File creation d/y: 0/0 #> header size: 227 #> Offset to point data: 321 #> Num. var. length record: 1 #> Point data format: 1 #> Point data record length: 28 #> Num. of point records: 81590 #> Num. of points by return: 55756 21493 3999 342 0 #> Scale factor X Y Z: 0.01 0.01 0.01 #> Offset X Y Z: 0 0 0 #> min X Y Z: 684766.4 5017773 0 #> max X Y Z: 684993.3 5018007 29.97 #> Variable length records: #> Variable length record 1 of 1 #> Description: by LAStools of rapidlasso GmbH #> Tags: #> Key 1024 value 1 #> Key 3072 value 26917 #> Key 3076 value 9001 #> Key 4099 value 9001
@proj4string is inherited from the
Spatial class from the
sp package. It is a
CRS object that stores the coordinate reference system (CRS) of the las file. In the official las specifications the CRS is stored in the header. In a LAS object the CRS is stored in the header using the EPSG code of the CRS or a WKT string (depending how it has been recorded in the file), but it is also stored in the slot
@proj4string. This is to ensure it meets R standards and is in accordance with other spatial data packages in the R ecosystem. Consequently, to get a valid LAS object properly written into a las file it is important to set the CRS using the function
projection(). This function updates the header of the LAS object and the proj4string.
According to LAS specification the CRS system can also be a WKT string when the WKT bit flag is set to
las@header@PHB[["Global Encoding"]][["WKT"]] = TRUE projection(las) <- sp::CRS("+init=epsg:26917") projection(las) #>  "+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs" # Header has been updated but users do not need to take care of that las@header@VLR[["WKT OGC CS"]][["WKT OGC COORDINATE SYSTEM"]] #>  "PROJCRS[\"NAD83 / UTM zone 17N\",\n BASEGEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]],\n CONVERSION[\"UTM zone 17N\",\n METHOD[\"Transverse Mercator\",\n ID[\"EPSG\",9807]],\n PARAMETER[\"Latitude of natural origin\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8801]],\n PARAMETER[\"Longitude of natural origin\",-81,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8802]],\n PARAMETER[\"Scale factor at natural origin\",0.9996,\n SCALEUNIT[\"unity\",1],\n ID[\"EPSG\",8805]],\n PARAMETER[\"False easting\",500000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8806]],\n PARAMETER[\"False northing\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8807]],\n ID[\"EPSG\",16017]],\n CS[Cartesian,2],\n AXIS[\"(E)\",east,\n ORDER,\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]],\n AXIS[\"(N)\",north,\n ORDER,\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]],\n USAGE[\n SCOPE[\"unknown\"],\n AREA[\"North America - 84°W to 78°W and NAD83 by country\"],\n BBOX[23.81,-84,84,-78]]]"
@bbox is inherited from the
Spatial class from
sp. It is a
matrix object that stores the XY bounding box of the point cloud. In the official las specifications the bounding box is stored in the header. In a
LAS object the bounding box is stored both in the header and also stored in the slot
@bbox (to be in compliance with R standards and other spatial data R packages). The user should never change the bounding box manually. However, doing that will have few consequences because this slot is of little practical use.
R users who are used to manipulating spatial data are likely to be very familiar with the
sp package and all the classes used to store spatial data, such as
SpatialPolygonsDataFrame, and so on. The data contained in these classes are freely modifiable by the user because they can be of any type. A
LAS object is not freely modifiable because it is a strongly standardized representation of a las file.
For example, users cannot replace the
Classification attribute with the value
0 is a decimal number in R and the ‘Classification’ attribute is an integer. The following throws an error:
0L is an integer and thus the following is allowed:
It would be possible to automatically cast the input into the correct type without throwing an error. But for the lidR package we chose to be very pedantic on this point to avoid any potential problems and because we would prefer users to be careful about the content of their data.
The addition of a new column is also restricted. For example, one may want to add an attribute
R corresponding to the red channel.
This is not allowed because a LAS object should always be valid. By allowing the user to add an R column the LAS object would no longer be valid for two reasons:
Ris a reserved name of the core attributes and must be an integer. In the example above it is a decimal number.
In consequence, adding a column must be done via the functions
add_lasattribute. This way users are forced to read the documentation of these two functions. And yet some restrictions are still in place. For example, the following is not allowed for the same reasons as above:
But anyway, R being R, there is no way to completely restrict editing of objects. Users can always by-pass the restrictions to make LAS objects that are not strictly valid:
In conclusion, a LAS object is not actually immutable but at least there are some restrictions to ensure that the user is aware that not everything is authorized.
As we have seen, a LAS object contains a core of attributes associated with reserved names in accordance with the las specifications. It is possible, however, to add more attributes to a LAS object even if they are not part of the core attributes imposed by the las specifications.
Extra attributes are just like adding a column in a regular table in R. One can freely modify the data using the function
add_attribute. It is thus possible to add an attribute to a LAS object. For example, it is possible to attribute an ID to each point and use this value in subsequent code:
But it is important to understand that this attribute is invalid with respect to the las specifications. Thus it can be used at the R level but will not be written in a las file and thus will be lost at write time. Depending on the purpose of this attribute it may or may not be useful to be able to write this extra data. Most of the time the information is only useful at the R level but sometimes it might be appropriate to store the data in a file.
The las specifications allow for storing extra attributes that are not part of the core attributes. but the way to do this is more complex. Basically it is called extra bytes attributes and it implies modification of the LAS object header to indicate that the contents of the file contains more than the core attributes. This is abstracted with the function
Using this function, the header is updated according to the las specification and thus the extra bytes attributes can be written in the file. lidR supports up to 10 extra bytes attributes. The extra bytes attributes are limited to being of type numeric. Indeed, the las specifications do not allow for storing extra bytes attributes of type string or type boolean. Thus the following fails:
It is common that users report bugs arising from the fact that a point cloud is invalid. This is why we introduced the function
las_check to perform a deep inspection of LAS objects. This function checks if a LAS object is in accordance with the las specifications but also it checks for weird point clouds that could be valid with respect to the specifications but invalid for actual processing. For example, it often happens that a las file contains duplicated points for no valid reason. This may lead to trees being detected twice, to invalid metrics, or to errors in DTM generation, and so on…
las_check(las) #> #> Checking the data #> - Checking coordinates... ✓ #> - Checking coordinates type... ✓ #> - Checking coordinates quantization... ✓ #> - Checking attributes type... ✓ #> - Checking ReturnNumber validity... ✓ #> - Checking NumberOfReturns validity... ✓ #> - Checking ReturnNumber vs. NumberOfReturns... ✓ #> - Checking RGB validity... ✓ #> - Checking absence of NAs... ✓ #> - Checking duplicated points... ✓ #> - Checking degenerated ground points... skipped #> - Checking attribute population... #> ⚠ 'PointSourceID' attribute is not populated. #> ⚠ 'EdgeOfFlightline' attribute is not populated. #> - Checking gpstime incoherances ✓ #> - Checking flag attributes... ✓ #> - Checking user data attribute... ✓ #> Checking the header #> - Checking header completeness... ✓ #> - Checking scale factor validity... ✓ #> - Checking point data format ID validity... ✓ #> - Checking extra bytes attributes validity... ✓ #> - Checking the bounding box validity... ✓ #> - Checking coordinate reference sytem... ✓ #> Checking header vs data adequacy #> - Checking attributes vs. point format... ✓ #> - Checking header bbox vs. actual content... ✓ #> - Checking header number of points vs. actual content... ✓ #> - Checking header return number vs. actual content... ✓ #> Checking preprocessing already done #> - Checking ground classification... no #> - Checking normalization... yes #> - Checking negative outliers... ✓ #> - Checking flightline classification... no
lidR provides a simple
plot function to plot a LAS object in 3D. It is based on the
rgl package. The
rgl package is amazing but has some problems working with large point clouds. We are currently developing our own viewer to overcome this issue. This viewer is fully compatible with
lidR but still in heavy development.
color expects the name of the attribute you want to use to colorize the points. Default is
If your file contains RGB data the string
"RGB" is supported:
trim parameter enables trimming of values when outliers break the color palette range. For example, Intensity often contains large outliers. The palette range would be too large and most of the values will be considered as “very low”, so everything will appear in the same color.
This section is of major importance because there are many instances where R is weak at memory management.
Firstly, it is important to note that R only enables manipulation of 32-bit integers and 64-bit decimal numbers. But the las specification states, for example, that the intensity is stored on 16 bits (see previous sections). When read in R it must be converted to 32 bits and therefore will use twice as much memory than is needed. Worse, the return numbers are stored on 3 bits in las files but 32 bits in R, therefore using 11 times more memory than is required. Last but not least, flags are stored on 1 bit, whereas R uses 32 bits. This is 32 times more memory than is needed. As a consequence, a LAS object is 2 to 3 times larger than it needs to be.
Secondly, the way the point cloud is stored and the way R works implies that copies will be made of the point cloud either in the user’s workspace or internally. Considering that point clouds can be huge it is important to be aware of this point.
Let’s assume we have loaded a large las file that uses 1 GB of R memory.
Suppose we now want to remove a few outliers above 50 m. One can write the following:
And the user now has two objects:
las.originalof size 1 GB
las.denoisedthat is also 1 GB, because we only removed a dozen or so points out of millions.
This uses 2 GB of memory. This is how R works. When a vector is subsetted it is necessarily copied. We talk about deep copies. In regular data processing it rarely matters and this behavior is barely noticeable. Indeed, it is rare that data uses a lot of memory. But LiDAR datasets are often massive, and this necessitates that users must carefully consider memory usage to avoid running out of RAM.
In the previous example we showed a deep copy. A deep copy means that the point cloud is actually copied into the memory. A deep copy occurs when the number of points of the output is different from the number of points of the input. But many functions return the same number of point as the input. In such cases only shallow copies are made. For example, when classifying points into ground and non-ground:
In this case the vectors that store the X Y Z coordinates as well as those that store the Intensity, ReturnNumber, NumberOfReturn and other attributes were not modified by the function. Only the contents of the ‘Classification’ attribute were modified. In this case
las.original, even though they are two different objects, share the same memory for X Y Z, and so on, but the attributes ‘Classification’ are different. In conclusion:
las.originalis of size 1 GB
las.classifiedis also 1 GB.
But both together they are not equal to 2 GB, but ~1.1 GB because they share the same memory. The content of the original LAS object was shallow copied. An understanding of the concepts of deep and shallow copies is important for optimizing your scripts.
As we have seen, because of the way R is designed, lidR uses a large amount of memory anyway. To deal with this limitation
readLAS has two optimizations: the parameter
select and the parameter
To save memory only useful data can be loaded.
readLAS can take an optional parameter
select which enables the user to selectively load the data of interest. For example, one can load only the
X Y Z fields. This selection is done at the C++ level while reading and is memory-optimized.
select enables the user to select “columns” (or attributes) while reading files,
filter allows selection of “rows” (or points) while reading. Again, the selection is done at the C++ level and is memory-optimized so not a single bit is lost at the R level. Removing data at reading time that is superfluous for your purposes saves memory and decreases computation time.