Chapter 04

Data Formats and Spatial Indexing

A national mapping agency now produces between 30 and 100 terabytes of LiDAR every year, and the format this annual harvest is written in decides whether the data will still be queryable in five years or stranded behind an obsolete tool chain. The decision sounds bureaucratic until one tries to reach into a 50 GB ALS tile to compute a single statistic and discovers that without a spatial index the query takes hours, because every algorithm that touches a point's neighbourhood is quadratic in the worst case, and on a hundred million points that means $10^{16}$ distance computations. A multi-station TLS campaign tells the same story from the terrestrial side: a dozen stations arrive in the scanner-native E57 format, are converted to LAS for the downstream tool chain, and in that conversion a quiet decision is made about which attributes survive the round trip and which silently disappear. I have watched point cloud projects fail at this stage more often than at any deep methodological step that follows, and the failure usually surfaces weeks later, when a clean experimental result proves unreproducible because the data was rewritten through a lossy converter no one logged. We open by laying out the major formats (LAS, LAZ, COPC, E57, PLY, PCD, EPT) and then turn to the indexing structures (kd-tree, octree, R-tree, grid, ball tree) that make billion-point queries feasible.

Unlock with Book + Videos

With Book + Videos

Video 4.1

Inside the LAS and LAZ format

A real LAS file opened byte by byte: the header, the variable length records, the point records, and what LAZ compression does to each field.

These two themes, storage and retrieval, structure the chapter. We treat the file formats first and the spatial indexing structures second, because a format is the thing one receives and an index is the thing one builds on top of it.

Point cloud file formats

The choice of file format affects storage size, read and write speed, attribute support, and interoperability with downstream software.

LAS and LAZ

The LAS format, maintained by the ASPRS (American Society for Photogrammetry and Remote Sensing), is the de facto standard for airborne LiDAR data worldwide. The format has evolved through several versions: LAS 1.0 (2003), 1.1, 1.2, 1.3, and the current LAS 1.4 (first ratified in 2011, with revisions through 2019), which introduced 64-bit point counts, point record formats 6 to 10 with expanded fields (for example, 8-bit classification supporting 256 classes instead of the earlier 5-bit limit of 32), and support for up to 15 returns per pulse. Its key characteristics are:

Binary format with a fixed-size public header, optional Variable Length Records (VLRs) for metadata (coordinate reference system, extra attributes, waveform descriptors), and a contiguous block of point data records.
Multiple point record formats (numbered 0 to 10) that define which attributes are stored per point. The table below summarises the most common formats.
Scaled integer coordinates: $x$ , $y$ , and $z$ are stored as 32-bit integers with user-defined scale factors and offsets, so that $x_{\text{float}} = x_{\text{int}} \cdot s_x + o_x$ . This is more storage-efficient than 64-bit floating point and avoids precision loss when coordinates are far from the origin.

Paid chapter

Read the rest of Data Formats and Spatial Indexing

You have just read the opening section. Pick one and keep reading.

Book€39

Read all 19 chapters online here, and download the complete PDF on release.

Book + Videos€149

Everything in the book, plus the companion video course as it ships.

By Abderrazzaq Kharroubi, geomatics engineer: a decade of LiDAR fieldwork, 2,000+ students taught. One payment, lifetime access, 30-day refund.

Anything you highlight or note stays with you after you buy. Compare all tiers or sign in if you already bought.

Related chapters

05Sampling and ResamplingPreview available 06Filtering and DenoisingPreview available 03Acquisition SystemsFree

Point cloud file formats

The choice of file format affects storage size, read and write speed, attribute support, and interoperability with downstream software.

LAS and LAZ

Binary format with a fixed-size public header, optional Variable Length Records (VLRs) for metadata (coordinate reference system, extra attributes, waveform descriptors), and a contiguous block of point data records.

Multiple point record formats (numbered 0 to 10) that define which attributes are stored per point. The table below summarises the most common formats.

Scaled integer coordinates:

x

y

, and

z

are stored as 32-bit integers with user-defined scale factors and offsets, so that

x_{\text{float}} = x_{\text{int}} \cdot s_x + o_x

. This is more storage-efficient than 64-bit floating point and avoids precision loss when coordinates are far from the origin.

Read the rest of Data Formats and Spatial Indexing

You have just read the opening section. Pick one and keep reading.

Book€39

Read all 19 chapters online here, and download the complete PDF on release.

Book + Videos€149

Everything in the book, plus the companion video course as it ships.

By Abderrazzaq Kharroubi, geomatics engineer: a decade of LiDAR fieldwork, 2,000+ students taught. One payment, lifetime access, 30-day refund.

Anything you highlight or note stays with you after you buy. Compare all tiers or sign in if you already bought.