A modern terrestrial laser scanner can produce up to two million range measurements every second. By the end of a working day it has handed back a file with several hundred million points and no indication of which of them describe the cathedral wall, which describe the scaffolding the surveyor walked past, and which are noise from a pigeon that crossed the beam. The raw output of these instruments is, in the most literal sense, unstructured. Turning it into a digital terrain model, a building reconstruction, or a model of an arcaded cloister fit for restoration planning is the work of the next eighteen chapters. This first chapter sets out what a point cloud is, where it comes from, and the pipeline of operations that this book follows.
A laser scanner fires up to a few million laser pulses per second at the surfaces around it and records where each pulse strikes. The resulting set of measured positions in three-dimensional space is called a point cloud.
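More formally, a point cloud is a discrete set of points sampled from surfaces in three-dimensional Euclidean space:
$$P = \{\, \mathbf{p}_i \in \mathbb{R}^3 \;\mid\; i = 1, \dots, N \,\}$$
where each $\mathbf{p}_i = (x_i, y_i, z_i)$ is one measured position and $N$ is the number of points.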
Unlike images (regular 2D grids of pixels) or meshes (vertices with explicit connectivity), a point cloud is an unstructured, unordered collection of samples with no inherent grid, no predefined neighbour relationships, and no guarantee of uniform spacing. This simplicity is both its greatest advantage and its primary challenge.
Advantage. A point cloud can represent any geometry without requiring connectivity information or topological constraints. A tree with thousands of leaves, a rocky cliff face with overhangs, the interior of a cave: all can be captured, and finer detail is obtained simply by adding more points.
Challenge. Neighbourhood relationships, surface normals, and connectivity must all be computed from the raw points. This requires building spatial data structures (such as kd-trees or octrees) to answer queries like "which points are close to this point?" efficiently. Every subsequent processing step depends on these computed relationships.
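To make the "which points are close to this point?" query concrete, here is a minimal kd-tree sketch in plain Python. It is an illustration, not the implementation used later in the book: the function names `build_kdtree` and `nearest`, the tuple-based node layout, and the random test cloud are all assumptions made for this example, and a production pipeline would normally rely on an optimised library.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % 3                      # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes the node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % 3
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

# Hypothetical test data: 2000 random points in the unit cube.
random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(2000)]
tree = build_kdtree(cloud)
query = (0.5, 0.5, 0.5)
dist, point = nearest(tree, query)

# Cross-check against an exhaustive search over every point.
brute = min(cloud, key=lambda p: math.dist(p, query))
```

The pruning test in `nearest` is what saves work: whole subtrees are skipped whenever the splitting plane lies farther away than the best candidate found so far.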
A modern terrestrial laser scanner can acquire up to 2 million points per second. A single scan of a building façade might contain 50 million points, and an airborne LiDAR survey of a city can produce several billion. At this scale, even computing the nearest-neighbour distance for every point requires careful algorithmic design to remain tractable.
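A rough back-of-the-envelope calculation shows why scale matters. The sketch below assumes coordinates are stored as three 32-bit floats per point (an assumption; real formats vary) and compares an exhaustive all-pairs search with the roughly N log N work a tree-based search would need. The figures are order-of-magnitude estimates only.

```python
import math

n_points = 50_000_000        # single building scan, as quoted in the text
bytes_per_point = 3 * 4      # assumption: x, y, z as 32-bit floats, no attributes

coordinate_megabytes = n_points * bytes_per_point / 1e6
brute_force_distances = n_points ** 2                  # every point against every point
tree_query_steps = n_points * math.log2(n_points)      # rough cost with a balanced tree

print(f"coordinates alone: {coordinate_megabytes:.0f} MB")
print(f"brute-force distance evaluations: {brute_force_distances:.1e}")
print(f"approximate tree-based steps: {tree_query_steps:.1e}")
```

Under these assumptions the coordinates alone occupy about 600 MB, and the exhaustive search needs 2.5 × 10^15 distance evaluations, around six orders of magnitude more than the tree-based estimate.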
Each point typically carries more than spatial coordinates. Depending on the sensor and the processing pipeline, a point may include a variety of attributes.
Intensity is the most widely available per-point attribute. It measures the power of the returned laser pulse and depends on surface reflectivity, incidence angle, and range. Because different materials reflect laser light differently, intensity values can distinguish, for example, asphalt (low reflectivity) from painted road markings (high reflectivity) even when their geometry is identical.
When a camera is co-registered with the scanner, RGB colour can be projected onto each point, adding photorealistic appearance to the geometric data. This combination is standard in mobile mapping systems and increasingly common in terrestrial laser scanning.
For airborne LiDAR, the return number and number of returns are critical attributes. A single emitted laser pulse can generate multiple returns when it hits semi-permeable surfaces such as vegetation canopies. The first return typically corresponds to the top of the canopy, intermediate returns to branches and leaves, and the last return to the ground beneath. This multi-return capability is one of the key advantages of LiDAR over photogrammetry for mapping forested terrain.
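The return attributes can be used directly as a first, crude filter. The sketch below is illustrative only: the `LidarReturn` record and the sample values are hypothetical, and last returns are merely candidates for ground, since a pulse can also end on a roof or a dense branch.

```python
from dataclasses import dataclass

@dataclass
class LidarReturn:
    x: float
    y: float
    z: float
    return_number: int       # 1 for the first echo of a pulse
    number_of_returns: int   # total echoes recorded for that pulse

def first_returns(points):
    """Echoes that arrived first: typically the top of whatever was hit."""
    return [p for p in points if p.return_number == 1]

def last_returns(points):
    """Echoes that arrived last: candidates for the ground beneath vegetation."""
    return [p for p in points if p.return_number == p.number_of_returns]

# Hypothetical pulse through a tree: canopy top, a branch, then the ground.
pulse_through_tree = [
    LidarReturn(10.0, 20.0, 18.2, 1, 3),
    LidarReturn(10.0, 20.0, 11.7, 2, 3),
    LidarReturn(10.0, 20.0, 0.4, 3, 3),
]
# Hypothetical pulse on open pavement: the single echo is both first and last.
pulse_on_road = [LidarReturn(15.0, 20.0, 0.3, 1, 1)]

cloud = pulse_through_tree + pulse_on_road
canopy_candidates = first_returns(cloud)
ground_candidates = last_returns(cloud)
```

Note that the single-return point on the road appears in both lists, which is why return numbers alone cannot separate ground from canopy.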
A semantic classification label is often assigned during post-processing, identifying each point as ground, vegetation, building, water, or another category drawn from a standard such as the ASPRS classification scheme. Whether the classification was produced by a hand-tuned rule, a random forest, or a deep neural network is a question this book returns to in chapter 11.
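In practice a classification label is stored as a small integer per point. The sketch below lists a subset of the ASPRS standard class codes as used in LAS files and counts points per class; the `summarise` helper and the sample labels are hypothetical, and codes not in the table are simply reported as "other".

```python
from collections import Counter

# Subset of the ASPRS standard point classes (LAS format).
ASPRS_CLASSES = {
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
    6: "building",
    9: "water",
}

def summarise(labels):
    """Count points per class name; codes missing from the table become 'other'."""
    return Counter(ASPRS_CLASSES.get(code, "other") for code in labels)

# Hypothetical per-point labels, one integer code per point.
labels = [2, 2, 5, 6, 6, 6, 9, 1, 64]
summary = summarise(labels)
```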
The point cloud is not the only way to represent 3D geometry. Several alternative representations exist, each with different trade-offs in flexibility, memory, and ease of processing.
A mesh consists of vertices connected by edges to form triangular or polygonal faces; in geomatics, a triangle mesh of a terrain surface is usually called a triangulated irregular network, or TIN. A mesh explicitly encodes the topology of the surface: which vertices are connected, which faces are adjacent, and what the surface looks like between sample points. This makes meshes ideal for visualisation, simulation (for example, finite element analysis), and volume computation. The cost is that constructing a mesh from raw measurements requires a surface reconstruction step (chapter 14), and the resulting mesh is always an approximation of the true surface.
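A small sketch shows what explicit connectivity buys. A mesh is stored here as a vertex list plus faces that index into it, which is one common layout and an assumption of this example. With that layout, surface area and face adjacency follow directly from the data, with no neighbour search. The helper names are hypothetical.

```python
import math

# Four vertices of a unit square in the z = 0 plane.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# Each face is a triple of indices into `vertices`: the explicit connectivity.
faces = [(0, 1, 2), (0, 2, 3)]

def triangle_area(a, b, c):
    """Half the length of the cross product of two edge vectors."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def surface_area(vertices, faces):
    """Sum of the areas of all triangular faces."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)

def shared_edges(faces):
    """Edges that belong to more than one face, i.e. where faces are adjacent."""
    seen, shared = set(), set()
    for face in faces:
        for k in range(3):
            edge = tuple(sorted((face[k], face[(k + 1) % 3])))
            (shared if edge in seen else seen).add(edge)
    return shared

area = surface_area(vertices, faces)   # the two triangles tile the unit square
adjacent = shared_edges(faces)         # the diagonal between vertices 0 and 2
```

None of this is possible on a raw point cloud without first estimating which points belong to the same local surface.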
A voxel grid discretises 3D space into a regular grid of small cubes (volume elements, or voxels). Each voxel is either occupied or empty, or it may store a value such as density or colour. Voxels are the 3D analogue of pixels in a 2D image. They are particularly useful for volumetric analysis and for algorithms that require regular spatial sampling (such as convolutional neural networks). However, the memory cost grows cubically with resolution ($O(n^3)$ for a grid of side length $n$), which limits the achievable level of detail. Sparse representations like octrees mitigate this cost by only storing occupied regions of space.
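The following sketch voxelises a handful of points and contrasts the dense and sparse views. It stores only occupied voxel indices in a set, which is a simple stand-in for the sparse structures mentioned above rather than an octree; the function names and sample points are hypothetical.

```python
import math

def voxelise(points, voxel_size):
    """Map each point to the integer index of its voxel; return the occupied set."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

def dense_cell_count(n):
    """Number of cells in a dense cubic grid with n voxels along each side."""
    return n ** 3

# Hypothetical points (metres); the first two fall in the same 0.5 m voxel.
points = [
    (0.10, 0.20, 0.05),
    (0.40, 0.45, 0.30),
    (1.20, 0.10, 0.00),
    (-0.10, 0.00, 0.00),
]
occupied = voxelise(points, voxel_size=0.5)

# Halving the voxel size doubles n, so a dense grid needs eight times the cells.
growth = dense_cell_count(200) // dense_cell_count(100)
```

Four points occupy three voxels here, whereas a dense grid pays for every cell in the bounding volume whether or not anything was measured there.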
A depth map stores a single height or distance value for each cell in a regular 2D grid. This "2.5D" representation is appropriate when each horizontal position has at most one surface value, as in many digital terrain models (DTMs) and building rooftops viewed from above. Depth maps cannot, however, represent overhangs, tunnels, or multi-layered structures like vegetation canopies.
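The limitation is easy to demonstrate. The sketch below rasterises points into a 2D grid and keeps the highest z value per cell. Keeping the maximum is a choice made for this example, and keeping the minimum would favour the ground instead. The function name and sample coordinates are hypothetical.

```python
import math

def rasterise_max_height(points, cell_size):
    """Keep the highest z per (column, row) cell; lower points in the cell are lost."""
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in grid or z > grid[cell]:
            grid[cell] = z
    return grid

# Hypothetical points (metres): a bridge deck above a road, plus open ground nearby.
points = [
    (0.2, 0.3, 0.0),   # road surface under the bridge
    (0.4, 0.6, 6.5),   # bridge deck directly above it
    (1.5, 0.5, 0.1),   # open ground in the neighbouring cell
]
height_map = rasterise_max_height(points, cell_size=1.0)
```

The cell containing the bridge ends up with a single value, the deck height, and the road beneath it disappears from the representation entirely.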
What I want the reader to take away from this chapter is a sense that the field has matured precisely because the point cloud is so honest. It does not pretend to know the surface, it does not pretend to know the connectivity, it does not pretend to know what is signal and what is noise. Every subsequent chapter is an answer to a question the raw cloud refuses to answer for itself. The reader who internalises this will move through the rest of the book with the right kind of patience.