TypeScript package to validate COPC files in NodeJS or Browser environments

hobuinc/copc-validator

COPC Validator

Table of Contents

  1. Introduction
    1. Getting Started
  2. Usage
    1. CLI
      1. Options
    2. Import
      1. Options
  3. Scans
    1. Shallow scan
    2. Deep scan
    3. Output
  4. Details
    1. Checks
      1. Status & Check Objects
      2. Functions
      3. Suites
      4. Parsers
      5. Collections
      6. All checks
    2. Report
      1. Report schema
  5. Future Plans

Introduction

COPC Validator is a library & command-line application for validating the header and content of a Cloud-Optimized Point Cloud (COPC) LAS file. Extending the copc.js library, it accepts either a (relative) file path or a COPC URL, and runs a series of checks against the values parsed by copc.js.

Getting Started

  1. Install from npm

    npm i -g copc-validator
    

    Global install is recommended for CLI usage

  2. Scan copc.laz file with copcc CLI

Examples:

  • Default

    copcc ./path/to/example.copc.laz
    
  • Deep scan, output to <pwd>/output.json

    copcc --deep path/to/example.copc.laz --output=output.json
    
  • Deep & Minified scan with worker count = 64, showing a progress bar

    copcc path/to/example.copc.laz -dmpw 64
    

Usage

COPC Validator has two main usages: via the copcc Command-Line Interface (CLI), or imported as the generateReport() function

CLI

copcc [options] <path>

The usage and implementation of COPC Validator are meant to be as simple as possible. The CLI needs only one file path and runs a shallow scan by default, or a deep scan if given the --deep option. All other functionality is completely optional.

Options

| Option | Alias | Description | Type | Default |
|---|---|---|---|---|
| deep | d | Read all points of each node; Otherwise, read only root point | boolean | false |
| name | n | Replace name in Report with provided string | string | <path> |
| mini | m | Omit Copc or Las from Report, leaving checks and scan info | boolean | false |
| pdal | P | Output a pdal.metadata object containing header & vlr data in pdal info format | boolean | false |
| workers | w | Number of Workers to create - Use at own (performance) risk | number | CPU-count |
| queue | q | Queue size limit for reading PDR data. Useful for very high node counts (>10000) | number | Unlimited |
| sample | s | Select a random sample of nodes to read & validate | number | All nodes |
| progress | p | Show a progress bar while reading the point data | boolean | false |
| output | o | Writes the Report out to provided filepath; Otherwise, writes to stdout | string | N/A |
| help | h | Displays help information for the copcc command; Overwrites all other options | boolean | N/A |
| version | v | Displays copc-validator version (from package.json) | boolean | N/A |

Import

  1. Add to project:

    yarn add copc-validator
      # or
    npm i copc-validator
  2. Import generateReport():

    import { generateReport } from 'copc-validator'
  • Example:
    async function printReport() {
      const report = await generateReport({
        source: 'path/to/example.copc.laz',
        options: {} // default options
      })
      console.log(report)
    }
  3. Copy laz-perf.wasm to /public (for browser usage)

Options

generateReport accepts most* of the same options as the CLI through the options property of the first parameter:

TypeScript:

const generateReport = ({
  source: string | File,
  options?: {
    name?: string          //default: source | 'COPC Validator Report'
    mini?: boolean         //default: false
    pdal?: boolean         //default: false
    deep?: boolean         //default: false
    workers?: number       //default: CPU Thread Count
    queueLimit?: number    //default: Infinity
    sampleSize?: number    //default: All nodes
    showProgress?: boolean //default: false
  },
  collections?: {copc, las, fallback}
})

See below for collections information

* Key option differences:

  • No output, help, or version options
  • queue is renamed to queueLimit
  • sample is renamed to sampleSize
  • progress is renamed to showProgress
    • Not usable in a browser
  • Any Alias (listed above) will not work
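
The renames listed above can be captured in a small adapter. This is a minimal sketch and not part of copc-validator: `CliOptions`, `ReportOptions`, and `toReportOptions` are hypothetical names, and their shapes simply mirror the two option listings in this README.

```typescript
// Hypothetical shape mirroring the CLI option table (long names only, no aliases).
type CliOptions = {
  deep?: boolean
  name?: string
  mini?: boolean
  pdal?: boolean
  workers?: number
  queue?: number
  sample?: number
  progress?: boolean
  output?: string // CLI-only: generateReport has no equivalent
}

// Hypothetical shape mirroring the generateReport options listed above.
type ReportOptions = {
  name?: string
  mini?: boolean
  pdal?: boolean
  deep?: boolean
  workers?: number
  queueLimit?: number
  sampleSize?: number
  showProgress?: boolean
}

// Translate CLI-style options into generateReport-style options:
// apply the three renames and drop the CLI-only `output` key.
function toReportOptions(cli: CliOptions): ReportOptions {
  const { queue, sample, progress, output: _cliOnly, ...shared } = cli
  const options: ReportOptions = { ...shared }
  if (queue !== undefined) options.queueLimit = queue
  if (sample !== undefined) options.sampleSize = sample
  if (progress !== undefined) options.showProgress = progress
  return options
}
```

The result could then be passed as the options property of generateReport's first parameter.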

Scans

COPC Validator comes with two scan types, shallow and deep

(see requirements.md for more details)

The report output supports a custom scan type, intended to be used by other developers that may extend the base functionality of COPC Validator. It is not currently used anywhere in this library.

Shallow scan

The shallow scan checks the LAS Public Header Block and various Variable Length Records (VLRs) to ensure the values adhere to the COPC specifications (found here)

This scan will also check the root (first) point of every node (in the COPC Hierarchy) to ensure those points are valid according to the contents of the Las Header and COPC Info VLR

Deep scan

The deep scan performs the same checks as a shallow scan, but scans every point of each node rather than just the root point, in order to validate the full contents of the Point Data Records (PDRs) against the COPC specs and Header info
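
The difference between the two scan types comes down to which points of each node are read. The sketch below illustrates that selection only; `NodeCounts` and `pointIndicesToRead` are hypothetical names, and the real implementation reads and validates point data through copc.js and Workers rather than building index lists.

```typescript
// Hypothetical node summary: hierarchy key ('D-X-Y-Z') mapped to its point count.
type NodeCounts = Record<string, number>

// Return, per node, the indices of the points a scan would read:
// only index 0 (the root point) for a shallow scan, every index for a deep scan.
function pointIndicesToRead(nodes: NodeCounts, deep: boolean): Record<string, number[]> {
  const result: Record<string, number[]> = {}
  for (const [key, pointCount] of Object.entries(nodes)) {
    if (pointCount === 0) {
      result[key] = [] // empty nodes have nothing to read
    } else if (deep) {
      result[key] = Array.from({ length: pointCount }, (_, i) => i)
    } else {
      result[key] = [0]
    }
  }
  return result
}
```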

Output

COPC Validator outputs a JSON report according to the Report Schema, intended to be translated into a more human-readable format (such as a PDF or Webpage summary)
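
As one illustration of such a translation, the sketch below turns a report into a short text summary. It assumes only the name and checks fields of the report; `MinimalReport` and `summarizeReport` are hypothetical names and not part of the library.

```typescript
// Minimal subset of the report used by this sketch.
type ReportCheck = { id: string; status: 'pass' | 'fail' | 'warn'; info?: string }
type MinimalReport = { name: string; checks: ReportCheck[] }

// Build a plain-text summary: one totals line, then one line per non-passing check.
function summarizeReport(report: MinimalReport): string {
  const counts = { pass: 0, fail: 0, warn: 0 }
  for (const check of report.checks) counts[check.status] += 1

  const lines = [
    `${report.name}: ${counts.pass} passed, ${counts.fail} failed, ${counts.warn} warned`,
  ]
  for (const check of report.checks) {
    if (check.status === 'pass') continue
    const detail = check.info ? `: ${check.info}` : ''
    lines.push(`  [${check.status}] ${check.id}${detail}`)
  }
  return lines.join('\n')
}
```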

Details

Checks

A Check ultimately refers to the Object created by calling a Check.Function with performCheck(), which uses the Check.Suite property name to build the returned Check.Status into a complete Check.Check. This already feels like a bit much, without even mentioning Check.Parsers or Check.Collections, so we'll break it down piece-by-piece here

Pseudo-TypeScript:

namespace Check {
  type Status = {
    status: 'pass' | 'fail' | 'warn'
    description: string
    info?: string
  }
  type Check = Status & { id: string }

  type Function<T> =
    | ((c: T) => Status)
    | ((c: T) => Promise<Status>)

  type Suite<T> = { [id: string]: Check.Function<T> }
  type SuiteWithSource<T> = { source: T, suite: Suite<T>}

  type Parser<Source, Parsed> = (s: Source) => Promise<SuiteWithSource<Parsed>>

  type Collection = (SuiteWithSource<any> | Promise<SuiteWithSource<any>>)[]
}
type Check = Check.Check

See ./src/types/check.ts for the actual TypeScript code

Status & Check Objects

A Check.Status Object contains a status property with a value of "pass", "fail", or "warn", a description string, and optionally an info property with a string value.

A Check Object is the same as a Status Object with an additional string property named id

pass means the file definitely matches the COPC specifications
fail means the file does not match the COPC specifications
warn means the file may not match current COPC specifications or recommendations

Functions

Check.Functions maintain the following properties:

  • Single (Object) parameter
  • Synchronous or Asynchronous
  • Output: Check.Status (or a Promise)
  • Pure function
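
To make these properties concrete, here is a sketch of one Check.Function together with a helper in the spirit of performCheck(). The types are local copies of the pseudo-types above; `HeaderLike` and `performCheckSketch` are hypothetical names, the library's actual performCheck() may differ, and converting a thrown error into a "fail" is an assumption of this sketch.

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }
type Check = Status & { id: string }
type CheckFunction<T> = (c: T) => Status | Promise<Status>

// Hypothetical source object: only the header field this check needs.
type HeaderLike = { minorVersion: number }

// A pure, synchronous Check.Function: single Object parameter, Status output.
const minorVersion: CheckFunction<HeaderLike> = (header): Status => {
  const description = 'header.minorVersion is 4'
  if (header.minorVersion === 4) return { status: 'pass', description }
  return { status: 'fail', description, info: `found minorVersion ${header.minorVersion}` }
}

// Await the Function (sync or async) and attach the id to form a Check.
// Assumption: a Function that throws is reported as a failed Check.
async function performCheckSketch<T>(
  source: T,
  id: string,
  fn: CheckFunction<T>,
): Promise<Check> {
  try {
    return { id, ...(await fn(source)) }
  } catch (error) {
    const info = error instanceof Error ? error.message : String(error)
    return { id, status: 'fail', description: 'check threw before returning a Status', info }
  }
}
```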

Suites

A Check.Suite is a map of string ids to Check.Functions, where each Function uses the same Object as its parameter (such as the Copc Object, for ./src/suites/copc.ts). The id of a Function becomes the id value of the Check Object when a Check.Suite invokes its Functions

The purpose of this type of grouping is to limit the number of Getter calls for the same section of a file, like the 375-byte Header

All Suites (with their Check.Functions) are located under src/suites
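
The id mapping described above can be sketched as follows. The types are local copies of the pseudo-types; `HeaderLike`, `passIf`, `headerSuite`, and `invokeSuite` are hypothetical names used for illustration and are not the library's exports.

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }
type Check = Status & { id: string }
type CheckFunction<T> = (c: T) => Status | Promise<Status>
type Suite<T> = { [id: string]: CheckFunction<T> }

// Hypothetical source object holding two header fields.
type HeaderLike = { minorVersion: number; headerLength: number }

// Small helper to build a pass/fail Status from a boolean.
const passIf = (ok: boolean, description: string): Status => ({
  status: ok ? 'pass' : 'fail',
  description,
})

// Every Function in the Suite receives the same source Object.
const headerSuite: Suite<HeaderLike> = {
  minorVersion: (h) => passIf(h.minorVersion === 4, 'header.minorVersion is 4'),
  headerLength: (h) => passIf(h.headerLength === 375, 'header.headerLength is 375'),
}

// Run every Function against the one source; each property name becomes the Check id.
async function invokeSuite<T>(source: T, suite: Suite<T>): Promise<Check[]> {
  return Promise.all(
    Object.entries(suite).map(async ([id, fn]) => ({ id, ...(await fn(source)) })),
  )
}
```

Because the source object is parsed once and shared, adding another header check to the Suite costs no additional reads of the file.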

Parsers

Check.Parsers are functions that take a source Object and return a Check.SuiteWithSource Object. Their main purpose is to parse a section of the given file into a usable object, and then return that object with its corresponding Suite to be invoked from within a Collection.

All Parsers are located under src/parsers (ex: nodeParser)
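
A Parser can be sketched as below. This is an illustration, not one of the Parsers under src/parsers: `headerParser` and `HeaderLike` are hypothetical, the `Getter` type is assumed to match copc.js (a function returning the bytes from begin up to, but not including, end), and the byte offsets follow the LAS public header layout (minor version at byte 25, header size as a little-endian uint16 at byte 94).

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }
type CheckFunction<T> = (c: T) => Status | Promise<Status>
type Suite<T> = { [id: string]: CheckFunction<T> }
type SuiteWithSource<T> = { source: T; suite: Suite<T> }
type Parser<Source, Parsed> = (s: Source) => Promise<SuiteWithSource<Parsed>>

// Assumed to match the copc.js Getter: bytes in [begin, end).
type Getter = (begin: number, end: number) => Promise<Uint8Array>

// Hypothetical parsed object with two header fields.
type HeaderLike = { minorVersion: number; headerLength: number }

const headerSuite: Suite<HeaderLike> = {
  headerLength: (h): Status => ({
    status: h.headerLength === 375 ? 'pass' : 'fail',
    description: 'header.headerLength is 375',
  }),
}

// Read the start of the file once, parse the fields, and pair them with their Suite.
const headerParser: Parser<Getter, HeaderLike> = async (get) => {
  const bytes = await get(0, 96)
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const source: HeaderLike = {
    minorVersion: view.getUint8(25),
    headerLength: view.getUint16(94, true), // little-endian
  }
  return { source, suite: headerSuite }
}
```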

nodes.ts

src/parsers/nodes.ts is unique among Parsers, in that it's actually running a Suite repeatedly as it parses. However, the data is not returned from the multithreaded Workers like a regular Check.Suite, so nodes.ts then gives the output data to the (new) pointDataSuite for sorting into Check.Statuses

worker.js

src/utils/worker.js essentially matches the structure of a Suite because it used to be the src/suites/point-data.ts Suite. To increase speed, the pointDataSuite became per-Node instead of per-File, which maximizes multi-threading, but creates quite a mess since worker.js must be (nearly) entirely self-contained for Worker/Web Worker threading. So src/suites/point-data.ts now parses the output of src/utils/worker.js, all of which is controlled by the src/parsers/nodes.ts Parser


Collections

Check.Collections are arrays of Check.Suites with their respective source Object (Check.SuiteWithSource above). They allow Promises in order to use Check.Parsers internally without having to await them.

All Collections are located under src/collections (ex: CopcCollection)

Replacing Collections is the primary way of generating custom reports through generateReport, as you can supply different Check.Suites to perform different Check.Functions per source object.
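
The mix of plain and Promise entries can be handled as in the sketch below. The types are local copies of the pseudo-types; `invokeCollection`, `versionSuite`, `lengthSuite`, and the sample `collection` are hypothetical and only show how pending Parser results and ready entries can share one array.

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }
type Check = Status & { id: string }
type CheckFunction<T> = (c: T) => Status | Promise<Status>
type Suite<T> = { [id: string]: CheckFunction<T> }
type SuiteWithSource<T> = { source: T; suite: Suite<T> }
type Collection = (SuiteWithSource<any> | Promise<SuiteWithSource<any>>)[]

const versionSuite: Suite<{ minorVersion: number }> = {
  minorVersion: (h): Status => ({
    status: h.minorVersion === 4 ? 'pass' : 'fail',
    description: 'header.minorVersion is 4',
  }),
}
const lengthSuite: Suite<{ headerLength: number }> = {
  headerLength: (h): Status => ({
    status: h.headerLength === 375 ? 'pass' : 'fail',
    description: 'header.headerLength is 375',
  }),
}

// One ready entry and one still-pending entry (as a Parser would return).
const collection: Collection = [
  { source: { minorVersion: 4 }, suite: versionSuite },
  Promise.resolve({ source: { headerLength: 375 }, suite: lengthSuite }),
]

// Await each entry (plain values pass through await unchanged), run its Suite,
// and flatten the per-Suite results into one list of Checks.
async function invokeCollection(entries: Collection): Promise<Check[]> {
  const nested = await Promise.all(
    entries.map(async (entry) => {
      const { source, suite } = await entry
      return Promise.all(
        Object.entries(suite).map(async ([id, fn]) => ({ id, ...(await fn(source)) })),
      )
    }),
  )
  return ([] as Check[]).concat(...nested)
}
```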

Custom scan

generateReport has functionality to build customized reports by overwriting the Check.Collections used within:

Pseudo-Type:

import type {Copc, Getter, Las} from 'copc'
type Collections = {
  copc: ({
    filepath: string,
    copc: Copc,
    get: Getter,
    deep: boolean,
    workerCount?: number
  }) => Promise<Check.Collection>,
  las: ({
    get: Getter,
    header: Las.Header,
    vlrs: Las.Vlr[]
  }) => Promise<Check.Collection>,
  fallback: (get: Getter) => Promise<Check.Collection>
}

const generateReport = async ({
  source: string | File,
  options?: {...},
  collections?: Collections
}) => Promise<Report>
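
As an illustration of the fallback slot, the sketch below builds a one-Suite Collection that checks the 4-byte "LASF" file signature that begins every LAS file. `customFallback` and `signatureSuite` are hypothetical names, the `Getter` type is assumed to match copc.js, and whether generateReport accepts a partial collections object is not confirmed here.

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }
type CheckFunction<T> = (c: T) => Status | Promise<Status>
type Suite<T> = { [id: string]: CheckFunction<T> }
type SuiteWithSource<T> = { source: T; suite: Suite<T> }
type Collection = (SuiteWithSource<any> | Promise<SuiteWithSource<any>>)[]

// Assumed to match the copc.js Getter: bytes in [begin, end).
type Getter = (begin: number, end: number) => Promise<Uint8Array>

// The Getter itself is the source object, so the check can read the bytes it needs.
const signatureSuite: Suite<Getter> = {
  fileSignature: async (get): Promise<Status> => {
    const description = 'File begins with the "LASF" signature'
    const bytes = await get(0, 4)
    const signature = Array.from(bytes, (b) => String.fromCharCode(b)).join('')
    if (signature === 'LASF') return { status: 'pass', description }
    return { status: 'fail', description, info: `found "${signature}"` }
  },
}

// Matches the `fallback` member of the Collections pseudo-type above.
const customFallback = async (get: Getter): Promise<Collection> => [
  { source: get, suite: signatureSuite },
]

// Hypothetical usage (not executed here), with copc and las supplied alongside:
// await generateReport({ source, collections: { copc, las, fallback: customFallback } })
```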

All Checks

| ID | Description | Scan | Suite |
|---|---|---|---|
| minorVersion | copc.header.minorVersion is 4 | Shallow | Header |
| pointDataRecordFormat | copc.header.pointDataRecordFormat is 6, 7, or 8 | Shallow | Header |
| headerLength | copc.header.headerLength is 375 | Shallow | Header |
| pointCountByReturn | Sum of copc.header.pointCountByReturn equals copc.header.pointCount | Shallow | Header |
| legacyPointCount | header.legacyPointCount follows COPC/LAS specs | Shallow | manualHeader |
| legacyPointCountByReturn | header.legacyPointCountByReturn follows COPC/LAS specs | Shallow | manualHeader |
| vlrCount | Number of VLRs in copc.vlrs matches copc.header.vlrCount | Shallow | Vlr |
| evlrCount | Number of EVLRs in copc.vlrs matches copc.header.evlrCount | Shallow | Vlr |
| copc-info | Exactly 1 copc info VLR exists with size of 160 | Shallow | Vlr |
| copc-hierarchy | Exactly 1 copc hierarchy VLR exists | Shallow | Vlr |
| laszip-encoded | Checks for existence of LasZIP compression VLR, warns if not found | Shallow | Vlr |
| wkt | Ensures wkt string can initialize proj4 | Shallow | manualVlr |
| bounds within cube | Copc cube envelops Las bounds (min & max) | Shallow | Copc |
| rgb | RGB channels are used in PDR, if present | Shallow | PointData |
| rgbi | Checks for 16-bit scaling of RGBI values, warns if 8-bit | Shallow | PointData |
| xyz | Each point exists within Las and Copc bounds, per node | Shallow | PointData |
| gpsTime | Each point has GpsTime value within Las bounds | Shallow | PointData |
| sortedGpsTime | The points in each node are sorted by GpsTime value, warns if not | Deep | PointData |
| returnNumber | Each point has ReturnNumber <= NumberOfReturns | Shallow | PointData |
| zeroPoint | Warns with list of all pointCount: 0 nodes in the Hierarchy | Deep* | PointData |
| nodesReachable | Every Node ('D-X-Y-Z') in the Hierarchy is reachable | Shallow | PointData |
| pointsReachable | Each Node pageOffset + pageLength leads into another Node page | Shallow | PointData |
| ...ID | ...Description | Shallow | ... |

Checks and their IDs are subject to change as I see fit
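
As a worked example of one entry, the "bounds within cube" check could look roughly like the sketch below. It assumes the cube is stored as [minX, minY, minZ, maxX, maxY, maxZ] and the Las bounds as separate [x, y, z] min and max arrays; `BoundsInput` and `boundsWithinCube` are hypothetical names, and the library's actual check may be written differently.

```typescript
type Status = { status: 'pass' | 'fail' | 'warn'; description: string; info?: string }

type Point3 = [number, number, number]
type Cube = [number, number, number, number, number, number]

// Hypothetical input: the COPC cube plus the Las header bounds.
type BoundsInput = { cube: Cube; min: Point3; max: Point3 }

// Pass when the cube envelops both the min and the max corner on every axis.
function boundsWithinCube({ cube, min, max }: BoundsInput): Status {
  const description = 'Copc cube envelops Las bounds (min & max)'
  const axes = ['x', 'y', 'z']
  const outside: string[] = []
  for (let i = 0; i < 3; i++) {
    const low = cube[i]
    const high = cube[i + 3]
    if (min[i] < low || max[i] > high) outside.push(axes[i])
  }
  if (outside.length === 0) return { status: 'pass', description }
  return { status: 'fail', description, info: `bounds exceed cube on: ${outside.join(', ')}` }
}
```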

Report

Report schema

See JSON Schema

TypeScript pseudo-type Report:

import * as Copc from 'copc'

type Report = {
  name: string
  scan: {
    type: 'shallow' | 'deep' | 'custom' | string  //| 'shallow-X/N' | 'deep-X/N'
    filetype: 'COPC' | 'LAS' | 'Unknown'
    start: Date
    end: Date
    time: number
  }
  checks: ({
    id: string
    status: 'pass' | 'fail' | 'warn'
    info?: string
  })[]

  // When scan.filetype === 'COPC'
  copc?: {
    header: Copc.Las.Header
    vlrs: Copc.Las.Vlr[]
    info: Copc.Info
    wkt: string
    eb: Copc.Las.ExtraBytes
  }

  // When scan.filetype === 'LAS'
  las?: {
    header: Copc.Las.Header
    vlrs: Copc.Las.Vlr[]
  }
  error: {
    message: string
    stack?: string
  }

  // When scan.filetype === 'Unknown'
  error: {
    message: string
    stack?: string
  }
  copcError?: {
    message: string
    stack?: string
  } // only used if Copc.create() and Las.*.parse() fail for different reasons
}

Future Plans

  • Add more Check.Functions - waiting on laz-perf chunk table
  • Rewrite LAS Check.Collection to validate LAS 1.4 specifications
  • Continue to optimize for speed, especially large (1.5GB+) files