
Improve computing time and memory consumption of ordinary Kriging #50

Closed · LSchueler opened this issue Dec 10, 2019 · 2 comments

Labels: Performance (Performance related stuff), Refactoring (Code-Refactoring needed here), wontfix (This will not be worked on)

@LSchueler (Member)

As seen in issue #36, our current Kriging implementation runs into memory problems if the data sets get larger.

As Kriging is a local predictor, it should be sufficient to only use the observations nearest to the target point, say everything within two times the correlation length.

I just tested some existing stuff from scipy and it looks pretty promising:

import numpy as np
from scipy import sparse
from scipy.spatial import distance, cKDTree


rng = np.random.RandomState(seed=58476)
n = 10
arr = rng.rand(n, 2)
max_dist = 0.5

# current implementation
dist = distance.cdist(arr, arr)

# suggestion: use a k-d tree for the neighbour search
neighbours = cKDTree(arr)

# calculate all pairwise distances up to max_dist (Euclidean norm, p=2)
# the 'ndarray' output_type simply omits pairs that are too far apart,
# instead of storing explicit zeros for them
D = neighbours.sparse_distance_matrix(neighbours, max_dist, p=2.0, output_type='ndarray')

# upper triangle is enough
DU = D[D['i'] < D['j']]

# create sparse matrix
dist_sparse = sparse.coo_matrix((DU['v'], (DU['i'], DU['j'])), (n, n))

print(f'dist =\n{dist}')

print(f'dist_sparse =\n{dist_sparse}')

Not only would we cut off a lot of meaningless points, but we could also make use of a sparse matrix. Of course, this means some refactoring of the Kriging code.
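To give an idea of how this could plug into the kriging matrix assembly, here is a minimal sketch (not GSTools API; gau_cov is a hypothetical stand-in for a covariance model with correlation length len_scale). With max_dist chosen as two correlation lengths, the covariance of every cut-off pair is practically zero anyway, so only the close pairs need to be evaluated and stored:

import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def gau_cov(dist, var=1.0, len_scale=0.25):
    # stand-in Gaussian covariance, practically zero beyond ~2 length scales
    return var * np.exp(-(dist / len_scale) ** 2)

rng = np.random.RandomState(seed=58476)
arr = rng.rand(10, 2)
n = len(arr)

tree = cKDTree(arr)
D = tree.sparse_distance_matrix(tree, max_distance=0.5, output_type='ndarray')

# evaluate the covariance only for the stored (close) pairs; pairs beyond
# max_distance implicitly contribute zero, consistent with a decaying covariance
off = D[D['i'] != D['j']]
cov = sparse.coo_matrix((gau_cov(off['v']), (off['i'], off['j'])), (n, n))
# the diagonal (distance zero) has to be set explicitly in a sparse matrix
cov = (cov + gau_cov(0.0) * sparse.eye(n)).tocsr()
print(f'{cov.nnz} of {n * n} entries stored')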

LSchueler added the Performance and Refactoring labels on Dec 10, 2019
LSchueler added this to the 1.2 milestone on Dec 10, 2019
@MuellerSeb (Member)

The cKDTree is really a sneaky tool.

But applying this would mean that the kriging matrix has to be inverted for each field point. A moving window could be used at that point to define a neighborhood for a set of field points (see the sketch below). But this would also mean we couldn't use the Cython code to speed things up.
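To illustrate the moving-window idea, here is a rough sketch of local ordinary kriging (hypothetical code, not from GSTools; gau_cov again stands in for a covariance model): for every field point, only the k nearest conditioning points enter the kriging system, so a small (k+1) x (k+1) matrix is solved per point instead of one huge global one.

import numpy as np
from scipy.spatial import cKDTree

def gau_cov(dist, var=1.0, len_scale=0.25):
    return var * np.exp(-(dist / len_scale) ** 2)

def local_ordinary_krige(cond_pos, cond_val, target_pos, k=20):
    tree = cKDTree(cond_pos)
    k = min(k, len(cond_pos))
    field = np.empty(len(target_pos))
    for idx, point in enumerate(target_pos):
        # moving window: only the k nearest conditioning points
        _, nn = tree.query(point, k=k)
        nn = np.atleast_1d(nn)
        pos = cond_pos[nn]
        # small ordinary-kriging system with unbiasedness constraint
        A = np.ones((k + 1, k + 1))
        A[:k, :k] = gau_cov(np.linalg.norm(pos[:, None] - pos[None, :], axis=-1))
        A[-1, -1] = 0.0
        b = np.ones(k + 1)
        b[:k] = gau_cov(np.linalg.norm(pos - point, axis=-1))
        # one solve per field point -- this is the cost of the approach
        weights = np.linalg.solve(A, b)[:k]
        field[idx] = weights @ cond_val[nn]
    return field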
The only real problem occurs when there are too many conditioning points, so that the kriging matrix (over all conditioning points) itself gets too big for memory.
For the field generation, I would suggest providing a chunk_size keyword as proposed here:

self, pos, mesh_type="unstructured", ext_drift=None, chunk_size=None

In the proposed new kriging method, the kriging matrix is calculated once, when the kriging class is initialized.
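A minimal sketch of what the chunked field evaluation could look like (assumed names, not the actual implementation, and simple kriging for brevity): the kriging weights are computed once from the full kriging matrix, while the cross-covariances to the target points are built chunk by chunk, so only a (chunk_size x n_cond) block is in memory at any time.

import numpy as np

def gau_cov(dist, var=1.0, len_scale=0.25):
    return var * np.exp(-(dist / len_scale) ** 2)

def chunked_field(cond_pos, cond_val, pos, chunk_size=None):
    # kriging matrix over the conditioning points, solved only once
    krig_mat = gau_cov(np.linalg.norm(
        cond_pos[:, None] - cond_pos[None, :], axis=-1))
    weights = np.linalg.solve(krig_mat, cond_val)
    n = len(pos)
    chunk_size = n if chunk_size is None else chunk_size
    field = np.empty(n)
    for start in range(0, n, chunk_size):
        chunk = pos[start:start + chunk_size]
        # only a (chunk_size x n_cond) cross-covariance block at a time
        cross = gau_cov(np.linalg.norm(
            chunk[:, None] - cond_pos[None, :], axis=-1))
        field[start:start + chunk_size] = cross @ weights
    return field

That way, the cost of setting up the kriging matrix is paid once, and chunk_size only trades speed against peak memory during field evaluation.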

I think this is the way to go. What do you think?

MuellerSeb modified the milestones: 1.2 → 1.3 on Mar 20, 2020
@MuellerSeb (Member) commented Mar 26, 2020

A related issue was created at PyKrige:
GeoStat-Framework/PyKrige#143

I think this is where we can draw the line between GSTools and PyKrige: in PyKrige, we could implement this approach fully in Cython, since the variogram models are provided in Cython there as well.

So for the moment, the chunk_size approach in the GSTools.krige module should be sufficient.
