
Improve computing time and memory consumption of ordinary Kriging #50

Closed · LSchueler opened this issue Dec 10, 2019 · 2 comments

Labels: Performance (Performance related stuff), Refactoring (Code-Refactoring needed here), wontfix (This will not be worked on)

@LSchueler (Member)

As seen in issue #36, our current Kriging implementation runs into memory problems if the data sets get larger.

As Kriging is a local predictor, it should be sufficient to only use the observations nearest to the target point, say everything within two times the correlation length.

I just tested some existing stuff from scipy and it looks pretty promising:

import numpy as np
from scipy import sparse
from scipy.spatial import distance, cKDTree


rng = np.random.RandomState(seed=58476)
n = 10
arr = rng.rand(n, 2)
max_dist = 0.5

# current implementation
dist = distance.cdist(arr, arr)

# suggestion: use a k-d tree for the neighbour search
neighbours = cKDTree(arr)

# calculate all pairwise distances up to max_dist (Euclidean norm, p=2)
# the 'ndarray' output_type simply omits pairs that are too far apart,
# instead of storing explicit zeros for them
D = neighbours.sparse_distance_matrix(neighbours, max_dist, p=2.0, output_type='ndarray')

# upper triangle is enough
DU = D[D['i'] < D['j']]

# create sparse matrix
dist_sparse = sparse.coo_matrix((DU['v'], (DU['i'], DU['j'])), (n, n))

print(f'dist =\n{dist}')

print(f'dist_sparse =\n{dist_sparse}')

Not only would we cut off a lot of meaningless points, but we could also make use of a sparse matrix. Of course, this means some refactoring of the Kriging code.
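To give an idea of how this could plug into the kriging matrix assembly, here is a minimal sketch (not GSTools API; gau_cov is a hypothetical stand-in for a covariance model with correlation length len_scale). With max_dist chosen as two correlation lengths, the covariance of every cut-off pair is practically zero anyway, so only the close pairs need to be evaluated and stored:

import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def gau_cov(dist, var=1.0, len_scale=0.25):
    # stand-in Gaussian covariance, practically zero beyond ~2 length scales
    return var * np.exp(-(dist / len_scale) ** 2)

rng = np.random.RandomState(seed=58476)
arr = rng.rand(10, 2)
n = len(arr)

tree = cKDTree(arr)
D = tree.sparse_distance_matrix(tree, max_distance=0.5, output_type='ndarray')

# evaluate the covariance only for the stored (close) pairs; pairs beyond
# max_distance implicitly contribute zero, consistent with a decaying covariance
off = D[D['i'] != D['j']]
cov = sparse.coo_matrix((gau_cov(off['v']), (off['i'], off['j'])), (n, n))
# the diagonal (distance zero) has to be set explicitly in a sparse matrix
cov = (cov + gau_cov(0.0) * sparse.eye(n)).tocsr()
print(f'{cov.nnz} of {n * n} entries stored')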

LSchueler added the Performance and Refactoring labels on Dec 10, 2019
LSchueler added this to the 1.2 milestone on Dec 10, 2019
@MuellerSeb (Member)

The cKDTree is really a sneaky tool.

But applying this would mean that the kriging matrix has to be inverted for each field point. A moving window could be used at that point to define a neighborhood for a set of field points (see the sketch below). But this would also mean we couldn't use the Cython code to speed things up.
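To illustrate the moving-window idea, here is a rough sketch of local ordinary kriging (hypothetical code, not from GSTools; gau_cov again stands in for a covariance model): for every field point, only the k nearest conditioning points enter the kriging system, so a small (k+1) x (k+1) matrix is solved per point instead of one huge global one.

import numpy as np
from scipy.spatial import cKDTree

def gau_cov(dist, var=1.0, len_scale=0.25):
    return var * np.exp(-(dist / len_scale) ** 2)

def local_ordinary_krige(cond_pos, cond_val, target_pos, k=20):
    tree = cKDTree(cond_pos)
    k = min(k, len(cond_pos))
    field = np.empty(len(target_pos))
    for idx, point in enumerate(target_pos):
        # moving window: only the k nearest conditioning points
        _, nn = tree.query(point, k=k)
        nn = np.atleast_1d(nn)
        pos = cond_pos[nn]
        # small ordinary-kriging system with unbiasedness constraint
        A = np.ones((k + 1, k + 1))
        A[:k, :k] = gau_cov(np.linalg.norm(pos[:, None] - pos[None, :], axis=-1))
        A[-1, -1] = 0.0
        b = np.ones(k + 1)
        b[:k] = gau_cov(np.linalg.norm(pos - point, axis=-1))
        # one solve per field point -- this is the cost of the approach
        weights = np.linalg.solve(A, b)[:k]
        field[idx] = weights @ cond_val[nn]
    return field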
The only real problem occurs when there are too many conditioning points, so that the kriging matrix (over all conditioning points) itself gets too big for memory.
For the field generation, I would suggest providing a chunk_size keyword as proposed here:

self, pos, mesh_type="unstructured", ext_drift=None, chunk_size=None

In the proposed new kriging method, the kriging matrix is calculated once, when the kriging class is initialized.
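A minimal sketch of what the chunked field evaluation could look like (assumed names, not the actual implementation, and simple kriging for brevity): the kriging weights are computed once from the full kriging matrix, while the cross-covariances to the target points are built chunk by chunk, so only a (chunk_size x n_cond) block is in memory at any time.

import numpy as np

def gau_cov(dist, var=1.0, len_scale=0.25):
    return var * np.exp(-(dist / len_scale) ** 2)

def chunked_field(cond_pos, cond_val, pos, chunk_size=None):
    # kriging matrix over the conditioning points, solved only once
    krig_mat = gau_cov(np.linalg.norm(
        cond_pos[:, None] - cond_pos[None, :], axis=-1))
    weights = np.linalg.solve(krig_mat, cond_val)
    n = len(pos)
    chunk_size = n if chunk_size is None else chunk_size
    field = np.empty(n)
    for start in range(0, n, chunk_size):
        chunk = pos[start:start + chunk_size]
        # only a (chunk_size x n_cond) cross-covariance block at a time
        cross = gau_cov(np.linalg.norm(
            chunk[:, None] - cond_pos[None, :], axis=-1))
        field[start:start + chunk_size] = cross @ weights
    return field

That way, the cost of setting up the kriging matrix is paid once, and chunk_size only trades speed against peak memory during field evaluation.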

I think this is the way to go. What do you think?

MuellerSeb modified the milestones: 1.2 → 1.3 on Mar 20, 2020
@MuellerSeb (Member) commented Mar 26, 2020

A related issue was created at PyKrige:
GeoStat-Framework/PyKrige#143

I think this is where we can draw the line between GSTools and PyKrige: in PyKrige, we could implement this approach fully in Cython, since the variogram models are provided in Cython there as well.

So for the moment, the chunk_size approach in the GSTools.krige module should be sufficient.
