Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v.edit: Implement the batch option for batch editing #3450

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

HuidaeCho
Copy link
Member

@HuidaeCho HuidaeCho commented Feb 24, 2024

This PR implements the new batch option for batch editing of a lot of features. For example, without this option, we need this script to extract the upper halves of roadsmajor from the NC sample dataset,

# in nc_spm_08_grass7/user1

# copy roadsmajor
g.copy vect=roadsmajor,roadsmajor_upper_half_wo_batch

# find road centers
v.db.select map=roadsmajor -c col=cat |
        awk '{printf "P %d %d 50%%\n", $1, $1}' |
        v.segment roadsmajor output=roadsmajor_center

# break roads at their center
v.to.db -p map=roadsmajor_center option=coor sep=comma |
        sed '/^cat/d' |
        while read cat x y; do
        v.edit map=roadsmajor_upper_half_wo_batch tool=break cats=$cat coords=$x,$y
done

# delete lower halves
v.to.db -p map=roadsmajor option=end sep=comma |
        sed '/^cat/d' |
        while read cat x y; do
        v.edit map=roadsmajor_upper_half_wo_batch tool=delete cats=$cat coords=$x,$y
done

It took 11.938 seconds on i9-12900. On the same CPU, with the new option, it took 2.180 seconds:

# in nc_spm_08_grass7/user1

# copy roadsmajor
g.copy vect=roadsmajor,roadsmajor_upper_half_w_batch

# find road centers
v.db.select map=roadsmajor -c col=cat |
        awk '{printf "P %d %d 50%%\n", $1, $1}' |
        v.segment roadsmajor output=roadsmajor_center --o

# create a batch table and pipe it to v.edit
(
# use these batch columns
echo "tool|cats|coords"

# break roads at their center
v.to.db -p map=roadsmajor_center option=coor sep=space |
        awk '!/^cat/{printf "break|%d|%s,%s\n", $1, $2, $3}'

# delete lower halves
v.to.db -p map=roadsmajor option=end sep=space |
        awk '!/^cat/{printf "delete|%d|%s,%s\n", $1, $2, $3}'

# the above three commands produce the following table:
# tool|cats|coords
# break|1|637108.11963224,257241.783755701
# break|2|636792.27198317,254339.365766968
# break|3|638449.758677739,248299.357582029
# ...
# delete|1|637209.083058167,257970.129540226
# delete|2|636981.336042672,256517.602235172
# delete|3|637960.264160529,248293.868427705
# ...
) | v.edit map=roadsmajor_upper_half_w_batch batch=-

Red is the upper halves.
image

@github-actions github-actions bot added vector Related to vector data processing C Related code is in C HTML Related code is in HTML module docs labels Feb 24, 2024
@HuidaeCho HuidaeCho self-assigned this Feb 24, 2024
@HuidaeCho HuidaeCho added the enhancement New feature or request label Feb 24, 2024
@HuidaeCho HuidaeCho changed the title v.edit: Implement the batch option for batching editing v.edit: Implement the batch option for batch editing Feb 24, 2024
vector/v.edit/args.c Outdated Show resolved Hide resolved
vector/v.edit/v.edit.html Outdated Show resolved Hide resolved
vector/v.edit/v.edit.html Outdated Show resolved Hide resolved
@landam landam added this to the 8.4.0 milestone Feb 25, 2024
@HuidaeCho HuidaeCho requested a review from landam February 25, 2024 14:03
@HuidaeCho
Copy link
Member Author

HuidaeCho commented Feb 25, 2024

This is for a separate PR?

grass/vector/v.edit/main.c

Lines 473 to 478 in 87d2d42

/* build topology only if requested or if tool!=select */
if (action_mode != MODE_SELECT && action_mode != MODE_NONE &&
params.topo->answer != 1) {
Vect_build_partial(&Map, GV_BUILD_NONE);
Vect_build(&Map);
}

I think we need && ret > 0 check to avoid building topology unnecessarily when there were no edits.

@HuidaeCho
Copy link
Member Author

@HuidaeCho I tested the new separator parameter, everything works as expected. I just have little comment on running v.edit on a file with different separator than used in input file:

head -n2 /tmp/batch 
tool,cats,coords
break,1,"637108.11963224,257241.783755701"

v.edit map=roadsmajor_upper_half batch=/tmp/batch 
ERROR: Unknown batch column 'tool,cats,coords' in line 1

v.edit map=roadsmajor_upper_half batch=/tmp/batch sep=,
ERROR: Unable to open vector map <roadsmajor_upper_half> on topological
       level. Try to rebuild vector topology by v.build.

Please try it again.

@HuidaeCho
Copy link
Member Author

This is for a separate PR?

grass/vector/v.edit/main.c

Lines 473 to 478 in 87d2d42

/* build topology only if requested or if tool!=select */
if (action_mode != MODE_SELECT && action_mode != MODE_NONE &&
params.topo->answer != 1) {
Vect_build_partial(&Map, GV_BUILD_NONE);
Vect_build(&Map);
}

I think we need && ret > 0 check to avoid building topology unnecessarily when there were no edits.

I was wrong. Level 2 vector always requires topology building to remain level 2 even if there are no changes. Just tried this:

Vect_open_update2();
/* do nothing */
Vect_close();

Level 2 vector became level 1. I'm not sure why and how...

@metzm
Copy link
Contributor

metzm commented Feb 26, 2024

I was wrong. Level 2 vector always requires topology building to remain level 2 even if there are no changes. Just tried this:

Vect_open_update2();
/* do nothing */
Vect_close();

Level 2 vector became level 1. I'm not sure why and how...

Please try #3459. If this works for you, you are welcome to include this single line in your PR and I close mine without merging.

@HuidaeCho
Copy link
Member Author

HuidaeCho commented Feb 26, 2024

Also, JSON streaming is something we need to consider for large data streaming. e.g., one edit (object) per line. Does JSON have the standard format for this? If this is not somehow supported, JSON data will need to be generated completely before we can start editing any features. Another consideration is its data size (almost twice 556 bytes vs. 270 bytes for the pipe format in the following example).

Go read about .jsonl, json line. It's quite useful and I think it's what you're looking for. It's like CSV, where there's one record per line, but each line is a separate JSON. So you can append to that file. I used that a couple years ago. And I was able to analyze with power BI/excel get&transform. Some other software already works with it. Other json format extensions I've heard of is json5 and jsonc, which you probably might happened to interact with if you already used vscode and their settings.

Now with support for multiple batch tables in one input, I don't see a strong need for JSON implementation in the module. I still think multiple formats should be supported more generally outside individual modules. The new option at least supports a subset1 of a well-documented version of CSV and Python users should be able to use the csv module.

Footnotes

  1. Not supporting multiline columns because of its unpredictable length, but might be useful for the add tool to embed ASCII input; I think one edit per line should be better?

@HuidaeCho
Copy link
Member Author

I was wrong. Level 2 vector always requires topology building to remain level 2 even if there are no changes. Just tried this:

Vect_open_update2();
/* do nothing */
Vect_close();

Level 2 vector became level 1. I'm not sure why and how...

Please try #3459. If this works for you, you are welcome to include this single line in your PR and I close mine without merging.

Thanks for the PR. With that change, Vect_close() did the job, but G_fatal_error() still downgrades the level to 1 without an error handler (minor annoyance ;-).

@metzm
Copy link
Contributor

metzm commented Feb 26, 2024

Thanks for the PR. With that change, Vect_close() did the job, but G_fatal_error() still downgrades the level to 1 without an error handler (minor annoyance ;-).

Should be fixed with the latest commit to #3459. In case of a crash, the original topo info is (should be) preserved.

@wenzeslaus
Copy link
Member

Can you please add a test? Both grass.gunittest and pytest would work here. pytest is now little easier than before because grass.script.create_location() does not require an existing session. grass.gunittest would be needed if you want to turn your example into a test. Python doc is quite good for CSV writing, but here is an example of CSV writer from a Python dictionary I recently wrote.

@HuidaeCho
Copy link
Member Author

Can you please add a test? Both grass.gunittest and pytest would work here. pytest is now little easier than before because grass.script.create_location() does not require an existing session. grass.gunittest would be needed if you want to turn your example into a test. Python doc is quite good for CSV writing, but here is an example of CSV writer from a Python dictionary I recently wrote.

I'll do that when I get a chance.

@landam
Copy link
Member

landam commented Mar 2, 2024

head -n2 /tmp/batch 
tool,cats,coords
break,1,"637108.11963224,257241.783755701"

v.edit map=roadsmajor_upper_half batch=/tmp/batch 
ERROR: Unknown batch column 'tool,cats,coords' in line 1

v.edit map=roadsmajor_upper_half batch=/tmp/batch sep=,
ERROR: Unable to open vector map <roadsmajor_upper_half> on topological
       level. Try to rebuild vector topology by v.build.

Please try it again.

Tested successfully.

Copy link
Member

@landam landam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view, this PR may be merged when a new test for a batch option will be also added (and required CI tests will pass)

@landam
Copy link
Member

landam commented Jun 3, 2024

@HuidaeCho Can we manage for 8.4.0 or will be postpone for 8.5? It is a new feature so I am not sure if it's a good idea to introduce it in 8.4.1 (?)

@wenzeslaus
Copy link
Member

This PR needs to wait for 8.5. It is failing existing unrelated tests which just happen to use gs.run_command("v.edit", map=name, tool="create").

This also reveals the issue that there are no tests for the current functionality. These can be added in a separate PR.

@wenzeslaus wenzeslaus modified the milestones: 8.4.0, 8.5.0 Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C Related code is in C docs enhancement New feature or request HTML Related code is in HTML module vector Related to vector data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants