Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

textdiff has inconsistent range output between formats #334

Open
JonahSussman opened this issue Feb 1, 2024 · 3 comments
Open

textdiff has inconsistent range output between formats #334

JonahSussman opened this issue Feb 1, 2024 · 3 comments
Assignees
Labels

Comments

@JonahSussman
Copy link

Hi all,

I compiled the latest version of Gumtree from the source and have been using it. Really great project! I'm having some issues with the textdiff command, unfortunately. It appears that the output between the three formats is inconsistent with each other, mainly with their ranges.

(Not that I think it matters, but I'm currently running Fedora 38.)

Steps to Reproduce

I have the following Java files, A.java, B1.java, and B2.java, which I have attached to this issue:

  • A.java is the original file and contains a simple class with one field called qwerty.
  • B1.java changes the field's name to asdfg and adds System.out.println("hello") to the constructor.
  • B2.java is the same as B1.java, but with 10 extra lines added immediately before the curly brace opening the class.

I then ran the following (I've attached these files to my issue as well):

./gumtree textdiff A.java B1.java -f TEXT > A-B1-text.txt
./gumtree textdiff A.java B1.java -f JSON > A-B1-json.txt
./gumtree textdiff A.java B1.java -f XML  > A-B1-xml.txt
./gumtree textdiff A.java B2.java -f TEXT > A-B2-text.txt
./gumtree textdiff A.java B2.java -f JSON > A-B2-json.txt
./gumtree textdiff A.java B2.java -f XML  > A-B2-xml.txt

Behavior

All three formats handled update-node the same way, but there is an inconsistency between their handling of insert-tree. Each insert-tree has tree as an ExpressionStatement and parent as a Block. This is correct. However, the ranges differ between formats. Here's the inconsistency shown in a table:

Format A to B1 A to B2
TEXT tree: [116, 144]
parent: [111, 116]
tree: [126, 154]
parent: [111, 116]
JSON tree: [116, 144]
parent: [110, 148]
tree: [126, 154]
parent: [120, 158]
XML tree: [116, 144]
parent: [110, 148]
tree: [126, 154]
parent: [120, 158]

It appears that TEXT is taking parent from A.java, while JSON and XML are taking parent from B1/2.java. You can verify this because TEXT's parent fields do not change between the two diffs, while JSON's and XML's increase by 10.

Which file format has the correct version? Personally, I think TEXT makes the most sense, as you are inserting the tree at a point in the old tree.

Files

@JonahSussman
Copy link
Author

I did some digging within the code and found this line in this file:

Action ins = new Insert(x, copyToOrig.get(z), k);

Insert is defined as the following:

public class Insert extends Addition {
public Insert(Tree node, Tree parent, int pos) {
super(node, parent, pos);
}
@Override
public String getName() {
return "insert-node";
}
}

From this, we can see that the parent field is supposed to reference the source tree, not the destination tree. This confirms that the expected behavior should be what TEXT is doing, and JSON and XML are somehow doing something wrong.

@jrfaller jrfaller self-assigned this Feb 4, 2024
@jrfaller jrfaller added the bug label Feb 4, 2024
@jrfaller
Copy link
Member

jrfaller commented Feb 4, 2024

Thanks a lot for the detailed report! You're correct, it seems like a bug. I will try to investigate this.

@jrfaller
Copy link
Member

jrfaller commented Feb 14, 2024

OK I see the problem.

The text serializer uses this code to output the text for insert actions. It defers the computation to this method, which resort to the parent node stored in the action, which belongs to the source tree as you noticed here.

For the other serializer, XML for instance, it uses this code which uses the parent node supplied as a parameter here that is the parent coming from the destination tree which is a different behavior.

I think we should indeed have consistent behavior but I am not sure what is the right solution.

An inserted node only exists in the destination tree therefore if one wants to look up the code corresponding to this node together with the parent, she should look inside the destination file, that's why I am more convinced by the behavior of the XML and JSON formatters. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants