Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce StatementGroup #507

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Introduce StatementGroup #507

wants to merge 2 commits into from

Conversation

Benestar
Copy link
Contributor

Per #479 (comment)

The reasons for this are that in the UI we already group all Statements by PropertyId and will also group them by rank in future. To support clean indexing, we have to replicate this structure in the DataModel as well.

@Benestar Benestar changed the title [DRAFT] Introduce StatementGroup [RFC] Introduce StatementGroup Jun 23, 2015
@JeroenDeDauw
Copy link
Contributor

I'm not seeing the value in indexing the things in the group by rank. We do not currently need that or benefit from having it, so YAGNI.

That the UI will use some kind of grouping is not all that relevant here. Especially if you consider that there is a layer between the UI and the PHP: the DataModel serialization.

@Benestar
Copy link
Contributor Author

The issue is that we will encounter the same indexing issue we currently have with StatementList. In the UI/Api we want to provide one index for the statement group and one for a statement within the group. However, if the statements are grouped by rank, this has to be taken into consideration when applying the index (it's then an index for a statement per term group inside the statement group).

Also, this indexing makes it easier to find the best statements of a StatementGroup etc. I think it is logical and intuitive to do such indexing because we have lots of relevant usecases for it.

@JeroenDeDauw
Copy link
Contributor

How about first doing the refactoring that introduces StatementGroup, and once that is done do the rank stuff?

The need for the rank stuff is not clear to me. Which does not mean it is not there. It does mean that before I can +2 this, I need to look into that more. So if the part without the rank stuff is done first, we can already move forwards with that.

class StatementGroup implements PropertyIdProvider {

/**
* @var Statement[][]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most IDEs don't understand type hints with [][]. I prefer array[].

@thiemowmde
Copy link
Contributor

This draft is incomplete. Currently there is no way to access the statements. I think having such methods in the proposal is crucial to understand it's practical impact.

In general, I wonder what the benefit over StatementList::getByRank is? Other than performance implications? I can see that this proposal will perform much better in specific use cases. But I do not see a new feature that we currently miss.

@Benestar
Copy link
Contributor Author

@thiemowmde let's talk about this when I'm at the office next week, ok? I just had this idea but didn't complete it because of time but still wanted to upload the things I started. It's basically a step to fix https://phabricator.wikimedia.org/T99243

@thiemowmde
Copy link
Contributor

Sure. My comment was more a note to myself and to let others know what I think. Not meant to be a voting of any kind.

@Benestar Benestar changed the title [RFC] Introduce StatementGroup Introduce StatementGroup Aug 26, 2015
@Benestar
Copy link
Contributor Author

Possible follow-ups:

  • Create a StatementGroupList
  • Add methods to remove/reindex statements
  • Use this as the default representation of statements (ie. replace StatementListProvider with StatementGroupListProvider, the reasoning behind this is that in the serialization, the ui and lots of other places we access statements by property id so this seems to be the natural data structure to work with.)

@thiemowmde
Copy link
Contributor

I disagree with "the default representation" proposal. This is more a Wikidata specific UI thing and should not be enforced on all implementations that use statements. However, I think this is a welcome addition. Please try to add the proposed StatementGroupList to this or a dependent patch, otherwise this class alone is not really helpful.

@JeroenDeDauw
Copy link
Contributor

@Benestar do you have any interest in continuing this?


I did a quick review of the code and noticed how StatementGroup is basically just StatementList. After some thinking the only sane approach I could find is to use StatementList internally in StatementGroup. This allows not duplicating StatementList functionality in StatementGroup while constraining the allowed statements and while not violating LSP.

That does not leave much in StatementGroup (which is not a problem), though likely some of the things from https://github.com/wmde/WikibaseDataModelServices/pull/157/files also belong in it.


@thiemowmde do you remember why you went with creating a StatementListChanger instead of the approach that was already started here? StatementGroup + StatementGroupList seems better to me.

@thiemowmde
Copy link
Contributor

This patch here is incomplete. The one essential thing the proposed StatementGroup class does is checking if the statements all have the same property. But this alone is not enough. Further guarantees are needed, e.g. an Item should not contain two groups with the same property. Where is this going to be implemented? In the Item class? In the existing StatementList? In a new StatementGroupList?

How would a migration path for any of this look like? Phase out StatementList? If not, where will the new class be used, and how often? How often will flat lists be converted into groups, and back, and how will this affect performance in a real-live production scenario?

I checked the code that interacts with StatementLists multiple times over the years, and only very few places work with statements that are grouped by property. Grouping is not always done by property. Sometimes it's done by rank, for example. Most code does not care, and would become unnecessary complex if it could not get a flat list of statements any more, or needs to convert objects first. What I wrote in #507 (comment) as well as wmde/WikibaseDataModelServices#157 still holds.

@JeroenDeDauw
Copy link
Contributor

I suggest to leave Item as it is. No need for us to deal with migration or the problems you just mentioned. StatementGroup stuff would just be used by code that deals with groups, meaning we just need to be able to go from StatementList to StatementGroup stuff and back, which is easy to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants