Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StatementListChanger #157

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion composer.json
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
"wikimedia/assert": "~0.2.2"
},
"require-dev": {
"phpunit/phpunit": "~4.8",
"mediawiki/mediawiki-codesniffer": "~0.7",
"phpmd/phpmd": "~2.3"
},
Expand Down Expand Up @@ -57,7 +58,7 @@
"vendor/bin/phpcs src/* tests/* --standard=phpcs.xml --extensions=php -sp"
],
"phpmd": [
"vendor/bin/phpmd src/,tests/unit/ text phpmd.xml"
"vendor/bin/phpmd src/ text phpmd.xml"
]
}
}
174 changes: 174 additions & 0 deletions src/Statement/StatementListChanger.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
<?php

namespace Wikibase\DataModel\Services\Statement;

use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\DataModel\Statement\Statement;
use Wikibase\DataModel\Statement\StatementList;

/**
* A collection of higher level utility functions to manipulate StatementList objects.
*
* @since 3.7
*
* @license GPL-2.0+
* @author Thiemo Mättig
*/
class StatementListChanger {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like this should not be a service, but rather a builder or cursor style object: it should wrap a StatementList it gets in the constructor, and offer specialized operations on that StatementList. Perhaps the name should indicate what kind of extra operations these are.

Copy link
Contributor Author

@thiemowmde thiemowmde Nov 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very open for suggestions.

I definitely do not want this to be a set of static methods. This is something I would have done a few years ago. I learned. :-)

A builder does not work because I definitely want this to manipulate the original object, not create new ones.

Passing the StatementList via the constructor instead of each method does not change much, on the one hand. On the other hand it makes it much harder to replace this implementation with an other one. You can not simply make this an interface and inject an other implementation. You do not know the StatementList object on construction time. This approach would need a factory.

In conclusion I still think my current approach is the best.

the name should indicate what kind of extra operations these are.

What do you mean with "extra operations"? What is not obvious enough from the method names?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method names are fine, but the class name is very generic. "Changer" could be anything. The methods mainly relate to groups, so maybe the class should have "Groups" in the name. But I can't think of anything nice offhand. So whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a StatementGrouper interface with multiple implementations (including a ByPropertyIdStatementGrouper), as well as a ByPropertyIdArray (that's what we need to get rid of) and a ByPropertyIdGrouper. I wanted this here to be different from these.

This is a layer of code that works with a StatementList while being aware of the concept of "grouping by property ID". ByPropertyIdAwareStatementListChanger?


/**
* @var StatementList
*/
private $statementList;

public function __construct( StatementList $statementList ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

$this->statementList = $statementList;
}

/**
* Makes sure all statements with the same property are next to each other (forming a group),
* and reorders them if necessary. The position of the group in the list is determined by the
* first statement with the same property.
*/
public function groupByProperty() {
$byId = [];

foreach ( $this->statementList->toArray() as $statement ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is toArray needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is optional because StatementList itself is Traversable. But leaving the ->toArray() out will require an additional /** @var Statement $statement */ for type safety, and in most places we decided to go with ->toArray() instead.

$id = $statement->getPropertyId()->getSerialization();
$byId[$id][] = $statement;
}

// FIXME: Use StatementList::clear, see https://github.com/wmde/WikibaseDataModel/pull/649
$this->clear();

foreach ( $byId as $statements ) {
foreach ( $statements as $statement ) {
$this->statementList->addStatement( $statement );
}
}
}

private function clear() {
foreach ( $this->statementList->toArray() as $statement ) {
$this->statementList->removeStatementsWithGuid( $statement->getGuid() );
}
}

/**
* Adds a new statement to the list while respecting existing PropertyId groups.
*
* In contrast to the previously used ByPropertyIdArray implementation this method never moves
* existing statements around. The provided index is only a hint. The new statement is either
* added to an existing group of statements with the same PropertyId, or at a group border close
* to the provided index.
*
* @param Statement $newStatement
* @param int|null $index An absolute index in the list. If the index is not next to a statement
* with the same property, the closest possible position is used instead. Default is null,
* which adds the new statement after the last statement with the same property, or to the end.
*/
public function addToGroup( Statement $newStatement, $index = null ) {
$statements = $this->statementList->toArray();
$id = $newStatement->getPropertyId();

if ( $index === null ) {
$index = $this->getLastIndexWithinGroup( $statements, $id );
} else {
// Limit search range to avoid looping non-existing positions
$validIndex = min( max( 0, $index ), count( $statements ) );
$index = $this->getClosestIndexWithinGroup( $statements, $id, $validIndex );
if ( $index === null ) {
$index = $this->getClosestIndexAtGroupBorder( $statements, $validIndex );
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh This makes it really obvious that we should model groups directly as nested arrays, and get rid of the whole index crap completely.

Copy link
Contributor Author

@thiemowmde thiemowmde Feb 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean with "crap". Can you please avoid using such offensive language?

@Benestar suggested to replace our naive StatementList with one that's aware of PropertyId groups a long time ago. I'm not opposing such a change.

I still believe the code proposed here is a helpful step in a longer migration path that might end with replacing StatementList.

Note that the code I'm proposing here does have a very specific advantage: It allows us to add new statements where they belong, without the need to reorder existing statement lists. Not that this is of much relevance, because users usually do not see diffs on the JSON level. But still. This smooth migration is not possible with a StatementList that enforces grouping on construction time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was not a complaint about this patch. It's just one more place where it becomes obvious that insertion by index into a grouped list is broken by design. It's just bad interface design. And that bad design (aka "crap") is my fault; I remember the session with Denny where we designed this interface. So, sigh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the proposals we discussed had two numeric indexes: one for a relative position in the group, and one for moving the group to an other position. I found this harder to use than the current absolute index.

I'm still very happy with the super-trivial StatementList we have. Only very few places in our application need statements to be grouped, and the requirements are always a bit different. A "baked in" grouping on the DataModel level may help in some cases, but will not solve all use-cases, and will make others more complicated.


$this->statementList->addStatement( $newStatement, $index );
}

/**
* @param Statement[] $statements
* @param PropertyId $id
*
* @return int|null
*/
private function getLastIndexWithinGroup( array $statements, PropertyId $id ) {
// Start searching from the end and stop at the first match
for ( $i = count( $statements ); $i > 0; $i-- ) {
if ( $statements[$i - 1]->getPropertyId()->equals( $id ) ) {
return $i;
}
}

return null;
}

/**
* @param Statement[] $statements
* @param PropertyId $id
* @param int $index
*
* @return int|null
*/
private function getClosestIndexWithinGroup( array $statements, PropertyId $id, $index ) {
$longestDistance = max( $index, count( $statements ) - $index );

for ( $i = 0; $i <= $longestDistance; $i++ ) {
if ( $this->isWithinGroup( $statements, $id, $index - $i ) ) {
return $index - $i;
} elseif ( $i && $this->isWithinGroup( $statements, $id, $index + $i ) ) {
return $index + $i;
}
}

return null;
}

/**
* @param Statement[] $statements
* @param int $index
*
* @return int|null
*/
private function getClosestIndexAtGroupBorder( array $statements, $index ) {
$longestDistance = max( $index, count( $statements ) - $index );

for ( $i = 0; $i <= $longestDistance; $i++ ) {
if ( $this->isGroupBorder( $statements, $index - $i ) ) {
return $index - $i;
} elseif ( $i && $this->isGroupBorder( $statements, $index + $i ) ) {
return $index + $i;
}
}

return null;
}

/**
* @param Statement[] $statements
* @param PropertyId $id
* @param int $index
*
* @return bool
*/
private function isWithinGroup( array $statements, PropertyId $id, $index ) {
$count = count( $statements );

// Valid if the index either precedes or succeeds a statement with the same property
return $index > 0 && $index <= $count && $statements[$index - 1]->getPropertyId()->equals( $id )
|| $index >= 0 && $index < $count && $statements[$index]->getPropertyId()->equals( $id );
}

/**
* @param Statement[] $statements
* @param int $index
*
* @return bool
*/
private function isGroupBorder( array $statements, $index ) {
// First and last possible position is always a border
return $index <= 0
|| $index >= count( $statements )
|| !$statements[$index - 1]->getPropertyId()->equals( $statements[$index]->getPropertyId() );
}

}
Loading