issues/1649 JanusGraph persistence documentation #77

Open
wants to merge 7 commits into
base: master
from
312 changes: 151 additions & 161 deletions docs/developer-guide/getting-started-with-persistence.md
@@ -4,65 +4,33 @@

This page contains explanations and code samples for developers who need to store their entities in the database.

The Strongbox project uses [OrientDB](http://orientdb.com/orientdb/) as its internal persistent storage through the
corresponding `JPA` implementation and `spring-orm` middle tier. Also we use `JTA` for transaction management and
`spring-tx` implementation module from Spring technology stack.
The Strongbox project uses [JanusGraph](https://janusgraph.org/) as its internal persistent storage through the
corresponding [Gremlin](https://tinkerpop.apache.org/gremlin.html) implementation and [spring-data-neo4j](https://spring.io/projects/spring-data-neo4j#overview) middle tier. We also use `JTA` for transaction management and the `spring-tx` implementation module from the Spring technology stack.

## OrientDB Studio
## Persistence stack

As you are learning about Strongbox persistence, you may want to explore the existing persistence implementation.
For development environments, Strongbox includes an embedded OrientDB server as well as an embedded instance of
OrientDB Studio. By default, when you run the application from the source tree, you'll use the embedded database
server. However, OrientDB Studio is disabled by default.
We're using the following technology stack to deal with persistence:

### Running OrientDB Studio From Source Tree
- Embedded Cassandra as direct storage (`CassandraDaemon` allows us to have the Cassandra instance inside the same JVM as the application)
- JanusGraph as our graph DBMS (it is not a data store itself; it gives you access to the data in the form of a graph)
- [Apache TinkerPop](http://tinkerpop.apache.org/docs/current/reference/) as a set of tools to interact with the database
- [spring-data-neo4j](https://github.com/spring-projects/spring-data-neo4j) to manage transactions in Spring with `Neo4jTransactionManager` and to implement custom Cypher queries in Spring Data repositories via the `@org.springframework.data.neo4j.annotation.Query` annotation
- [cypher-for-gremlin](https://github.com/opencypher/cypher-for-gremlin), which translates Cypher queries into Gremlin traversals (it has some issues that prevent us from using it for `neo4j-ogm` CRUD operations; these issues are explained below)
- [neo4j-ogm](https://github.com/neo4j/neo4j-ogm) to map Java POJOs to the vertices and edges of the graph
- We also use custom `EntityTraversalAdapters`, which implement anonymous Gremlin traversals for CRUD operations on `neo4j-ogm` entities.

To enable OrientDB Studio, you need only to set the property `strongbox.orientdb.studio.enabled` to `true`. You
can do this on the Maven command line by running Strongbox as follows:
## Vertices and Edges

```
$ mvn spring-boot:run -Dspring-boot.run.jvmArguments="-Dstrongbox.orientdb.studio.enabled=true"
```

There are two additional properties that can be used to configure OrientDB Studio:

- `strongbox.orientdb.studio.ip.address`
- `strongbox.orientdb.studio.port`

### Running OrientDB Studio From The Distribution

If you're running from the `tar.gz`, or `rpm` distributions, you can start Strongbox as follows to enable OrientDB Studio:

```
$ cd /opt/strongbox
$ STRONGBOX_VAULT=/opt/strongbox-vault STRONGBOX_ORIENTDB_STUDIO_ENABLED=true ./bin/strongbox console
```
Unlike a relational DBMS, a graph DBMS has vertices and edges, not tables and rows. So, in graph terms, every persistent entity is stored as either a vertex or an edge. Examples of vertices are `Artifact` and `ArtifactCoordinates`, and the relation between them is an edge. Note that, unlike in an RDBMS, an object relation is represented by a separate edge rather than by a foreign key column in a table. Persistent objects can themselves be edges too: for example, `ArtifactDependency` is an edge between `ArtifactCoordinates` vertices.
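
To make the vertex/edge distinction concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration: `GraphModelSketch`, its nested `Vertex` and `Edge` classes, and the `path` property are hypothetical names invented for this example and are not part of Strongbox, JanusGraph, or TinkerPop. It shows a dependency stored as an edge of its own rather than as a foreign key on one of the vertices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model for illustration only;
// these are not Strongbox, JanusGraph, or TinkerPop classes.
final class GraphModelSketch
{

    // A vertex has a label (its type) and, in this sketch, a single "path" property.
    static final class Vertex
    {
        final String label;
        final String path;

        Vertex(String label, String path)
        {
            this.label = label;
            this.path = path;
        }
    }

    // An edge is an element of its own that connects two vertices.
    static final class Edge
    {
        final String label;
        final Vertex out;
        final Vertex in;

        Edge(String label, Vertex out, Vertex in)
        {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    private final List<Vertex> vertices = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String label, String path)
    {
        Vertex vertex = new Vertex(label, path);
        vertices.add(vertex);
        return vertex;
    }

    Edge addEdge(String label, Vertex out, Vertex in)
    {
        Edge edge = new Edge(label, out, in);
        edges.add(edge);
        return edge;
    }

    // Follow outgoing edges with the given label instead of joining on a foreign key.
    List<Vertex> out(Vertex from, String edgeLabel)
    {
        List<Vertex> result = new ArrayList<>();
        for (Edge edge : edges)
        {
            if (edge.out == from && edge.label.equals(edgeLabel))
            {
                result.add(edge.in);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        GraphModelSketch graph = new GraphModelSketch();
        Vertex app = graph.addVertex("ArtifactCoordinates", "app-1.0.jar");
        Vertex lib = graph.addVertex("ArtifactCoordinates", "lib-2.0.jar");
        graph.addEdge("ArtifactDependency", app, lib);

        for (Vertex dependency : graph.out(app, "ArtifactDependency"))
        {
            System.out.println(app.path + " depends on " + dependency.path);
        }
    }
}
```

In the real stack the graph engine stores and indexes these elements; the sketch only mirrors the shape of the data.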

Please, note that the `STRONGBOX_VAULT` environment variable needs to be pointing to an absolute path for this to work.
## Gremlin Server

As with the source distribution, you can set additional environment variables to further configure OrientDB Studio:
`TODO`

```
$ export STRONGBOX_ORIENTDB_STUDIO_IP_ADDRESS=0.0.0.0
$ export STRONGBOX_ORIENTDB_STUDIO_PORT=2480
```

Once the application is running, you can login to OrientDB Studio by visiting
http://127.0.0.1:2480/studio/index.html in your browser. The initial credentials are `admin` and `password`.

![Login Screen](/assets/screenshots/orientdb-studio/login-screen.png)

After your login, you'll land on the Browse Screen, which allows you to query the embedded database.

![Browse Screen](/assets/screenshots/orientdb-studio/browse-screen.png)

Finally, you can explore the schema defined in the database by clicking `SCHEMA`.

![Schema Screen](/assets/screenshots/orientdb-studio/schema-screen.png)

## Adding Dependencies

Let's assume that you, as a Strongbox developer, need to create a new module or write some persistence code in an
Let's assume that you, as a Strongbox developer, need to create a new module, or write some persistence code in an
existing module that does not contain any persistence dependencies yet. (Otherwise you will already have the proper
`<dependencies/>` section in your `pom.xml`, similar to the one in the example below). You will need to add the
following code snippet to your module's `pom.xml` under the `<dependencies>` section:
@@ -75,173 +43,195 @@ following code snippet to your module's `pom.xml` under the `<dependencies>` sec
</dependency>
```

Notice that there is no need to define any direct dependencies on OrientDB or Spring Data - it's already done via
Notice that there is no need to define any direct dependencies on JanusGraph or Spring Data - it's already done via
the `strongbox-data-service` module.

## Creating Your Entity Class

Let's now assume that you have a POJO and you need to save it to the database (and that you probably have at least
CRUD operation's implemented in it as well). Place your code under the `org.carlspring.strongbox.domain.yourstuff`
package. For the sake of the example, let's pick `MyEntity` as the name of your entity.
CRUD operations implemented in it as well). Place your code under the `org.carlspring.strongbox.domain`
package. For the sake of the example, let's pick `PetEntity` as the name of your entity.

If you want to store that entity properly you need to adopt the following rules:

* Extend the `org.carlspring.strongbox.data.domain.GenericEntity` class to inherit all required fields and logic from
the superclass.
* Define getters and setters according to the `JavaBeans` coding convention for all non-transient properties in your
class.
* Define a default empty constructor for safety (even if the compiler will create one for you, if you don't define any
other constructors) and follow the `JPA` and `java.io.Serializable` standards.
* Override the `equals() `and `hashCode()` methods according to java `hashCode` contract (because your entity could be
used in collection classes such as `java.util.Set` and if you don't define such methods properly other developers or
yourself will be not able to use your entity).
* _Optional_ - define a `toString()` implementation to let yourself and other developers see something meaningful in
the debug messages.
* Create the interface for your entity with all the getters and setters that are required to interact with the entity, according to the `JavaBeans` coding convention. This interface should extend `org.carlspring.strongbox.data.domain.DomainObject`. We need an interface in order to hide the implementation-specific details that depend on the underlying database, such as inheritance strategy.
* Create the entity class, which implements the above interface and extends `org.carlspring.strongbox.data.domain.DomainEntity`.
* Annotate the entity class with `@NodeEntity` or `@RelationshipEntity`.
* Define a default empty constructor, as it is required in order to create entity instances from `neo4j-ogm` internals.

The complete source code example that follows all requirements should look something like this:

```java
package org.carlspring.strongbox.domain;

import org.carlspring.strongbox.data.domain.GenericEntity;

import com.google.common.base.Objects;

public class MyEntity
extends GenericEntity
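import org.carlspring.strongbox.data.domain.DomainEntity;
import org.neo4j.ogm.annotation.NodeEntity;

@NodeEntity("Pet")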
public class PetEntity
extends DomainEntity
implements Pet
{

private String property;
private Integer age;

public MyEntity()
public PetEntity()
{
}

public String getProperty()
@Override
public Integer getAge()
{
return property;
return age;
}

public void setProperty(String property)
@Override
public void setAge(Integer age)
{
this.property = property;
this.age = age;
}

}
```

## Creating an `EntityTraversalAdapter`

As mentioned above, besides `neo4j-ogm` and `spring-data-neo4j`, we were forced to use custom CRUD implementations based on Gremlin. This has its advantages, as it allows us to optimize OGM entities and make them faster than what the common `neo4j-ogm` provides out of the box. The core of the Gremlin-based CRUD is the `EntityTraversalAdapter`, which is a strategy for create/update/read/delete operations. A concrete `EntityTraversalAdapter` provides [Anonymous Traversals](http://tinkerpop.apache.org/docs/current/tutorials/gremlins-anatomy/) for each operation of the specific entity type. These traversals are used in Gremlin-based repositories to perform common CRUD operations:

- `fold` : to construct an entity instance from a vertex/edge and its properties
- `unfold` : to write the entity's properties to a vertex/edge and its properties
- `cascade` : to cascade a delete to other vertices/edges, if needed

Basically, all of these operations are implemented using the special `__` class, which represents an anonymous traversal in Gremlin.

The `EntityTraversalAdapter` implementations can also use each other to support relations between entities, inheritance and cascade operations.
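
As a rough mental model of the three operations, the following plain-Java sketch replaces the vertex with a `Map` of properties. This is an assumption made purely for illustration: the real adapters return Gremlin anonymous traversals, and `AdapterSketch`, its nested `Pet` class, and the method signatures here are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; the real adapters
// build Gremlin anonymous traversals instead of working on maps.
final class AdapterSketch
{

    static final class Pet
    {
        String uuid;
        Integer age;
    }

    // fold: build an entity instance from the properties stored on a vertex.
    static Pet fold(Map<String, Object> vertexProperties)
    {
        Pet pet = new Pet();
        pet.uuid = (String) vertexProperties.get("uuid");
        pet.age = (Integer) vertexProperties.get("age");
        return pet;
    }

    // unfold: write the entity's non-null properties back to the vertex.
    static Map<String, Object> unfold(Pet pet)
    {
        Map<String, Object> vertexProperties = new HashMap<>();
        vertexProperties.put("uuid", pet.uuid);
        if (pet.age != null)
        {
            vertexProperties.put("age", pet.age);
        }
        return vertexProperties;
    }

    // cascade: list the other elements to delete together with this one.
    // In this sketch a pet owns nothing else, so there is nothing to cascade to.
    static List<String> cascade(Pet pet)
    {
        return Collections.emptyList();
    }

    public static void main(String[] args)
    {
        Map<String, Object> stored = new HashMap<>();
        stored.put("uuid", "pet-1");
        stored.put("age", 4);

        Pet pet = fold(stored);
        System.out.println("Round trip preserved properties: " + unfold(pet).equals(stored));
    }
}
```

Skipping `null` values in `unfold` mirrors the null check that the adapter performs before setting a property.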

Below is a code example of an `EntityTraversalAdapter` implementation for `PetEntity`:

```java
package org.carlspring.strongbox.gremlin.adapters;

import static org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality.single;
import static org.carlspring.strongbox.gremlin.adapters.EntityTraversalUtils.extractObject;

import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
import org.apache.tinkerpop.gremlin.structure.Element;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.carlspring.strongbox.domain.Pet;
import org.carlspring.strongbox.domain.PetEntity;
import org.carlspring.strongbox.gremlin.dsl.EntityTraversal;
import org.carlspring.strongbox.gremlin.dsl.__;
import org.springframework.stereotype.Component;

@Component
public class PetAdapter extends VertexEntityTraversalAdapter<Pet>
{

@Override
public boolean equals(Object o)
public Set<String> labels()
{
if (this == o)
{
return true;
}
if (o == null || getClass() != o.getClass())
{
return false;
}

MyEntity myEntity = (MyEntity) o;
return Collections.singleton("Pet");
}

return Objects.equal(property, myEntity.property);
@Override
public EntityTraversal<Vertex, Pet> fold()
{
return __.<Vertex, Object>project("uuid", "age")
.by(__.enrichPropertyValue("uuid"))
.by(__.enrichPropertyValue("age"))
// Review comment (Member): Could you perhaps add some details about these oddly named classes/variables? (I know it was Gremlin, or Cypher, but it would probably help others get a better understanding of what this black magic spell is all about :) ). Thanks! :)
// Reply (Member Author): fixed

.map(this::map);
}

private Pet map(Traverser<Map<String, Object>> t)
{
PetEntity result = new PetEntity();
result.setUuid(extractObject(String.class, t.get().get("uuid")));
result.setAge(extractObject(Integer.class, t.get().get("age")));

return result;
}

@Override
public int hashCode()
public UnfoldEntityTraversal<Vertex, Vertex> unfold(Pet entity)
{
return Objects.hashCode(property);
EntityTraversal<Vertex, Vertex> t = __.<Vertex>identity();
if (entity.getAge() != null)
{
t = t.property(single, "age", entity.getAge());
}

return new UnfoldEntityTraversal<>("Pet", t);
}

@Override
public String toString()
public EntityTraversal<Vertex, ? extends Element> cascade()
{
final StringBuilder sb = new StringBuilder("MyEntity{");
sb.append("property='").append(property).append('\'');
sb.append('}');

return sb.toString();
return __.identity();
}

}
```

## Creating a DAO Layer
```

First of all you will need to extend the `CrudService` with the second type parameter that corresponds to your ID's data type. Usually it's just strings.
## Creating a `Repository`

All database interactions should be done through repositories. For compatibility with `spring-data`, we use `org.springframework.data.repository.CrudRepository` as the basis for our repositories. The base class for implementing `EntityTraversalAdapter`-based repositories is `org.carlspring.strongbox.gremlin.repositories.GremlinRepository`. The rest of the repository implementation depends on the type of entity; for vertex-backed entities, it should be `GremlinVertexRepository`.
In addition to CRUD operations, there is also the need to be able to select data using queries. Queries could be implemented using [Cypher](https://neo4j.com/docs/cypher-manual/current/introduction/) through `spring-data-neo4j` using the `@org.springframework.data.neo4j.annotation.Query` annotation. So, the final repository should be a mixin that extends `GremlinRepository` and delegates custom `Cypher` queries to the `org.springframework.data.repository.Repository` instance provided by `spring-data-neo4j`.
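
The mixin idea can be sketched without any framework: one class offers the CRUD operations itself and forwards query methods to a separate query object. Everything in this sketch is hypothetical (`RepositoryMixinSketch`, `AgeQueries`, `PetStore`, and the in-memory map standing in for the database); in Strongbox the query side would be supplied by `spring-data-neo4j` rather than written by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical framework-free sketch of a repository that mixes
// its own CRUD operations with delegated query methods.
final class RepositoryMixinSketch
{

    // Stand-in for the query interface that a framework would implement.
    interface AgeQueries
    {
        List<String> findIdsByAgeGreater(int age);
    }

    // Stand-in for the repository: CRUD is local, queries are delegated.
    static final class PetStore implements AgeQueries
    {
        private final Map<String, Integer> ageById;
        private final AgeQueries queries;

        PetStore(Map<String, Integer> ageById, AgeQueries queries)
        {
            this.ageById = ageById;
            this.queries = queries;
        }

        void save(String id, int age)
        {
            ageById.put(id, age);
        }

        void delete(String id)
        {
            ageById.remove(id);
        }

        @Override
        public List<String> findIdsByAgeGreater(int age)
        {
            // Delegate instead of re-implementing the query here.
            return queries.findIdsByAgeGreater(age);
        }
    }

    // Wires the store and a simple query implementation over the same data.
    static PetStore inMemory()
    {
        Map<String, Integer> data = new TreeMap<>();
        AgeQueries queries = minAge ->
        {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, Integer> entry : data.entrySet())
            {
                if (entry.getValue() > minAge)
                {
                    ids.add(entry.getKey());
                }
            }
            return ids;
        };
        return new PetStore(data, queries);
    }

    public static void main(String[] args)
    {
        PetStore store = inMemory();
        store.save("rex", 3);
        store.save("tom", 7);
        System.out.println(store.findIdsByAgeGreater(5));
    }
}
```

Callers depend on a single type for both kinds of operation, while the query implementation can be swapped without touching the CRUD code.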

!!! tip "To read more about ID's in OrientDB, check the <a href='http://orientdb.com/docs/2.0/orientdb.wiki/Tutorial-Record-ID.html' target='_blank'>manual</a>"
Putting all of the above together, the repository for the `PetEntity` looks like this:

```java
package org.carlspring.strongbox.users.service;
package org.carlspring.strongbox.repositories;

import org.carlspring.strongbox.data.service.CrudService;
import org.carlspring.strongbox.users.domain.MyEntity;
import java.util.List;

import javax.inject.Inject;

import org.springframework.transaction.annotation.Transactional;
import org.carlspring.strongbox.domain.Pet;
import org.carlspring.strongbox.gremlin.adapters.PetAdapter;
import org.carlspring.strongbox.gremlin.repositories.GremlinVertexRepository;
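import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;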

/**
* CRUD service for managing {@link MyEntity} entities.
*
* @author Alex Oreshkevich
*/
@Transactional
public interface MyEntityService
extends CrudService<MyEntity, String>
@Repository
public class PetRepository extends GremlinVertexRepository<Pet>
implements PetQueries
{

MyEntity findByProperty(String property);
@Inject
PetAdapter adapter;

@Inject
PetQueries queries;

}
```
@Override
protected PetAdapter adapter()
{
return adapter;
}

After that you will need to define an implementation of your service class.

Follow these rules for the service implementation:

* Inherit your CRUD service from `CommonCrudService<MyEntity>` class;
* Name it like your service interface with an `Impl` suffix, for example `MyEntityServiceImpl`;
* Annotate your class with the Spring `@Service` and `@Transactional` annotations;
* Do **not** define your service class as public and use interface instead of class for injection (with `@Autowired`);
this follows the best practice principles from Joshua Bloch 'Effective Java' book called Programming to Interface;
* _Optional_ - define any methods you need to work with your `MyEntity` class; these methods mostly should be based on
common API form `javax.persistence.EntityManager`, or custom queries (see example below);

* !!! warning "Avoid query parameters construction through string concatenation!"
Please avoid using query parameter construction through string concatenation!
This usually leads to [SQL Injection](https://en.wikipedia.org/wiki/SQL_injection) issues!
Bad query example:
`String sQuery = "select * from MyEntity where proprety='" + propertyValue + "'"`;
What you should do instead is to create a service which does properly assigns the parameters.
Here's an example service:
```java
@Transactional
public class MyEntityServiceImpl
extends CommonCrudService<MyEntity> implements MyEntityService
{
public MyEntity findByProperty(String property)
{
String sQuery = "select * from MyEntity where property = :propertyValue";

OSQLSynchQuery<Long> oQuery = new OSQLSynchQuery<Long>(sQuery);
oQuery.setLimit(1);

HashMap<String, String> params = new HashMap<String, String>();
params.put("propertyValue", property);

List<MyEntity> resultList = getDelegate().command(oQuery).execute(params);
return !resultList.isEmpty() ? resultList.iterator().next() : null;
}
}
```

## Register entity schema in EntityManager
Before using entities you will need to register them. Consider the following example:
@Override
public List<Pet> findByAgeGreater(Integer age)
{
return queries.findByAgeGreater(age);
}

```java
@Inject
private OEntityManager oEntityManager;
}

@PostConstruct
public void init()
@Repository
interface PetQueries
extends org.springframework.data.repository.Repository<Pet, String>
{
oEntityManager.registerEntityClass(MyEntity.class);

@Query("MATCH (pet:Pet) " +
"WHERE pet.age > $age " +
"RETURN pet")
List<Pet> findByAgeGreater(@Param("age") Integer age);

}
```

## Issues of `cypher-for-gremlin` and `neo4j-ogm`

The first issue is that `cypher-for-gremlin` does not fully support all of the Cypher syntax that `neo4j-ogm` produces for CRUD operations. To be more specific, on every CRUD operation, `neo4j-ogm` generates a Cypher query, which is then translated to Gremlin by `cypher-for-gremlin`. As a workaround, we modify the Cypher queries produced by `neo4j-ogm` and replace some clauses (see `org.opencypher.gremlin.neo4j.ogm.request.GremlinRequest`).

Another issue is that `cypher-for-gremlin` has an ambiguous concept for working with `null` values in Gremlin. It puts a lot of noisy tokens into Gremlin traversals, which prevents the JanusGraph engine from matching the expected indexes. This, in turn, causes heavy full scans on every query (see [#342](https://github.com/opencypher/cypher-for-gremlin/issues/342)). This was the main reason why we couldn't use `neo4j-ogm` for CRUD operations.

Either way, we still use it for custom Cypher queries via the `@org.springframework.data.neo4j.annotation.Query` annotation. Having Cypher queries instead of Gremlin ones is a good option, because they are clearer and take less time to read and write.