
Rexster Format

Dan LaRocque edited this page Sep 5, 2014 · 38 revisions
This is the documentation for Faunus 0.4.
Faunus was merged into Titan and renamed Titan-Hadoop in version 0.5.
Documentation for the latest Titan version is available at http://s3.thinkaurelius.com/docs/titan/current.

  • InputFormat: com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat

Rexster is a graph server that exposes any Blueprints graph (e.g. TinkerGraph, Neo4j, OrientDB, DEX, Titan, and Sail RDF Stores) through REST and a binary protocol called RexPro. (See the Benefits of Rexster).

Rexster and Faunus

Faunus can read graphs served by Rexster (version 2.1.0+, see the Rexster Operations section of Gotchas and Limitations) through a Faunus Rexster Kibble (also known as an Extension). This extension lets Faunus work with Blueprints implementations that have no direct InputFormat for Faunus.

The easiest way to get started is with one of the standard “toy” graphs that come with Rexster: the Grateful Dead Graph. To deploy Faunus to Rexster, first create a directory for the kibble:

mkdir REXSTER_HOME/ext/faunus

Then copy the Faunus libraries into it. If Faunus was built from source:

cp FAUNUS_HOME/target/faunus-x.y.z-standalone/lib/*.* REXSTER_HOME/ext/faunus

If you are using the zipped Faunus distribution instead:

cp FAUNUS_HOME/lib/*.* REXSTER_HOME/ext/faunus

NOTE: Future releases will have a simpler, more efficient packaging model.

Next, edit the gratefulgraph segment of rexster.xml so that Rexster exposes the Faunus Kibble on that graph:

<graph>
    <graph-name>gratefulgraph</graph-name>
    <graph-type>com.tinkerpop.rexster.config.TinkerGraphGraphConfiguration</graph-type>
    <graph-location>data/graph-example-2</graph-location>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:rexsterinputformat</allow>
        </allows>
    </extensions>
</graph>
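Before restarting Rexster, it can be handy to confirm that the edited file actually allows the Faunus namespace on the intended graph. The helper below is a hypothetical sketch, not part of Rexster or Faunus: it parses a rexster.xml fragment with the JDK's DOM parser and looks for an exact `faunus:rexsterinputformat` entry under the named graph. Wildcard allow patterns are deliberately not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-flight check for rexster.xml (not a Rexster API).
final class KibbleAllowCheck {
    static final String FAUNUS_NAMESPACE = "faunus:rexsterinputformat";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse rexster.xml", e);
        }
    }

    // True if the <graph> named graphName lists the Faunus namespace in an <allow>.
    static boolean allowsFaunus(String xml, String graphName) {
        NodeList graphs = parse(xml).getElementsByTagName("graph");
        for (int i = 0; i < graphs.getLength(); i++) {
            Element graph = (Element) graphs.item(i);
            NodeList names = graph.getElementsByTagName("graph-name");
            if (names.getLength() == 0
                    || !graphName.equals(names.item(0).getTextContent().trim())) {
                continue; // not the graph we are looking for
            }
            NodeList allows = graph.getElementsByTagName("allow");
            for (int j = 0; j < allows.getLength(); j++) {
                // Exact match only; patterns such as "faunus:*" are not handled here.
                if (FAUNUS_NAMESPACE.equals(allows.item(j).getTextContent().trim())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Reading rexster.xml from disk and passing its contents to `allowsFaunus(xml, "gratefulgraph")` should return true for the segment above.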

Start Rexster (see Getting Started with Rexster) and confirm in the console output that the Faunus Kibble namespace is allowed on gratefulgraph:

rexster$ bin/rexster.sh -s -c ./bin/rexster.xml
[INFO] WebServer - .:Welcome to Rexster:.
...
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [tp:gremlin]
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [faunus:rexsterinputformat]
...
[INFO] WebServer - Rexster Server running on: [http://localhost:8182]
[INFO] WebServer - Rexster configured with no security.
[INFO] WebServer - RexPro serving on port: [8184]
[INFO] ShutdownManager$ShutdownSocketListener - Bound shutdown socket to /127.0.0.1:8183. Starting listener thread for shutdown requests.

Next, create a rexster-input.properties file with the following properties. Faunus ships a default copy at bin/rexster-input.properties.

faunus.graph.input.format=com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
faunus.graph.input.rexster.hostname=127.0.0.1
faunus.graph.input.rexster.port=8182
faunus.graph.input.rexster.ssl=false
faunus.graph.input.rexster.graph=gratefulgraph
faunus.graph.input.rexster.v-estimate=800

faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true
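The hostname, port, ssl and graph settings together identify an ordinary Rexster REST resource. As a sketch, assuming Rexster's standard `/graphs/<graph-name>` path, the hypothetical helper below (not Faunus's own code) derives that URL, choosing https when ssl is true, and builds an HTTP Basic `Authorization` value from the optional username and password settings described in the cheatsheet.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

// Hypothetical helper (not part of Faunus) for the faunus.graph.input.rexster.* settings.
final class RexsterInputSettings {
    static final String PREFIX = "faunus.graph.input.rexster.";

    static Properties load(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        return p;
    }

    static String required(Properties p, String key) {
        String value = p.getProperty(PREFIX + key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing " + PREFIX + key);
        }
        return value.trim();
    }

    // ssl=true selects https; false or an absent key selects http.
    static String graphUrl(Properties p) {
        boolean ssl = Boolean.parseBoolean(p.getProperty(PREFIX + "ssl", "false").trim());
        int port = Integer.parseInt(required(p, "port"));
        return (ssl ? "https" : "http") + "://" + required(p, "hostname") + ":" + port
                + "/graphs/" + required(p, "graph");
    }

    // Returns null when no username is configured (no Rexster security).
    static String basicAuthHeader(Properties p) {
        String user = p.getProperty(PREFIX + "username");
        if (user == null) {
            return null;
        }
        String credentials = user + ":" + p.getProperty(PREFIX + "password", "");
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

For the settings above, `graphUrl` yields http://127.0.0.1:8182/graphs/gratefulgraph; opening that address in a browser is a quick way to check that Rexster is serving the graph before starting a Faunus job.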

Any Faunus job can now be run with its source data pulled from Rexster. Start a Gremlin terminal from Faunus:

faunus$ bin/gremlin.sh

Then compute a simple distribution of song names over the followed_by edges:

gremlin> g = FaunusFactory.open('bin/rexster-input.properties')
==>faunusgraph[rexsterinputformat->graphsonoutputformat]
gremlin> g.V.in('followed_by').name.groupCount()
12/09/18 19:01:36 INFO mapreduce.FaunusCompiler: Compiled to 2 MapReduce job(s)
...
==>A MIND TO GIVE UP LIVIN	1
==>ADDAMS FAMILY	2
==>AINT SUPERSTITIOUS	9
==>ALABAMA GETAWAY	14
==>ALL ALONG THE WATCHTOWER	26
==>ALTHEA	58
==>ARE YOU LONELY FOR ME	
==>...
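It may help to spell out what the traversal counts. `g.V.in('followed_by')` emits, for every vertex, each vertex at the out end of an incoming followed_by edge, so after `.name.groupCount()` a song's count equals the number of followed_by edges leaving it. The plain-Java sketch below reproduces that counting over a tiny hypothetical edge list, with vertices identified by (assumed unique) names; it does not use Faunus, Gremlin or Hadoop.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of g.V.in('followed_by').name.groupCount().
// Each edge is {outVertexName, inVertexName}; the data model is hypothetical.
final class FollowedByCount {
    static Map<String, Long> groupCountOfInNames(List<String> vertices, List<String[]> followedBy) {
        Map<String, Long> counts = new TreeMap<>();
        for (String v : vertices) {                       // g.V
            for (String[] edge : followedBy) {
                if (edge[1].equals(v)) {                  // in('followed_by'): edge pointing at v
                    counts.merge(edge[0], 1L, Long::sum); // .name.groupCount() of the out vertex
                }
            }
        }
        return counts;
    }
}
```

With edges A→B, A→C and B→C, the result maps A to 2 and B to 1, while C (which has no outgoing followed_by edge) does not appear at all.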

Configuration Cheatsheet

All Rexster-specific settings in bin/rexster-input.properties are prefixed with faunus.graph.input.rexster.

| Setting | Description |
|---|---|
| hostname | The IP address or hostname of the Rexster server. |
| port | The port on which Rexster serves its REST API. |
| ssl | Whether Faunus connects to Rexster over https (true) or http (false). |
| graph | The name of the graph as configured in Rexster. |
| v-estimate | The estimated number of vertices in the target graph. Faunus uses it to split the job into evenly sized pieces for more efficient processing. Set it to -1 to have Faunus request the actual vertex count from Rexster, at the cost of a full iteration over the vertices in the graph. |
| username | The username sent in the authentication header, if basic Rexster Security is configured. |
| password | The password sent in the authentication header, if basic Rexster Security is configured. |
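To see why v-estimate affects efficiency, consider one plausible way an estimate could be turned into splits. The sketch below is hypothetical and is not Faunus's actual split logic: it carves the range [0, v-estimate) into contiguous [start, end) ranges whose sizes differ by at most one. A v-estimate of -1 would first be replaced by the exact count fetched from Rexster, as the table describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Faunus's split logic): divide [0, vEstimate)
// into numSplits contiguous [start, end) ranges whose sizes differ by at most one.
final class EstimateSplits {
    static List<long[]> split(long vEstimate, int numSplits) {
        if (vEstimate < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("need vEstimate >= 0 and numSplits > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = vEstimate / numSplits;
        long remainder = vEstimate % numSplits;
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            // The first `remainder` splits each take one extra vertex.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }
}
```

With v-estimate=800 and three splits this gives [0, 267), [267, 534) and [534, 800). In this sketch, an estimate lower than the real vertex count would leave the remaining vertices outside every range, so any real implementation has to account for that case.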

Blueprints Graph Implementations

Rexster can expose any Blueprints graph to Faunus. Note, however, that Faunus only works with vertex and edge identifiers of type long; a Blueprints graph whose identifiers do not resolve to long will produce errors.
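Before pointing Faunus at a new Blueprints graph, it can help to check what its identifiers look like. The sketch below assumes that "resolves to long" means the id is an integral Java number or a string that parses as a long; that interpretation and the helper itself are assumptions, not Faunus's actual rule.

```java
// Hypothetical pre-flight check (not part of Faunus or Blueprints).
// Assumption: an id "resolves to long" if it is an integral number
// or a string such as "1" that parses as a long.
final class LongIdCheck {
    static boolean resolvesToLong(Object id) {
        if (id instanceof Long || id instanceof Integer
                || id instanceof Short || id instanceof Byte) {
            return true;
        }
        if (id instanceof String) {
            try {
                Long.parseLong((String) id);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return false; // e.g. Double, or a compound identifier object
    }
}
```

Under this check, an OrientDB-style record id such as "#9:0" fails, which is the situation the OrientDB configuration below addresses.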

OrientDB Configuration

OrientDB does not use long identifiers. Instead, it uses a compound identifier consisting of a cluster id and a unique identifier for the record within that cluster. The Faunus Kibble can convert this compound identifier into something Faunus can operate with. To tell the Faunus Kibble to convert the compound identifier to long, add the following configuration to rexster.xml for any OrientDB database:

<graph>
    <graph-name>orientdbsample</graph-name>
    <graph-type>orientgraph</graph-type>
    <graph-location>local:/tmp/orientdb</graph-location>
    <properties>
        <username>admin</username>
        <password>admin</password>
    </properties>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:rexsterinputformat</allow>
        </allows>
        <extension>
            <namespace>faunus</namespace>
            <name>rexsterinputformat</name>
            <configuration>
                <id-handler>orientdb</id-handler>
            </configuration>
        </extension>
    </extensions>
</graph>
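To make the idea of converting a compound identifier concrete: OrientDB record ids are written #&lt;cluster-id&gt;:&lt;cluster-position&gt;, for example #9:0. The hypothetical sketch below packs both parts into a single non-negative long, with the cluster id in the top 16 bits and the position in the low 48 bits, and reverses the packing. It illustrates the technique only; the Faunus Kibble's orientdb id-handler may use a different encoding.

```java
// Hypothetical packing of an OrientDB record id "#<cluster>:<position>" into one long.
// Illustration only; not the Faunus Kibble's actual orientdb id-handler.
final class OrientIdPacking {
    static final int POSITION_BITS = 48;
    static final long POSITION_MASK = (1L << POSITION_BITS) - 1;

    static long pack(String rid) {
        int colon = rid.indexOf(':');
        if (!rid.startsWith("#") || colon < 0) {
            throw new IllegalArgumentException("not a record id: " + rid);
        }
        int cluster = Integer.parseInt(rid.substring(1, colon));
        long position = Long.parseLong(rid.substring(colon + 1));
        // Keeping the cluster id within 15 bits leaves the sign bit clear.
        if (cluster < 0 || cluster > 0x7FFF || position < 0 || position > POSITION_MASK) {
            throw new IllegalArgumentException("out of range: " + rid);
        }
        return ((long) cluster << POSITION_BITS) | position;
    }

    static String unpack(long id) {
        return "#" + (id >>> POSITION_BITS) + ":" + (id & POSITION_MASK);
    }
}
```

Because the mapping is reversible, a value produced by `pack` can be turned back into the original record id with `unpack` when results need to be matched against OrientDB.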

Titan BerkeleyDB Configuration

Titan with the BerkeleyDB storage backend can also be exposed to Faunus via Rexster. However, its Blueprints interface does not expose long identifiers for edges. The Faunus Kibble can convert the identifier Titan does use into something Faunus can operate with. To tell the Faunus Kibble to convert the identifier to long, add the following configuration to rexster.xml for any Titan BerkeleyDB database:

<graph>
  <graph-name>titanexample</graph-name>
  <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
  <graph-location>/tmp/titan</graph-location>
  <graph-read-only>false</graph-read-only>
  <properties>
    <storage.backend>local</storage.backend>
    <buffer-size>100</buffer-size>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
    <extension>
      <namespace>faunus</namespace>
      <name>rexsterinputformat</name>
      <configuration>
        <id-handler>titan-berkeleyje</id-handler>
      </configuration>
    </extension>
  </extensions>
</graph>

This configuration applies to BerkeleyDB only. For other Titan storage backends (e.g. Cassandra), use the appropriate Titan Formats.

Usage in EC2

Running Faunus against Rexster in Amazon EC2 takes only a few EC2 and Rexster configuration steps beyond the instructions in Running Faunus on Amazon EC2.

After downloading Rexster to the EC2 instance, ensure that the base-uri and rexster-server-host configuration properties of rexster.xml are set to the private IP address of the EC2 instance. The configuration should look something like the following:

<rexster>
  ...
  <rexster-server-host>10.118.95.50</rexster-server-host>
  <base-uri>http://10.118.95.50</base-uri>
  ...
</rexster>
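Both properties must carry the instance's private address, which on EC2 is typically an RFC 1918 address such as 10.x.x.x. The hypothetical helper below (not part of Rexster) rejects anything that is not a private, site-local IPv4 literal and renders the two elements shown above, which helps catch an accidentally pasted public IP.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: validates a private IPv4 literal and renders the
// rexster-server-host and base-uri elements for rexster.xml.
final class RexsterHostConfig {
    static String render(String privateIp) {
        InetAddress address;
        try {
            // For an IP literal, getByName only parses the string (no DNS lookup).
            address = InetAddress.getByName(privateIp);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an address: " + privateIp, e);
        }
        boolean isLiteral = address.getHostAddress().equals(privateIp);
        if (!isLiteral || !(address instanceof Inet4Address) || !address.isSiteLocalAddress()) {
            throw new IllegalArgumentException("not a private IPv4 literal: " + privateIp);
        }
        return "<rexster-server-host>" + privateIp + "</rexster-server-host>\n"
                + "<base-uri>http://" + privateIp + "</base-uri>";
    }
}
```

For example, `render("10.118.95.50")` produces exactly the two lines in the configuration above, while a public address such as 8.8.8.8 is rejected.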

By default, Faunus creates an EC2 security group called jclouds#faunuscluster, in which all the Hadoop nodes are created. For the nodes in this cluster to reach the Rexster instance, the security group that Rexster is in must allow inbound access from that security group.

To provide this access, first use Whirr to establish the Hadoop cluster (as described here), then find the jclouds#faunuscluster security group in the Amazon EC2 Console.

Take note of the security group identifier. In the case above, it is sg-02e38b6a. Edit the Inbound settings for the security group that Rexster is in. In the following screenshot, Rexster exists in a security group that is aptly named “Rexster Group”.

Add a rule that allows jclouds#faunuscluster to access the Rexster Group over port 8182 (the default port established in rexster.xml), using the security group identifier shown in the screenshot above as the rule's source. Faunus jobs can now run against Rexster.
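Once the rule is in place, connectivity can be confirmed from one of the Hadoop nodes before launching a job. The sketch below, with a hypothetical class name, only tests whether a TCP connection to Rexster's REST port can be opened within a timeout; it checks network reachability (security groups, firewalls), not that Rexster answers HTTP requests correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity probe: true if a TCP connection to host:port
// can be opened within timeoutMillis.
final class RexsterPortCheck {
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out (e.g. dropped by a security group) or unresolvable host.
            return false;
        }
    }
}
```

For example, `RexsterPortCheck.canConnect("10.118.95.50", 8182, 2000)` run on a node in jclouds#faunuscluster should return true once the rule is active, and false while the security group still blocks the port.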