Slowness #3

pietermartin · 2016-01-28T15:59:18Z

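    @Test
    public void testBlazeGraph() {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        // journalFile: path to the Blazegraph journal, defined elsewhere in the test class (not shown)
        final BlazeGraphEmbedded g = BlazeGraphFactory.open(journalFile);
        // Insert 10,000 vertices one at a time, then commit once
        for (int i = 0; i < 10000; i++) {
            g.addVertex(T.label, "Person", "name", "xxxxxx");
        }
        g.tx().commit();
        stopWatch.stop();
        System.out.println(stopWatch.toString()); // insert time (also includes opening the graph)
        stopWatch.reset();
        stopWatch.start();
        // Count the inserted vertices by label
        Assert.assertEquals(10000, g.traversal().V().hasLabel("Person").count().next().intValue());
        stopWatch.stop();
        System.out.println(stopWatch.toString()); // select time
    }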

0:03:40.845 for insert
0:00:00.164 for select

The select is fine, but the insert time is way too slow.
The time seems to be spent in com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator._hasNext().

The same code on Sqlg (Postgres) inserts in 1.2 seconds.
I did not test Neo4j, but being embedded it is generally much faster than Sqlg.

I would expect sub-second insert times for an embedded graph.


beebs-systap · 2016-01-28T16:25:10Z

Adding @mikepersonick. Try it with the bulk load API: https://github.com/blazegraph/tinkerpop3#bulk-load-api.

pietermartin · 2016-01-28T18:54:16Z

OK, much faster: 0:00:02.066 for the insert in batch mode.
Still slow, though.

Neo4j takes 0:00:00.243 in normal mode.
Sqlg takes 0:00:00.269 in normal mode.
Sqlg takes 0:00:00.130 in batch mode (Postgres COPY command underneath).
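For scale, here are the reported times as ratios, converting 0:03:40.845 to 220.845 s. These are single-run numbers, so the ratios are only indicative:

```math
\begin{aligned}
\frac{220.845\,\text{s}}{2.066\,\text{s}} &\approx 107 && \text{(Blazegraph: normal vs. batch insert)}\\
\frac{2.066\,\text{s}}{0.243\,\text{s}} &\approx 8.5 && \text{(Blazegraph batch vs. Neo4j normal mode)}\\
\frac{2.066\,\text{s}}{0.130\,\text{s}} &\approx 15.9 && \text{(Blazegraph batch vs. Sqlg batch with COPY)}
\end{aligned}
```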

I reckon there is some locking going on, as embedded mode really should write faster.

beebs-systap · 2016-01-28T19:20:14Z

OK, makes sense. That difference is part of the TinkerPop3 design, per @mikepersonick:

Incremental update does strict checking on vertex and edge id re-use and enforcement of property key cardinality. Both of these validations require a read against the database indices. Blazegraph benefits greatly from buffering and batch inserting statements into the database indices. Buffering and batch insert are defeated by interleaving reads and removes into the loading process, which is what happens with the normal validation steps in incremental update mode.
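The effect described in that quote can be illustrated with a toy model. This is a hypothetical sketch, not Blazegraph's code: `BufferedIndex` stands in for a database index with a write buffer, and the model assumes any read must flush pending writes first so it can see them. With a validation read before every insert, each flush carries a single write; without it, writes reach the index in full batches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model comparing flush counts in the two load modes. */
class LoadModes {
    /** Incremental mode: a validation read (id re-use check) precedes every write. */
    static int incremental(int n, int capacity) {
        BufferedIndex idx = new BufferedIndex(capacity);
        for (int i = 0; i < n; i++) {
            String id = "v" + i;
            if (idx.contains(id)) {
                throw new IllegalStateException("id re-use: " + id);
            }
            idx.write(id);
        }
        idx.flush();
        return idx.flushes();
    }

    /** Bulk mode: no validation reads, so writes are flushed in full batches. */
    static int bulk(int n, int capacity) {
        BufferedIndex idx = new BufferedIndex(capacity);
        for (int i = 0; i < n; i++) {
            idx.write("v" + i);
        }
        idx.flush();
        return idx.flushes();
    }

    public static void main(String[] args) {
        System.out.println("incremental flushes: " + incremental(10_000, 1_000));
        System.out.println("bulk flushes:        " + bulk(10_000, 1_000));
    }
}

/** Stand-in for a database index with a write buffer (not Blazegraph's API). */
class BufferedIndex {
    private final Set<String> index = new HashSet<>();
    private final List<String> buffer = new ArrayList<>();
    private final int capacity;
    private int flushes = 0;

    BufferedIndex(int capacity) {
        this.capacity = capacity;
    }

    /** Buffer a write; flush automatically once the buffer is full. */
    void write(String key) {
        buffer.add(key);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** A read must see pending writes, so in this model it forces a flush first. */
    boolean contains(String key) {
        flush();
        return index.contains(key);
    }

    /** Apply all buffered writes to the index as one batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        index.addAll(buffer);
        buffer.clear();
        flushes++;
    }

    int flushes() {
        return flushes;
    }
}
```

With 10,000 inserts and a buffer of 1,000, `incremental` performs 10,000 flushes while `bulk` performs 10. The actual cost of a flush in Blazegraph depends on its index structures, which this model does not attempt to capture.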

We would be interested in relative numbers at much larger data scales, both for loading and for executing traversal workloads. We'll do some testing ourselves as well. Please pass along any test results you have.
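One way to produce comparable numbers across scales is to report throughput rather than raw time. The harness below is a hypothetical sketch: `ScalingBenchmark` and the placeholder loader are illustrative, and the loader is assumed to be replaced with the vertex-insert code of whichever graph is being measured.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

/** Hypothetical harness: time one loader at several scales and report vertices per second. */
class ScalingBenchmark {
    static Map<Integer, Double> run(IntConsumer loader, int... scales) {
        Map<Integer, Double> throughput = new LinkedHashMap<>();
        for (int n : scales) {
            long start = System.nanoTime();
            loader.accept(n); // e.g. add n vertices to the graph under test and commit
            long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid division by zero
            throughput.put(n, n / (elapsedNanos / 1e9));
        }
        return throughput;
    }

    public static void main(String[] args) {
        // Placeholder loader; swap in the insert code of the graph being measured.
        IntConsumer loader = n -> {
            List<Integer> sink = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                sink.add(i);
            }
        };
        run(loader, 10_000, 100_000, 1_000_000).forEach(
            (n, perSecond) -> System.out.printf("%,d vertices: %,.0f vertices/s%n", n, perSecond));
    }
}
```

Normalizing to vertices per second makes runs of different sizes directly comparable. For publishable numbers, a harness such as JMH handles JIT warm-up and dead-code elimination, which this simple loop does not.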

pietermartin · 2016-01-28T19:27:10Z

> Incremental update does strict checking on vertex and edge id re-use

I do not really follow that part: at least in the code above, the user does not specify any ids, so shouldn't there be nothing to check?
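The reasoning in that question can be made concrete with a hypothetical sketch. `VertexIdPolicy` is illustrative, not Blazegraph's implementation, and it assumes generated ids are random UUIDs: only a caller-supplied id needs an index read to rule out re-use, while a freshly generated id can skip it. The sketch covers only the id check and does not model the property-key cardinality validation mentioned in the earlier reply.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch: the id re-use check only needs an index read for caller-supplied ids. */
class VertexIdPolicy {
    private final Set<String> existingIds = new HashSet<>(); // stands in for the database index
    private int indexReads = 0;

    String addVertex(String userSuppliedId) {
        final String id;
        if (userSuppliedId == null) {
            // Generated random UUID: assumed unique, so no read is needed to rule out re-use.
            id = UUID.randomUUID().toString();
        } else {
            indexReads++; // caller-supplied id: must look it up to detect re-use
            if (existingIds.contains(userSuppliedId)) {
                throw new IllegalArgumentException("Vertex id already in use: " + userSuppliedId);
            }
            id = userSuppliedId;
        }
        existingIds.add(id);
        return id;
    }

    int indexReads() {
        return indexReads;
    }

    public static void main(String[] args) {
        VertexIdPolicy policy = new VertexIdPolicy();
        for (int i = 0; i < 10_000; i++) {
            policy.addVertex(null); // as in the test above: no ids supplied
        }
        System.out.println("index reads: " + policy.indexReads());
    }
}
```

Running `main` prints `index reads: 0`, since none of the 10,000 vertices carries a caller-supplied id.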
