Slowness #3

Open
pietermartin opened this issue Jan 28, 2016 · 4 comments
Comments

@pietermartin

    @Test
    public void testBlazeGraph() {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        final BlazeGraphEmbedded g = BlazeGraphFactory.open(journalFile);
        for (int i = 0; i < 10000; i++) {
            g.addVertex(T.label, "Person", "name", "xxxxxx");
        }
        g.tx().commit();
        stopWatch.stop();
        System.out.println(stopWatch.toString());
        stopWatch.reset();
        stopWatch.start();
        Assert.assertEquals(10000, g.traversal().V().hasLabel("Person").count().next().intValue());
        stopWatch.stop();
        System.out.println(stopWatch.toString());
    }

0:03:40.845 for insert
0:00:00.164 for select

The select is fine, but the insert time is way too slow.
The time seems to be spent in com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator._hasNext()

The same code on Sqlg (Postgres) inserts in 1.2 seconds.
I did not test Neo4j, but being embedded it is in general much faster than Sqlg.

I would expect sub-second times for an embedded graph.

@beebs-systap
Contributor

@pietermartin
Author

OK, much faster: 0:00:02.066 for the insert in batch mode.
Still slow, though.

Neo4j takes 0:00:00.243 in normal mode.
Sqlg takes 0:00:00.269 in normal mode.
Sqlg takes 0:00:00.130 with the Postgres COPY command underneath.

I reckon there is locking going on, as embedded mode really should write faster.

@beebs-systap
Contributor

OK -- makes sense. That difference is part of the TinkerPop3 design, per @mikepersonick:

Incremental update does strict checking on vertex and edge id re-use and enforcement of property key cardinality. Both of these validations require a read against the database indices. Blazegraph benefits greatly from buffering and batch inserting statements into the database indices. Buffering and batch insert are defeated by interleaving reads and removes into the loading process, which is what happens with the normal validation steps in incremental update mode.
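
To make the point about interleaved reads concrete, here is a minimal toy sketch in plain Java. It is not Blazegraph code and does not use any Blazegraph API: the class `BufferedIndexSketch`, its write buffer, and the rule that every read flushes pending writes first are all assumptions made up for illustration. It only shows how a validation read before each write turns one large batched flush into one small flush per element.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: a hypothetical buffered index, NOT Blazegraph's implementation. */
public class BufferedIndexSketch {
    // Stands in for the persistent index.
    private final Set<String> index = new HashSet<>();
    // Writes that have been accepted but not yet applied to the index.
    private final List<String> buffer = new ArrayList<>();
    // Number of times a non-empty buffer was pushed into the index.
    private int flushes = 0;

    /** Buffer a write; nothing touches the index yet. */
    public void write(String key) {
        buffer.add(key);
    }

    /** A read must see pending writes, so (in this toy model) it flushes first. */
    public boolean contains(String key) {
        flush();
        return index.contains(key);
    }

    /** Apply all buffered writes to the index in one step. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        index.addAll(buffer);
        buffer.clear();
        flushes++;
    }

    public int flushes() {
        return flushes;
    }

    public int size() {
        return index.size();
    }

    /** Incremental style: check for id re-use (a read) before every write. */
    public static BufferedIndexSketch loadIncremental(int n) {
        BufferedIndexSketch store = new BufferedIndexSketch();
        for (int i = 0; i < n; i++) {
            String id = "v" + i;
            if (store.contains(id)) {
                throw new IllegalStateException("id re-used: " + id);
            }
            store.write(id);
        }
        store.flush();
        return store;
    }

    /** Batch style: no validation reads, so everything is flushed once at the end. */
    public static BufferedIndexSketch loadBatch(int n) {
        BufferedIndexSketch store = new BufferedIndexSketch();
        for (int i = 0; i < n; i++) {
            store.write("v" + i);
        }
        store.flush();
        return store;
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("incremental flushes: " + loadIncremental(n).flushes());
        System.out.println("batch flushes:       " + loadBatch(n).flushes());
    }
}
```

Both loaders end with the same contents; only the number of flushes differs. A real store pays far more per flush than a `HashSet.addAll`, which is why the gap matters in practice.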

We would be interested in relative numbers at much larger data scales for loading, and also for executing traversal workloads. We'll do some testing ourselves as well. Please pass along any testing that you have.

@pietermartin
Author

> Incremental update does strict checking on vertex and edge id re-use

I do not really follow that part as, at least in the code above, the user does not specify the ids so there should be nothing to check?
