Merge pull request #5 from pietercolpaert/development
Hardf Parser and Writer functions
pietercolpaert authored Apr 17, 2017
2 parents 893a0c2 + db373e8 commit 7c301dd
Showing 13 changed files with 4,818 additions and 11 deletions.
5 changes: 4 additions & 1 deletion .travis.yml
@@ -1,6 +1,9 @@
language: php
php:
- "7.0"
- '5.6'
- '7.0'
- '7.1'
- nightly
sudo: false
env:
before_script:
193 changes: 183 additions & 10 deletions README.md
@@ -1,15 +1,13 @@
# hardf
[![Build Status](https://travis-ci.org/pietercolpaert/hardf.svg?branch=master)](https://travis-ci.org/pietercolpaert/hardf)

Current Status: early port of [N3.js](https://github.com/RubenVerborgh/N3.js) to PHP
**hardf** is a PHP 5.6+ library that lets you handle RDF easily. It offers:
- [**Parsing**](#parsing) triples/quads from [Turtle](http://www.w3.org/TR/turtle/), [TriG](http://www.w3.org/TR/trig/), [N-Triples](http://www.w3.org/TR/n-triples/), [N-Quads](http://www.w3.org/TR/n-quads/), and [Notation3 (N3)](https://www.w3.org/TeamSubmission/n3/)
- [**Writing**](#writing) triples/quads to [Turtle](http://www.w3.org/TR/turtle/), [TriG](http://www.w3.org/TR/trig/), [N-Triples](http://www.w3.org/TR/n-triples/), and [N-Quads](http://www.w3.org/TR/n-quads/)

Basic PHP library for RDF1.1. Currently provides simple tools (an Util library) for an array of triples/quads.
Both the parser and the serializer have _streaming_ support.

For now, [EasyRDF](https://github.com/njh/easyrdf) is the best PHP library for RDF (naming of this library is a contraction of "Hard" and "RDF", in which we try to make the point that you should at this point only use hardf when you know what you’re doing).
The EasyRDF library is a high-level library which abstracts all the difficult parts of dealing with RDF.
Hardf on the other hand, aims at a high performance for triple representations.
We will only support formats such as turtle or trig and n-triples or n-quads.
If you want other formats, you will have to write some logic to load the triples into memory according to our triple representation (e.g., for JSON-LD, check out [ml/json-ld](https://github.com/lanthaler/JsonLD)).
_This library is a port of [N3.js](https://github.com/RubenVerborgh/N3.js) to PHP_

## Triple Representation

@@ -30,7 +28,7 @@ $triple = [

Encode literals as follows (similar to N3.js):

```
```php
'"Tom"@en-gb' // lowercase language
'"1"^^http://www.w3.org/2001/XMLSchema#integer' // no angular brackets <>
```
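
To make the encoding above concrete, here is a minimal sketch in plain PHP that splits such a literal string into its value, language and datatype. `splitLiteral()` is a hypothetical helper written for illustration only; it is not part of hardf, and it assumes the literal follows the form shown above.

```php
// Illustrative only: splitLiteral() is a hypothetical helper, not part of hardf.
function splitLiteral($literal)
{
    // The lexical value sits between the first and the last double quote.
    $end = strrpos($literal, '"');
    if ($literal === '' || $literal[0] !== '"' || $end === 0) {
        throw new InvalidArgumentException("Not a literal: " . $literal);
    }
    $value = substr($literal, 1, $end - 1);
    // What follows the closing quote: nothing, "@lang" or "^^datatype".
    $suffix = (string) substr($literal, $end + 1);

    $language = null;
    $datatype = null;
    if (strpos($suffix, '@') === 0) {
        $language = substr($suffix, 1);
    } elseif (strpos($suffix, '^^') === 0) {
        $datatype = substr($suffix, 2); // no angular brackets to strip
    }
    return ["value" => $value, "language" => $language, "datatype" => $datatype];
}

$tom = splitLiteral('"Tom"@en-gb');
$one = splitLiteral('"1"^^http://www.w3.org/2001/XMLSchema#integer');
```

In real code, the `Util` functions described under Utility are the intended way to inspect literals.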
@@ -43,7 +41,182 @@ Install this library using [composer](http://getcomposer.org):
composer require pietercolpaert/hardf
```

Currently, we only have the `pietercolpaert\hardf\Util` class available, that will help you to create and evaluate literals, IRIs, and expand prefixes.
### Writing
```php
use pietercolpaert\hardf\TriGWriter;
```

A class that must be instantiated and that can write TriG, Turtle, N-Triples or N-Quads.

Example use:
```php
$writer = new TriGWriter([
"prefixes" => [
"schema" =>"http://schema.org/",
"dct" =>"http://purl.org/dc/terms/",
"geo" =>"http://www.w3.org/2003/01/geo/wgs84_pos#",
"rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs"=> "http://www.w3.org/2000/01/rdf-schema#"
],
"format" => "n-quads" //Other possible values: n-triples, trig or turtle
]);

$writer->addPrefix("ex","http://example.org/");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","schema:label","\"Person\"@en","http://example.org/#test");
$writer->addTriple("ex:1","dct:title","\"Person1\"@en","http://example.org/#test");
$writer->addTriple("ex:1","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","schema:Person","http://example.org/#test");
$writer->addTriple("ex:2","dct:title","\"Person2\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test2");
echo $writer->end();
```

#### All methods
```php
//The method names should speak for themselves:
$writer = new TriGWriter(["prefixes" => [ /* ... */]]);
$writer->addTriple($subject, $predicate, $object, $graph);
$writer->addTriples($triples);
$writer->addPrefix($prefix, $iri);
$writer->addPrefixes($prefixes);
//Creates a blank node ($predicate and/or $object are optional)
$writer->blank($predicate, $object);
//Creates rdf:list with $elements
$list = $writer->addList($elements);

//Returns the output produced so far and clears the internal buffer (useful for streaming)
$out .= $writer->read();
//Alternatively, you can listen for new chunks through a callback:
$writer->setReadCallback(function ($output) { echo $output; });

//Call this at the end. The return value will be the full triple output, or the rest of the output such as closing dots and brackets, unless a callback was set.
$out .= $writer->end();
//OR
$writer->end();
```

### Parsing

Next to [TriG](https://www.w3.org/TR/trig/), the TriGParser class also parses [Turtle](https://www.w3.org/TR/turtle/), [N-Triples](https://www.w3.org/TR/n-triples/), [N-Quads](https://www.w3.org/TR/n-quads/) and the [W3C Team Submission N3](https://www.w3.org/TeamSubmission/n3/).

#### All methods

```php
$parser = new TriGParser($options, $tripleCallback, $prefixCallback);
$parser->setTripleCallback($function);
$parser->setPrefixCallback($function);
$parser->parse($input, $tripleCallback, $prefixCallback);
$parser->parseChunk($input);
$parser->end();
```

#### Basic examples for small files

Using return values and passing these to a writer:
```php
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$parser = new TriGParser(["format" => "n-quads"]); //also parses n-triples, n3, turtle and trig. Format is optional
$writer = new TriGWriter();
$triples = $parser->parse("<A> <B> <C> <G> .");
$writer->addTriples($triples);
echo $writer->end();
```

Using callbacks and passing these to a writer:
```php
$parser = new TriGParser();
$writer = new TriGWriter(["format"=>"trig"]);
$parser->parse("<http://A> <https://B> <http://C> <http://G> . <A2> <https://B2> <http://C2> <http://G3> .", function ($e, $triple) use ($writer) {
    if (isset($e)) {
        echo "Error occurred: " . $e;
    } else if (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read(); //write out what we have so far
    } else { //no error and no triple flags the end of the file
        echo $writer->end(); //write the end
    }
});
```

#### Example using chunks and keeping prefixes

When you need to parse a large file, feed it to the parser chunk by chunk and process the triples as soon as they become available. You can do that as follows:

```php
$writer = new TriGWriter(["format"=>"n-quads"]);
$tripleCallback = function ($error, $triple) use ($writer) {
    if (isset($error)) {
        throw $error;
    } else if (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read();
    } else { //no error and no triple flags the end of the input
        echo $writer->end();
    }
};
$prefixCallback = function ($prefix, $iri) use (&$writer) {
$writer->addPrefix($prefix, $iri);
};
$parser = new TriGParser(["format" => "trig"], $tripleCallback, $prefixCallback);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->end(); //Needs to be called
```
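
The example above leaves `$chunk` undefined. As a minimal sketch of where the chunks could come from, the plain PHP below reads a stream in fixed-size blocks and hands each block to a consumer. `feedInChunks()` is a hypothetical helper, not part of hardf, and the sketch assumes the parser tolerates chunk boundaries at arbitrary byte positions; the scripts in `bin/` take the more conservative route of reading line by line with `fgets`.

```php
// Hypothetical helper, not part of hardf: reads $handle in blocks of at most
// $chunkSize bytes and passes each block to $consume. Returns the block count.
function feedInChunks($handle, $chunkSize, callable $consume)
{
    $chunks = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left
        }
        $consume($chunk);
        $chunks++;
    }
    return $chunks;
}

// Demo on an in-memory stream. With hardf, $consume would instead be
// function ($chunk) use ($parser) { $parser->parseChunk($chunk); }
$input = "<http://A> <http://B> <http://C> .\n<http://D> <http://E> <http://F> .\n";
$handle = fopen('php://memory', 'r+');
fwrite($handle, $input);
rewind($handle);

$received = '';
$count = feedInChunks($handle, 16, function ($chunk) use (&$received) {
    $received .= $chunk;
});
fclose($handle);
```

After the last chunk, `$parser->end()` still needs to be called, as in the example above.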

### Utility
```php
use pietercolpaert\hardf\Util;
```

A static class with a couple of helpful functions for handling our specific triple representation. It will help you to create and evaluate literals, IRIs, and expand prefixes.

```php
$bool = Util::isIRI($term);
$bool = Util::isLiteral($term);
$bool = Util::isBlank($term);
$bool = Util::isDefaultGraph($term);
$bool = Util::inDefaultGraph($triple);
$value = Util::getLiteralValue($literal);
$literalType = Util::getLiteralType($literal);
$lang = Util::getLiteralLanguage($literal);
$bool = Util::isPrefixedName($term);
$expanded = Util::expandPrefixedName($prefixedName, $prefixes);
$iri = Util::createIRI($iri);
$literalObject = Util::createLiteral($value, $modifier = null);
```

See the documentation at https://github.com/RubenVerborgh/N3.js#utility for more information.
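
To give an idea of what prefix expansion involves, here is a minimal sketch in plain PHP. `expandName()` is a hypothetical function written for this illustration; it is not hardf's implementation of `expandPrefixedName`, and details such as how unknown prefixes are treated are assumptions.

```php
// Illustrative sketch only: expandName() is a hypothetical name, not hardf code.
function expandName($name, array $prefixes)
{
    $colon = strpos($name, ':');
    if ($colon === false) {
        return $name; // no prefix separator at all
    }
    $prefix = substr($name, 0, $colon);
    if (!isset($prefixes[$prefix])) {
        return $name; // unknown prefix (assumption: leave the term untouched)
    }
    // Replace "prefix:" by the IRI registered for that prefix.
    return $prefixes[$prefix] . (string) substr($name, $colon + 1);
}

$prefixes = ["dct" => "http://purl.org/dc/terms/"];
$expanded = expandName("dct:title", $prefixes);
$untouched = expandName("http://example.org/#test", $prefixes);
```

Here a full IRI passes through unchanged because `http` is not a registered prefix.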

## Two executables

We also offer 2 simple tools in `bin/` as an example implementation: one validator and one translator. Try for example:
```bash
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/validator.php trig
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/convert.php trig n-triples
```

## Performance

We compared the performance on two Turtle files, and parsed them with the EasyRDF library in PHP, the N3.js library for NodeJS and with Hardf. These were the results:

| #triples | framework | time (ms) | memory (MB) |
|----------:|-------------------------|------:|--------:|
|1,866 | __Hardf__ without opcache | 27.6 | 0.722 |
|1,866 | __Hardf__ with opcache | 24.5 | 0.380 |
|1,866 | [EasyRDF](https://github.com/njh/easyrdf) without opcache | 5,166.5 | 2.772 |
|1,866 | [EasyRDF](https://github.com/njh/easyrdf) with opcache | 5,176.2 | 2.421 |
| 1,866 | [N3.js](https://github.com/RubenVerborgh/N3.js) | 24.0 | 28.xxx |
| 3,896,560 | __Hardf__ without opcache | 40,017.7 | 0.722 |
| 3,896,560 | __Hardf__ with opcache | 33,155.3 | 0.380 |
| 3,896,560 | [N3.js](https://github.com/RubenVerborgh/N3.js) | 7,004.0 | 59.xxx |


See the documentation at https://github.com/RubenVerborgh/N3.js#utility. Instead of N3Util, you will have to use `pietercolpaert\hardf\Util`.
## License, status and contributions
The N3.js library is copyrighted by [Ruben Verborgh](http://ruben.verborgh.org/) and [Pieter Colpaert](https://pietercolpaert.be)
and released under the [MIT License](https://github.com/RubenVerborgh/N3.js/blob/master/LICENSE.md).

Contributions are welcome, and bug reports or pull requests are always helpful.
If you plan to implement a larger feature, it's best to discuss this first by filing an issue.
30 changes: 30 additions & 0 deletions bin/convert.php
@@ -0,0 +1,30 @@
#!/usr/bin/php
<?php
/** Converts TriG, Turtle, N3, N-QUADS or N-TRIPLES input to TriG, Turtle, N-QUADS or N-TRIPLES*/
include_once(__DIR__ . '/../vendor/autoload.php');
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$informat = "turtle";
if (isset($argv[1]))
$informat = $argv[1];

$outformat = "n-triples";
if (isset($argv[2]))
$outformat = $argv[2];

$writer = new TriGWriter(["format" => $outformat]);
$parser = new TriGParser(["format" => $informat], function ($error, $triple) use (&$writer) {
if (!isset($error) && !isset($triple)) { //flags end
echo $writer->end();
} else if (!$error) {
$writer->addTriple($triple);
echo $writer->read();
} else {
fwrite(STDERR, $error->getMessage() . "\n");
}
});

while ($line = fgets(STDIN)) {
$parser->parseChunk($line);
}
$parser->end();
32 changes: 32 additions & 0 deletions bin/validator.php
@@ -0,0 +1,32 @@
#!/usr/bin/php
<?php
/** Validates TriG, Turtle, N3, N-QUADS or N-TRIPLES input */
include_once(__DIR__ . '/../vendor/autoload.php');
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$format = "trig";
if (isset($argv[1]))
$format = $argv[1];
$parser = new TriGParser(["format" => $format]);
$errored = false;
$finished = false;
$tripleCount = 0;
$line = true;
while (!$finished && $line) {
try {
$line = fgets(STDIN);
if ($line)
$tripleCount += sizeof($parser->parseChunk($line));
else {
$tripleCount += sizeof($parser->end());
$finished = true;
}
} catch (\Exception $e) {
echo $e->getMessage() . "\n";
$errored = true;
}
}
if (!$errored) {
echo "Parsed " . $tripleCount . " triples successfully.\n";
}

27 changes: 27 additions & 0 deletions examples/parseAndWrite.php
@@ -0,0 +1,27 @@
<?php
include_once(__DIR__ . '/../vendor/autoload.php');
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;

echo "--- First, simple implementation ---\n";
$parser = new TriGParser([]);
$writer = new TriGWriter(["format"=>"trig"]);
$triples = $parser->parse("(<x>) <a> <b>. <b> <c> \"\"\"\n\"\"\".");
$writer->addTriples($triples);
echo $writer->end();

//Or, option 2, the streaming version
echo "--- Second streaming implementation with callbacks ---\n";
$parser = new TriGParser();
$writer = new TriGWriter(["format"=>"trig"]);
$error = null;
$parser->parse("@prefix ex: <http://ex.org/> . <http://A> <https://B> <http://C> <http://G> . <A2> <https://B2> <http://C2> <http://G3> . ex:s ex:p ex:o . ", function ($e, $triple) use (&$writer) {
if (!$e && $triple)
$writer->addTriple($triple);
else if (!$triple)
echo $writer->end();
else
echo "Error occurred: " . $e;
}, function ($prefix, $iri) use (&$writer) {
$writer->addPrefix($prefix,$iri);
});
24 changes: 24 additions & 0 deletions examples/write.php
@@ -0,0 +1,24 @@
<?php

include_once(__DIR__ . '/../vendor/autoload.php');
use pietercolpaert\hardf\TriGWriter;

//Add prefixes in the constructor
$writer = new TriGWriter([
"prefixes" => [
"schema" =>"http://schema.org/",
"dct" =>"http://purl.org/dc/terms/",
"geo" =>"http://www.w3.org/2003/01/geo/wgs84_pos#",
"rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs"=> "http://www.w3.org/2000/01/rdf-schema#"
]
]);

$writer->addPrefix("ex","http://example.org/");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","schema:label","\"Person\"@en","http://example.org/#test");
$writer->addTriple("ex:1","dct:title","\"Person1\"@en","http://example.org/#test");
$writer->addTriple("ex:1","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","schema:Person","http://example.org/#test");
$writer->addTriple("ex:2","dct:title","\"Person2\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test2");
echo $writer->end();
38 changes: 38 additions & 0 deletions perf/parser-streaming-perf.php
@@ -0,0 +1,38 @@
<?php
include_once(__DIR__ . '/../vendor/autoload.php');
use pietercolpaert\hardf\TriGParser;

if (sizeof($argv) !== 2) {
echo 'Usage: parser-streaming-perf.php filename';
exit;
}

$filename = $argv[1];
$base = 'file://' . $filename;

$TEST = microtime(true);

$count = 0;
$parser = new TriGParser([ "documentIRI" => $base ]);
$callback = function ($error, $triple) use (&$count, $TEST, $filename) {
if ($triple) {
$count++;
}
else {
echo '- Parsing file ' . $filename . ': ' . (microtime(true) - $TEST) . "s\n";
echo '* Triples parsed: ' . $count . "\n";
echo '* Memory usage: ' . (memory_get_usage() / 1024 / 1024) . "MB\n";
}
};

$handle = fopen($filename, "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
$parser->parseChunk($line, $callback);
}
$parser->end($callback);
fclose($handle);
} else {
// error opening the file.
echo "File not found " . $filename;
}