Changelog
The preFilter() function in the Join module is optimized using the pre2num structure, as is the choose_next_node() function. A global string buffer is added to lower the cost of getFinalResult(), and the time needed to answer queries is greatly reduced.
In addition, we assign buffers of different sizes to the B+ trees (some of them are more important and more frequently used than others). WangLibo merges several B+ trees into one, reducing the total number of B+ trees from 17 to 9. This strategy reduces both the space cost and the memory cost, while also speeding up the build process and the query process.
What is more, ChenJiaqi has done a lot of work to optimize SPARQL queries. For example, some unconnected SPARQL query graphs are handled as special cases.
ZengLi splits the KVstore into 3 parts according to the types of key and value, i.e. int2string, string2int and string2string. In addition, updates are now supported. You can insert, delete or modify triples in the gStore database. In fact, only insert() and remove() are implemented; modify() is supported by removing the old triple first and then inserting the new one.
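The remove-then-insert approach to modify() can be illustrated with a minimal sketch. The Triple struct and TinyTripleStore class below are hypothetical in-memory stand-ins written for this example; they are not gStore's actual classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// Hypothetical triple type for illustration only (not gStore's own type).
struct Triple {
    std::string subject, predicate, object;
    bool operator<(const Triple& o) const {
        return std::tie(subject, predicate, object) <
               std::tie(o.subject, o.predicate, o.object);
    }
};

// Hypothetical in-memory store exposing only two primitives: insert and remove.
class TinyTripleStore {
public:
    // Returns true if the triple was newly added.
    bool insert(const Triple& t) { return triples_.insert(t).second; }

    // Returns true if the triple existed and was removed.
    bool remove(const Triple& t) { return triples_.erase(t) > 0; }

    // modify() is not a primitive: it removes the old triple first,
    // then inserts the new one. Returns false if the old triple is absent.
    bool modify(const Triple& old_t, const Triple& new_t) {
        if (!remove(old_t)) return false;
        insert(new_t);
        assert(contains(new_t));  // the new triple must now be present
        return true;
    }

    bool contains(const Triple& t) const { return triples_.count(t) > 0; }
    std::size_t size() const { return triples_.size(); }

private:
    std::set<Triple> triples_;
};
```

One consequence of this design, at least in the sketch, is that modify() costs one removal plus one insertion, and a failed removal leaves the store unchanged.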
ZengLi has enabled gStore to answer queries with predicate variables. In addition, the structures of many queries have been studied to speed up query processing. ChenJiaqi rewrites the SPARQL query plan to obtain a more efficient one, which brings many benefits.
The structure of this project has changed a lot. A new join method has been implemented and replaces the old one. Test results show that speed is improved and memory cost is lower. We also make some changes to Parser/Sparql*, which are all generated by ANTLR. They must be modified because the generated code is in C, which causes several multiple-definition problems, and because its size is too large.
There was a bug in the original Stream module that introduced control characters such as ^C and ^V into the output. We have fixed it and enabled Stream to sort the output strings (both internal and external sorting).
In addition, SPARQL queries that are not BGPs (Basic Graph Patterns) are now also supported, using a naive method.
A powerful interactive console, now named gconsole, has been implemented for the convenience of users. What is more, we use the valgrind tools to test our system and have fixed several memory leaks.
The docs and API have also changed, but this is of little importance.
We merge several classes (like Bstr) and adjust the project structure, as well as the debug system.
In addition, most warnings are removed, except for warnings in the Parser module, which are due to the use of ANTLR.
What is more, we change the RangeValue module to Stream, and add Stream for ResultSet. We also improve the gquery console, so you can now redirect query results to a specified file in the gsql console.
We were unable to add Stream for IDlist due to its complex operations, but this is not necessary. Realpath is used to support soft links in the gquery console, but it does not work in Gstore (though it works outside of Gstore).
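For readers unfamiliar with realpath, the following is a minimal sketch of how the POSIX realpath() function resolves soft links into a canonical absolute path. The helper name resolve_path is an assumption made for this example and does not come from the gStore code; the sketch also relies on the POSIX.1-2008 behaviour where passing a null buffer makes realpath() allocate the result.

```cpp
#include <cassert>
#include <stdlib.h>  // realpath (POSIX), free
#include <string>

// Hypothetical helper: returns the canonical absolute path with all
// symbolic links, "." and ".." components resolved, or "" on failure
// (for example, when the path does not exist).
std::string resolve_path(const std::string& path) {
    // With a null second argument, realpath() allocates the buffer itself.
    char* resolved = realpath(path.c_str(), nullptr);
    if (resolved == nullptr) {
        return "";
    }
    assert(resolved[0] == '/');  // canonical paths are always absolute
    std::string result(resolved);
    free(resolved);  // the caller owns the buffer returned by realpath()
    return result;
}
```

A console could call such a helper on a user-supplied file name before opening it, so that a soft link and its target are treated as the same file.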
We add a gtest tool as a utility; you can use it to query several datasets with their own queries.
In addition, the gquery console is improved. The Readline library is used for input instead of fgets, and the gquery console now supports command history, command editing and command completion.
What is more, we found and fixed a bug in Database/ (a pointer to the debugging log was not set to NULL after the fclose operation, so if you closed one database and opened another, the system would fail entirely because it assumed that the debugging log was still open).
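The kind of fix described above can be sketched as follows. The DebugLog class and its member names are hypothetical and only illustrate the pattern of resetting a FILE pointer after fclose; they are not taken from the Database/ code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical wrapper around a debugging log handle (illustrative names).
class DebugLog {
public:
    DebugLog() = default;
    DebugLog(const DebugLog&) = delete;             // avoid double fclose
    DebugLog& operator=(const DebugLog&) = delete;
    ~DebugLog() { close(); }

    // Opens (or reopens) the log; returns false if the file cannot be opened.
    bool open(const char* path) {
        close();
        fp_ = std::fopen(path, "w");
        return fp_ != nullptr;
    }

    void close() {
        if (fp_ != nullptr) {
            std::fclose(fp_);
            // The crucial step: without this reset, the stale pointer
            // would still look like an open log to later checks.
            fp_ = nullptr;
        }
        assert(fp_ == nullptr);
    }

    bool is_open() const { return fp_ != nullptr; }

    // Writes only when the log is really open; returns false otherwise.
    bool write(const char* msg) {
        if (!is_open()) return false;
        return std::fputs(msg, fp_) >= 0;
    }

private:
    std::FILE* fp_ = nullptr;
};
```

With the reset in place, closing one log and opening another goes through a clean "not open" state, so a later is_open() check cannot be fooled by a dangling pointer.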
We implement a new version of the B+Tree and replace the old one with it.
After testing on the DBpedia, LUBM, and WatDiv benchmarks, we conclude that the new BTree is more efficient than the old version. For the same triple file, the new version spends less time executing the gload command.
Besides, the new version can handle long literal objects efficiently, whereas on the old BTree, triples whose object length exceeds 4096 bytes result in frequent, inefficient split operations.
We modify the RDF parser and SPARQL parser.
Under the new RDF parser, we also redesign the encoding strategy, which reduces the number of RDF file scans.
Now we can parse the standard SPARQL v1.1 grammar correctly, and can support basic graph pattern (BGP) SPARQL queries written in this standard grammar.
We add APIs for C/CPP and JAVA.
We share our gStore2.0 code as an open-source project under the BSD license on GitHub.