Integration of sparql with large language model related functionality #193
In the context of a tutorial that I gave a few years ago, I collected information about the full-text search features provided by several triple store vendors (Blazegraph, Virtuoso, AllegroGraph, Stardog, GraphDB). The latest version of my slides with this information can be found at the following address, where slides 24 to 41 are the relevant ones.
@hartig great one, tnx
GraphDB supports the following:
https://graphdb.ontotext.com/documentation/10.4/retrieval-graphdb-connector.html
https://graphdb.ontotext.com/documentation/10.4/talk-to-graph.html
We are working on natural language querying (NLQ) aka knowledge graph question answering (KGQA).
Interesting! I've started a plug-in for integrating vectors into SPARQL by using registered IRIs as defined vector spaces, and rdf:JSON literals as objects. Haven't made progress on the search side yet, but this is super relevant to many of our research projects.
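A minimal sketch of what such a statement could look like, assuming the vector-space IRI is used as the predicate and the vector is carried as the lexical form of an `rdf:JSON` literal. The `example.org` IRIs and the helpers `vector_triple` and `parse_vector` are hypothetical; this is not the plug-in's actual code.

```python
import json

# Datatype IRI of rdf:JSON (as defined by JSON-LD 1.1).
RDF_JSON = "http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON"


def turtle_escape(text: str) -> str:
    """Escape backslashes and double quotes for a "..." Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def vector_triple(subject: str, space: str, vector: list) -> str:
    """Build one Turtle/N-Triples statement: <subject> <space> "[...]"^^rdf:JSON .
    Using the (hypothetical) space IRI as the predicate is an assumption."""
    payload = json.dumps([float(x) for x in vector], separators=(",", ":"))
    return f'<{subject}> <{space}> "{turtle_escape(payload)}"^^<{RDF_JSON}> .'


def parse_vector(lexical_form: str) -> list:
    """Read the lexical form of an rdf:JSON literal back into a list of floats."""
    return [float(x) for x in json.loads(lexical_form)]
```

For example, `vector_triple("http://example.org/doc/1", "http://example.org/space/toy-3d", [0.1, 0.25, -1])` yields `<http://example.org/doc/1> <http://example.org/space/toy-3d> "[0.1,0.25,-1.0]"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON> .`, which embeds directly in Turtle.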
Jamie McCusker (she/her/hers)
Director, Data Operations
Tetherless World Constellation
Rensselaer Polytechnic Institute
http://tw.rpi.edu
On Thu, Dec 14, 2023 at 9:50 AM Vladimir Alexiev wrote:
GraphDB supports the following:
https://graphdb.ontotext.com/documentation/10.4/gpt-queries.html
- magic predicates to ask an LLM for text, list or table using data from your KG:
- query explanation
- result explanation, summarization, rephrasing, translation
https://graphdb.ontotext.com/documentation/10.4/retrieval-graphdb-connector.html
- Indexing of KG entities in a vector database
- Supports any text embedding algorithm and vector database. We've played with Weaviate, Elastic, etc.
- Uses the same powerful connector (indexing) language that we use for Elastic, Solr, Lucene
- Automatic synchronization of changes in RDF data to the KG entity index
- Supports nested objects (but not yet in the UI)
- Serializes KG entities to text like this:
  Franvino:
  - is a RedWine.
  - made from grape Merlo.
  - made from grape Cabernet Franc.
  - has sugar dry.
  - has year 2012.
https://graphdb.ontotext.com/documentation/10.4/talk-to-graph.html
- A simple chatbot using a defined KG entity index
Screenshot: https://github.com/w3c/sparql-dev/assets/536250/80129475-5d92-451e-98c5-bc0d75960e6a
We are working on natural language querying (NLQ) aka knowledge graph question answering (KGQA).
Cheers!
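The entity-to-text serialization shown in the GraphDB notes above can be approximated with a few lines of code. This is a minimal sketch, not GraphDB's implementation: it assumes facts arrive as (predicate IRI, object label) pairs, that `rdf:type` is rendered as "is a", and that other predicates have camelCase local names; the `example.org` wine namespace is hypothetical.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def predicate_phrase(iri: str) -> str:
    """Turn a predicate IRI into a short phrase: rdf:type -> "is a",
    camelCase local names such as madeFromGrape -> "made from grape"."""
    if iri == RDF_TYPE:
        return "is a"
    local_name = re.split(r"[#/]", iri)[-1]
    # Insert a space before each upper-case letter that follows a lower-case letter or digit.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name).lower()


def entity_to_text(label: str, facts: list) -> str:
    """Render an entity label and its (predicate IRI, object label) pairs
    as a text block with one "- phrase object." line per fact."""
    lines = [f"{label}:"]
    for predicate_iri, object_label in facts:
        lines.append(f"- {predicate_phrase(predicate_iri)} {object_label}.")
    return "\n".join(lines)
```

Feeding it the Franvino facts (`rdf:type RedWine`, `madeFromGrape Merlo`, `madeFromGrape Cabernet Franc`, `hasSugar dry`, `hasYear 2012`) reproduces the text block quoted above; that text is what would then be passed to the embedding model.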
@jpmccu, very cool. Any idea whether standardized value sets for vector spaces would allow "composition" of machines? Specific example: given a set of synaptic weights for diagnosing an ischemic stroke and another set for traffic patterns in a city, could one combine independently-trained machines in order to optimize stroke patient care (e.g. decide between close hospital or one further away that's good at angioplasty)? Sounds like you might be playing with stuff like that. Testing that in SPARQL would be very interesting indeed.
We assume that the dimension of each vector space is consistent (this is enforced before storage in the vector DB). One could concatenate vectors into a vector union in a new space, but we haven't really thought about doing multi-space comparisons.
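The two ideas in that paragraph, a fixed dimension per space checked before storage and a "union" space built by concatenation, can be sketched as below. The `VectorSpace` type and helper names are hypothetical, and the sketch only covers the bookkeeping: it says nothing about whether distances in the union space are meaningful, which is the open multi-space comparison question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorSpace:
    iri: str        # hypothetical IRI naming the space
    dimension: int  # every vector in the space must have this length


def check_vector(space: VectorSpace, vector: list) -> list:
    """Reject vectors whose length does not match the space (the check done before storage)."""
    if len(vector) != space.dimension:
        raise ValueError(
            f"{space.iri} expects {space.dimension} dimensions, got {len(vector)}"
        )
    return list(vector)


def union_space(iri: str, *spaces: VectorSpace) -> VectorSpace:
    """A new space whose dimension is the sum of the component dimensions."""
    return VectorSpace(iri, sum(s.dimension for s in spaces))


def concatenate(target: VectorSpace, parts: list) -> list:
    """Concatenate (space, vector) parts in order, validating each part and the result."""
    combined = []
    for space, vector in parts:
        combined.extend(check_vector(space, vector))
    return check_vector(target, combined)
```

The order of `parts` matters: the union is only well-defined if every vector is always concatenated in the same component order.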
Right now we just have the representation and a plug-in for Whyis that intercepts the vectors as they're being published. We haven't done much more than brainstorm what the SPARQL would look like, beyond BGPs for access that look like the RDF (using Jena PropertyFunctions) and ANN search using a PropertyFunction similar to the one in the full-text search module.
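With a property-function design like that, a search call would typically bind each hit to an entity and a score. As a sketch of the search itself, assuming cosine similarity, here is exact brute-force k-nearest-neighbour search; an ANN index approximates this result for speed. The function names are hypothetical and this is not Jena's API.

```python
import heapq
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def nearest(query: list, index: dict, k: int = 3) -> list:
    """Exact top-k search over {entity IRI: vector}, returning (IRI, score)
    pairs with the highest cosine similarity first."""
    for iri, vector in index.items():
        if len(vector) != len(query):
            raise ValueError(f"dimension mismatch for {iri}")
    scored = [(iri, cosine(query, vector)) for iri, vector in index.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Each returned `(IRI, score)` pair corresponds to one solution row the property function could produce.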
(In reply to @ericprud's comment above, Tue, Dec 19, 2023 at 8:44 AM.)
But @jpmccu and @ericprud, is it appropriate to store tensors in JSON? Under https://accordproject.eu/ (automated compliance checking of architectural designs and urban planning) we're thinking about a binary data connector for GraphDB. There's also schemaorg/schemaorg#3140
They aren't actually stored in JSON, just represented that way. And within my system, we can add loaders for any useful format. JSON is useful because it can be embedded in Turtle easily, and I was able to create an RDFLib handler for it that doesn't require serialization and deserialization, so the vectors remain Python objects when put into in-memory graphs.
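To make the JSON-versus-binary trade-off concrete: JSON text round-trips Python floats exactly but its size grows with the number of digits, while a fixed-width binary encoding costs a constant number of bytes per component but may round values. The sketch uses little-endian float32 purely for illustration; the function names are hypothetical and this is not the API of HDF5, TensorStore, or any connector.

```python
import json
import struct


def to_json(vector: list) -> str:
    """Compact JSON text, as it could appear in an rdf:JSON literal."""
    return json.dumps(vector, separators=(",", ":"))


def to_float32_bytes(vector: list) -> bytes:
    """Little-endian IEEE 754 single precision: exactly 4 bytes per component."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(data: bytes) -> list:
    """Inverse of to_float32_bytes; values come back as Python floats."""
    return list(struct.unpack(f"<{len(data) // 4}f", data))
```

For `[0.1, 0.25, -1.0]` the JSON form `[0.1,0.25,-1.0]` is 15 characters and parses back exactly, while the float32 form is 12 bytes but turns `0.1` into the nearest single-precision value. `0.25` and `-1.0` survive the float32 round trip unchanged.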
From: Vladimir Alexiev
Sent: Wednesday, December 20, 2023 5:01:40 AM
To: w3c/sparql-dev
Cc: Jamie McCusker
Subject: Re: [w3c/sparql-dev] Integration of sparql with large language model related functionality (Issue #193)
But @jpmccu and @ericprud, is it appropriate to store tensors in JSON?
Shouldn't we think of appropriate binary formats like HDF5 or stores like TensorStore (https://google.github.io/tensorstore/)?
There are also Data Abstraction Layers (e.g. GDAL) to isolate data access from the specific binary format/storage used.
Under https://accordproject.eu/ (automated compliance checking of architectural designs and urban planning) we're thinking about a binary data connector for GraphDB.
There's also schemaorg/schemaorg#3140.
Thanks a lot to all for the discussion so far. Let me try to structure this. A lot of the discussion seems to be focused on how to encode vectors and how to use them for search, and there are existing issues like #40 related to that. There is nearly no discussion of capabilities to query LLMs and to generate graphs from them, e.g. the magic predicates mentioned by @VladimirAlexiev or the ones from Franz that I mentioned at the top of this issue. Do people see a use case for standardising these? Also, what about use cases that go beyond querying but build on vector-based similarity? One could use this, for example, for KG construction (which could sit on top of queries via SPARQL CONSTRUCT) or for validation ("check if everything which is skos:related is semantically really related").
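The validation idea could be prototyped roughly as follows. This sketch assumes each concept already has an embedding (for example, of its label) and uses cosine similarity with an arbitrary threshold of 0.5; the threshold, the toy vectors, and the function names are illustrative assumptions. A low score is only a hint for human review, not proof that the `skos:related` link is wrong.

```python
import math

SKOS_RELATED = "http://www.w3.org/2004/02/skos/core#related"


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def suspicious_related(triples: list, embeddings: dict, threshold: float = 0.5) -> list:
    """Return (subject, object, score) for skos:related pairs whose embedding
    similarity falls below the threshold; pairs without embeddings are skipped."""
    flagged = []
    for s, p, o in triples:
        if p != SKOS_RELATED or s not in embeddings or o not in embeddings:
            continue
        score = cosine(embeddings[s], embeddings[o])
        if score < threshold:
            flagged.append((s, o, round(score, 3)))
    return flagged
```

In practice the input triples could come from a SPARQL query such as `SELECT ?s ?o WHERE { ?s skos:related ?o }`, and the flagged pairs reported back as a validation result.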
Some updates on this topic with newer developments. @rdfguy mentioned in the KGC panel discussions on KG standards that the combination of symbolic and statistical reasoning would be a potential future direction for graph technologies. At Data Week Leipzig 2024, Lisa Wenige gave a 15-minute presentation on what this might look like, showing SPARQL extensions for LLMs (video: https://www.youtube.com/watch?v=QfPCU8RiNhA&list=PLiyYYLqA8v5NBcAZJy6CpLVnDMrU4Y4yL&t=8344s). At the Knowledge Graph Conference, LLM support was shown by nearly all knowledge graph vendors.
As pointed out previously in this issue, many of these topics are related to search. Now, there also seem to be functionalities beyond search, e.g.
I am wondering if there is now a critical mass for starting work on this topic. Both RDF and property graph vendors are quite active in this space now.
As a quick reminder of how this group works: everyone can pick up one of the topics and make a concrete proposal in the form of a SEP, see https://github.com/w3c/sparql-dev/tree/main/SEP. But from experience I can say that it needs at least 1-2 people per SEP who really want to get it done and spend the time on it. We have a few successful examples, for instance when @afs and @Tpt created a SEP and then worked on implementations in both Jena and Oxigraph.
@ktk thanks for the reminder. My question was meant to see if somebody wants to pick this up (potentially jointly) :)
Interested! There are several dimensions to SPARQL enhancements. One part of this may be to work on standardizing call-out extensibility. Free-text search is an example here: there is a common general sense of what a text search involves, while each text search system has its own features and syntax details. So the options are either to define a (another!) free-text search syntax or to provide a flexible way to pass requests to text search systems. What would the requirements be on a call-out interface to support LLMs? What about call-in?
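One way to frame the call-out question is as a small, backend-neutral contract: the query engine passes a service IRI plus named parameters and gets back rows of variable bindings, and each text-search or LLM backend translates that into its own native syntax. The sketch below is purely hypothetical (no SPARQL engine is claimed to expose this interface). It uses a toy substring search as the backend; an LLM backend would just be another registered handler.

```python
from typing import Callable, Dict, Iterable, List

Binding = Dict[str, str]  # variable name -> value
Handler = Callable[[Dict[str, str]], Iterable[Binding]]


class CallOutRegistry:
    """Maps a service IRI to a handler that receives request parameters and
    yields variable bindings, independent of the backend's native syntax."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, service_iri: str, handler: Handler) -> None:
        self._handlers[service_iri] = handler

    def call(self, service_iri: str, request: Dict[str, str]) -> List[Binding]:
        if service_iri not in self._handlers:
            raise KeyError(f"no call-out registered for {service_iri}")
        return list(self._handlers[service_iri](request))


# Toy backend (hypothetical data): case-insensitive substring search over labels.
LABELS = {
    "http://example.org/Franvino": "Franvino red wine",
    "http://example.org/Merlo": "Merlo grape",
}


def naive_text_search(request: Dict[str, str]) -> Iterable[Binding]:
    needle = request["query"].lower()
    for iri, label in LABELS.items():
        if needle in label.lower():
            yield {"entity": iri, "label": label}
```

The design choice is that the engine only sees bindings, so swapping the toy search for Lucene, Elastic, or an LLM changes the handler, not the query language.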
Why
Several vendors are looking into this space already, sometimes in relation to extended (vector based) search capabilities, sometimes in relation to more general large language model features like summarization or knowledge graph generation from unstructured text.
Previous work
See also https://www.biorxiv.org/content/10.1101/463778v1.full.pdf.
Proposed solution
Nothing concrete yet, currently gathering related work.
Considerations for backward compatibility
Too early to discuss.