GRAPE on Heterogenous graphs #42
Comments
So for starters:
Is the task modelled as an edge prediction task? We can do a call next week on the discord channel, just ping me and we can plan it. Luca |
Hey! Thanks for the reply! Yes, the task I am trying to achieve is an edge prediction task between a node of type molecule and one of type species (undirected). So far I was able to create a graph with the library. However, if I try to use an edge prediction task of your library and specify the node features, I get: ValueError: The provided node features have 48599 rows but the provided graph Lotus has 71410 nodes. Maybe these features refer to another version of the graph or another graph entirely? This makes sense, since I have 48599 nodes of type "molecule" and the rest are of type species. I'm thinking the 2 dataframes should be merged, but I am wondering if there is another solution, since the number of features is not the same for species and molecules. So to answer your question, maybe I don't need an embedding, but then how do I add features to each node? I'd be glad if we could have a call to get a better understanding of all this. Thanks again! Marco |
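One way to picture the "merge the 2 dataframes" idea above is a minimal sketch in plain Python: every node gets one fixed-width row, each node type owns its own block of columns, and the other blocks are zero-filled. The function name `stack_typed_features` and the toy node names are hypothetical, and whether zero-padded features are a sensible input for a given model is exactly the open question raised in the reply that follows; this is not the solution that was eventually adopted in the thread.

```python
def stack_typed_features(node_names, features_by_type, node_type_of):
    """Return one fixed-width feature row per node, in the given node order.

    Each node type gets its own block of columns; the columns that belong
    to the other node types are filled with zeros.
    """
    # Compute the column offset of each type's block in the combined row.
    offsets, width = {}, 0
    for node_type, table in features_by_type.items():
        offsets[node_type] = width
        width += len(next(iter(table.values())))

    rows = []
    for name in node_names:
        node_type = node_type_of[name]
        values = features_by_type[node_type][name]
        row = [0.0] * width
        start = offsets[node_type]
        row[start:start + len(values)] = values
        rows.append(row)
    return rows


# Toy data (hypothetical): molecules have 3 features, species have 2.
molecule_features = {"mol_a": [0.1, 0.2, 0.3], "mol_b": [0.4, 0.5, 0.6]}
species_features = {"sp_x": [1.0, 2.0]}
node_type_of = {"mol_a": "molecule", "mol_b": "molecule", "sp_x": "species"}
node_names = ["mol_a", "sp_x", "mol_b"]  # should follow the graph's node order

combined = stack_typed_features(
    node_names,
    {"molecule": molecule_features, "species": species_features},
    node_type_of,
)
print(len(combined), "rows of width", len(combined[0]))
```

The point of the sketch is only the shape constraint from the error message: the feature table needs exactly one row per node of the graph, whatever the node type.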
I am not sure how you expect a model to ingest such features - could you please describe how you would expect the model of your choosing to work? |
I am on discord if you'd like to have a call. |
I have a similar issue: I am trying to load an undirected graph with edge types and weights. The CSV that I generate looks like this:
head,relation,tail,weight
113091,14,412357,0.7917595
560244,14,1164306,0.7917595
388246,14,1121544,0.7917595
1102500,14,1142585,0.7917595
590896,14,661190,0.7917595
422681,14,501152,0.7917595
754343,14,1105352,0.7917595
639287,14,859151,0.7917595
270949,14,995611,0.7917595
I use the following snippet, adjusted from one of your tutorials:
graph_ = grape.Graph.from_csv(
    # Edges related parameters
    ## The path to the edge list CSV
    edge_path="companykg.csv",
    ## Use the comma as the separator between values
    edge_list_separator=",",
    ## The first row should be used as the column names
    edge_list_header=True,
    ## The source nodes are in the first column
    sources_column_number=0,
    ## The destination nodes are in the third column
    destinations_column_number=2,
    ## Both source and destination columns use numeric node IDs instead of node names
    edge_list_numeric_node_ids=True,
    ## The weights are in the fourth column
    weights_column_number=3,
    edge_type_path="companykg.csv",
    edge_types_column="relation",
    # Nodes related parameters
    ## The path to the nodes list tsv
    # node_path=None,
    ## Set the tab as the separator between values
    # node_list_separator="\t",
    ## The first rows should be used as the columns names
    # node_list_header=True,
    ## The column with the node names is the one with name "node_name".
    # nodes_column="node_name",
    # Graph related parameters
    ## The graph is undirected
    directed=False,
    ## The name of the graph is CompanyKG
    name="CompanyKG",
    ## Display a progress bar (this might be in the terminal and not in the notebook)
    verbose=True,
)
However, the edge types do not seem to be loaded:
>>> graph_.get_undirected_edge_type_ids()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 graph_.get_undirected_edge_type_ids()
ValueError: The current graph instance does not have edge types.
Am I doing something wrong? Thank you! :) |
Hi @Filco306, the two issues are very different in nature. In the former one, @mvisani was providing features that did not map to the nodes of the graph. I helped him design other features that did. Your issue is that you are loading an edge-type file, but you are not loading the edge types column from the edge path. Possibly I should raise an error when encountering these configurations, although they are not wrong per se - with the parametrization you are using, you are setting the vocabulary of edge types of the graph, but you are not loading the column of the edge types from the edge list. Most likely |
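To make the distinction above concrete, here is a hedged sketch that derives the zero-based column numbers from the CSV header and collects them as keyword arguments, including one that points at the edge types column inside the edge list itself. The name `edge_list_edge_types_column_number` is an assumption about the `from_csv` parameter and is not confirmed anywhere in this thread; the other keys are copied from the snippet posted above. The sketch uses only the standard library and does not call GRAPE.

```python
import csv
import io

# First rows of the CSV shown earlier in the thread.
SAMPLE = (
    "head,relation,tail,weight\n"
    "113091,14,412357,0.7917595\n"
    "560244,14,1164306,0.7917595\n"
)


def header_positions(csv_text, separator=","):
    """Map each header name to its zero-based column number."""
    header = next(csv.reader(io.StringIO(csv_text), delimiter=separator))
    return {name: position for position, name in enumerate(header)}


positions = header_positions(SAMPLE)

edge_list_kwargs = {
    "edge_list_separator": ",",
    "edge_list_header": True,
    "sources_column_number": positions["head"],
    "destinations_column_number": positions["tail"],
    "weights_column_number": positions["weight"],
    # ASSUMPTION: parameter name not confirmed in this thread; it is meant
    # to read the edge type of each edge from the edge list itself.
    "edge_list_edge_types_column_number": positions["relation"],
}
print(edge_list_kwargs)
```

Deriving the numbers from the header avoids off-by-one slips such as treating column number 3 as "the third column" when it is the fourth.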
Weird, I see that I have already added extensive errors for this type of parametrization - which version of Ensmallen are you using? Which OS? |
Interesting! Version:
>>> grape.print_version()
{'GRAPE Version': '0.1.29', 'Python version': '3.10.6', 'Platform': 'Linux-5.4.0-150-generic-x86_64-with-glibc2.31', 'Threads number': 48, 'PyTorch version': '1.13.0', 'PyKEEN version': '1.9.0'}
Ensmallen version when I do
Should I update the ensmallen package? Or how should I solve it? :) Thank you for your quick reply! |
The latest version of ensmallen is |
Ah. Upgraded now, and now I get
|
Ok, which is what I was expecting you to get before. I hope the error I wrote contains sufficient information for you to correct the parametrization, if it remains unclear please do let me know so I can improve the error for the next version of Ensmallen. |
Ah, I had to change one parameter, and now it works! However, loading the excellent analysis takes waaaaaay longer; I suppose it takes longer given certain analyses are run that weren't run before? Thank you for a great package either way! I love it! |
As you have now included the edge types, it will incorporate them in the analysis. Most likely, the slowest new step is the isomorphic edge types detection step, which is similar in nature to the same thing I do for the nodes. At some point, I will make a faster version. |
Okay! Is there a way to turn off certain analyses such as that one prior to starting the loading of the analysis and leaving in the rest? |
Hi @Filco306 - I am not sure I have understood your question, could you kindly expand upon it? |
Yes absolutely! Is there a way to turn off/skip specifically the isomorphic edge detection step for the analysis summary, so that step is skipped and it finishes faster? That way, I could also test whether it is that step that is the issue. |
Currently, it runs the complete set of analyses for the whole graph as you have loaded it. In the default analysis that you get when you display the graph object, you cannot provide any parameter, and I tried to make it generally decently fast. If you are now experiencing long runtimes after having loaded the edge types, it is likely that it is the isomorphic edge types analysis that is slow, meaning it is finding many near-isomorphic candidates. Note that this is not the analysis on isomorphic edges, which is another thing entirely. If you do not load the edge types, does the analysis complete much faster? |
Sorry for my late reply. The analysis still runs fairly slow with the update. I will return with a timing on both! |
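For the timing comparison mentioned above, a minimal sketch of a wall-clock helper could look like the following. The two `report_*` functions are hypothetical stand-ins so that the block runs on its own; in practice each would be replaced by a callable that loads the graph (with or without edge types) and renders its analysis.

```python
import time


def timed(label, task):
    """Run `task()` once and return (label, result, elapsed_seconds)."""
    start = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - start
    return label, result, elapsed


# Hypothetical stand-ins for "render the analysis of the graph".
def report_without_edge_types():
    return sum(range(10_000))


def report_with_edge_types():
    return sum(range(100_000))


measurements = [
    timed("without edge types", report_without_edge_types),
    timed("with edge types", report_with_edge_types),
]
for label, _, seconds in measurements:
    print(f"{label}: {seconds:.4f} s")
```

A single run per variant is noisy; repeating each measurement a few times and comparing the minimum is usually a fairer comparison.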
Hi !
I am a beginner in GNNs and saw your repo. It seems that it could work for my problem, but I just need to be sure.
My goal is to try to predict the chemical composition of organisms across the tree of life. I have a CSV file that is similar to this example :
So each row holds a unique molecule-species pair (I'm thinking that would be the edge between 2 nodes of different types, hence the heterogeneous graph), the number of papers that have actually found the molecule in that species (edge weight?), and then some information about the molecule and the species.
In this database there are 2 things we know: how species are related (classic phylogenetic tree) and how molecules are related (group-subgroup structure seen above).
One fair assumption is that closely related species may share a similar set of molecules and molecules related in their synthesis may share a similar distribution across species. What I would like to have as a result is a matrix of s (species) by m (molecules) of probabilities that tell me if the edge between that molecule and that species could exist.
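As a rough illustration of the modelling described above, the sketch below turns a pair table into a typed node list and a weighted edge list, then enumerates the cells of the species-by-molecules matrix that have no observed edge (the pairs a link predictor would have to score). The column names `molecule`, `species` and `n_papers` and the toy rows are hypothetical, since the original example table is not shown; no GRAPE call is involved.

```python
import csv
import io

# Hypothetical pair table: one row per observed molecule-species pair.
PAIRS_CSV = (
    "molecule,species,n_papers\n"
    "mol_a,sp_x,3\n"
    "mol_a,sp_y,1\n"
    "mol_b,sp_x,2\n"
)


def split_into_lists(csv_text):
    """Return ({node name: node type}, [(molecule, species, weight), ...])."""
    node_types = {}  # insertion-ordered: node name -> node type
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node_types.setdefault(row["molecule"], "molecule")
        node_types.setdefault(row["species"], "species")
        # The paper count is used as the edge weight.
        edges.append((row["molecule"], row["species"], float(row["n_papers"])))
    return node_types, edges


node_types, edges = split_into_lists(PAIRS_CSV)

molecules = [name for name, kind in node_types.items() if kind == "molecule"]
species = [name for name, kind in node_types.items() if kind == "species"]

# Cells of the s-by-m matrix without an observed edge.
observed = {(molecule, sp) for molecule, sp, _ in edges}
unobserved = [(m, s) for s in species for m in molecules if (m, s) not in observed]
print(unobserved)
```

The full matrix has len(species) * len(molecules) cells, so for large graphs it is usually cheaper to score only a chosen subset of candidate pairs than to materialise every cell.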
My questions are :
Sorry if those are very rookie questions, and thanks in advance for the reply! :)