Code for SketchNE [1].
[1] Yuyang Xie, Yuxiao Dong, Jiezhong Qiu, Wenjian Yu, Xu Feng, Jie Tang, “SketchNE: Embedding billion-scale networks accurately in one hour,” IEEE Trans. Knowledge and Data Engineering, 2023
The code has been compiled and run with g++ 7.4.0; in principle, any compiler that supports C++17 should work.
The spectral propagation strategy needs modified Bessel functions of the first kind, which are provided by Boost. Install Boost with:
sudo apt-get install libboost-dev
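For reference, the modified Bessel function of the first kind is I_nu(x) = sum over k >= 0 of (x/2)^(2k+nu) / (k! * Gamma(k+nu+1)). The sketch below evaluates that series directly to show what the dependency computes; bessel_i_series is a hypothetical name, and SketchNE itself relies on Boost for this. In Boost the corresponding function is boost::math::cyl_bessel_i(nu, x) from <boost/math/special_functions/bessel.hpp>, and C++17 also provides std::cyl_bessel_i in <cmath>.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only (hypothetical helper): I_nu(x) via its power series
//   I_nu(x) = sum_{k>=0} (x/2)^(2k+nu) / (k! * Gamma(k+nu+1)).
// Fine for small and moderate x; a library routine such as Boost's
// cyl_bessel_i is the robust choice for large arguments.
double bessel_i_series(double nu, double x, int max_terms = 200) {
    assert(nu >= 0.0 && x >= 0.0);
    const double half = x / 2.0;
    // k = 0 term: (x/2)^nu / Gamma(nu + 1)
    double term = std::pow(half, nu) / std::tgamma(nu + 1.0);
    double sum = term;
    for (int k = 1; k < max_terms; ++k) {
        // Ratio of term k to term k-1 is (x/2)^2 / (k * (k + nu)).
        term *= half * half / (k * (k + nu));
        sum += term;
        if (term < 1e-17 * sum) break;  // remaining terms are negligible
    }
    return sum;
}
```

All terms are positive, so the series converges without cancellation; for example, bessel_i_series(0.0, 1.0) is about 1.2660659.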
Intel MKL is used for basic linear algebra operations. You can install it with Anaconda:
conda create -n sketchne python=3.7 # first create a new python env
conda activate sketchne # activate the newly created env
conda install mkl -c intel --no-update-deps
conda install mkl-devel
You can also download MKL directly from Intel by following
https://software.intel.com/en-us/mkl/choose-download/linux
By default, the installation script installs Intel MKL in /opt/intel.
If you installed MKL with Anaconda, you may need to edit the Makefile before compiling SketchNE. Set something like:
INCLUDE_DIRS = -I./ligra -I./pbbslib -I./mklfreigs -I"{ANACONDA_PATH}/envs/sketchne/include"
LINK_DIRS = -L"{ANACONDA_PATH}/envs/sketchne/lib"
Then run make to compile.
To remove the compiled files, run make clean.
Before running the example, you may need to add MKL to the library search path. If you installed MKL directly from Intel, set:
export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
If you installed MKL with Anaconda, use the environment's library path instead:
export LD_LIBRARY_PATH={ANACONDA_PATH}/envs/sketchne/lib:$LD_LIBRARY_PATH
Then run blog.sh in the example directory.
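The input format is the adjacency graph format used by GBBS. All vertex IDs and offsets are 0-based and written in decimal. Here n is the number of vertices, m is the number of edges, offset oi is the position in the edge list where the neighbors of vertex i begin, and each ej is the target vertex of edge j; the neighbors of vertex i are therefore e(oi) through e(o(i+1) - 1), with m used in place of o(i+1) for the last vertex. The format is as follows: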
AdjacencyGraph
<n>
<m>
<o0>
<o1>
...
<o(n-1)>
<e0>
<e1>
...
<e(m-1)>
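A minimal sketch of reading this format into memory, assuming whitespace-separated decimal values exactly as laid out above. AdjGraph and read_adjacency_graph are hypothetical names for illustration; this is not the GBBS or SketchNE loader, which may use a different in-memory representation.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory (CSR) form of an AdjacencyGraph file.
struct AdjGraph {
    std::uint64_t n = 0, m = 0;
    std::vector<std::uint64_t> offsets;  // n entries: first edge index of each vertex
    std::vector<std::uint64_t> edges;    // m entries: 0-based target vertex ids

    // Vertex v's neighbors are edges[offsets[v]] .. edges[end - 1],
    // where end is offsets[v + 1], or m for the last vertex.
    std::uint64_t degree(std::uint64_t v) const {
        assert(v < n);
        return (v + 1 < n ? offsets[v + 1] : m) - offsets[v];
    }
};

AdjGraph read_adjacency_graph(std::istream& in) {
    std::string header;
    if (!(in >> header) || header != "AdjacencyGraph")
        throw std::runtime_error("missing AdjacencyGraph header");
    AdjGraph g;
    if (!(in >> g.n >> g.m)) throw std::runtime_error("missing n or m");
    g.offsets.resize(g.n);
    g.edges.resize(g.m);
    for (auto& o : g.offsets)
        if (!(in >> o)) throw std::runtime_error("truncated offset list");
    for (auto& e : g.edges)
        if (!(in >> e)) throw std::runtime_error("truncated edge list");
    // Offsets must start at 0, never decrease, and stay within [0, m].
    for (std::uint64_t v = 0; v < g.n; ++v) {
        const bool ok = (v == 0) ? g.offsets[0] == 0
                                 : g.offsets[v] >= g.offsets[v - 1] && g.offsets[v] <= g.m;
        if (!ok) throw std::runtime_error("invalid offset");
    }
    // Every edge target must be a valid vertex id.
    for (auto e : g.edges)
        if (e >= g.n) throw std::runtime_error("edge target out of range");
    return g;
}
```

For example, a 3-vertex graph with offsets 0, 2, 3 and edges 1, 2, 2, 0 gives vertex 0 the neighbors {1, 2}, vertex 1 the neighbor {2}, and vertex 2 the neighbor {0}.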
The util directory contains a conversion program that converts edge-list and .mat files to the adjacency graph format.
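To illustrate what such a conversion involves, here is a sketch that turns 0-based (u, v) pairs into adjacency graph text. It is not the program in the util directory: write_adjacency_graph is a hypothetical helper, and it assumes an undirected graph, so each edge is written in both directions, while self-loops and duplicate edges are dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: writes (u, v) pairs on n vertices as AdjacencyGraph text.
// Assumption: the graph is undirected, so every edge is stored as two arcs.
void write_adjacency_graph(std::uint64_t n, const std::vector<Edge>& pairs,
                           std::ostream& out) {
    std::vector<Edge> arcs;
    arcs.reserve(2 * pairs.size());
    for (const auto& [u, v] : pairs) {
        assert(u < n && v < n);
        if (u == v) continue;  // drop self-loops
        arcs.emplace_back(u, v);
        arcs.emplace_back(v, u);
    }
    // Sorting by (source, target) places each vertex's neighbors contiguously;
    // std::unique then removes duplicate arcs.
    std::sort(arcs.begin(), arcs.end());
    arcs.erase(std::unique(arcs.begin(), arcs.end()), arcs.end());

    // offsets[v + 1] first counts the arcs leaving v; the running sum then
    // makes offsets[v] the index of v's first arc in the edge list.
    std::vector<std::uint64_t> offsets(n + 1, 0);
    for (const auto& a : arcs) ++offsets[a.first + 1];
    for (std::uint64_t v = 0; v < n; ++v) offsets[v + 1] += offsets[v];

    out << "AdjacencyGraph\n" << n << '\n' << arcs.size() << '\n';
    for (std::uint64_t v = 0; v < n; ++v) out << offsets[v] << '\n';
    for (const auto& a : arcs) out << a.second << '\n';
}
```

Writing into a std::ostringstream is a convenient way to check the result: the pairs (0, 1), (1, 2), (1, 0), (2, 2) on 3 vertices become 4 arcs with offsets 0, 1, 3 and edge targets 1, 0, 2, 1.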
Only a small graph is included as an example. For more test data, download and unzip the datasets used in the NetSMF paper:
cd data_bin
wget https://sampledbsql1backup.blob.core.windows.net/www19netsmf/datasets.zip
unzip datasets.zip
The archive contains youtube.mat (the YouTube dataset) and mag.edge (the OAG dataset).
The Friendster and LiveJournal graphs can be downloaded from SNAP: https://snap.stanford.edu/data/
The ClueWeb graph can be downloaded from here.
The Hyperlink2014 graph can be downloaded from here.
The Hyperlink2012 graph can be downloaded from here.