CiteSeer Dataset
Alexander L. Hayes edited this page Sep 18, 2017
All Datasets: boost-starai/BoostSRL-Datasets
by: Nandini Ramanan, Alexander L. Hayes
"CiteSeer" is a relational dataset of publication citations originally prepared for Alchemy; the original dataset is available on the Alchemy website. This version has been modified to work with RDN-Boost and includes the associated background knowledge, train/test folders, and the positive-example, negative-example, and fact files.
Three targets are considered:
- infield_fauthor
- infield_ftitle
- infield_fvenue
Download: CiteSeer.zip (1.62 MB)
- md5sum: e606e6f3fbe12f62cb5261285b39209c
- sha256sum: f5f6dd960a09d98e80cb2dcb735463dbc7dc5aaf2676f98d938be7df6edd2200
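To check the download against the digests above without relying on platform-specific tools, a short sketch using Python's standard `hashlib` module can hash the archive. The helper names `file_digest` and `verify` are illustrative only and are not part of BoostSRL; on Linux, `md5sum CiteSeer.zip` and `sha256sum CiteSeer.zip` do the same job.

```python
import hashlib
from pathlib import Path

# Digests listed on this page for CiteSeer.zip.
EXPECTED = {
    "md5": "e606e6f3fbe12f62cb5261285b39209c",
    "sha256": "f5f6dd960a09d98e80cb2dcb735463dbc7dc5aaf2676f98d938be7df6edd2200",
}


def file_digest(path, algorithm):
    """Hash a file in 1 MiB chunks so the whole archive never sits in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True only if every listed digest matches the file."""
    return all(file_digest(path, algo) == digest for algo, digest in expected.items())
```

Calling `verify("CiteSeer.zip")` from the download directory returns `False` if either digest differs, in which case the archive should be downloaded again.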
Linux/Mac:
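- After downloading, unzip CiteSeer.zip:
unzip CiteSeer.zip
- If you're using a jar file, move it into the CiteSeer directory:
mv (jar file) CiteSeer/
- Change into the CiteSeer directory, since the commands below use paths relative to it:
cd CiteSeer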
- Learning (this may take a week on a dataset this large):
java -jar BoostSRL.jar -l -train train/ -target infield_fauthor,infield_ftitle,infield_fvenue -trees 10
- Inference:
java -jar BoostSRL.jar -i -test test/ -model train/models/ -target infield_fauthor,infield_ftitle,infield_fvenue -trees 10
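Because learning can run for days, it may be convenient to script both steps so that they share the same target list and tree count. The sketch below only rebuilds the two commands above and runs them with Python's `subprocess`; it assumes the jar is named `BoostSRL.jar` and has been moved into `CiteSeer/`, and the function names `learn_cmd`, `infer_cmd`, and `run_pipeline` are hypothetical.

```python
import subprocess

JAR = "BoostSRL.jar"  # assumed jar name, matching the commands above
TARGETS = ("infield_fauthor", "infield_ftitle", "infield_fvenue")


def learn_cmd(trees=10, targets=TARGETS):
    """Mirror of the learning command above."""
    return ["java", "-jar", JAR, "-l", "-train", "train/",
            "-target", ",".join(targets), "-trees", str(trees)]


def infer_cmd(trees=10, targets=TARGETS):
    """Mirror of the inference command above; models are read from train/models/."""
    return ["java", "-jar", JAR, "-i", "-test", "test/", "-model", "train/models/",
            "-target", ",".join(targets), "-trees", str(trees)]


def run_pipeline(trees=10, workdir="CiteSeer"):
    """Learn, then infer, resolving the relative paths inside workdir."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # so inference is skipped if learning fails.
    subprocess.run(learn_cmd(trees), cwd=workdir, check=True)
    subprocess.run(infer_cmd(trees), cwd=workdir, check=True)
```

Call `run_pipeline()` from the directory that contains `CiteSeer/`, or `run_pipeline(workdir=".")` if you have already changed into it.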
Windows:
(Coming soon)
Associated background knowledge:
// Parameters
usePrologVariables: true.
setParam: treeDepth=4.
setParam: nodeSize=2.
setParam: numOfClauses=8.
setParam: numOfCycles=8.
// Modes & Bridgers
mode: center(+bib, +pos).
mode: center(+bib, -pos).
mode: firstin(+bib, +pos).
mode: firstin(+bib, -pos).
mode: firstnonauthortitletkn(+bib, +pos).
mode: firstnonauthortitletkn(+bib, -pos).
mode: followby(+bib, +pos, #token).
mode: hascomma(+bib, +pos).
mode: hascomma(+bib, -pos).
mode: haspunc(+bib, +pos).
mode: haspunc(+bib, -pos).
mode: infield_ftitle(+bibpos).
mode: infield_fauthor(+bibpos).
mode: infield_fvenue(+bibpos).
mode: isalphachar(+token).
mode: isdate(+token).
mode: isdigit(+token).
mode: lastinitial(+bib, +pos).
mode: lastinitial(+bib, -pos).
mode: lessthan(+pos, -pos).
mode: lessthan(-pos, +pos).
mode: next(+pos, -pos).
mode: next(-pos, +pos).
bridger: next/2.
mode: nextbibpos(+bibpos, -bibpos).
mode: nextbibpos(-bibpos, +bibpos).
nextbibpos(BP1, BP2) :- isbibpos(BP1, B, P1), isbibpos(BP2, B, P2), next(P1, P2).
mode: isbibpos(+bibpos, -bib, -pos).
mode: isbibpos(+bibpos, +bib, -pos).
mode: isbibpos(+bibpos, -bib, +pos).
mode: isbibpos(-bibpos, +bib, +pos).
bridger: isbibpos/3.
mode: token(+token, +pos, +bib).
mode: token(+token, -pos, +bib).
mode: token(-token, +pos, +bib).
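The mode declarations above follow BoostSRL's convention: `+` marks an argument that must be bound to a variable already in the clause, `-` an argument that may introduce a new variable, and `#` an argument filled with a constant. As a reading aid, the hypothetical helper `parse_mode` below splits one declaration into its predicate and typed arguments; it is a sketch for inspecting the file, not BoostSRL's own parser.

```python
import re

# Matches declarations such as "mode: followby(+bib, +pos, #token)."
MODE_RE = re.compile(r"^mode:\s*(\w+)\((.*)\)\.\s*$")

# Meaning of each argument prefix in a mode declaration.
KIND = {"+": "input", "-": "output", "#": "constant"}


def parse_mode(line):
    """Return (predicate, [(kind, type), ...]) for a mode line, or None otherwise."""
    m = MODE_RE.match(line.strip())
    if m is None:
        return None  # e.g. bridger lines, parameters, or rules
    predicate, args = m.group(1), m.group(2)
    parsed = []
    for arg in args.split(","):
        arg = arg.strip()
        # Raises KeyError for a prefix other than +, - or #.
        parsed.append((KIND[arg[0]], arg[1:]))
    return predicate, parsed
```

Applying `parse_mode` to every line of the background knowledge and grouping the results by predicate shows, for instance, that `isbibpos` is declared with four different input/output patterns.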
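The one rule in the background knowledge defines `nextbibpos(BP1, BP2)`: `BP2` belongs to the same citation `B` as `BP1` and sits at the position that follows `BP1`'s position according to `next`. The Python sketch below grounds that rule over small in-memory fact lists to make the join on `B` explicit. The function name and the constants in the example (`b1`, `p0`, `b1_p0`, ...) are made up, and this only illustrates the rule's meaning, not how BoostSRL evaluates it.

```python
from collections import defaultdict


def derive_nextbibpos(isbibpos, next_facts):
    """Ground nextbibpos(BP1,BP2) :- isbibpos(BP1,B,P1), isbibpos(BP2,B,P2), next(P1,P2).

    isbibpos:   iterable of (bibpos, bib, pos) triples
    next_facts: iterable of (pos1, pos2) pairs
    Returns the set of (BP1, BP2) pairs the rule entails.
    """
    isbibpos = list(isbibpos)  # traversed twice below

    # Index bib-position ids by (bib, pos) so matching B and P2 is a dict lookup.
    by_bib_pos = defaultdict(list)
    for bp, bib, pos in isbibpos:
        by_bib_pos[(bib, pos)].append(bp)

    # Successors of each position according to next/2.
    successors = defaultdict(list)
    for p1, p2 in next_facts:
        successors[p1].append(p2)

    result = set()
    for bp1, bib, p1 in isbibpos:
        for p2 in successors.get(p1, ()):
            # Looking up (bib, p2) enforces that BP2 is in the same citation B.
            for bp2 in by_bib_pos.get((bib, p2), ()):
                result.add((bp1, bp2))
    return result
```

With `isbibpos` facts for `b1_p0`, `b1_p1`, and `b2_p1` and the single fact `next(p0, p1)`, the result is `{("b1_p0", "b1_p1")}`; `b2_p1` is excluded because it belongs to a different citation.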