-
Notifications
You must be signed in to change notification settings - Fork 0
topic model-based search for scientific publications of the Max Planck Society
License
dh-thesis/maxmodelr
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# ////////////////// # # /// maxmodelr /// # # //////////////// # # SET UP # ... by cloning the repository $ git clone https://github.com/dh-thesis/maxmodelr.git $ cd maxmodelr # REQUIREMENTS --> check if you fullfill the requirements listed --> at https://github.com/herreio/maxplanckr # PUBLICATION SEARCH # start R console $ R # if it's the first time, packrat will install itself # after that simply run the following command to set up R environment > packrat::restore() # this will take same time, grab yourself a coffee... # load package > devtools::load_all() # load topic coupling data (k = number of topics) > couple <- load_couple(k=25) > format(object.size(couple), units="MB") # [1] "541 Mb" <-- be aware of this memory usage, you need some free RAM! # set search query > query <- "information retrieval" # choose topic model > models <- c("lda_all","lda_mpi","lda_pers","btm_all") > model <- sample(models, 1) # get n most probable items with same topic as query > n = 20 > search_topic_items(couple, query=query, model=model, n=n) # take c most probable items with same topic as query # and compute js-divergence between theta of items and query # return the first n items > c = 100 > search_dist_items(couple, query=query, model=model, n=n, c=c) # take all items with same topic as query # and compute js-divergence between theta of items and query # return the first n items # !! # !! depending on the number of topics in the model, this means comparing lots of vectors # !! the lower the number of topics, the more publications are in the same topic # !! and need to be compared. it is suggested not to use this function with the # !! example coupling data provided in this repository with k = 25. # !! instead create coupling data with k >= 100 (see below) # !! > search_topic_items_dist(couple, query=query, model=model, n=n) # DATA PREPARATION # create models $ exec/runbtm > btm.log 2>&1 $ exec/runlda > lda.log 2>&1 # infer titles $ exec/infer > infer.log 2>&1 # couple titles and topics $ exec/couple
About
topic model-based search for scientific publications of the Max Planck Society
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published