Mathosphere consists of the following components:
- A BaseX ad hoc formula search system
- An Apache Flink batch processing system
- A REST interface
The BaseX backend manages the data used for ad hoc retrieval for MathSearch on Wikipedia or DRMF. The REST interface exposes this ad hoc retrieval to clients, and the MediaWiki MathSearch extension serves as a frontend. The Flink batch processing component is used for long-running data analysis and batch queries.
Currently, there is no Mathosphere release available. Version 3.0.0 will be the first version released to the public.
However, the MathML query generator is available from Maven Central. Note that this project uses a development version of the MathML query generator, included as a submodule.
Version 1.0.0-SNAPSHOT is tightly coupled to Stratosphere 0.2.x and was focused on batch formula search. The code is available from TU-Berlin/mathosphere-history. This research prototype was built explicitly for NTCIR-10. It demonstrates the principle of separating the challenges of handling huge datasets from the principal questions in MIR. See our paper "Querying large Collections of Mathematical Publications".
Version 2.0.0-SNAPSHOT is based on Apache Flink. This research prototype, which analyses fundamental factors of formula similarity, was built for the NTCIR-11 conference. See our paper.
We are using the NTCIR-11 Wikipedia dataset (specifically the augmentedWikiDump.xml from this host) as an additional training dataset.
Run the following to initialize submodules after cloning this project:
git submodule init
git submodule update
Run the following to pull the latest changes from each submodule's repository:
git submodule update --remote --merge
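The two initialization commands above can also be combined into a single call. The sketch below wraps that call in a small helper; the function name `sync_submodules` is hypothetical and not part of this project, and the helper assumes it is run from the root of the cloned repository.

```shell
# Hypothetical helper (not part of Mathosphere): initialize and check out
# all submodules in one step, then list what was checked out.
sync_submodules() {
    # Refuse to run outside a checkout that declares submodules.
    if [ ! -f .gitmodules ]; then
        echo "no .gitmodules found; run this from the repository root" >&2
        return 1
    fi
    # --init registers the submodules, --recursive also handles nested ones.
    git submodule update --init --recursive &&
        # Print each submodule path with its checked-out commit.
        git submodule status
}
```

Calling `sync_submodules` right after cloning should leave the working tree in the same state as running `git submodule init` followed by `git submodule update`.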