TOMMY: Interactive Topic Modeling

Note: this project is open source and should therefore not be deleted after the project is finished.

Description

TOMMY is a tool for interactive topic modeling. It is designed to be used by researchers and data scientists who want to explore their data and extract topics from it without requiring extensive programming experience. TOMMY is built on top of the LDA, NMF and BERTopic algorithms. It provides a user-friendly interface for exploring the topics extracted from the data.

Installation

To run TOMMY from source, you can first create a virtual environment. Once the virtual environment is activated, the following command installs all the necessary packages.

pip install -r ./requirements.txt

The following command starts TOMMY. Note that startup might take some time (about 30 seconds).

python -m tommy.main

Alternatively, the executables can be downloaded from tommy.fyor.nl. Instructions for the installation process can be found in the installation guide. Note that this website is written in Dutch.

Usage

TOMMY can be used to explore the topics in a dataset. To import a dataset, select the 'Import' button in the top left corner. This opens a window in which you can select the folder containing the dataset. TOMMY then tries to read all the files in that folder and displays them in the file overview. Once the files are loaded, you can set the parameters on the left and run the topic modeling algorithm by clicking the 'Toepassen' (Dutch for 'Apply') button just below them.
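As a rough illustration of what "trying to read all the files in a folder" can look like, here is a minimal Python sketch. It is not TOMMY's actual import code: the function name `read_folder`, the restriction to `.txt` files, and the UTF-8 encoding are all assumptions made for this example. Files that cannot be read are skipped rather than stopping the import.

```python
import tempfile
from pathlib import Path


def read_folder(folder: str) -> dict[str, str]:
    """Return {file name: text} for every readable .txt file in `folder`.

    Hypothetical helper for illustration; not part of TOMMY.
    """
    documents = {}
    for path in sorted(Path(folder).iterdir()):
        # Assumption: only plain-text files are handled in this sketch.
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        try:
            documents[path.name] = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Skip files that cannot be opened or decoded.
            continue
    return documents


if __name__ == "__main__":
    # Small demo on a temporary folder with a single text file.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "example.txt").write_text("hello", encoding="utf-8")
        print(read_folder(folder))
```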

To exclude words from the analysis, select the 'Blacklist' tab and enter the words you want to exclude in the text box, one word per line.
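To make the blacklist format concrete, the sketch below shows one way a newline-separated word list could be parsed and applied to a list of tokens. It is only an illustration under stated assumptions, not TOMMY's implementation: the function names `parse_blacklist` and `remove_blacklisted` are hypothetical, and the case-insensitive matching is an assumption of this example.

```python
def parse_blacklist(text: str) -> set[str]:
    """Turn newline-separated input into a set of lowercase words.

    Empty lines and surrounding whitespace are ignored.
    """
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_blacklisted(tokens: list[str], blacklist: set[str]) -> list[str]:
    """Drop every token whose lowercase form appears in the blacklist."""
    # Assumption: matching is case-insensitive.
    return [token for token in tokens if token.lower() not in blacklist]


# Text as it might be typed into the text box, one word per line.
blacklist = parse_blacklist("the\nand\n\n  Of  \n")
tokens = ["The", "history", "of", "topic", "models", "and", "text"]
filtered = remove_blacklisted(tokens, blacklist)
print(filtered)  # ['history', 'topic', 'models', 'text']
```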

Support

If you have any questions or run into problems, feel free to open an issue on this repository. We will try to respond as soon as possible.

Contributing

If you want to contribute to this project, feel free to open a pull request. We ask you to follow the style guide provided in the repository (STYLEGUIDE.md). Please also include a brief description of your changes and the reason for them; this helps us understand the changes and merge them more quickly. Where possible, include tests for your changes to reduce the likelihood of breaking the software for everyone.

Authors and Acknowledgment

This software has been developed by students of Utrecht University as part of our graduation project. This project has been commissioned by the Dutch company EMMA.

The students who have contributed to this project are:

Our supervisors are:

Our client is:

Lidwien van de Wijngaert from EMMA

License

This project is licensed under the GNU AGPLv3 license. See the LICENSE file for more details.
