TOMMY is a topic modelling application developed by students of Utrecht University on behalf of EMMA. To perform topic modelling, the underlying Latent Dirichlet Allocation (LDA) algorithm or Non-Negative Matrix Factorization (NMF) algorithm is run on the files supplied by the user.

Top-Models/TOMMY

TOMMY: Interactive Topic Modeling

Note: this project is open source and should therefore not be deleted once the project is finished.

Table of Contents

  • Description
  • Installation
  • Usage
  • Support
  • Contributing
  • Authors and Acknowledgment
  • License

Description

TOMMY is a tool for interactive topic modeling. It is designed to be used by researchers and data scientists who want to explore their data and extract topics from it without requiring extensive programming experience. TOMMY is built on top of the LDA, NMF and BERTopic algorithms. It provides a user-friendly interface for exploring the topics extracted from the data.

Installation

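To run TOMMY from source, first create and activate a Python virtual environment (for example with Python's built-in venv module). Inside the activated environment, run the following command from the directory that contains requirements.txt to install all required packages.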

pip install -r ./requirements.txt

The following command starts TOMMY. Startup can take a while (around 30 seconds).

python -m tommy.main

Alternatively, the executables can be downloaded from tommy.fyor.nl. Instructions for the installation process can be found in the installation guide. Note that this website is written in Dutch.

Usage

TOMMY is used to explore topics in a dataset. To import a dataset, click the 'Import' button in the top left corner and select the folder containing the dataset in the window that opens. TOMMY then tries to read all files in that folder and lists them in the file overview. Once the files are loaded, set the parameters on the left and run the topic modeling algorithm by clicking the 'Toepassen' (Dutch for 'Apply') button just below them.
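As a rough illustration of the "read all the files in the folder" step, the following Python sketch collects the text of every file in a folder and skips the ones it cannot read. It is not TOMMY's actual import code: the function name `read_text_files` is hypothetical, and the restriction to UTF-8 `.txt` files is an assumption made only to keep the example small.

```python
from pathlib import Path


def read_text_files(folder):
    """Return {file name: text} for every readable .txt file directly in folder."""
    documents = {}
    for path in sorted(Path(folder).iterdir()):
        # Assumption: this sketch only looks at plain-text (.txt) files.
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        try:
            documents[path.name] = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Skip files that cannot be read instead of aborting the import.
            continue
    return documents
```

Skipping unreadable files rather than raising mirrors the "try to read" wording above: one broken file should not prevent the rest of the dataset from being listed.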

To exclude words from the analysis, open the 'Blacklist' tab and enter the words in the text box, one word per line.
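The effect of a one-word-per-line blacklist can be sketched in a few lines of Python. This is only an illustration, not TOMMY's implementation: the helper names `parse_blacklist` and `remove_blacklisted` are hypothetical, and case-insensitive matching is an assumption.

```python
def parse_blacklist(text):
    """Turn newline-separated text-box input into a set of lowercase words."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_blacklisted(tokens, blacklist):
    """Keep only the tokens whose lowercase form is not on the blacklist."""
    # Assumption: matching ignores case; the real tool may behave differently.
    return [token for token in tokens if token.lower() not in blacklist]


blacklist = parse_blacklist("the\nand\n\n  Of  \n")
tokens = ["The", "history", "of", "Utrecht", "and", "Amsterdam"]
print(remove_blacklisted(tokens, blacklist))
# prints ['history', 'Utrecht', 'Amsterdam']
```

Empty lines and surrounding whitespace are ignored here, so a stray blank line in the text box does not add an empty entry to the blacklist.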

Support

If you have any questions or issues, feel free to make a post on this repository. We will try to respond as soon as possible.

Contributing

If you want to contribute to this project, feel free to open a pull request. We ask you to follow the style guide provided in the repository. Please also include a brief description of the changes and the reason for them, which helps us understand the changes and merge them more quickly. Where possible, include tests for your changes to reduce the likelihood of breaking the software for everyone.

Authors and Acknowledgment

This software has been developed by students of Utrecht University as part of our graduation project. This project has been commissioned by the Dutch company EMMA.

The students who have contributed to this project are:

Our supervisors are:

Our client is:

License

This project is licensed under the GNU AGPLv3 License - see the LICENSE.md file for more details.
