Handling long texts remains a major challenge for language models, and we have proposed a two-stage model approach to address the difficulties posed by Chinese long texts.

vic4code/chinese-long-text-nlp

Chinese Long Text NLP

In natural language processing, handling long texts remains a major challenge for language models. This project proposes a two-stage model approach to address the difficulties posed by Chinese long texts. The approach lets us quickly extract key information from Chinese text and classify similar texts, which makes information retrieval and analysis at our subsidiary considerably more efficient.

Installation

Environment

  • python >= 3.8
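To confirm that the active interpreter meets this requirement, a small check along these lines can help. This is only a sketch: the helper name `meets_minimum` is hypothetical and is not part of this repository.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        found = sys.version.split()[0]
        raise SystemExit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, found {found}")
    print("Python version OK")
```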

Setting up the Python Environment

  • Install Miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
  • Create and activate a Python environment:
conda create -n <name> python=3.8
conda activate <name>

Python Package Installation

GPU Version:

pip install -r requirements.txt

Quickstart

  • Navigate to the project directory and set PYTHONPATH:
cd 
export PYTHONPATH="$PWD/src"
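If imports from `src` fail after this step, it can be useful to verify that the variable was actually exported in the current shell. The following is a minimal sketch of such a check; the function names `pythonpath_entries` and `src_on_pythonpath` are hypothetical helpers, not part of this repository.

```python
import os


def pythonpath_entries(env=None):
    """Split PYTHONPATH into a list of absolute paths (empty entries are skipped)."""
    env = os.environ if env is None else env
    raw = env.get("PYTHONPATH", "")
    return [os.path.abspath(p) for p in raw.split(os.pathsep) if p]


def src_on_pythonpath(project_dir, env=None):
    """Return True if <project_dir>/src is listed in PYTHONPATH."""
    target = os.path.abspath(os.path.join(project_dir, "src"))
    return target in pythonpath_entries(env)


if __name__ == "__main__":
    # Run from the project directory, mirroring the `export` command above.
    print("src on PYTHONPATH:", src_on_pythonpath(os.getcwd()))
```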

Data Preparation

Label Studio Data

  • Information Extraction
  • Text Classification:
    • UTC:
    • UIE (Optional):
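This README does not describe how the project reads Label Studio data, so the sketch below only illustrates the general shape of a Label Studio JSON export and how classification choices and labeled spans could be pulled out of it. The function names, the `text` data key (which depends on the labeling configuration), and the sample record are all assumptions made for illustration, not the repository's own loader.

```python
import json


def _results(task):
    """Yield every annotation result attached to one exported task."""
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            yield result


def load_classification_examples(export_json, text_key="text"):
    """Return (text, [choice, ...]) pairs from a Label Studio JSON export string."""
    examples = []
    for task in json.loads(export_json):
        labels = []
        for result in _results(task):
            if result.get("type") == "choices":
                labels.extend(result["value"]["choices"])
        examples.append((task["data"][text_key], labels))
    return examples


def load_extraction_examples(export_json, text_key="text"):
    """Return (text, [(start, end, label), ...]) pairs for labeled text spans."""
    examples = []
    for task in json.loads(export_json):
        spans = []
        for result in _results(task):
            if result.get("type") == "labels":
                value = result["value"]
                for label in value["labels"]:
                    spans.append((value["start"], value["end"], label))
        examples.append((task["data"][text_key], spans))
    return examples


# Illustrative record only; it is not taken from the project's data.
SAMPLE_EXPORT = json.dumps([
    {
        "data": {"text": "这是一段示例文本"},
        "annotations": [
            {
                "result": [
                    {"type": "choices", "value": {"choices": ["example"]}},
                    {"type": "labels",
                     "value": {"start": 0, "end": 2, "text": "这是", "labels": ["demo"]}},
                ]
            }
        ],
    }
])

if __name__ == "__main__":
    print(load_classification_examples(SAMPLE_EXPORT))
    print(load_extraction_examples(SAMPLE_EXPORT))
```

Keeping the two loaders separate mirrors the split between text classification and information extraction in the list above; a real pipeline would still need to convert these pairs into whatever format the models expect.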

Inference Data

Modeling

Coming soon...
