batch_filtering


Setup

  • Install all dependencies mentioned here.
  • Download models: bash ./install_models.sh
  • Set up necessary tools: bash ./install_external_tools.sh

Usage

  • Set up the LASER environment variable before running:
    # inside this directory
    $ export LASER=$(pwd)
  • Batch filtering options:
    $ python3 scoring_pipeline.py -h
    usage: scoring_pipeline.py [-h] --input_dir PATH --output_dir PATH --src_lang
                             SRC_LANG --tgt_lang TGT_LANG [--thresh THRESH]
                             [--batch_size BATCH_SIZE] [--cpu]
    
    optional arguments:
      -h, --help            show this help message and exit
      --input_dir PATH, -i PATH
                            Input directory
      --output_dir PATH, -o PATH
                            Output directory
      --src_lang SRC_LANG   Source language
      --tgt_lang TGT_LANG   Target language
      --thresh THRESH       threshold
      --batch_size BATCH_SIZE
                            batch size
    
    • The script recursively searches input_dir for all file pairs (X.src_lang, X.tgt_lang), where X is any common file prefix, and writes the following output files to the corresponding subdirectories of output_dir (see the example at the end of this list):

      • X.merged.tsv: line pairs together with their similarity scores
      • X.passed.src_lang / X.passed.tgt_lang: line pairs whose similarity score is above the given threshold
      • X.failed.src_lang / X.failed.tgt_lang: line pairs whose similarity score is below the given threshold
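
    • For illustration, one run might look like the sketch below. The directory names, corpus prefix, language codes, and threshold value are made up, not script defaults, and it assumes that top-level input files map to the top level of output_dir:
      # hypothetical layout: input_dir holds one file pair, corpus.en / corpus.fr
      $ ls ./input_dir
      corpus.en  corpus.fr

      # illustrative invocation; pick a --thresh that suits your scoring scale
      $ python3 scoring_pipeline.py --input_dir ./input_dir --output_dir ./output_dir \
          --src_lang en --tgt_lang fr --thresh 1.0

      # expected outputs for the prefix "corpus"
      $ ls ./output_dir
      corpus.merged.tsv  corpus.passed.en  corpus.passed.fr  corpus.failed.en  corpus.failed.fr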