Install

pip3 install -e .

Usage

The application can be run in two ways:

reading queries from stdin
reading the query as an argument

Additional resources need to be downloaded:

python download.py # dictionaries, embeddings
python download_names.py

Queries from stdin

python ./generator/app.py app.input=stdin

Query as an argument

The application can be run with

python ./generator/app.py

It will generate suggestions for the default query.

The default parameters are defined in conf/config.yaml. Any parameter can be overridden on the command line by giving its path in the config as dot-separated fragments, e.g.

python ./generator/app.py app.query=firepower

will substitute the default query with the provided one.
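
The dot-separated path addresses a key in the nested config. As a rough illustration of that addressing only (this is not the application's actual config loader; `apply_override` and the default values below are hypothetical):

```python
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one `a.b.c=value` override applied."""
    path, _, value = override.partition("=")
    keys = path.split(".")
    result = deepcopy(config)
    node = result
    for key in keys[:-1]:
        # Walk (or create) the nested sections named by the leading fragments.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


# Hypothetical defaults standing in for conf/config.yaml.
defaults = {"app": {"query": "default", "input": "query"}}
overridden = apply_override(defaults, "app.query=firepower")
print(overridden["app"]["query"])
```

The original dictionary is left untouched, so the defaults can be reused for the next run.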

The parameters are documented in the config.

REST API

Start server:

python -m uvicorn web_api:app --reload

Query with POST:

curl -d '{"label":"fire"}' -H "Content-Type: application/json" -X POST http://localhost:8000
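
The same request can be sent from Python with the standard library. This is a minimal sketch: `build_request` and `query` are illustrative helper names, and it assumes the server answers with a JSON body, which the text above does not state.

```python
import json
import urllib.request


def build_request(label: str, url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"label": label}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(label: str, url: str = "http://localhost:8000"):
    """Send the request and decode the response (the server must be running)."""
    with urllib.request.urlopen(build_request(label, url)) as response:
        # Assumption: the response body is JSON.
        return json.load(response)
```

Calling `query("fire")` requires the server started with uvicorn as shown above.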

Tests

Run:

pytest

or without slow tests:

pytest -m "not slow"

Debugging

Run app with app.logging_level=DEBUG to see debug information:

python generator/app.py app.input=stdin app.logging_level=DEBUG

Deployment

Build Docker image locally

Set image TAG:

export TAG=0.1.0

Build the Docker image locally:

docker compose -f docker-compose.build.yml build

Authenticate with AWS (if you use MFA, you need temporary access keys from AWS STS):

aws configure

Authorize to ECR:

./authorize-ecr.sh

Push image to ECR:

docker push 571094861812.dkr.ecr.us-east-1.amazonaws.com/name-generator:${TAG}

Deploy image on remote instance

Set image TAG:

export TAG=0.1.0

Authorize EC2 instance in ECR:

aws ecr get-login-password | docker login --username AWS --password-stdin 571094861812.dkr.ecr.us-east-1.amazonaws.com/name-generator

(Re-)deploy the image:

docker compose up -d

Check if service works:

curl -d '{"label":"firestarter"}' -H "Content-Type: application/json" -X POST http://44.203.61.202

Learning-To-Rank

To use the LTR features, you need to configure them in the Elasticsearch instance (see here for more details).

Pipelines, weights, sampler

conf/prod_config_new.yaml defines generator_limits, which cap the number of suggestions each generator may produce. This is a performance optimization. E.g.:

  generator_limits:
    HyphenGenerator: 128
    AbbreviationGenerator: 128
    EmojiGenerator: 150
    Wikipedia2VGenerator: 100
    RandomAvailableNameGenerator: 20000
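
How the application enforces these limits internally is not shown here. One plausible way to cap a lazy stream of suggestions is `itertools.islice`; in the sketch below, `limited` is a hypothetical helper, and it assumes that generators missing from the mapping are unlimited.

```python
from itertools import islice
from typing import Iterable, Iterator

# Two of the values from the example above.
GENERATOR_LIMITS = {
    "HyphenGenerator": 128,
    "EmojiGenerator": 150,
}


def limited(generator_name: str, suggestions: Iterable[str]) -> Iterator[str]:
    """Yield at most the configured number of suggestions for a generator."""
    # Assumption: a generator without an entry is not limited (stop=None).
    limit = GENERATOR_LIMITS.get(generator_name)
    return islice(suggestions, limit)
```

Because `islice` is lazy, suggestions past the limit are never requested from the underlying generator.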

conf/pipelines/prod_new.yaml defines the pipelines. Each pipeline has:

a name
one generator
a list of filters, e.g. SubnameFilter, ValidNameFilter, ValidNameLengthFilter, DomainFilter
weights for each interpretation type (ngram, person, other) and each language
mode_weights_multiplier, a multiplier of the above weights for each mode (e.g. instant, domain_detail, full)
global_limits for each mode, each of which can be an integer (absolute number) or a float (percentage of all results); values for the grouped_by_category endpoint can be overridden by adding the prefix grouped_ (e.g. grouped_instant, grouped_domain_detail, grouped_full)

Setting 0 in mode_weights_multiplier or global_limits disables the pipeline in a given mode.
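
The two forms of a global limit can be illustrated with a short sketch. Both function names are hypothetical, and two details are assumptions rather than documented behavior: a float is read as a fraction of all results (0.25 means 25%), and a missing grouped_ entry falls back to the plain mode entry.

```python
from typing import Optional, Union

Limit = Union[int, float]


def limit_for_mode(global_limits: dict, mode: str, grouped: bool = False) -> Optional[Limit]:
    """Pick the limit for a mode, preferring the grouped_ override if present."""
    if grouped and f"grouped_{mode}" in global_limits:
        return global_limits[f"grouped_{mode}"]
    # Assumption: without an override, the plain mode entry applies.
    return global_limits.get(mode)


def resolve_global_limit(limit: Limit, total_results: int) -> int:
    """Turn a global_limits entry into an absolute number of suggestions."""
    if isinstance(limit, bool):
        raise TypeError("limit must be an int or a float")
    if isinstance(limit, int):
        return limit  # absolute number
    # Assumption: a float is a fraction of all results.
    return int(total_results * limit)
```

With this reading, a limit of 0 (or 0.0) resolves to zero suggestions, which matches the rule that 0 disables the pipeline in that mode.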

Sampler

Each request defines:

mode
min_suggestions
max_suggestions
min_available_fraction

A name can have many interpretations. Every interpretation has a type (ngram, person, other), a language, and a probability. There might be more than one interpretation with the same type and language.

For each pair of type and language, probabilities of each pipeline are computed.

1. If there are enough suggestions then break.
2. If all pipeline probabilities for every pair of type and language are 0 then break.
3. Sample type and language, then sample an interpretation within this type and language.
4. Sample a pipeline for the sampled interpretation. The first pass of sampling is without replacement to increase diversity in top suggestions.
5. If the pipeline exceeds its global limit then go to 4.
6. Get a suggestion from the pipeline. (The generator is executed here.) If there are no more suggestions then go to 4.
7. If the suggestion has already been sampled then go to 6.
8. If the suggestion is not available and there is room only for available suggestions then go to 6.
9. If the suggestion is not normalized then go to 6.
10. Go to 1.

Exhausted pipelines are removed from sampling.
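
The loop described in this section can be sketched as follows. This is a simplified illustration, not the project's sampler: the `Pipeline` class and `sample` function are hypothetical, each (type, language) pair carries a single probability instead of several interpretations, the first pass without replacement is omitted, and the room for unavailable names is assumed to be max_suggestions minus the rounded-up share reserved by min_available_fraction.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

Key = Tuple[str, str]  # (interpretation type, language)


@dataclass(eq=False)
class Pipeline:
    name: str
    weights: Dict[Key, float]   # weight per (type, language) pair
    suggestions: Iterator[str]  # lazily produced by the generator
    global_limit: int
    used: int = 0


def sample(
    interpretations: Dict[Key, float],
    pipelines: List[Pipeline],
    max_suggestions: int,
    min_available_fraction: float,
    is_available: Callable[[str], bool],
    is_normalized: Callable[[str], bool],
    rng: random.Random,
) -> List[str]:
    # Assumption: slots not reserved for available names may hold unavailable ones.
    reserved = math.ceil(min_available_fraction * max_suggestions)
    max_unavailable = max_suggestions - reserved
    result: List[str] = []
    seen = set()
    unavailable = 0
    active = list(pipelines)

    def usable(key: Key) -> List[Pipeline]:
        return [p for p in active if p.weights.get(key, 0) > 0]

    while len(result) < max_suggestions:  # step 1
        keys = [k for k, prob in interpretations.items() if prob > 0 and usable(k)]
        if not keys:  # step 2
            break
        key = rng.choices(keys, [interpretations[k] for k in keys])[0]  # step 3
        accepted = None
        while accepted is None and usable(key):
            candidates = usable(key)
            weights = [p.weights[key] for p in candidates]
            pipeline = rng.choices(candidates, weights)[0]  # step 4
            if pipeline.used >= pipeline.global_limit:  # step 5
                active.remove(pipeline)
                continue
            for suggestion in pipeline.suggestions:  # step 6
                if suggestion in seen:  # step 7
                    continue
                if not is_available(suggestion) and unavailable >= max_unavailable:  # step 8
                    continue
                if not is_normalized(suggestion):  # step 9
                    continue
                accepted = suggestion
                break
            else:
                active.remove(pipeline)  # exhausted pipelines leave the pool
        if accepted is not None:
            seen.add(accepted)
            result.append(accepted)
            pipeline.used += 1
            if not is_available(accepted):
                unavailable += 1
    return result  # returning to the top of the outer loop is step 10
```

Pipelines that hit their global limit are dropped from the pool here so that the "go to 4" jump cannot loop forever; the real implementation may handle this differently.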

Grouped by category

Parameters:

mode
min_available_fraction
max number of categories
max number of suggestions per category
max related categories
min total categories?
max total categories?

Requirements:

order of categories is fixed
every generator must be mapped to only one category
flag generator suggestions should appear in 10% of suggestions; maybe we should detect whether it is the user's first search
should we remove the first pass of sampling with every generator?

Shuffle the order of categories (using weights?) if the min number of categories is smaller than the number of all categories. If a category does not return suggestions, we take the next one.
Within each category: sample the type and language of the interpretation, then sample an interpretation with this type and language. Sample a pipeline (the weights of pipelines depend on type and language). Do it in parallel?
Sample up to the max number of suggestions per category. How to handle min_available_fraction?

Suggestions by category

For each category, a MetaSampler is created with the appropriate pipelines. All MetaSamplers are executed in parallel. In one MetaSampler:

Apply global limits.
For each interpretation (interpretation_type, lang, tokenization) a sampler is created.

After generation of suggestions for all categories:

For each category, the number of suggestions is limited by the category's max_suggestions.
If count_real_suggestions < min_total_suggestions then RandomAvailable names are appended as the other category.
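
The two post-processing steps can be sketched as below. `finalize_categories` is a hypothetical helper, not the project's code, and it assumes that only as many RandomAvailable names are appended as are needed to reach min_total_suggestions; the text above does not say how many are added.

```python
from typing import Dict, List


def finalize_categories(
    suggestions_by_category: Dict[str, List[str]],
    max_suggestions: Dict[str, int],
    min_total_suggestions: int,
    random_available_names: List[str],
) -> Dict[str, List[str]]:
    """Trim each category, then pad with an 'other' category if too few remain."""
    # Limit every category to its own max_suggestions.
    result = {
        category: names[: max_suggestions[category]]
        for category, names in suggestions_by_category.items()
    }
    count_real_suggestions = sum(len(names) for names in result.values())
    if count_real_suggestions < min_total_suggestions:
        missing = min_total_suggestions - count_real_suggestions
        # Assumption: append just enough names to reach the minimum.
        result["other"] = random_available_names[:missing]
    return result
```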
