Stampy NLP provides semantic search and other NLP microservices for aisafety.info and stampy.ai, a database of questions and answers about AGI safety. Contributions will be welcome (once I get the messy things cleaned up), and the code is released under the MIT License.
The demo is available at nlp.stampy.ai, or directly at stampy-nlp-t6p37v2uia-uw.a.run.app. If you're interested in learning more about Natural Language Processing (NLP) and Transformers, the HuggingFace course provides an excellent introduction.
Our NLP services offer 4 features which depend on 2 key components:
- Three NLP models from HuggingFace, specifically SentenceTransformers: pre-trained models optimized for different types of semantic search. Each generates sentence embeddings, 768-dimension vectors (think of them as 768-element arrays of floats) that numerically capture the meaning of the text. In general, we use Python + PyTorch, since that gives us the most flexibility to use a variety of models.
  - Retriever model (multi-qa-mpnet) for identifying paraphrased questions.
  - allenai-specter for searching titles & abstracts of scientific publications.
  - Reader model (electra-base-squad2) for finding the start & end index of the answer, given a question and a context paragraph containing the answer.
- Pinecone, a fully managed, high-performance database for vector search applications. Each data element contains the 768-dimension vector, a unique id (i.e. the Coda id for our FAQ) and some metadata (original text, url, other relevant information).
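As a rough sketch of how these two components fit together, the flow looks something like the following (the index name, id, and metadata fields here are illustrative assumptions, not the app's actual code):

```python
# Illustrative sketch only -- index name, ids and metadata fields are assumptions.
from sentence_transformers import SentenceTransformer
import pinecone

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # retriever model
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("stampy")  # hypothetical index name

# Encode a question title into a 768-dimension vector and store it.
vector = model.encode("What is AI safety?").tolist()
index.upsert(
    vectors=[("coda-id-123", vector, {"title": "What is AI safety?"})],
    namespace="faq-titles",
)

# Encode a user query the same way and fetch the nearest neighbours.
results = index.query(vector=vector, top_k=10, namespace="faq-titles",
                      include_metadata=True)
```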
1. Semantic Search

Encodes a given query string, sends the vector embedding to search Pinecone for the nearest entries in the `faq-titles` namespace, then returns the payload as JSON, sorted by a score between 0 and 1 indicating the similarity of each match.

Sample API usage:

https://nlp.stampy.ai/api/search?query=What+AI+safety%3F
- `query` (required) is the sentence or phrase to be encoded, whose nearest entries are then returned.
- `top` (optional) indicates the number of entries returned. If the value is not specified, the default is to return the 10 nearest entries.
- `showLive=0` (optional) returns only entries with a `status` that is NOT "Live on site". The default is `showLive=1`, which returns only entries with a `status` of "Live on site".
- `status=all` (optional) returns all entries, including those that have not yet been canonically answered. Specify multiple values for `status` to match more than one value.
- `getContent=true` (optional) returns the content of answers along with each entry. Otherwise, the default is `getContent=false` and only the question titles, without answers, are returned.
Sample usages:

- `showLive=1` returns entries where status == "Live on site":
  https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=What+AI+safety%3F&showLive=1
- `showLive=0` returns entries where status != "Live on site":
  https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=What+AI+safety%3F&showLive=0
- `status=all` returns all questions regardless of status:
  https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=Something+random%3F&status=all
- `status=value` returns entries with status matching whatever value is specified. Multiple values may be listed separately. The example below returns entries with status == "Not started" and also status == "In progress":
  https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=Something+random%3F&status=Not%20started&status=In%20progress
- `getContent=true` returns the content of answers along with each entry:
  https://nlp.stampy.ai/api/search?query=Something+random%3F&getContent=true
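From Python, a call might look like this (the exact keys on each returned entry are assumptions here; only the score ordering follows from the description above):

```python
import requests

resp = requests.get(
    "https://nlp.stampy.ai/api/search",
    params={"query": "What is AI safety?", "top": 5, "getContent": "true"},
)
resp.raise_for_status()
for entry in resp.json():
    # Assumes each entry carries a similarity score plus metadata such as
    # the question title; adjust the keys to the actual payload.
    print(entry.get("score"), entry.get("title"))
```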
2. Duplicates

Displays a table with the top pairs of most similar questions in Coda, based on the last time `paraphrase_mining` was called.
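The `paraphrase_mining` step is presumably SentenceTransformers' `util.paraphrase_mining`; a minimal standalone sketch of the idea (model choice assumed from the retriever above):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")
titles = [
    "What is AI safety?",
    "Why does AI safety matter?",
    "What does AI safety mean?",
]
# paraphrase_mining returns [score, i, j] triples, most similar pairs first.
for score, i, j in util.paraphrase_mining(model, titles):
    print(f"{score:.2f}  {titles[i]!r} <-> {titles[j]!r}")
```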
3. Literature Abstracts

Encodes a given query string, sends the vector embedding to search Pinecone for the nearest entries in the `paper-abstracts` namespace, then returns the payload as JSON, sorted by a score between 0 and 1 indicating the similarity of each match. In an effort to minimize the number of huge models in our app container, this service still uses the external HuggingFace API, so it's still a bit slow.

Sample API usage:

https://nlp.stampy.ai/api/literature?query=What+AI+safety%3F
4. Extract QA
Encodes a given query string, then sends the vector embedding to search Pinecone for the 10 nearest entries in the `extracted-chunks` namespace. For each entry, it runs the HuggingFace pipeline task to extract an answer from the content, then returns the payload as JSON, sorted by a score between 0 and 1 indicating the confidence that each answer matches the query question. Since this runs 10+ inferences, it can be rather slow.

Sample API usage:

https://nlp.stampy.ai/api/extract?query=What+AI+safety%3F
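The per-entry extraction step corresponds to the standard HuggingFace question-answering pipeline; roughly (a sketch only, the app may batch or post-process differently):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/electra-base-squad2")
result = qa(
    question="What is AI safety?",
    context="AI safety is a research field that aims to ...",  # one retrieved chunk
)
# The pipeline returns the answer span, its start/end character indexes
# within the context, and a confidence score.
print(result["answer"], result["start"], result["end"], result["score"])
```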
Setup

To set up a local development environment, run:

./setup.sh
If this is your first run, it will:
- Download the appropriate models from Huggingface
- Write the appropriate API keys/tokens to `.env`
- Create a virtualenv
- Install all requirements
Subsequent runs will skip steps that have already been done, though this is determined simply by checking whether the appropriate files exist. API tokens for Coda, Pinecone and OpenAI are required, but the script will ask you for them.
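If you'd rather write the file by hand, the resulting `.env` might look roughly like this (the variable names below are guesses for illustration; check `setup.sh` for the names it actually writes):

```
# Hypothetical .env -- variable names are illustrative
CODA_TOKEN=...
PINECONE_API_KEY=...
OPENAI_API_KEY=...
```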
The Stampy Coda table is https://coda.io/d/_dfau7sl2hmG
When creating a Pinecone project, make sure that the environment is set to `us-west1-gcp`.
There is an `/api/encode-faq-titles` endpoint that will generate a duplicates file and save it to Cloud Storage. To avoid misuse, the endpoint is password protected. The password is provided via the `AUTH_PASSWORD` env variable, which is only used for that endpoint; if it is not set, the endpoint will simply return 401s.
The models used are hosted separately and are provided via the following env variables:
QA_MODEL_URL=https://qa-model-t6p37v2uia-uw.a.run.app
RETRIEVER_MODEL_URL=https://retriever-model-t6p37v2uia-uw.a.run.app
LIT_SEARCH_MODEL_URL=https://lit-search-model-t6p37v2uia-uw.a.run.app
To help with local development you can set up the above model servers via docker-compose:
docker-compose up
This should work, but slowly. If you want faster results, consider either manually running the model that you're using (check the `model_server` folder for details), or providing a cloud server with the model.
Sentence transformer models can be run locally by providing the path to them, e.g.:
RETRIEVER_MODEL_URL=multi-qa-mpnet-base-cos-v1
LIT_SEARCH_MODEL_URL=allenai-specter
For this to work, its dependencies must first be installed via `pip install -e '.[local_model]'`.
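A minimal sketch of the idea, assuming the app simply checks whether the configured value looks like a URL (the real dispatch logic and request shape may differ):

```python
import os
import requests
from sentence_transformers import SentenceTransformer

MODEL = os.environ.get("RETRIEVER_MODEL_URL", "multi-qa-mpnet-base-cos-v1")

def encode(text: str):
    if MODEL.startswith("http"):
        # Remote model server; the endpoint and payload shape are assumptions.
        return requests.post(MODEL, json={"text": text}).json()
    # A plain model name or path is loaded locally via sentence-transformers.
    return SentenceTransformer(MODEL).encode(text)
```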
Install Google Cloud SDK
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-cli
gcloud init
gcloud auth login --no-launch-browser
On macOS:

brew install --cask google-cloud-sdk
gcloud init
- Install Docker
- Authenticate Docker to Google Cloud:
gcloud auth configure-docker us-west1-docker.pkg.dev
One thing worth remembering here is that Google Cloud Run containers assume that they'll get a Linux x64 image. The deployment scripts should generate appropriate images, but this may be the culprit if your deployments refuse to work and you're not on a Linux x64 system.
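If you're building an image by hand on a non-x64 machine (e.g. Apple Silicon), you can force the platform explicitly; for example (tag path shown with placeholders):

```
docker buildx build --platform linux/amd64 -t us-west1-docker.pkg.dev/<project>/<repo>/stampy-nlp .
```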
Deploy to Google Cloud Run
./deploy.sh <service name>
If no service name is provided, the script will deploy to `stampy-nlp`. Before actually doing anything, the script will ask you to confirm that everything is correct.