From 0c36b2d63689adc473233be960932f7861d480c7 Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Mon, 9 Oct 2023 12:17:12 +0200
Subject: [PATCH 1/4] Add examples for binary classification.

---
 website/docs/api/large-language-models.mdx | 48 ++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index f8404cb2e48..993b77839b2 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -673,6 +673,22 @@ prompt.
 The formatting of few-shot examples is the same as those for the
 [v1](#textcat-v1) implementation.
 
+If you want to perform few-shot learning with a binary classifier, you can
+provide positive and negative examples - e. g.:
+
+```json
+[
+  {
+    "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
+    "answer": "Spam"
+  },
+  {
+    "text": "Your order #123456789 has arrived",
+    "answer": "NotSpam"
+  }
+]
+```
+
 #### spacy.TextCat.v2 {id="textcat-v2"}
 
 V2 includes all v1 functionality, with an improved prompt template.
@@ -702,6 +718,22 @@ V2 includes all v1 functionality, with an improved prompt template.
 The formatting of few-shot examples is the same as those for the
 [v1](#textcat-v1) implementation.
 
+If you want to perform few-shot learning with a binary classifier, you can
+provide positive and negative examples - e. g.:
+
+```json
+[
+  {
+    "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
+    "answer": "Spam"
+  },
+  {
+    "text": "Your order #123456789 has arrived",
+    "answer": "NotSpam"
+  }
+]
+```
+
 #### spacy.TextCat.v1 {id="textcat-v1"}
 
 Version 1 of the built-in TextCat task supports both zero-shot and few-shot
@@ -752,6 +784,22 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.
 path = "textcat_examples.json"
 ```
 
+If you want to perform few-shot learning with a binary classifier, you can
+provide positive and negative examples - e. g.:
+
+```json
+[
+  {
+    "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
+    "answer": "Spam"
+  },
+  {
+    "text": "Your order #123456789 has arrived",
+    "answer": "NotSpam"
+  }
+]
+```
+
 ### REL {id="rel"}
 
 The REL task extracts relations between named entities.

From 3cb57e18e107d137c6e8ded81dd587178308293e Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Wed, 11 Oct 2023 12:04:30 +0200
Subject: [PATCH 2/4] Fix example.

---
 website/docs/api/large-language-models.mdx | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index 993b77839b2..6f4ba344e6c 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -784,18 +784,19 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.
 path = "textcat_examples.json"
 ```
 
-If you want to perform few-shot learning with a binary classifier, you can
-provide positive and negative examples - e. g.:
+If you want to perform few-shot learning with a binary classifier (i. e. a text either should or should not be assigned
+to a given class), you can provide positive and negative examples with the POS/NEG label. An example for spam
+classification:
 
 ```json
 [
   {
     "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
-    "answer": "Spam"
+    "answer": "POS"
   },
   {
     "text": "Your order #123456789 has arrived",
-    "answer": "NotSpam"
+    "answer": "NEG"
   }
 ]
 ```

From 64c40d16f06534d36ca84806441879195c9bc697 Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Wed, 11 Oct 2023 12:06:16 +0200
Subject: [PATCH 3/4] Remove binary textcat example. Format.

---
 website/docs/api/large-language-models.mdx | 37 ++--------------------
 1 file changed, 3 insertions(+), 34 deletions(-)

diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index 6f4ba344e6c..eaa896d74e9 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -673,22 +673,6 @@ prompt.
 The formatting of few-shot examples is the same as those for the
 [v1](#textcat-v1) implementation.
 
-If you want to perform few-shot learning with a binary classifier, you can
-provide positive and negative examples - e. g.:
-
-```json
-[
-  {
-    "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
-    "answer": "Spam"
-  },
-  {
-    "text": "Your order #123456789 has arrived",
-    "answer": "NotSpam"
-  }
-]
-```
-
 #### spacy.TextCat.v2 {id="textcat-v2"}
 
 V2 includes all v1 functionality, with an improved prompt template.
@@ -718,22 +702,6 @@ V2 includes all v1 functionality, with an improved prompt template.
 The formatting of few-shot examples is the same as those for the
 [v1](#textcat-v1) implementation.
 
-If you want to perform few-shot learning with a binary classifier, you can
-provide positive and negative examples - e. g.:
-
-```json
-[
-  {
-    "text": "You won the lottery! Wire a fee of 200$ to be able to withdraw your winnings.",
-    "answer": "Spam"
-  },
-  {
-    "text": "Your order #123456789 has arrived",
-    "answer": "NotSpam"
-  }
-]
-```
-
 #### spacy.TextCat.v1 {id="textcat-v1"}
 
 Version 1 of the built-in TextCat task supports both zero-shot and few-shot
@@ -784,8 +752,9 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.
 path = "textcat_examples.json"
 ```
 
-If you want to perform few-shot learning with a binary classifier (i. e. a text either should or should not be assigned
-to a given class), you can provide positive and negative examples with the POS/NEG label. An example for spam
+If you want to perform few-shot learning with a binary classifier (i. e. a text
+either should or should not be assigned to a given class), you can provide
+positive and negative examples with the POS/NEG label. An example for spam
 classification:
 
 ```json
 [

From f6d9e5c4dfe6c70ecfc868906949e1a0d85d15b3 Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Wed, 11 Oct 2023 12:16:40 +0200
Subject: [PATCH 4/4] Rephrase.

---
 website/docs/api/large-language-models.mdx | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index eaa896d74e9..55d137e216d 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -754,8 +754,9 @@ path = "textcat_examples.json"
 
 If you want to perform few-shot learning with a binary classifier (i. e. a text
 either should or should not be assigned to a given class), you can provide
-positive and negative examples with the POS/NEG label. An example for spam
-classification:
+positive and negative examples with answers of "POS" or "NEG". "POS" means that
+this example should be assigned the class label defined in the configuration,
+"NEG" means it shouldn't. E. g. for spam classification:
 
 ```json
 [