Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add judge judy #6

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
160 changes: 160 additions & 0 deletions jupyterlite/files/examples/QJudge Judy.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "python",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8"
},
"kernelspec": {
"name": "python",
"display_name": "Python (Pyodide)",
"language": "python"
}
},
"nbformat_minor": 4,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"source": "# Judge Judy \nJudge Judy relies on OpenAI to provide evaluations. This notebook only works under **https** mode as it requires you to call out to the OpenAI servers.\n\nCAUTION: Your **OPENAI_API_KEY** will be displayed in this notebook\n\nPlease copy this example and customize it for your own purposes!",
"metadata": {}
},
{
"cell_type": "code",
"source": "OPENAI_API_KEY='your-key-here'\nQUEPID_API_QEY='your-api-here'\nTEAM_ID=1\nBOOK_ID=25\nJUDGE_EMAIL='[email protected]'\nLIMIT_TO_BE_JUDGED=10\njudgement_counter = 0",
"metadata": {
"trusted": true
},
"execution_count": 9,
"outputs": []
},
{
"cell_type": "code",
"source": "from IPython.display import HTML, Markdown\nimport json\nfrom pyodide.ffi import to_js\nfrom IPython.display import JSON\nfrom js import Object\nfrom js import fetch",
"metadata": {
"trusted": true
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": "# Generic GET call to a JSON endpoint \nasync def get_json(url):\n resp = await fetch(url)\n resp_text = await resp.text()\n return json.loads(resp_text)\n\nasync def post_json(url, payload):\n resp = await fetch(url,\n method= \"POST\",\n body= json.dumps(payload),\n credentials= \"same-origin\",\n headers= Object.fromEntries(to_js({ \"Content-Type\":\"application/json\",\"Authorization\": \"Bearer \" + QUEPID_API_QEY })),\n )\n resp_text = await resp.text()\n return json.loads(resp_text)\n",
"metadata": {
"trusted": true
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": "## Step 2: Extract and Prepare Data for Judging",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Make sure our user defined by the `JUDGE_EMAIL` exists.",
"metadata": {}
},
{
"cell_type": "code",
"source": "users = await get_json(f'/api/users?prefix={JUDGE_EMAIL}')\nif (len(users['users']) == 0):\n print(f'CREATING NEW JUDGE {JUDGE_EMAIL}')\n user = await post_json(f'/api/teams/{TEAM_ID}/members/invite', {'id': JUDGE_EMAIL})\nelse:\n user = users['users'][0]\n\njudge_id = user['id']\n \nMarkdown(f\"We will be generating judgements for {JUDGE_EMAIL}, judge_id: {judge_id}\")\n ",
"metadata": {
"trusted": true
},
"execution_count": 4,
"outputs": [
{
"execution_count": 4,
"output_type": "execute_result",
"data": {
"text/plain": "<IPython.core.display.Markdown object>",
"text/markdown": "We will be generating judgements for [email protected], judge_id: 23"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Find out how many query/doc pairs exists for the book, and how many have already been judged by Judy",
"metadata": {}
},
{
"cell_type": "code",
"source": "query_doc_pairs = await get_json(f'/api/books/{BOOK_ID}/query_doc_pairs')\nMarkdown(f\"There are {len(query_doc_pairs['query_doc_pairs'])} query/doc pairs, I wish I could tell you how many have already been judged by {JUDGE_EMAIL}\")\n",
"metadata": {
"trusted": true
},
"execution_count": 5,
"outputs": [
{
"execution_count": 5,
"output_type": "execute_result",
"data": {
"text/plain": "<IPython.core.display.Markdown object>",
"text/markdown": "There are 2420 query/doc pairs, I wish I could tell you how many have already been judged by [email protected]"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": "model = {\n 'model': 'gpt-4-turbo-preview',\n 'max_tokens': 2048,\n 'top_p': 0.1,\n 'seed': 1,\n 'frequency_penalty': 0,\n 'presence_penalty': 0,\n 'response_format': {\n 'type': 'json_object'\n }\n}\nsystem_message = {\n 'role': 'system',\n 'content': 'You are a helpful AI assistant.'\n}\n ",
"metadata": {
"trusted": true
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "markdown",
"source": "## Step 3: Judge just like a Human, using the sort of Randomized selection process",
"metadata": {}
},
{
"cell_type": "code",
"source": "async def run_judgement(judge_id): \n query_doc_pair = await get_json(f'/api/books/{BOOK_ID}/query_doc_pairs/to_be_judged/{judge_id}')\n query_text = query_doc_pair['query_text']\n document_fields = json.loads(query_doc_pair['document_fields'])\n document = f\"{document_fields['name']} {document_fields['title']}\"\n\n judge_prompt = f\"\"\"\n Act as a judge determining to what extent a document matches the search query that it is paired with. All of the documents are related to business and finance. Your job is to understand the intent of the search query and the relevance of the document.\n The user provides:\n - Query: This is the actual search that was sent to the search engine\n - Document: Fields from the retrieved document\n Consider each attribute and how it does or does not pertain to the question. If you do not understand a term or how it is used do not try to guess. You will judge the relevance according to the following rules:\n 0: The document is irrelevant or relevance cannot be determined\n 1: The document is somewhat relevant and may contribute to answering the query\n 2: The document is relevant the query\n The date is 11-JAN-2024. This may affect how relevant documents are to time-based queries.\n Please reply in JSON with the following structure:\n - explanation: Why the document is relevant to the query\n - judgement: The judgement you would apply to the text from 0 to 2\n When explaining the judgement, only discuss why the document is relevant and not extraneous features of the document. Consider your answer carefully and explain your reasoning. Be strict in your assessment.\n Query: {query_text}\n Document: {document}\n \"\"\"\n\n resp = await fetch('https://api.openai.com/v1/chat/completions',\n method= \"POST\",\n body= '{' + json.dumps(model)[1:-1] + ', \"messages\": [' + json.dumps(system_message) + ', {\"role\": \"user\", \"content\": ' + json.dumps(judge_prompt) + '}]}',\n credentials= \"same-origin\",\n headers= Object.fromEntries(to_js({ \"Content-Type\":\"application/json\",\"Authorization\": \"Bearer \" + OPENAI_API_KEY })),\n )\n res = await resp.text()\n response = json.loads(res)\n #JSON(response)\n content_json = response['choices'][0]['message']['content']\n judgement = json.loads(content_json)\n print(f\"Judged a {str(judgement['judgement'])} because {judgement['explanation']}\")\n #s = \"Judged a <b>\" + str(judgement['judgement']) + \"</b> because <i>\" + judgement['explanation'] + '</i>'\n\n #display(HTML(s))\n\n\n judgement = await post_json(f\"/api/books/{BOOK_ID}/judgements/\", {'judgement': {'query_doc_pair_id':query_doc_pair['id'],'rating':judgement['judgement'], 'user_id': judge_id, 'explanation':judgement['explanation']}})\n #print(judgement) \n",
"metadata": {
"trusted": true
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": "while (judgement_counter < LIMIT_TO_BE_JUDGED):\n judgement_counter = judgement_counter + 1\n print(f'Making Judgment {judgement_counter}')\n await run_judgement(judge_id)",
"metadata": {
"trusted": true
},
"execution_count": 8,
"outputs": [
{
"name": "stdout",
"text": "Making Judgment 1\nJudged a 1 because The document describes a product (a mobile phone case) specifically designed for the iPhone XR, not the iPhone X. While both are models of iPhones, the query specifically searches for the iPhone X, making the document only somewhat relevant due to the focus on a different, though related, iPhone model.\nMaking Judgment 2\nJudged a 1 because The document describes a specific smartphone model, the Motorola Moto G 8, providing details about its screen size, memory capacity, SIM capability, connectivity, color, operating system, and battery life. These details directly address the search query for 'smartphone' by presenting a relevant product in the smartphone category. However, the document does not cover a range of smartphones or provide comparative information, which might be expected if the user's intent was to explore various options or learn about smartphones in general. The document's focus on a single product makes it highly relevant to someone specifically interested in the Motorola Moto G 8 but only somewhat relevant to a broader query about smartphones without specifying a brand or model.\nMaking Judgment 3\nJudged a 0 because The document mentions a specific product, the Philips Lithium Ultra Battery FR6LB4A/10, which is a type of battery. However, the query 'usb battery' suggests the user is looking for batteries that can be charged via USB or are related to USB in some way. The document does not provide information on whether the Philips Lithium Ultra Battery FR6LB4A/10 is a USB battery or has USB charging capabilities. Therefore, the relevance of this document to the query cannot be determined based on the provided information.\nMaking Judgment 4\nJudged a 1 because The document mentions a specific product, the 'Speaker Dock SBD8100/10' by Philips, which is a type of speaker. However, the document does not specify whether this speaker uses Bluetooth technology. The query is specifically looking for 'bluetooth speaker', and without confirmation that the Philips Speaker Dock supports Bluetooth, it cannot be determined if this document fully matches the search query. Therefore, the document may be somewhat relevant because it pertains to speakers, but it does not directly answer the query regarding Bluetooth capability.\nMaking Judgment 5\nJudged a 0 because The document mentions 'Jan van Haasteren', which might be mistaken for relevance to 'vans' in a broad search. However, the query 'vans' is likely referring to the brand of shoes or vehicles, not a person's name. The document is about a puzzle celebrating 'miffy 65 years' and does not pertain to the brand or concept of vans in the context most users would expect (either vehicles or footwear). Therefore, it does not match the likely intent behind the search query.\nMaking Judgment 6\nJudged a 1 because The document directly matches the search query 'dell' by mentioning a specific product related to Dell, which is the 'Origin Storage HDD Caddy for Dell Latitude E6500'. This indicates that the document is about a Dell product accessory, making it relevant to someone searching for Dell-related items, specifically for the Latitude E6500 model. However, the document does not provide broader information about Dell as a company or its range of products and services, which means it may only be partially relevant depending on the user's intent behind the search query 'dell'. If the user's intent was to find accessories or parts for a Dell Latitude E6500, this document is highly relevant. If the search intent was broader or unrelated to this specific product, the document's relevance would be limited.\nMaking Judgment 7\nJudged a 0 because The document mentions 'RBC2A Tripp Lite RBC2A UPS battery', which is a specific type of battery used for Uninterruptible Power Supplies (UPS). The query 'usb battery' is likely looking for portable USB batteries used to charge devices like smartphones and tablets. Although both are types of batteries, the document does not match the probable intent of the search for a portable USB battery. Therefore, the document is not directly relevant to the query.\nMaking Judgment 8\nJudged a 1 because The document mentions 'My Sims Case EAMS801 Bigben Interactive My Sims Case EAMS801', which is a product related to the Nintendo ecosystem, as 'My Sims' is a video game that could be played on Nintendo consoles. However, the document does not provide information about Nintendo directly, such as company news, products, or financial information. It only mentions a product that is associated with Nintendo's platform. Therefore, while there is a connection to Nintendo through the 'My Sims' game, the document's relevance is limited to showing a product that is part of Nintendo's gaming ecosystem without providing broader or direct information about Nintendo itself.\nMaking Judgment 9\nJudged a 1 because The document mentions a specific model of a PC camera by Philips, which is a type of video camera used primarily for capturing video on a computer. The query is looking for a 'video camera,' and while the document specifies a 'PC Camera,' it falls under the broader category of video cameras, especially those used for digital video capture on computers. However, the document does not provide information on other types of video cameras, such as handheld camcorders, action cameras, or professional video cameras, which limits its relevance to the query to only a subset of video cameras.\nMaking Judgment 10\nJudged a 1 because The document directly mentions 'Lenovo ThinkPad X60 Tablet 8 cell Li-Ion battery', which is a product made by Lenovo. Since the query is simply 'lenovo', the document is relevant because it provides information about a specific Lenovo product. However, the query is very broad and could pertain to a wide range of topics related to Lenovo, such as company news, product releases, reviews, or financial information. The document only provides information about a specific product and does not cover other potential areas of interest related to the query 'lenovo'. Therefore, while the document is relevant, it may not fully satisfy the user's potential need for information about Lenovo in a broader sense.\n",
"output_type": "stream"
}
]
},
{
"cell_type": "markdown",
"source": " _This notebook was last updated 11-MAR-2024_",
"metadata": {}
},
{
"cell_type": "code",
"source": "",
"metadata": {},
"execution_count": null,
"outputs": []
}
]
}
Loading