Llamma-2Fine-Tune-New-Categorization

1. Generating Instruction Dataset using GPT-3.5

Overview

News articles play a crucial role in advancing machine learning research, offering a vast and diverse dataset for training and evaluating models in natural language understanding. This blog explores an innovative solution for efficiently creating a labeled dataset for news classification. The goal is to organize this wealth of information into distinct categories, facilitating research and industry applications such as sentiment analysis and text summarization.

Dataset Creation

Creating a well-categorized dataset manually or through keyword searches can be challenging. In this blog, we introduce an efficient method to generate an instruction dataset for news classification. The approach involves leveraging OpenAI's GPT 3.5, a powerful Large Language Model (LLM) that powers ChatGPT.

Ways to Create an Instruction Dataset

Convert Existing Dataset : Transform an existing dataset into an instruction dataset tailored for the desired news classification task.

Use Existing LLMs: Employ existing Large Language Models to generate an instruction dataset based on the unique language constructs and domain-specific terminology found in news articles.

Manual Creation: Manually curate an instruction dataset, ensuring high quality but potentially time-consuming.

Given the need for a high-quality dataset within a limited timeframe and budget, we opt to use GPT 3.5 for dataset creation.

2. Fine-Tuning Meta’s Llama 2 7B Model for News Article Categorization

Installing and Loading Required Modules Ensure you have the necessary Python modules installed. You can use the requirements.txt file or install them manually.

pip install -r requirements.txt

Approval for Meta’s Llama 2 Models

Below are the steps to request permission for the Llama-2–7B model:

Get approval from Hugging Face (https://huggingface.co/meta-llama/Llama-2-7b-hf).
Get approval from Meta (https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
Create a WRITE access token on Hugging Face (https://huggingface.co/settings/tokens).
Execute !huggingface-cli login in Google Colab Notebook, enter the token, and enter "Y."

Setting up Hugging Face CLI and User Authentication

Create a WRITE access token on Hugging Face (https://huggingface.co/settings/tokens).
Execute !huggingface-cli login in Google Colab Notebook, enter the token, and enter "Y."

3. Deployment to AWS Sagemaker

Aws_SageMaker_Deploy.ipynb is the script to run for sagemaker deployment

Prequsites

Need an AWS account with AWSSageMakerFullAccess Role configured
Derive AWS Credentials
install boto3 and sagemaker

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
notebooks		notebooks
.DS_Store		.DS_Store
.gitattributes		.gitattributes
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Llamma-2Fine-Tune-New-Categorization

1. Generating Instruction Dataset using GPT-3.5

Overview

Dataset Creation

Ways to Create an Instruction Dataset

2. Fine-Tuning Meta’s Llama 2 7B Model for News Article Categorization

Approval for Meta’s Llama 2 Models

Setting up Hugging Face CLI and User Authentication

3. Deployment to AWS Sagemaker

About

Releases

Packages

Languages

DHRUV6029/Llamma-2Fine-Tune-New-Categorization

Folders and files

Latest commit

History

Repository files navigation

Llamma-2Fine-Tune-New-Categorization

1. Generating Instruction Dataset using GPT-3.5

Overview

Dataset Creation

Ways to Create an Instruction Dataset

2. Fine-Tuning Meta’s Llama 2 7B Model for News Article Categorization

Approval for Meta’s Llama 2 Models

Setting up Hugging Face CLI and User Authentication

3. Deployment to AWS Sagemaker

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages