FAKE-NEWS-DETECTION

What is fake news?

Fake news, also known as junk news, pseudo news is a form of news consisting of deliberate disinformation spread via traditional news media or online social media.

About this project:

This project aims at detecting fake news.

Using sklearn we build a TfidVectorizer on our dataset.Then we intialize a PassiveAggressiveClassifier and fit the model. At the end the accuracy score and confusion matrix tells how our model fares.

The DataSet:

The dataset that we use here is news.csv.This dataset takes upto 29.2MB of space and you can download it here.

What is PassiveAggressiveClassifier?

PassiveAggressive algorithm are a family of algorithms for large scale learning.It is used to shuffle the training data,when shuffle is set to frame.

Family of PassiveAggressive are:

What is TfidVectorizer?

The TfidVectorizer will tokenize documets ,learn the vocabulary and inverse the document frequency weighting and allow you to encode new documents.

Term frequency: This summarizes how often a given word appears within a document.
Inverse Document Frequency: This downscales words that appear a lot across documents.

What is ConfusionMatrix?

A Confusion matrix is a table that is often used to describe the performance of a classification model(or classifier) on a set of test data for which the true values are known.

Definition of these terms:

Positive(P): Observation is positive.
Negative(N): Observation is negative.
True Positive(TP): Observation is positive,and is predicted to be positive.
False Negative(FN): Observation is positive,but is predicted negative.
True Negative(TN): Observation is negative,and is predicted to be negative.
False Positive(FP): Observation is negative,but is predicted positive.

Accuracy is computed as shown below:

Now let's get into coding!!!

Install all the necessary libraries using pip.

!pip install numpy pandas sklearn

Make all the necessary imports

import numpy as np
import pandas as pd
import intertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score,confusion_matrix

Now read the data

dataframe=pd.read_csv("news.csv)
dataframe

What is .info() function?

The info function is used to print a concise summary of a dataframe.This method prints information about a dataFrame.This method prints information about a DataFrame including the index dtype and column ,non-null values and memory usage.

dataframe.info()

What is .keys() function?

The keys function returns the 'info axis' for the pandas object.If the pandas object is series then it returns index.If the pandas object is dataFrame then it retrurns columns.

dataframe.keys()

To get the shape of the object

dataframe.shape

What is .describe() function?

The describe method computes a summary of statistics pertaining to the DataFrame columns.

dataframe.describe()

What is .head() function?

This function returns the first n rows for the object based on position.It is useful for quickly testing if your object has right type of data in it.

dataframe.head()

Labelling

Data labelling is important because the machine learning algorithm has to understand the data.

labels=dataframe.label
labels.head()

Split the dataset into Training and Testing

Training set: A subset to train a model.
Test set: A subset to test the trained model.

Split the dataset

x_train,x_test,,y_train,y_test=train_test_split(dataframe['text'],labels,test_size=0.2,random_state=7)

TfidVectorizer

Initialize a TfidVectorizer.

tfidf_vectorizer=TfidVectorizer(stop_words='english',max_df=0.7)

Fit and transform train set,transform test set.

tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)

PassiveAggressiveClassifier

Initialize a PassiveAggressiveClassifier.

pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)

Predict on the test set and calculate the accuracy.

y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {rund(score*100,2)}%')

The accuracy obtained with this model is 93.05%.

Printing the confusion matrix

confusion_matrix(y_test,y_pred,labels=['FAKE','REAL'])

OUTPUT:
array([[592, 46],
       [ 42, 587]],dtype=int64)

Therefore from this model, according to confusion matrix ,we have 592 true positives,587 true negatives, 42 false positives and 46 false negatives.

CONCLUSION

Using this model, to detect fake news, we have obtained an accuracy of 93.05%.

Reference: DLITHE

Website: www.dlithe.com

Assignment during Online Internship with Dlithe www.dlithe.com

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
Detecting_Fake_News.ipynb		Detecting_Fake_News.ipynb
README.md		README.md
_config.yml		_config.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FAKE-NEWS-DETECTION

What is fake news?

About this project:

The DataSet:

What is PassiveAggressiveClassifier?

Family of PassiveAggressive are:

What is TfidVectorizer?

What is ConfusionMatrix?

Accuracy is computed as shown below:

CONCLUSION

About

Releases

Packages

Languages

nishmitha03/Fake-News-Detection

Folders and files

Latest commit

History

Repository files navigation

FAKE-NEWS-DETECTION

What is fake news?

About this project:

The DataSet:

What is PassiveAggressiveClassifier?

Family of PassiveAggressive are:

What is TfidVectorizer?

What is ConfusionMatrix?

Accuracy is computed as shown below:

CONCLUSION

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages