GitHub - RiteshKH/Legal_Citation_recognition: Application to read a legal text document and highlight the citations among the text for easier navigation and impact analysis

Legal_Citation_recognition

Python application to read a legal case document in text format and highlight the citations in it for easier navigation and impact analysis.

Legal citation is the practice of crediting and referring to authoritative documents and sources. The most common sources of authority cited are court decisions (cases), statutes, regulations, treaties, and scholarly writing. Typically, a proper legal citation informs the reader about a source's authority, how strongly the source supports the writer's proposition, its age, and other relevant information. This is an example citation to a United States Supreme Court case:
Griswold v. Connecticut, 381 U.S. 479, 480 (1965).
However, in very long documents, searching for citations by hand is time-consuming. Recognizing the patterns of citation text and predicting them is also a difficult NLP task. We have used a spaCy deep learning model for this purpose.
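
To make the parts of a citation concrete, the short Python sketch below splits the Griswold example into case name, volume, reporter, first page, pinpoint page and year. It only illustrates citation structure: the `CASE_CITATION` pattern and the `parse_case_citation` helper are hypothetical, cover just this one common shape, and are not how the application detects citations (it relies on the trained spaCy model instead).

```python
import re

# Hypothetical pattern for one common shape of U.S. case citation:
#   "<Party> v. <Party>, <volume> <reporter> <first page>[, <pinpoint>] (<year>)"
# Real citations vary far more (signals, parallel cites, court/year parentheticals).
CASE_CITATION = re.compile(
    r"(?P<case_name>[A-Z][\w.'&\- ]*? v\. [A-Z][\w.'&\- ]*?),\s+"
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][\w.]*(?:\s[\w.]+)*?)\s+"
    r"(?P<page>\d+)"
    r"(?:,\s+(?P<pinpoint>\d+))?"
    r"\s+\((?P<year>\d{4})\)"
)


def parse_case_citation(text):
    """Return the named components of the first matching case citation, or None."""
    match = CASE_CITATION.search(text)
    return match.groupdict() if match else None


print(parse_case_citation("Griswold v. Connecticut, 381 U.S. 479, 480 (1965)."))
# {'case_name': 'Griswold v. Connecticut', 'volume': '381', 'reporter': 'U.S.',
#  'page': '479', 'pinpoint': '480', 'year': '1965'}
```

A fixed pattern like this breaks as soon as a document uses a different reporter format, a short form, or an unusual party name, which is part of why a learned model is used for detection.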

Here is a quick guide to the folders and how to run the application:

input folder: Put the document to be processed, in .txt format, in this folder.
outputs folder: Contains the generated output files. The highlighted text is available as an HTML file and as a Word (.docx) document, and the .txt file lists the predicted citations.

Final command: python main.py input/<filename>.txt
Example: python main.py input/sample_input.txt

Use the following command to install all requirements: pip install --user --requirement requirements.txt

How the files work:

spacyNL24Sep.py : Code for building the spaCy model, kept separately in the model_building_and_training folder. To build the model yourself, put train_data_and_labels.csv and test_data.csv in the same folder and adapt the Python script to the columns of your dataframe.
predict_citation.py : Takes a single raw text file as input and passes it through the model for prediction. The output is written in CSV format to result_citation.csv, which contains the filenames and the citation text.
json_making.py : Builds the JSON from the predictions CSV. A predicted citation is considered valid if startid != -1 (i.e. the citation text could be located in the document) and its length is below 150. If duplicate citations are present, the positional indexes of all their occurrences are checked and the records are kept accordingly.
coref_jsoncreation.py : Takes the initial JSON data and adds anaphoric information (whether a short citation refers to another citation, and the details of that citation).
json_2_text_doc.py : Takes the final JSON data and the raw text file as input and generates the required text output. It also generates a .docx file with all the citations highlighted.
citationhtmlutil.py : Takes the raw text file and the text generated by json_2_text_doc.py as input and generates HTML with the citations highlighted.
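
The validity and duplicate rules described for json_making.py can be sketched as follows. This is a minimal illustration, not the module's actual code: the `locate_citations` function, the record keys (`citation`, `start`, `end`) and the choice to match repeated predictions to successive occurrences in the text are assumptions about how checking "all the citation's positional indexes" might work. Only the `start != -1` and `length < 150` conditions come from the description above.

```python
import json
from collections import defaultdict

MAX_CITATION_LENGTH = 150  # threshold from the README's "length < 150" rule


def locate_citations(text, predicted):
    """Turn predicted citation strings into span records (hypothetical helper).

    A prediction is kept only if it is non-empty, shorter than
    MAX_CITATION_LENGTH characters, and found in `text` (start != -1).
    Repeated predictions of the same string are matched to successive
    occurrences, so duplicates map to distinct positions instead of all
    pointing at the first one.
    """
    resume_from = defaultdict(int)  # citation string -> offset just after its last match
    records = []
    for citation in predicted:
        if not citation or len(citation) >= MAX_CITATION_LENGTH:
            continue
        start = text.find(citation, resume_from[citation])
        if start == -1:  # not found (or no further occurrence): discard
            continue
        end = start + len(citation)
        resume_from[citation] = end
        records.append({"citation": citation, "start": start, "end": end})
    return records


if __name__ == "__main__":
    doc = ("See Griswold v. Connecticut, 381 U.S. 479 (1965). "
           "Griswold v. Connecticut, 381 U.S. 479 (1965) is cited again here.")
    cite = "Griswold v. Connecticut, 381 U.S. 479 (1965)"
    print(json.dumps(locate_citations(doc, [cite, cite, "Missing v. Case"]), indent=2))
```

Keeping start and end offsets in each record is what later lets a highlighter mark the exact occurrence rather than every copy of the same string.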

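Highlighting in HTML, as done by citationhtmlutil.py, comes down to escaping the document text and wrapping each citation span in a tag. The sketch below assumes the citations are already available as character offsets (the json_making.py step tracks positional indexes); the `highlight_html` name and the `<mark class="citation">` markup are hypothetical choices, not the module's actual output format.

```python
import html


def highlight_html(text, spans):
    """Return an HTML fragment with each (start, end) span wrapped in <mark>.

    Spans are half-open character offsets into `text`. Spans that overlap one
    already emitted are skipped so the markup stays well-formed. All text,
    inside and outside the marks, is HTML-escaped.
    """
    parts = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:  # overlaps the previous highlighted span
            continue
        parts.append(html.escape(text[cursor:start]))
        parts.append('<mark class="citation">' + html.escape(text[start:end]) + "</mark>")
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


doc = "See Griswold v. Connecticut, 381 U.S. 479 (1965) & more."
cite = "Griswold v. Connecticut, 381 U.S. 479 (1965)"
s = doc.find(cite)
print(highlight_html(doc, [(s, s + len(cite))]))
# See <mark class="citation">Griswold v. Connecticut, 381 U.S. 479 (1965)</mark> &amp; more.
```

Overlapping spans are dropped rather than nested, which keeps the markup valid; a fuller version might merge them instead.
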
Sample input

Generated Outputs

Command line output:

Predicted citations in json:

Highlighted text:

Future work:

The precision and recall are currently not good enough, and many citations are still not detected. Other techniques, such as LSTMs or BERT, should be tried to improve the results.

Folders and files:

input
model_building_and_training
model_folder
outputs
.gitattributes
LICENSE
README.md
__init__.py
citationhtmlutil.py
coref_jsoncreation.py
json_2_text_doc.py
json_making.py
main.py
predict_citation.py
requirements.txt
