A script that filters the comments found in the source code of a git repository according to a predefined set of patterns.
This script relies on the following packages:
- GitPython==2.1.5
- comment-parser==1.0.3
To check and install the dependencies, run:
pip install -r requirements.txt
From the root directory execute:
python parse.py
The script takes as input the file patterns.txt, in which the patterns to be matched are specified.
The output of the script is stored in the file output_parsing.tsv, which contains the source code comments matching the predefined patterns.
The three columns of the output file are:
- File name: Location of the source code file in which the matched comment appears
- Keyword: Pattern keyword(s) contained in the matched comment
- Comment: Content of the matched source code comment
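The flow from patterns to output rows can be illustrated with a minimal sketch. It is not the code of parse.py: the helper names (load_patterns, match_keywords, write_rows), the sample patterns, the one-pattern-per-line layout, the case-insensitive substring matching, and the comma-joined Keyword column are all assumptions made for illustration.

```python
import csv
import io


def load_patterns(lines):
    """Return the non-empty, stripped patterns (assumes one pattern per line)."""
    return [line.strip() for line in lines if line.strip()]


def match_keywords(comment, patterns):
    """Return the patterns found in the comment.

    Assumption: case-insensitive substring matching.
    """
    lowered = comment.lower()
    return [p for p in patterns if p.lower() in lowered]


def write_rows(rows, handle):
    """Write (file name, keywords, comment) rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for file_name, keywords, comment in rows:
        writer.writerow([file_name, ", ".join(keywords), comment])


# Hypothetical sample data standing in for patterns.txt and extracted comments.
patterns = load_patterns(["hack\n", "fixme\n", "\n"])
comments = [
    ("src/main.c", "FIXME: this is a hack"),
    ("src/util.c", "Returns the buffer length"),
]

# Keep only the comments that contain at least one pattern.
rows = []
for file_name, text in comments:
    found = match_keywords(text, patterns)
    if found:
        rows.append((file_name, found, text))

# An in-memory buffer stands in for output_parsing.tsv.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue(), end="")
```

Only the first sample comment contains a pattern, so the sketch emits a single row with the three columns described above.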
- The git repository to be analyzed is currently hardcoded in the script. Change the variable git_repository_url to utilize a different repository.
- The language of the repository has to be specified in the MIME type variable MIME. For the mapping of languages to MIME types refer to the documentation of the comment_parser package.
- Extension type(s) of the files to be considered during the parsing have to be specified in the extension variable extensions
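As a rough sketch of how these three settings might look for a C repository: the variable names come from the description above, while the URL is a placeholder, the MIME value should be checked against the comment_parser documentation, and the leading-dot extension format and the has_selected_extension helper are assumptions rather than the script's actual code.

```python
import os

# Placeholder URL, not a real repository.
git_repository_url = "https://example.com/some/repository.git"

# MIME type assumed for C sources; verify it in the comment_parser documentation.
MIME = "text/x-c"

# Assumption: extensions are written with a leading dot.
extensions = [".c", ".h"]


def has_selected_extension(path, selected):
    """Return True if the file extension of path is one of the selected ones."""
    return os.path.splitext(path)[1].lower() in selected


# Hypothetical file list standing in for the files of the cloned repository.
files = ["src/main.c", "include/util.h", "README.md"]
selected_files = [f for f in files if has_selected_extension(f, extensions)]
print(selected_files)
```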
Currently supported languages:
- C
- C++
- Go
- Java
- JavaScript
- Bash/Sh
Author:
- Roberto Verdecchia ([email protected])
Sample patterns were taken from the dataset of the research "An Exploratory Study on Self-Admitted Technical Debt" by Potdar et al., available here.
This project is licensed under the MIT License - see the file license.txt