Github PR prediction (pullreq-ml)

ETL process

This Node/Python library builds a model that predicts whether a Pull Request (PR) will be accepted at the moment it is created, by learning from the history of a GitHub project. Its aim is to help project integrators manage the PRs of a particular project. You can find more information about the model and how it works in this article.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need the following, all of which are used in the steps below:

  • Node.js with npm
  • Python 3 with pip
  • A running MongoDB instance
  • Git
  • A GitHub access token

Installing & Running

  1. Choose a project to predict. This document uses https://github.com/Netflix/pygenie because it is small, but any project works, such as the Node project.

  2. Clone this repository onto your machine:

    git clone https://github.com/sophilabs/pullreq-ml.git
  3. (Optional) Install your local copy into a virtual environment. For example, using the venv module:

    python -m venv venv
    source venv/bin/activate
  4. Install dependencies

    cd pullreq-ml # or pullreq-ml-master
    npm install
    pip install -r requirements.txt
  5. (Optional) Create a user for your MongoDB instance

    echo "db.createUser({ user: 'github', 'pwd': 'github', roles: ['readWrite'] })" | mongo github
  6. Replace the contents of config.js with your GitHub access token, the target repo, and the database credentials. For example:

     module.exports = {
         // Local Mongo DB
         MONGO_DB_URL: 'mongodb://github:github@localhost:27017/github',
         // Token
         GITHUB_ACCESS_TOKEN: '<your token here>',
         // Repo information. For https://github.com/Netflix/pygenie, use:
         REPO_OWNER: 'Netflix',
         REPO_NAME: 'pygenie'
     }
  7. Clone the target repo into the targetrepo folder

    git clone https://github.com/Netflix/pygenie.git targetrepo
  8. Fetch the repo information

    node fetch.js
  9. Train and evaluate the PR acceptance model for your repository

    python evaluate.py

    You should see output like the following:

    Report on Test data
              precision    recall  f1-score   support
    
     not merged       0.76      0.22      0.34       264
         merged       0.78      0.98      0.87       753
    
     avg / total       0.78      0.78      0.73      1017
    
    Dumped classifier data to classifier.pkl
    

    This command generates a binary file, classifier.pkl, which can be used to predict the outcome of any PR on the target project.
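To help read the sample report above: assuming the avg / total row follows scikit-learn's convention of a support-weighted mean of the per-class rows, the weighted recall equals overall accuracy. The short Python sketch below redoes that arithmetic from the printed numbers and compares it with the accuracy of always answering "merged". It is an illustration only, not part of pullreq-ml, and because the per-class values are already rounded, the recomputed precision lands slightly below the printed 0.78.

```python
# Per-class rows copied from the sample report: (precision, recall, f1, support).
rows = {
    "not merged": (0.76, 0.22, 0.34, 264),
    "merged": (0.78, 0.98, 0.87, 753),
}

# Total number of test PRs (the "support" of the avg / total row).
total = sum(r[3] for r in rows.values())


def weighted(col):
    """Support-weighted mean of one metric column across the classes."""
    return sum(r[col] * r[3] for r in rows.values()) / total


avg_precision = weighted(0)
avg_recall = weighted(1)  # equals overall accuracy on the test set
avg_f1 = weighted(2)

# Accuracy of a trivial model that always predicts the majority class.
baseline = max(r[3] for r in rows.values()) / total

print(f"support:            {total}")
print(f"weighted precision: {avg_precision:.3f}")
print(f"weighted recall:    {avg_recall:.3f}")
print(f"weighted f1:        {avg_f1:.3f}")
print(f"majority baseline:  {baseline:.3f}")
```

In this sample the classifier's accuracy (about 0.78) is only a few points above the majority baseline (about 0.74), and the low recall for "not merged" (0.22) shows that most rejected PRs are still predicted as merged.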

TODO

  • Build a script that classifies a particular PR using the trained model. For example:
    > python classify.py https://github.com/nodejs/node/pull/11107
    Will not be merged!
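A first step for such a script would be turning the PR URL from the command line into an owner, repo, and PR number. The sketch below shows one way to do that with the standard library. The names `PullRequestRef` and `parse_pr_url` are hypothetical and do not exist in this repository, and the later steps (extracting features and loading classifier.pkl) are left out because they depend on how evaluate.py builds and saves the model.

```python
import re
from typing import NamedTuple


class PullRequestRef(NamedTuple):
    """Hypothetical container for the parts of a GitHub PR URL."""
    owner: str
    repo: str
    number: int


# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
_PR_URL = re.compile(r"^https?://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


def parse_pr_url(url: str) -> PullRequestRef:
    """Split a GitHub pull request URL into owner, repo and PR number."""
    match = _PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return PullRequestRef(owner, repo, int(number))


ref = parse_pr_url("https://github.com/nodejs/node/pull/11107")
print(ref.owner, ref.repo, ref.number)  # prints: nodejs node 11107
```

The owner and repo could then be checked against REPO_OWNER and REPO_NAME in config.js, since the classifier is trained for a single target project.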

Built With

  • scikit-learn - Its algorithms are used to predict whether a PR will be merged.
  • MongoDB - Used to store the project data downloaded from GitHub.
  • Git - Used to compute diffs and analyze PR commit deltas.

Contributing

Feel free to make a Pull Request if you find a bug or want to implement a feature. We welcome any help.

Authors

  • Ignacio Avas - Initial work - igui

Acknowledgments

  • Pablo Grill for his insight into and knowledge of machine learning

License

pullreq-ml is Copyright (c) 2018 sophilabs, inc. It is free software, and may be redistributed under the terms specified in the license file.

About

sophilabs

pullreq-ml is maintained and funded by sophilabs, inc. The names and logos for sophilabs are trademarks of sophilabs, inc.