GatorEducator/GatorTracer

GatorTracer

Fetch JSON files from GitHub according to rules you define, and build a dataset in the shape you want.

Setup

  • GitHub: To access a person's repository in a GitHub organization, that person must be at least a member of that organization. Outside-collaborator permission does not allow the tool to fetch the target repositories. The tool also expects a personal access token (PAT) as input. To set up a PAT, see this GitHub tutorial. Alternatively, and preferably, authenticate in the shell with the GitHub CLI and run gh auth status --show-token to display the PAT that the GitHub CLI generated automatically. For the GitHub CLI setup steps, see the GitHub CLI QuickStart.
  • GatorTracer: The tool has not yet been published on PyPI. For now, the only way to use it is to clone this repository and run poetry install in its root directory to set up a Poetry virtual environment.

Features

Before anything else, run poetry install to install all the required dependencies.

Token Manipulation

As mentioned above, this tool needs GitHub authentication, such as a PAT, to communicate with GitHub and access the target JSON files. Passing a token with every command is tedious, so the tool lets you choose between a temporary token and a saved token that you set up yourself.

To manage the saved token, use the saved-token subcommand:

  • poetry run gatortracer saved-token --verify
  • poetry run gatortracer saved-token --save CERTAIN_TOKEN
  • poetry run gatortracer saved-token --remove

These commands, respectively, check whether a token is saved, save a new token, and remove the saved token.

Here is an example:

>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer saved-token --save fake_token                                       
Token has been saved

The list of flags associated with poetry run gatortracer saved-token:

│ --verify  -v            verify if there is stored gh token                                                                                         │
│ --save    -s      TEXT  save a new gh token                                                                                                        │
│ --remove  -r            remove the currente stored gh token                                                                                        │
│ --help                  Show this message and exit.                                                                                                │

Configuration Manipulation

There are currently two configuration files, include.json and exclude.json, which hold the included and the excluded organizations and repositories, respectively. Commands that fetch JSON, such as js-fetch, consult these two files to find the wanted organizations and repositories, so both files must be set up before running any other command. Note: when both include.json and exclude.json are specified, only include.json takes effect.

To write a configuration file, create a JSON file that follows the template below and use regular expressions to specify what to include or exclude. Organization entries must match the URL name, such as my-org-1, rather than the display name, such as My Organization Name. You can find the URL name in the organization's URL: for example, GatorEducator in https://github.com/GatorEducator.

{
    "organization": [],
    "repository": []
}

Here is an example of include.json that includes every organization whose name starts with allegheny-college-sand or allegheny-haunted and, within those organizations, every repository whose name ends with Yanqiao4396 or starts with hey:

{
    "organization": [
        "^allegheny-college-sand+.",
        "^allegheny-haunted+."
    ],
    "repository": [
        ".+Yanqiao4396$",
        "hey.+"
    ]
}
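How such patterns select names can be sketched in Python. This is a minimal illustration, not GatorTracer's actual implementation: the helper select_names is hypothetical, it assumes patterns are applied with re.search, and it assumes that a non-empty include list takes precedence over an exclude list, in line with the note above. The candidate organization names are made up.

```python
import re


def select_names(names, include=None, exclude=None):
    """Filter names by regex patterns (hypothetical helper).

    Assumes re.search semantics and that include patterns, when
    present, take precedence over exclude patterns.
    """
    if include:
        return [n for n in names if any(re.search(p, n) for p in include)]
    if exclude:
        return [n for n in names if not any(re.search(p, n) for p in exclude)]
    return list(names)


# Made-up organization names for illustration.
orgs = ["allegheny-college-sandbox", "allegheny-haunted-house", "other-org"]
include_orgs = ["^allegheny-college-sand+.", "^allegheny-haunted+."]

selected = select_names(orgs, include=include_orgs)
print(selected)
```

Under these assumptions, a pattern such as ^allegheny-college-sand+. requires at least one character after the prefix, so an organization named exactly allegheny-college-sand would not be selected.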

To interact with the configuration files, use commands such as:

  • poetry run gatortracer config --display-in
  • poetry run gatortracer config --clear-in
  • poetry run gatortracer config --in-from-file

An example of using poetry run gatortracer config --in-from-file:

>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer config --in-from-file example_in.json
>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer config --display-in

include.json

{
    "organization": [
        "^yes+.",
        "^org+"
    ],
    "repository": [
        "^sample-repo+."
    ]
}

Here is a list of available flags associated with poetry run gatortracer config

│ --display-all               display both exclude and include config json file                                                                      │
│ --display-in                display the include config json file                                                                                   │
│ --display-ex                display the exclude config json file                                                                                   │
│ --clear-all                 clear both exclude and include config json file                                                                        │
│ --clear-in                  clear include config json file                                                                                         │
│ --clear-ex                  clear exclude config json file                                                                                         │
│ --in-from-file        TEXT  Write configuration include.json from another json file                                                                │
│ --ex-from-file        TEXT  Write configuration exclude.json from another json file                                                                │
│ --help                      Show this message and exit.                                                                                            

Main Feature: Fetch JSON

The poetry run gatortracer js-fetch command fetches JSON files from a given path on a given branch of the selected repositories in the selected organizations.

The output of js-fetch is a series of tables that fall into two categories: MainTable and CheckTables. MainTable holds the report-level information generated by GatorGrade's report. The individual GatorGrade checks go into sub-tables, each named after its check. An insight report in MainTable and its subordinate checks are linked by uid, a unique identifier generated from the report-level information. uid appears in every table. If two rows share the same uid, within one table or across tables, the two checks belong to the same insight report.
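The role of uid can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the field names (repo, time, status), the check names, the SHA-256 based scheme, and the in-memory layout are hypothetical and do not describe how GatorTracer actually computes uid or stores its tables.

```python
import hashlib


def make_uid(report_info):
    """Derive a stable identifier from report-level fields (hypothetical scheme)."""
    key = "|".join(f"{k}={report_info[k]}" for k in sorted(report_info))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


# Made-up report-level information.
report = {"repo": "sample-repo-1", "time": "2023-01-01T00:00:00"}
uid = make_uid(report)

# One row in the main table, and one row per check in per-check tables.
main_table = [{"uid": uid, **report}]
check_tables = {
    "check-a": [{"uid": uid, "status": True}],
    "check-b": [{"uid": uid, "status": False}],
}

# Rows that share a uid belong to the same insight report.
linked = [
    (name, row)
    for name, rows in check_tables.items()
    for row in rows
    if row["uid"] == main_table[0]["uid"]
]
print(len(linked))
```

Because the identifier depends only on the report-level fields, the same report always maps to the same uid, which is what lets rows in different tables be matched up later.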

If you store the output of js-fetch in a path where MainTable and other tables already exist, the new output is appended to the existing tables rather than overwriting them.
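Append-instead-of-overwrite behavior of this kind can be sketched as follows. This is an illustration only: it assumes the tables are stored as CSV files, and the file name MainTable.csv, the column names, and the helper append_rows are hypothetical rather than taken from GatorTracer.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# A temporary directory stands in for the store path.
store_dir = tempfile.mkdtemp()
main_path = os.path.join(store_dir, "MainTable.csv")

# Two separate runs writing to the same store path.
append_rows(main_path, ["uid", "repo"], [{"uid": "a1", "repo": "repo-1"}])
append_rows(main_path, ["uid", "repo"], [{"uid": "b2", "repo": "repo-2"}])

with open(main_path, newline="") as handle:
    stored = list(csv.DictReader(handle))
print([row["uid"] for row in stored])
```

After the second call the file holds the rows from both runs under a single header.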

Here is a list of flags associated with it.

│ *  --token          -t      TEXT  Choose the kind of token to use: either the saved token or the temporary token [default: None] [required]        │
│    --branch         -b      TEXT  The branch where json(s) reside [default: insight]                                                               │
│ *  --dir            -d      TEXT  The directory where json(s) reside [default: None] [required]                                                    │
│    --file           -f      TEXT  The file names in the regex format [default: .]                                                              │
│    --parse-insight  -p            parsing insight checks to output matrix [default: True]                                                          │
│    --store-path     -s      TEXT  The path where the output files will inhabit. [default: .]                                                       │
│    --help                         Show this message and exit. 

Here is an example command using js-fetch:

poetry run gatortracer js-fetch -t s -b insight -d insight -f "(^insight+.)|(^hello-world+.)" -s tables

Using the saved token, this command fetches all the JSON files in the path insight on the branch insight whose names start with insight or hello-world, then saves all the output tables in a directory called tables.

Check Selection

poetry run gatortracer select-check selects all the checks whose given attribute has the given value and can save the resulting dataset as a CSV file.

Here is a list of flags associated with it:

│ *  --main-path    -p      TEXT  The directory where main table inhabit [default: None] [required]                                                  │
│ *  --attribute    -a      TEXT  the attribute check selection is subject to [default: None] [required]                                             │
│ *  --value        -v      TEXT  the value associate with the attribute [default: None] [required]                                                  │
│    --save-file    -s      TEXT  if specified, then save output as csv in the path you choose                                                       │
│    --table        -t      TEXT  the table where you want to select checks from, all the available tables will be selected by default. [default: .] │
│    --with-report  -r            combine checks with report file information [default: True]                                                    │
│    --help                       Show this message and exit.                                                        

Here is an example: poetry run gatortracer select-checks --main-path examples/tables --attribute status --value False --save-file examples/status_false.csv

Using the tables in examples/tables, this command selects all the checks whose status is False and saves them to a CSV file named examples/status_false.csv.
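The kind of selection described above can be sketched in Python. This is a hedged illustration, not the tool's implementation: the table contents, the column names, and the helper select_rows are made up, and the sketch assumes the value is compared as text, since the flag takes a TEXT argument.

```python
import csv
import io

# Made-up contents of a check table; the column names are assumptions.
check_csv = io.StringIO(
    "uid,check,status\n"
    "a1,check-a,True\n"
    "a1,check-b,False\n"
    "b2,check-a,False\n"
)


def select_rows(rows, attribute, value):
    """Keep rows whose attribute, compared as text, equals value."""
    return [row for row in rows if str(row.get(attribute)) == value]


rows = list(csv.DictReader(check_csv))
failed = select_rows(rows, "status", "False")
print([(row["uid"], row["check"]) for row in failed])
```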

Using BranchWrite

Use BranchWrite to automatically generate JSON files on a given branch within a workflow. BranchWrite is the recommended way to write JSON files dynamically. For GatorGrade, BranchWrite is especially helpful for storing students' GatorGrade reports for future data analysis.

License

This project is licensed under the terms of the MIT license.
