Fetch JSON files from GitHub according to rules you define, and build the dataset you want
- GitHub: To access a person's repository in a GitHub organization, that person must be at least a member of that organization. Outside collaborator permission does not allow this tool to fetch the target repositories. The tool also expects a personal access token (PAT) as input. To set up a PAT, please check this GitHub tutorial. Alternatively, and preferably, authenticate in the shell with GitHub CLI and run
gh auth status --show-token
to display the PAT that GitHub CLI generated automatically. For the steps to set up GitHub CLI, please check the GitHub CLI QuickStart.
- GatorTracer: This tool has not yet been published on PyPI. For now, the only way to use it is to clone this repository and run
poetry install
in the root directory of the tool to set up a Poetry virtual environment.
Before doing anything else, run
poetry install
to install all the required dependencies.
As mentioned above, this tool needs GitHub authentication, such as a PAT, to communicate with GitHub and access the target JSON files. Passing a token every time you run a command is tedious, so the tool lets you choose between a temporary token and a saved token that you set up yourself.
To manage the saved token, use the saved-token command:
poetry run gatortracer saved-token --verify
poetry run gatortracer saved-token --save CERTAIN_TOKEN
poetry run gatortracer saved-token --remove
These commands check whether a saved token exists, save a new token, and remove the saved token, respectively.
Here is an example:
>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer saved-token --save fake_token
Token has been saved
The list of flags associated with poetry run gatortracer saved-token:
│ --verify -v verify if there is stored gh token │
│ --save -s TEXT save a new gh token │
│ --remove -r remove the current stored gh token │
│ --help Show this message and exit. │
There are currently two configuration files, include.json and exclude.json, which list the organizations and repositories to include and to exclude, respectively. The js-fetch command consults these two JSON files to find the wanted organizations and repositories, so both files must be set up before you run any other command. Note: when both include.json and exclude.json are specified, only include.json takes effect.
To write a configuration file, create a JSON file that follows the template below and use regular expressions to specify what you want to include or exclude. Organization entries must match the URL name, such as my-org-1, rather than the display name, such as My Organization Name. You can find the URL name in the organization's URL, e.g. GatorEducator in https://github.com/GatorEducator
{
"organization": [],
"repository": []
}
Here is an example of include.json that includes all organizations whose names start with allegheny-college-sand or allegheny-haunted and, within those organizations, all repositories whose names end with Yanqiao4396 or start with hey:
{
"organization": [
"^allegheny-college-sand+.",
"^allegheny-haunted+."
],
"repository": [
".+Yanqiao4396$",
"hey.+"
]
}
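As a rough illustration of how these patterns select names, the sketch below applies the example include.json to a few made-up organization and repository names. It assumes the patterns are applied with Python's re.match (anchored at the start of the name); this README does not state which matching function GatorTracer uses, and the select helper and the sample names are hypothetical.

```python
import json
import re

# The example include.json from above, inlined as a string.
INCLUDE_JSON = """
{
  "organization": ["^allegheny-college-sand+.", "^allegheny-haunted+."],
  "repository": [".+Yanqiao4396$", "hey.+"]
}
"""


def select(names, patterns):
    """Return the names matched by at least one pattern.

    Assumption: re.match semantics (match from the start of the name).
    """
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(c.match(n) for c in compiled)]


config = json.loads(INCLUDE_JSON)

# Hypothetical names, used only for illustration.
orgs = ["allegheny-college-sandbox", "allegheny-haunted-house", "other-org"]
repos = ["lab1-Yanqiao4396", "hey-there", "say-hey-there", "project"]

selected_orgs = select(orgs, config["organization"])
selected_repos = select(repos, config["repository"])
print(selected_orgs)
print(selected_repos)
```

Under this assumption, say-hey-there is not selected because hey does not appear at the start of the name. If the tool searched anywhere in the name instead, an unanchored pattern such as hey.+ would match it too, so anchoring patterns with ^ is the safer habit.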
To interact with the configuration files, use commands such as:
poetry run gatortracer config --display-in
poetry run gatortracer config --clear-in
poetry run gatortracer config --in-from-file
An example of using poetry run gatortracer config --in-from-file:
>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer config --in-from-file example_in.json
>> ~/cs/yes/garage/GatorTracer: poetry run gatortracer config --display-in
include.json
{
"organization": [
"^yes+.",
"^org+"
],
"repository": [
"^sample-repo+."
]
}
Here is a list of available flags associated with poetry run gatortracer config:
│ --display-all display both exclude and include config json file │
│ --display-in display the include config json file │
│ --display-ex display the exclude config json file │
│ --clear-all clear both exclude and include config json file │
│ --clear-in clear include config json file │
│ --clear-ex clear exclude config json file │
│ --in-from-file TEXT Write configuration include.json from another json file │
│ --ex-from-file TEXT Write configuration exclude.json from another json file │
│ --help Show this message and exit.
The poetry run gatortracer js-fetch command fetches JSON files from a given path on a given branch of the selected repositories in the selected organizations.
The output of js-fetch is a series of tables that fall into two categories: MainTable and CheckTables. MainTable holds the report-level information generated by the GatorGrade report. The individual GatorGrade checks are written to sub-tables, each named after its check. The insight report in MainTable and its subordinate checks are linked by uid, a unique identifier generated from the report-level information. uid exists in every table. If two rows share the same uid, within or across tables, their checks belong to the same insight report.
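The uid link can be pictured with a small in-memory sketch. The column names other than uid, the idea of a single check table, and the row values below are made up for illustration; the real tables produced by js-fetch may have a different layout.

```python
# Hypothetical rows: only the "uid" column is documented above.
main_table = [
    {"uid": "a1", "repository": "repo-one"},
    {"uid": "b2", "repository": "repo-two"},
]
check_table = [  # stands in for one sub-table named after a check
    {"uid": "a1", "status": "True"},
    {"uid": "b2", "status": "False"},
    {"uid": "b2", "status": "True"},
]


def join_on_uid(main_rows, check_rows):
    """Attach report-level information to each check row sharing its uid."""
    reports = {row["uid"]: row for row in main_rows}
    joined = []
    for check in check_rows:
        report = reports.get(check["uid"])
        if report is not None:
            # Merge report columns first so check columns win on a name clash.
            joined.append({**report, **check})
    return joined


joined_rows = join_on_uid(main_table, check_table)
for row in joined_rows:
    print(row)
```

Each check row picks up the columns of the report with the same uid, which is the relationship the tables rely on.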
If you store the output of js-fetch in a path where MainTable and other tables already exist, the new output is appended to the existing tables rather than overwriting them.
Here is a list of flags associated with js-fetch:
│ * --token -t TEXT Choose the kind of token to use: either the saved token or the temporary token [default: None] [required] │
│ --branch -b TEXT The branch where json(s) reside [default: insight] │
│ * --dir -d TEXT The directory where json(s) reside [default: None] [required] │
│ --file -f TEXT The file names in the regex format [default: .] │
│ --parse-insight -p parsing insight checks to output matrix [default: True] │
│ --store-path -s TEXT The path where the output files will inhabit. [default: .] │
│ --help Show this message and exit.
Here is an example of command using js-fetch:
poetry run gatortracer js-fetch -t s -b insight -d insight -f "(^insight+.)|(^hello-world+.)" -s tables
This command uses the saved token to fetch all JSON files in the path insight on the branch insight whose file names start with insight or hello-world, and then saves all the output tables under a directory called tables.
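If you call js-fetch from a script, one possible approach is to assemble the command from the documented flags. The helper name build_js_fetch_command is hypothetical and not part of GatorTracer; the sketch only builds and prints the command, and actually running it (for example with subprocess.run) is left to the caller.

```python
import shlex


def build_js_fetch_command(token, directory, branch="insight",
                           file_regex=".", store_path="."):
    """Build the argument list for `poetry run gatortracer js-fetch`.

    Only flags listed in the help output above are used; the defaults
    mirror the documented defaults.
    """
    return [
        "poetry", "run", "gatortracer", "js-fetch",
        "-t", token,
        "-b", branch,
        "-d", directory,
        "-f", file_regex,
        "-s", store_path,
    ]


command = build_js_fetch_command(
    token="s",  # the value used with the saved token in the example above
    directory="insight",
    file_regex="(^insight+.)|(^hello-world+.)",
    store_path="tables",
)
# shlex.join quotes the regex so the printed line is safe to paste in a shell.
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with the parentheses and the pipe character in the file regex.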
poetry run gatortracer select-check selects all the checks whose given attribute matches the given value and, optionally, saves the resulting dataset as a CSV file.
Here is a list of flags associated with it:
│ * --main-path -p TEXT The directory where main table inhabit [default: None] [required] │
│ * --attribute -a TEXT the attribute check selection is subject to [default: None] [required] │
│ * --value -v TEXT the value associate with the attribute [default: None] [required] │
│ --save-file -s TEXT if specified, then save output as csv in the path you choose │
│ --table -t TEXT the table where you want to select checks from, all the available tables will be selected by default. [default: .] │
│ --with-report -r combine checks with report file information [default: True] │
│ --help Show this message and exit.
Here is an example:
poetry run gatortracer select-checks --main-path examples/tables --attribute status --value False --save-file examples/status_false.csv
Using the tables in examples/tables, this command selects all the checks whose status is False and saves them to a CSV file named examples/status_false.csv
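Conceptually, this selection is a filter on one column followed by a CSV export. The sketch below illustrates that idea with the standard csv module on made-up rows; it is not GatorTracer's implementation, and the function names select_rows and to_csv, the column names other than status and uid, and the sample data are assumptions.

```python
import csv
import io


def select_rows(rows, attribute, value):
    """Keep the rows whose `attribute` column equals `value` (as text)."""
    return [row for row in rows if str(row.get(attribute)) == value]


def to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical check rows for illustration.
checks = [
    {"uid": "a1", "check": "has-readme", "status": "True"},
    {"uid": "b2", "check": "has-readme", "status": "False"},
    {"uid": "b2", "check": "runs-tests", "status": "False"},
]

failed = select_rows(checks, "status", "False")
csv_text = to_csv(failed)
print(csv_text)
```

The value is compared as text here because command-line arguments arrive as strings; how the tool itself compares values is not described in this README.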
To generate JSON files automatically on a given branch from within a workflow, we recommend using BranchWrite, which writes JSON files dynamically. For GatorGrade, BranchWrite is especially helpful for storing students' GatorGrade reports for future data analysis.
This project is licensed under the terms of the MIT license.