Guac-alytics is a collection of tools and resources designed to help researchers and practitioners better understand the risk profile of open-source software ecosystems. The repository contains scripts for collecting, initializing, and processing the datasets needed to analyze these ecosystems.
More information about the project can be found on its website.
Before running this project, you need to package and install it.
The package can be installed from source.
The instructions provided below are for Linux operating systems:
- Clone the project locally:
git clone https://github.com/TSELab/guac-alytics.git
- cd into the project:
cd guac-alytics
- Create a Python 3 virtual environment:
python3 -m venv env
- Activate the virtual environment:
source env/bin/activate
After installing the project from source, you can run its scripts individually.
For imports to resolve correctly, you need to set up the import path inside each script.
The repository contains the following directories:
- The scripts directory contains code that helps set up the repository (e.g., collecting data and initializing the database).
- The ingestion directory contains functions and scripts that each handle a particular data type (e.g., .buildinfo files or popcon data).
- The analytics directory contains scripts and models that gather or visualize results from the collected data.
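The per-script path setup mentioned above can be done in several ways. A minimal sketch, assuming each script should import modules relative to the repository root and sits two directories below it (as scripts/ingestion/buildinfo_main.py does): prepend the root to sys.path before any project imports. The helper name add_repo_root_to_path and its depth parameter are hypothetical, not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file: str, depth: int = 2) -> Path:
    """Prepend the repository root to sys.path so project imports resolve.

    depth counts the directories between the root and the script
    (2 for scripts/ingestion/<script>.py). This layout is an assumption.
    """
    root = Path(script_file).resolve().parents[depth]
    root_str = str(root)
    # Insert only once, so repeated calls do not keep growing sys.path.
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root


# Hypothetical usage at the top of scripts/ingestion/buildinfo_main.py,
# before importing anything from the project:
# add_repo_root_to_path(__file__)
```

Resolving from __file__ rather than the current working directory keeps the import path correct regardless of where the script is launched from.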
To run a script, use the following command pattern:
python <directory>/<script_name>.py
For example:
python scripts/ingestion/buildinfo_main.py
The schema of our data is as follows:
source_table:
source_id (Integer) <PK> | source_name (varchar) | version (varchar) | location (varchar) |
---|---|---|---|
1 | maxima | 5.42.0-1 |
buildinfo_table:
buildinfo_id (Integer) <PK> | source_id (Integer) <FK> | type (varchar) | build_origin (varchar) | build_architecture (varchar) | build_date (datetime) | build_path (varchar) | environment (varchar) |
---|---|---|---|---|---|---|---|
1 | 1 | mips | Debian | mips | 2018-10-05T02:46:09+00:00 | /build/maxima-ffBduW/maxima-5.42.0 | DEB_BUILD_OPTIONS="parallel=2" LC_ALL="POSIX" SOURCE_DATE_EPOCH="1538247291" |
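The environment column appears to store the variables recorded in a .buildinfo file as one space-separated string of KEY="value" pairs, as in the example row above. Assuming every row uses that format, a sketch like the following turns the string back into a dictionary; parse_environment is a hypothetical helper, not a function from the repository.

```python
import shlex
from typing import Dict


def parse_environment(env: str) -> Dict[str, str]:
    """Split a stored environment string of KEY="value" pairs into a dict.

    shlex.split honours the double quotes, and partition splits on the
    first '=' only, so values such as parallel=2 stay intact.
    """
    variables: Dict[str, str] = {}
    for token in shlex.split(env):
        key, sep, value = token.partition("=")
        if sep:  # skip any token that is not KEY=value
            variables[key] = value
    return variables


env = 'DEB_BUILD_OPTIONS="parallel=2" LC_ALL="POSIX" SOURCE_DATE_EPOCH="1538247291"'
print(parse_environment(env))
# {'DEB_BUILD_OPTIONS': 'parallel=2', 'LC_ALL': 'POSIX', 'SOURCE_DATE_EPOCH': '1538247291'}
```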
binary_table:
binary_id (Integer) <PK> | package (varchar) | version (varchar) | architecture (varchar) |
---|---|---|---|
1 | maxima | 5.42.0-1 | mips |
dependency_table:
buildinfo_id (Integer) <FK> | build_id (Integer) <FK> |
---|---|
1 | 23 |
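The relationships between the tables so far can be sketched as SQL DDL. The README does not state which database engine the project uses, so this example assumes SQLite through Python's built-in sqlite3 module. It also assumes that dependency_table.build_id references binary_table.binary_id. The sample rows mirror the example rows above, except the binary with id 23, which is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source_table (
    source_id   INTEGER PRIMARY KEY,
    source_name VARCHAR,
    version     VARCHAR,
    location    VARCHAR
);
CREATE TABLE buildinfo_table (
    buildinfo_id       INTEGER PRIMARY KEY,
    source_id          INTEGER REFERENCES source_table(source_id),
    type               VARCHAR,
    build_origin       VARCHAR,
    build_architecture VARCHAR,
    build_date         DATETIME,
    build_path         VARCHAR,
    environment        VARCHAR
);
CREATE TABLE binary_table (
    binary_id    INTEGER PRIMARY KEY,
    package      VARCHAR,
    version      VARCHAR,
    architecture VARCHAR
);
-- Assumption: build_id points at binary_table.binary_id.
CREATE TABLE dependency_table (
    buildinfo_id INTEGER REFERENCES buildinfo_table(buildinfo_id),
    build_id     INTEGER REFERENCES binary_table(binary_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO source_table VALUES (1, 'maxima', '5.42.0-1', NULL)")
conn.execute(
    "INSERT INTO buildinfo_table (buildinfo_id, source_id, type, build_origin, build_architecture) "
    "VALUES (1, 1, 'mips', 'Debian', 'mips')"
)
conn.execute("INSERT INTO binary_table VALUES (1, 'maxima', '5.42.0-1', 'mips')")
conn.execute("INSERT INTO binary_table VALUES (23, 'hypothetical-dep', '1.0', 'mips')")  # hypothetical
conn.execute("INSERT INTO dependency_table VALUES (1, 23)")

# Which binaries did each source package's build depend on?
rows = conn.execute(
    """
    SELECT s.source_name, b.package, b.version
    FROM dependency_table d
    JOIN buildinfo_table bi ON bi.buildinfo_id = d.buildinfo_id
    JOIN source_table s ON s.source_id = bi.source_id
    JOIN binary_table b ON b.binary_id = d.build_id
    """
).fetchall()
print(rows)  # [('maxima', 'hypothetical-dep', '1.0')]
```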
output_table:
buildinfo_id (Integer) <FK> | build_id (Integer) <FK> | checksum_md5 (varchar) | checksum_sha1 (varchar) | checksum_sha256 (varchar) |
---|---|---|---|---|
1 | 1 | ['c671904988b053efb0e49405ad82511e 5736524 maxima_5.42.0-1_mips.deb', '6477b5fca4f2bfc6d09aae67f1efc9ca 485988 xmaxima_5.42.0-1_mips.deb'] | ['50a417d7b6642250947730b23f173b08e00425dc 5736524 maxima_5.42.0-1_mips.deb', 'f8caa8d98ecfed3717738e0f4ada053b3683e7a5 485988 xmaxima_5.42.0-1_mips.deb'] | ['d67b0a3b43f8c8cad5ff9b4e4c0120ad7c50021762d9a20560ed785ea0ab2eef 5736524 maxima_5.42.0-1_mips.deb', 'e065f3f443cecc14df0cc55a1df4be547f073dc5422a159cf9d69f97f04ef01d 485988 xmaxima_5.42.0-1_mips.deb'] |
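Each checksum column holds one entry per .deb produced by the build, in the form "<checksum> <size in bytes> <file name>". Assuming the column stores the Python-style list literal shown in the example row, a sketch like the following parses it and checks a downloaded file against one entry. parse_checksums and verify_file are hypothetical helpers.

```python
import ast
import hashlib
import os
from typing import List, NamedTuple


class ChecksumEntry(NamedTuple):
    checksum: str
    size: int
    filename: str


def parse_checksums(column_value: str) -> List[ChecksumEntry]:
    """Turn a stored list literal of 'checksum size filename' strings into tuples."""
    entries = []
    for item in ast.literal_eval(column_value):  # evaluates literals only, not code
        checksum, size, filename = item.split()
        entries.append(ChecksumEntry(checksum, int(size), filename))
    return entries


def verify_file(path: str, entry: ChecksumEntry, algorithm: str = "sha256") -> bool:
    """Check a local file's size and digest (md5, sha1 or sha256) against an entry."""
    if os.path.getsize(path) != entry.size:
        return False
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == entry.checksum


md5_column = ("['c671904988b053efb0e49405ad82511e 5736524 maxima_5.42.0-1_mips.deb', "
              "'6477b5fca4f2bfc6d09aae67f1efc9ca 485988 xmaxima_5.42.0-1_mips.deb']")
for entry in parse_checksums(md5_column):
    print(entry.filename, entry.size)
```

For the checksum_md5 column, pass algorithm="md5" to verify_file; for checksum_sha1, pass "sha1".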
popularity_table:
name (text) | date (date) | inst (integer) | vote (integer) | old (integer) | recent (integer) | no-files (integer) | maintainer (text) | inst_norm (varchar) | vote_norm (varchar) |
---|---|---|---|---|---|---|---|---|---|
dpkg | 02/09/2023 | 209081 | 192500 | 2847 | 13700 | 34 | Dpkg Developers | 195484.300615492 | 179981.575889164 |
maintainer:
name (text) | package (varchar) | inst (integer) | vote (integer) | old (integer) | recent (integer) | no-files (integer) |
---|---|---|---|---|---|---|
Debian Gnome Maintainers | libvaladoc-0.56-dev | 25913899 | 8685994 | 8352785 | 2749012 | 6126108 |
publish_packages:
Package_id | Package | Architecture | Version | Section | Size | pool_endpoint | DFSG | Added_at | MD5sum | SHA1 | SHA256 | Provided_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|
18221 | imagemagick | amd64 | 8:6.8.9.9-5+deb8u4 | graphics | 156996 | pool/main/i/imagemagick/imagemagick_6.8.9.9-5+deb8u4_amd64.deb | main | 20161231T000000Z | c2cedf60dbc3d6f794fe78fb6d5fbe10 | 82eaef39fe894cf7ca829f3080c445d7e59d4285 | 54ca108d2b61a50dfeaaef1ff3315e52bdd47bb1da1f731c02f2c4b7baf19992 |
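The pool_endpoint values follow the Debian archive pool layout pool/<component>/<prefix>/<source package>/<file>, where the prefix is the first letter of the source name (or its first four characters when the name starts with lib), and the .deb file name omits the version's epoch (the "8:" in the example row). A sketch that rebuilds the path from the other columns, assuming the DFSG column holds the archive component and the source package name is supplied separately (it matches Package in this row but need not in general). build_pool_endpoint is a hypothetical helper.

```python
def build_pool_endpoint(component: str, source: str, package: str,
                        version: str, architecture: str) -> str:
    """Rebuild a Debian pool path for one binary package."""
    # Sources whose names start with "lib" get a four-character prefix directory.
    if source.startswith("lib") and len(source) > 3:
        prefix = source[:4]
    else:
        prefix = source[0]
    # The epoch (text before the first ':') is not part of the .deb file name.
    version_no_epoch = version.split(":", 1)[-1]
    filename = f"{package}_{version_no_epoch}_{architecture}.deb"
    return f"pool/{component}/{prefix}/{source}/{filename}"


print(build_pool_endpoint("main", "imagemagick", "imagemagick",
                          "8:6.8.9.9-5+deb8u4", "amd64"))
# pool/main/i/imagemagick/imagemagick_6.8.9.9-5+deb8u4_amd64.deb
```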
publish_dependencies:
Dependency_id | Package_id | Dependency_package_id | Condition |
---|---|---|---|
5 | 1 | 2359 | libboost-filesystem1.55.0 |
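Because publish_dependencies links a Package_id to a Dependency_package_id, transitive dependencies can be followed with a recursive query. This sketch assumes both columns reference publish_packages.Package_id, uses SQLite through Python's sqlite3 module, and trims the tables to the columns the query needs; the package names and ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publish_packages (Package_id INTEGER PRIMARY KEY, Package TEXT);
    CREATE TABLE publish_dependencies (
        Dependency_id         INTEGER PRIMARY KEY,
        Package_id            INTEGER REFERENCES publish_packages(Package_id),
        Dependency_package_id INTEGER REFERENCES publish_packages(Package_id),
        Condition             TEXT
    );
    -- Hypothetical chain: app -> libmid -> libbase
    INSERT INTO publish_packages VALUES (1, 'app'), (2, 'libmid'), (3, 'libbase');
    INSERT INTO publish_dependencies VALUES (1, 1, 2, NULL), (2, 2, 3, NULL);
    """
)

TRANSITIVE_DEPS = """
WITH RECURSIVE deps(id) AS (
    SELECT Dependency_package_id FROM publish_dependencies WHERE Package_id = ?
    UNION  -- UNION (not UNION ALL) discards repeated ids, so cycles terminate
    SELECT d.Dependency_package_id
    FROM publish_dependencies d JOIN deps ON d.Package_id = deps.id
)
SELECT p.Package FROM deps JOIN publish_packages p ON p.Package_id = deps.id
ORDER BY p.Package
"""

print([name for (name,) in conn.execute(TRANSITIVE_DEPS, (1,))])  # ['libbase', 'libmid']
```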
The data is represented using an ER diagram, which can be edited here.
See these slides to help contextualize the project. This project may or may not eventually become an overlay on the guacsec code.
Meeting agendas and notes are available on this HackMD page.