Create an LLM app with deep understanding of a GitHub repo

Overview

This repository contains scripts that extract the contents of a codebase from several sources (GitHub repositories, local folders, and text files generated by the Repopack tool) into CSV files. It also provides tools that generate summaries and question-and-answer (Q&A) pairs from those CSV files for use in knowledge-base applications.

Follow the steps outlined below to use the scripts effectively.

Prerequisites

Before running the scripts, ensure the following tools and libraries are installed:

- Python: version 3.10 or above
- Required Python libraries: install them using the requirements.txt file:
```
pip install -r requirements.txt
```
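To confirm that the interpreter meets the minimum version before installing the requirements, a check like the following can be run. This is a small standalone sketch, not part of the repository; the helper name python_is_supported is hypothetical.

```python
import sys

# Minimum interpreter version listed under Prerequisites.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor, ...) version tuple meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {found}:", "OK" if python_is_supported() else "too old, need 3.10+")
```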

Script Descriptions

Data Parsing Scripts

GitHub Parser (github_parser.py)
Extracts data from a GitHub repository and creates a CSV file.
- How to Run:
```
python github_parser.py <repo_path> <output_path> --exclude folder1 folder2
```
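The command line above takes two positional paths followed by an optional list of folders to skip. A minimal sketch of how such an interface can be declared with the standard argparse module follows; it is an illustrative reconstruction, not the actual code of github_parser.py, and the helper build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; the real script may
    # name, validate, or describe its arguments differently.
    parser = argparse.ArgumentParser(
        description="Extract a GitHub repository into a CSV file.")
    parser.add_argument("repo_path", help="Repository to parse")
    parser.add_argument("output_path", help="Path of the CSV file to write")
    # nargs="*" lets --exclude take zero or more folder names.
    parser.add_argument("--exclude", nargs="*", default=[],
                        help="Folder names to skip")
    return parser
```

For example, build_parser().parse_args(["owner/repo", "out.csv", "--exclude", "docs", "tests"]) sets exclude to ["docs", "tests"]. Because nargs="*" consumes every token that follows it, --exclude is best placed last, as in the documented command; positional arguments placed after it would be swallowed as folder names.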
Local Folder Parser (local_parser.py)
Extracts data from a local folder and creates a CSV file.
- How to Run:
```
python local_parser.py /path/to/local/repo output.csv --exclude folder1 folder2
```
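Conceptually, a folder parser walks the directory tree, skips the excluded folders, and writes one CSV row per file. The sketch below shows that idea with os.walk and the csv module under stated assumptions: the column names Path and Content, skipping files that are not valid UTF-8, and matching excluded names against folders at any depth are choices made for illustration, not necessarily how local_parser.py behaves. The function names are hypothetical.

```python
from __future__ import annotations

import csv
import os
from pathlib import Path


def collect_files(root: str, exclude: set[str]) -> list[tuple[str, str]]:
    """Return (relative_path, text) pairs for the UTF-8 text files under root."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into excluded folders.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            rows.append((path.relative_to(root).as_posix(), text))
    return rows


def write_csv(rows: list[tuple[str, str]], output_csv: str) -> None:
    """Write the rows under a header line; the column names are hypothetical."""
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Path", "Content"])
        writer.writerows(rows)
```

Calling write_csv(collect_files("/path/to/local/repo", {"folder1", "folder2"}), "output.csv") mirrors the documented command.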
Repopack Parser (repopack_parser.py)
Processes a .txt file generated by the Repopack tool and creates a CSV file.
- How to Run:
```
python repopack_parser.py <input_text_file> <output_csv_file>
```
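Repopack packs a whole repository into one text file, so parsing it mostly means splitting that file back into per-file sections. The sketch below assumes Repopack's plain-text layout, in which each file starts with a "File: <path>" line framed by separator lines of "=" characters; check this against your own Repopack output, since the format may differ between versions and output styles. The function name split_repopack is hypothetical.

```python
from __future__ import annotations

import re

# Assumed header shape: a line of "=", a "File: <path>" line, another line of "=".
FILE_HEADER = re.compile(r"^={4,}\nFile: (?P<path>.+)\n={4,}\n", re.MULTILINE)


def split_repopack(text: str) -> list[tuple[str, str]]:
    """Split a Repopack text dump into (path, content) pairs."""
    headers = list(FILE_HEADER.finditer(text))
    files = []
    for i, header in enumerate(headers):
        # A file's content runs from the end of its header to the next header (or EOF).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        content = text[header.end():end].strip("\n")
        files.append((header.group("path").strip(), content))
    return files
```

Anything before the first file header (such as Repopack's summary section) is ignored, and the resulting pairs can be written out with csv.writer just as for the other parsers. One limitation of this approach: a source file that itself contains a line of "=" followed by a "File:" line would be split at the wrong place.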

Summarizer and Q&A Generator Scripts

Summarizer and Q&A Generator (summarizer.py)
Processes a CSV file containing textual data to generate summaries and Q&A pairs.
- How to Run:
```
python summarizer.py <input_csv> <output_csv>
```
- Output:
  - A CSV file with the original content, generated summaries, and Q&A pairs.
Individual Operation Scripts:
- Separate scripts are available for running only one operation at a time:
  - Summary Only: Use summary_only.py
  - Q&A Only: Use qna_only.py
- How to Run:
```
python summary_only.py <input_csv> <output_csv>
python qna_only.py <input_csv> <output_csv>
```
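summarizer.py, summary_only.py, and qna_only.py all follow the same pattern: read each row of the input CSV, send its text to a language model, and write the results as new columns. The sketch below shows that pattern with the csv module. The model calls are passed in as plain functions because this README does not say which model or API the scripts use; the function name process_csv and the column names Content, Summary, and QnA are hypothetical.

```python
from __future__ import annotations

import csv
from typing import Callable, Optional

TextFn = Callable[[str], str]


def process_csv(input_csv: str, output_csv: str,
                summarize: Optional[TextFn] = None,
                make_qna: Optional[TextFn] = None,
                content_column: str = "Content") -> int:
    """Append Summary and/or QnA columns to each row; return the row count."""
    with open(input_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        # Only add the columns for the operations that were requested.
        if summarize is not None:
            fieldnames.append("Summary")
        if make_qna is not None:
            fieldnames.append("QnA")
        rows = []
        for row in reader:
            text = row[content_column]
            if summarize is not None:
                row["Summary"] = summarize(text)
            if make_qna is not None:
                row["QnA"] = make_qna(text)
            rows.append(row)

    with open(output_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Supplying both functions corresponds to summarizer.py, while supplying only summarize or only make_qna corresponds to summary_only.py or qna_only.py. Keeping the model call outside the CSV handling also means the prompt or model can change without touching the file I/O.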

Workflow

Step 1: Parse Data

Choose the parser script that matches your data source (GitHub repository, local folder, or Repopack text file) and run it to generate a CSV file with the raw content.

Step 2: Summarize and Generate Q&A Pairs

Run summarizer.py on the CSV file from Step 1 to generate summaries and Q&A pairs.

Step 3: Individual Operations (Optional)

If you need only summaries or only Q&A pairs, run summary_only.py or qna_only.py instead of summarizer.py.
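The workflow steps can be chained from a small Python driver, which can also serve as a single entry point for scheduled runs. This is a sketch under assumptions: the helper names pipeline_commands and run_pipeline are hypothetical, the scripts are assumed to be in the current working directory (in the repository they may live under the scripts folder), and --exclude is passed only to the GitHub and local parsers because the Repopack command is documented without it.

```python
from __future__ import annotations

import subprocess
import sys

# Step 1 parsers from this README, keyed by data source. Assumes the scripts
# are in the current working directory.
PARSERS = {
    "github": "github_parser.py",
    "local": "local_parser.py",
    "repopack": "repopack_parser.py",
}


def pipeline_commands(source: str, source_arg: str, raw_csv: str, final_csv: str,
                      exclude: list[str] | None = None,
                      step2_script: str = "summarizer.py") -> list[list[str]]:
    """Build the Step 1 (parse) and Step 2/3 (summarize) command lines."""
    parse_cmd = [sys.executable, PARSERS[source], source_arg, raw_csv]
    # Only the GitHub and local parsers are documented with --exclude.
    if exclude and source != "repopack":
        parse_cmd += ["--exclude", *exclude]
    return [parse_cmd, [sys.executable, step2_script, raw_csv, final_csv]]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_pipeline(pipeline_commands("local", "/path/to/local/repo", "raw.csv", "output.csv", exclude=["folder1"], step2_script="qna_only.py")) parses a local folder and then generates only Q&A pairs. Using sys.executable runs every step with the same interpreter (and virtual environment) as the driver.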

Automate the Workflow

To keep the generated files up to date without manual runs, you can schedule the parser or summarizer scripts with a cron job.

Example Cron Job for Daily Updates

Open the crontab editor:
```
crontab -e
```

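Add the following line to run the summarizer every day at midnight. Use absolute paths, including for the Python interpreter (for example, the one inside your virtual environment), because cron runs jobs with a minimal environment and PATH. The >> ... 2>&1 redirection appends both standard output and error messages to the log file.

```
0 0 * * * /path/to/python /path/to/summarizer.py /path/to/input.csv /path/to/output.csv >> /path/to/logfile.log 2>&1
```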

Save and exit.

Notes

- Ensure input CSV files are well-formed and contain valid textual data.
- Processing time for large datasets depends on the available system resources.
- Modify summarizer.py to adjust the prompt or model settings as needed.
- The repository is structured so that additional scripts for parsing or summarizing other formats can be added.
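Because the summarizer expects well-formed input, a quick pre-flight check can catch the most common CSV problems before any model calls are made. The sketch below uses the csv module; the expected column name Content and the function name find_csv_problems are assumptions for illustration, so adjust them to the columns your parser actually writes. A file that is not valid UTF-8 will raise UnicodeDecodeError rather than being reported.

```python
from __future__ import annotations

import csv


def find_csv_problems(path: str, content_column: str = "Content") -> list[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if not reader.fieldnames or content_column not in reader.fieldnames:
            return [f"missing column {content_column!r}"]
        for record, row in enumerate(reader, start=1):
            # DictReader stores surplus fields under the key None (its default restkey).
            if None in row:
                problems.append(f"record {record}: more fields than the header")
            value = row.get(content_column)
            # Missing fields are filled with None (the default restval).
            if value is None or not value.strip():
                problems.append(f"record {record}: empty {content_column!r}")
    return problems
```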
Additional Resources

- Once the database has been created using these scripts, it can be used to build a knowledge base that helps address issues related to a specific repository. For more information, refer to this resource.
- Knowledge bases for certain repositories, along with the final CSV files containing summaries and Q&A pairs, can be accessed here.
- An example of creating a knowledge base for Gradio can be found in the examples folder here.

Repository Contents

- Output
- examples (Git submodule at commit f7b9a19)
- scripts
- .gitignore
- .gitmodules
- readme.md
- requirements.txt

Source repository: staru09/Github_analyser
