Merge pull request #1 from JuliaConstraints/feature
First implementation of the translate interface
nicoladicicco authored Sep 20, 2024
2 parents 94493e6 + 943e555 commit 95ab560
Showing 45 changed files with 1,516 additions and 15 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
*.jl.*.cov
*.jl.cov
*.jl.mem
.DS_Store
.gitignore
.vscode/*
/Manifest.toml
Manifest.toml
23 changes: 22 additions & 1 deletion Project.toml
@@ -3,13 +3,34 @@ uuid = "314c63f5-3dda-4b35-95e7-4cc933f13053"
authors = ["Jean-François BAFFIER (@Azzaare)"]
version = "0.0.1"

[deps]
Constraints = "30f324ab-b02d-43f0-b619-e131c61659f7"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
JSONSchema = "7d188eb4-7ad8-530c-ae41-71a32a6d4692"
REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
TestItems = "1c621080-faea-4a02-84b6-bbd5e436b8fe"

[compat]
Aqua = "0.8"
Constraints = "0.5"
HTTP = "1.10"
InteractiveUtils = "1"
JET = "0.9"
JSON3 = "1"
JSONSchema = "1"
REPL = "1"
Test = "1"
TestItemRunner = "1"
TestItems = "1"
julia = "1.10"

[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
JET = "c3a54625-cd67-489e-a8e7-0a5a0ff4e31b"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
TestItemRunner = "f8b46487-2199-4994-9208-9a1283c18c0a"

[targets]
test = ["Aqua", "JET", "Test"]
test = ["Aqua", "JET", "Test", "TestItemRunner"]
52 changes: 52 additions & 0 deletions README.md
@@ -3,3 +3,55 @@
[![Build Status](https://github.com/Azzaare/ConstraintsTranslator.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/Azzaare/ConstraintsTranslator.jl/actions/workflows/CI.yml?query=branch%3Amain)
[![Coverage](https://codecov.io/gh/Azzaare/ConstraintsTranslator.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/Azzaare/ConstraintsTranslator.jl)
[![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)

A package that uses Large Language Models (LLMs) to translate natural-language descriptions of optimization problems into Constraint Programming (CP) models, which can then be solved with [`CBLS.jl`](https://github.com/JuliaConstraints/CBLS.jl).

This package acts as a lightweight wrapper around common LLM API endpoints, supplying appropriate system prompts and context information to the LLMs so that they generate CP models. Specifically, we first prompt the model to generate a high-level representation of the problem in editable Markdown format, and then we prompt it to generate Julia code.
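
A minimal sketch of this two-step flow, using a stub in place of a real LLM client. `EchoLLM`, `complete`, and `two_stage_translate` are hypothetical names for illustration only and are not part of the ConstraintsTranslator API; the real package also supplies system prompts and lets the user review each intermediate result.

```julia
# Hypothetical stand-in for an LLM client; a real one would call Groq, Gemini, or llama.cpp.
struct EchoLLM
    name::String
end

# Hypothetical completion function: the stub just tags the prompt it received.
complete(llm::EchoLLM, prompt::AbstractString) = "[$(llm.name)] " * prompt

# Sketch of the two-stage pipeline: description -> Markdown model -> Julia code.
function two_stage_translate(llm, description::AbstractString)
    # Stage 1: ask for an editable, high-level Markdown representation.
    markdown = complete(llm, "Describe this problem as a CP model in Markdown:\n" * description)
    # (In the real workflow the user can review and edit `markdown` at this point.)
    # Stage 2: ask for Julia code, conditioned on the Markdown representation.
    code = complete(llm, "Write Julia code for this model:\n" * markdown)
    return (markdown = markdown, code = code)
end

result = two_stage_translate(EchoLLM("stub"), "Place 8 queens on a chessboard.")
println(result.code)
```

Feeding the first answer into the second prompt is what lets edits to the Markdown stage carry over into the generated code.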

We currently support the following LLM APIs:
- Groq (https://groq.com)
- Google Gemini (https://ai.google.dev)
- llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md)

## Why not OpenAI / Anthropic / etc.?
Groq and Gemini are currently offering rate-limited free access to their APIs, and llama.cpp is free and open-source. We are still actively experimenting with this package, and we are not in a position to pay for API access. We might consider adding support for other APIs in the future.

## Workflow example
To start experimenting with the package, try the example below:

```julia
using ConstraintsTranslator

llm = GoogleLLM("gemini-1.5-pro")

# Optional: set up a terminal editor (uncomment and select an editor available on your machine, such as vim, nano, or emacs)
# ENV["EDITOR"] = "vim"

description = """
We need to determine the shortest possible route for a salesman who must visit a set of cities exactly once and return to the starting city.
The objective is to minimize the total travel distance while ensuring that each city is visited exactly once.
Example input data:
1. cities.csv
city_id,city_name
1,CityA
2,CityB
3,CityC
2. distances.csv
from,to,distance
CityA,CityB,10
CityA,CityC,8
"""

response = translate(llm, description)
```

The `translate` function will first produce a Markdown representation of the problem, and then return the generated Julia code for parsing the input data and building the model.

This example uses Google Gemini as the LLM. Accessing proprietary API endpoints requires an API key and a model ID. Use `help?>` in the Julia REPL to learn more about the available models.

At each generation step, `translate` opens an interactive menu in which the user can accept the answer, edit the prompt and/or the generated text, or generate another answer with the same prompt.

The LLM expects the user to provide examples of the input data format. If no examples are present, the LLM will make assumptions about the data format based on the problem description.
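
If your sample data already lives in CSV files, a small helper can paste the first few rows of each file into the description. The `with_example_data` function below is hypothetical (not part of ConstraintsTranslator), and the file name and row limit are assumptions; it only mirrors the "Example input data:" layout used in the description above.

```julia
# Hypothetical helper (not part of ConstraintsTranslator): append the first rows of each
# example CSV file to a problem description, following the "Example input data:" layout.
function with_example_data(description::AbstractString, files::Vector{String}; max_lines::Int = 5)
    io = IOBuffer()
    println(io, strip(description))
    println(io, "Example input data:")
    for (i, path) in enumerate(files)
        println(io, i, ". ", basename(path))
        open(path) do f
            # Only the header and a few rows are needed to convey the format.
            for line in Iterators.take(eachline(f), max_lines)
                println(io, line)
            end
        end
    end
    return String(take!(io))
end

# Usage with a temporary file standing in for real data.
dir = mktempdir()
cities = joinpath(dir, "cities.csv")
write(cities, "city_id,city_name\n1,CityA\n2,CityB\n3,CityC\n")
description = with_example_data("Find the shortest tour that visits every city exactly once.", [cities])
print(description)
```

The resulting string can then be passed to `translate(llm, description)` as in the workflow example.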

16 changes: 16 additions & 0 deletions dataset/prompts/abstract_knapsack.txt
@@ -0,0 +1,16 @@
I am planning a vacation and need to pack my suitcase, which has a strict weight limit.
I have several items to choose from, each with its own weight and level of importance.
The goal is to select the combination of items that will ensure the best possible vacation experience while staying within the allowed weight limit.

Example input data:
1. items.csv
item_id,item_name,weight,importance
1,ski_combination,7,low
2,warm_clothes,4,normal
3,hiking_boots,3,high
4,hiking_book,1,high
5,umbrella,2,normal

2. weight.csv
weight_limit
10
17 changes: 17 additions & 0 deletions dataset/prompts/calendar_scheduling.txt
@@ -0,0 +1,17 @@
We are tasked with scheduling a set of meetings within a specific time frame, ensuring that no meetings overlap and all required participants can attend each meeting.
The objective is to find a feasible schedule that accommodates the availability of all participants and fits within the given time constraints.

Example input data:
1. meetings.csv
meeting_id,duration,participants
1,1,John;Alice
2,2,John;Bob
3,1,Alice;Charlie
4,1,Bob;Charlie

2. availability.csv
participant_id,availability_start,availability_end
John,09:00,12:00
Alice,10:00,13:00
Bob,09:00,11:00
Charlie,11:00,14:00
15 changes: 15 additions & 0 deletions dataset/prompts/capacitated_facility_location.txt
@@ -0,0 +1,15 @@
We aim to solve a Capacitated Facility Location problem where a set of facilities and customers is given.
The objective is to minimize the total cost of opening facilities and serving customers while ensuring that each customer's demand is fully satisfied, and no facility exceeds its capacity.

Example input data:
1. facilities.csv
facility_id,opening_cost,capacity
1,5,15

2. customers.csv
customer_id,demand
1,5

3. transport_cost.csv
facility_id,customer_id,cost
1,1,3
9 changes: 9 additions & 0 deletions dataset/prompts/cargo_loading_2d.txt
@@ -0,0 +1,9 @@
We are working for a logistics company that handles cargo shipping in containers.
Each container has a fixed width of 2.5 meters and a height of 2.5 meters.
The company receives orders to ship various items, each with specific dimensions.
The task is to determine how to load the items into the containers to minimize the number of containers used, while ensuring that no items are rotated and all items fit within the container’s dimensions.

Example input data:
1. items.csv
item_id,width,height,quantity
I1,1.2,0.5,10
9 changes: 9 additions & 0 deletions dataset/prompts/cargo_loading_3d.txt
@@ -0,0 +1,9 @@
We are working for a logistics company that handles cargo shipping in containers.
Each container has fixed dimensions: 2.5 meters in width, 2.5 meters in height, and 6 meters in length.
The company receives orders to ship various items, each with specific dimensions.
We want to determine how to load the items into the containers to minimize the number of containers used, while ensuring that no items are rotated and all items fit within the container's dimensions.

Example input data:
1. items.csv
item_id,width,height,length,quantity
I1,1.2,0.5,3.0,10
20 changes: 20 additions & 0 deletions dataset/prompts/constrained_shortest_path.txt
@@ -0,0 +1,20 @@
We need to find the shortest path, in terms of the number of hops, between a given source and destination in a capacitated graph.
Each link in the graph has a physical length and a capacity.
The objective is to find the path that minimizes the number of hops while satisfying constraints on the path's capacity (which is the minimum edge capacity along the path) and the total path length.

Example input data:
1. graph.csv
link_id,source_node,destination_node,capacity,length
1,NodeA,NodeB,10,5
2,NodeB,NodeC,15,7
3,NodeA,NodeC,8,12
4,NodeC,NodeD,12,3
5,NodeB,NodeD,9,4

2. source_destination.csv
source_node,destination_node
NodeA,NodeD

3. constraints.csv
min_path_capacity,max_path_length
9,15
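
A brute-force check on this example data makes the constraints concrete. The sketch assumes links are directed from `source_node` to `destination_node`, that path capacity is the minimum link capacity (as stated above), and that the path length must not exceed `max_path_length`. `fewest_hops_path` is a hypothetical helper that enumerates simple paths by depth-first search, which only scales to tiny graphs.

```julia
# Links from graph.csv as (source, destination, capacity, length); assumed directed.
const LINKS = [
    ("NodeA", "NodeB", 10, 5),
    ("NodeB", "NodeC", 15, 7),
    ("NodeA", "NodeC", 8, 12),
    ("NodeC", "NodeD", 12, 3),
    ("NodeB", "NodeD", 9, 4),
]

# Exhaustive depth-first search over simple paths; only suitable for tiny graphs.
function fewest_hops_path(links, src, dst; min_cap, max_len)
    best = nothing
    function dfs(node, path, cap, len)
        # Capacity can only shrink and length only grow, so prune infeasible prefixes early.
        (cap < min_cap || len > max_len) && return
        if node == dst
            if best === nothing || length(path) < length(best)
                best = copy(path)
            end
            return
        end
        for (u, v, c, l) in links
            if u == node && !(v in path)
                push!(path, v)
                dfs(v, path, min(cap, c), len + l)
                pop!(path)
            end
        end
    end
    dfs(src, [src], typemax(Int), 0)
    return best
end

path = fewest_hops_path(LINKS, "NodeA", "NodeD"; min_cap = 9, max_len = 15)
println(path)
```

On this data the result is NodeA → NodeB → NodeD: 2 hops, capacity 9, length 9. NodeA → NodeC → NodeD is rejected because its first link has capacity 8, and NodeA → NodeB → NodeC → NodeD is feasible but uses 3 hops.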
12 changes: 12 additions & 0 deletions dataset/prompts/cutting_stock.txt
@@ -0,0 +1,12 @@
We have a paper roll manufacturing company that produces standard rolls of paper, all of the same width and with a fixed length of 100 meters.
The company receives orders from customers for smaller rolls of different lengths.
Our task is to determine how to cut the standard rolls to fulfill these orders while minimizing the number of standard rolls used.

Example input data:
1. orders.csv
order_id,length_required,quantity
O1,30,5
O2,45,3
O3,65,2
O4,50,4
O5,80,1
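
A quick sanity check on this example bounds the answer from both sides: the total requested length gives a lower bound on the number of rolls, and the first-fit decreasing heuristic (a standard bin-packing heuristic, not necessarily what a generated CP model would do) gives an upper bound. The sketch assumes each ordered roll must be cut as one piece from a single standard roll; the function name is hypothetical.

```julia
# Orders from orders.csv as (length_required, quantity).
orders = [(30, 5), (45, 3), (65, 2), (50, 4), (80, 1)]
roll_length = 100

# Expand the orders into individual pieces.
pieces = [len for (len, qty) in orders for _ in 1:qty]

# Trivial lower bound: total requested length divided by the roll length, rounded up.
lower_bound = cld(sum(pieces), roll_length)

# First-fit decreasing: place each piece, longest first, into the first roll with enough room.
function first_fit_decreasing(pieces, capacity)
    remaining = Int[]  # leftover length of each opened roll
    for p in sort(pieces; rev = true)
        i = findfirst(r -> r >= p, remaining)
        if i === nothing
            push!(remaining, capacity - p)
        else
            remaining[i] -= p
        end
    end
    return length(remaining)
end

println("lower bound = ", lower_bound, ", FFD rolls = ", first_fit_decreasing(pieces, roll_length))
```

Here the pieces total 695 m, so at least 7 rolls are needed, and first-fit decreasing uses 8. Seven rolls would allow only 5 m of total waste, but the 80 m piece cannot share a roll with any other order (the shortest is 30 m) and wastes 20 m on its own, so 8 rolls is optimal for this instance.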
33 changes: 33 additions & 0 deletions dataset/prompts/frequency_assignment.txt
@@ -0,0 +1,33 @@
We need to assign radio frequencies to a set of transmitters in a telecommunication network.
Each transmitter must be assigned a frequency from a given set of available frequencies.
The objective is to minimize interference between transmitters while using the minimum number of distinct frequencies.
The interference between two transmitters is proportional to the square of their geographical distance and the absolute difference between their assigned frequencies.
Each transmitter must be assigned exactly one frequency.
The frequency assigned to a transmitter must be within its allowed frequency range.
Transmitters that are geographically close to each other must have a minimum frequency separation to avoid interference.
Some transmitters may have pre-assigned frequencies that cannot be changed. A pre_assigned_frequency of -1 means no pre-assignment.

Example input data:
1. transmitters.csv
transmitter_id,x_coordinate,y_coordinate,min_frequency,max_frequency,pre_assigned_frequency
T1,10,20,1,10,-1
T2,15,25,1,10,-1
T3,30,40,1,15,-1
T4,35,45,1,15,7
T5,50,60,5,20,-1

2. available_frequencies.csv
frequency_id,frequency_value
F1,1
F2,2
F3,3
F4,4
F5,5

3. interference_matrix.csv
transmitter1_id,transmitter2_id,min_frequency_separation,interference_cost
T1,T2,2,10
T1,T3,1,5
T1,T4,1,3
T1,T5,0,1
T2,T3,2,8
8 changes: 8 additions & 0 deletions dataset/prompts/golomb.txt
@@ -0,0 +1,8 @@
We need to find a feasible Golomb ruler of a specified length m and order n.
A Golomb ruler is a set of n marks placed along a ruler such that all pairwise distances between marks are distinct.
The goal is to determine the positions of the marks that satisfy the distinct distance condition for the given length.

Example input data:
1. input.csv
ruler_length,number_of_marks
10,4
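
The distinct-distance condition is easy to check directly. The sketch below assumes "length m" means the first mark sits at 0 and the last at m (a common convention that the prompt does not spell out). `is_golomb` and `find_golomb` are hypothetical helpers that use plain depth-first search rather than a CP solver.

```julia
# True if all pairwise distances between the marks are distinct.
function is_golomb(marks)
    seen = Set{Int}()
    for i in eachindex(marks), j in i+1:lastindex(marks)
        dist = abs(marks[j] - marks[i])
        dist in seen && return false
        push!(seen, dist)
    end
    return true
end

# Depth-first search for n increasing marks, first at 0 and last at m (assumed convention).
function find_golomb(m, n)
    function extend(marks)
        if length(marks) == n - 1
            candidate = [marks; m]
            return is_golomb(candidate) ? candidate : nothing
        end
        for next in (last(marks) + 1):(m - 1)
            candidate = [marks; next]
            is_golomb(candidate) || continue  # prune as soon as a distance repeats
            result = extend(candidate)
            result === nothing || return result
        end
        return nothing
    end
    return extend([0])
end

ruler = find_golomb(10, 4)
println(ruler)
```

For the example input (length 10, 4 marks) the search returns `[0, 1, 3, 10]`, whose six pairwise distances 1, 3, 10, 2, 9, 7 are all different.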
12 changes: 12 additions & 0 deletions dataset/prompts/job_shop_scheduling.txt
@@ -0,0 +1,12 @@
We need to schedule a set of jobs on 4 machines, where each job consists of a sequence of tasks.
Each task must be processed on a specific machine for a given duration, and tasks within a job must follow a predefined order (dependency graph).
The objective is to find a feasible schedule that minimizes the overall completion time (makespan) while respecting the task dependencies and machine availability.

Example input data:
1. input.txt
task_id,job_id,machine_id,processing_time,dependencies
1,1,1,3,
2,1,2,2,1
3,1,3,4,2
4,2,2,5,
5,2,1,3,4
16 changes: 16 additions & 0 deletions dataset/prompts/knapsack.txt
@@ -0,0 +1,16 @@
We aim to solve a Knapsack problem where a set of items is given.
For each item, we define a binary decision variable to indicate whether the item is included in the knapsack.
The objective is to maximize the total utility of the selected items without exceeding a given weight limit.

Example input data:
1. items.csv
item_id,weight,utility
1,2,2
2,3,3
3,7,1
4,4,2
5,1,3

2. weight.csv
weight_limit
10
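
With only five items, the binary decision variables can be enumerated exhaustively, which gives a reference answer for this example. `best_knapsack` is a hypothetical helper, and brute force over all 2^n subsets is only viable for a handful of items.

```julia
# Example data from items.csv and weight.csv.
weights = [2, 3, 7, 4, 1]
utilities = [2, 3, 1, 2, 3]
weight_limit = 10

# Brute force over all 2^n subsets; bit i of `mask` is the binary decision for item i.
function best_knapsack(weights, utilities, limit)
    n = length(weights)
    best_value, best_items = 0, Int[]
    for mask in 0:(2^n - 1)
        chosen = [i for i in 1:n if (mask >> (i - 1)) & 1 == 1]
        w = sum(weights[chosen]; init = 0)
        v = sum(utilities[chosen]; init = 0)
        if w <= limit && v > best_value
            best_value, best_items = v, chosen
        end
    end
    return best_value, best_items
end

value, items = best_knapsack(weights, utilities, weight_limit)
println("utility = ", value, ", items = ", items)
```

On the example data the best choice is items 1, 2, 4, and 5, with total weight exactly 10 and utility 10.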
23 changes: 23 additions & 0 deletions dataset/prompts/marriage_seats.txt
@@ -0,0 +1,23 @@
You are tasked with creating a seating arrangement for a wedding reception.
The reception will be held in a venue with round tables, each seating 8 people.
The bride and groom must be seated at the same table (Table 1).
Immediate family members of the bride and groom must be seated at Tables 1 and 2.
Couples must be seated together.
People with known conflicts (e.g., divorced couples, family feuds) must be seated at different tables.
Maximize the number of guests seated with others they know, and seat guests with similar interests together when possible.

Example input data:
1. guests.csv
guest_id,name,group,dietary_requirement,interests
1,John Smith,Groom's Family,None,No,Sports
2,Jane Smith,Groom's Family,Vegetarian,No,Art
3,Alice Johnson,Bride's Family,Nut Allergy,Yes,Music
4,Bob Johnson,Bride's Family,None,No,Travel

2. relationships.csv
guest_id1,guest_id2,relationship
1,2,Couple
3,4,Couple
5,6,Conflict
7,8,Strangers
1,6,Friends
Empty file added dataset/prompts/min_cut.txt
2 changes: 2 additions & 0 deletions dataset/prompts/n_queens.txt
@@ -0,0 +1,2 @@
I want to solve the n-queens puzzle, where n is a positive integer.
The n-queens puzzle is the problem of placing n queens on an n x n chessboard such that no two queens can attack each other.
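
One common formulation places exactly one queen per row and chooses its column, which leaves only column and diagonal conflicts to check. The backtracking sketch below uses that formulation; `count_queens` is a hypothetical helper, not necessarily the model an LLM would generate.

```julia
# Count n-queens solutions by backtracking, placing one queen per row.
# cols[r] holds the column of the queen in row r.
function count_queens(n)
    cols = zeros(Int, n)
    # A square (r, c) is safe if no earlier queen shares its column or a diagonal.
    safe(r, c) = all(cols[k] != c && abs(cols[k] - c) != r - k for k in 1:r-1)
    function place(r)
        r > n && return 1
        total = 0
        for c in 1:n
            if safe(r, c)
                cols[r] = c
                total += place(r + 1)
            end
        end
        return total
    end
    return place(1)
end

println([count_queens(n) for n in 1:8])
```

`count_queens(8)` returns 92, the well-known number of solutions on a standard chessboard; for n = 2 and n = 3 there are none.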
37 changes: 37 additions & 0 deletions dataset/prompts/nurse_rostering.txt
@@ -0,0 +1,37 @@
We need to create a weekly schedule for nurses in a hospital.
Each nurse must be assigned shifts for a 7-day week, with three shift types: morning (7:00-15:00), evening (15:00-23:00), and night (23:00-7:00).
Each shift must be covered by the required number of nurses with appropriate skills.
Nurses cannot be assigned to more than one shift per day.
Each nurse must have at least 11 hours of rest between shifts.
Each nurse should work between 30 and 40 hours per week.
Each nurse should have at least one weekend day (Saturday or Sunday) off every two weeks.
The number of night shifts for each nurse should be distributed fairly.
The goal is to find a feasible schedule while maximizing fairness in satisfying nurse preferences.

Example input data:

1. nurses.csv
nurse_id,name,skill_level,max_shifts_per_week,max_night_shifts_per_week
N1,Alice,senior,5,2
N2,Bob,junior,5,2
N3,Charlie,senior,4,1
N4,Diana,mid,5,2
N5,Eve,junior,4,1

2. shift_requirements.csv
day,shift_type,required_seniors,required_mid,required_juniors
Monday,morning,1,1,1
Monday,evening,1,1,1
Monday,night,1,0,1
Tuesday,morning,1,1,1
Tuesday,evening,1,1,1
Tuesday,night,1,0,1

3. nurse_preferences.csv
nurse_id,day,shift_type,preference_score
N1,Monday,morning,3
N1,Monday,evening,1
N1,Monday,night,0
N2,Monday,morning,2
N2,Monday,evening,2
N2,Monday,night,1
35 changes: 35 additions & 0 deletions dataset/prompts/quadratic_assignment.txt
@@ -0,0 +1,35 @@
We aim to assign a set of facilities to a set of locations.
The objective is to minimize the total cost, which is the sum, over all pairs of facilities, of the flow between the two facilities times the distance between their assigned locations.

Example input data:

1. facilities.csv
facility_id
1
2
3
4

2. locations.csv
location_id
1
2
3
4

3. flow.csv
facility_id_1,facility_id_2,flow
1,2,10
1,3,8
1,4,12
2,3,6
2,4,9
3,4,7

4. distance.csv
location_id_1,location_id_2,distance
1,2,4
1,3,7
1,4,3
2,3,5
2,4,6
3,4,2
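
With four facilities there are only 4! = 24 assignments, so the cost can be evaluated exhaustively. The sketch assumes flows and distances are symmetric (each unordered pair is listed once) and that facility i is placed at location `p[i]`; `qap_cost` and `perms` are hypothetical helpers, and brute force only works for tiny instances.

```julia
# Flow and distance data from flow.csv and distance.csv; each unordered pair is listed once.
flow = Dict((1, 2) => 10, (1, 3) => 8, (1, 4) => 12, (2, 3) => 6, (2, 4) => 9, (3, 4) => 7)
dist = Dict((1, 2) => 4, (1, 3) => 7, (1, 4) => 3, (2, 3) => 5, (2, 4) => 6, (3, 4) => 2)

# Distance between two locations, assuming distances are symmetric.
loc_dist(a, b) = dist[(min(a, b), max(a, b))]

# p[i] is the location of facility i; cost = sum over facility pairs of flow × distance.
qap_cost(p) = sum(f * loc_dist(p[i], p[j]) for ((i, j), f) in flow)

# All permutations of v, generated recursively to avoid external packages.
function perms(v)
    length(v) <= 1 && return [v]
    return [[x; rest] for (k, x) in enumerate(v) for rest in perms(deleteat!(copy(v), k))]
end

best = argmin(qap_cost, perms([1, 2, 3, 4]))
println("best assignment = ", best, ", cost = ", qap_cost(best))
```

Here the identity assignment costs 230, while the cheapest of the 24 assignments, `[4, 1, 2, 3]`, costs 224.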