Merge pull request #1 from JuliaConstraints/feature

First implementation of the translate interface
JuliaConstraints · Sep 20, 2024 · 95ab560
2 parents 94493e6 + 943e555

Showing 45 changed files with 1,516 additions and 15 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,8 @@
+*.jl.*.cov
+*.jl.cov
+*.jl.mem
+.DS_Store
+.gitignore
+.vscode/*
+/Manifest.toml
+Manifest.toml
diff --git a/Project.toml b/Project.toml
@@ -3,13 +3,34 @@ uuid = "314c63f5-3dda-4b35-95e7-4cc933f13053"
 authors = ["Jean-François BAFFIER (@Azzaare)"]
 version = "0.0.1"
 
+[deps]
+Constraints = "30f324ab-b02d-43f0-b619-e131c61659f7"
+HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
+InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
+JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
+JSONSchema = "7d188eb4-7ad8-530c-ae41-71a32a6d4692"
+REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
+TestItems = "1c621080-faea-4a02-84b6-bbd5e436b8fe"
+
 [compat]
+Aqua = "0.8"
+Constraints = "0.5"
+HTTP = "1.10"
+InteractiveUtils = "1"
+JET = "0.9"
+JSON3 = "1"
+JSONSchema = "1"
+REPL = "1"
+Test = "1"
+TestItemRunner = "1"
+TestItems = "1"
 julia = "1.10"
 
 [extras]
 Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
 JET = "c3a54625-cd67-489e-a8e7-0a5a0ff4e31b"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+TestItemRunner = "f8b46487-2199-4994-9208-9a1283c18c0a"
 
 [targets]
-test = ["Aqua", "JET", "Test"]
+test = ["Aqua", "JET", "Test", "TestItemRunner"]
diff --git a/README.md b/README.md
@@ -3,3 +3,55 @@
 [![Build Status](https://github.com/Azzaare/ConstraintsTranslator.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/Azzaare/ConstraintsTranslator.jl/actions/workflows/CI.yml?query=branch%3Amain)
 [![Coverage](https://codecov.io/gh/Azzaare/ConstraintsTranslator.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/Azzaare/ConstraintsTranslator.jl)
 [![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)
+
+A package that uses Large Language Models (LLMs) to translate natural-language descriptions of optimization problems into Constraint Programming (CP) models that can be solved with [`CBLS.jl`](https://github.com/JuliaConstraints/CBLS.jl).
+
+This package acts as a lightweight wrapper around common LLM API endpoints, supplying appropriate system prompts and contextual information so that the LLMs can generate CP models. Specifically, we first prompt the model to generate a high-level representation of the problem in editable Markdown format, and then we prompt it to generate Julia code.
+
+We currently support the following LLM APIs:
+- Groq (https://groq.com)
+- Google Gemini (https://ai.google.dev)
+- llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md)
+
+## Why not OpenAI / Anthropic / etc.?
+Groq and Gemini are currently offering rate-limited free access to their APIs, and llama.cpp is free and open-source. We are still actively experimenting with this package, and we are not in a position to pay for API access. We might consider adding support for other APIs in the future.
+
+## Workflow example
+To begin playing with the package, you can start from the example below:
+
+```julia
+using ConstraintsTranslator
+
+llm = GoogleLLM("gemini-1.5-pro")
+
+# Optional: set up a terminal editor (uncomment and select an editor available on your machine, such as vim, nano, or emacs)
+# ENV["EDITOR"] = "vim"
+
+
+description = """
+We need to determine the shortest possible route for a salesman who must visit a set of cities exactly once and return to the starting city.
+The objective is to minimize the total travel distance while ensuring that each city is visited exactly once.
+
+Example input data:
+1. cities.csv
+city_id,city_name
+1,CityA
+2,CityB
+
+2. distances.csv
+from,to,distance
+CityA,CityB,10
+CityA,CityC,8
+"""
+
+response = translate(llm, description)
+```
+
+The `translate` function will first produce a Markdown representation of the problem, and then return the generated Julia code for parsing the input data and building the model.
+
+This example uses Google Gemini as the LLM. Accessing proprietary API endpoints requires an API key and a model ID. Use `help?>` in the Julia REPL to learn more about the available models.
+
+At each generation step, `translate` prompts the user through an interactive menu to accept the answer, edit the prompt and/or the generated text, or generate another answer with the same prompt.
+
+The LLM expects the user to provide examples of the input data format. If no examples are present, the LLM will make assumptions about the data format based on the problem description.
+
diff --git a/dataset/prompts/abstract_knapsack.txt b/dataset/prompts/abstract_knapsack.txt
@@ -0,0 +1,16 @@
+I am planning a vacation and need to pack my suitcase, which has a strict weight limit. 
+I have several items to choose from, each with its own weight and level of importance. 
+The goal is to select the combination of items that will ensure the best possible vacation experience while staying within the allowed weight limit.
+
+Example input data:
+1. items.csv
+item_id,item_name,weight,importance
+1,ski_combination,7,low
+2,warm_clothes,4,normal
+3,hiking_boots,3,high
+4,hiking_book,1,high
+5,umbrella,2,normal
+
+2. weight.csv
+weight_limit
+10
diff --git a/dataset/prompts/calendar_scheduling.txt b/dataset/prompts/calendar_scheduling.txt
@@ -0,0 +1,17 @@
+We are tasked with scheduling a set of meetings within a specific time frame, ensuring that no meetings overlap and all required participants can attend each meeting. 
+The objective is to find a feasible schedule that accommodates the availability of all participants and fits within the given time constraints.
+
+Example input data:
+1. meetings.csv
+meeting_id,duration,participants
+1,1,John;Alice
+2,2,John;Bob
+3,1,Alice;Charlie
+4,1,Bob;Charlie
+
+2. availability.csv
+participant_id,availability_start,availability_end
+John,09:00,12:00
+Alice,10:00,13:00
+Bob,09:00,11:00
+Charlie,11:00,14:00
diff --git a/dataset/prompts/capacitated_facility_location.txt b/dataset/prompts/capacitated_facility_location.txt
@@ -0,0 +1,15 @@
+We aim to solve a Capacitated Facility Location problem where a set of facilities and a set of customers are given.
+The objective is to minimize the total cost of opening facilities and serving customers while ensuring that each customer's demand is fully satisfied, and no facility exceeds its capacity.
+
+Example input data:
+1. facilities.csv
+facility_id,opening_cost,capacity
+1,5,15
+
+2. customers.csv
+customer_id,demand
+1,5
+
+3. transport_cost.csv
+facility_id,customer_id,cost
+1,1,3
diff --git a/dataset/prompts/cargo_loading_2d.txt b/dataset/prompts/cargo_loading_2d.txt
@@ -0,0 +1,9 @@
+We are working for a logistics company that handles cargo shipping in containers. 
+Each container has a fixed width of 2.5 meters and a height of 2.5 meters. 
+The company receives orders to ship various items, each with specific dimensions. 
+The task is to determine how to load the items into the containers to minimize the number of containers used, while ensuring that no items are rotated and all items fit within the container’s dimensions.
+
+Example input data:
+1. items.csv 
+item_id,width,height,quantity 
+I1,1.2,0.5,10 
diff --git a/dataset/prompts/cargo_loading_3d.txt b/dataset/prompts/cargo_loading_3d.txt
@@ -0,0 +1,9 @@
+We are working for a logistics company that handles cargo shipping in containers.
+Each container has fixed dimensions: 2.5 meters in width, 2.5 meters in height, and 6 meters in length.
+The company receives orders to ship various items, each with specific dimensions.
+We want to determine how to load the items into the containers to minimize the number of containers used, while ensuring that no items are rotated and all items fit within the container's dimensions.
+
+Example input data:
+1. items.csv
+item_id,width,height,length,quantity
+I1,1.2,0.5,3.0,10
diff --git a/dataset/prompts/constrained_shortest_path.txt b/dataset/prompts/constrained_shortest_path.txt
@@ -0,0 +1,20 @@
+We need to find the shortest path, in terms of the number of hops, between a given source and destination in a capacitated graph. 
+Each link in the graph has a physical length and a capacity. 
+The objective is to find the path that minimizes the number of hops while satisfying constraints on the path's capacity (which is the minimum edge capacity along the path) and the total path length.
+
+Example input data:
+1. graph.csv
+link_id,source_node,destination_node,capacity,length
+1,NodeA,NodeB,10,5
+2,NodeB,NodeC,15,7
+3,NodeA,NodeC,8,12
+4,NodeC,NodeD,12,3
+5,NodeB,NodeD,9,4
+
+2. source_destination.csv
+source_node,destination_node
+NodeA,NodeD
+
+3. constraints.csv
+min_path_capacity,max_path_length
+9,15
diff --git a/dataset/prompts/cutting_stock.txt b/dataset/prompts/cutting_stock.txt
@@ -0,0 +1,12 @@
+We have a paper roll manufacturing company that produces standard rolls of paper, all with the same width and a fixed length of 100 meters.
+The company receives orders from customers for smaller rolls of different lengths. 
+Our task is to determine how to cut the standard rolls to fulfill these orders while minimizing the number of standard rolls used.
+
+Example input data:
+1. orders.csv
+order_id,length_required,quantity
+O1,30,5
+O2,45,3
+O3,65,2
+O4,50,4
+O5,80,1
diff --git a/dataset/prompts/frequency_assignment.txt b/dataset/prompts/frequency_assignment.txt
@@ -0,0 +1,33 @@
+We need to assign radio frequencies to a set of transmitters in a telecommunication network. 
+Each transmitter must be assigned a frequency from a given set of available frequencies. 
+The objective is to minimize interference between transmitters while using the minimum number of distinct frequencies.
+The interference between two transmitters is proportional to the square of their geographical distance and the absolute difference between their assigned frequencies.
+Each transmitter must be assigned exactly one frequency.
+The frequency assigned to a transmitter must be within its allowed frequency range.
+Transmitters that are geographically close to each other must have a minimum frequency separation to avoid interference.
+Some transmitters may have pre-assigned frequencies that cannot be changed. A pre_assigned_frequency of -1 means no pre-assignment.
+
+Example input data:
+1. transmitters.csv
+transmitter_id,x_coordinate,y_coordinate,min_frequency,max_frequency,pre_assigned_frequency
+T1,10,20,1,10,-1
+T2,15,25,1,10,-1
+T3,30,40,1,15,-1
+T4,35,45,1,15,7
+T5,50,60,5,20,-1
+
+2. available_frequencies.csv
+frequency_id,frequency_value
+F1,1
+F2,2
+F3,3
+F4,4
+F5,5
+
+3. interference_matrix.csv
+transmitter1_id,transmitter2_id,min_frequency_separation,interference_cost
+T1,T2,2,10
+T1,T3,1,5
+T1,T4,1,3
+T1,T5,0,1
+T2,T3,2,8
diff --git a/dataset/prompts/golomb.txt b/dataset/prompts/golomb.txt
@@ -0,0 +1,8 @@
+We need to find a feasible Golomb ruler of a specified length m and order n.
+A Golomb ruler is a set of n marks placed along a ruler such that all pairwise distances between marks are distinct. 
+The goal is to determine the positions of the marks that satisfy the distinct distance condition for the given length.
+
+Example input data:
+1. input.csv
+ruler_length,number_of_marks
+10,4
diff --git a/dataset/prompts/job_shop_scheduling.txt b/dataset/prompts/job_shop_scheduling.txt
@@ -0,0 +1,12 @@
+We need to schedule a set of jobs on 4 machines, where each job consists of a sequence of tasks. 
+Each task must be processed on a specific machine for a given duration, and tasks within a job must follow a predefined order (dependency graph). 
+The objective is to find a feasible schedule that minimizes the overall completion time (makespan) while respecting the task dependencies and machine availability.
+
+Example input data:
+1. input.txt
+task_id,job_id,machine_id,processing_time,dependencies
+1,1,1,3,
+2,1,2,2,1
+3,1,3,4,2
+4,2,2,5,
+5,2,1,3,4
diff --git a/dataset/prompts/knapsack.txt b/dataset/prompts/knapsack.txt
@@ -0,0 +1,16 @@
+We aim to solve a Knapsack problem where a set of items is given. 
+For each item, we define a binary decision variable to indicate whether the item is included in the knapsack. 
+The objective is to maximize the total utility of the selected items without exceeding a given weight limit.
+
+Example input data:
+1. items.csv
+item_id,weight,utility
+1,2,2
+2,3,3
+3,7,1
+4,4,2
+5,1,3
+
+2. weight.csv
+weight_limit
+10
diff --git a/dataset/prompts/marriage_seats.txt b/dataset/prompts/marriage_seats.txt
@@ -0,0 +1,23 @@
+You are tasked with creating a seating arrangement for a wedding reception. 
+The reception will be held in a venue with round tables, each seating 8 people.
+The bride and groom must be seated at the same table (Table 1).
+Immediate family members of the bride and groom must be seated at Tables 1 and 2.
+Couples must be seated together.
+People with known conflicts (e.g., divorced couples, family feuds) must be seated at different tables.
+Maximize the number of guests seated with others they know, and seat guests with similar interests together when possible.
+
+Example input data:
+1. guests.csv
+guest_id,name,group,dietary_requirement,interests
+1,John Smith,Groom's Family,None,No,Sports
+2,Jane Smith,Groom's Family,Vegetarian,No,Art
+3,Alice Johnson,Bride's Family,Nut Allergy,Yes,Music
+4,Bob Johnson,Bride's Family,None,No,Travel
+
+2. relationships.csv
+guest_id1,guest_id2,relationship
+1,2,Couple
+3,4,Couple
+5,6,Conflict
+7,8,Strangers
+1,6,Friends
diff --git a/dataset/prompts/min_cut.txt b/dataset/prompts/min_cut.txt
diff --git a/dataset/prompts/n_queens.txt b/dataset/prompts/n_queens.txt
@@ -0,0 +1,2 @@
+I want to solve an n-queens puzzle, where n is a positive integer.
+The n-queens puzzle is the problem of placing n queens on an n x n chessboard such that no two queens can attack each other.
diff --git a/dataset/prompts/nurse_rostering.txt b/dataset/prompts/nurse_rostering.txt
@@ -0,0 +1,37 @@
+We need to create a weekly schedule for nurses in a hospital. 
+Each nurse must be assigned shifts for a 7-day week, with three shift types: morning (7:00-15:00), evening (15:00-23:00), and night (23:00-7:00).
+Each shift must be covered by the required number of nurses with appropriate skills.
+Nurses cannot be assigned to more than one shift per day.
+Each nurse must have at least 11 hours of rest between shifts.
+Each nurse should work between 30 and 40 hours per week.
+Each nurse should have at least one weekend day (Saturday or Sunday) off every two weeks.
+The number of night shifts for each nurse should be distributed fairly.
+The goal is to find a feasible schedule while maximizing fairness in satisfying nurse preferences.
+
+Example input data:
+
+1. nurses.csv
+nurse_id,name,skill_level,max_shifts_per_week,max_night_shifts_per_week
+N1,Alice,senior,5,2
+N2,Bob,junior,5,2
+N3,Charlie,senior,4,1
+N4,Diana,mid,5,2
+N5,Eve,junior,4,1
+
+2. shift_requirements.csv
+day,shift_type,required_seniors,required_mid,required_juniors
+Monday,morning,1,1,1
+Monday,evening,1,1,1
+Monday,night,1,0,1
+Tuesday,morning,1,1,1
+Tuesday,evening,1,1,1
+Tuesday,night,1,0,1
+
+3. nurse_preferences.csv
+nurse_id,day,shift_type,preference_score
+N1,Monday,morning,3
+N1,Monday,evening,1
+N1,Monday,night,0
+N2,Monday,morning,2
+N2,Monday,evening,2
+N2,Monday,night,1
diff --git a/dataset/prompts/quadratic_assignment.txt b/dataset/prompts/quadratic_assignment.txt
@@ -0,0 +1,35 @@
+We aim to assign a set of facilities to a set of locations.
+The objective is to minimize the total cost, which is the sum, over all pairs of facilities, of the flow between the two facilities multiplied by the distance between the locations they are assigned to.
+
+Example input data:
+1. facilities.csv
+facility_id
+1
+2
+3
+4
+
+2. locations.csv
+location_id
+1
+2
+3
+4
+
+3. flow.csv
+facility_id_1,facility_id_2,flow
+1,2,10
+1,3,8
+1,4,12
+2,3,6
+2,4,9
+3,4,7
+
+4. distance.csv
+location_id_1,location_id_2,distance
+1,2,4
+1,3,7
+1,4,3
+2,3,5
+2,4,6
+3,4,2