
Loading csv files into a database

ushnisha edited this page Feb 12, 2018 · 6 revisions

This page provides more information on how to use the utility scripts to load csv text files into a database. Helper scripts are currently available for sqlite, mysql, and postgresql; their usage is explained below.

Required csv file format:

  1. The csv files must have names of the format "tablename.csv" for each of the tables we wish to load.

  2. A header row, starting with the "#" symbol, must be provided for each file. The header must contain a comma-separated list of the column names, in the exact same order as specified in the database schema files.

  3. You can use the csv files provided in the tests/testXXXX directories to guide you in the creation of these files. All columns MUST be specified (as specified in the test plan examples) for these scripts to work.
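For illustration, a csv file for a hypothetical table named demand might look like the following. The table and column names here are invented for this example; use your actual schema and the tests/testXXXX files as the reference.

```csv
#demandid,workorder,duedate,quantity
DMD001,WO001,2018-02-28,100
DMD002,WO002,2018-03-15,250
```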

Of course, you can always choose any other method or tool to populate the database of your choice, and ignore the scripts provided below, if you find the above requirements excessively restrictive.

Loading csv files into a sqlite database:

Please try out sqlitebrowser, available at http://sqlitebrowser.org/, to load and manage your sqlite database. If for some reason you want to avoid using sqlitebrowser, then a couple of utility scripts are provided below.

Python script:

A python script is available at: https://github.com/ushnisha/jobshop-minimal/blob/master/scripts/load_sqlite_db.py

This script requires 3 arguments.

  1. Path to the file that contains the list of tables that must be loaded (and the sequence in which they must be loaded). A sample file for this is available at: https://github.com/ushnisha/jobshop-minimal/blob/master/scripts/load_sequence.dat
  2. Path to the sqlite database file. The python script assumes that the database schema has already been created in the database.
  3. Path to the directory containing the csv text files with suitable data. Any of the tests/testXXXX directories can be used as a template for creating your data.

For example, you can run this python script from the root directory of the project as follows:

python scripts/load_sqlite_db.py scripts/load_sequence.dat db/jobshop.db tests/test0001

to load the test plan named test0001 into the database.

You can then invoke the application by running the following command:

java -jar lib/JobShop.jar jobshop_options.opt

Check https://github.com/ushnisha/jobshop-minimal/wiki/Configuring-the-Options-File for details on how to configure the option file.

Shell script:

A shell script is also provided in case you are more comfortable using shell scripts as opposed to python scripts. This approach uses the native loader facility of sqlite, so for large datasets this method may be faster than the python script. No benchmarking tests have been run, so this is just a hypothesis at this time.

The shell script is available at: https://github.com/ushnisha/jobshop-minimal/blob/master/scripts/load_sqlite_db.sh

It requires 3 arguments:

  1. Path to the root directory of the jobshop-minimal project
  2. Path to the database file location
  3. Path to the directory containing the csv data files

For example, you can run this shell script as follows from the root directory of the project:

scripts/load_sqlite_db.sh . db/jobshop.db tests/test0001

and it will load the dataset into the sqlite database. You can then run the jobshop-minimal application as described earlier.

The output of the application is automatically written to the testplan table in the database and you can extract this information using any custom method you prefer.
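The native-loader approach can be sketched as follows. This is a self-contained illustration with an invented one-table schema (not the actual jobshop schema, and not the contents of load_sqlite_db.sh itself): it creates a scratch database, strips the "#" header row (which .import would otherwise load as data), bulk-loads the file with sqlite3's .import command, and reads the rows back out as csv, which is also how you could extract results from the testplan table after a run.

```shell
set -e
WORKDIR=$(mktemp -d)
DB="$WORKDIR/scratch.db"

# Invented single-table schema, for this sketch only
sqlite3 "$DB" "CREATE TABLE demo (id INTEGER, name TEXT);"

# A csv file in the required format: '#'-prefixed header, then data rows
printf '#id,name\n1,widget\n2,gadget\n' > "$WORKDIR/demo.csv"

# .import loads every line as data, so strip the header row first
tail -n +2 "$WORKDIR/demo.csv" > "$WORKDIR/demo.data.csv"

# Bulk-load with the native loader
sqlite3 "$DB" <<EOF
.mode csv
.import $WORKDIR/demo.data.csv demo
EOF

# Read the loaded rows back out as csv
sqlite3 "$DB" <<EOF
.mode csv
SELECT * FROM demo;
EOF
```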

Loading csv files into a mysql database:

A shell script is available at: https://github.com/ushnisha/jobshop-minimal/blob/master/scripts/load_mysql_db.sh

This script requires 3 arguments.

  1. Path to the root directory of the jobshop-minimal project
  2. Path to the directory containing the csv text files with suitable data. Any of the tests/testXXXX directories can be used as a template for creating your data.
  3. Path to a csv option file that contains database connection details: the hostname, database name, and database username used to connect to the database. A sample of this file is shown below.

For example, you can run this shell script from the root directory of the project as follows:

scripts/load_mysql_db.sh . tests/test0001 mysql_options.txt

to load the test plan named test0001 into the database.

A sample mysql_options.txt would contain the following records:

DBNAME,jobshop
DBHOSTNAME,localhost
DBUSERNAME,jobshop

The script will prompt you for the password corresponding to the database details specified above, authenticate the database user, and load the csv flat files into the database.
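Because the options file is a simple two-column csv, it is easy to read from a shell script. The snippet below is a sketch of how such a wrapper might work (it is not taken from load_mysql_db.sh itself): it writes a sample options file, pulls each value out with awk, and assembles (but only prints, rather than runs) the mysql client invocation; the -p flag is what makes the client prompt for the password.

```shell
# Write a sample options file for this sketch
OPTFILE=$(mktemp)
printf 'DBNAME,jobshop\nDBHOSTNAME,localhost\nDBUSERNAME,jobshop\n' > "$OPTFILE"

# Pull each value out of the two-column csv
DBNAME=$(awk -F, '$1 == "DBNAME" {print $2}' "$OPTFILE")
DBHOSTNAME=$(awk -F, '$1 == "DBHOSTNAME" {print $2}' "$OPTFILE")
DBUSERNAME=$(awk -F, '$1 == "DBUSERNAME" {print $2}' "$OPTFILE")

# Assemble and print the client invocation; -p prompts for a password
echo "mysql -h $DBHOSTNAME -u $DBUSERNAME -p $DBNAME"
```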

Loading csv files into a postgresql database:

A shell script is available at: https://github.com/ushnisha/jobshop-minimal/blob/master/scripts/load_pgsql_db.sh

This script requires 3 arguments.

  1. Path to the root directory of the jobshop-minimal project
  2. Path to the directory containing the csv text files with suitable data. Any of the tests/testXXXX directories can be used as a template for creating your data.
  3. Path to a csv option file that contains database connection details: the hostname, database name, and database username used to connect to the database. A sample of this file is shown below.

For example, you can run this shell script from the root directory of the project as follows:

scripts/load_pgsql_db.sh . tests/test0001 pgsql_options.txt

to load the test plan named test0001 into the database.

A sample pgsql_options.txt would contain the following records:

DBNAME,jobshop
DBHOSTNAME,127.0.0.1
DBUSERNAME,jobshop

The script will prompt you for the password corresponding to the database details specified above, authenticate the database user, and load the csv flat files into the database. It is possible to configure postgresql with a .pgpass file in your home directory to avoid having to enter a password every time. Please refer to the postgresql documentation for details.
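For convenience, a ~/.pgpass entry matching the sample options above would look like the following. The standard postgresql format is hostname:port:database:username:password, and the file must have 0600 permissions or postgresql will ignore it. The port and password shown here are placeholders.

```
127.0.0.1:5432:jobshop:jobshop:yourpasswordhere
```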