GitHub - jhwinters/extractor: A utility to migrate a school's data from SchoolBase to iSAMS

jhwinters / extractor Public

Notifications You must be signed in to change notification settings
Fork 0
Star 1

A utility to migrate a school's data from SchoolBase to iSAMS

GPL-2.0, Unknown licenses found

Licenses found

1 star 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
code		code
config		config
sql		sql
.gitignore		.gitignore
.ruby-gemset		.ruby-gemset
.ruby-version		.ruby-version
COPYING		COPYING
ISSUES		ISSUES
LICENCE		LICENCE
README		README
diagram.png		diagram.png

Repository files navigation

Introduction
------------

This is a small utility intended for use in extracting a school's
historical data from SchoolBase, ready for loading into iSAMS.

It was written for one particular school and was not intended as
a general-purpose product. However, that school has given permission
for it to be released under the GPL, and it should greatly reduce
the amount of work needed to be done by other schools in the future.

It dumps some of the existing database tables in SchoolBase, then
re-structures the data as requested by iSAMS.

Because of the unfortunate choice of Excel files as the input
mechanism required by iSAMS, it is not possible fully to automate
the preparation. (Or at least, I couldn't find a way to do it, and
iSAMS were unable to make any useful suggestions. Others with more
familiarity with writing to Excel files may be able to suggest a
way.)

This utility writes CSV files which can then be manually injected
into iSAMS's Excel templates - one CSV file per sheet in a workbook.
As you do this, make sure you choose the right encoding (UTF-8) for
each CSV files. Excel will try to guess the encoding, and whilst
its guess varies, in my experience it absolutely always got it wrong.

There are two further problems with this process, around which you
will have to negotiate with iSAMS.

1) Some of the fields specified in the iSAMS template are too short.
1000 characters is not enough to hold the text of a report, and
various other things.

2) There is a bug in Excel which means it fails to handle correctly
fields containing an embedded line feed, even though the field is
correctly quoted in the CSV file. Two different approaches have
been taken to get around this.

For some of the affected text fields, this utility simply strips out
the line feed characters. The text comes across, but the formatting
into separate paragraphs is lost.

Where this was not acceptable (e.g. for academic reports) the utility
writes the fields unchanged and I did eventually find a workaround
which let me get them into Excel. Open the CSV file in LibreOffice,
being careful to specify the correct encoding (UTF-8). The CSV importer
in LibreOffice works correctly and preserves the line feeds. Then
save the new sheet in XLS or XLSX format, open it in Excel, and cut
and paste the whole sheet into your Excel workbook.

For some odd reason, Excel runs *incredibly* slowly - several minutes
to open a file - when faced with a spreadsheet which has embedded line
feeds in a file, but it does at least get there eventually.

Getting started
---------------

You will need a working copy of Ruby on your computer. This is relatively
straightforward on Linux or MacOS, and not too difficult on Windows. I would
suggest the use of RVM to get things set up quickly. The version of Ruby
used for our extraction was 2.1.10, but anything later should be fine too.

Then you need a few Gems installed. Look at code/extractor.rb for a
definitive list. The two obvious ones are:

* charlock_holmes - used because our SB d/b at least had text within
it in various encodings. The utility irons this out
and makes sure everything is in the same encoding,
UTF-8. If you want to switch to something else, this
is probably best done at the stage of writing the
final CSV files. You can specify it when opening the
files.

* tiny_tds For access to the SB database.

Create a user ID on your SB database which has only read access and edit
code/extractor.rb to use that user name. The password is entered
directly into the utility when it is run.

You also need some extra directories apart
from those created when you checked it out of git. The relevant
directories are not stored in git because they contain school-specific
confidential information.

In parallel to the code directory you need:

csv
output
yml

The csv directory is used to store the tables dumped from SchoolBase.
These are then processed to create the tables required by iSAMS, which
are written to the output directory.

The extractor works in two phases so that development work can be done
without needing immediate access to the SB database. You do:

./extractor.rb --extract

to dump the SB tables to intermediate CSV files, and then:

./extractor.rb --generate

to do all the conversion work. You can also do:

./extractor.rb --extract --generate

to do the whole thing in one go.

Type

./extractor.rb --help

to get information on other options, most of which relate to particular
little tweaks which we needed.

I would suggest setting up your database user, and then simply running
the utility and studying what comes out. It will be structurally
correct from the point of view of the iSAMS templates, but will
need adjusting to suit your particular school.

You will want to tweak the output to suit your particular requirements.
For instance, the code will do a pretty good job of taking UK addresses
from the SchoolBase address fields and breaking them up appropriately for
the iSAMS fields, but presumably you will need to handle addresses for
whatever country you're dealing with. Likewise, things like particular
rewards and sanctions which you have set up in SchoolBase will need to be
mapped to how you want them in iSAMS.

If there is an area you have decided not to transfer (e.g. medical records) you can simply ignore the corresponding output files.