Is there any way to manage execution order of schema files? #20
Comments
The execution order is determined by the file name. In my own projects I prefixed all files with a number.

The retry logic you're describing would result in very long runtimes. If any file contains an unrecoverable error you would need to try out all possible combinations. In the screenshot above you can see that I have 14 SQL files. That is already up to 87,178,291,200 possible orderings (14 factorial). Trying those out isn't feasible at all.

When writing the first version of Tusker I had the following idea in mind: annotations that describe the dependencies between files:

```sql
-- depends: a.sql, b.sql
CREATE TABLE c (
    -- ...
);
```

The algorithm for this kind of dependency resolution isn't hard to write, and reading that annotation from migration files isn't either. I just didn't feel the need for this kind of feature, yet. In the case of circular dependencies between …
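The annotation-plus-dependency-resolution idea could look roughly like this. A minimal sketch, assuming the annotation syntax from the comment above; `parse_depends` and `resolve_order` are hypothetical names, not part of Tusker:

```python
import re

def parse_depends(sql_text):
    """Extract dependency file names from a '-- depends: a.sql, b.sql' comment."""
    match = re.search(r"--\s*>?\s*depends:\s*(.+)", sql_text)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",")]

def resolve_order(files):
    """files: dict of filename -> SQL text.
    Returns filenames in an order that satisfies all annotated dependencies;
    raises on circular dependencies."""
    deps = {name: parse_depends(text) for name, text in files.items()}
    ordered, done = [], set()

    def visit(name, stack):
        if name in done:
            return
        if name in stack:
            raise ValueError(f"circular dependency involving {name}")
        for dep in deps.get(name, []):
            visit(dep, stack | {name})
        done.add(name)
        ordered.append(name)

    for name in sorted(files):   # deterministic starting order
        visit(name, set())
    return ordered
```

A depth-first topological sort like this runs in linear time in the number of files plus dependency edges, so it avoids the factorial blow-up of blind reordering.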
Actually, it is not as sad as you describe. We just need to think on the statement level, not the file level. Imagine 100 files, 10 of which contain statements that reference objects from other files, let's say tables for simplicity. The algorithm is:

1. Split all schema files into individual statements.
2. Put every statement into a queue.
3. Take a statement from the queue and execute it.
4. On success, commit; on failure, put the statement back at the end of the queue.
5. Repeat until the queue is empty.

After a while the queue length goes to zero (decreasing on successes, increasing on failures), so the algorithm converges (on solvable problems). Of course we will need some kind of check for circular dependencies, which is quite easy to implement: if the number of failures in a row is bigger than the queue length, it's time to break (unsolvable). It's kind of brute force, but it works in practice. I already have a draft of such an implementation and can share it with you. It's not crazy big, but still a big change for the project. Let me know if you want to keep it simple.

BTW, you may be wondering what kind of problem I'm solving. My folder structure looks like:
Every SQL file is named after the SQL object it creates. Every file can contain statements that depend on other file(s). Such a structure is quite handy to manage, but it's very hard to use with tools other than SQLPackage, which handles the DB projects of Visual Studio in a very elegant way. Tusker with Postgres is my only hope these days for something similar to SQLPackage for MS SQL.
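The statement-level retry queue described in this comment could be sketched roughly as follows. This is illustrative only: `execute` stands in for running one statement against the scratch database, and `apply_statements` is a hypothetical name, not Tusker code:

```python
from collections import deque

def apply_statements(statements, execute):
    """Run statements in any solvable order; raise if no order works."""
    queue = deque(statements)
    failures_in_a_row = 0
    while queue:
        stmt = queue.popleft()
        try:
            execute(stmt)
            failures_in_a_row = 0          # progress was made, reset the counter
        except Exception:
            queue.append(stmt)             # dependency not ready yet; retry later
            failures_in_a_row += 1
            # The convergence check from the comment: if every remaining
            # statement has failed since the last success, the input is
            # unsolvable (e.g. a circular dependency).
            if failures_in_a_row > len(queue):
                raise RuntimeError("no statement succeeds; circular dependency?")
```

Because the counter resets on every success, the loop terminates for any solvable input and fails fast on unsolvable ones, instead of trying all orderings.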
It's true that it's not a … It is, however, possible to construct schemas where … It's arguably bad style, but I imagine there are other statements that could cause subtle changes to the schema depending on the execution order.

Don't get me wrong, I like the idea. I'm just not 100% convinced yet that it's actually a safe thing to do. The configuration option that enables this behavior needs to be marked as "dangerous", at least.
I really like this feature, but I wouldn't make it something that happens automatically every time you call …
… absolutely sure.

What do you think about implementing this as an alternative way of doing the diff, controlled by, let's say, an option in the [schema] block of the TOML file?

Just for information: reordering files was my very first idea too. It's nice and easy at the beginning, but it gets harder to implement the more you think about it. Working with statements makes life much easier. I remember the old days of working with MS SQL without SQLPackage; back then we ended up with statements too. In MS SQL they are divided by the "GO" keyword, not a semicolon token like in Postgres. So we just needed to split the input file into "GO"-separated parts, execute them one by one, and roll back in case of errors. The same approach works fine in Tusker so far. It's a bit slower because you need to commit a transaction after every statement (or just reopen the cursor). For 6k+ statements (273 files) it takes about 20 seconds to get a diff. Not great, not terrible.
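Splitting a T-SQL script into "GO"-separated batches, as described above, only takes a couple of lines. A sketch, assuming the common convention that GO appears alone on its own line (`split_batches` is a hypothetical helper):

```python
import re

def split_batches(script):
    """Split a T-SQL script on lines that contain only GO (case-insensitive)."""
    batches = re.split(r"(?im)^\s*GO\s*$", script)
    return [batch.strip() for batch in batches if batch.strip()]
```

Postgres has no batch separator keyword, which is why splitting there means tokenizing on semicolons instead, with the string/comment edge cases that implies.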
How about we treat each file as one execution unit that must succeed as a whole? That limits the number of transactions and reorderings and doesn't require parsing the PostgreSQL grammar, just splitting by …

Executing files as a whole also eliminates the need to track the individual statements inside the files. We can just create an order in which the files need to be executed, which could be as simple as a text file containing the names of the schema files in the order they need to run. This file should also contain a checksum of each file so the code can detect changes and let the user know that a reordering might be necessary, or required if the list doesn't match the current files on disk.

For circular FKs and more complex things the user is forced to create separate files, moving those statements into their own execution unit. But that's not necessarily a bad thing, as it makes those relationships very obvious and makes the execution of migrations less magical.
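The proposed order file could be sketched like this. The file format, function names, and the choice of SHA-256 are all assumptions for illustration, not an agreed design:

```python
import hashlib
from pathlib import Path

def write_order_file(order_path, schema_files):
    """Record the execution order plus a checksum per schema file."""
    lines = []
    for path in schema_files:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        lines.append(f"{digest}  {path}")
    Path(order_path).write_text("\n".join(lines) + "\n")

def check_order_file(order_path):
    """Return the ordered file list, or raise if any file changed on disk."""
    ordered = []
    for line in Path(order_path).read_text().splitlines():
        digest, path = line.split(None, 1)
        actual = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if actual != digest:
            raise RuntimeError(f"{path} changed; review the execution order")
        ordered.append(path)
    return ordered
```

The checksum makes staleness explicit: a mismatch tells the user which file changed, so they can confirm the recorded order still holds before migrating.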
That can be unsolvable for the file structure I use. Imagine a file that creates a table and adds some comments, constraints, and foreign keys. The tables those foreign keys reference may not exist at that moment.
Yeah, such things would need language parsers to be 100% correct. For now I'm proposing an approximation that works well.
This is a step back towards tools like Liquibase (which tracks executed migrations). The power of Tusker, as I see it, is in doing everything automatically (almost magically). You know, if it is too breaking a change for now, I can hard-fork Tusker. Anyway, if you're interested I'll share the implementation. What do you think about it?
That's exactly what I meant by "create separate files moving those statements into their own execution unit". I currently see little value in the very error-prone extra work of implementing a parser capable of parsing the SQL files understood by PostgreSQL. With those circular FKs you can't create the FK constraint as part of the …
If you really want a single-file solution, may I propose a special comment that is very hard to include by accident, e.g. something like … If you implement that automatic dependency resolver I'd be happy to use it myself and ship it as part of the next Tusker release.
I will, but on the statement level :) So for now it's better for me to do a hard fork. I'll share the implementation anyway. Thank you for the conversation!
Just wanted to shed some light here. I've only ever used SQL Server, and despite dealing with all the Visual Studio and Microsoft cruft, they have a really nice set of tools called SQL Server Data Tools (SSDT). One of the major benefits of SSDT is that you can declaratively describe your database schema in SQL scripts, and SSDT will look at those scripts, look at a target database, and generate a script (and optionally run it) to bring your database in line with your desired schema. And SSDT handles the entire order of execution of the schema scripts.

I was looking for something similar for Postgres and stumbled on Tusker (and other similar migra-based tools, but I think Tusker would be my preferred option at this point). I was kind of bummed to realize that with Tusker I would have to manage the order of schema scripts myself. But I get it: if I understand correctly, it's a limitation of splitting the schema into multiple scripts and of how migra works.

I had a really simple example that I was using to evaluate tools: a users table and a cars table with a foreign key constraint on users. Splitting these into users.sql and cars.sql, cars.sql ran first and complained that users didn't exist. Prepending a 1 to users.sql fixed the issue. I assume the error happens while building the "clean" or "canonical" database that's used to diff against the target database.

That being said, I feel like a per-file solution should be pretty straightforward to implement and would be a good first step. Maybe a per-statement solution is the 100% correct thing to do, but a per-file solution would be much less effort and would still be a big improvement for managing schema files. I wouldn't mind taking a stab at it sometime in the next few weeks. Rather than being a separate command like … As a next step, one could also imagine caching the resolved dependencies so that future runs would be quicker.
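A per-file solution could reuse the retry idea at file granularity. A minimal sketch using the users/cars example from this comment; the dependency metadata below is hand-written for illustration, whereas a real implementation would discover failures by actually executing each file against the scratch database:

```python
from collections import deque

# Hypothetical stand-in for real schema files: which table each file
# creates and which tables its statements reference.
SCHEMA_FILES = {
    "cars.sql": {"creates": "cars", "needs": {"users"}},   # FK to users
    "users.sql": {"creates": "users", "needs": set()},
}

def resolve_file_order(files):
    """Retry whole files until all succeed, or report an unresolvable order."""
    queue = deque(sorted(files))           # alphabetical, like plain file sorting
    tables, order, failures_in_a_row = set(), [], 0
    while queue:
        name = queue.popleft()
        spec = files[name]
        if spec["needs"] <= tables:        # every referenced table exists
            tables.add(spec["creates"])
            order.append(name)
            failures_in_a_row = 0
        else:
            queue.append(name)             # not ready yet; retry later
            failures_in_a_row += 1
            if failures_in_a_row > len(queue):
                raise RuntimeError("unresolvable order (circular reference?)")
    return order
```

Here cars.sql fails on the first pass and succeeds on the second, which is exactly the users/cars scenario above without renaming any files.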
Personally, I think having one file as one execution unit is a good compromise. Annotating a priority instead of dependencies would also sidestep lots of potential issues. It's basically a standard migration setup, except it reads the annotated priority instead of the file name, allowing you to use whatever file/folder structure you want. But I also think this is probably better implemented as a separate package/tool.
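The priority-annotation idea can be sketched in a few lines. The `-- priority: N` syntax and the function names are hypothetical, invented here for illustration:

```python
import re

def priority_of(sql_text, default=100):
    """Read a '-- priority: N' annotation; unannotated files get the default."""
    match = re.search(r"--\s*priority:\s*(\d+)", sql_text)
    return int(match.group(1)) if match else default

def order_by_priority(files):
    """files: dict of filename -> SQL text. Lower priority runs first;
    ties are broken by file name for determinism."""
    return sorted(files, key=lambda name: (priority_of(files[name]), name))
```

Unlike dependency annotations, priorities need no graph resolution at all, at the cost of the user keeping the numbers consistent by hand.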
I have a bunch of SQL files named exactly like the tables they create. Some of these tables have foreign keys pointing to "other tables". Tusker sorts the schema files by name, so at some point the "other tables" simply don't exist yet. Is there any way to specify the execution order?
If not, I can implement some kind of clever execution logic: split the schema files into statements, execute them one by one, remember the failed ones, and retry them later. What do you think about this idea?