Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix creating message_fts table during database migration. #13809

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bepaald
Copy link

@bepaald bepaald commented Nov 21, 2024

First time contributor checklist

Contributor checklist

  • ZTE ZMax Pro (Z981), Android 6.0.1
  • Custom software
  • My contribution is fully baked and ready to be merged as is
  • I ensure that all the open issues my contribution fixes are mentioned in the commit message of my first commit using the Fixes #1234 syntax

Description

In database migration 175, the full text search table is dropped and recreated with a new name (mms -> message). First the old table and triggers are dropped:

db.execSQL("DROP TABLE mms_fts")
db.execSQL("DROP TRIGGER IF EXISTS mms_ai")
db.execSQL("DROP TRIGGER IF EXISTS mms_ad")
db.execSQL("DROP TRIGGER IF EXISTS mms_au")

Then the new one is created:

db.execSQL("CREATE VIRTUAL TABLE message_fts USING fts5(body, thread_id UNINDEXED, content=message, content_rowid=_id)")

Because no data is actually migrated from the old to the new search table (the old one doesn't even exist anymore), the new one is empty. However, the content table is (likely) not: content=message. This leaves the new table in an inconsistent state.

After this particular migration, any SQL statement that triggers one of the TRIGGERs of the message_fts table will result in a corrupted database (database disk image is malformed). This can happen while migrating to 191, which is issue #13034, and while migrating to 229, which is issue #13506. Possibly other migrations are also affected.

Besides rebuilding the search table, the old data could also be transferred over. But because:

  1. There is no performance difference. In the quick testing I did, rebuilding was slightly faster (but definitely within noise).
  2. Inspecting the schema of the mms_fts table before the 175 migration, it shows content=mms. While at this point, the mms table was already dropped. Without knowing with certainty the database use between the v174 and v175 migration, I would not dare assert that the mms_fts table is fully consistent at this point.

I went with the rebuild-option. It is also a smaller diff.

Simple demonstration of v175-migration corrupting a database (click to show)

This code can be executed directly in a sqlite shell.

$ sqlite3 
SQLite version 3.46.1 2024-08-13 09:16:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> 
sqlite> 
sqlite> # just setting up the initial situation: the message-table and the fts-table+triggers
sqlite> # these statements are taken right from the schema (just removed a bunch of message-columns)
sqlite> CREATE TABLE IF NOT EXISTS "message" (_id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT, thread_id INTEGER NOT NULL);
sqlite> CREATE VIRTUAL TABLE mms_fts USING fts5(body, thread_id UNINDEXED, content=message, content_rowid=_id);
sqlite> CREATE TRIGGER mms_ad AFTER DELETE ON "message" BEGIN INSERT INTO mms_fts(mms_fts, rowid, body, thread_id) VALUES('delete', old._id, old.body, old.thread_id); END;
sqlite> CREATE TRIGGER mms_au AFTER UPDATE ON "message" BEGIN INSERT INTO mms_fts(mms_fts, rowid, body, thread_id) VALUES('delete', old._id, old.body, old.thread_id); INSERT INTO mms_fts(rowid, body, thread_id) VALUES (new._id, new.body, new.thread_id); END;
sqlite> CREATE TRIGGER mms_ai AFTER INSERT ON "message" BEGIN INSERT INTO mms_fts (rowid, body, thread_id) VALUES (new._id, new.body, new.thread_id); END;
sqlite> 
sqlite> 
sqlite> # insert some data in the message table
sqlite> INSERT INTO message(body, thread_id) VALUES ('something', 1), ('something', 3), ('something', 8);
sqlite> 
sqlite> 
sqlite> # everything is still fine now, but here comes the 175-migration:
sqlite> DROP TABLE mms_fts;
sqlite> DROP TRIGGER IF EXISTS mms_ai;
sqlite> DROP TRIGGER IF EXISTS mms_ad;
sqlite> DROP TRIGGER IF EXISTS mms_au;
sqlite> CREATE VIRTUAL TABLE message_fts USING fts5(body, thread_id UNINDEXED, content=message, content_rowid=_id);
sqlite> CREATE TRIGGER message_ad AFTER DELETE ON message BEGIN INSERT INTO message_fts(message_fts, rowid, body, thread_id) VALUES ('delete', old._id, old.body, old.thread_id); END;
sqlite> CREATE TRIGGER message_au AFTER UPDATE ON message BEGIN INSERT INTO message_fts(message_fts, rowid, body, thread_id) VALUES('delete', old._id, old.body, old.thread_id); INSERT INTO message_fts(rowid, body, thread_id) VALUES (new._id, new.body, new.thread_id); END;
sqlite> CREATE TRIGGER message_ai AFTER INSERT ON message BEGIN INSERT INTO message_fts(rowid, body, thread_id) VALUES (new._id, new.body, new.thread_id); END;
sqlite> 
sqlite> 
sqlite> # now things are broken:
sqlite> DELETE FROM message WHERE _id IS (SELECT min(_id) FROM message);
Runtime error: database disk image is malformed (11)
sqlite>

If a rebuild is performed right after the CREATE VIRTUAL TABLE statement, the error disappears.

Examples of migration before and after patch (click to show)

A database with a doubled message (wrt recipient, thread, and timestamp), at version 170. This is #13034. Excerpt from the debug log before the patch:

11-21 08:19:51.751  9670  9757 I SignalDatabaseMigration: Successfully completed migration for version 188 in 96 ms
11-21 08:19:51.751  9670  9757 I SignalDatabaseMigration: Running migration for version 189: V189_CreateCallLinkTableColumnsAndRebuildFKReference. Foreign keys: false
11-21 08:19:51.791  9670  9757 I SignalDatabaseMigration: Successfully completed migration for version 189 in 40 ms
11-21 08:19:51.791  9670  9757 I SignalDatabaseMigration: Running migration for version 190: V190_UniqueMessageMigration. Foreign keys: false
11-21 08:19:51.791  9670  9757 I SignalDatabaseMigration: Successfully completed migration for version 190 in 1 ms
11-21 08:19:51.791  9670  9757 I SignalDatabaseMigration: Running migration for version 191: V191_UniqueMessageMigrationV2. Foreign keys: false
11-21 08:19:51.801  9670  9757 E SQLiteLog: (267) statement aborts at 10: [WITH needs_delete AS (
11-21 08:19:51.801  9670  9757 E SQLiteLog:         SELECT
11-21 08:19:51.801  9670  9757 E SQLiteLog:           _id
11-21 08:19:51.801  9670  9757 E SQLiteLog:         FROM
11-21 08:19:51.801  9670  9757 E SQLiteLog:           message M
11-21 08:19:51.801  9670  9757 E SQLiteLog:         WHERE
11-21 08:19:51.801  9670  9757 E SQLiteLog:           _id > (
11-21 08:19:51.801  9670  9757 E SQLiteLog:             SELECT
11-21 08:19:51.801  9670  9757 E SQLiteLog:               min(_id)
11-21 08:19:51.801  9670  9757 E SQLiteLog:             FROM
11-21 08:19:51.801  9670  9757 E SQLiteLog:         
11-21 08:19:51.811  9670  9757 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:19:51.841  9670  9757 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:19:51.851  9670  9757 E SqlCipherErrorHandler: Database 'signal.db' corrupted!
11-21 08:19:51.851  9670  9757 E SqlCipherErrorHandler: [sqlite] FullCode: 267 | ErrorCode: 11 | ExtendedErrorCode: 1 | Message: database disk image is malformed | ExtraMessage: null

After the patch:

11-21 08:26:26.691 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 188 in 90 ms
11-21 08:26:26.691 10574 10714 I SignalDatabaseMigration: Running migration for version 189: V189_CreateCallLinkTableColumnsAndRebuildFKReference. Foreign keys: false
11-21 08:26:26.731 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 189 in 40 ms
11-21 08:26:26.731 10574 10714 I SignalDatabaseMigration: Running migration for version 190: V190_UniqueMessageMigration. Foreign keys: false
11-21 08:26:26.731 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 190 in 0 ms
11-21 08:26:26.731 10574 10714 I SignalDatabaseMigration: Running migration for version 191: V191_UniqueMessageMigrationV2. Foreign keys: false
11-21 08:26:26.731 10574 10714 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:26:26.741 10574 10714 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:26:26.741 10574 10714 D V191_UniqueMessageMigra: [migration] fix-timers-errors: 2, dedupe: 4, dupe-purge: 2, fk-deletes: 1, index: 1, fk-check: 1, total: 11
11-21 08:26:26.741 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 191 in 13 ms
11-21 08:26:26.741 10574 10714 I SignalDatabaseMigration: Running migration for version 192: V192_CallLinkTableNullableRootKeys. Foreign keys: false
11-21 08:26:26.741 10574 10714 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:26:26.741 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 192 in 4 ms
11-21 08:26:26.741 10574 10714 I SignalDatabaseMigration: Running migration for version 193: V193_BackCallLinksWithRecipient. Foreign keys: false
11-21 08:26:26.741 10574 10714 I SignalDatabaseMigration: Successfully completed migration for version 193 in 0 ms
[etc...]

A database with missed call messages with notified=0. This is #13506. Before the patch:

11-21 08:21:01.931  9887  9975 I SignalDatabaseMigration: Successfully completed migration for version 227 in 10 ms
11-21 08:21:01.931  9887  9975 I SignalDatabaseMigration: Running migration for version 228: V228_AddNameCollisionTables. Foreign keys: false
11-21 08:21:01.931  9887  9975 I SignalDatabaseMigration: Successfully completed migration for version 228 in 5 ms
11-21 08:21:01.931  9887  9975 I SignalDatabaseMigration: Running migration for version 229: V229_MarkMissedCallEventsNotified. Foreign keys: false
11-21 08:21:01.941  9887  9975 E SQLiteLog: (267) statement aborts at 10: [UPDATE message
11-21 08:21:01.941  9887  9975 E SQLiteLog: SET notified = 1
11-21 08:21:01.941  9887  9975 E SQLiteLog: WHERE (type = 3) OR (type = 8)] database disk image is malformed
11-21 08:21:01.941  9887  9975 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:21:01.961  9887  9975 I SQLiteCursor: Set CursorWindow allocation size to 8389120
11-21 08:21:01.981  9887  9975 E SqlCipherErrorHandler: Database 'signal.db' corrupted!
11-21 08:21:01.981  9887  9975 E SqlCipherErrorHandler: [sqlite] FullCode: 267 | ErrorCode: 11 | ExtendedErrorCode: 1 | Message: database disk image is malformed | ExtraMessage: null

After the patch:

11-21 08:27:47.281 10887 11001 I SignalDatabaseMigration: Running migration for version 228: V228_AddNameCollisionTables. Foreign keys: false
11-21 08:27:47.291 10887 11001 I SignalDatabaseMigration: Successfully completed migration for version 228 in 5 ms
11-21 08:27:47.291 10887 11001 I SignalDatabaseMigration: Running migration for version 229: V229_MarkMissedCallEventsNotified. Foreign keys: false
11-21 08:27:47.291 10887 11001 I SignalDatabaseMigration: Successfully completed migration for version 229 in 5 ms
11-21 08:27:47.291 10887 11001 I SignalDatabaseMigration: Running migration for version 230: V230_UnreadCountIndices. Foreign keys: false
11-21 08:27:47.301 10887 11001 I SignalDatabaseMigration: Successfully completed migration for version 230 in 2 ms
11-21 08:27:47.301 10887 11001 I SignalDatabaseMigration: Running migration for version 231: V231_ArchiveThumbnailColumns. Foreign keys: false
11-21 08:27:47.321 10887 11001 I SignalDatabaseMigration: Successfully completed migration for version 231 in 28 ms
[etc...]

Notes:

  • While I only mention #13034 and #13506 in this PR, it is likely more issues exist that are solved by this. I did not do an exhaustive search. A few more likely candidates have dead debuglog-links, so I could not verify.
  • The mistake made in the table creation is more or less verbatim from the caveats section of SQLite's FTS documentation:
    https://sqlite.org/fts5.html#external_content_table_pitfalls
  • I believe this patch will fix most problems when migrating older database versions, both when restoring a backup and when updating a (very) old live Signal installation on a phone. However, in #13034 at least one person has already done an app update (see here 13034#issuecomment) and is running into the database corruption in the 191-migration. I assume they are now stuck at v190, so a fix in the 175-migration is not going to help them. I have my thoughts on possible fixes, if needed.

Please let me know if any edits to this PR are required, I'd be happy to make any changes you need.

Thanks!

@greyson-signal
Copy link
Contributor

Hmm, according to the FTS5 documentation, the rebuild command doesn't work with contentless tables, which is what we have: https://sqlite.org/fts5.html#the_rebuild_command

@bepaald
Copy link
Author

bepaald commented Nov 22, 2024

Hmm, according to the FTS5 documentation, the rebuild command doesn't work with contentless tables, which is what we have: https://sqlite.org/fts5.html#the_rebuild_command

No you don't? You have an external content table content=message (see also line 24 I already linked in my previous post).

This code is quite thoroughly tested by the way.

Thanks!

@greyson-signal
Copy link
Contributor

That's my bad, a misread of "external content" vs "contentless" in the docs.

It's worth noting that the reason these migrations have no rebuild is that we have a separate job that does it. However, when that job was written, it wasn't known how brittle SQLite's FTS stuff was, and it created this gap where things could go wrong.

I don't like changing old migrations, but in this case this probably makes sense for the few users who are stuck back here. I'll get this into 7.27. Thank you!

@bepaald
Copy link
Author

bepaald commented Nov 22, 2024

Great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants