Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: add an option to park the rpmdb #2219

Open
cgwalters opened this issue Oct 4, 2022 · 9 comments
Open

RFE: add an option to park the rpmdb #2219

cgwalters opened this issue Oct 4, 2022 · 9 comments
Labels

Comments

@cgwalters
Copy link
Contributor

This is a followup to 71456f2
which is about generating bit-for-bit reproducible images (container/disk) that include an RPM database.

At the time, the person working on that PR was looking at RHEL8 (BDB format rpm database). We've since moved on to sqlite, which is nicer.

Except in my testing, this rpmdb.sqlite-shm is the very last thing blocking reproducible rpm-ostree container images. I hacked up coreos/rpm-ostree#4074
but this issue is to track fixing this more properly inside librpm.

I think what would be the best is if this file could be deleted entirely at the end of a build. It seems like rpm (or sqlite) will auto-recreate it in that case, but what blocks us from doing that is that things fail if the mount is read-only.

@pmatilai
Copy link
Member

pmatilai commented Oct 5, 2022

The existence of .sqlite-shm is required for read-only WAL mode to work at all (a very important use-case being queries by regular users), see https://www.sqlite.org/wal.html#read_only_database

@pmatilai
Copy link
Member

pmatilai commented Oct 5, 2022

Hmm. Actually doing the WAL docs suggested PRAGMA journal_mode=DELETE on the database results in a database that can still be opened by rpm, regular users and all. Rpm will try to re-establish WAL if opened in read-write mode, so nothing too terrible should happen if you do the journal delete as a part of the image building. Except voiding the warranty 😉

It'd be nicer of course if rpm had a supported procedure to "park" databases for this kind of thing. --rebuilddb with some special flag maybe.

@champtar
Copy link

rpmdb reproducibility is also an issue when building (EL9) containers, it would be nice to have a park option indeed.

@DemiMarie
Copy link
Contributor

It'd be nicer of course if rpm had a supported procedure to "park" databases for this kind of thing. --rebuilddb with some special flag maybe.

--rebuilddb is much heavier than just a single SQL command. Perhaps --parkdb, along with a corresponding C API function?

@cgwalters
Copy link
Contributor Author

The existence of .sqlite-shm is required for read-only WAL mode to work at all (a very important use-case being queries by regular users), see https://www.sqlite.org/wal.html#read_only_database

I find this weird - because unprivileged code can't write directly to the database, what practical use is the shared memory?

Hmm, I just found https://www.sqlite.org/uri.html#uriimmutable - seems like rpm may be able to use that when it detects it can't write?

facebook-github-bot pushed a commit to facebookincubator/antlir that referenced this issue Mar 10, 2023
Summary:
- User reports CM fails to publish in https://fb.workplace.com/groups/antlirusers/permalink/1659896164442169/

- Reproduced error with `buck2 run fbcode//security/rceservice/tw:tw.rceservice.xpdf=publish`
```
tupperware.cm.antlir.cm_util.CMAssertionError: Replay differs from original: [b'usr/lib/sysimage/rpm/rpmdb.sqlite', b'usr/lib/sysimage/rpm/rpmdb.sqlite-shm'] (please post to https://fburl.com/cmhelp with this trace)
```

- Identified this is our first non-custom image CM being built with rpms on C9.

- Determined this a C9 only path
```
[[email protected] /data/users/naveedgol/fbsource/fbcode (282e6e8d6)]$ buck2 run fbcode//tupperware/image/base:base.c8=container
[14:15:26 nobody@/// /]$ ls usr/lib/sysimage/rpm/rpmdb.sqlite
ls: cannot access 'usr/lib/sysimage/rpm/rpmdb.sqlite': No such file or directory
[14:15:30 nobody@/// /]$ exit
logout
[[email protected] /data/users/naveedgol/fbsource/fbcode (282e6e8d6)]$ buck2 run fbcode//tupperware/image/base:base.c9=container
[14:15:44 nobody@/// /]$ ls usr/lib/sysimage/rpm/rpmdb.sqlite
rpmdb.sqlite      rpmdb.sqlite-shm  rpmdb.sqlite-wal
```

- Found this is a known problem rpm-software-management/rpm#2219
- Allowlisted the affected paths

Test Plan:
Successfully published below C9 CM with this diff which failed to publish on master

```
tw.container(
    name = "tw.rpm_c9",
    features = [
        tw.image.rpms_install([
            "rsync",
        ]),
    ],
    flavor = "centos9",
    oncall = "test_naveedgol",
)
```

Published customer's CM successfully with `buck2 run fbcode//security/rceservice/tw:tw.rceservice.xpdf=publish`, see https://www.internalfb.com/fbpkg/container_manifest/tw.rceservice.xpdf/rAdzb9-nmIozq2Xxz67-3J594BRJgSUtoWUe2uOgfFE

Reviewed By: epilatow

Differential Revision: D43959737

fbshipit-source-id: 03aff70ac17a3b874786b6d21dff3d11ac465ff4
@pmatilai pmatilai self-assigned this Oct 19, 2023
@pmatilai pmatilai added the RFE label Oct 19, 2023
@pmatilai
Copy link
Member

pmatilai commented Oct 19, 2023

As per https://www.sqlite.org/wal.html#read_only_databases:

Even though it is possible to open a read-only WAL-mode database, it is good practice to converted to PRAGMA journal_mode=DELETE prior to burning an SQLite database image onto read-only media.

We want to follow good practises so...

Any operation should do a database rebuild to free up any fluff in the files, so I think this should be either a default thing in rebuilddb or an extra flag to it if default is not feasible for technical reasons.

@pmatilai pmatilai added this to RPM Oct 25, 2023
@github-project-automation github-project-automation bot moved this to Backlog in RPM Oct 25, 2023
@pmatilai pmatilai removed their assignment Oct 25, 2023
@pmatilai pmatilai changed the title unreproducible rpmdb.sqlite-shm RFE: add an option to park the rpmdb Oct 25, 2023
@pmatilai pmatilai removed the triaged label Nov 8, 2023
@cgwalters
Copy link
Contributor Author

@ffesti the issue I mentioned at Devconf.cz

That said...now that I'm digging in again to reproducible builds, containers/buildah#5592 is higher on my radar, but this is probably in the top 5.

@cgwalters
Copy link
Contributor Author

so I think this should be either a default thing in rebuilddb

On this topic, it would be a nontrivial hit to the ergonomics of default container builds if we really start telling people to do:

RUN dnf -y install foo && dnf clean all && rpm --fix-database

So probably what should happen is dnf should detect it's in a container and do this by default as part of dnf clean all most likely.

@cgwalters
Copy link
Contributor Author

cgwalters commented Aug 21, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Backlog
Development

No branches or pull requests

4 participants