Game updating - The Epic Epic #2357

Open
26 tasks
halgari opened this issue Dec 10, 2024 · 6 comments
Comments

@halgari
Collaborator

halgari commented Dec 10, 2024

Game File Hashing and Integration with loadouts

General Requirements

Equating files

In order to detect the state of a loadout relative to an "unaltered" source of truth for game files,
we need some way to determine what the "base" game state is. We can do this by hashing
files, but we need an authoritative state of files to compare against.

Human Friendly names

In addition, users wish to refer to their game versions by a human-friendly name. Users may say
"I have Phantom Liberty" or "I have version 1.3 of the game". But on the backend, each store (Steam, GOG, etc.)
refers to these files by a hash. We need a way to resolve these human-friendly names to a collection of hashes.

Game updates

When a store updates a game, it will overwrite files in the user's game directory. At that point we may
be able to ask the store (via Game Finder) what files it last put into the game folder, but we need a way
to determine which of the files in the folders are updates, and which were modified by the user.

Hash Equality

Most stores use a cryptographic hash to identify files. For performance reasons we use xxHash3, so we
need a way to translate between hash values. Thus, we'll need some sort of global database that relates all the possible
hash types for files we are aware of.

The hashes we will likely need are:

  • xxHash3
  • Minimal Hash (see below)
  • MD5
  • SHA1
  • CRC32

These hashes should be enough for us to translate between hash types. If we have any one of these
hashes, we can look up the "row" in the hash database to get the others.
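
As a rough illustration of the "row" lookup described above, the sketch below indexes each record under every hash type, so that any one hash resolves to the full row. The `HashRow` and `HashDatabase` names and the hex-string representation are assumptions made for this example, not the app's actual types.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HashRow:
    """One row of the hypothetical hash database (all values as hex strings)."""
    xxhash3: str
    minimal: str
    md5: str
    sha1: str
    crc32: str


class HashDatabase:
    """In-memory index mapping (hash type, value) to the rows that carry it."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], List[HashRow]] = {}

    def add(self, row: HashRow) -> None:
        # Register the row under every hash it carries, skipping duplicates.
        for kind, value in asdict(row).items():
            bucket = self._index.setdefault((kind, value.lower()), [])
            if row not in bucket:
                bucket.append(row)

    def lookup(self, kind: str, value: str) -> List[HashRow]:
        # kind is one of: "xxhash3", "minimal", "md5", "sha1", "crc32".
        return list(self._index.get((kind, value.lower()), []))
```

`lookup` returns a list rather than a single row because the shorter hashes (CRC32 in particular) can collide across unrelated files, so a caller may need a second hash to disambiguate.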

Suggested Implementation

In order to facilitate the above requirements, we will need to structure our code in the following way:

Index game files from the stores

We will need to go to each store we want to support, and index the files for the games we want to support. This is fairly
easy for GOG and Steam, but Epic and others may require a more involved process. We can get the information we need
from most stores without downloading the files, but the hashes from these stores will be cryptographic hashes. In order to
build the global hash database, we will need to download the files and generate the other hashes.

It is suggested that these files be stored in a way that duplicate hashes can be deduplicated. For example, if we have a specific MD5 from one
version of the game, and the same MD5 from another version of the game, we do not need two separate entries in the database.
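
A minimal sketch of the "download and generate the other hashes" step, assuming the file is available as a byte stream. It computes MD5, SHA1 and CRC32 in a single pass and deduplicates records by MD5. The `hash_stream` and `add_if_new` names are hypothetical; xxHash3 and the minimal hash are left out because they are not in the Python standard library.

```python
import hashlib
import zlib
from typing import BinaryIO, Dict


def hash_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Dict[str, str]:
    """Read the stream once and return MD5, SHA1 and CRC32 as hex strings."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        sha1.update(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
    return {"Md5": md5.hexdigest(), "Sha1": sha1.hexdigest(), "Crc32": f"{crc:08x}"}


def add_if_new(records: Dict[str, Dict[str, str]], hashes: Dict[str, str]) -> bool:
    """Store the record keyed by MD5; return False if it was already known."""
    if hashes["Md5"] in records:
        return False
    records[hashes["Md5"]] = hashes
    return True
```

Feeding the same file from two game versions through `add_if_new` leaves a single entry, which is the deduplication behaviour described above.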

Depot indexing

For future reference and linking to the store, we should record what files we got from which depot/store id. In order to keep this information as
lossless as possible, it is recommended that we store each store's data in its own format. We shouldn't try to fit data from multiple stores into a
single logical model; if Steam calls them "depot" and "manifest", store them under those names, and merge the results at read-time.
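
One way to picture "store-native records, merged at read-time" is sketched below. The `SteamManifest` and `GogBuild` classes are hypothetical and only mirror the field names used elsewhere in this proposal; the merge is a plain union of file hashes.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class SteamManifest:
    """Steam data kept under Steam's own terminology."""
    manifest_id: str
    depot_id: str
    app_id: str
    files: List[str] = field(default_factory=list)  # xxHash3 values as hex


@dataclass
class GogBuild:
    """GOG data kept separately; other GOG-specific fields would live here."""
    gog_id: str
    files: List[str] = field(default_factory=list)


StoreRecord = Union[SteamManifest, GogBuild]


def merged_files(records: List[StoreRecord]) -> Set[str]:
    """Union of file hashes across store records, computed when reading."""
    merged: Set[str] = set()
    for record in records:
        merged.update(record.files)
    return merged
```

Nothing store-specific is lost on disk, and only the read path needs to know how to combine the two shapes.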

Version mapping

We will then need to manually link the depot/manifest data to human-friendly names. This process is likely a bit tricky, as we would need
to account for DLC and provide some sort of sanity to this information. For some stores (Steam) this information isn't available from the API and may need to be maintained by hand. Even so, the process is fairly simple for Steam: for Cyberpunk, for example, there is a 1:1 mapping of manifest ids and game versions. For other stores like GOG it may be possible to determine this information programmatically.

Overall structure

We will store each record for the data as a separate file in git, categorized by various criteria. The reason for "one record per file"
is to avoid merge conflicts and accidental merge issues. The suggested structure in git is:

hashes/
   AA/
     {xxHash3}.json
      - xxHash3: {xxHash3}
      - Minimal: {Minimal}
      - Md5: {MD5}
      - Sha1: {SHA1}
      - Crc32: {CRC32}
stores/
    Steam/
      {manifest_id}.json
        - ManifestId: {manifest_id}
        - DepotId: {depot_id}
        - AppId: {app_id}
        - Files: [ {xxHash3_1}, {xxHash3_2}, ... ]
    GOG/
        {gog_id}.json
            - Other GOG specific data
            - GogId: {gog_id}
            - Files: [ {xxHash3_1}, {xxHash3_2}, ... ]
games/
    {Nexus Game Id}/
        {version}.json
            - Version: {version}
            - Steam
              - ManifestIds: [ {manifest_id_1}, {manifest_id_2}, ... ]
            - GOG
              - GogIds: [ {gog_id_1}, {gog_id_2}, ... ]
            - DLC
              - {DLC Name}
                - Steam
                  - ManifestIds: [ {manifest_id_1}, {manifest_id_2}, ... ]
                - GOG
                  - GogIds: [ {gog_id_1}, {gog_id_2}, ... ]  

In the above example, the hashes and stores folders are programmatically generated, while games is maintained by hand.
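
The "one record per file" layout for the hashes folder could be produced along these lines. This is a sketch under one assumption in particular: that the `AA` bucket is the first two hex digits of the xxHash3 value. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def hash_record_path(root: Path, xxhash3_hex: str) -> Path:
    """Return hashes/{AA}/{xxHash3}.json, assuming AA is the first two hex digits."""
    name = xxhash3_hex.upper()
    return root / "hashes" / name[:2] / f"{name}.json"


def write_hash_record(root: Path, record: Dict[str, str]) -> Path:
    """Write one record as its own JSON file and return the path."""
    path = hash_record_path(root, record["xxHash3"])
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and a trailing newline keep git diffs stable between runs.
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Bucketing by prefix keeps any single directory from holding every record, which matters once the repository contains a very large number of files.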

Compilation of Data

Whenever the git repository is updated, we will need to compile the data into a single database.
There are many formats we could use for this. An early suggestion was to zip the files into a .zip or .nx file,
but the number of files and the unsorted nature of archive TOC entries make finding a specific file an
O(n) operation. Instead, we could use a SQLite database, MnemonicDB, a custom binary format, or perhaps some "read only"
database like MasterMemory. A decision on this can be made later
and is not critical to the design.
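
To make the SQLite option concrete, here is a minimal sketch of a compile step. It assumes the hash records are JSON files under a `hashes/` folder with the field names from the structure above; the table and index names are invented for the example, and the final format is still undecided.

```python
import json
import sqlite3
from pathlib import Path


def compile_hashes(repo_root: Path, db_path: str) -> int:
    """Load every hashes/**/*.json record into one table; return the row count."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            "xxhash3 TEXT PRIMARY KEY, minimal TEXT, md5 TEXT, sha1 TEXT, crc32 TEXT)"
        )
        # Index every column so a lookup by any hash type avoids a full scan.
        for column in ("minimal", "md5", "sha1", "crc32"):
            conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{column} ON hashes ({column})")
        for path in sorted((repo_root / "hashes").rglob("*.json")):
            record = json.loads(path.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?, ?)",
                (record["xxHash3"], record["Minimal"], record["Md5"],
                 record["Sha1"], record["Crc32"]),
            )
    count = conn.execute("SELECT COUNT(*) FROM hashes").fetchone()[0]
    conn.close()
    return count
```

A query such as `SELECT xxhash3 FROM hashes WHERE md5 = ?` then uses an index instead of walking every entry, which is the property the archive formats lack.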

Usage

Based on the above structure we can easily perform any number of queries, and streamline parts of the application. For example:

Creating Loadouts

Now we have an authoritative source of files for a game, so when creating a loadout we can look at what hashes we have archived and on disk,
compare those to the hashes in the database, and the data in the game store. Based on this information we can provide users with a dropdown of
all game versions we can support. If the user has the files for 1.6, we can show 1.6 in the dropdown. Internally, however, we won't store 1.6
as the game "version"; instead we will store the manifest ids.
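
The dropdown logic could look roughly like the sketch below. It assumes two lookups derived from the database: version name to manifest ids, and manifest id to the set of file hashes it contains. The function name and the hash values in the usage example are placeholders.

```python
from typing import Dict, List, Set


def supported_versions(
    available: Set[str],
    versions: Dict[str, List[str]],
    manifests: Dict[str, Set[str]],
) -> List[str]:
    """Return the versions whose manifests are fully covered by the available hashes."""
    result: List[str] = []
    for version, manifest_ids in versions.items():
        required: Set[str] = set()
        for manifest_id in manifest_ids:
            required |= manifests[manifest_id]
        # A version with no known files is never offered.
        if required and required <= available:
            result.append(version)
    return result


# Usage with placeholder hashes: only 1.6 is fully present on disk or in the archive.
manifests = {"m1": {"a", "b"}, "m2": {"a", "c"}}
versions = {"1.5": ["m1"], "1.6": ["m2"]}
print(supported_versions({"a", "c", "x"}, versions, manifests))  # ['1.6']
```

The human-friendly name is used only for display; what gets persisted for the selected entry would be `versions["1.6"]`, that is, the manifest ids.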

Separation of DLC

Since we know what manifest ids are associated with game versions and DLC we can split out specific files from the game and put them
into a separate loadout group. This will allow us to show Skyrim, Skyrim - Dawnguard, Skyrim - Hearthfire, and Skyrim - Dragonborn as separate
items in the loadout. This is mostly for organizational purposes, but will allow users to easily see what files are associated with what DLC.
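
A small sketch of the grouping step, assuming each loadout group (base game or DLC) has already been resolved to the set of file hashes listed by its manifests. Matching by hash alone is a simplification for the example; the hashes shown are placeholders.

```python
from typing import Dict, List, Set


def split_into_groups(
    files: Dict[str, str], groups: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Assign each path to the first group whose manifests list its hash."""
    result: Dict[str, List[str]] = {name: [] for name in groups}
    result["Unmatched"] = []
    for path, file_hash in sorted(files.items()):
        for name, hashes in groups.items():
            if file_hash in hashes:
                result[name].append(path)
                break
        else:
            # Not part of any known manifest: not a base game or DLC file.
            result["Unmatched"].append(path)
    return result
```

For instance, with `groups = {"Skyrim": {"h1"}, "Skyrim - Dawnguard": {"h2"}}`, a file hashing to `h2` lands under "Skyrim - Dawnguard" and anything else falls into "Unmatched".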

Game Updates

When ingesting changes, we can detect if the files being updated are in the hash database. If they appear to have changed from one valid game hash to another,
and we see that the store has changed the manifests it has installed into the game, we can assume that the game has been updated, and apply these changes
to the game files in the loadout.

Naturally this means that we need to get hashes up as soon as possible after a new release. However, if we move any non-matching files into the Override group, we can
move them into the game group once we find a matching manifest id.
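
The "park in Override, promote later" idea can be sketched as a simple partition. Both arguments are assumed to be path-to-hash mappings; `promote_matching` is a hypothetical helper, not existing app code.

```python
from typing import Dict, Tuple


def promote_matching(
    override: Dict[str, str], manifest: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split Override entries into (promoted to the game group, still overrides).

    An entry is promoted only when the newly indexed manifest lists the same
    path with the same hash.
    """
    promoted = {p: h for p, h in override.items() if manifest.get(p) == h}
    remaining = {p: h for p, h in override.items() if p not in promoted}
    return promoted, remaining
```

Running this whenever the local hash database is refreshed means a late-arriving manifest only delays the move into the game group; the files themselves are never lost.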

Minimal Hash

The minimal hash format assumes that two files will have the same contents if they have the same size, exist on the same path
and match in a few sampled regions. The algorithm for this hash is as follows:

  • If the file is less than or equal to 128KB in size, simply hash the whole file via xxHash3
  • Otherwise, create a buffer of the first 64KB, the last 64KB and the middle 64KB (in that order)
  • Append the size of the file to the buffer (unsigned 64-bit integer)
  • Compute the xxHash3 of the buffer. This is the minimal hash

There is some overlap in the middle if the file is smaller than 192KB; this is expected. In general,
this means that at most 192KB of the file is read, instead of the entire file. In games such as Cyberpunk this may
spare the app from reading the entire 100GB of game files.
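
The steps above can be sketched as follows. Several details are assumptions, since the description does not pin them down: the middle chunk is centered on the midpoint of the file, the size is appended as a little-endian value, and a BLAKE2b-based stand-in replaces xxHash3 so the example runs with only the standard library.

```python
import hashlib
import struct
from typing import BinaryIO, Callable

CHUNK = 64 * 1024
SMALL_FILE_LIMIT = 128 * 1024


def stand_in_hash64(data: bytes) -> int:
    """Placeholder 64-bit hash; a real implementation would call xxHash3 here."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minimal_hash(
    stream: BinaryIO, size: int, hash64: Callable[[bytes], int] = stand_in_hash64
) -> int:
    """Hash small files in full; otherwise hash three 64KB samples plus the size."""
    if size <= SMALL_FILE_LIMIT:
        stream.seek(0)
        return hash64(stream.read(size))
    buffer = bytearray()
    middle_start = size // 2 - CHUNK // 2  # assumption: chunk centered on the midpoint
    for offset in (0, size - CHUNK, middle_start):  # first, last, middle (in that order)
        stream.seek(offset)
        buffer += stream.read(CHUNK)
    buffer += struct.pack("<Q", size)  # unsigned 64-bit; little-endian is an assumption
    return hash64(bytes(buffer))
```

Because bytes outside the three samples are never read, two files that differ only in an unsampled region produce the same minimal hash. That is the trade-off accepted in exchange for not reading the whole file.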

Implementation steps

  • Create a new repository for the hash database
  • Create an indexer for Steam
    • Given a PackageId, get the manifest ids and file hashes (md5)
    • Given manifest ids, download the files and generate the other hashes
  • Create an indexer for GOG
    • Given a GameId (or whatever this is called in GOG), get the file hashes (md5)
    • Given file hashes, download the files and generate the other hashes
  • Index the most recent version of each game we support. If possible, index all previous versions
    • Stardew Valley
    • Cyberpunk 2077
    • Baldur's Gate 3
    • Bannerlord
  • Link store data to human-friendly names
  • Think a bit about whether it is possible to run the above as part of a GitHub action (likely not possible for Steam). If not, consider a Docker container with manual intervention for logins (Steam Guard).
  • Create a GitHub action that will batch up the files and generate a new database release on each commit
  • In the app code, look for new updates to the hash database and update the local cache
  • Update the game finder code to export Steam and GOG manifest/depot data
  • On loadout creation figure out what game version is the default and use the authoritative source to determine the game version
  • Use the minimal hash to reduce indexing time during loadout creation
  • Remove the ignored folders from each game that were added only to speed up indexing
  • Handle game updates
    • When game files are seen to change external to the app, look if the store has updated the manifest
      • If the store has not updated the manifest, assume the user has modified the files, put the modifications in the Override group
      • If the store has updated the manifest, and the files match the manifest, update the game files in the loadout (move from Override to Game group)
      • If the store has updated the manifest, and the files do not match the manifest, put the files in the Override group
      • If the user is missing files from the manifest, add the reified delete to Overrides; we can remove it or move it if the files are added later.
    • Scan the Override group for any files that are now in the hash database and match the current manifest. These may have been left by a previous update; move them to the Game group
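
The per-file decision in the "Handle game updates" bullets can be written down as a small function. This is only a sketch: `Placement` and `place_changed_file` are hypothetical names, and `expected_hash` stands for whatever hash the store's current manifest lists for the path (`None` if it does not list the path).

```python
from enum import Enum
from typing import Optional


class Placement(Enum):
    GAME = "Game"
    OVERRIDE = "Override"
    OVERRIDE_DELETE = "Override (reified delete)"


def place_changed_file(
    disk_hash: Optional[str], manifest_updated: bool, expected_hash: Optional[str]
) -> Placement:
    """Decide where an externally changed file belongs in the loadout.

    disk_hash is None when the file is missing from disk. Callers are assumed
    to pass at least one of disk_hash or expected_hash.
    """
    if disk_hash is None:
        # A file the manifest expects is gone: record the delete in Overrides.
        return Placement.OVERRIDE_DELETE
    if not manifest_updated:
        # The store did nothing, so the user must have changed the file.
        return Placement.OVERRIDE
    if disk_hash == expected_hash:
        # The store updated and the file matches the new manifest.
        return Placement.GAME
    # The store updated, but this file does not match what it installed.
    return Placement.OVERRIDE
```

Keeping the rule in one pure function makes the four cases easy to test independently of the file system and of Game Finder.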
@halgari
Collaborator Author

halgari commented Dec 10, 2024

@Nexus-Mods/nexusmods-app-developers

Request for feedback, this is a bit of an ADR draft. I'd rather not expand this ticket much more as it's already fairly large, but I'd love to get feedback.

@halgari halgari self-assigned this Dec 10, 2024
@erri120
Member

erri120 commented Dec 10, 2024

For version mapping, I don't know where they're getting the data from, but GOGDB maps build IDs to "human versions": https://www.gogdb.org/product/1423049311#builds

@erri120
Member

erri120 commented Dec 10, 2024

SteamDB is also able to gather patch notes and correlate them to a build ID: https://steamdb.info/patchnotes/16681394

@Al12rs
Contributor

Al12rs commented Dec 10, 2024

I have a bunch of questions and potential changes/improvements that I would like to discuss after standup.

@erri120
Member

erri120 commented Dec 10, 2024

Trying to illustrate what I mentioned yesterday (2024-12-09) in our meeting about full backups and game updates:

Image

Image

After an update happens, any new files or any modified files are part of the set of game files of the new version. The full set is a combination of the full set of game files of the previously applied version and the set of new and modified files.

@Al12rs
Contributor

Al12rs commented Dec 10, 2024

Meeting notes (10/12/2024)

  • Organize the repo by game at the top level (for easier debugging), and follow the Steam/GOG structure further down, with AppId and the like, to allow adding more information
  • Think about what to do when the user has multiple loadouts and one loadout gets updated: should the GameFiles get updated in all loadouts? Should GameFiles be a shared mod with a version selector?
  • Remove hand-maintained version data for now; limit to data that can be generated automatically

@LukeNexusMods LukeNexusMods added this to the SDV Beta Release milestone Dec 12, 2024