Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Syncback to support converting Roblox files to a Rojo project #937

Open
wants to merge 372 commits into
base: master
Choose a base branch
from

Conversation

Dekkonot
Copy link
Member

Well, the time has come. Syncback is real. Over 300 commits and 9 months later, here we are.

I'd like to apologize. There are a lot of files in this pull request and most of them are for tests. I went back and forth about whether they should be included in the this PR or added in a second one, but ultimately I decided that someone would have to go through them either way so I may as well include them now. If you want them removed so that the code is easier to review, I can do so.

This is a general tech overview of what's changed, what's added, and how it works. It is't a detailed tech dive because I'm a firm believer in reading code, but this should serve as a primer for reading the code.

Overview

Syncback is a feature designed to solve the longstanding problem of converting Roblox files into Rojo projects. It works by accepting a Rojo project file and a Roblox file, then iterating through the file to match Instances to the Rojo project. Then, every Instance that's different has a middleware assigned to it that then has a method for converting the Instance back into the file type the middleware represents.

The middleware used is the old one (if an Instance is already present in the Rojo project) or a reasonable default depending upon the class of the Instance.

The path an Instance is written to is dependent upon where it is in project tree. So if you had a project like this:

{
    "tree": {
        "DataModel": {
            "ReplicatedStorage": {
                "$path": "src/Shared"
            }
        }
    }
}

All children of ReplicatedStorage would be placed under src/Shared. If one or more of those children became a directory (such as a ModuleScript with children becoming a directory with init.luau in it), then descendants of that Instance would be placed under that directory. This is recursive for every path that exists within a project file.

A direct consequence of this is that the tree of a project file controls syncback almost directly.

Change Summary

The following changes were made to Rojo to help support syncback. It does not cover syncback-specific additions, those will be later.

  • Added create_dir and create_dir_all functions to memofs
  • InstigatingSource::ProjectNode was changed to be a struct variant instead of a tuple variant
  • A fair few things were made public exports of librojo
  • Projects, JsonModels and Metadata now use BTreeMap instead of HashMap for properties and attributes
  • CSV example field now has an alias named examples
  • CSV now uses Cow<str> instead of just &str
  • Directories (both ones that become Folders and ones with init files) now use Middleware as their backing
  • The relevant_paths of directories now only contains actually existent paths
  • Middleware is now stored in the metadata for Instance
  • Middleware now displays in the web ui

Of all of these, I have an explanation for all of them except for changing what type of variant InstigatingSource::ProjectNode is. In my defense, that change as made 7 months ago and it probably made sense at the time.

You may have concerns with the BTreeMap change. I do too. I don't know what the performance hit of using a BTreeMap is at scale because I don't have any tests that uses projects, json models, or metadata files enough to stress test it. The testing I did seems to indicate it has a minimal impact because of how infrequently they're used though; at most it adds a bit of startup time for the server. A change of this nature is necessary in some capacity because the order of these maps has to be deterministic for syncback to function well.

I'm happy to answer any other questions you have on the changes.

Syncback Additions

There are three main parts to syncback:

  • The SyncbackSnapshot and the related SyncbackReturn structs
  • The syncback method for Middleware items
  • The core loop that iterates through the Roblox file provided

Throughout the code for syncback you will see that I've used the terms 'old' and 'new'. In all cases, the term 'old' refers to the Rojo project and the term 'new' refers to the Roblox file. So e.g. the 'new dom' refers to the DOM from the Roblox file. I can change this if you'd like, but it's pervasive throughout the code and I think it's cleaner than 'project' and 'roblox' or whatever.

A SyncbackSnapshot contains data for performing syncback on a specific Instance. It's utilized both by the syncback method for each Middleware as well as the core syncback loop. You'll want to read the code for exact details, but in summary these snapshots contain data that's global for syncback (things like a reference to a VFS) and data for syncing back a specific Instance from the Roblox file. This includes a path to that it should be written to and the Instance it corresponds to from the Rojo file.

A SyncbackReturn contains the actual results of running syncback on a SyncbackSnapshot. It's returned from the syncback methods for each Middleware, as you may expect. It includes a FsSnapshot that contains the serialized form of the Instance (including any meta.json files), a set of SyncbackSnapshots to run syncback over (this is used by directories and projects), and a list of paths that should be removed by Syncback. These are consumed by the core syncback loop to know what changes to make to the file system as well as to continue the syncback process.

The bulk of the work is done in the individual syncback methods for each Middleware. These are stored in the (now-misnamed) snapshot_middleware submodules along with the snapshot functions for each middleware. They accept a SyncbackSnapshot and return a SyncbackReturn. Using the information contained within the passed SyncbackSnapshot, each Middleware reserializes the provided Instance and if necessary returns a list of other Instances that need to have syncback ran on them. The vast majority of Middleware variants have a very simple syncback implementation. Directories and project files are the exception, so they're documented a bit more below.

The core loop is essentially a queue of SyncbackSnapshots that's extended by pulling the returned snapshots out of each SyncbackReturn. The loop does some comparisons to skip identical Instances, checks that a given snapshot is allowed to be written (it may be blocked by one of the syncback rules), and then determines a Middleware to use for each snapshot before running the syncback method for the Middleware.

The core loop assumes that the root of the project file and the root of the Roblox file are the same thing, so the loop is able to get started by constructing a SyncbackSnapshot that contains the roots.

This sounds simple because it is. By delegating most of the logic to the individual middleware implementations, the core design of syncback is relatively straightforward to follow.

Instance Matching

Syncback for a directory or project file has to perform matching between the Rojo project and Roblox file rather than relying upon SyncbackSnapshot like other middleware. This is because they need to generate the snapshots for their children and project nodes respectively. For directories, this also allows children that are present on the Rojo project but not in the Roblox file to be removed from the file system, which is an important quality of life feature for syncback.

Instance matching is done by name, since the file system requires unique names unlike Roblox. This runs into some problems with generic Instance names (such as 'Tree' or 'Part') so it isn't perfect. Rojo's project format also doesn't support multiple models having the same name but both being in the same place on the file system. Unless we want to adopt a method of this nature (multiroot models, multiple paths for a folder, or whatever), matching by name is going to have to work.

This is a surprisingly complex problem but both Dir and Project middlewares have a fair few comments describing the implementation as a result. Thus, I will simply encourage you to read the implementation and then ask me questions when you have them. It will be easier that way.

Instance Comparisons

Syncback's core loop compare Instances from the SyncbackSnapshot and skips if them and all of their descendants are identical. This is done for UX more than anything, because it makes for a bad experience if files are rewritten when they don't have to be. Syncback can't just naively skip if two Instances are the same (identical name, classname, and properties map) because it's is designed such that parents initiate syncback for their children.

To quickly tell if two Instances are identical, there's a variety of ways you could handle it. The easiest one is to just compare two Instances' names, class name, and properties. This works well, but it doesn't work for comparing subtrees. To do that, you'd potentially have to compare infinite Instances from each tree and match them by name and class and... Well, it gets out of hand. So, rather than doing that syncback needed a way to quickly and efficiently compare entire subtrees, regardless of the number of descendants they had.

This problem was solved by hashing and comparing the hashes of two different Instances. At a glance, this doesn't seem to solve the descendant comparison problem, but by making it so that each parent incorporates the hashes of its children, it makes it so that hashes are only identical if and only if their descendants are also identical. All it takes to accomplish this is by hashing a DOM depth-first and then making it so that a parent incorporates the hashes of its children during the hashing process. This has the consequence of making it so you have to frontload calculating hashes, but it's relatively quick so it isn't a big deal. Syncback precomputes hashes before the core loop even begins so that there's only an initial overhead.

The current hashing algorithm used is Blake3. There's no real reasoning behind this except that Blake3 is used by rbx-types already for hashing SharedString, so it doesn't add a direct dependency. Really the only requirement is that the hash be reasonably quick (not cryptographic), and to that end something like XXH3 or GxHash may be more appropriate, but that's mostly hypothetical. The hashing process is by no means a bottleneck for syncback.

An interesting note is the prospect of hashing floating point numbers. As most Roblox data types use floats, it's inevitable that we run into imprecisions. To try to mitigate this, floats are rounded somewhat (literally via: ((x * 10.0).round() / 10.0)) before hashing, but that only solves minor imprecisions. I do not have a solution for larger ones. In practice, this has proven to not be a big deal for Adopt Me and other tests, but it's still worth noting as a potential problem.

Also add ParticleEmitter as modeljson and remove screengui exceptionalism
Copy link
Contributor

@nezuo nezuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project file formatting

Syncback essentially forces you to use rojo's json formatting. I would be fine with this if the formatting was better. This can be done in a separate PR (I'll try to make an issue with my specific issues with the formatting) but I do think the default formatting needs to be improved.

Edit: Made an issue: #962. I noticed that syncback doesn't actually format the project file the same was fmt-project does. I definitely think the should match!

Directories are shown as changed when a child changes

image
I modified the src/server/init.server.luau file. It says that it will also write to src/server which is confusing and somewhat redundant since the src/server path is in the first change already.

default.project.json is marked as change every syncback even though it doesn't change at all

Repro file:
rojo-syncback-testing-bug.zip

Syncback adds lots of properties after first save

image
image
I'm not really sure what could or should be done about this but I thought I should bring it up at least.
I noticed after I built the default rojo project file and synced it back, Lighting was added with some default properties. Could we avoid syncing back properties that are default?

src/cli/syncback.rs Outdated Show resolved Hide resolved
Copy link
Member

@kennethloeffler kennethloeffler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE avoiding syncing back default properties: the implementation here already attempts to do so, but it won't work for properties that don't have defaults in the reflection database, which can happen when the reflection database is out of date. This is what happened for most of the Workspace properties in the screenshot, except for StreamingEnabled and WorldPivotData, which are not at their default values. We should update the reflection database (and all the rest of the rbx-dom libraries) on master, and maybe the two mentioned properties should have their defaults patched?

I'm also not seeing any unexpected Lighting properties after saving the rbxl of rojo init's "place" template in Roblox Studio, then syncing back - did you mean Workspace and/or StarterPlayer?

It looks like all the StarterPlayer properties appear because of a small mistake in variant_eq:

src/variant_eq.rs Outdated Show resolved Hide resolved
@nezuo
Copy link
Contributor

nezuo commented Aug 18, 2024

I'm also not seeing any unexpected Lighting properties after saving the rbxl of rojo init's "place" template in Roblox Studio, then syncing back - did you mean Workspace and/or StarterPlayer?

That's my mistake on the Lighting properties, I thought they weren't already apart of the default project file.

@kennethloeffler
Copy link
Member

It looks like the change in 47c702b slightly borked the warning message for when a property is unrepresentable in json:

[WARN  librojo::snapshot_middleware::project] Rojo cannot serialize the property Workspace.ModelMeshData in project files.
        If this is not acceptable, resave the Instance at '' manually as an RBXM or RBXMX.

Previously, this warning would say:

[WARN  librojo::snapshot_middleware::project] Rojo cannot serialize the property Workspace.ModelMeshData in project files.
        If this is not acceptable, resave the Instance at 'ken-test-project' manually as an RBXM or RBXMX.

@Dekkonot
Copy link
Member Author

I will write a better test harness for syncback that ignores new properties so that we don't have to change them every time a new database update happens. For now though, I'm just updating them manually after database updates.

@Dekkonot
Copy link
Member Author

Issue with the warning should be fixed. I've also prevented RBX_ attributes from serializing in JSON files because they're internal facing and thus we don't like them.

Copy link
Member

@kennethloeffler kennethloeffler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny problem: the protected attribute name prefix is actually RBX (no trailing underscore), which you can verify by running something like

game:SetAttribute("RBXVerySpecialAttribute", true)

in Roblox Studio's command bar.

Since we're not sure what sort of process (if any) Roblox uses when rolling out protected attributes, we have no reason to believe that the underscore will be present, so we should check for an RBX prefix.

src/snapshot_middleware/json_model.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/meta_file.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/meta_file.rs Outdated Show resolved Hide resolved
src/snapshot_middleware/project.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope: cli Relevant to the Rojo CLI size: very large type: enhancement Feature or improvement that should potentially happen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants