Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SharedCache] Fix slide info parsing #6172

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

WeiN76LQh
Copy link

The issue this commit fixes was causing SharedCache::ParseAndApplySlideInfoForFile to completely fail to work with v5 slide info, which had a lot of knock on effects, i.e. lots of Objective-C analysis was failing due to invalid pointers which hadn't been fixed up.

dyld_cache_slide_info5 has 4 bytes of padding before value_add. Whilst value_add is not actually being read from, SharedCache::ParseAndApplySlideInfoForFile will read at a location in the file based on the size of the structure dyld_cache_slide_info5. This being off by 4 bytes basically broke v5 slide info fixups.

With this fix many more Objective-C functions have names and a lot more msgSend calls are fixed up.

The issue this commit fixes was causing `SharedCache::ParseAndApplySlideInfoForFile` to completely fail to work with v5 slide info, which had a lot of knock on effects, i.e. lots of Objective-C analysis was failing due to invalid pointers which hadn't been fixed up.

`dyld_cache_slide_info5` has 4 bytes of padding before `value_add`. Whilst `value_add` is not actually being read from, `SharedCache::ParseAndApplySlideInfoForFile` will read at a location in the file based on the size of the structure `dyld_cache_slide_info5`. This being off by 4 bytes basically broke v5 slide info fixups.

With this fix many more Objective-C functions have names and a lot more `msgSend` calls are fixed up.
Prior to this commit the function `DSCObjCProcessor::PostProcessObjCSections` never does anything because it doesn't use the correct names to get the Objective-C sections of the recently loaded library. In fact it never does anything because the DSC never has sections with the names its searching for.

This commit passes the `baseName` (the name of the library that was loaded), which is what other Objective-C section processing code does. Combining the base name with the section names it will now find them and process them as intended. This was resulting in alot of Objective-C related stuff being missed.

There is however still an issue of the fact that the way this DSC plugin works means it only analyzes Objective-C sections once. This catches alot of things but there are a number of cases where other libraries need to be loaded first due to information being referenced in another library. For instance errors like `Failed to determine base classname for category` can be caused by the class reference in the category being to a class outside of the loaded library. Once the library containing the class has been loaded, the section containing the category should be re-proccessed.
@WeiN76LQh WeiN76LQh changed the title [SharedCache] Fix padding issue with v5 slide info [SharedCache] Fix v5 slide info & Objective-C post-processing Nov 21, 2024
@WeiN76LQh
Copy link
Author

I forgot there was another issue with Objective-C processing that meant a bunch was being skipped. This is fixed by the 2nd commit.

Something which I didn't fully touch on in the commit is the issue of processing Objective-C sections which may require data from another image that is not yet loaded. I think the naive but simplistic approach would be to go through all Objective-C sections after loading an image and re-processing all of them. I think this would solve the problem but it would be quite costly, particularly if you're loading alot of sections. In that case it would make sense to have the ability to batch load images and then defer Objective-C processing until the end. Additionally this may mess with changes that the user has made, although I think its unlikely the user would be messing about with Objective-C sections unless they're using a script, in which case they could re-run their script.

What probably makes the most sense in the long run is to keep track of any bits of Objective-C data that failed analysis and then re-analyze those bits after loading of images.

@WeiN76LQh WeiN76LQh marked this pull request as ready for review November 21, 2024 19:13
@bdash
Copy link
Contributor

bdash commented Nov 22, 2024

I took this branch for a quick test and found that it breaks Obj-C message names when loading from the macOS 15.1 shared cache.

Screenshot 2024-11-21 at 15 03 26

I think this is tickling another existing bug in how dyld_cache_slide_info5 is handled rather than being the fault of your change.

dyld checks page_start[i] for DYLD_CACHE_SLIDE_V5_PAGE_ATTR_NO_REBASE. If not equal, it divides page_start[i] by sizeof(uint64_t) and uses that value as the initial offset into the page.

The code in ParseAndApplySlideInfoForFile for dyld_cache_slide_info5 leaves page_start[i] as-is. It works correctly when page_start[i] is 0, which is admittedly the most common case, but generates garbage otherwise.

-                                       pageStartFileOffsets.push_back(mapping.mappingInfo.fileOffset + (pageSize * i) + pageStart);
+                                       pageStartFileOffsets.push_back(mapping.mappingInfo.fileOffset + (pageSize * i) + (pageStart / 8));

@WeiN76LQh
Copy link
Author

-                                       pageStartFileOffsets.push_back(mapping.mappingInfo.fileOffset + (pageSize * i) + pageStart);
+                                       pageStartFileOffsets.push_back(mapping.mappingInfo.fileOffset + (pageSize * i) + (pageStart / 8));

I'm not 100% sure this is enough/or correct, I still had issues with v5 slides at least. I've gone and reworked the slide parsing code and think I've got it fixed. The way "starts" were pulled out into a vector and then processed below in another loop seemed to be making things difficult and disjointed for no visible benefit. Instead I'm processing "starts" in the loops where they are read and largely mimicking exactly what dyld does. I'm just trying to test things to ensure I've got it correct.

@WeiN76LQh WeiN76LQh changed the title [SharedCache] Fix v5 slide info & Objective-C post-processing [SharedCache] Fix slide info parsing & Objective-C post-processing Nov 23, 2024
There seem to be a number of issues with slide info parsing. I decided the simplest fix was to largely mimick what `dyld` is doing.

Part of this process was to remove creating a vector of page starts and processing them in another loop below the one where they were read out. It wasn't clear to me the original design decision to separate it into 2 loops like that. I think this was part of the problem that was causing issues.

By adding rewrites in the same loop where page starts are being read out, it was much easier to mimick the code in `dyld` which I assume has to be correct. So as long as my copying was correct then I believe this should work as intended.

Not everything has been thoroughly tested but I'm pretty confident v3 and v5 are now working as intended. v2 should be but less testing of it has been done.
@WeiN76LQh
Copy link
Author

The latest commit re-writes some of the slide parsing stuff. More details in the commit message but I think at this point it might be working. Not everything has been thoroughly tested but I'm pretty confident v3 and v5 are now working as intended. v2 should be but less testing of it has been done.

@bdash
Copy link
Contributor

bdash commented Nov 23, 2024

I did some testing against the macOS 15.1 dyld shared cache I was seeing problems with yesterday. I've confirmed the slide handling exactly matches what dyld is doing, so this change looks good.

Sadly I'm still seeing bogus Obj-C selector names. I'm guessing that's a result of Obj-C data being processed that wasn't before and that there's a pre-existing bug in parsing Obj-C metadata that wasn't visible due to the data being skipped. I'll look into that separately.

@WeiN76LQh
Copy link
Author

Sadly I'm still seeing bogus Obj-C selector names. I'm guessing that's a result of Obj-C data being processed that wasn't before and that there's a pre-existing bug in parsing Obj-C metadata that wasn't visible due to the data being skipped. I'll look into that separately.

Does this happen to be a result of this comment on issue 6173?

@WeiN76LQh WeiN76LQh changed the title [SharedCache] Fix slide info parsing & Objective-C post-processing [SharedCache] Fix slide info parsing Nov 25, 2024
@WeiN76LQh
Copy link
Author

I've removed the fixes for the Objective-C code because it didn't make sense to be part of this pull request. I'll re-add it in another one which is focussed on some Objective-C changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants