Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SharedCache] Use m_exportInfos as an export list cache #6197

Open
wants to merge 6 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 73 additions & 50 deletions view/sharedcache/core/SharedCache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2634,33 +2634,34 @@ void SharedCache::InitializeHeader(

if (header.exportTriePresent && header.linkeditPresent && vm->AddressIsMapped(header.linkeditSegment.vmaddr))
{
auto symbols = SharedCache::ParseExportTrie(vm->MappingAtAddress(header.linkeditSegment.vmaddr).first.fileAccessor->lock(), header);
std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> exportMapping;
auto symbols = GetExportListForHeader(header, [&]() {
return vm->MappingAtAddress(header.linkeditSegment.vmaddr).first.fileAccessor->lock();
});
for (const auto& symbol : symbols)
{
exportMapping.push_back({symbol->GetAddress(), {symbol->GetType(), symbol->GetRawName()}});
auto bnSymbol = new Symbol(symbol.second.first, symbol.second.second, symbol.first);
if (typeLib)
{
auto type = m_dscView->ImportTypeLibraryObject(typeLib, {symbol->GetFullName()});
auto type = m_dscView->ImportTypeLibraryObject(typeLib, {symbol.second.second});

if (type)
{
view->DefineAutoSymbolAndVariableOrFunction(view->GetDefaultPlatform(), symbol, type);
view->DefineAutoSymbolAndVariableOrFunction(view->GetDefaultPlatform(), bnSymbol, type);
}
else
view->DefineAutoSymbol(symbol);
view->DefineAutoSymbol(bnSymbol);

if (view->GetAnalysisFunction(view->GetDefaultPlatform(), symbol->GetAddress()))
if (view->GetAnalysisFunction(view->GetDefaultPlatform(), symbol.first))
{
auto func = view->GetAnalysisFunction(view->GetDefaultPlatform(), symbol->GetAddress());
if (symbol->GetFullName() == "_objc_msgSend")
auto func = view->GetAnalysisFunction(view->GetDefaultPlatform(), symbol.first);
if (symbol.second.second == "_objc_msgSend")
{
func->SetHasVariableArguments(false);
}
else if (symbol->GetFullName().find("_objc_retain_x") != std::string::npos || symbol->GetFullName().find("_objc_release_x") != std::string::npos)
else if (symbol.second.second.find("_objc_retain_x") != std::string::npos || symbol.second.second.find("_objc_release_x") != std::string::npos)
{
auto x = symbol->GetFullName().rfind("x");
auto num = symbol->GetFullName().substr(x + 1);
auto x = symbol.second.second.rfind("x");
auto num = symbol.second.second.substr(x + 1);

std::vector<BinaryNinja::FunctionParameter> callTypeParams;
auto cc = m_dscView->GetDefaultArchitecture()->GetCallingConventionByName("apple-arm64-objc-fast-arc-" + num);
Expand All @@ -2673,9 +2674,8 @@ void SharedCache::InitializeHeader(
}
}
else
view->DefineAutoSymbol(symbol);
view->DefineAutoSymbol(bnSymbol);
}
m_exportInfos[header.textBase] = exportMapping;
}
view->EndBulkModifySymbols();

Expand Down Expand Up @@ -2779,6 +2779,31 @@ std::vector<Ref<Symbol>> SharedCache::ParseExportTrie(std::shared_ptr<MMappedFil
return symbols;
}


std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> SharedCache::GetExportListForHeader(SharedCacheMachOHeader header, std::function<std::shared_ptr<MMappedFileAccessor>()> provideLinkeditFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a copy of this chonky vector each time it is called. If the thread-safety story permits it, it would be better to return a const reference instead.

It might be worth replacing this whacky nested std::pair with a struct with named fields that describe what they contain. That'd clear up a lot of the confusing pair.first / pair.second.first stuff. There's an ExportNode structure declared in this file that looks to be intended for this purpose, but is currently never used for anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be possible as once a vector is created it shouldn't be reallocated by m_exportInfos?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread safety problems arise when m_exportInfos is modified. If the hash table needs to grow it will reallocate and moves its contents. This would invalidate any existing references to data it stores.

One solution that doesn't require any additional locking would be to add a layer of indirection: have m_exportInfos store a std::unique_ptr to the vector. Then the vector itself will never move, only the pointer that owns it. You could then safely return a reference to the pointed-at vector from this function.

Copy link
Author

@WeiN76LQh WeiN76LQh Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure using std::unique_ptr is pratical here because m_exportInfos in ViewStateCacheStore causes compiler errors due to copying mechanics when m_exportInfos has a unique_ptr type in its template. shared_ptr works fine and I don't see much of an issue in using that instead.

{
if (auto it = m_exportInfos.find(header.textBase); it != m_exportInfos.end())
{
return it->second;
}
else
{
std::shared_ptr<MMappedFileAccessor> linkeditFile = provideLinkeditFile();
if (!linkeditFile)
return std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>>();

auto exportList = SharedCache::ParseExportTrie(linkeditFile, header);
std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> exportMapping(exportList.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now the only caller of ParseExportTrie, it'd be beneficial to have it produce data in the format that this expects. This will eliminate the need for ParseExportTrie / ReadExportNode to allocate a bunch of Symbol instances only for this code to copy data out of them and then discard them. All those memory allocations and copying all of the names around is surprisingly expensive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was lazy of me, I'll add another commit to do that

for (const auto& sym : exportList)
{
exportMapping.push_back({sym->GetAddress(), {sym->GetType(), sym->GetRawName()}});
}
m_exportInfos[header.textBase] = exportMapping;
return exportMapping;
}
}


std::vector<std::string> SharedCache::GetAvailableImages()
{
std::vector<std::string> installNames;
Expand All @@ -2794,30 +2819,32 @@ std::vector<std::pair<std::string, Ref<Symbol>>> SharedCache::LoadAllSymbolsAndW
{
std::unique_lock<std::mutex> initialLoadBlock(viewSpecificMutexes[m_dscView->GetFile()->GetSessionId()].viewOperationsThatInfluenceMetadataMutex);

bool doSave = false;
std::vector<std::pair<std::string, Ref<Symbol>>> symbols;
for (const auto& img : m_images)
{
auto header = HeaderForAddress(img.headerLocation);
std::shared_ptr<MMappedFileAccessor> mapping;
try {
mapping = MMappedFileAccessor::Open(m_dscView, m_dscView->GetFile()->GetSessionId(), header->exportTriePath)->lock();
}
catch (...)
{
m_logger->LogWarn("Serious Error: Failed to open export trie %s for %s", header->exportTriePath.c_str(), header->installName.c_str());
continue;
}
auto exportList = SharedCache::ParseExportTrie(mapping, *header);
std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> exportMapping;
auto exportList = GetExportListForHeader(*header, [&]() {
try {
auto mapping = MMappedFileAccessor::Open(m_dscView, m_dscView->GetFile()->GetSessionId(), header->exportTriePath)->lock();
doSave = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only difference between the various "open a file" lambdas passed to GetExportListForHeader? If so you could simplify these call sites by having GetExportListForHeader take bool* didModifyExportList = nullptr. Callers that are interested in knowing that m_exportInfos was modified could pass a pointer to bool in. Others would omit it. This would let you move the file opening logic into GetExportListForHeader.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done this but the call in SharedCache::InitializeHeader has a different way of loading a mapped file.

bool* didModifyExportList = nullptr - I didn't do this just cause it seemed this is the only instance where it mattered so it seemed 50/50 whether the extra parameter made sense or what I did. For future stuff though it might make more sense to have the parameter so its clear to a caller that they need to do a save if they didn't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that SharedCache::InitializeHeader is doing something different. Bummer!

return mapping;
}
catch (...)
{
m_logger->LogWarn("Serious Error: Failed to open export trie %s for %s", header->exportTriePath.c_str(), header->installName.c_str());
return std::shared_ptr<MMappedFileAccessor>(nullptr);
}
});
for (const auto& sym : exportList)
{
exportMapping.push_back({sym->GetAddress(), {sym->GetType(), sym->GetRawName()}});
symbols.push_back({img.installName, sym});
symbols.push_back({img.installName, new Symbol(sym.second.first, sym.second.second, sym.first)});
}
m_exportInfos[header->textBase] = exportMapping;
}

SaveToDSCView();
// Only save to DSC view if a header was actually loaded
if (doSave)
SaveToDSCView();

return symbols;
}
Expand Down Expand Up @@ -2874,17 +2901,6 @@ void SharedCache::FindSymbolAtAddrAndApplyToAddr(uint64_t symbolLocation, uint64
auto header = HeaderForAddress(symbolLocation);
if (header)
{
std::shared_ptr<MMappedFileAccessor> mapping;
try {
mapping = MMappedFileAccessor::Open(m_dscView, m_dscView->GetFile()->GetSessionId(), header->exportTriePath)->lock();
}
catch (...)
{
m_logger->LogWarn("Serious Error: Failed to open export trie for %s", header->installName.c_str());
return;
}
auto exportList = SharedCache::ParseExportTrie(mapping, *header);
std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> exportMapping;
std::unique_lock<std::mutex> lock(viewSpecificMutexes[m_dscView->GetFile()->GetSessionId()].typeLibraryLookupAndApplicationMutex);
auto typeLib = m_dscView->GetTypeLibrary(header->installName);
if (!typeLib)
Expand All @@ -2897,29 +2913,40 @@ void SharedCache::FindSymbolAtAddrAndApplyToAddr(uint64_t symbolLocation, uint64
}
}
lock.unlock();

auto exportList = GetExportListForHeader(*header, [&]() {
try {
return MMappedFileAccessor::Open(m_dscView, m_dscView->GetFile()->GetSessionId(), header->exportTriePath)->lock();
}
catch (...)
{
m_logger->LogWarn("Serious Error: Failed to open export trie %s for %s", header->exportTriePath.c_str(), header->installName.c_str());
return std::shared_ptr<MMappedFileAccessor>(nullptr);
}
});

id = m_dscView->BeginUndoActions();
m_dscView->BeginBulkModifySymbols();
for (const auto& sym : exportList)
{
exportMapping.push_back({sym->GetAddress(), {sym->GetType(), sym->GetRawName()}});
if (sym->GetAddress() == symbolLocation)
if (sym.first == symbolLocation)
{
if (auto func = m_dscView->GetAnalysisFunction(m_dscView->GetDefaultPlatform(), targetLocation))
{
m_dscView->DefineUserSymbol(
new Symbol(FunctionSymbol, prefix + sym->GetFullName(), targetLocation));
new Symbol(FunctionSymbol, prefix + sym.second.second, targetLocation));

if (typeLib)
if (auto type = m_dscView->ImportTypeLibraryObject(typeLib, {sym->GetFullName()}))
if (auto type = m_dscView->ImportTypeLibraryObject(typeLib, {sym.second.second}))
func->SetUserType(type);
}
else
{
m_dscView->DefineUserSymbol(
new Symbol(sym->GetType(), prefix + sym->GetFullName(), targetLocation));
new Symbol(sym.second.first, prefix + sym.second.second, targetLocation));

if (typeLib)
if (auto type = m_dscView->ImportTypeLibraryObject(typeLib, {sym->GetFullName()}))
if (auto type = m_dscView->ImportTypeLibraryObject(typeLib, {sym.second.second}))
m_dscView->DefineUserDataVariable(targetLocation, type);
}
if (triggerReanalysis)
Expand All @@ -2931,10 +2958,6 @@ void SharedCache::FindSymbolAtAddrAndApplyToAddr(uint64_t symbolLocation, uint64
break;
}
}
{
std::unique_lock<std::mutex> _lock(viewSpecificMutexes[m_dscView->GetFile()->GetSessionId()].viewOperationsThatInfluenceMetadataMutex);
m_exportInfos[header->textBase] = exportMapping;
}
m_dscView->EndBulkModifySymbols();
m_dscView->ForgetUndoActions(id);
}
Expand Down
1 change: 1 addition & 0 deletions view/sharedcache/core/SharedCache.h
Original file line number Diff line number Diff line change
Expand Up @@ -1196,6 +1196,7 @@ namespace SharedCacheCore {
const std::string& currentText, size_t cursor, uint32_t endGuard);
std::vector<Ref<Symbol>> ParseExportTrie(
std::shared_ptr<MMappedFileAccessor> linkeditFile, SharedCacheMachOHeader header);
std::vector<std::pair<uint64_t, std::pair<BNSymbolType, std::string>>> GetExportListForHeader(SharedCacheMachOHeader header, std::function<std::shared_ptr<MMappedFileAccessor>()> provideLinkeditFile);
};


Expand Down