-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add macho parser for use by C inject_hash #1435
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1435 +/- ##
==========================================
+ Coverage 77.01% 77.17% +0.15%
==========================================
Files 426 426
Lines 71738 71449 -289
==========================================
- Hits 55252 55139 -113
+ Misses 16486 16310 -176 ☔ View full report in Codecov by Sentry. |
ae4bdd8
to
6674cdd
Compare
fseek(file, macho->sections[i].offset, SEEK_SET); | ||
fread(section_data, 1, macho->sections[i].size, file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return values on these calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
if( 0 != fseek(file, macho->sections[i].offset, SEEK_SET)) {
free(section_data); // if section_data is local to loop
LOG_ERROR("Failed to seek in file %s", filename);
goto end;
}
|
||
uint8_t* get_macho_section_data(const char *filename, machofile *macho, const char *section_name, size_t *size, uint32_t *offset) { | ||
FILE *file = NULL; | ||
uint8_t *section_data = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move the declaration of section_data
down to the first place it's assigned to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm under the impression that I need to leave it out of the for loop in order to be able to free it, unless I'm misunderstanding you here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have it like this:
...
for (uint32_t i = 0; i < macho->num_sections; i++) {
if (strcmp(macho->sections[i].name, section_name) == 0) {
uint8_t *section_data = malloc(macho->sections[i].size); // declaration
if (section_data == NULL) {
LOG_ERROR("Error allocating memory for section data");
goto end;
}
fseek(file, macho->sections[i].offset, SEEK_SET);
uint32_t bytes_read = fread(section_data, 1, macho->sections[i].size, file);
if (bytes_read != macho->sections[i].size) {
free(section_data); // free on error
LOG_ERROR("Error reading section data from file %s", filename);
goto end;
}
...
if (strcmp(segment->segname, "__TEXT") == 0) { | ||
section_data *sections = (section_data *)&segment[1]; | ||
for (uint32_t j = 0; j < segment->nsects; j++) { | ||
if (strcmp(sections[j].sectname, "__text") == 0 || strcmp(sections[j].sectname, "__const") == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of incrementing a section_index
counter and using it as an index, why not just have an explicit index at which each section (e.g., "__text" at 0, "__const" at 1, etc.) is stored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current implementation is more easily expandable if we ever decide that we're interested in more than just the __text
and __const
sections, and I'm unsure if changing it around provides any added benefit to our current use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a known location for each section could make it more robust to potential(?) changes in the order that sections are processed. I don't see it limiting expansion to handle other sections; sections being added to the processing would likewise be assigned to a specific index.
6674cdd
to
e66306e
Compare
f592630
to
87a8750
Compare
int const_found = 0; | ||
int symtab_found = 0; | ||
|
||
for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic looks suspcious. Is load_commands[i].cmdsize
not always the same as sizeof(struct load_command)
? This logic assumes that it can be an exact multiple (>1). A comment here should explain the "why".
Otherwise, if the two values are equal it could be simplified:
const uint32_t num_cmds = macho->macho_header.sizeofcmds / sizeof(struct load_command);
for (uint32_t i = 0; i < num_cmds; i += 1) {
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this works is that sizeof(struct load_command) results in 8 bytes (the struct consists of two uint32_t
s). The documentation in loader.h
says that .cmdsize
is always a multiple of 8 on 64bit systems and .sizeofcmds
is the sum of all present .cmdsize
fields, so this should work here.
I'll add a comment explaining this, but I don't think your simplified code works since your code will increment i
by sizeof(struct load_command)
(which will always be 8) instead of the actual size of the command provided by load_commands[i].cmdsize
.
87a8750
to
09a89bc
Compare
.strsize = symtab_command_strsize, | ||
}; | ||
|
||
if (fwrite(&test_header, sizeof(struct mach_header_64), 1, file) != 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno what the definition are of these structures. But generally sizeof(struct) != <sum of size of fields in struct>
.
This should work locally here, because you use sizeof(struct) to be both write and read. But in general, this is not the correct way to serialise.
int symtab_found = 0; | ||
|
||
// mach-o/loader.h explains that cmdsize (and by extension sizeofcmds) must be a multiple of 8 on 64-bit systems. struct load_command will always be 8 bytes. | ||
for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) { | |
for (size_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) { |
Idiomatic to use size_t
. If one can speak of idiomatic C. Same for the other occurrences.
613e5ae
to
a064787
Compare
900b33b
900b33b
to
a9a1d4c
Compare
Issues:
Addresses CryptoAlg-2240
Description of changes:
Adds a macho file parser and tests in anticipation of usage in an upcoming C language replacement for
inject_hash.go
Call-outs:
Since this is only meant to be used in a specific use-case, it's reading capabilities are tailored to only read & store information on the parts of the macho file that we're interested in (__test section, __const section, the string table, and the symbol table. We can open another PR to add additional capabilities if the need arises.
Testing:
Tests pass locally
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.