Add macho parser for use by C inject_hash #1435

billbo-yang · 2024-02-06T20:30:55Z

Issues:

Addresses CryptoAlg-2240

Description of changes:

Adds a macho file parser and tests in anticipation of usage in an upcoming C language replacement for inject_hash.go

Call-outs:

Since this is only meant to be used in a specific use-case, its reading capabilities are tailored to read and store only the parts of the macho file that we're interested in (the __text section, the __const section, the string table, and the symbol table). We can open another PR to add additional capabilities if the need arises.

Testing:

Tests pass locally

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

codecov-commenter · 2024-02-06T20:57:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.17%. Comparing base (1e4601e) to head (613e5ae).
Report is 25 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1435      +/-   ##
==========================================
+ Coverage   77.01%   77.17%   +0.15%     
==========================================
  Files         426      426              
  Lines       71738    71449     -289     
==========================================
- Hits        55252    55139     -113     
+ Misses      16486    16310     -176


justsmth · 2024-02-20T21:10:28Z

util/fipstools/inject_hash/macho_parser/macho_parser.c

+            fseek(file, macho->sections[i].offset, SEEK_SET);
+            fread(section_data, 1, macho->sections[i].size, file);


Check the return values on these calls.

Suggestion:

if (0 != fseek(file, macho->sections[i].offset, SEEK_SET)) {
    free(section_data); // if section_data is local to loop
    LOG_ERROR("Failed to seek in file %s", filename);
    goto end;
}
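A minimal sketch of the checked seek-and-read pattern, pulled into a standalone helper. The name read_section_checked and its signature are hypothetical and not part of this PR; logging is left to the caller so the sketch does not depend on LOG_ERROR.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not from the PR): reads `size` bytes starting at
// `offset` from `file`. Returns a malloc'd buffer that the caller must free,
// or NULL if allocation, seeking, or reading fails.
static uint8_t *read_section_checked(FILE *file, long offset, size_t size) {
    if (file == NULL || size == 0) {
        return NULL;
    }
    uint8_t *data = malloc(size);
    if (data == NULL) {
        return NULL;
    }
    // fseek returns 0 on success; fread returns the number of items read,
    // which is smaller than `size` on a short read or an error.
    if (fseek(file, offset, SEEK_SET) != 0 ||
        fread(data, 1, size, file) != size) {
        free(data);
        return NULL;
    }
    return data;
}
```

Because the buffer is freed on every failure path inside the helper, the caller only has one pointer to check and one pointer to free.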

justsmth · 2024-02-20T21:11:35Z

util/fipstools/inject_hash/macho_parser/macho_parser.c

+
+uint8_t* get_macho_section_data(const char *filename, machofile *macho, const char *section_name, size_t *size, uint32_t *offset) {
+    FILE *file = NULL;
+    uint8_t *section_data = NULL;


You can move the declaration of section_data down to the first place it's assigned to.

billbo-yang · 2024-02-28

I'm under the impression that I need to leave it out of the for loop in order to be able to free it, unless I'm misunderstanding you here?

justsmth · 2024-03-05

You could have it like this:

...
for (uint32_t i = 0; i < macho->num_sections; i++) {
    if (strcmp(macho->sections[i].name, section_name) == 0) {
        uint8_t *section_data = malloc(macho->sections[i].size); // declaration
        if (section_data == NULL) {
            LOG_ERROR("Error allocating memory for section data");
            goto end;
        }
        fseek(file, macho->sections[i].offset, SEEK_SET);
        uint32_t bytes_read = fread(section_data, 1, macho->sections[i].size, file);
        if (bytes_read != macho->sections[i].size) {
            free(section_data); // free on error
            LOG_ERROR("Error reading section data from file %s", filename);
            goto end;
        }
...

justsmth · 2024-02-20T21:48:04Z

util/fipstools/inject_hash/macho_parser/macho_parser.c

+            if (strcmp(segment->segname, "__TEXT") == 0) {
+                section_data *sections = (section_data *)&segment[1];
+                for (uint32_t j = 0; j < segment->nsects; j++) {
+                    if (strcmp(sections[j].sectname, "__text") == 0 || strcmp(sections[j].sectname, "__const") == 0) {


Instead of incrementing a section_index counter and using it as an index, why not just have an explicit index at which each section (e.g., "__text" at 0, "__const" at 1, etc.) is stored?

billbo-yang · 2024-02-27

I think the current implementation is more easily expandable if we ever decide that we're interested in more than just the __text and __const sections, and I'm unsure if changing it around provides any added benefit to our current use-case.

justsmth · 2024-03-05

I think having a known location for each section could make it more robust to potential(?) changes in the order that sections are processed. I don't see it limiting expansion to handle other sections; sections being added to the processing would likewise be assigned to a specific index.
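One way the fixed-index idea could look, as a sketch only. The enum, its values, and the helper section_slot_for are hypothetical names chosen for illustration; the PR's own layout may differ.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical slot layout: every section of interest owns a fixed index,
// independent of the order in which the file lists its sections.
enum section_slot {
    SECTION_TEXT = 0,
    SECTION_CONST = 1,
    SECTION_SLOT_COUNT
};

// Returns the fixed slot for a section name, or -1 if the section is not
// one we store.
static int section_slot_for(const char *sectname) {
    static const char *const names[SECTION_SLOT_COUNT] = {
        [SECTION_TEXT] = "__text",
        [SECTION_CONST] = "__const",
    };
    for (int i = 0; i < SECTION_SLOT_COUNT; i++) {
        if (strcmp(sectname, names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

Supporting another section under this scheme would mean adding one enum value and one entry in the names table.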

justsmth · 2024-03-18T18:00:08Z

util/fipstools/inject_hash/macho_parser/macho_parser.c

+    int const_found = 0;
+    int symtab_found = 0;
+
+    for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {


This logic looks suspicious. Is load_commands[i].cmdsize not always the same as sizeof(struct load_command)? This logic assumes that it can be an exact multiple (>1). A comment here should explain the "why".

Otherwise, if the two values are equal it could be simplified:

const uint32_t num_cmds = macho->macho_header.sizeofcmds / sizeof(struct load_command);
for (uint32_t i = 0; i < num_cmds; i += 1) {
...

billbo-yang · 2024-03-21

The reason this works is that sizeof(struct load_command) is 8 bytes (the struct consists of two uint32_t fields). The documentation in loader.h says that .cmdsize is always a multiple of 8 on 64-bit systems and .sizeofcmds is the sum of all present .cmdsize fields, so this should work here.

I'll add a comment explaining this, but I don't think your simplified code works, since it advances by sizeof(struct load_command) (always 8 bytes) on every iteration instead of by the actual size of the command given by load_commands[i].cmdsize.
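The same walk can also be written with a byte offset instead of an index scaled by sizeof(struct load_command). This is a sketch rather than the PR's code: struct lc_header is a stand-in for struct load_command so the example builds without mach-o/loader.h, and count_load_commands is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Stand-in for struct load_command from mach-o/loader.h (two uint32_t
// fields), redefined here so the sketch builds on non-Apple platforms.
struct lc_header {
    uint32_t cmd;
    uint32_t cmdsize;
};

// Counts the load commands in a buffer of `sizeofcmds` bytes by advancing a
// byte offset by each command's own cmdsize. Returns -1 on malformed input.
static int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds) {
    int count = 0;
    uint32_t offset = 0;
    while (offset < sizeofcmds) {
        struct lc_header lc;
        if (sizeofcmds - offset < sizeof(lc)) {
            return -1;  // not enough bytes left for a command header
        }
        memcpy(&lc, cmds + offset, sizeof(lc));
        // A cmdsize smaller than the header (including 0) would stall the
        // loop; one larger than the remaining bytes would overrun the buffer.
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > sizeofcmds - offset) {
            return -1;
        }
        offset += lc.cmdsize;
        count++;
    }
    return count;
}
```

Working in bytes does not rely on cmdsize being a multiple of 8, and the bounds checks reject a zero cmdsize that would otherwise never advance.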

torben-hansen · 2024-03-27T16:16:29Z

util/fipstools/inject_hash/macho_parser/tests/macho_tests.h

+            .strsize = symtab_command_strsize,
+        };
+
+        if (fwrite(&test_header, sizeof(struct mach_header_64), 1, file) != 1) {


I dunno what the definitions of these structures are. But generally sizeof(struct) != <sum of size of fields in struct>.

This should work locally here, because you use sizeof(struct) for both the write and the read. But in general, this is not the correct way to serialise.
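To illustrate the general point, here is a sketch of field-by-field serialisation. struct record is a made-up example rather than a Mach-O structure, and the little-endian wire layout is an assumption chosen for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical record, not a Mach-O structure. Most ABIs insert padding
// after `tag` so that `value` is aligned, which makes sizeof(struct record)
// larger than the 5 bytes of actual field data.
struct record {
    uint8_t tag;
    uint32_t value;
};

#define RECORD_WIRE_SIZE 5

// Writes the fields one at a time, `value` in little-endian order, so no
// padding bytes reach the output. Returns the number of bytes written.
static size_t record_serialize(const struct record *r,
                               uint8_t out[RECORD_WIRE_SIZE]) {
    out[0] = r->tag;
    out[1] = (uint8_t)(r->value & 0xff);
    out[2] = (uint8_t)((r->value >> 8) & 0xff);
    out[3] = (uint8_t)((r->value >> 16) & 0xff);
    out[4] = (uint8_t)((r->value >> 24) & 0xff);
    return RECORD_WIRE_SIZE;
}

// Reads the fields back from the same 5-byte layout.
static void record_deserialize(const uint8_t in[RECORD_WIRE_SIZE],
                               struct record *r) {
    r->tag = in[0];
    r->value = (uint32_t)in[1] | ((uint32_t)in[2] << 8) |
               ((uint32_t)in[3] << 16) | ((uint32_t)in[4] << 24);
}
```

The output has the same size and byte order on every platform, whereas writing sizeof(struct record) bytes would also emit whatever padding the compiler chose.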

torben-hansen · 2024-03-27T16:22:37Z

util/fipstools/inject_hash/macho_parser/macho_parser.c

+    int symtab_found = 0;
+
+    // mach-o/loader.h explains that cmdsize (and by extension sizeofcmds) must be a multiple of 8 on 64-bit systems. struct load_command will always be 8 bytes.
+    for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {


Suggested change

-    for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {
+    for (size_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {

Idiomatic to use size_t. If one can speak of idiomatic C. Same for the other occurrences.

billbo-yang requested a review from a team as a code owner February 6, 2024 20:30

torben-hansen self-requested a review February 6, 2024 20:46

justsmth self-requested a review February 8, 2024 22:03

justsmth added the reviewers-assigned label Feb 8, 2024

billbo-yang force-pushed the macho_parser branch from ae4bdd8 to 6674cdd Compare February 9, 2024 22:07

torben-hansen reviewed Feb 15, 2024

justsmth reviewed Feb 20, 2024

billbo-yang force-pushed the macho_parser branch from 6674cdd to e66306e Compare February 28, 2024 19:21

billbo-yang requested review from torben-hansen and justsmth February 28, 2024 22:41

billbo-yang force-pushed the macho_parser branch from f592630 to 87a8750 Compare March 14, 2024 22:20

justsmth reviewed Mar 18, 2024

billbo-yang force-pushed the macho_parser branch from 87a8750 to 09a89bc Compare March 21, 2024 17:42

torben-hansen previously approved these changes Mar 27, 2024

justsmth previously approved these changes Apr 3, 2024

billbo-yang force-pushed the macho_parser branch from 613e5ae to a064787 Compare April 10, 2024 17:12

billbo-yang dismissed stale reviews from justsmth and torben-hansen via 900b33b April 10, 2024 17:36

billbo-yang requested review from justsmth and torben-hansen April 10, 2024 19:01

torben-hansen approved these changes Apr 15, 2024

justsmth approved these changes Apr 15, 2024

Bill Yang added 7 commits April 15, 2024 22:31

add macho parser

2d64aa3

add infra for macho parser tests

0a5d6ee

add macho testing file

b9048b5

get tests working

8ae5e57

more work on tests

12a1afe

get read macho file test working

a6f3890

rework macho free function

bf18e42

Bill Yang added 27 commits April 15, 2024 22:31

fix symbol table not being read correctly sometimes

2df5ec7

add working symbol table test

d27dfa0

refactor string table

166ed73

read index correctly in test

cff1be9

remove more camelcase

2195181

add better failure handling to macho parser

c25ce90

remove more camel case

27bb119

move global variables and defines into main function

4ed681e

remove todos

7bb4180

start convert C tests to C++ tests utilizing google test

093f28d

transfer remaining tests and remove legacy test file

d629c08

clean up code

24dd3d0

add missing copyright

42f007c

add macho parser tests to run_tests target

0adaeda

use calloc ensure correct contents of memcmp data

622c593

address PR comments

70b639a

remove typedef aliases

1d87e45

add section_index counting

2d33602

programmatically find symbol indices in expected string table

1d1a567

avoid using memcpy to assign string to arrays

f507f51

avoid memory leakage in tests

15f8083

correct return value

01cb226

add missing error handling

a564221

use set indices for sections we're looking for

be280e9

remove unnecessary macro

638ff8b

add comment explaining why load_command search works

d9e3234

use size_t where appropriate

a9a1d4c

billbo-yang force-pushed the macho_parser branch from 900b33b to a9a1d4c Compare April 15, 2024 19:31

billbo-yang enabled auto-merge (squash) April 15, 2024 19:32

billbo-yang merged commit 2d45531 into aws:main Apr 15, 2024
46 checks passed
