offsetof parsing fails due to TYPEID as offsetof_member_designator #504

Open
nxmaintainer opened this issue Apr 30, 2023 · 1 comment

@nxmaintainer

I'm parsing cpython/Objects/exceptions.c with pycparser==2.21 (from PyPI). The file was preprocessed into exceptions.i with `cpp -nostdinc -E -P -DPy_BUILD_CORE=1 -D_POSIX_THREADS=1`, plus the standard includes and fake_libc_include. Nothing special or tricky.

Parsing fails in this block:

    static PyMemberDef UnicodeError_members[] = {
        {"encoding", 6, offsetof(PyUnicodeErrorObject, encoding), 0,       // <- parsed correctly
         "exception encoding"},
        {"object", 6, offsetof(PyUnicodeErrorObject, object), 0,           // <- fails
         "exception object"},
        {"start", 19, offsetof(PyUnicodeErrorObject, start), 0,
         "exception start"},
        {"end", 19, offsetof(PyUnicodeErrorObject, end), 0,
         "exception end"},
        {"reason", 6, offsetof(PyUnicodeErrorObject, reason), 0,
         "exception reason"},
        {0}
    };

Specifically, it fails on `offsetof(PyUnicodeErrorObject, object)` with `pycparser.plyparser.ParseError: :9792:46: before: object`.

It works fine if I rename the `object` field to anything else, or if I replace the `offsetof` call. According to the parser's debug output, the first `offsetof` in this block (the `encoding` field) is parsed differently from the next one (the `object` field).

For `encoding`, take a look at `LexToken(ID,'encoding',9790,365338)` closer to the end:
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Reduce rule [empty -> <empty>] with [] and goto state 533
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 533
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA empty . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Reduce rule [designation_opt -> empty] with [None] and goto state 532
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 532
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Shift and goto state 165
State  : 165
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF . LexToken(LPAREN,'(',9790,365315)
Action : Shift and goto state 304
State  : 304
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN . LexToken(TYPEID,'PyUnicodeErrorObject',9790,365316)
Action : Shift and goto state 35
State  : 35
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN TYPEID . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [typedef_name -> TYPEID] with [<str @ 0x7f13b5a11970>] and goto state 31
Result : <IdentifierType @ 0x7f13b5a14f50> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 31
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN typedef_name . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [type_specifier -> typedef_name] with [<IdentifierType @ 0x7f13b5a14f50>] and goto state 212
Result : <IdentifierType @ 0x7f13b5a14f50> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 212
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_specifier . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [specifier_qualifier_list -> type_specifier] with [<IdentifierType @ 0x7f13b5a14f50>] and goto state 216
Result : <dict @ 0x7f13b5a11700> ({'qual': [], 'storage': [], 'type': [Ide ...)
State  : 216
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [empty -> <empty>] with [] and goto state 320
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 320
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list empty . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [abstract_declarator_opt -> empty] with [None] and goto state 350
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 350
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list abstract_declarator_opt . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [type_name -> specifier_qualifier_list abstract_declarator_opt] with [<dict @ 0x7f13b5a11700>,None] and goto state 438
Result : <Typename @ 0x7f13b5e9fcb0> (Typename(name=None,quals=[],align=None,t ...)
State  : 438
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name . LexToken(COMMA,',',9790,365336)
Action : Shift and goto state 507
State  : 507
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . LexToken(ID,'encoding',9790,365338)
Action : Shift and goto state 159
State  : 159
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA ID . LexToken(RPAREN,')',9790,365346)
Action : Reduce rule [identifier -> ID] with ['encoding'] and goto state 541
Result : <ID @ 0x7f13b5a14ff0> (ID(name='encoding'))
State  : 541
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA identifier . LexToken(RPAREN,')',9790,365346)
Action : Reduce rule [offsetof_member_designator -> identifier] with [<ID @ 0x7f13b5a14ff0>] and goto state 540
Result : <ID @ 0x7f13b5a14ff0> (ID(name='encoding'))
State  : 540
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA offsetof_member_designator . LexToken(RPAREN,')',9790,365346)
Action : Shift and goto state 563
State  : 563
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA offsetof_member_designator RPAREN . LexToken(COMMA,',',9790,365347)
Action : Reduce rule [primary_expression -> OFFSETOF LPAREN type_name COMMA offsetof_member_designator RPAREN] with ['offsetof','(',<Typename @ 0x7f13b5e9fcb0>,',',<ID @ 0x7f13b5a14ff0>,')'] and goto state 158
Result : <FuncCall @ 0x7f13b5a06ed0> (FuncCall(name=ID(name='offsetof'),args=E ...)
State  : 158
For `object`, take a look at `LexToken(TYPEID,'object',9792,365420)` in the same position where `encoding` has `ID` instead:
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Reduce rule [empty -> <empty>] with [] and goto state 533
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 533
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA empty . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Reduce rule [designation_opt -> empty] with [None] and goto state 532
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 532
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Shift and goto state 165
State  : 165
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF . LexToken(LPAREN,'(',9792,365397)
Action : Shift and goto state 304
State  : 304
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN . LexToken(TYPEID,'PyUnicodeErrorObject',9792,365398)
Action : Shift and goto state 35
State  : 35
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN TYPEID . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [typedef_name -> TYPEID] with [<str @ 0x7f13b5a116f0>] and goto state 31
Result : <IdentifierType @ 0x7f13b5a153b0> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 31
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN typedef_name . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [type_specifier -> typedef_name] with [<IdentifierType @ 0x7f13b5a153b0>] and goto state 212
Result : <IdentifierType @ 0x7f13b5a153b0> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 212
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_specifier . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [specifier_qualifier_list -> type_specifier] with [<IdentifierType @ 0x7f13b5a153b0>] and goto state 216
Result : <dict @ 0x7f13b5a11280> ({'qual': [], 'storage': [], 'type': [Ide ...)
State  : 216
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [empty -> <empty>] with [] and goto state 320
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 320
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list empty . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [abstract_declarator_opt -> empty] with [None] and goto state 350
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 350
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list abstract_declarator_opt . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [type_name -> specifier_qualifier_list abstract_declarator_opt] with [<dict @ 0x7f13b5a11280>,None] and goto state 438
Result : <Typename @ 0x7f13b5e9fa10> (Typename(name=None,quals=[],align=None,t ...)
State  : 438
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name . LexToken(COMMA,',',9792,365418)
Action : Shift and goto state 507
State  : 507
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . LexToken(TYPEID,'object',9792,365420)
ERROR: Error  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . 

I roughly understand the issue: `object` is being lexed as TYPEID for some reason (I checked and did not find any `object` type defined or declared in the preprocessed file), so it does not match the `offsetof_member_designator` rule (which requires `identifier`, i.e. ID), and the OFFSETOF `primary_expression` fails. I even have a dirty fix, like this:

    def p_offsetof_identifier(self, p):
        """ offsetof_identifier  : ID
                                   | TYPEID
        """
        p[0] = c_ast.ID(p[1], self._token_coord(p, 1))

    def p_offsetof_member_designator(self, p):
        """ offsetof_member_designator : offsetof_identifier
                                         | offsetof_member_designator PERIOD offsetof_identifier
                                         | offsetof_member_designator LBRACKET expression RBRACKET
        """
        if len(p) == 2:
            p[0] = p[1]
        ...

But I don't think this is the correct approach; the issue looks deeper (`object` shouldn't be a TYPEID in this context in the first place, no?). @eliben / @Ksero, I'd really appreciate it if you could point me to a better solution. I'd be happy to contribute.
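To make the failure mode concrete, here is a toy model of it in plain Python. It is only a sketch and not pycparser's code: `classify`, `accepts_member`, and the contents of `scopes` are hypothetical names invented for illustration, and the assumption that `object` ends up recorded as a type name is taken from the debug log above, not from a known root cause.

```python
# Toy model only: none of these names come from pycparser itself.

def classify(name, scope_stack):
    """Return 'TYPEID' if the innermost scope mentioning `name` records it
    as a typedef name, otherwise 'ID'."""
    for scope in reversed(scope_stack):
        if name in scope:
            return "TYPEID" if scope[name] else "ID"
    return "ID"


def accepts_member(token_kind, allow_typeid=False):
    """Mimic a member-designator rule: ID only, or ID | TYPEID when relaxed."""
    return token_kind == "ID" or (allow_typeid and token_kind == "TYPEID")


# Hypothetical state: something has registered 'object' as a type name.
scopes = [{"PyUnicodeErrorObject": True, "object": True}]

enc = classify("encoding", scopes)  # plain identifier
obj = classify("object", scopes)    # classified as a type name

print(enc, accepts_member(enc))
print(obj, accepts_member(obj), accepts_member(obj, allow_typeid=True))
```

In this model the strict rule rejects `object` purely because of how the name was classified, and the relaxed rule (the analogue of the `offsetof_identifier` workaround) accepts it. It does not explain why the name was registered as a type, which is the open question.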

P.S. Please use exceptions.i for tests. I tried to make a smaller reproducible sample, but it parses fine.

@eliben
Owner

eliben commented May 5, 2023

Thanks for the detailed report.

To help narrow down the issue further, you can insert a printout (or a stack trace) at the point where CParser adds `object` to the type map (from which point on it is considered a TYPEID). This can tell us why it thinks it's a pre-declared type.
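One possible way to get such a trace without editing the installed package is to wrap the relevant method at runtime. The sketch below is generic and uses a stand-in class so that it runs on its own; `trace_when` and `FakeParser` are hypothetical names. It assumes the method to wrap in pycparser 2.21 is the private `CParser._add_typedef_name(name, coord)` in `pycparser/c_parser.py`; that should be checked against the installed version, since private names can change.

```python
import functools
import traceback


def trace_when(cls, method_name, predicate):
    """Wrap cls.method_name so the call stack is printed whenever
    predicate(*args, **kwargs) is true. Returns the original method."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        if predicate(*args, **kwargs):
            print(f"{method_name}{args!r} called from:")
            traceback.print_stack(limit=8)  # written to stderr
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original  # lets the caller restore the method later


class FakeParser:
    """Stand-in for the real parser, only here to keep the sketch runnable."""

    def __init__(self):
        self.types = {}

    def _add_typedef_name(self, name, coord):
        self.types[name] = True


trace_when(FakeParser, "_add_typedef_name",
           lambda name, coord: name == "object")

parser = FakeParser()
parser._add_typedef_name("size_t", "x.i:1")   # silent
parser._add_typedef_name("object", "x.i:42")  # prints the call stack
```

Against the real parser, the same call would target the parser class instead of `FakeParser`, made before parsing exceptions.i. The `coord` argument of the matching call should then point at the declaration that introduces `object` as a type name.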
