Fix incorrect types passed to Py_BuildValue and PyArg_ParseTupleAndKeywords #193

QuLogic · 2024-10-19T01:30:32Z

Always using i is broken, as that is 4 bytes, but Py_ssize_t and Py_uintptr_t are 8 bytes.

On little endian machines, this is usually fine, since the first 4 bytes (of 4-byte or 8-byte integers) will be the same, but on big-endian machines, a 4-byte integer corresponds with the last 4 bytes of the 8-byte integer

For small numbers on big-endian machines, it is thus likely that the result will be 0. If the value is over 32 bits, then the result will be wrong on little-endian machines as well.

Fixes #142

In all cases, these calls told Python it was getting ints (`i`), which are 4 bytes, but several cases passed either `Py_ssize_t` or `Py_uintptr_t`, which are 8 bytes. On little-endian machines, this will work, as the bytes align, but on big-endian machine, Python will most probably see 0, since the top bytes are more likely to be unfilled unless values are very large.

For a similar reason as `Py_BuildValue`, telling Python to use a smaller sized value than actually passed may cause strange values to end up in the result.

QuLogic · 2024-10-19T03:01:05Z

Here's a downstream build with this PR applied that works on all (little- and big-endian) architectures: https://src.fedoraproject.org/rpms/python-pyahocorasick/pull-request/3

QuLogic · 2024-10-27T10:26:35Z

There seems to be one more oddity at:

pyahocorasick/src/AutomatonItemsIter.c

Lines 254 to 262 in 88256f5

    
           #ifdef PY3K 
        
               #ifdef AHOCORASICK_UNICODE 
        
                                           "(u#O)", /*key*/ iter->buffer + 1, depth, 
        
               #else 
        
                                           "(y#O)", /*key*/ iter->buffer + 1, depth, 
        
               #endif 
        
           #else 
        
                                           "(s#O)", /*key*/ iter->char_buffer + 1, depth, 
        
           #endif

vs:

pyahocorasick/src/common.h

Lines 49 to 66 in 88256f5

    
           // setup supported character set 
        
           #ifdef AHOCORASICK_UNICODE 
        
           #       if defined PEP393_UNICODE || defined Py_UNICODE_WIDE 
        
                   // Either Python uses UCS-4 or we don't know what Python uses, 
        
                   // but we use UCS-4 
        
           #       define TRIE_LETTER_TYPE uint32_t 
        
           #       define TRIE_LETTER_SIZE 4 
        
           #   else 
        
                   // Python use UCS-2 
        
           #       define TRIE_LETTER_TYPE uint16_t 
        
           #       define TRIE_LETTER_SIZE 2 
        
           #       define VARIABLE_LEN_CHARCODES 1 
        
           #   endif 
        
           #else 
        
               // only bytes are supported 
        
           #   define TRIE_LETTER_TYPE uint16_t 
        
           #   define TRIE_LETTER_SIZE 2 
        
           #endif

where if AHOCORASICK_UNICODE is not defined, it states that it is const char * to Python, but the TRIE_LETTER_TYPE is uint16_t. I'm not sure if it's the former or latter that is wrong.

QuLogic added 2 commits October 18, 2024 21:51

Fix incorrect types passed to PyArg_ParseTupleAndKeywords

4875abb

For a similar reason as `Py_BuildValue`, telling Python to use a smaller sized value than actually passed may cause strange values to end up in the result.

QuLogic force-pushed the build-types branch from 00182cd to 4875abb Compare October 19, 2024 01:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect types passed to Py_BuildValue and PyArg_ParseTupleAndKeywords #193

Fix incorrect types passed to Py_BuildValue and PyArg_ParseTupleAndKeywords #193

QuLogic commented Oct 19, 2024

QuLogic commented Oct 19, 2024

QuLogic commented Oct 27, 2024

Fix incorrect types passed to Py_BuildValue and PyArg_ParseTupleAndKeywords #193

Are you sure you want to change the base?

Fix incorrect types passed to Py_BuildValue and PyArg_ParseTupleAndKeywords #193

Conversation

QuLogic commented Oct 19, 2024

QuLogic commented Oct 19, 2024

QuLogic commented Oct 27, 2024