Cannot process caches with unescaped `&#` in the cache name #236

GeoTime61 · 2024-02-12T22:10:46Z

When I try to process GC25WQJ, name "How Do I Solve All These &#$@! Puzzle Caches?", I get an error:

self.name = cache_details.find(id="ctl00_ContentBody_CacheName").text
                ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find'

Is it because the cache name has many punctuation characters in it?
These GC codes also fail: GC8AKHK, GCA9PAE, GC6PJNF, GC1FJJT (archived)

This simple program shows the error:

import pycaching
geocaching = pycaching.login()
cache = geocaching.get_cache("GC25WQJ")
print(cache.name)
geocaching.logout()

It is difficult to search for additional caches for testing because the geocaching.com search filter "Geocache name contains" seems to really mean "Geocache name starts with".
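
To check several GC codes in one run, the reproduction program above could be wrapped in a small helper. This is only a sketch: `find_failing_codes` is a hypothetical name, not part of pycaching, and it assumes the failure always surfaces as the `AttributeError` shown in the traceback.

```python
from typing import Callable, Iterable, List


def find_failing_codes(get_cache: Callable[[str], object], codes: Iterable[str]) -> List[str]:
    """Return the GC codes whose name cannot be read (hypothetical helper)."""
    failing = []
    for code in codes:
        try:
            # Reading .name is what triggers the failing page parse in the report.
            get_cache(code).name
        except AttributeError:
            failing.append(code)
    return failing


# Usage with a live session (needs credentials and network access):
# import pycaching
# geocaching = pycaching.login()
# codes = ["GC25WQJ", "GC8AKHK", "GCA9PAE", "GC6PJNF", "GC1FJJT"]
# print(find_failing_codes(geocaching.get_cache, codes))
# geocaching.logout()
```

Passing `get_cache` in as a callable keeps the helper independent of a logged-in session, so it can also be exercised with a stand-in object.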

BelKed · 2024-02-12T23:32:07Z

The easiest solution I found was to use the lxml parser instead of html.parser.
The working version can be found in my fork :)

I'm not going to open a PR yet, as the parser change is quite groundbreaking and I'd like to hear the maintainer's opinion :)
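
For readers unfamiliar with the parser switch described above: in Beautiful Soup the parser is chosen by the second argument to the constructor. The sketch below is not the code from the fork. `pick_parser` and `parse_cache_name` are hypothetical names, and falling back to `html.parser` when lxml is missing is an assumption made here for illustration (the fallback would still hit the problem from this issue).

```python
import importlib.util
from typing import Optional


def pick_parser() -> str:
    """Prefer lxml when it is installed, otherwise use the stdlib parser."""
    return "lxml" if importlib.util.find_spec("lxml") is not None else "html.parser"


def parse_cache_name(html: str, parser: Optional[str] = None) -> str:
    """Extract the cache name element with the chosen Beautiful Soup parser."""
    # Third-party dependency: pip install beautifulsoup4 (plus lxml for the lxml parser).
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, parser or pick_parser())
    element = soup.find(id="ctl00_ContentBody_CacheName")
    if element is None:
        # Raise a descriptive error instead of failing later on a None value.
        raise ValueError("cache name element not found; the page may not have parsed fully")
    return element.text
```

Since lxml is a separate package with compiled components, making it the only supported parser adds an install-time requirement, which is the trade-off raised in this comment.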

FriedrichFroebel · 2024-02-13T16:09:47Z

> I'm not going to open a PR yet, as the parser change is quite groundbreaking and I'd like to hear the maintainer's opinion :)

Do you have some more details on how much this actually affects pycaching?

Apart from this, while using the lxml backend might be a solution, I would argue that this is a Groundspeak bug due to insufficient sanitization/escaping of user input: &# should usually prefix some integer and end with a semicolon, which Firefox complains about as well.

<h1 class="visually-hidden">How Do I Solve All These &#$@! Puzzle Caches? Rätsel-Geocaches</h1>
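
To make the rule above concrete: a well-formed numeric character reference is `&#` followed by decimal digits (or `x` plus hex digits) and a semicolon. The sketch below escapes any `&#` that does not match that shape before the markup reaches a parser. It is an illustration only, not what pycaching or the fork does; `escape_bare_charrefs` is a hypothetical name, and the regex deliberately treats references without a trailing semicolon as malformed, which is stricter than browsers are. It also makes no attempt to skip script or style content.

```python
import re

# "&#" that does NOT begin a well-formed numeric character reference,
# i.e. is not followed by decimal digits, or x plus hex digits, and then ";".
_BARE_CHARREF = re.compile(r"&#(?!(?:[0-9]+|[xX][0-9a-fA-F]+);)")


def escape_bare_charrefs(html: str) -> str:
    """Rewrite malformed "&#" sequences as "&amp;#" so they read as plain text."""
    return _BARE_CHARREF.sub("&amp;#", html)


title = "How Do I Solve All These &#$@! Puzzle Caches?"
print(escape_bare_charrefs(title))
```

Valid references such as `&#228;` are left untouched, so only the unescaped user input is rewritten.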

BelKed · 2024-02-14T04:56:11Z

> Do you have some more details on how much this actually affects pycaching?

The tests in CI passed, so I assume the impact of the change is minimal or none. I've also tested it manually and everything seems to be working fine. The biggest change is a new dependency (lxml parser).

> Apart from this, while using the lxml backend might be a solution, I would argue that this is a Groundspeak bug due to insufficient sanitization/escaping of user input: &# should usually prefix some integer and end with a semicolon, which Firefox complains about as well.
>
> <h1 class="visually-hidden">How Do I Solve All These &#$@! Puzzle Caches? Rätsel-Geocaches</h1>

Yup, this is definitely a Groundspeak bug, but I don't think they would fix it just because of some library.

GeoTime61 · 2024-02-14T16:57:48Z

Here's a similar problem, but in the Geocache Description instead of the name: GCR0EF
The cache is archived, so not really much of an issue.
@BelKed - does your lxml change allow this cache to be processed?

BelKed · 2024-02-15T16:53:19Z

Yeah, the cache is processed without any errors :)
I've added it to the tests (BelKed@09ed157).

FriedrichFroebel added the bug report label Feb 13, 2024

FriedrichFroebel changed the title ~~Cannot process caches with certain characters in the cache name~~ Cannot process caches with unescaped &# in the cache name Feb 13, 2024
