Cannot process caches with unescaped &# in the cache name #236

Open
GeoTime61 opened this issue Feb 12, 2024 · 5 comments

Comments

@GeoTime61
Contributor

When I try to process GC25WQJ, name "How Do I Solve All These &#$@! Puzzle Caches?", I get an error:

self.name = cache_details.find(id="ctl00_ContentBody_CacheName").text
                ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find'

Is it because the cache name has many punctuation characters in it?
These GC codes also fail: GC8AKHK, GCA9PAE, GC6PJNF, GC1FJJT (archived)

This simple program shows the error:

import pycaching
geocaching = pycaching.login()
cache = geocaching.get_cache("GC25WQJ")
print(cache.name)
geocaching.logout()

It is difficult to search for additional caches for testing because the geocaching.com search filter "Geocache name contains" seems to really mean "Geocache name starts with".

@BelKed
Contributor

BelKed commented Feb 12, 2024

The easiest solution I found was to use the lxml parser instead of html.parser.
The working version can be found in my fork :)


I'm not going to open a PR yet, as the parser change is quite groundbreaking and I'd like to hear the maintainer's opinion :)

@FriedrichFroebel
Collaborator

> I'm not going to open a PR yet, as the parser change is quite groundbreaking and I'd like to hear the maintainer's opinion :)

Do you have more details on how much this actually affects pycaching?

Apart from this, while using the lxml backend might be a solution, I would argue that this is a Groundspeak bug due to insufficient sanitization/escaping of user input: &# should usually prefix some integer and end with a semicolon, which Firefox complains about as well.

<h1 class="visually-hidden">How Do I Solve All These &#$@! Puzzle Caches? Rätsel-Geocaches</h1>
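If the markup had to be repaired on the client side before parsing, one option would be to escape every `&#` that does not start a well-formed numeric character reference. This is only a minimal sketch of that idea, not something pycaching does: the helper name `escape_stray_numeric_refs` is hypothetical, and it assumes that only references of the form `&#123;` or `&#x7B;` (with the terminating semicolon) should be left alone.

```python
import re

# Matches "&#" that is NOT followed by decimal digits or "x" + hex digits
# and a terminating semicolon, i.e. not a well-formed numeric reference.
_STRAY_NUMERIC_REF = re.compile(r"&#(?!(?:[0-9]+|[xX][0-9a-fA-F]+);)")


def escape_stray_numeric_refs(markup: str) -> str:
    """Hypothetical helper: turn stray "&#" into "&amp;#" so a strict parser sees plain text."""
    return _STRAY_NUMERIC_REF.sub("&amp;#", markup)


# The cache name from this issue gets its "&#" escaped ...
print(escape_stray_numeric_refs("How Do I Solve All These &#$@! Puzzle Caches?"))
# ... while well-formed references are left untouched.
print(escape_stray_numeric_refs("R&#228;tsel-Geocaches"))
```

Note that this would also escape references that lack the semicolon (for example `&#12`), which browsers tolerate, so it is stricter than what a browser does.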

FriedrichFroebel changed the title from "Cannot process caches with certain characters in the cache name" to "Cannot process caches with unescaped &# in the cache name" on Feb 13, 2024
@BelKed
Contributor

BelKed commented Feb 14, 2024

> Do you have more details on how much this actually affects pycaching?

The tests in CI passed, so I assume the impact of the change is minimal or none. I've also tested it manually and everything seems to be working fine. The biggest change is a new dependency (lxml parser).
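If the extra dependency is the main concern, the parser could in principle be chosen at runtime. The following is just a sketch of that idea and not what my fork does: `pick_parser` is a hypothetical helper, and it assumes the returned name is passed on as the `features` argument of `BeautifulSoup`.

```python
from importlib.util import find_spec


def pick_parser(preferred: str = "lxml", fallback: str = "html.parser") -> str:
    """Hypothetical helper: return `preferred` if that module is installed, else `fallback`."""
    # find_spec() returns None for a top-level module that cannot be found.
    return preferred if find_spec(preferred) is not None else fallback


print(pick_parser())
```

With `html.parser` as the fallback, the caches from this issue would presumably still fail when lxml is not installed, so this only softens the dependency, it does not replace the fix.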

> Apart from this, while using the lxml backend might be a solution, I would argue that this is a Groundspeak bug due to insufficient sanitization/escaping of user input: &# should usually prefix some integer and end with a semicolon, which Firefox complains about as well.
>
> <h1 class="visually-hidden">How Do I Solve All These &#$@! Puzzle Caches? Rätsel-Geocaches</h1>

Yup, this is definitely a Groundspeak bug, but I don't think they would fix it just because of some library.

@GeoTime61
Contributor Author

Here's a similar problem, but in the Geocache Description instead of the name: GCR0EF
The cache is archived, so not really much of an issue.
@BelKed - does your lxml change allow this cache to be processed?

@BelKed
Contributor

BelKed commented Feb 15, 2024

Yeah, the cache is processed without any errors :)
I've added it to the tests (BelKed@09ed157).
