CorpusQuery.xml() returns invalid XML if multiple requests were made #12

Open
AntheSevenants opened this issue Mar 9, 2022 · 0 comments


AntheSevenants commented Mar 9, 2022

If a corpus query returns more results than fit in a single response, another request is made starting from the end index of the previous one. However, the xml() method simply returns all BlackLab XML responses concatenated:

return "\n".join(self._response)

The problem is that we're essentially combining multiple standalone XML files into one string. Feeding this string into any XML parser will fail, since the result contains multiple XML declarations (and more than one root element), which a well-formed XML document cannot have.
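To illustrate the failure, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two strings are made-up stand-ins for BlackLab responses (the element names are assumptions for illustration, not copied from real output), and `parses` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for two separate BlackLab XML responses.
responses = [
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<blacklabResponse><hits><hit>a</hit></hits></blacklabResponse>",
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<blacklabResponse><hits><hit>b</hit></hits></blacklabResponse>",
]

# This mirrors what xml() currently does with self._response.
combined = "\n".join(responses)


def parses(text):
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


single_ok = parses(responses[0])   # each response parses on its own
combined_ok = parses(combined)     # the concatenation does not
```

Each response is fine in isolation; only the joined string is rejected.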

Unfortunately, I don't see how the xml() method in itself can be improved. There doesn't seem to be an elegant way to combine the information from multiple responses, but I think returning broken XML isn't a viable option either.

Some other options:

  1. Always return a list of all XML responses, regardless of how many requests were made
  2. Make the xml() method index-based, so it returns the XML response of that index. This implies that there should be a way to find out how many requests were made in the first place.
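Both options could look roughly like the sketch below. `PagedQuerySketch`, `xml_list` and `request_count` are hypothetical names invented for this example and are not part of the library; only the `_response` list of strings is taken from the current code:

```python
class PagedQuerySketch:
    """Hypothetical stand-in for CorpusQuery; only response handling is shown."""

    def __init__(self, responses):
        # One XML string per request, in the order the requests were made.
        self._response = list(responses)

    def xml_list(self):
        # Option 1: always return every response, even when there is only one.
        return list(self._response)

    def request_count(self):
        # Needed for option 2, so callers know which indices are valid.
        return len(self._response)

    def xml(self, index=0):
        # Option 2: return the XML of a single request by index.
        return self._response[index]


query = PagedQuerySketch(["<a/>", "<b/>"])
pages = [query.xml(i) for i in range(query.request_count())]
```

Option 1 changes the return type for every caller, while option 2 keeps `xml()` returning a string but needs the extra count accessor.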

Of course, I'm just thinking out loud here. My current workaround is to use CorpusQuery._response, which holds the individual responses separately (which is what I need).
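For reference, the workaround amounts to parsing each response on its own instead of parsing the joined string. A rough sketch, where `parse_each` is a hypothetical helper and the sample strings and the `hit` element name are assumptions standing in for the contents of `CorpusQuery._response`:

```python
import xml.etree.ElementTree as ET


def parse_each(responses):
    """Parse every XML response separately and return the root elements."""
    return [ET.fromstring(text) for text in responses]


# Made-up stand-ins; in practice this would be the list in query._response.
responses = [
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<blacklabResponse><hits><hit>a</hit></hits></blacklabResponse>",
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<blacklabResponse><hits><hit>b</hit></hits></blacklabResponse>",
]

roots = parse_each(responses)

# Collect the elements of interest across all responses, in request order.
all_hits = [hit for root in roots for hit in root.iter("hit")]
```

This relies on a private attribute, so it may break if the internals change.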
