CorpusQuery.xml() returns invalid XML if multiple requests were made #12

AntheSevenants · 2022-03-09T14:55:19Z

If the result set of a corpus query is large, it is fetched in several requests, each one starting from the end index of the previous search. However, the xml() method just returns a concatenation of all BlackLab XML responses:

chaining-search/chaininglib/search/CorpusQuery.py

Line 278 in ff005f0

return "\n".join(self._response)

The issue with this is that, essentially, we're combining multiple standalone XML files into one string. Feeding this string to an XML parser fails, since the document then contains multiple XML declarations (and multiple root elements).
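To make the failure concrete, here is a minimal sketch using only Python's standard library. The two strings are made-up stand-ins for responses (the element names are assumptions, not BlackLab's actual schema), and `parses` is a helper defined here for illustration only.

```python
import xml.etree.ElementTree as ET

# Two stand-in responses. The element names are invented for illustration
# and do not reflect BlackLab's real response format.
responses = [
    '<?xml version="1.0"?>\n<result><hit>a</hit></result>',
    '<?xml version="1.0"?>\n<result><hit>b</hit></result>',
]


def parses(text: str) -> bool:
    """Return True if `text` is a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each response is fine on its own ...
each_ok = all(parses(r) for r in responses)

# ... but the joined string, built the same way xml() builds it, is not.
joined_ok = parses("\n".join(responses))
```

Here `each_ok` is true while `joined_ok` is false: the parser stops as soon as it meets the second declaration after the first root element has closed.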

Unfortunately, I don't see how the xml() method in itself can be improved. There doesn't seem to be an elegant way to combine the information from multiple responses, but I think returning broken XML isn't a viable option either.

Some other options:

Always return a list of all XML responses, regardless of how many requests were made
Make the xml() method index-based, so it returns the XML response of that index. This implies that there should be a way to find out how many requests were made in the first place.
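As a rough sketch of what these two options could look like: `PagedResponses`, `xml_list` and `response_count` are hypothetical names invented for this example and are not part of chaininglib. The only thing taken from the existing code is that the responses are kept in `_response`.

```python
from typing import List


class PagedResponses:
    """Hypothetical stand-in for the part of CorpusQuery that stores responses."""

    def __init__(self, responses: List[str]) -> None:
        # One XML string per request that was made.
        self._response = list(responses)

    def xml_list(self) -> List[str]:
        # Option 1: always return every response, even if there is only one.
        return list(self._response)

    def response_count(self) -> int:
        # Needed for option 2, so callers know which indices are valid.
        return len(self._response)

    def xml(self, index: int = 0) -> str:
        # Option 2: return the XML response of a single request.
        return self._response[index]
```

With option 2, a caller could loop over `range(q.response_count())` and parse `q.xml(i)` for each index. Defaulting `index` to 0 is an assumption on my part: it keeps existing single-request calls working, but silently ignores later responses unless the caller checks the count.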

Of course, I'm just thinking out loud here. My current workaround is to use CorpusQuery._response, which holds the individual responses separately (which is what I need).
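For reference, the workaround amounts to something like the sketch below. It assumes `_response` is a list of XML strings, which is what the join in xml() suggests. `parse_responses` is a helper name made up for this example, and since `_response` is a private attribute, this could break in a later version.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def parse_responses(responses: Iterable[str]) -> List[ET.Element]:
    """Parse each response as its own XML document and return the root elements."""
    return [ET.fromstring(r) for r in responses]


# Intended use (assuming `query` is a CorpusQuery that has already been run):
#   roots = parse_responses(query._response)
```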
