Accurate search #73

Open
wants to merge 2 commits into
base: DEV

somewordstoolate

  1. Background and Problem
    When using the query "Haagsma C, van Riel P, de Jong A, van de Putte L. Combination of sulphasalazine and methotrexate versus the single components in early rheumatoid arthritis: a randomized, controlled, double-blind, 52 week clinical trial. British Journal of Rheumatology. 1997;36(10):1082.", the PDF could not be downloaded even though an accurate search result is available on Google Scholar.

Debugging showed that when an accurate search (e.g., by paper title) makes Google Scholar return only one result, that result's div has the class attribute value gs_r gs_or gs_scl gs_fmar rather than gs_r gs_or gs_scl, so the parser's existing soup.findAll filter on class_ does not match it.

  2. Modifications
    Renamed HTMLparsers.schoolarParser to HTMLparsers.scholarParser (fixing the "schoolar" spelling error) and modified its soup.findAll logic so that it also identifies the result div whose class_ attribute is gs_r gs_or gs_scl gs_fmar, which is what appears when there is only one search result.

@goghvan1113

For the newest branch, v1.4.1, the accurate-search modification in HTMLparsers.py is to replace
for element in soup.findAll("div", class_="gs_r gs_or gs_scl"):
with

for element in soup.findAll(
    "div", class_=["gs_r gs_or gs_scl", "gs_r gs_or gs_scl gs_fmar"]
):  # "gs_r gs_or gs_scl gs_fmar" for only one search result
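
For anyone following along, here is a rough, stdlib-only sketch of why the original filter misses the single-result case. It is not the project's code and does not use BeautifulSoup: the markup strings, `ResultDivCounter`, `REQUIRED` and `count_results` are all hypothetical names made up for illustration. It compares matching on the exact class string (which is how BeautifulSoup treats a `class_` string containing spaces) with checking that the required class tokens are all present.

```python
from html.parser import HTMLParser

# Hypothetical stand-in markup; real Google Scholar pages are much larger.
MULTI_RESULT = (
    '<div class="gs_r gs_or gs_scl">A</div>'
    '<div class="gs_r gs_or gs_scl">B</div>'
)
SINGLE_RESULT = '<div class="gs_r gs_or gs_scl gs_fmar">Only hit</div>'

# Class tokens that both variants of the result div share.
REQUIRED = {"gs_r", "gs_or", "gs_scl"}


class ResultDivCounter(HTMLParser):
    """Counts result divs two ways: exact class string vs. required tokens."""

    def __init__(self):
        super().__init__()
        self.exact = 0      # class attribute equals "gs_r gs_or gs_scl"
        self.by_tokens = 0  # class attribute contains all REQUIRED tokens

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        class_value = dict(attrs).get("class") or ""
        if class_value == "gs_r gs_or gs_scl":
            self.exact += 1
        if REQUIRED <= set(class_value.split()):
            self.by_tokens += 1


def count_results(html):
    """Return (exact_matches, token_matches) for the given HTML string."""
    counter = ResultDivCounter()
    counter.feed(html)
    counter.close()
    return counter.exact, counter.by_tokens


print(count_results(MULTI_RESULT))
print(count_results(SINGLE_RESULT))
```

With the sample markup, the exact-string comparison finds nothing in the single-result page, while the token check finds the one div. Listing both class strings, as in the change above, fixes the two known cases. In BeautifulSoup a token-based check would roughly correspond to a CSS selector such as `soup.select("div.gs_r.gs_or.gs_scl")`, which may be more tolerant of further extra classes; that is only a suggestion and not what this PR does.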
