requests coming from your computer network which appear to be in violation of the Google Terms of Service? #17

Open
solyarisoftware opened this issue Nov 18, 2022 · 1 comment

Comments

@solyarisoftware

solyarisoftware commented Nov 18, 2022

Hi,
interesting project, thanks!

But when I submit just a SINGLE request from my VM:

$ py
>>> import people_also_ask
>>> people_also_ask.get_related_questions("caffè")

Google immediately rejects the request:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/g/.local/lib/python3.8/site-packages/people_also_ask/google.py", line 68, in get_related_questions
    return _get_related_questions(text)
  File "/home/g/.local/lib/python3.8/site-packages/people_also_ask/google.py", line 34, in _get_related_questions
    document = search(text)
  File "/home/g/.local/lib/python3.8/site-packages/people_also_ask/google.py", line 23, in search
    response = get(URL, params=params)
  File "/home/g/.local/lib/python3.8/site-packages/people_also_ask/tools.py", line 30, in wrapper
    return func(*args, **kwargs)
  File "/home/g/.local/lib/python3.8/site-packages/people_also_ask/request/session.py", line 97, in get
    raise RequestError(
people_also_ask.exceptions.RequestError: ('https://www.google.com/search', {'q': 'caffè', 'gl': 'us'}, {}, '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n<html>\n<head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="initial-scale=1"><title>https://www.google.com/search?q=caff%C3%A8&amp;gl=us</title></head>\n<body style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById(\'captcha\');if(e){e.focus();} if(solveSimpleChallenge) {solveSimpleChallenge(,);}">\n<div style="max-width:400px;">\n<hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br>\n<form id="captcha-form" action="index" method="post">\n<noscript>\n<div style="font-size:13px;">\n  In order to continue, please enable javascript on your web browser.\n</div>\n</noscript>\n<script src="https://www.google.com/recaptcha/api.js" async defer></script>\n<script>var submitCallback = function(response) {document.getElementById(\'captcha-form\').submit();};</script>\n<div id="recaptcha" class="g-recaptcha" data-sitekey="6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b" data-callback="submitCallback" data-s="1pJeuENGgR-f2ddVZoUSb5TUjwf7rmOYEjYy_TuRtMXV2fNFVbdq2HOvPMevjd_kFx1Lond3VajTgDu1y9Kiy50FeuMcNThhA7GI3kXF9yVwQwqUg9Au8lPD-Vd9gPrLdCwU-n07YXTSvpTr5ay62PFN9DWDbgeG0J8olpH569erT_hqjW18qYMgDpHBy_83jL0mG0pyiJH0FbA4B6MgKGVRi_yqhN5Ovt9qGUZMfK3XWLJqsWl2tpXxrmI1dIZY8atANhIdo5Pk50z2OGOLmLbbhEU-404"></div>\n\n<input type=\'hidden\' name=\'q\' value=\'EgQz_nshGPzJ3ZsGIjBzVInVJEEZtOA3ZI_Wp-HjlzA55jr-KrDYNOGYctMYOaX5gPBm3UbuukWmOwLdeUQyAXI\'><input type="hidden" name="continue" value="https://www.google.com/search?q=caff%C3%A8&amp;gl=us">\n</form>\n<hr noshade size="1" style="color:#ccc; background-color:#ccc;">\n\n<div style="font-size:13px;">\n<b>About this page</b><br><br>\n\nOur systems have detected unusual traffic from your computer network.  
This page checks to see if it&#39;s really you sending the requests, and not a robot.  <a href="#" onclick="document.getElementById(\'infoDiv\').style.display=\'block\';">Why did this happen?</a><br><br>\n\n<div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;">\nThis page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop.  In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests.  If you share your network connection, ask your administrator for help &mdash; a different computer using the same IP address may be responsible.  <a href="//support.google.com/websearch/answer/86640">Learn more</a><br><br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.\n</div>\n\nIP address: xxx.yyy.zzz.www<br>Time: 2022-11-18T10:57:00Z<br>URL: https://www.google.com/search?q=caff%C3%A8&amp;gl=us<br>\n</div>\n</div>\n</body>\n</html>\n')

Is there any workaround?
Thanks

@charliemday

Hi 👋 are you making repeated requests? It could be that they triggered Google's reCAPTCHA, in which case you might just need to wait a couple of minutes (or hours) before you can send more requests from that machine.

Alternatively, the block page reports the IP address of your VM as xxx.yyy.zzz.www. Did you intentionally obscure it before posting on this repo, or is that what the page actually showed? If you did not redact it, you might need to assign a proper IP address to your VM (this is purely a guess, btw: if I were Google, I'd want to know where any request originates).
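
The "wait and try again" idea can be sketched as a small retry helper with exponential backoff. This is only an illustration, not something the library provides: `retry_with_backoff` and its parameters (`retries`, `base_delay`, `exceptions`, `sleep`) are hypothetical names, and there is no guarantee that Google lifts the block within the delays chosen here.

```python
import time


def retry_with_backoff(func, *args, retries=3, base_delay=60.0,
                       exceptions=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), waiting longer after each failure.

    Hypothetical helper (not part of people_also_ask).
    The delay doubles on every failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                # Out of attempts: let the last error propagate.
                raise
            sleep(base_delay * 2 ** attempt)


# Possible usage with the call from this issue (assumes the block is
# temporary; RequestError is the exception shown in the traceback):
#
#   import people_also_ask
#   from people_also_ask.exceptions import RequestError
#
#   questions = retry_with_backoff(
#       people_also_ask.get_related_questions, "caffè",
#       exceptions=(RequestError,),
#   )
```

Passing `sleep` as a parameter is just a convenience so the helper can be tried out without actually waiting.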
