Question: Unable to fetch HTML from whosampled.com using reqwest. #1608
Hey there! I have a really basic reqwest GET call to whosampled.com, and I'm expecting to get the HTML document back. I'm using actix for my server.

    use actix_web::{get, HttpResponse, Responder};

    #[get("/example")]
    pub async fn example() -> impl Responder {
        // Build a client with a browser-like user agent and a cookie store.
        let client = reqwest::ClientBuilder::new()
            .user_agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36")
            .cookie_store(true)
            .build().unwrap();
        // Fetch the page and read the whole body as text.
        let result = client.get("https://www.whosampled.com/Kanye-West/Bound-2/")
            .send()
            .await.unwrap().text().await.unwrap();
        HttpResponse::Ok().json(&result)
    }

But what I end up getting back isn't the document; it's a Cloudflare page asking that security be verified.

<h1 class="zone-name-title h1">
<img
class="heading-favicon"
src="/favicon.ico"
onerror="this.onerror=null;this.parentNode.removeChild(this)"
/>
www.whosampled.com
</h1>
<h2 class="h2" id="cf-challenge-running">
Checking if the site connection is secure
</h2>
<noscript>
<div id="cf-challenge-error-title">
<div class="h2">
<span class="icon-wrapper">
<div class="heading-icon warning-icon"></div>
</span>
<span id="cf-challenge-error-text">
Enable JavaScript and cookies to continue
</span>
</div>
</div>
</noscript>

What I find interesting is that I can make the same GET call in Python and get the correct result back without any problem. I'm stumped on what the gap is.

    # saving on lines, the headers are exactly the same as in the Rust function
    requests.get("https://www.whosampled.com/Kanye-West/Bound-2/", headers=headers)

I don't think it's necessarily a problem with reqwest, but probably a configuration item I'm missing with my client. The whole goal was to scrape the page to get sample info. Does anyone have any experience with this? Thanks!
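While debugging something like this, it can help to detect the challenge page explicitly instead of passing it along as if it were the real document. The following is only a minimal sketch using the standard library: the function name is hypothetical, and the marker strings are simply the `id` values visible in the HTML above, which Cloudflare may change at any time.

```rust
/// Heuristic check for the Cloudflare interstitial shown in the question.
/// The markers are the `id` values from the pasted HTML (an assumption:
/// they are not a documented, stable interface).
fn looks_like_cloudflare_challenge(body: &str) -> bool {
    const MARKERS: [&str; 2] = ["cf-challenge-running", "cf-challenge-error-title"];
    MARKERS.iter().any(|marker| body.contains(marker))
}

fn main() {
    let challenge = r#"<h2 class="h2" id="cf-challenge-running">Checking if the site connection is secure</h2>"#;
    let normal = "<html><body><h1>Bound 2</h1></body></html>";

    // Print the verdict for a challenge-like body and for an ordinary page.
    println!("challenge page detected: {}", looks_like_cloudflare_challenge(challenge));
    println!("normal page detected as challenge: {}", looks_like_cloudflare_challenge(normal));
}
```

In the handler, a check like this could be run on `result` before building the response, so a blocked request surfaces as an error rather than as JSON-wrapped challenge HTML.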
Replies: 1 comment 1 reply
I don't know anything about that site, but my guess is that they must be looking for certain specific headers. I noticed that Firefox was using HTTP/2. So just to try it, I took the reqwest simple example and adjusted it to use rustls-tls, which advertises h2 over ALPN, instead of default-tls, and it seemed to work. 🤷
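For reference, the feature switch described above might look roughly like this in `Cargo.toml`. This is a sketch rather than the exact change that was tested: the version number is an assumption, and the `cookies` feature is listed only because the client in the question calls `.cookie_store(true)`.

```toml
[dependencies]
# Assumed version; use whichever reqwest release the project already targets.
# Disabling default features drops default-tls; rustls-tls replaces it.
reqwest = { version = "0.11", default-features = false, features = ["rustls-tls", "cookies"] }
```

With default features disabled, the native-tls backend is no longer compiled in, so the client from the question should negotiate TLS through rustls without further code changes.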