Hi! Thank you for your repository, but I have one question regarding the implementation. You use `browser_type.launch(proxy=proxy)` in every task. If I understand correctly, this means that a new browser starts in each new Celery task.
I'm trying to optimize the parsing process. To do this, I moved the browser to a separate Docker container started with the `npx playwright run-server` command, and I connect to it via a WebSocket.
But I have a feeling that I'm doing something wrong, since the load has not decreased and the parsing speed has not increased. Do you know anything about this and can you help?
Thank you in advance
AndyS1mpson changed the title from "Using one browser for web scraping" to "Using only one browser for web scraping" on Jul 28, 2022
Hi Andy,
As you say, we launch a browser for each request, which is overkill in a real-world use case.
If you have a browser in a different container, you could leave it running and create only a new page or context for each request, setting the proxy on each of those. This approach will incur some networking overhead, but possibly less than launching a new browser for each request.
This approach needs close monitoring of the container and its memory usage, since browsers can leak memory, and running them for long periods can degrade performance.
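A minimal sketch of that idea, using Playwright's synchronous Python API. The endpoint `WS_ENDPOINT`, the proxy address, and the helper name `fetch_html` are hypothetical placeholders, not code from the repository. Depending on the Playwright version and platform, Chromium may need to be launched with a global proxy for per-context proxies to take effect, so check the documentation for your version.

```python
from typing import Any, Optional

# Hypothetical endpoint; use the one printed by `npx playwright run-server`.
WS_ENDPOINT = "ws://playwright-server:3000/"


def fetch_html(browser: Any, url: str, proxy_server: Optional[str] = None) -> str:
    """Fetch one URL in a fresh context on an already-running browser."""
    options = {"proxy": {"server": proxy_server}} if proxy_server else {}
    context = browser.new_context(**options)
    try:
        page = context.new_page()
        page.goto(url)
        return page.content()
    finally:
        # Always close the context so pages and cookies do not pile up.
        context.close()


def main() -> None:
    # Imported here so fetch_html can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Connect once and reuse the connection for every request.
        browser = p.chromium.connect(WS_ENDPOINT)
        try:
            for url in ["https://example.com"]:
                # Hypothetical proxy address.
                html = fetch_html(browser, url, proxy_server="http://myproxy:8080")
                print(len(html))
        finally:
            browser.close()
```

Calling `main()` runs the loop against the remote browser. In a Celery setup, the connection would probably be opened once per worker process rather than once per task, otherwise the connect cost is paid on every request again.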
If this approach does not improve speed, you can also block resources the scraper does not need (images, fonts, or stylesheets, for example) to save time and bandwidth.
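Resource blocking can be sketched with Playwright's request routing. The set `BLOCKED_RESOURCE_TYPES` and the helper names are assumptions for illustration; which resource types are safe to drop depends on the target site.

```python
from typing import Any

# Assumed list of resource types the scraper does not need.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}


def block_heavy_resources(route: Any) -> None:
    """Route handler: abort unneeded requests, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def enable_blocking(context: Any) -> None:
    """Apply the handler to every request made by pages in this context."""
    context.route("**/*", block_heavy_resources)
```

Calling `enable_blocking(context)` right after creating a context applies the filter to all pages opened in it. Pages whose content depends on stylesheets or images being loaded may render differently, so the set is worth tuning per site.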