Using only one browser for web scraping #1

AndyS1mpson · 2022-07-28T14:45:57Z

Hi! Thank you for your repository. I have one question about the implementation. You call `browser_type.launch(proxy=proxy)` in every task. If I understand correctly, this means a new browser is started for each Celery task.

I'm trying to optimize the parsing process. To do this, I moved the browser to a separate Docker container started with the `npx playwright run-server` command, and I connect to it via a WebSocket.

But I have a feeling that I'm doing something wrong, since the load has not decreased and the parsing speed has not increased. Do you know anything about this and can you help?

Thank you in advance


AnderRV · 2022-07-29T10:04:56Z

Hi Andy,
As you say, we launch a browser for each request, which is overkill in a real-world use case.

If you have a browser in a different container, you could leave it running and create only a new page or context for each request, setting the proxy on each of those. This approach will incur some networking overhead, but probably less than launching a new browser for every request.
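A minimal sketch of that idea with Playwright for Python, assuming the remote container exposes a WebSocket endpoint. The endpoint `ws://playwright-server:3000/`, the proxy address, and the helper names `context_options`, `page_with_proxy`, `fetch_title`, and `main` are all hypothetical, not taken from the repository. The connection is opened once and each request only gets its own context with its own proxy.

```python
from contextlib import contextmanager

# Hypothetical address of the container started with `npx playwright run-server`.
WS_ENDPOINT = "ws://playwright-server:3000/"


def context_options(proxy_server, username=None, password=None):
    """Build keyword arguments for browser.new_context() with a per-context proxy."""
    proxy = {"server": proxy_server}
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return {"proxy": proxy}


@contextmanager
def page_with_proxy(browser, proxy_server, **credentials):
    """Open a fresh context and page on an existing browser, then clean up."""
    context = browser.new_context(**context_options(proxy_server, **credentials))
    try:
        yield context.new_page()
    finally:
        context.close()  # also closes every page that belongs to the context


def fetch_title(browser, url, proxy_server):
    """Per-request work: no browser launch, only a new context and page."""
    with page_with_proxy(browser, proxy_server) as page:
        page.goto(url)
        return page.title()


def main():
    # Imported lazily so the helpers above can be imported without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect(WS_ENDPOINT)  # connect once, reuse below
        try:
            for url in ("https://example.com",):
                # Hypothetical proxy address; replace with a real one.
                print(fetch_title(browser, url, "http://proxy.example:8080"))
        finally:
            browser.close()  # for a connected browser this disconnects the client
```

In a Celery worker, the `browser` object would have to live for the lifetime of the worker process rather than a single task, otherwise every task pays the connection cost again. How the remote server maps connections to browser processes is not covered in this thread, so it is worth measuring before relying on it.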

This approach needs close monitoring of the container and its memory usage, since browsers can leak memory, and running them for long periods can degrade performance.
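One simple way to bound that decay is to restart or reconnect the browser after a fixed budget of requests or uptime. The sketch below is only an illustration of such a budget: the class name `RecyclePolicy` and the default limits are assumptions, and it does not measure memory itself, which would still need container-level monitoring.

```python
import time


class RecyclePolicy:
    """Decide when a long-running browser should be replaced."""

    def __init__(self, max_requests=200, max_age_seconds=1800.0, clock=time.monotonic):
        # The default limits are arbitrary placeholders; tune them from monitoring data.
        self.max_requests = max_requests
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self.reset()

    def reset(self):
        """Call right after a fresh browser has been started or connected."""
        self._started = self._clock()
        self._served = 0

    def record_request(self):
        """Call once per scraped request."""
        self._served += 1

    def should_recycle(self):
        """True once either the request budget or the age budget is used up."""
        age = self._clock() - self._started
        return self._served >= self.max_requests or age >= self.max_age_seconds
```

A worker would call `record_request()` after each page, check `should_recycle()` between requests, and call `reset()` once the old browser has been closed and a new one is ready.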

To improve speed (if this approach does not work), you can always block resources you do not need, such as images, fonts, or media, and save time and bandwidth.
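Resource blocking can be sketched with Playwright's request routing. The set of blocked types below is an assumption and should be adjusted to what the target pages actually need (blocking stylesheets, for example, can change how some pages render). The helper names `should_block`, `block_heavy_resources`, and `install_blocking` are hypothetical.

```python
# Resource types to drop; an assumed default, adjust per target site.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font", "stylesheet"})


def should_block(resource_type, blocked=BLOCKED_RESOURCE_TYPES):
    """Return True when a request of this resource type should be aborted."""
    return resource_type in blocked


def block_heavy_resources(route):
    """Route handler: abort unwanted requests, let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def install_blocking(page):
    """Register the handler for every URL requested by this page."""
    page.route("**/*", block_heavy_resources)
```

Calling `install_blocking(page)` before `page.goto(...)` applies the filter to that page only, so each context can still use its own proxy.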

Regards

AndyS1mpson changed the title ~~Using one browser for web scraping~~ Using only one browser for web scraping Jul 28, 2022
