Commit

refactor: replace fakestore with warehouse-theme-metal.myshopify.com
honzajavorek committed Jul 19, 2024
1 parent a9f7e16 commit 34073b6
Showing 2 changed files with 49 additions and 68 deletions.
@@ -14,7 +14,7 @@ import TabItem from '@theme/TabItem';

---

-Now that we know how to execute scripts on a page, we're ready to learn a bit about [data extraction](../../scraping_basics_javascript/data_extraction/index.md). In this lesson, we'll be scraping all the on-sale products from our [Fakestore](https://demo-webstore.apify.org/search/on-sale) website.
+Now that we know how to execute scripts on a page, we're ready to learn a bit about [data extraction](../../scraping_basics_javascript/data_extraction/index.md). In this lesson, we'll be scraping all the on-sale products from [warehouse-theme-metal.myshopify.com](https://warehouse-theme-metal.myshopify.com/), a sample Shopify website.

> Most web data extraction cases involve looping through a list of items of some sort.
@@ -36,7 +36,7 @@ import { chromium } from 'playwright';
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

// code will go here

@@ -54,7 +54,7 @@ import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

// code will go here

@@ -82,16 +82,12 @@ We'll be returning a bunch of product objects from this function, which will be

```js
const products = await page.evaluate(() => {
-    const productCards = Array.from(document.querySelectorAll('a[class*="ProductCard_root"]'));
+    const productCards = Array.from(document.querySelectorAll('.product-item'));

     return productCards.map((element) => {
-        const name = element.querySelector('h3[class*="ProductCard_name"]').textContent;
-        const price = element.querySelector('div[class*="ProductCard_price"]').textContent;
-
-        return {
-            name,
-            price,
-        };
+        const name = element.querySelector('.product-item__title').textContent;
+        const price = element.querySelector('.price').lastChild.textContent;
+        return { name, price };
     });
});

@@ -100,7 +96,20 @@ console.log(products);

When we run this code, we see this logged to our console:

-![Products logged to the console](./images/log-products.png)
+```text
+$ node index.js
+[
+    {
+        name: 'JBL Flip 4 Waterproof Portable Bluetooth Speaker',
+        price: '$74.95'
+    },
+    {
+        name: 'Sony XBR-950G BRAVIA 4K HDR Ultra HD TV',
+        price: 'From $1,398.00'
+    },
+    ...
+]
+```

## Using jQuery {#using-jquery}

@@ -118,19 +127,12 @@ Now, since we're able to use jQuery, let's translate our vanilla JavaScript code
await page.addScriptTag({ url: 'https://code.jquery.com/jquery-3.6.0.min.js' });

const products = await page.evaluate(() => {
-    const productCards = Array.from($('a[class*="ProductCard_root"]'));
-
-    return productCards.map((element) => {
-        const card = $(element);
-
-        const name = card.find('h3[class*="ProductCard_name"]').text();
-        const price = card.find('div[class*="ProductCard_price"]').text();
-
-        return {
-            name,
-            price,
-        };
-    });
+    return Array.from($('.product-item').map(function () {
+        const card = $(this);
+        const name = card.find('.product-item__title').text();
+        const price = card.find('.price').contents().last().text();
+        return { name, price };
+    }));
});

console.log(products);
@@ -178,7 +180,7 @@ import { load } from 'cheerio';
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

const $ = load(await page.content());

@@ -197,7 +199,7 @@ import { load } from 'cheerio';
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

const $ = load(await page.content());

@@ -214,19 +216,12 @@ Now, to loop through all of the products, we'll make use of the `$` object and l
```js
const $ = load(await page.content());

-const productCards = Array.from($('a[class*="ProductCard_root"]'));
-
-const products = productCards.map((element) => {
-    const card = $(element);
-
-    const name = card.find('h3[class*="ProductCard_name"]').text();
-    const price = card.find('div[class*="ProductCard_price"]').text();
-
-    return {
-        name,
-        price,
-    };
-});
+const products = Array.from($('.product-item').map(function () {
+    const card = $(this);
+    const name = card.find('.product-item__title').text();
+    const price = card.find('.price').contents().last().text();
+    return { name, price };
+}));

console.log(products);
```
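
The `products` array built by the new selectors holds each price as raw text, for example `'$74.95'` or `'From $1,398.00'`. A minimal sketch of turning those strings into numbers follows. It is not part of this commit: the helper name `parsePrice` and the `sampleProducts` array are hypothetical, and the code assumes US-style formatting (comma as thousands separator, dot as decimal point).

```javascript
// Hypothetical helper (not part of the lesson or this commit): converts a
// scraped price string into a number, assuming US-style formatting.
function parsePrice(rawPrice) {
    // Drop everything except digits and the decimal point, which removes
    // the 'From' prefix, the '$' sign, thousands separators, and whitespace.
    const cleaned = rawPrice.replace(/[^0-9.]/g, '');
    const amount = Number.parseFloat(cleaned);
    // Return null when no number could be read from the string.
    return Number.isNaN(amount) ? null : amount;
}

// Sample data copied from the console output shown in this diff.
const sampleProducts = [
    { name: 'JBL Flip 4 Waterproof Portable Bluetooth Speaker', price: '$74.95' },
    { name: 'Sony XBR-950G BRAVIA 4K HDR Ultra HD TV', price: 'From $1,398.00' },
];

// Keep the name and replace the raw price text with the parsed number.
const normalized = sampleProducts.map(({ name, price }) => ({
    name,
    price: parsePrice(price),
}));

console.log(normalized);
```

Keeping the parsing in a separate function, outside `page.evaluate()` or the Cheerio loop, means the same helper could be reused with any of the three extraction approaches shown in this lesson.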
@@ -245,23 +240,16 @@ import { load } from 'cheerio';
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

const $ = load(await page.content());

-const productCards = Array.from($('a[class*="ProductCard_root"]'));
-
-const products = productCards.map((element) => {
-    const card = $(element);
-
-    const name = card.find('h3[class*="ProductCard_name"]').text();
-    const price = card.find('div[class*="ProductCard_price"]').text();
-
-    return {
-        name,
-        price,
-    };
-});
+const products = Array.from($('.product-item').map(function () {
+    const card = $(this);
+    const name = card.find('.product-item__title').text();
+    const price = card.find('.price').contents().last().text();
+    return { name, price };
+}));

console.log(products);

@@ -278,23 +266,16 @@ import { load } from 'cheerio';
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();

-await page.goto('https://demo-webstore.apify.org/search/on-sale');
+await page.goto('https://warehouse-theme-metal.myshopify.com/collections/sales');

const $ = load(await page.content());

-const productCards = Array.from($('a[class*="ProductCard_root"]'));
-
-const products = productCards.map((element) => {
-    const card = $(element);
-
-    const name = card.find('h3[class*="ProductCard_name"]').text();
-    const price = card.find('div[class*="ProductCard_price"]').text();
-
-    return {
-        name,
-        price,
-    };
-});
+const products = Array.from($('.product-item').map(function () {
+    const card = $(this);
+    const name = card.find('.product-item__title').text();
+    const price = card.find('.price').contents().last().text();
+    return { name, price };
+}));

console.log(products);

Binary file not shown.

0 comments on commit 34073b6
