fix: part with samples more comprehensible, add gif of variants
honzajavorek committed Nov 25, 2024
1 parent 017d2ae commit 30a75bd
Showing 2 changed files with 6 additions and 4 deletions.
@@ -39,6 +39,8 @@ First, let's extract information about the variants. If we go to [Sony XBR-950G

Nice! We can extract the variant names, but we also need to extract the price for each variant. Switching the variants using the buttons shows us that the HTML changes dynamically. This means the page uses JavaScript to display information about the variants.

+ ![Switching variants](images/variants-js.gif)
+
If we can't find a workaround, we'd need our scraper to run JavaScript. That's not impossible. Scrapers can spin up their own browser instance and automate clicking on buttons, but it's slow and resource-intensive. Ideally, we want to stick to plain HTTP requests and Beautiful Soup as much as possible.

After a bit of detective work, we notice that not far below the `block-swatch-list` there's also a block of HTML with a class `no-js`, which contains all the data!
@@ -103,7 +105,7 @@ Since Python 3.9, you can use `|` to merge two dictionaries. If the [docs](https

:::
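
As a rough illustration of the `|` operator mentioned above, here is a minimal sketch of merging fields from a product listing with fields from a product detail page. The dictionary names, keys, and values are made up for this example and are not taken from the scraper itself.

```python
# Fields a scraper might collect from the product listing page.
# All keys and values here are illustrative assumptions.
listing_fields = {"title": "Example product", "min_price": "74.95", "price": None}

# Fields a scraper might collect from the product detail page (also assumed).
detail_fields = {"vendor": "Example vendor", "variant_name": "Red"}

# Since Python 3.9, `|` builds a new dictionary; the inputs stay unchanged.
item = listing_fields | detail_fields
print(item)

# When both sides share a key, the value from the right-hand side wins.
updated = {"price": None} | {"price": "74.95"}
print(updated)
```

Because `|` returns a new dictionary, the same listing data can be reused to build several items, one per variant, without one item's changes leaking into another.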

- If you run the program, you should see 34 items in total. Some items should have no variant:
+ If you run the program, you should see 34 items in total. Some items don't have variants, so they won't have a variant name. However, they should still have a price set, because our scraper should already have that info from the product listing page.

<!-- eslint-skip -->
```json title=products.json
@@ -121,7 +123,7 @@ If you run the program, you should see 34 items in total. Some items should have
]
```

- Some products where we're missing the actual price should now have several variants:
+ Some products will break into several items, each with a different variant name. We don't know their exact prices from the product listing, just the minimum price. In the next step, we should be able to parse the actual price from the variant name for those items.

<!-- eslint-skip -->
```json title=products.json
@@ -147,7 +149,7 @@ Some products where we're missing the actual price should now have several varia
]
```

- However, some products with variants will have the `price` field set. That's because the shop sells all these variants for the same price, so the product listing displays the price as a fixed amount:
+ Perhaps surprisingly, some products with variants will have the `price` field set. That's because the shop sells all variants of the product for the same price, so the product listing shows the price as a fixed amount, like _$74.95_, instead of _from $74.95_.
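
The difference between a fixed price and a "from" price could be handled roughly as follows. This is only a sketch under assumptions: the helper name `parse_listing_price`, the `min_price` and `price` keys, and the exact text format are hypothetical and may differ from what the tutorial's scraper does.

```python
from decimal import Decimal


def parse_listing_price(price_text):
    """Hypothetical helper: split listing text into a minimum and an exact price."""
    text = price_text.strip()
    # A "from" prefix means the listing only tells us the lowest variant price.
    has_prefix = text.lower().startswith("from")
    if has_prefix:
        text = text[len("from"):]
    # Drop the currency sign and thousands separators before converting.
    min_price = Decimal(text.strip().lstrip("$").replace(",", ""))
    # Without the prefix, every variant costs the same, so the price is known.
    price = None if has_prefix else min_price
    return {"min_price": min_price, "price": price}


print(parse_listing_price("$74.95"))
print(parse_listing_price("from $74.95"))
```

Items where `price` stays `None` are the ones whose actual price still has to come from the variant information.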

<!-- eslint-skip -->
```json title=products.json
@@ -167,7 +169,7 @@ However, some products with variants will have the `price` field set. That's bec

## Parsing price

- The items now contain the variant as text, which is good for a start, but it would be more useful to set the price in the `price` key. Let's introduce a new function to handle that:
+ The items now contain the variant as text, which is good for a start, but we want the price to be in the `price` key. Let's introduce a new function to handle that:

```py
def parse_variant(variant):
Expand Down