chore: throw on build with broken anchors (#1190)
Docusaurus v3 adds the
[`onBrokenAnchors`](https://docusaurus.io/docs/api/docusaurus-config#onBrokenAnchors)
setting that allows us to fail the build when Docusaurus finds broken
anchor links (internal fragment links).
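
For reference, a minimal sketch of the relevant reporting-severity options in `docusaurus.config.js`; the `onBrokenLinks` key is an assumption based on the surrounding config, and the exact change is in the diff below:

```js
// docusaurus.config.js (sketch) — severity for broken references at build time.
module.exports = {
    onBrokenLinks: 'throw', // assumed to sit alongside the keys below
    onBrokenMarkdownLinks: 'throw',
    // New in Docusaurus v3; introduced by this commit as 'warn' in the diff below.
    onBrokenAnchors: 'warn',
    // ...rest of the config
};
```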

Closes #952

---------

Co-authored-by: Michał Olender <[email protected]>
barjin and TC-MO authored Oct 8, 2024
1 parent eb7a498 commit ec5b323
Showing 29 changed files with 134 additions and 138 deletions.
2 changes: 2 additions & 0 deletions docusaurus.config.js
@@ -51,6 +51,8 @@ module.exports = {
     /** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
   onBrokenMarkdownLinks:
     /** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
+  onBrokenAnchors:
+    /** @type {import('@docusaurus/types').ReportingSeverity} */ ('warn'),
   themes: [
     [
       require.resolve('./apify-docs-theme'),
@@ -53,7 +53,7 @@ Each property's key corresponds to the name we're expecting within our code, whi

## Property types & editor types {#property-types}

-Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
+Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema/specification/v1#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.

```json
{
```
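
The diff view cuts the snippet off after the opening brace; a plausible completion, where the **title** and **description** values are illustrative assumptions and only **type** and **editor** are grounded in the paragraph above:

```json
{
    "title": "Numbers",
    "description": "An array of numbers to be processed.",
    "type": "array",
    "editor": "json"
}
```
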
@@ -20,7 +20,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g
## Learning 🧠 {#learning}

- Skim [this page](https://apify.com/proxy) for a general idea of Apify Proxy.
-- Give the [proxy documentation](/platform/proxy#our-proxies) a solid readover (feel free to skip most of the examples).
+- Give the [proxy documentation](/platform/proxy) a solid readover (feel free to skip most of the examples).
- Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md).
- Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool).
- Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies?
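
To make the **RESIDENTIAL** group mentioned above concrete, here is a minimal sketch of plugging an Apify proxy group into a Crawlee crawler with a session pool; the target URL is a placeholder:

```js
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Route requests through the RESIDENTIAL proxy group from the lesson.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    useSessionPool: true, // rotate sessions (and thus proxy IPs) automatically
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();
```
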
@@ -231,7 +231,7 @@ That's everything! Now, even if the Actor migrates (or is gracefully aborted and

**A:** It's best not to use this option by default. If the run fails, there must be a reason, which needs to be thought through first, meaning that the edge case of failing should be handled when resurrecting the Actor. The state should be persisted beforehand.

-**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
+**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**

**A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection.
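
A sketch of the persistence pattern these answers describe, using the Apify SDK's platform events; the state shape and the `STATE` key are illustrative assumptions:

```js
import { Actor } from 'apify';

await Actor.init();

// On a resurrected or migrated run, pick up the previously persisted state.
const state = (await Actor.getValue('STATE')) ?? { processedUrls: [] };

// Persist on the platform's periodic event and right before a migration.
Actor.on('persistState', async () => { await Actor.setValue('STATE', state); });
Actor.on('migrating', async () => { await Actor.setValue('STATE', state); });

// ... scraping logic that updates `state.processedUrls` as it goes ...

await Actor.exit();
```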

@@ -24,7 +24,7 @@ Storage allows us to save persistent data for further processing. As you'll lear
## Learning 🧠 {#learning}

- Check out [the docs about Actor tasks](/platform/actors/running/tasks).
-- Read about the [two main storage options](/platform/storage#dataset) on the Apify platform.
+- Read about the [two main storage options](/platform/storage/dataset) on the Apify platform.
- Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages).
- Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
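
A quick sketch of the two storage types from the list above in action; the values pushed are placeholders:

```js
import { Actor } from 'apify';

await Actor.init();

// Dataset: append-only storage, one JSON object per result.
await Actor.pushData({ url: 'https://example.com', title: 'Example' });

// Key-value store: arbitrary records stored under a key (input, state, files).
const input = await Actor.getInput();
console.log('Received input:', input);
await Actor.setValue('OUTPUT', { itemCount: 1 });

await Actor.exit();
```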

4 changes: 2 additions & 2 deletions sources/academy/platform/getting_started/inputs_outputs.md
@@ -65,7 +65,7 @@ Then, replace everything in **INPUT_SCHEMA.json** with this:
}
```

-> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema#integer).
+> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema/specification/v1#integer).
Finally, **Save** and **Build** the Actor just as you did in the previous lesson.

@@ -89,7 +89,7 @@ On the results tab, there are a whole lot of options for which format to view/do

There's our solution! Did it work for you as well? Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson).

-It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention).
+It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage/usage#data-retention).
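
For example, pushing to a named dataset keeps the data past the retention window of unnamed storages; the dataset name below is a placeholder:

```js
import { Actor } from 'apify';

await Actor.init();

// Named datasets are retained indefinitely, unlike the default (unnamed) one.
const dataset = await Actor.openDataset('my-permanent-results');
await dataset.pushData({ solution: 42 });

await Actor.exit();
```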

## Next up {#next}

@@ -28,7 +28,7 @@ If the Actor being run via API takes 5 minutes or less to complete a typical run

> If you are unsure about the differences between an Actor and a task, you can read about them in the [tasks](/platform/actors/running/tasks) documentation. In brief, tasks are pre-configured inputs for Actors.
-The API endpoints and usage (for both sync and async) for [Actors](/api/v2#/reference/actors/run-collection/run-actor) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.
+The API endpoints and usage (for both sync and async) for [Actors](/api/v2#tag/ActorsRun-collection/operation/act_runs_post) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.
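
A sketch of starting a run through that endpoint from Node.js; `ACTOR_ID`, the token variable, and the input body are placeholders:

```js
// POST to the "run Actor" endpoint; it returns immediately with the run object.
const response = await fetch(
    `https://api.apify.com/v2/acts/ACTOR_ID/runs?token=${process.env.APIFY_TOKEN}`,
    {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ someInput: 'value' }),
    },
);
const { data: run } = await response.json();
console.log(`Run ${run.id} started with status ${run.status}`);
```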

To run, or **call**, an Actor/task, you will need a few things:

36 changes: 18 additions & 18 deletions sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
@@ -17,7 +17,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial,
because this one builds on topics and code examples discussed there.

-## [](#getting-to-know-our-tools) Getting to know our tools
+## Getting to know our tools

In the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected,
so now it's time to add more data to the results.
@@ -36,7 +36,7 @@ Now that's out of the way, let's open one of the Actor detail pages in the Store
> If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper,
it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!

-## [](#building-our-page-function) Building our Page function
+## Building our Page function

Before we start, let's do a quick recap of the data we chose to scrape:

Expand All @@ -52,7 +52,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
tutorial, so let's get to the next one on the list: title.

-### [](#title) Title
+### Title

![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/title.webp)

@@ -79,7 +79,7 @@ async function pageFunction(context) {
}
```
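
The diff shows only the tail of this snippet; a sketch of the whole step, with the selector assumed from the description above (the title lives in an `<h1>` inside `<header>`):

```js
async function pageFunction(context) {
    const { $ } = context;

    return {
        // Assumed selector: the <h1> nested in the page's <header>.
        title: $('header h1').text(),
    };
}
```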

-### [](#description) Description
+### Description

Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `<p>` tag, because there are a lot of them on the page. We need to narrow our search down a little. Using the DevTools, we find that the Actor description is nested within
the `<header>` element too, same as the title. Moreover, the actual description is nested inside a `<span>` tag with a class `actor-description`.
@@ -97,7 +97,7 @@ async function pageFunction(context) {
}
```
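
Again, the diff truncates the snippet; a sketch based on the selectors named above:

```js
async function pageFunction(context) {
    const { $ } = context;

    return {
        title: $('header h1').text(),
        // The description sits in <span class="actor-description"> inside <header>.
        description: $('header span.actor-description').text(),
    };
}
```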

-### [](#modified-date) Modified date
+### Modified date

The DevTools tell us that the `modifiedDate` can be found in a `<time>` element.

@@ -125,7 +125,7 @@ But we would much rather see a readable date in our results, not a unix timestam
constructor will not accept a `string`, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`.
Phew!
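
Putting the `<time>` element and the `Number()` cast together, a sketch of this step inside the same `pageFunction`; the `datetime` attribute name is an assumption:

```js
// The timestamp arrives as a string, e.g. '1541066744758'.
const timestampString = $('time').attr('datetime');
// Cast to a number first, because new Date() will not accept a string here.
const modifiedDate = new Date(Number(timestampString));
```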

-### [](#run-count) Run count
+### Run count

And so we're finishing up with the `runCount`. There's no specific element like `<time>`, so we need to create
a complex selector and then do a transformation on the result.
@@ -164,7 +164,7 @@ using a regular expression, but its type is still a `string`, so we finally conv
>
> This will give us a string (e.g. `'1234567'`) that can be converted via the `Number` function.
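
A sketch of that transformation; the selector is an assumption, while the regex-and-`Number` step follows the note above:

```js
// Assumed selector for the stats list item that holds the run count.
const runCountText = $('ul.ActorHeader-stats li:nth-of-type(3)').text();
// Keep only digits and commas, drop the commas, then convert to a number.
const runCount = Number(runCountText.match(/[\d,]+/)[0].replace(/,/g, ''));
```
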
-### [](#wrapping-it-up) Wrapping it up
+### Wrapping it up

And there we have it! All the data we needed in a single object. For the sake of completeness, let's add
the properties we parsed from the URL earlier and we're good to go.
@@ -242,13 +242,13 @@ async function pageFunction(context) {
}
```

-### [](#test-run) Test run
+### Test run

As always, try hitting that **Save & Run** button and visit
the **Dataset** preview of clean items. You should see a nice table of all the attributes correctly scraped.
You nailed it!

-## [](#pagination) Pagination
+## Pagination

Pagination is a term that represents "going to the next page of results". You may have noticed that we did not
actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors,
@@ -264,7 +264,7 @@ with Cheerio? We don't have a browser to do it and we only have the HTML of the
answer is that we can't click a button. Does that mean that we cannot get the data at all? Usually not,
but it requires some clever DevTools-Fu.

-### [](#analyzing-the-page) Analyzing the page
+### Analyzing the page

While with Web Scraper and **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)), we could get away with clicking a button,
with Cheerio Scraper we need to dig a little deeper into the page's architecture. For this, we will use
@@ -280,7 +280,7 @@ Then we click the **Show more** button and wait for incoming requests to appear
Now, this is interesting. It seems that we've only received two images after clicking the button and no additional
data. This means that the data about Actors must already be available in the page and the **Show more** button only displays it. This is good news.

-### [](#finding-the-actors) Finding the Actors
+### Finding the Actors

Now that we know the information we seek is already in the page, we just need to find it. The first Actor in the store
is Web Scraper, so let's try using the search tool in the **Elements** tab to find some reference to it. The first
@@ -309,7 +309,7 @@ so you might already be wondering, can I make one request to the store to get th
and then parse it out and be done with it in a single request? Yes you can! And that's the power
of clever page analysis.

-### [](#using-the-data-to-enqueue-all-actor-details) Using the data to enqueue all Actor details
+### Using the data to enqueue all Actor details

We don't really need to go to all the Actor details now, but for the sake of practice, let's imagine we only found
Actor names such as `cheerio-scraper` and their owners, such as `apify` in the data. We will use this information
@@ -342,7 +342,7 @@ how to route those requests.
> If you're wondering how we know the structure of the URL, see the [Getting started
with Apify Scrapers](./getting_started.md) tutorial again.
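
A sketch of that enqueueing plan, assuming the scrapers' `context.enqueueRequest` helper; the `actors` array stands in for the owner/name pairs found in the page data, and `userData.label` is a routing convention:

```js
// Illustrative stand-in for the owner/name pairs extracted from the page.
const actors = [{ owner: 'apify', name: 'cheerio-scraper' }];

for (const { owner, name } of actors) {
    await context.enqueueRequest({
        url: `https://apify.com/${owner}/${name}`,
        userData: { label: 'DETAIL' }, // lets pageFunction route these requests
    });
}
```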

-### [](#plugging-it-into-the-page-function) Plugging it into the Page function
+### Plugging it into the Page function

We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`.
Remember the `// Do some stuff later` comment? Let's replace it.
@@ -411,13 +411,13 @@ to get all results with Cheerio only and other times it takes hours of research.
the right scraper for your job. But don't get discouraged. Oftentimes, the only thing you will ever need is to
define a correct Pseudo URL. Do your research first before giving up on Cheerio Scraper.

-## [](#downloading-our-scraped-data) Downloading the scraped data
+## Downloading the scraped data

You already know the **Dataset** tab of the run console since this is where we've always previewed our data. Notice the row of data formats such as JSON, CSV, and Excel. Below it are options for viewing and downloading the data. Go ahead and try it.

> If you prefer working with an API, you can find the example endpoint under the API tab: **Get dataset items**.
-### [](#clean-items) Clean items
+### Clean items

You can view and download your data without modifications, or you can choose to only get **clean** items. Data that aren't cleaned include a record
for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields
Expand All @@ -427,7 +427,7 @@ Clean items, on the other hand, include only the data you returned from the `pag

To control this, open the **Advanced options** view on the **Dataset** tab.

-## [](#bonus-making-your-code-neater) Bonus: Making your code neater
+## Bonus: Making your code neater

You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier
time maintaining or extending your task, feel free to define other functions inside the `pageFunction`
@@ -495,11 +495,11 @@ async function pageFunction(context) {
> If you're confused by the functions being declared below their executions, it's called hoisting and it's a feature
of JavaScript. It helps you put what matters on top, if you so desire.

-## [](#final-word) Final word
+## Final word

Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify easily and effectively. We're glad that you made it all the way here, and congratulations on creating your first scraping task. We hope that you liked the tutorial, and if there's anything you'd like to ask, [join us on Discord](https://discord.gg/jyEM2PRvMU)!

-## [](#whats-next) What's next
+## What's next

* Check out the [Apify SDK](https://docs.apify.com/sdk) and its [Getting started](https://docs.apify.com/sdk/js/docs/guides/apify-platform) tutorial if you'd like to try building your own Actors. It's a bit more complex and involved than writing a `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking.
* [Take a deep dive into Actors](/platform/actors), from how they work to [publishing](/platform/actors/publishing) them in Apify Store, and even [making money](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) on Actors.
