diff --git a/README.md b/README.md
index 066bde39f..b261bcfd5 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 ## Intro
 
-This repository is the home of Apify's documentation, which you can find at [docs.apify.com](https://docs.apify.com/). The documentation is written using [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). Source files of the [platform documentation](https://docs.apify.com/platform) are located in the [/sources](https://github.com/apify/apify-docs/tree/master/sources) directory. However, other sections, such as SDKs for [JavaScript/Node.js](https://docs.apify.com/sdk/js/), [Python](https://docs.apify.com/sdk/python/), or [CLI](https://docs.apify.com/cli), have their own repositories. For more information, see the [Contributing guidelines](./CONTRIBUTING.md).
+This repository is the home of Apify's documentation, which you can find at [docs.apify.com](https://docs.apify.com/). The documentation is written using [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). Source files of the [platform documentation](https://docs.apify.com/platform) are located in the [/sources](https://github.com/apify/apify-docs/tree/master/sources) directory. However, other sections, such as SDKs for [JavaScript/Node.js](https://docs.apify.com/sdk/js/), [Python](https://docs.apify.com/sdk/python/), or [CLI](https://docs.apify.com/cli/), have their own repositories. For more information, see the [Contributing guidelines](./CONTRIBUTING.md).
 
 ## Before you start contributing
 
diff --git a/sources/academy/glossary/concepts/dynamic_pages.md b/sources/academy/glossary/concepts/dynamic_pages.md
index e7e38f77b..ba38d1cc7 100644
--- a/sources/academy/glossary/concepts/dynamic_pages.md
+++ b/sources/academy/glossary/concepts/dynamic_pages.md
@@ -11,7 +11,7 @@ slug: /concepts/dynamic-pages
 
 ---
 
-Oftentimes, web pages load additional information dynamically, long after their main body is loaded in the browser. A subset of dynamic pages takes this approach further and loads all of its content dynamically. Such style of constructing websites is called Single-page applications (SPAs), and it's widespread thanks to some popular JavaScript libraries, such as [React](https://reactjs.org/) or [Vue](https://vuejs.org/).
+Oftentimes, web pages load additional information dynamically, long after their main body is loaded in the browser. A subset of dynamic pages takes this approach further and loads all of its content dynamically. Such style of constructing websites is called Single-page applications (SPAs), and it's widespread thanks to some popular JavaScript libraries, such as [React](https://react.dev/) or [Vue](https://vuejs.org/).
 
 As you progress in your scraping journey, you'll quickly realize that different websites load their content and populate their pages with data in different ways. Some pages are rendered entirely on the server, some retrieve the data dynamically, and some use a combination of both those methods.
 
diff --git a/sources/academy/glossary/concepts/http_headers.md b/sources/academy/glossary/concepts/http_headers.md
index 64266bc8d..2fce1b833 100644
--- a/sources/academy/glossary/concepts/http_headers.md
+++ b/sources/academy/glossary/concepts/http_headers.md
@@ -47,4 +47,4 @@ HTTP/1.1 and HTTP/2 headers have several differences. Here are the three key dif
 2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem.
 3. While HTTP/1.1 headers are case-insensitive and could be sent by the browsers with capitalized letters (e.g. **Accept-Encoding**, **Cache-Control**, **User-Agent**), HTTP/2 headers must be lower-cased (e.g. **accept-encoding**, **cache-control**, **user-agent**).
 
-> To learn more about the difference between HTTP/1.1 and HTTP/2 headers, check out [this](https://httptoolkit.tech/blog/translating-http-2-into-http-1/) article
+> To learn more about the difference between HTTP/1.1 and HTTP/2 headers, check out [this](https://httptoolkit.com/blog/translating-http-2-into-http-1/) article
diff --git a/sources/academy/glossary/concepts/robot_process_automation.md b/sources/academy/glossary/concepts/robot_process_automation.md
index e428f7ede..27d61dcde 100644
--- a/sources/academy/glossary/concepts/robot_process_automation.md
+++ b/sources/academy/glossary/concepts/robot_process_automation.md
@@ -27,7 +27,7 @@ In a traditional automation workflow, you
 2. Program a bot that does each of those chunks.
 3. Execute the chunks of code in the right order (or in parallel).
 
-With the advance of [machine learning](https://en.wikipedia.org/wiki/Machine_learning), it is becoming possible to [record](https://www.nice.com/rpa/rpa-guide/process-recorder-function-in-rpa/) your workflows and analyze which can be automated. However, this technology is still not perfected and at times can even be less practical than the manual process.
+With the advance of [machine learning](https://en.wikipedia.org/wiki/Machine_learning), it is becoming possible to [record](https://www.nice.com/info/rpa-guide/process-recorder-function-in-rpa/) your workflows and analyze which can be automated. However, this technology is still not perfected and at times can even be less practical than the manual process.
 
 ## Is RPA the same as web scraping? {#is-rpa-the-same-as-web-scraping}
 
@@ -39,6 +39,6 @@ An easy-to-follow [video](https://www.youtube.com/watch?v=9URSbTOE4YI) on what R
 
 To learn about RPA in plain English, check out [this](https://enterprisersproject.com/article/2019/5/rpa-robotic-process-automation-how-explain) article.
 
-[This](https://www.cio.com/article/3236451/what-is-rpa-robotic-process-automation-explained.html) article explains what RPA is and discusses both its advantages and disadvantages.
+[This](https://www.cio.com/article/227908/what-is-rpa-robotic-process-automation-explained.html) article explains what RPA is and discusses both its advantages and disadvantages.
 
 You might also like to check out this article on [12 Steps to Automate Workflows](https://quandarycg.com/automating-workflows/).
diff --git a/sources/academy/glossary/tools/postman.md b/sources/academy/glossary/tools/postman.md
index ea1d37db3..5f37b8f4e 100644
--- a/sources/academy/glossary/tools/postman.md
+++ b/sources/academy/glossary/tools/postman.md
@@ -13,7 +13,7 @@ slug: /tools/postman
 
 [Postman](https://www.postman.com/) is a powerful collaboration platform for API development and testing. For scraping use-cases, it's mainly used to test requests and proxies (such as checking the response body of a raw request, without loading any additional resources such as JavaScript or CSS). This tool can do much more than that, but we will not be discussing all of its capabilities here. Postman allows us to test requests with cookies, headers, and payloads so that we can be entirely sure what the response looks like for a request URL we plan to eventually use in a scraper.
 
-The desktop app can be downloaded from its [official download page](https://www.postman.com/downloads/), or the web app can be used with a signup - no download required. If this is your first time working with a tool like Postman, we recommend checking out their [Getting Started guide](https://learning.postman.com/docs/getting-started/introduction/).
+The desktop app can be downloaded from its [official download page](https://www.postman.com/downloads/), or the web app can be used with a signup - no download required. If this is your first time working with a tool like Postman, we recommend checking out their [Getting Started guide](https://learning.postman.com/docs/introduction/overview/).
 
 ## Understanding the interface {#understanding-the-interface}
 
diff --git a/sources/academy/platform/deploying_your_code/docker_file.md b/sources/academy/platform/deploying_your_code/docker_file.md
index 43e0902dc..f69824d5f 100644
--- a/sources/academy/platform/deploying_your_code/docker_file.md
+++ b/sources/academy/platform/deploying_your_code/docker_file.md
@@ -22,7 +22,7 @@ The **Dockerfile** is a file which gives the Apify platform (or Docker, more spe
 
 If your project doesn’t already contain a Dockerfile, don’t worry! Apify offers [many base images](/sdk/js/docs/guides/docker-images) that are optimized for building and running Actors on the platform, which can be found [here](https://hub.docker.com/u/apify). When using a language for which Apify doesn't provide a base image, [Docker Hub](https://hub.docker.com/) provides a ton of free Docker images for most use-cases, upon which you can create your own images.
 
-> Tip: You can see all of Apify's Docker images [on DockerHub](https://hub.docker.com/r/apify/).
+> Tip: You can see all of Apify's Docker images [on DockerHub](https://hub.docker.com/u/apify).
 
 At the base level, each Docker image contains a base operating system and usually also a programming language runtime (such as Node.js or Python). You can also find images with preinstalled libraries or install them yourself during the build step.
 
diff --git a/sources/academy/platform/expert_scraping_with_apify/index.md b/sources/academy/platform/expert_scraping_with_apify/index.md
index e0160243a..f57f273fe 100644
--- a/sources/academy/platform/expert_scraping_with_apify/index.md
+++ b/sources/academy/platform/expert_scraping_with_apify/index.md
@@ -36,7 +36,7 @@ In one of the later lessons, we'll be learning how to integrate our Actor on the
 
 ### Docker {#docker}
 
-Docker is a massive topic on its own, but don't be worried! We only expect you to know and understand the very basics of it, which can be learned about in [this short article](https://docs.docker.com/get-started/overview/) (10 minute read).
+Docker is a massive topic on its own, but don't be worried! We only expect you to know and understand the very basics of it, which can be learned about in [this short article](https://docs.docker.com/guides/docker-overview/) (10 minute read).
 
 ### The basics of Actors {#actor-basics}
 
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
index b43eb8ce9..2f009dac5 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
@@ -230,7 +230,7 @@ await Actor.exit();
 
 **A:** The Apify client mimics the Apify API, so there aren't any super significant differences. It's super handy as it helps with managing the API calls (parsing, error handling, retries, etc) and even adds convenience functions.
 
-The one main difference is that the Apify client automatically uses [**exponential backoff**](/api/client/js#retries-with-exponential-backoff) to deal with errors.
+The one main difference is that the Apify client automatically uses [**exponential backoff**](/api/client/js/docs#retries-with-exponential-backoff) to deal with errors.
 
 **Q: How do you pass input when running an Actor or task via API?**
 
diff --git a/sources/academy/platform/get_most_of_actors/actor_readme.md b/sources/academy/platform/get_most_of_actors/actor_readme.md
index 61c5508c1..4a4ae5205 100644
--- a/sources/academy/platform/get_most_of_actors/actor_readme.md
+++ b/sources/academy/platform/get_most_of_actors/actor_readme.md
@@ -16,7 +16,7 @@ slug: /get-most-of-actors/actor-readme
 - Whenever you build an Actor, think of the original request/idea and the "use case" = "user need" it should solve, please take notes and share them with Apify, so we can help you write a blog post supporting your Actor with more information, more detailed explanation, better SEO.
 - Consider adding a video, images, and screenshots to your README to break up the text.
 - This is an example of an Actor with a README that corresponds well to the guidelines below:
-    - https://apify.com/dtrungtin/airbnb-scraper
+    - [apify.com/tri_angle/airbnb-scraper](https://apify.com/tri_angle/airbnb-scraper)
 - Tip no.1: if you want to add snippets of code anywhere in your README, you can use [Carbon](https://github.com/carbon-app/carbon).
 - Tip no.2: if you need any quick Markdown guidance, check out https://www.markdownguide.org/cheat-sheet/
 
@@ -74,12 +74,12 @@ Aim for sections 1–6 below and try to include at least 300 words. You can move
     - Refer to the input tab on Actor's detail page. If you like, you can add a screenshot showing the user what the input fields will look like.
     - This is an example of how to refer to the input tab:
 
-    > Twitter Scraper has the following input options. Click on the [input tab](https://apify.com/vdrmota/twitter-scraper/input-schema) for more information.
+    > Twitter Scraper has the following input options. Click on the [input tab](https://apify.com/quacker/twitter-scraper/input-schema) for more information.
 
 7. **Output**
     - Mention "You can download the dataset extracted by (Actor name) in various formats such as JSON, HTML, CSV, or Excel.”
-    - Add a simplified JSON dataset example, like here: https://apify.com/drobnikj/crawler-google-places#output-example
+    - Add a simplified JSON dataset example, like here: [apify.com/compass/crawler-google-places#output-example](https://apify.com/compass/crawler-google-places#output-example)
 
 8. **Tips or Advanced options section**
     - Share any tips on how to best run the Actor, such as how to limit compute unit usage, get more accurate results, or improve speed.
diff --git a/sources/academy/platform/get_most_of_actors/index.md b/sources/academy/platform/get_most_of_actors/index.md
index 5a2e4c12c..f6f05cd54 100644
--- a/sources/academy/platform/get_most_of_actors/index.md
+++ b/sources/academy/platform/get_most_of_actors/index.md
@@ -12,7 +12,7 @@ slug: /get-most-of-actors
 
 ---
 
-[Apify Store](https://apify.com/store) is home to hundreds of public Actors available to the Apify community. Anyone is welcome to [publish Actors](/platform/actors/publishing) in the store, and you can even [monetize your Actors](https://get.apify.com/monetize-your-code).
+[Apify Store](https://apify.com/store) is home to hundreds of public Actors available to the Apify community. Anyone is welcome to [publish Actors](/platform/actors/publishing) in the store, and you can even [monetize your Actors](https://apify.com/partners/actor-developers).
 
 In this section, we will go over some of the practical steps you can take to ensure the high quality of your public Actors.
 You will learn:
diff --git a/sources/academy/platform/get_most_of_actors/monetizing_your_actor.md b/sources/academy/platform/get_most_of_actors/monetizing_your_actor.md
index a6118bf4a..b19b8f6e3 100644
--- a/sources/academy/platform/get_most_of_actors/monetizing_your_actor.md
+++ b/sources/academy/platform/get_most_of_actors/monetizing_your_actor.md
@@ -152,7 +152,7 @@ Getting new users can be an art in itself, but there are **two proven steps** yo
 
     Don’t underestimate your own network! Your social media connections can be a valuable ally in promoting your Actor. Not only can they use your tool to enrich their own professional activities, but also support your work by helping you promote your Actor to their network.
 
-    For inspiration, you can check Apify’s [Twitter](https://twitter.com/apify), [Facebook](https://www.facebook.com/apifytech/), and [LinkedIn](https://linkedin.com/company/apifytech) pages, and **don’t forget to tag Apify on your posts** we will retweet and share your posts to help you reach an even broader audience.
+    For inspiration, you can check Apify’s [Twitter](https://twitter.com/apify) or [LinkedIn](https://www.linkedin.com/company/apifytech/) pages, and **don’t forget to tag Apify on your posts** we will retweet and share your posts to help you reach an even broader audience.
 
 - **YouTube**
 
diff --git a/sources/academy/platform/get_most_of_actors/naming_your_actor.md b/sources/academy/platform/get_most_of_actors/naming_your_actor.md
index 93b81f573..1884865e9 100644
--- a/sources/academy/platform/get_most_of_actors/naming_your_actor.md
+++ b/sources/academy/platform/get_most_of_actors/naming_your_actor.md
@@ -17,7 +17,7 @@ Naming your Actor can be tricky. Especially when you've spent a long time coding
 
 ## Scrapers {#scrapers}
 
-For Actors such as [YouTube Scraper](https://apify.com/bernardo/youtube-scraper) or [Amazon Scraper](https://apify.com/vaclavrut/amazon-crawler), which scrape web pages, we usually have one Actor per domain. This helps with naming, as the domain name serves as your Actor's name.
+For Actors such as [YouTube Scraper](https://apify.com/streamers/youtube-scraper) or [Amazon Scraper](https://apify.com/junglee/amazon-crawler), which scrape web pages, we usually have one Actor per domain. This helps with naming, as the domain name serves as your Actor's name.
 
 GOOD:
 
diff --git a/sources/academy/platform/get_most_of_actors/seo_and_promotion.md b/sources/academy/platform/get_most_of_actors/seo_and_promotion.md
index 4a79d34d0..b23fbcc02 100644
--- a/sources/academy/platform/get_most_of_actors/seo_and_promotion.md
+++ b/sources/academy/platform/get_most_of_actors/seo_and_promotion.md
@@ -102,11 +102,11 @@ Now that you’ve created a cool new Actor, let others see it! Share it on your
 - Try to publish an article about your Actor in relevant external magazines like [hackernoon.com](https://hackernoon.com/) or [techcrunch.com](https://techcrunch.com/). Do not limit yourself to blogging platforms.
 - If you publish an article in external media (magazine, blog etc.), be sure to include backlinks to your Actor and the Apify website to strengthen the domain's SEO.
 - It's always better to use backlinks with the [`dofollow` attribute](https://raventools.com/marketing-glossary/dofollow-link/).
-- Always use the most relevant URL as the backlink's landing page. For example, when talking about Apify Store, link to the Store page (https://apify.com/store), not to Apify homepage (https://apify.com).
+- Always use the most relevant URL as the backlink's landing page. For example, when talking about Apify Store, link to the Store page ([apify.com/store](https://apify.com/store)), not to Apify homepage ([apify.com](https://apify.com)).
 - Always use the most relevant keyword or phrase for the backlink's text. This can boost the landing page's SEO and help the readers know what to expect from the link.
 
-> **GOOD**: Try the [Facebook scraper](https://apify.com/pocesar/facebook-pages-scraper) now.
->
**AVOID**: Try the Facebook scraper [here](https://apify.com/pocesar/facebook-pages-scraper).
+> **GOOD**: Try the [Facebook scraper](https://apify.com/apify/facebook-pages-scraper) now.
+>
**AVOID**: Try the Facebook scraper [here](https://apify.com/apify/facebook-pages-scraper).
 
 ### Social media and forums
 
diff --git a/sources/academy/platform/getting_started/actors.md b/sources/academy/platform/getting_started/actors.md
index 1f89ee263..6b0a407c5 100644
--- a/sources/academy/platform/getting_started/actors.md
+++ b/sources/academy/platform/getting_started/actors.md
@@ -23,7 +23,7 @@ Once an Actor has been pushed to the Apify platform, they can be shared to the w
 
 ## Actors on the Apify platform {#actors-on-platform}
 
-For a super quick and dirty understanding of what a published Actor looks like, and how it works, let's run an SEO audit of **apify.com** using the [SEO audit Actor](https://apify.com/drobnikj/seo-audit-tool).
+For a super quick and dirty understanding of what a published Actor looks like, and how it works, let's run an SEO audit of **apify.com** using the [SEO audit Actor](https://apify.com/misceres/seo-audit-tool).
 
 On the front page of the Actor, click the green **Try for free** button. If you're logged into your Apify account which you created during the [**Getting started**](./index.md) lesson, you'll be taken to the Apify Console and greeted with a page that looks like this:
 
diff --git a/sources/academy/platform/running_a_web_server.md b/sources/academy/platform/running_a_web_server.md
index 8bd24416e..d35727bf2 100644
--- a/sources/academy/platform/running_a_web_server.md
+++ b/sources/academy/platform/running_a_web_server.md
@@ -236,4 +236,4 @@ When we deploy and run this Actor on the Apify platform, then we can open the **
 
 With that, we're done! And our application works like a charm :)
 
-The complete code of this Actor is available [here](https://www.apify.com/apify/example-web-server). You can run it there or copy it to your account.
+The complete code of this Actor is available [here](https://apify.com/apify/example-web-server). You can run it there or copy it to your account.
diff --git a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
index d1f6e3050..dd126c6d6 100644
--- a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
+++ b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
@@ -14,7 +14,7 @@ import TabItem from '@theme/TabItem';
 
 The most popular way of [integrating](https://help.apify.com/en/collections/1669769-integrations) the Apify platform with an external project/application is by programmatically running an [Actor](/platform/actors) or [task](/platform/actors/running/tasks), waiting for it to complete its run, then collecting its data and using it within the project. Follow this tutorial to have an idea on how to approach this, it isn't as complicated as it sounds!
 
-> Remember to check out our [API documentation](/api/v2) with examples in different languages and a live API console. We also recommend testing the API with a desktop client like [Postman](https://www.getpostman.com/) or [Insomnia](https://insomnia.rest).
+> Remember to check out our [API documentation](/api/v2) with examples in different languages and a live API console. We also recommend testing the API with a desktop client like [Postman](https://www.postman.com/) or [Insomnia](https://insomnia.rest).
 
 Apify API offers two ways of interacting with it:
 
@@ -78,7 +78,7 @@ Via API, let's quickly try to run [Web Scraper](https://apify.com/apify/web-scra
 https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN
 ```
 
-Here is how it looks in [Postman](https://www.getpostman.com/):
+Here is how it looks in [Postman](https://www.postman.com/):
 
 ![Run an Actor via API in Postman](./images/run-actor-postman.png)
 
diff --git a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
index 26efcf635..8ba3c521d 100644
--- a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
+++ b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
@@ -2,7 +2,7 @@
 title: Scraping with Cheerio Scraper
 menuTitle: Cheerio Scraper
 description: Learn how to scrape a website using Apify's Cheerio Scraper. Build an Actor's page function, extract information from a web page and download your data.
-externalSourceUrl: https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/build/cheerio-scraper-tutorial.md
+externalSourceUrl: https://raw.githubusercontent.com/apify/actor-scraper/master/docs/build/cheerio-scraper-tutorial.md
 sidebar_position: 3
 slug: /apify-scrapers/cheerio-scraper
 ---
@@ -47,14 +47,14 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 5. **Last modification date** - When the Actor was last modified.
 6. **Number of runs** - How many times the Actor was run.
 
-![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.webp)
+![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/scraping-practice.webp)
 
 We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, so let's get to the next one on the list: title.
 
 ### [](#title) Title
 
-![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
+![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/title.webp)
 
 By using the element selector tool, we find out that the title is there under an `

` tag, as titles should be. Maybe surprisingly, we find that there are actually two `

` tags on the detail page. This should get us thinking.
 
@@ -84,7 +84,7 @@ async function pageFunction(context) {
 
 Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `

` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within the `

` element too, same as the title. Moreover, the actual description is nested inside a `` tag with a class `actor-description`.
 
-![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.webp)
+![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/description.webp)
 
 ```js
 async function pageFunction(context) {
@@ -101,7 +101,7 @@ async function pageFunction(context) {
 The DevTools tell us that the `modifiedDate` can be found in a `