diff --git a/sources/academy/glossary/concepts/dynamic_pages.md b/sources/academy/glossary/concepts/dynamic_pages.md
index 2cf9326ee..e7e38f77b 100644
--- a/sources/academy/glossary/concepts/dynamic_pages.md
+++ b/sources/academy/glossary/concepts/dynamic_pages.md
@@ -36,6 +36,6 @@ Sometimes, it can be quite obvious when content is dynamically being rendered. F
 ![Image](https://blog.apify.com/content/images/2022/02/dynamicLoading-1--1--2.gif)
 
-Here, it's very clear that new content is being generated. As we scroll down the Twitter feed, we can see the scroll bar jumping back up, signifying that more elements have been created using Javascript.
+Here, it's very clear that new content is being generated. As we scroll down the Twitter feed, we can see the scroll bar jumping back up, signifying that more elements have been created using JavaScript.
 
 Other times, it's less obvious though. Content can appear to be static (non-dynamic) when it is not, or even sometimes the other way around.
diff --git a/sources/academy/glossary/tools/apify_cli.md b/sources/academy/glossary/tools/apify_cli.md
index 1b65144df..e55346d7b 100644
--- a/sources/academy/glossary/tools/apify_cli.md
+++ b/sources/academy/glossary/tools/apify_cli.md
@@ -15,7 +15,7 @@ The [Apify CLI](/cli) helps you create, develop, build and run Apify actors, and
 ## Installing {#installing}
 
-To install the Apfiy CLI, you'll first need NPM, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
+To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
 
 Open up a terminal instance and run the following command:
 
@@ -23,7 +23,7 @@ Open up a terminal instance and run the following command:
 npm i -g apify-cli
 ```
 
-This will install the CLI via NPM.
+This will install the CLI via npm.
 
 ## Logging in {#logging-in}
diff --git a/sources/academy/glossary/tools/quick_javascript_switcher.md b/sources/academy/glossary/tools/quick_javascript_switcher.md
index fb939466e..ba2f3d580 100644
--- a/sources/academy/glossary/tools/quick_javascript_switcher.md
+++ b/sources/academy/glossary/tools/quick_javascript_switcher.md
@@ -11,7 +11,7 @@ slug: /tools/quick-javascript-switcher
 
 ---
 
-**Quick Javascript Switcher** is a very simple Chrome extension that allows you to switch on/off the JavaScript for the current page with one click. It can be added to your browser via the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see its respective button next to any other Chrome extensions you might have installed.
+**Quick JavaScript Switcher** is a very simple Chrome extension that allows you to switch JavaScript on/off for the current page with one click. It can be added to your browser via the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see its respective button next to any other Chrome extensions you might have installed.
 
 If JavaScript is enabled - clicking the button will switch it off and reload the page. The next click will re-enable JavaScript and refresh the page. This extension is useful for checking whether a certain website will work without JavaScript (and thus could be parsed without using a browser with a plain HTTP request) or not.
diff --git a/sources/academy/glossary/tools/user_agent_switcher.md b/sources/academy/glossary/tools/user_agent_switcher.md
index a8b31d0ec..7b86fcbcc 100644
--- a/sources/academy/glossary/tools/user_agent_switcher.md
+++ b/sources/academy/glossary/tools/user_agent_switcher.md
@@ -15,7 +15,7 @@ slug: /tools/user-agent-switcher
 
 ![User-Agent Switcher groups](./images/user-agent-switcher-groups.png)
 
-Clicking on a group will display a list of possible user-agents to set.
+Clicking on a group will display a list of possible User-Agents to set.
 
 ![Default available Internet Explorer agents](./images/user-agent-switcher-agents.png)
 
@@ -23,6 +23,6 @@ After setting the **User-Agent**, the page will be refreshed.
 
 ## Configuration
 
-The extension configuration page allows you to edit the **User-Agent** list in case you want to add a specific user-agent that isn't already provided. You can find some other options, but most likely you will never need to modify those.
+The extension configuration page allows you to edit the **User-Agent** list in case you want to add a specific User-Agent that isn't already provided. You can find some other options, but most likely you will never need to modify those.
 
 ![User-Agent Switcher configuration page](./images/user-agent-switcher-config.png)
diff --git a/sources/academy/platform/deploying_your_code/docker_file.md b/sources/academy/platform/deploying_your_code/docker_file.md
index 73f548eee..6993b3462 100644
--- a/sources/academy/platform/deploying_your_code/docker_file.md
+++ b/sources/academy/platform/deploying_your_code/docker_file.md
@@ -49,22 +49,22 @@ Here's the Dockerfile for our Node.js example project's actor:
 FROM apify/actor-node:16
 
 # Second, copy just package.json and package-lock.json since they are the only files
-# that affect NPM install in the next step
+# that affect npm install in the next step
 COPY package*.json ./
 
-# Install NPM packages, skip optional and development dependencies to keep the
+# Install npm packages, skip optional and development dependencies to keep the
 # image small. Avoid logging too much and print the dependency tree for debugging
 RUN npm --quiet set progress=false \
  && npm install --only=prod --no-optional \
- && echo "Installed NPM packages:" \
+ && echo "Installed npm packages:" \
  && (npm list --all || true) \
  && echo "Node.js version:" \
  && node --version \
- && echo "NPM version:" \
+ && echo "npm version:" \
  && npm --version
 
 # Next, copy the remaining files and directories with the source code.
-# Since we do this after NPM install, quick build will be really fast
+# Since we do this after npm install, quick build will be really fast
 # for simple source file changes.
 COPY . ./
diff --git a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
index 09ed2ad35..b1c86edfa 100644
--- a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
+++ b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
@@ -18,7 +18,7 @@ You can use one of the two main ways to programmatically interact with the Apify
 ## Learning 🧠 {#learning}
 
 - Scroll through the [Apify API docs](/api/v2) (there's a whole lot there, so you're not expected to memorize everything).
-- Read about the Apify client in [Apify's docs](/api/client/js). It can also be seen on [GitHub](https://github.com/apify/apify-client-js) and [NPM](https://www.npmjs.com/package/apify-client).
+- Read about the Apify client in [Apify's docs](/api/client/js). It can also be found on [GitHub](https://github.com/apify/apify-client-js) and [npm](https://www.npmjs.com/package/apify-client).
 - Learn about the [`Actor.newClient()`](/sdk/js/reference/class/Actor#newClient) function in the Apify SDK.
 - Skim through [this article](https://help.apify.com/en/articles/2868670-how-to-pass-data-from-web-scraper-to-another-actor) about API integration (this article is old; however, still relevant).
 
@@ -26,7 +26,7 @@
 
 1. What is the relationship between the Apify API and the Apify client? Are there any significant differences?
 2. How do you pass input when running an actor or task via API?
-3. Do you need to install the `apify-client` NPM package when already using the `apify` package?
+3. Do you need to install the `apify-client` npm package when already using the `apify` package?
 
 ## Our task
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
index 7c8258c0d..827b8de27 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
@@ -236,7 +236,7 @@ The one main difference is that the Apify client automatically uses [**exponenti
 **A:** The input should be passed into the **body** of the request when running an actor/task via API.
 
-**Q: Do you need to install the `apify-client` NPM package when already using the `apify` package?**
+**Q: Do you need to install the `apify-client` npm package when already using the `apify` package?**
 
 **A:** No. The Apify client is available right in the SDK with the `Actor.newClient()` function.
diff --git a/sources/academy/platform/getting_started/apify_client.md b/sources/academy/platform/getting_started/apify_client.md
index 5b8943c39..76dcbfa07 100644
--- a/sources/academy/platform/getting_started/apify_client.md
+++ b/sources/academy/platform/getting_started/apify_client.md
@@ -26,7 +26,7 @@ You can access `apify-client` examples in the Console Actor detail page. Click t
 ## Installing and importing {#installing-and-importing}
 
-If you are going to use the client in Node.js, use this command within one of your projects to install the package through NPM:
+If you are going to use the client in Node.js, use this command within one of your projects to install the package through npm:
 
 ```shell
 npm install apify-client
diff --git a/sources/academy/tutorials/node_js/choosing_the_right_scraper.md b/sources/academy/tutorials/node_js/choosing_the_right_scraper.md
index 47950e085..58516d91f 100644
--- a/sources/academy/tutorials/node_js/choosing_the_right_scraper.md
+++ b/sources/academy/tutorials/node_js/choosing_the_right_scraper.md
@@ -30,7 +30,7 @@ Some websites do not load any data without a browser, as they need to execute so
 ## Making the choice {#making-the-choice}
 
-When choosing which scraper to use, we would suggest first checking whether the website works without JavaScript or not. Probably the easiest way to do so is to use the [Quick Javascript Switcher](../../glossary/tools/quick_javascript_switcher.md) extension for Chrome. If JavaScript is not needed, or you've spotted some XHR requests in the **Network** tab with the data you need, you probably won't need to use an automated browser browser. You can then check what data is received in response using [Postman](../../glossary/tools/postman.md) or [Insomnia](../../glossary/tools/insomnia.md) or try to send a few requests programmatically. If the data is there and you're not blocked straight away, a request-based scraper is probably the way to go.
+When choosing which scraper to use, we would suggest first checking whether the website works without JavaScript or not. Probably the easiest way to do so is to use the [Quick JavaScript Switcher](../../glossary/tools/quick_javascript_switcher.md) extension for Chrome. If JavaScript is not needed, or you've spotted some XHR requests in the **Network** tab with the data you need, you probably won't need to use an automated browser. You can then check what data is received in response using [Postman](../../glossary/tools/postman.md) or [Insomnia](../../glossary/tools/insomnia.md) or try to send a few requests programmatically. If the data is there and you're not blocked straight away, a request-based scraper is probably the way to go.
 
 It also depends of course on whether you need to fill in some data (like a username and password) or select a location (such as entering a zip code manually). Tasks where interacting with the page is absolutely necessary cannot be done using plain HTTP scraping, and require headless browsers. In some cases, you might also decide to use a browser-based solution in order to better blend in with the rest of the "regular" traffic coming from real users.
diff --git a/sources/academy/webscraping/anti_scraping/index.md b/sources/academy/webscraping/anti_scraping/index.md
index 6d590e87b..8446d0308 100644
--- a/sources/academy/webscraping/anti_scraping/index.md
+++ b/sources/academy/webscraping/anti_scraping/index.md
@@ -111,13 +111,13 @@ Because we here at Apify scrape for a living, we have discovered many popular an
 ### IP rate-limiting
 
-This is the most straightforward and standard protection, which is mainly implemented to prevent DDOS attacks, but it also works for blocking scrapers. Websites using rating don't allow to more than some defined number of requests from one IP address in a certain time span. If the max-request number is low, then there is a high potential for false-positive due to IP address uniqueness, such as in large companies where hundreds of employees can share the same IP address.
+This is the most straightforward and standard protection, which is mainly implemented to prevent DDoS attacks, but it also works for blocking scrapers. Websites using rate limiting don't allow more than a defined number of requests from one IP address in a certain time span. If the max-request number is low, there is a high potential for false positives, since an IP address isn't unique to one person: in large companies, hundreds of employees can share the same IP address.
 
 > Learn more about rate limiting [here](./techniques/rate_limiting.md)
 
 ### Header checking
 
-This type of bot identification is based on the given fact that humans are accessing web pages through browsers, which have specific [header](../../glossary/concepts/http_headers.md) sets which they send along with every request. The most commonly known header that helps to detect bots is the `user-agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `user-agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on the header consistency, and includes a known combination of browser headers.
+This type of bot identification is based on the fact that humans access web pages through browsers, which have specific [header](../../glossary/concepts/http_headers.md) sets that they send along with every request. The most commonly known header that helps to detect bots is the `User-Agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `User-Agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on header consistency, and includes a known combination of browser headers.
 
 ### URL analysis
 
@@ -131,7 +131,7 @@ One of the best ways of avoiding the possible breaking of your scraper due to we
 ### IP session consistency
 
-This technique is commonly used to entirely block the bot from accessing the website altogether. It works on the principle that every entity that accesses the site gets a token. This token is then saved together with the IP address and HTTP request information such as user-agent and other specific headers. If the entity makes another request, but without the session token, the IP address is added on the greylist.
+This technique is commonly used to block the bot from accessing the website altogether. It works on the principle that every entity that accesses the site gets a token. This token is then saved together with the IP address and HTTP request information such as the User-Agent and other specific headers. If the entity makes another request, but without the session token, the IP address is added to the greylist.
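To make the principle concrete, here's a toy sketch of that token/greylist flow in JavaScript. Everything in it (the header name, the data structures) is hypothetical and only illustrates the idea, not any real vendor's implementation:

```js
import { randomUUID } from 'node:crypto';

// Toy sketch of IP session consistency; all names are hypothetical.
const issuedTokens = new Map(); // IP address -> token we issued to it
const greylist = new Set();     // IPs that came back without their token

function handleRequest(request) {
  const token = request.headers['x-session-token']; // hypothetical header
  if (!issuedTokens.has(request.ip)) {
    // First visit from this entity: issue a token and save it with the IP.
    const fresh = randomUUID();
    issuedTokens.set(request.ip, fresh);
    return { issueToken: fresh, allowed: true };
  }
  if (token !== issuedTokens.get(request.ip)) {
    // A later request arrived without the expected session token: greylist the IP.
    greylist.add(request.ip);
  }
  return { allowed: !greylist.has(request.ip) };
}
```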
 ### Interval analysis
diff --git a/sources/academy/webscraping/anti_scraping/mitigation/generating_fingerprints.md b/sources/academy/webscraping/anti_scraping/mitigation/generating_fingerprints.md
index 07abeb8ea..5ba143ef4 100644
--- a/sources/academy/webscraping/anti_scraping/mitigation/generating_fingerprints.md
+++ b/sources/academy/webscraping/anti_scraping/mitigation/generating_fingerprints.md
@@ -1,13 +1,13 @@
 ---
 title: Generating fingerprints
-description: Learn how to use two super handy NPM libraries to easily generate fingerprints and inject them into a Playwright or Puppeteer page.
+description: Learn how to use two super handy npm libraries to easily generate fingerprints and inject them into a Playwright or Puppeteer page.
 sidebar_position: 3
 slug: /anti-scraping/mitigation/generating-fingerprints
 ---
 
 # Generating fingerprints {#generating-fingerprints}
 
-**Learn how to use two super handy NPM libraries to easily generate fingerprints and inject them into a Playwright or Puppeteer page.**
+**Learn how to use two super handy npm libraries to easily generate fingerprints and inject them into a Playwright or Puppeteer page.**
 
 ---
diff --git a/sources/academy/webscraping/anti_scraping/mitigation/using_proxies.md b/sources/academy/webscraping/anti_scraping/mitigation/using_proxies.md
index 91de0360b..cb0e21f38 100644
--- a/sources/academy/webscraping/anti_scraping/mitigation/using_proxies.md
+++ b/sources/academy/webscraping/anti_scraping/mitigation/using_proxies.md
@@ -140,4 +140,4 @@ Notice that we didn't provide it a list of proxy URLs. This is because the `SHAD
 
 ## Next up {#next}
 
-[Next up](./generating_fingerprints.md), we'll be checking out how to use two NPM packages to generate and inject [browser fingerprints](../techniques/fingerprinting.md).
+[Next up](./generating_fingerprints.md), we'll be checking out how to use two npm packages to generate and inject [browser fingerprints](../techniques/fingerprinting.md).
diff --git a/sources/academy/webscraping/anti_scraping/techniques/browser_challenges.md b/sources/academy/webscraping/anti_scraping/techniques/browser_challenges.md
index 90fe0c3e0..3a606317b 100644
--- a/sources/academy/webscraping/anti_scraping/techniques/browser_challenges.md
+++ b/sources/academy/webscraping/anti_scraping/techniques/browser_challenges.md
@@ -11,7 +11,7 @@ slug: /anti-scraping/techniques/browser-challenges
 
 ## Browser challenges
 
-Browser challenges are a type of security measure that relies on browser fingerprints. These challenges typically involve a javascript script that collects both static and dynamic browser fingerprints. Static fingerprints include attributes such as user-agent, video card, and number of CPU cores available. Dynamic fingerprints, on the other hand, might involve rendering fonts or objects in the canvas (known as a [canvas fingerprint](./fingerprinting.md#with-canvases)), or playing audio in the [AudioContext](./fingerprinting.md#from-audiocontext). We were covering the details in the previous [fingerprinting](./fingerprinting.md) lesson.
+Browser challenges are a type of security measure that relies on browser fingerprints. These challenges typically involve a JavaScript program that collects both static and dynamic browser fingerprints. Static fingerprints include attributes such as the User-Agent, video card, and number of CPU cores available. Dynamic fingerprints, on the other hand, might involve rendering fonts or objects in the canvas (known as a [canvas fingerprint](./fingerprinting.md#with-canvases)), or playing audio in the [AudioContext](./fingerprinting.md#from-audiocontext). We covered the details in the previous [fingerprinting](./fingerprinting.md) lesson.
 
 While some browser challenges are relatively straightforward - for example, just loading an image and checking if it renders correctly - others can be much more complex. One well-known example of a complex browser challenge is Cloudflare's browser screen check. In this challenge, Cloudflare visually inspects the browser screen and blocks the first request if any inconsistencies are found. This approach provides an extra layer of protection against automated attacks.
 
@@ -19,7 +19,7 @@ Many online protections incorporate browser challenges into their security measu
 ## Cloudflare browser challenge
 
-One of the most well-known browser challenges is the one used by Cloudflare. Cloudflare has a massive dataset of legitimate canvas fingerprints and user-agent pairs, which they use in conjunction with machine learning algorithms to detect any device property spoofing. This might include spoofed user-agents, operating systems, or GPUs.
+One of the most well-known browser challenges is the one used by Cloudflare. Cloudflare has a massive dataset of legitimate canvas fingerprint and User-Agent pairs, which they use in conjunction with machine learning algorithms to detect any device property spoofing. This might include spoofed User-Agent headers, operating systems, or GPUs.
 
 ![Cloudflare browser check](https://images.ctfassets.net/slt3lc6tev37/55EYMR81XJCIG5uxLjQQOx/252a98adf90fa0ff2f70437cc5c0a3af/under-attack-mode_enabled.gif)
diff --git a/sources/academy/webscraping/api_scraping/graphql_scraping/custom_queries.md b/sources/academy/webscraping/api_scraping/graphql_scraping/custom_queries.md
index 04a969c25..0a433b31b 100644
--- a/sources/academy/webscraping/api_scraping/graphql_scraping/custom_queries.md
+++ b/sources/academy/webscraping/api_scraping/graphql_scraping/custom_queries.md
@@ -36,7 +36,7 @@ To make sure we're all on the same page, we're going to set up the project toget
 npm init -y && npm install graphql-tag puppeteer got-scraping
 ```
 
-This command will first initialize the project with NPM, then will install the `puppeteer`, `graphql-tag`, and `got-scraping` packages, which we will need in this lesson.
+This command will first initialize the project with npm, then install the `puppeteer`, `graphql-tag`, and `got-scraping` packages, which we will need in this lesson.
 
 Finally, create a file called **index.js**. This is the file we will be working in for the rest of the lesson.
 
@@ -113,7 +113,7 @@ Also in the previous lesson, we learned that the **media** type is dependent on
 query SearchQuery($query: String!, $max_age: Int!) {
     organization {
         media(query: $query, max_age: $max_age , first: 1000) {
-
+
         }
     }
 }
@@ -190,7 +190,7 @@ const GET_LATEST = gql`
 `;
 ```
 
-Alternatively, if you don't want to write your GraphQL queries right within your Javascript code, you can write them in files using the **.graphql** format, then read them from the filesystem or import them.
+Alternatively, if you don't want to write your GraphQL queries right within your JavaScript code, you can write them in files using the **.graphql** format, then read them from the filesystem or import them.
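As a quick illustration of that file-based approach, a sketch might look like this (the **./queries/getLatest.graphql** path is a made-up example):

```js
import { readFileSync } from 'fs';
import gql from 'graphql-tag';

// Load the query from a standalone .graphql file instead of a template literal.
// The file path is hypothetical - point it at wherever you keep your queries.
const GET_LATEST = gql(readFileSync('./queries/getLatest.graphql', 'utf8'));
```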
 > In order to receive nice GraphQL syntax highlighting in these template literals, download the [GraphQL VSCode extension](https://marketplace.visualstudio.com/items?itemName=GraphQL.vscode-graphql)
diff --git a/sources/academy/webscraping/switching_to_typescript/index.md b/sources/academy/webscraping/switching_to_typescript/index.md
index cecb51b07..2d3587598 100644
--- a/sources/academy/webscraping/switching_to_typescript/index.md
+++ b/sources/academy/webscraping/switching_to_typescript/index.md
@@ -56,7 +56,7 @@ This means that when using TS (a popular acronym for "TypeScript") on a large pr
 ## How different is TypeScript from JavaScript? {#how-different-is-it}
 
-Think of it this way: Javascript **IS** Typescript, but TypeScript isn't JavaScript. All JavaScript code is valid TypeScript code, which means that you can pretty much turn any **.js** file into a **.ts** file and it'll still work just the same after being compiled. It also means that to learn TypeScript, you aren't going to have to learn a whole new programming language if you already know JavaScript.
+Think of it this way: JavaScript **IS** TypeScript, but TypeScript isn't JavaScript. All JavaScript code is valid TypeScript code, which means that you can pretty much turn any **.js** file into a **.ts** file and it'll still work just the same after being compiled. It also means that to learn TypeScript, you aren't going to have to learn a whole new programming language if you already know JavaScript.
 
 What are the differences? Well, there's really just one: TypeScript files cannot be run directly. They must first be compiled into regular JavaScript.
diff --git a/sources/academy/webscraping/switching_to_typescript/installation.md b/sources/academy/webscraping/switching_to_typescript/installation.md
index c276fbf6d..361bb16ba 100644
--- a/sources/academy/webscraping/switching_to_typescript/installation.md
+++ b/sources/academy/webscraping/switching_to_typescript/installation.md
@@ -11,7 +11,7 @@ slug: /switching-to-typescript/installation
 
 ---
 
-> In order to install and use TypeScript, you'll first need to make sure you've installed [Node.js](https://nodejs.org). Node.js comes with a package manager called [NPM](https://npmjs.com), through which TypeScript can be installed.
+> In order to install and use TypeScript, you'll first need to make sure you've installed [Node.js](https://nodejs.org). Node.js comes with a package manager called [npm](https://npmjs.com), through which TypeScript can be installed.
 
 To install TypeScript globally on your machine, run the following command in your terminal:
diff --git a/sources/academy/webscraping/web_scraping_for_beginners/crawling/pro_scraping.md b/sources/academy/webscraping/web_scraping_for_beginners/crawling/pro_scraping.md
index 7ad6608ee..45e1ed02d 100644
--- a/sources/academy/webscraping/web_scraping_for_beginners/crawling/pro_scraping.md
+++ b/sources/academy/webscraping/web_scraping_for_beginners/crawling/pro_scraping.md
@@ -43,11 +43,11 @@ Crawlee and its resources can be found in various different places:
 
 1. [Official Crawlee documentation](https://crawlee.dev/)
 2. [Crawlee GitHub repository (source code, issues)](https://github.com/apify/crawlee)
-3. [Crawlee on NPM](https://www.npmjs.com/package/crawlee)
+3. [Crawlee on npm](https://www.npmjs.com/package/crawlee)
 
 ## Install Crawlee {#crawlee-installation}
 
-To use Crawlee, we have to install it from NPM. Let's add it to our project from the previous lessons by executing this command in your project's folder.
+To use Crawlee, we have to install it from npm. Let's add it to our project from the previous lessons by executing this command in your project's folder.
 
 ```shell
 npm install crawlee
 ```
diff --git a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md
index bd75f5869..724df66e7 100644
--- a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md
+++ b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/computer_preparation.md
@@ -1,13 +1,13 @@
 ---
 title: Computer preparation
-description: Set up your computer to be able to code scrapers with Node.js and JavaScript. Download Node.js and NPM and run a Hello World script.
+description: Set up your computer to be able to code scrapers with Node.js and JavaScript. Download Node.js and npm and run a Hello World script.
 sidebar_position: 4
 slug: /web-scraping-for-beginners/data-extraction/computer-preparation
 ---
 
 # Prepare your computer for programming {#prepare-computer}
 
-**Set up your computer to be able to code scrapers with Node.js and JavaScript. Download Node.js and NPM and run a Hello World script.**
+**Set up your computer to be able to code scrapers with Node.js and JavaScript. Download Node.js and npm and run a Hello World script.**
 
 ---
 
@@ -29,7 +29,7 @@ Once you downloaded and installed it, you can open a folder where we will build
 ## Hello world! 👋 {#hello-world}
 
-Before we start, let's confirm that Node.js was successfully installed on your computer. To do that, run those two commands in your terminal and see if they correctly print your Node.js and NPM versions. The next lessons **require Node.js version 16 or higher**. If you skipped Node.js installation and want to use your existing version of Node.js, **make sure that it's 16 or higher**.
+Before we start, let's confirm that Node.js was successfully installed on your computer. To do that, run these two commands in your terminal and see if they correctly print your Node.js and npm versions. The next lessons **require Node.js version 16 or higher**. If you skipped Node.js installation and want to use your existing version of Node.js, **make sure that it's 16 or higher**.
 
 ```shell
 node -v
diff --git a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/project_setup.md b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/project_setup.md
index d22768f95..05abd5211 100644
--- a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/project_setup.md
+++ b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/project_setup.md
@@ -1,35 +1,35 @@
 ---
 title: Project setup
-description: Create a new project with NPM and Node.js. Install necessary libraries, and test that everything works before starting the next lesson.
+description: Create a new project with npm and Node.js. Install necessary libraries, and test that everything works before starting the next lesson.
 sidebar_position: 5
 slug: /web-scraping-for-beginners/data-extraction/project-setup
 ---
 
 # Setting up your project {#setting-up}
 
-**Create a new project with NPM and Node.js. Install necessary libraries, and test that everything works before starting the next lesson.**
+**Create a new project with npm and Node.js. Install necessary libraries, and test that everything works before starting the next lesson.**
 
 ---
 
-When you open a website in a browser, the browser first downloads the page's HTML. To do the same thing with Node.js, we will install a program - an NPM module - to help us with it. NPM modules are installed using `npm`, which is another program, automatically installed with Node.js.
+When you open a website in a browser, the browser first downloads the page's HTML. To do the same thing with Node.js, we will install a program - an npm module - to help us with it. npm modules are installed using `npm`, which is another program, automatically installed with Node.js.
 
-> [NPM](https://www.npmjs.com/) has a huge collection of open-source libraries for Node.js. You can (and you should) utilize it to save time and tap into the amazing open-source community around JavaScript and Node.js.
+> The [npmjs.com](https://www.npmjs.com/) registry offers a huge collection of open-source libraries for Node.js. You can (and you should) utilize it to save time and tap into the amazing open-source community around JavaScript and Node.js.
 
-## Creating a new project with NPM {#creating-a-project}
+## Creating a new project with npm {#creating-a-project}
 
-Before we can install NPM modules, we need to create an NPM project. To do that, you can create a new directory or use the one that you already have open in VSCode (you can delete the **hello.js** file now) and from that directory run this command in your terminal:
+Before we can install npm modules, we need to create an npm project. To do that, you can create a new directory or use the one that you already have open in VSCode (you can delete the **hello.js** file now) and from that directory run this command in your terminal:
 
 ```shell
 npm init -y
 ```
 
-It will set up an empty NPM project for you and create a file called **package.json**. This is a very important file in Node.js programming as it contains information about the project.
+It will set up an empty npm project for you and create a file called **package.json**. This is a very important file in Node.js programming as it contains information about the project.
 
-![NPM init with VSCode](./images/vscode-npm-init.png)
+![npm init with VSCode](./images/vscode-npm-init.png)
 
 ### Use modern JavaScript {#modern-javascript}
 
-Node.js and NPM support two types of projects, let's call them legacy and modern. For backwards compatibility, the legacy version is used by default. To switch to the modern version, open your **package.json** and add this line to the end of the JSON object. Don't forget to add a comma to the end of the previous line 😉
+Node.js and npm support two types of projects; let's call them legacy and modern. For backwards compatibility, the legacy version is used by default. To switch to the modern version, open your **package.json** and add this line to the end of the JSON object. Don't forget to add a comma to the end of the previous line 😉
 
 ```text
 "type": "module"
 ```
 
@@ -41,7 +41,7 @@ Node.js and NPM support two types of projects, let's call them legacy and modern
 ## Installing necessary libraries {#install-libraries}
 
-Now that we have a project set up, we can install NPM modules into the project. Let's install libraries that will help us easily download and process websites' HTML. In the project directory, run the following command, which will install two libraries into your project. **got-scraping** and Cheerio.
+Now that we have a project set up, we can install npm modules into the project. Let's install libraries that will help us easily download and process websites' HTML. In the project directory, run the following command, which will install two libraries into your project: **got-scraping** and Cheerio.
 
 ```shell
 npm install got-scraping cheerio
diff --git a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/save_to_csv.md b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/save_to_csv.md
index f44eca751..d74fd6f50 100644
--- a/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/save_to_csv.md
+++ b/sources/academy/webscraping/web_scraping_for_beginners/data_extraction/save_to_csv.md
@@ -15,7 +15,7 @@ In the last lesson, we were able to extract data about all the on-sale products
 ## Converting to CSV {#converting-to-csv}
 
-It might look like a big programming challenge to transform a JavaScript object into a CSV, but thanks to NPM, this is going to be a walk in the park. Google search **json to csv npm**. You will find that there's a library called [`json2csv`](https://www.npmjs.com/package/json2csv) that can convert a JavaScript object to CSV format with a single function call. _Perfect!_
+It might look like a big programming challenge to transform a JavaScript object into a CSV, but thanks to npm, this is going to be a walk in the park. Google search **json to csv npm**. You will find that there's a library called [`json2csv`](https://www.npmjs.com/package/json2csv) that can convert a JavaScript object to CSV format with a single function call. _Perfect!_
 
 To install `json2csv`, run this command in your terminal. You need to be in the project's folder - the folder which has the `package.json` file.
 
@@ -85,7 +85,7 @@ Now run the script with `node main.js`. The newly created CSV will be printed to
 
 ## Writing the CSV to a file {#writing-to-file}
 
-The final task that remains is to save our CSV formatted data to a file on our disk, so we can open it or send it to someone. For this, we don't need any extra NPM packages because functions for saving files are included in Node.js.
+The final task that remains is to save our CSV-formatted data to a file on our disk, so we can open it or send it to someone. For this, we don't need any extra npm packages because functions for saving files are included in Node.js.
 
 First, we import the `writeFileSync` function from the `fs` (file system) package.
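Putting the conversion and the file-saving together, a minimal sketch of the whole flow could look something like this (the `results` array is placeholder data standing in for whatever your scraper collected):

```js
import { parse } from 'json2csv';
import { writeFileSync } from 'fs';

// Placeholder data standing in for the products scraped in the previous lesson.
const results = [
    { title: 'Example product A', price: '$1.23' },
    { title: 'Example product B', price: '$4.56' },
];

const csv = parse(results);         // one function call to get CSV
writeFileSync('products.csv', csv); // save it next to your script
```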