From d496f2fbda7a072b82fbc276036931f3af9e957a Mon Sep 17 00:00:00 2001
From: Christina Koch
Date: Tue, 14 Jul 2015 21:32:18 -0500
Subject: [PATCH 1/3] adding Brent Shambaugh's lesson on file transfer options

---
 03-file-transfer.md | 164 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 163 insertions(+), 1 deletion(-)

diff --git a/03-file-transfer.md b/03-file-transfer.md
index 97a668e..d4416ba 100644
--- a/03-file-transfer.md
+++ b/03-file-transfer.md
@@ -8,4 +8,166 @@ minutes: 5
>
> * FIX ME

-FIX ME
\ No newline at end of file
There are ways to interact with remote files other than Git.

It is true that we can clone an entire Git repository, or even just its most recent revision, using `git clone --depth 1 repository_name`.
But what about files that do not live in a Git repository? If we wish to download files from the shell we can use tools such as
Wget, cURL, and lftp.

### Wget

Wget is a simple tool developed for the GNU Project that downloads files with the HTTP, HTTPS and FTP protocols. It is widely used on Unix-like systems and is available with most Linux distributions.

To download this lesson (located at http://software-carpentry.org/v5/novice/extras/10-file_transfer.html) from the web via HTTP we can simply type:

~~~
$ wget http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
~~~
{:class="in"}
~~~
--2014-11-21 09:41:31-- http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
Resolving software-carpentry.org (software-carpentry.org)... 174.136.14.108
Connecting to software-carpentry.org (software-carpentry.org)|174.136.14.108|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8901 (8.7K) [text/html]
Saving to: `10-file_transfer.html'

100%[======================================>] 8,901 --.-K/s in 0.05s

2014-11-21 09:41:31 (187 KB/s) - `10-file_transfer.html' saved [8901/8901]
~~~
{:class="out"}

Alternatively, you can add more options, which take the form:

~~~
wget -r -np -D domain_name target_URL
~~~
{:class="in"}

where '-r' means recursively crawl to other files and directories, '-np' means avoid crawling to parent directories, and '-D' restricts the crawl to the given domain name.

For our URL it would be:

~~~
$ wget -r -np -D software-carpentry.org http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
~~~
{:class="in"}

To restrict retrieval to particular extensions, we can use the '-A' option followed by a comma-separated list:

~~~
wget -r -np -D software-carpentry.org -A html http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
~~~
{:class="in"}

We can also clone a webpage with its local dependencies:

~~~
$ wget -mkq target_URL
~~~

We could also clone the entire website:

~~~
$ wget -mkq -np -D domain_name domain_name_URL
~~~

and add the '-nH' option if we do not want a subdirectory created for the website's content:

e.g.

~~~
$ wget -mkq -np -nH -D example.com http://example.com
~~~

where:

'-m' is for mirroring with time stamping, infinite recursion depth, and preservation of FTP directory settings
'-k' converts links to make them suitable for local viewing
'-q' suppresses the output to the screen

The above commands can also copy the contents of one domain to another if we are using ssh or sshfs to access a webserver.

Please refer to the man page by typing 'man wget' in the shell for more information.

### cURL

Alternatively, we can use cURL. It supports a much larger range of protocols, including common mail-based protocols like POP3 and SMTP.
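
Exactly which protocols a given copy of cURL supports depends on how it was built; we can check our own installation by asking curl for its version, which also lists the protocols and features it was compiled with (the output will vary from system to system):

~~~
$ curl --version
~~~
{:class="in"}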
To download this lesson (located at http://software-carpentry.org/v5/novice/extras/10-file_transfer.html) from the web via HTTP we can simply type:

~~~
$ curl -o 10-file_transfer.html http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
~~~
{:class="in"}
~~~
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14005  100 14005    0     0  35170      0 --:--:-- --:--:-- --:--:--  105k
~~~
{:class="out"}

This curl command has the general form:

~~~
curl -o filename_for_local_machine target_url
~~~

where the '-o' option says to write the output to a file instead of to stdout (the screen), filename_for_local_machine is any file name you choose for the saved local copy, and target_URL is the URL where the file is located on the web.

Removing the '-o' option, and following the syntax 'curl target_URL', outputs the contents of the URL to the screen. If we wanted to do more with this output, we could use what we learned in the pipes and filters section, which is lesson 4 of the Unix shell material.
For example, we could type 'curl http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | grep curl', which would tell us that this URL does indeed contain the string curl. We could make the output cleaner by limiting the output of curl to just the file contents using the '-s' option
(e.g. curl -s http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | grep curl).

If we wanted only the text, and not the HTML tags, in our output we could use an HTML-to-text parser such as html2text.

~~~
$ curl -s http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | html2text | grep curl
~~~
{:class="in"}

With wget, we can obtain the same results by typing:

~~~
$ wget -q -D software-carpentry.org -O /dev/stdout http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | html2text | grep curl
~~~
{:class="in"}

Wget offers more functionality natively than curl for retrieving entire directories. We could use Wget to first retrieve an entire directory and then run html2text and grep to find a particular string. cURL is limited to retrieving one or more specified URLs; it cannot recursively crawl a directory. Combining it with other Unix tools can make up some of the difference, but for recursive retrieval Wget is generally the better choice.

Please refer to the man pages by typing 'man wget', 'man curl', and 'man html2text' in the shell for more information.

### lftp

Another option is lftp. It is very capable, and even supports simple BitTorrent transfers.

If we want to retrieve 03-review.html from the website and save it with the filename 03-review.html locally:

~~~
$ lftp -c get http://software-carpentry.org/v5/novice/extras/03-review.html
~~~
{:class="in"}

If we want to print 03-review.html to the screen instead:

~~~
$ lftp -c cat http://software-carpentry.org/v5/novice/extras/03-review.html
~~~
{:class="in"}

To retrieve all of the files with a particular extension in a directory we can type:

~~~
$ lftp -c mget {URL for directory}/*.extension_name
~~~
{:class="in"}

For example, to retrieve all of the .html files in the extras folder:

~~~
$ lftp -c mget http://software-carpentry.org/v5/novice/extras/*.html
~~~

Please refer to the man page by typing 'man lftp' in the shell for more information.
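
One practical tip that applies to both Wget and cURL: a large download that gets interrupted can usually be resumed rather than restarted from scratch. For example (using a placeholder URL and file name rather than a real one):

~~~
$ wget -c http://example.com/big_file.tar.gz
$ curl -C - -O http://example.com/big_file.tar.gz
~~~
{:class="in"}

Here '-c' tells Wget to continue a partially downloaded file, while '-C -' tells cURL to work out the resume position from the partial file already on disk ('-O' saves the file under its remote name).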
\ No newline at end of file
From 33468539e7f49130d9bbec6cfcabc9ca0af390c6 Mon Sep 17 00:00:00 2001
From: Christina Koch
Date: Tue, 14 Jul 2015 21:54:06 -0500
Subject: [PATCH 2/3] fixing links and formatting

---
 03-file-transfer.md | 107 ++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 59 deletions(-)

diff --git a/03-file-transfer.md b/03-file-transfer.md
index d4416ba..628b6a6 100644
--- a/03-file-transfer.md
+++ b/03-file-transfer.md
@@ -18,58 +18,54 @@ Wget is a simple tool developed for the GNU Project that downloads files with the HTTP, HTTPS and FTP protocols. It is widely used on Unix-like systems and is available with most Linux distributions.

-To download this lesson (located at http://software-carpentry.org/v5/novice/extras/10-file_transfer.html) from the web via HTTP we can simply type:
+To download this lesson (located at http://swcarpentry.github.io/shell-extras/03-file-transfer.html) from the web via HTTP we can simply type:

+~~~{.bash}
+$ wget http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~
-$ wget http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
-~~~
-{:class="in"}
-~~~
---2014-11-21 09:41:31-- http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
+~~~{.output}
+--2014-11-21 09:41:31--
+http://swcarpentry.github.io/shell-extras/03-file-transfer.html
Resolving software-carpentry.org (software-carpentry.org)... 174.136.14.108
Connecting to software-carpentry.org (software-carpentry.org)|174.136.14.108|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8901 (8.7K) [text/html]
-Saving to: `10-file_transfer.html'
+Saving to: `03-file_transfer.html'

100%[======================================>] 8,901 --.-K/s in 0.05s

-2014-11-21 09:41:31 (187 KB/s) - `10-file_transfer.html' saved [8901/8901]
+2014-11-21 09:41:31 (187 KB/s) - `03-file_transfer.html' saved [8901/8901]
~~~
-{:class="out"}

Alternatively, you can add more options, which take the form:

-~~~
+~~~{.bash}
wget -r -np -D domain_name target_URL
~~~
-{:class="in"}

-where '-r' means recursively crawl to other files and directories, '-np' means avoid crawling to parent directories, and '-D' restricts the crawl to the given domain name.
+where `-r` means recursively crawl to other files and directories, `-np` means avoid crawling to parent directories, and `-D` restricts the crawl to the given domain name.

For our URL it would be:

+~~~{.bash}
+$ wget -r -np -D software-carpentry.org http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~
-$ wget -r -np -D software-carpentry.org http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
-~~~
-{:class="in"}

To restrict retrieval to particular extensions, we can use the '-A' option followed by a comma-separated list:

+~~~{.bash}
+wget -r -np -D software-carpentry.org -A html http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~
-wget -r -np -D software-carpentry.org -A html http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
-~~~
-{:class="in"}

We can also clone a webpage with its local dependencies:

-~~~
+~~~{.bash}
$ wget -mkq target_URL
~~~

We could also clone the entire website:

-~~~
+~~~{.bash}
$ wget -mkq -np -D domain_name domain_name_URL
~~~

@@ -77,15 +73,15 @@ and add the '-nH' option if we do not want a subdirectory created for the websit

e.g.
-~~~
+~~~{.bash}
$ wget -mkq -np -nH -D example.com http://example.com
~~~

where:

-'-m' is for mirroring with time stamping, infinite recursion depth, and preservation of FTP directory settings
-'-k' converts links to make them suitable for local viewing
-'-q' suppresses the output to the screen
+`-m` is for mirroring with time stamping, infinite recursion depth, and preservation of FTP directory settings
+`-k` converts links to make them suitable for local viewing
+`-q` suppresses the output to the screen

The above commands can also copy the contents of one domain to another if we are using ssh or sshfs to access a webserver.

@@ -93,81 +89,74 @@ Please refer to the man page by typing 'man wget' in the shell for more informat

### cURL

-Alternatively, we can use cURL. It supports a much larger range of protocols, including common mail-based protocols like POP3 and SMTP.
+Alternatively, we can use `cURL`. It supports a much larger range of protocols, including common mail-based protocols like POP3 and SMTP.

-To download this lesson (located at http://software-carpentry.org/v5/novice/extras/10-file_transfer.html) from the web via HTTP we can simply type:
+To download this lesson (located at http://swcarpentry.github.io/shell-extras/03-file-transfer.html) from the web via HTTP we can simply type:

+~~~{.bash}
+$ curl -o 10-file_transfer.html http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~
-$ curl -o 10-file_transfer.html http://software-carpentry.org/v5/novice/extras/10-file_transfer.html
-~~~
-{:class="in"}
-~~~
+~~~{.output}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14005  100 14005    0     0  35170      0 --:--:-- --:--:-- --:--:--  105k
~~~
-{:class="out"}

This curl command has the general form:

-~~~
+~~~{.bash}
curl -o filename_for_local_machine target_url
~~~

-where the '-o' option says to write the output to a file instead of to stdout (the screen), filename_for_local_machine is any file name you choose for the saved local copy, and target_URL is the URL where the file is located on the web.
+where the `-o` option says to write the output to a file instead of to stdout (the screen), filename_for_local_machine is any file name you choose for the saved local copy, and target_URL is the URL where the file is located on the web.

-Removing the '-o' option, and following the syntax 'curl target_URL', outputs the contents of the URL to the screen. If we wanted to do more with this output, we could use what we learned in the pipes and filters section, which is lesson 4 of the Unix shell material.
-For example, we could type 'curl http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | grep curl', which would tell us that this URL does indeed contain the string curl. We could make the output cleaner by limiting the output of curl to just the file contents using the '-s' option
-(e.g. curl -s http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | grep curl).
+Removing the `-o` option, and following the syntax `curl target_URL`, outputs the contents of the URL to the screen. If we wanted to do more with this output, we could use what we learned in the pipes and filters section, which is lesson 4 of the Unix shell material.
+For example, we could type `curl http://swcarpentry.github.io/shell-extras/03-file-transfer.html | grep curl`, which would tell us that this URL does indeed contain the string curl. We could make the output cleaner by limiting the output of curl to just the file contents using the `-s` option
+(e.g. `curl -s http://swcarpentry.github.io/shell-extras/03-file-transfer.html | grep curl`).
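+
+Any other filter can go at the end of the pipe in the same way. For instance, to count the matching lines rather than print them (the exact number will depend on the current contents of the page):
+
+~~~{.bash}
+$ curl -s http://swcarpentry.github.io/shell-extras/03-file-transfer.html | grep -c curl
+~~~
+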
-If we wanted only the text, and not the HTML tags, in our output we could use an HTML-to-text parser such as html2text.
+If we wanted only the text, and not the HTML tags, in our output we could use an HTML-to-text parser such as `html2text`.

+~~~{.bash}
+$ curl -s http://swcarpentry.github.io/shell-extras/03-file-transfer.html | html2text | grep curl
~~~
-$ curl -s http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | html2text | grep curl
-~~~
-{:class="in"}

-With wget, we can obtain the same results by typing:
+With `wget`, we can obtain the same results by typing:

+~~~{.bash}
+$ wget -q -D swcarpentry.github.io -O /dev/stdout http://swcarpentry.github.io/shell-extras/03-file-transfer.html | html2text | grep curl
~~~
-$ wget -q -D software-carpentry.org -O /dev/stdout http://software-carpentry.org/v5/novice/extras/10-file_transfer.html | html2text | grep curl
-~~~
-{:class="in"}

-Wget offers more functionality natively than curl for retrieving entire directories. We could use Wget to first retrieve an entire directory and then run html2text and grep to find a particular string. cURL is limited to retrieving one or more specified URLs; it cannot recursively crawl a directory. Combining it with other Unix tools can make up some of the difference, but for recursive retrieval Wget is generally the better choice.
+`Wget` offers more functionality natively than `curl` for retrieving entire directories. We could use `Wget` to first retrieve an entire directory and then run `html2text` and `grep` to find a particular string. `cURL` is limited to retrieving one or more specified URLs; it cannot recursively crawl a directory. Combining it with other Unix tools can make up some of the difference, but for recursive retrieval `Wget` is generally the better choice.

-Please refer to the man pages by typing 'man wget', 'man curl', and 'man html2text' in the shell for more information.
+Please refer to the man pages by typing `man wget`, `man curl`, and `man html2text` in the shell for more information.

### lftp

Another option is lftp. It is very capable, and even supports simple BitTorrent transfers.

-If we want to retrieve 03-review.html from the website and save it with the filename 03-review.html locally:
+If we want to retrieve `03-review.html` from the website and save it with the filename `03-review.html` locally:

-~~~
+~~~{.bash}
$ lftp -c get http://software-carpentry.org/v5/novice/extras/03-review.html
~~~
-{:class="in"}

-If we want to print 03-review.html to the screen instead:
+If we want to print `03-review.html` to the screen instead:

-~~~
+~~~{.bash}
$ lftp -c cat http://software-carpentry.org/v5/novice/extras/03-review.html
~~~
-{:class="in"}

To retrieve all of the files with a particular extension in a directory we can type:

-~~~
+~~~{.bash}
$ lftp -c mget {URL for directory}/*.extension_name
~~~
-{:class="in"}

-For example, to retrieve all of the .html files in the extras folder:
+For example, to retrieve all of the `.html` files in the extras folder:

-~~~
-$ lftp -c mget http://software-carpentry.org/v5/novice/extras/*.html
+~~~{.bash}
+$ lftp -c mget http://swcarpentry.github.io/shell-extras/*.html
~~~
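+
+lftp expands the wildcard in the previous example itself. curl cannot crawl a directory like this, but when the file names follow a predictable pattern it offers URL globbing of its own; a small sketch, using a made-up set of files on the example.com placeholder domain:
+
+~~~{.bash}
+$ curl -O "http://example.com/data/file[1-3].html"
+~~~
+
+The quotes stop the shell from interpreting the brackets, and `-O` saves each file under its remote name.
+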
-Please refer to the man page by typing 'man lftp' in the shell for more information.
\ No newline at end of file
+Please refer to the man page by typing `man lftp` in the shell for more information.
\ No newline at end of file
From c2e47e98bad6736c4c0e7339d2b326623e8279dc Mon Sep 17 00:00:00 2001
From: Christina Koch
Date: Tue, 14 Jul 2015 21:57:56 -0500
Subject: [PATCH 3/3] changing last links

---
 03-file-transfer.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/03-file-transfer.md b/03-file-transfer.md
index 628b6a6..2d78f25 100644
--- a/03-file-transfer.md
+++ b/03-file-transfer.md
@@ -2,7 +2,7 @@ layout: page
title: Extra Unix Shell Material
subtitle: Transferring files
-minutes: 5
+minutes: 10
---
> ## Learning Objectives {.objectives}
>
> * FIX ME
@@ -135,16 +135,16 @@ Please refer to the man pages by typing `man wget`, `man curl`, and `man html2te

Another option is lftp. It is very capable, and even supports simple BitTorrent transfers.

-If we want to retrieve `03-review.html` from the website and save it with the filename `03-review.html` locally:
+If we want to retrieve `03-file-transfer.html` from the website and save it with the filename `03-file-transfer.html` locally:

~~~{.bash}
-$ lftp -c get http://software-carpentry.org/v5/novice/extras/03-review.html
+$ lftp -c get http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~

-If we want to print `03-review.html` to the screen instead:
+If we want to print `03-file-transfer.html` to the screen instead:

~~~{.bash}
-$ lftp -c cat http://software-carpentry.org/v5/novice/extras/03-review.html
+$ lftp -c cat http://swcarpentry.github.io/shell-extras/03-file-transfer.html
~~~

To retrieve all of the files with a particular extension in a directory we can type: