From 28fa64ca6a51faab3e787d5044396037b8753f7f Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Wed, 18 Dec 2024 11:12:11 +0900
Subject: [PATCH 1/7] Update topics.yml

Add modeling topic
---
 _data/topics.yml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/_data/topics.yml b/_data/topics.yml
index 9c5609170..63a511de2 100644
--- a/_data/topics.yml
+++ b/_data/topics.yml
@@ -189,3 +189,15 @@
     es: "Lecciones que enseñan a representar datos mediante gráficos comunes."
     fr: "Leçons qui enseignent la représentation de données à l'aide de graphiques communs."
     pt: "Lições que ensinam a representação de dados através de gráficos comuns."
+
+- type: modeling
+  displayname:
+    en: "modeling"
+    es: "modelización"
+    fr: "modélisation"
+    pt: "modelagem"
+  description:
+    en: "Using formal, computational, or data-driven techniques to simulate, analyse, and predict complex systems."
+    es: "Utilizar técnicas formales, informáticas, o basadas en datos para simular, analizar y predecir sistemas complejos."
+    fr: "Utiliser des techniques formelles, informatiques, ou basées sur des données pour simuler, analiser et prédire des systèmes complexes."
+    pt: "Utilizar técnicas formais, computacionais ou baseadas em dados para simular, analisar e prever sistemas complexos."

From 1dd34462ef4d31e79bdccfbaf25a904d3436392c Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Wed, 18 Dec 2024 20:36:12 +0900
Subject: [PATCH 2/7] Update topics.yml

Change to US spelling
---
 _data/topics.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_data/topics.yml b/_data/topics.yml
index 63a511de2..8f96c1c2d 100644
--- a/_data/topics.yml
+++ b/_data/topics.yml
@@ -197,7 +197,7 @@
     fr: "modélisation"
     pt: "modelagem"
   description:
-    en: "Using formal, computational, or data-driven techniques to simulate, analyse, and predict complex systems."
+    en: "Using formal, computational, or data-driven techniques to simulate, analyze, and predict complex systems."
     es: "Utilizar técnicas formales, informáticas, o basadas en datos para simular, analizar y predecir sistemas complejos."
     fr: "Utiliser des techniques formelles, informatiques, ou basées sur des données pour simuler, analiser et prédire des systèmes complexes."
     pt: "Utilizar técnicas formais, computacionais ou baseadas em dados para simular, analisar e prever sistemas complexos."

From 84a67661a35dae5976e5fda903490764e01c3fbd Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Fri, 20 Dec 2024 09:44:47 +0900
Subject: [PATCH 3/7] Update topics.yml

Add Metadata
---
 _data/topics.yml | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/_data/topics.yml b/_data/topics.yml
index 8f96c1c2d..2d4f35df7 100644
--- a/_data/topics.yml
+++ b/_data/topics.yml
@@ -201,3 +201,16 @@
     es: "Utilizar técnicas formales, informáticas, o basadas en datos para simular, analizar y predecir sistemas complejos."
     fr: "Utiliser des techniques formelles, informatiques, ou basées sur des données pour simuler, analiser et prédire des systèmes complexes."
     pt: "Utilizar técnicas formais, computacionais ou baseadas em dados para simular, analisar e prever sistemas complexos."
+
+
+- type: metadata
+  displayname:
+    en: "metadata"
+    es: "metadatos"
+    fr: "métadonnées"
+    pt: "metadados"
+  description:
+    en: "Description"
+    es: "Description"
+    fr: "Description"
+    pt: "Description"

From 394a22001a6bf2956ca3c162c35dd023614ad52a Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Fri, 20 Dec 2024 16:08:51 +0900
Subject: [PATCH 4/7] Sorting assets into folders

---
 .../The_Dataset_-_Alumni_Oxonienses-Jas1.csv | 0
 .../Roman_to_Arabic.txt | 0
 .../chiffres_romains_arabes.txt | 0
 assets/{ => json-and-jq}/count_retweets.txt | 0
 assets/{ => json-and-jq}/filter_retweets.txt | 0
 assets/{ => json-and-jq}/jq_rkm.json | 0
 assets/{ => json-and-jq}/jq_twitter.json | 0
 assets/{ => naive-bayesian}/baileycode.zip | Bin
 en/lessons/extracting-keywords.md | 2 +-
 ...ing-an-ordered-data-set-from-an-OCR-text-file.md | 2 +-
 en/lessons/json-and-jq.md | 10 +++++-----
 en/lessons/naive-bayesian.md | 2 +-
 fr/lecons/generer-jeu-donnees-texte-ocr.md | 2 +-
 13 files changed, 9 insertions(+), 9 deletions(-)
 rename assets/{ => extracting-keywords}/The_Dataset_-_Alumni_Oxonienses-Jas1.csv (100%)
 rename assets/{ => generating-an-ordered-data-set-from-an-OCR-text-file}/Roman_to_Arabic.txt (100%)
 rename assets/{ => generer-jeu-donnees-texte-ocr}/chiffres_romains_arabes.txt (100%)
 rename assets/{ => json-and-jq}/count_retweets.txt (100%)
 rename assets/{ => json-and-jq}/filter_retweets.txt (100%)
 rename assets/{ => json-and-jq}/jq_rkm.json (100%)
 rename assets/{ => json-and-jq}/jq_twitter.json (100%)
 rename assets/{ => naive-bayesian}/baileycode.zip (100%)

diff --git a/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv b/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
similarity index 100%
rename from assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
rename to assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
diff --git a/assets/Roman_to_Arabic.txt b/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt
similarity index 100%
rename from assets/Roman_to_Arabic.txt
rename to assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt
diff --git a/assets/chiffres_romains_arabes.txt b/assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt
similarity index 100%
rename from assets/chiffres_romains_arabes.txt
rename to assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt
diff --git a/assets/count_retweets.txt b/assets/json-and-jq/count_retweets.txt
similarity index 100%
rename from assets/count_retweets.txt
rename to assets/json-and-jq/count_retweets.txt
diff --git a/assets/filter_retweets.txt b/assets/json-and-jq/filter_retweets.txt
similarity index 100%
rename from assets/filter_retweets.txt
rename to assets/json-and-jq/filter_retweets.txt
diff --git a/assets/jq_rkm.json b/assets/json-and-jq/jq_rkm.json
similarity index 100%
rename from assets/jq_rkm.json
rename to assets/json-and-jq/jq_rkm.json
diff --git a/assets/jq_twitter.json b/assets/json-and-jq/jq_twitter.json
similarity index 100%
rename from assets/jq_twitter.json
rename to assets/json-and-jq/jq_twitter.json
diff --git a/assets/baileycode.zip b/assets/naive-bayesian/baileycode.zip
similarity index 100%
rename from assets/baileycode.zip
rename to assets/naive-bayesian/baileycode.zip
diff --git a/en/lessons/extracting-keywords.md b/en/lessons/extracting-keywords.md
index cdf125c0b..39f4e72ef 100644
--- a/en/lessons/extracting-keywords.md
+++ b/en/lessons/extracting-keywords.md
@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to
 The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.
-[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
 {% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}
diff --git a/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md b/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
index 601a5d5ec..e55e15e25 100755
--- a/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
+++ b/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
@@ -221,7 +221,7 @@ def rom2ar(rom):
     return result
 ```
-(run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)
+(run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)
 ## Some other things we'll need:
 At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
diff --git a/en/lessons/json-and-jq.md b/en/lessons/json-and-jq.md
index d1ef792a4..acabd4ed8 100755
--- a/en/lessons/json-and-jq.md
+++ b/en/lessons/json-and-jq.md
@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.
 jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.
-To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
+To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/json-and-jq/jq_rkm.json)
 Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.
@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle
 For this lesson, we will use a small sample of 50 public tweets. Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
-[Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
+[Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].
 ### One-to-many relationships: Tweet hashtags
@@ -895,7 +895,7 @@ You should get the following table:
 "whiteprivilege",1
 ```
-[There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
+[There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)
 #### Count total retweets per user
@@ -909,7 +909,7 @@ Hints:
 As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.
-[See my answer.](/assets/count_retweets.txt)
+[See my answer.](/assets/json-and-jq/count_retweets.txt)
 ## Using jq on the command line
@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
 (See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)
 ```sh
-wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
 ```
 Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
diff --git a/en/lessons/naive-bayesian.md b/en/lessons/naive-bayesian.md
index c484070e5..13536eedc 100755
--- a/en/lessons/naive-bayesian.md
+++ b/en/lessons/naive-bayesian.md
@@ -1462,7 +1462,7 @@ Happy hunting!
  [A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
  [Old Bailey digital archive]: http://www.oldbaileyonline.org/
- [A zip file of the scripts]: /assets/baileycode.zip
+ [A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
  [another zip file]: https://doi.org/10.5281/zenodo.13284
  [BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
  [search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp
diff --git a/fr/lecons/generer-jeu-donnees-texte-ocr.md b/fr/lecons/generer-jeu-donnees-texte-ocr.md
index 51ca6a755..1b26d3ed4 100644
--- a/fr/lecons/generer-jeu-donnees-texte-ocr.md
+++ b/fr/lecons/generer-jeu-donnees-texte-ocr.md
@@ -234,7 +234,7 @@ def rom2ar(rom):
     return result
 ```
-Exécutez ce <[petit script](/assets/chiffres_romains_arabes.txt)> pour voir en détail comment `rome2ar` fonctionne. Une programmation élégante comme celle-ci peut presque s'apparenter à de la poésie.
+Exécutez ce <[petit script](/assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt)> pour voir en détail comment `rome2ar` fonctionne. Une programmation élégante comme celle-ci peut presque s'apparenter à de la poésie.
 ## D'autres informations importantes
 Si vous avez besoin d'importer des modules faisant partie de la bibliothèque standard de Python, il faudra que les premières lignes de votre programme soient les imports de ces modules. Si besoin, voir le tutoriel de Fred Gibbs sur [*l'installation des bibliothèques Python avec pip*](/fr/lecons/installation-modules-python-pip).

From dc49641117e7873fe805ed32dd8a2d4925f48aa2 Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Fri, 20 Dec 2024 16:16:48 +0900
Subject: [PATCH 5/7] Sort assets into lesson folders

---
 .../ejemplo_introductorio_estados.csv | 0
 assets/{ => analise-sentimento-R-syuzhet}/domCasmurro.txt | 0
 es/lecciones/administracion-de-datos-en-r.md | 2 +-
 pt/licoes/analise-sentimento-R-syuzhet.md | 6 +++---
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename assets/{ => administracion-de-datos-en-r}/ejemplo_introductorio_estados.csv (100%)
 rename assets/{ => analise-sentimento-R-syuzhet}/domCasmurro.txt (100%)

diff --git a/assets/ejemplo_introductorio_estados.csv b/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv
similarity index 100%
rename from assets/ejemplo_introductorio_estados.csv
rename to assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv
diff --git a/assets/domCasmurro.txt b/assets/analise-sentimento-R-syuzhet/domCasmurro.txt
similarity index 100%
rename from assets/domCasmurro.txt
rename to assets/analise-sentimento-R-syuzhet/domCasmurro.txt
diff --git a/es/lecciones/administracion-de-datos-en-r.md b/es/lecciones/administracion-de-datos-en-r.md
index e47e57872..4db485e05 100644
--- a/es/lecciones/administracion-de-datos-en-r.md
+++ b/es/lecciones/administracion-de-datos-en-r.md
@@ -78,7 +78,7 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
 ```
 ## Un ejemplo de dplyr en acción
-Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
 Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".
diff --git a/pt/licoes/analise-sentimento-R-syuzhet.md b/pt/licoes/analise-sentimento-R-syuzhet.md
index 93676c69e..77ea8ff71 100644
--- a/pt/licoes/analise-sentimento-R-syuzhet.md
+++ b/pt/licoes/analise-sentimento-R-syuzhet.md
@@ -171,7 +171,7 @@ library(tm)
 ## Carregar e preparar o texto
-Faça o download do texto do romance [Dom Casmurro](/assets/domCasmurro.txt). Como podemos ver, o documento está em formato de texto simples, pois isto é essencial para realizar seu processamento e análise em R.
+Faça o download do texto do romance [Dom Casmurro](/assets/analise-sentimento-R-syuzhet/domCasmurro.txt). Como podemos ver, o documento está em formato de texto simples, pois isto é essencial para realizar seu processamento e análise em R.
 Com o texto em mãos, a primeira coisa que vamos fazer é carregá-lo como um objeto de _string_. Certifique-se de mudar o caminho para o texto para corresponder ao seu computador.
@@ -180,7 +180,7 @@ Com o texto em mãos, a primeira coisa que vamos fazer é carregá-lo como um ob
 Em sistemas Mac podemos usar a função `get_text_as_string` integrada no pacote `syuzhet`:
 ```R
-texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/domCasmurro.txt")
+texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/analise-sentimento-R-syuzhet/domCasmurro.txt")
 ```
 **Em Windows**
@@ -188,7 +188,7 @@ texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistor
 Os sistemas Windows não lêem diretamente caracteres com acentos ou outras marcações típicas do espanhol, português ou francês, então temos que dizer ao sistema que o nosso texto está no formato UTF-8 usando a função `scan`.
 ```R
-texto <- scan(file = "https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/domCasmurro.txt", fileEncoding = "UTF-8", what = character(), sep = "\n", allowEscapes = T)
+texto <- scan(file = "https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/analise-sentimento-R-syuzhet/domCasmurro.txt", fileEncoding = "UTF-8", what = character(), sep = "\n", allowEscapes = T)
 ```
 Como a análise que vamos realizar precisa de uma lista, seja de palavras ou de frases (aqui só prestaremos atenção a palavras individuais), precisamos de um passo intermediário entre o carregamento do texto e a extração dos valores de sentimento. Assim, vamos dividir o texto (*string*) em uma lista de palavras (*tokens*). Isto é muito comum na análise distante de textos.

From d8cc28f6b79a763d61166f96f072f50a2e880eec Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Fri, 20 Dec 2024 16:17:46 +0900
Subject: [PATCH 6/7] Revert "Sorting assets into folders"

This reverts commit 394a22001a6bf2956ca3c162c35dd023614ad52a.
---
 .../Roman_to_Arabic.txt | 0
 .../The_Dataset_-_Alumni_Oxonienses-Jas1.csv | 0
 assets/{naive-bayesian => }/baileycode.zip | Bin
 .../chiffres_romains_arabes.txt | 0
 assets/{json-and-jq => }/count_retweets.txt | 0
 assets/{json-and-jq => }/filter_retweets.txt | 0
 assets/{json-and-jq => }/jq_rkm.json | 0
 assets/{json-and-jq => }/jq_twitter.json | 0
 en/lessons/extracting-keywords.md | 2 +-
 ...ing-an-ordered-data-set-from-an-OCR-text-file.md | 2 +-
 en/lessons/json-and-jq.md | 10 +++++-----
 en/lessons/naive-bayesian.md | 2 +-
 fr/lecons/generer-jeu-donnees-texte-ocr.md | 2 +-
 13 files changed, 9 insertions(+), 9 deletions(-)
 rename assets/{generating-an-ordered-data-set-from-an-OCR-text-file => }/Roman_to_Arabic.txt (100%)
 rename assets/{extracting-keywords => }/The_Dataset_-_Alumni_Oxonienses-Jas1.csv (100%)
 rename assets/{naive-bayesian => }/baileycode.zip (100%)
 rename assets/{generer-jeu-donnees-texte-ocr => }/chiffres_romains_arabes.txt (100%)
 rename assets/{json-and-jq => }/count_retweets.txt (100%)
 rename assets/{json-and-jq => }/filter_retweets.txt (100%)
 rename assets/{json-and-jq => }/jq_rkm.json (100%)
 rename assets/{json-and-jq => }/jq_twitter.json (100%)

diff --git a/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt b/assets/Roman_to_Arabic.txt
similarity index 100%
rename from assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt
rename to assets/Roman_to_Arabic.txt
diff --git a/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv b/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
similarity index 100%
rename from assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
rename to assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv
diff --git a/assets/naive-bayesian/baileycode.zip b/assets/baileycode.zip
similarity index 100%
rename from assets/naive-bayesian/baileycode.zip
rename to assets/baileycode.zip
diff --git a/assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt b/assets/chiffres_romains_arabes.txt
similarity index 100%
rename from assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt
rename to assets/chiffres_romains_arabes.txt
diff --git a/assets/json-and-jq/count_retweets.txt b/assets/count_retweets.txt
similarity index 100%
rename from assets/json-and-jq/count_retweets.txt
rename to assets/count_retweets.txt
diff --git a/assets/json-and-jq/filter_retweets.txt b/assets/filter_retweets.txt
similarity index 100%
rename from assets/json-and-jq/filter_retweets.txt
rename to assets/filter_retweets.txt
diff --git a/assets/json-and-jq/jq_rkm.json b/assets/jq_rkm.json
similarity index 100%
rename from assets/json-and-jq/jq_rkm.json
rename to assets/jq_rkm.json
diff --git a/assets/json-and-jq/jq_twitter.json b/assets/jq_twitter.json
similarity index 100%
rename from assets/json-and-jq/jq_twitter.json
rename to assets/jq_twitter.json
diff --git a/en/lessons/extracting-keywords.md b/en/lessons/extracting-keywords.md
index 39f4e72ef..cdf125c0b 100644
--- a/en/lessons/extracting-keywords.md
+++ b/en/lessons/extracting-keywords.md
@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to
 The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.
-[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
 {% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}
diff --git a/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md b/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
index e55e15e25..601a5d5ec 100755
--- a/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
+++ b/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
@@ -221,7 +221,7 @@ def rom2ar(rom):
     return result
 ```
-(run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)
+(run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)
 ## Some other things we'll need:
 At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
diff --git a/en/lessons/json-and-jq.md b/en/lessons/json-and-jq.md
index acabd4ed8..d1ef792a4 100755
--- a/en/lessons/json-and-jq.md
+++ b/en/lessons/json-and-jq.md
@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.
 jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.
-To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/json-and-jq/jq_rkm.json)
+To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
 Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.
@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle
 For this lesson, we will use a small sample of 50 public tweets. Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
-[Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].
+[Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
 ### One-to-many relationships: Tweet hashtags
@@ -895,7 +895,7 @@ You should get the following table:
 "whiteprivilege",1
 ```
-[There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)
+[There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
 #### Count total retweets per user
@@ -909,7 +909,7 @@ Hints:
 As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.
-[See my answer.](/assets/json-and-jq/count_retweets.txt)
+[See my answer.](/assets/count_retweets.txt)
 ## Using jq on the command line
@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
 (See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)
 ```sh
-wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
 ```
 Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
diff --git a/en/lessons/naive-bayesian.md b/en/lessons/naive-bayesian.md
index 13536eedc..c484070e5 100755
--- a/en/lessons/naive-bayesian.md
+++ b/en/lessons/naive-bayesian.md
@@ -1462,7 +1462,7 @@ Happy hunting!
  [A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
  [Old Bailey digital archive]: http://www.oldbaileyonline.org/
- [A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
+ [A zip file of the scripts]: /assets/baileycode.zip
  [another zip file]: https://doi.org/10.5281/zenodo.13284
  [BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
  [search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp
diff --git a/fr/lecons/generer-jeu-donnees-texte-ocr.md b/fr/lecons/generer-jeu-donnees-texte-ocr.md
index 1b26d3ed4..51ca6a755 100644
--- a/fr/lecons/generer-jeu-donnees-texte-ocr.md
+++ b/fr/lecons/generer-jeu-donnees-texte-ocr.md
@@ -234,7 +234,7 @@ def rom2ar(rom):
     return result
 ```
-Exécutez ce <[petit script](/assets/generer-jeu-donnees-texte-ocr/chiffres_romains_arabes.txt)> pour voir en détail comment `rome2ar` fonctionne. Une programmation élégante comme celle-ci peut presque s'apparenter à de la poésie.
+Exécutez ce <[petit script](/assets/chiffres_romains_arabes.txt)> pour voir en détail comment `rome2ar` fonctionne. Une programmation élégante comme celle-ci peut presque s'apparenter à de la poésie.
 ## D'autres informations importantes
 Si vous avez besoin d'importer des modules faisant partie de la bibliothèque standard de Python, il faudra que les premières lignes de votre programme soient les imports de ces modules. Si besoin, voir le tutoriel de Fred Gibbs sur [*l'installation des bibliothèques Python avec pip*](/fr/lecons/installation-modules-python-pip).

From 0395f305806c128ca2d3b2c4e90411f796840f48 Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Fri, 20 Dec 2024 16:17:56 +0900
Subject: [PATCH 7/7] Revert "Sort assets into lesson folders"

This reverts commit dc49641117e7873fe805ed32dd8a2d4925f48aa2.
---
 assets/{analise-sentimento-R-syuzhet => }/domCasmurro.txt | 0
 .../ejemplo_introductorio_estados.csv | 0
 es/lecciones/administracion-de-datos-en-r.md | 2 +-
 pt/licoes/analise-sentimento-R-syuzhet.md | 6 +++---
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename assets/{analise-sentimento-R-syuzhet => }/domCasmurro.txt (100%)
 rename assets/{administracion-de-datos-en-r => }/ejemplo_introductorio_estados.csv (100%)

diff --git a/assets/analise-sentimento-R-syuzhet/domCasmurro.txt b/assets/domCasmurro.txt
similarity index 100%
rename from assets/analise-sentimento-R-syuzhet/domCasmurro.txt
rename to assets/domCasmurro.txt
diff --git a/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv b/assets/ejemplo_introductorio_estados.csv
similarity index 100%
rename from assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv
rename to assets/ejemplo_introductorio_estados.csv
diff --git a/es/lecciones/administracion-de-datos-en-r.md b/es/lecciones/administracion-de-datos-en-r.md
index 4db485e05..e47e57872 100644
--- a/es/lecciones/administracion-de-datos-en-r.md
+++ b/es/lecciones/administracion-de-datos-en-r.md
@@ -78,7 +78,7 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
 ```
 ## Un ejemplo de dplyr en acción
-Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
 Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".
diff --git a/pt/licoes/analise-sentimento-R-syuzhet.md b/pt/licoes/analise-sentimento-R-syuzhet.md
index 77ea8ff71..93676c69e 100644
--- a/pt/licoes/analise-sentimento-R-syuzhet.md
+++ b/pt/licoes/analise-sentimento-R-syuzhet.md
@@ -171,7 +171,7 @@ library(tm)
 ## Carregar e preparar o texto
-Faça o download do texto do romance [Dom Casmurro](/assets/analise-sentimento-R-syuzhet/domCasmurro.txt). Como podemos ver, o documento está em formato de texto simples, pois isto é essencial para realizar seu processamento e análise em R.
+Faça o download do texto do romance [Dom Casmurro](/assets/domCasmurro.txt). Como podemos ver, o documento está em formato de texto simples, pois isto é essencial para realizar seu processamento e análise em R.
 Com o texto em mãos, a primeira coisa que vamos fazer é carregá-lo como um objeto de _string_. Certifique-se de mudar o caminho para o texto para corresponder ao seu computador.
@@ -180,7 +180,7 @@ Com o texto em mãos, a primeira coisa que vamos fazer é carregá-lo como um ob
 Em sistemas Mac podemos usar a função `get_text_as_string` integrada no pacote `syuzhet`:
 ```R
-texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/analise-sentimento-R-syuzhet/domCasmurro.txt")
+texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/domCasmurro.txt")
 ```
 **Em Windows**
@@ -188,7 +188,7 @@ texto <- get_text_as_string("https://raw.githubusercontent.com/programminghistor
 Os sistemas Windows não lêem diretamente caracteres com acentos ou outras marcações típicas do espanhol, português ou francês, então temos que dizer ao sistema que o nosso texto está no formato UTF-8 usando a função `scan`.
 ```R
-texto <- scan(file = "https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/analise-sentimento-R-syuzhet/domCasmurro.txt", fileEncoding = "UTF-8", what = character(), sep = "\n", allowEscapes = T)
+texto <- scan(file = "https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/domCasmurro.txt", fileEncoding = "UTF-8", what = character(), sep = "\n", allowEscapes = T)
 ```
 Como a análise que vamos realizar precisa de uma lista, seja de palavras ou de frases (aqui só prestaremos atenção a palavras individuais), precisamos de um passo intermediário entre o carregamento do texto e a extração dos valores de sentimento. Assim, vamos dividir o texto (*string*) em uma lista de palavras (*tokens*). Isto é muito comum na análise distante de textos.