Publish /fr/lecons/gestion-manipulation-donnees-r #3430

Open
wants to merge 28 commits into
base: gh-pages
28 commits
5715097
Upload FR images to data-wrangling-and-management-in-R
anisa-hawes Dec 11, 2024
48d5da9
Create gestion-manipulation-donnees-r.md
anisa-hawes Dec 11, 2024
6997729
Update gestion-manipulation-donnees-r.md
anisa-hawes Dec 11, 2024
e1d0617
Create assets directory /data-wrangling-and-management-in-R
anisa-hawes Dec 12, 2024
a98a89e
Update data-wrangling-and-management-in-R.md
anisa-hawes Dec 12, 2024
09f803d
Update administracion-de-datos-en-r.md
anisa-hawes Dec 12, 2024
cdd4430
Update gestion-manipulation-donnees-r.md
anisa-hawes Dec 12, 2024
7c9e182
Move ejemplo_introductorio_estados.csv
anisa-hawes Dec 12, 2024
68639bd
Update gestion-manipulation-donnees-r.md
anisa-hawes Dec 12, 2024
f7495d3
Rename directory
anisa-hawes Dec 12, 2024
0a3fd0f
Update data-wrangling-and-management-in-r.md
anisa-hawes Dec 12, 2024
981b709
Update manipulacao-transformacao-dados-r.md
anisa-hawes Dec 12, 2024
461dc30
Delete images/data-wrangling-and-management-in-R directory
anisa-hawes Dec 12, 2024
07901a6
Upload /images/data-wrangling-and-management-in-r
anisa-hawes Dec 12, 2024
9b35bee
Update gestion-manipulation-donnees-r.md
anisa-hawes Dec 12, 2024
483b88b
Update administracion-de-datos-en-r.md
anisa-hawes Dec 12, 2024
9a96668
Update manipulacao-transformacao-dados-r.md
anisa-hawes Dec 12, 2024
7fa1c4b
Update data-wrangling-and-management-in-r.md
anisa-hawes Dec 12, 2024
ab35a6f
Rename data-wrangling-and-management-in-r.png
anisa-hawes Dec 12, 2024
7cf4d8b
Update beginners-guide-to-twitter-data.md
anisa-hawes Dec 12, 2024
2ac2d30
Update geospatial-data-analysis.md
anisa-hawes Dec 12, 2024
a0d62f8
Update sentiment-analysis-syuzhet.md
anisa-hawes Dec 12, 2024
6f08654
Update shiny-leaflet-newspaper-map-tutorial.md
anisa-hawes Dec 12, 2024
e2b02b0
Update analise-sentimento-R-syuzhet.md
anisa-hawes Dec 12, 2024
056c8db
Update aplicacao-web-interativa-r-shiny-leaflet.md
anisa-hawes Dec 12, 2024
dfdfc89
Update visualizacao-animacao-tabelas-historicas-R.md
anisa-hawes Dec 12, 2024
7d72f8e
Update aplicacao-web-interativa-r-shiny-leaflet.md
anisa-hawes Dec 12, 2024
39c5bde
Merge branch 'gh-pages' into publish-gestion-manipulation-donnees-r
Dec 13, 2024
2 changes: 1 addition & 1 deletion en/lessons/beginners-guide-to-twitter-data.md
@@ -130,7 +130,7 @@ At this point, your data has gone from the long list of single tweet IDs to a ro

Each tweet now has lots of useful metadata, including the time created, the included hashtags, number of retweets and favorites, and some geo info. One can imagine how this information can be used for a wide variety of explorations, including to map discourse around an issue on social media, explore the relationship between sentiment and virality, or even text analysis of language of the tweets.

-All of these processes will probably include some light data work to format this dataset so that you can produce useful insights: [statistical analyses](/en/lessons/data-wrangling-and-management-in-R), [maps](/en/lessons/mapping-with-python-leaflet), [social network analyses](/en/lessons/exploring-and-analyzing-network-data-with-python), [discourse analyses](/en/lessons/corpus-analysis-with-antconc). But regardless of where you go from here, you have a pretty robust dataset that can be used for a variety of academic pursuits.
+All of these processes will probably include some light data work to format this dataset so that you can produce useful insights: [statistical analyses](/en/lessons/data-wrangling-and-management-in-r), [maps](/en/lessons/mapping-with-python-leaflet), [social network analyses](/en/lessons/exploring-and-analyzing-network-data-with-python), [discourse analyses](/en/lessons/corpus-analysis-with-antconc). But regardless of where you go from here, you have a pretty robust dataset that can be used for a variety of academic pursuits.

You might have noticed we didn't get any latitude/longitude location information, but we did get a "place" column with less exact, textualized location information. Non-coordinate location data needs to be [geocoded](https://en.wikipedia.org/wiki/Geocode), which in this case means using a geocoder to [geoparse](https://en.wikipedia.org/wiki/Toponym_Resolution#Geoparsing) the reported locations and assign lat/long values to them. Different programs do this to greater or lesser success. [Tableau](https://www.tableau.com), for instance, has a hard time interpolating a set of locations if it's not at a consistent geographical level (city, state, etc.). For that reason, I generated latitude and longitude information with the Google geocoder following this *Programming Historian* [lesson](/en/lessons/mapping-with-python-leaflet), and then inputted that information into Tableau for mapping. There's plenty of good mapping [tools](https://digitalfellows.commons.gc.cuny.edu/2019/06/03/finding-the-right-tools-for-mapping/) out there that you can feel free to use: the key here is getting specific, accurate location information from the list of place names in the dataset.

@@ -1,6 +1,6 @@
---
title: Data Wrangling and Management in R
-slug: data-wrangling-and-management-in-R
+slug: data-wrangling-and-management-in-r
layout: lesson
collection: lessons
authors:
@@ -126,7 +126,7 @@ An Example of dplyr in Action
Let's go through an example to see how dplyr can aid us as historians by
inputting U.S. decennial census data from 1790 to 2010. Download the
data by [clicking
-here](/assets/introductory_state_example.csv)
+here](/assets/data-wrangling-and-management-in-r/introductory_state_example.csv)
and place it in the folder that you will use to work through the examples
in this tutorial.

@@ -164,7 +164,7 @@ time.
geom_line() +
geom_point()

-{% include figure.html filename="en-or-data-wrangling-and-management-in-R-01.png" caption="Graph of California and New York population" %}
+{% include figure.html filename="en-or-data-wrangling-and-management-in-r-01.png" caption="Graph of California and New York population" %}

As we can see, the population of California has grown considerably
compared to New York. While this particular example may seem obvious
@@ -182,7 +182,7 @@ with two different states such as Mississippi and Virginia.
geom_line() +
geom_point()

-{% include figure.html filename="en-or-data-wrangling-and-management-in-R-02.png" caption="Graph of Mississippi and Virginia population" %}
+{% include figure.html filename="en-or-data-wrangling-and-management-in-r-02.png" caption="Graph of Mississippi and Virginia population" %}

Quickly making changes to our code and reanalyzing our data is a
fundamental part of exploratory data analysis (EDA). Rather than trying
@@ -579,7 +579,7 @@ colleges founded before the U.S. War of 1812:
geom_bar(aes(x=is_secular, fill=is_secular))+
labs(x="Is the college secular?")

-{% include figure.html filename="en-or-data-wrangling-and-management-in-R-03.png" caption="Number of secular and non-secular colleges before War of 1812" %}
+{% include figure.html filename="en-or-data-wrangling-and-management-in-r-03.png" caption="Number of secular and non-secular colleges before War of 1812" %}

Again, by making a quick change to our code, we can also look at the
number of secular versus non-secular colleges founded after the start of
@@ -593,7 +593,7 @@ the War of 1812:
geom_bar(aes(x=is_secular, fill=is_secular))+
labs(x="Is the college secular?")

-({% include figure.html filename="en-or-data-wrangling-and-management-in-R-04.png" caption="Number of secular and non-secular colleges after War of 1812" %}
+({% include figure.html filename="en-or-data-wrangling-and-management-in-r-04.png" caption="Number of secular and non-secular colleges after War of 1812" %}

Conclusion
==========
2 changes: 1 addition & 1 deletion en/lessons/geospatial-data-analysis.md
@@ -174,7 +174,7 @@ Now we have a large dataframe called `County_Aggregate_Data` which has our count
```r
religion <- read.csv("./data/Religion/Churches.csv", as.is=TRUE)
```
-Depending on the state of the data you may need to do some data transformations in order to merge it back with the DataFrame. For complex transformations, see tutorials in R on working with data such as [Data Wrangling and Management in R tutorial](/en/lessons/data-wrangling-and-management-in-R) [data transforms](http://r4ds.had.co.nz/transform.html). In essence, you need to have a common field in both datasets to merge upon. Often this is a geographic id for the county and state represented by `GEOID`. It could also be the unique FIPS Code given by the US Census. Below I am using state and county `GEOID`. In this example, we are converting one data frame's common fields to numeric so that they match the variable type of the other dataframe:
+Depending on the state of the data you may need to do some data transformations in order to merge it back with the DataFrame. For complex transformations, see tutorials in R on working with data such as [Data Wrangling and Management in R tutorial](/en/lessons/data-wrangling-and-management-in-r) [data transforms](http://r4ds.had.co.nz/transform.html). In essence, you need to have a common field in both datasets to merge upon. Often this is a geographic id for the county and state represented by `GEOID`. It could also be the unique FIPS Code given by the US Census. Below I am using state and county `GEOID`. In this example, we are converting one data frame's common fields to numeric so that they match the variable type of the other dataframe:

```r
religion$STATEFP <- religion$STATE
2 changes: 1 addition & 1 deletion en/lessons/sentiment-analysis-syuzhet.md
@@ -41,7 +41,7 @@ Although the lesson is not intended for advanced R users, it is expected that yo

* Taylor Arnold and Lauren Tilton, '[Basic Text Processing in R](/en/lessons/basic-text-processing-in-r)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0061
* Taryn Dewar, '[R Basics with Tabular Data](/en/lessons/r-basics-with-tabular-data)', *Programming Historian* 5 (2016), https://doi.org/10.46430/phen0056
-* Nabeel Siddiqui, '[Data Wrangling and Management in R](/en/lessons/data-wrangling-and-management-in-R)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0063
+* Nabeel Siddiqui, '[Data Wrangling and Management in R](/en/lessons/data-wrangling-and-management-in-r)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0063

You may also be interested in other sentiment analysis lessons:

4 changes: 2 additions & 2 deletions en/lessons/shiny-leaflet-newspaper-map-tutorial.md
@@ -36,7 +36,7 @@ In this lesson, you will learn:
- The concept and practice of 'reactive programming', as implemented by Shiny applications. Specifically, you'll learn how you can use Shiny to 'listen' for certain inputs, and how they are connected to outputs displayed in your app.

<div class="alert alert-info">
-Note that this lesson doesn't teach any coding in R, other than what's necessary to create the web application, nor does it cover publishing the finished application to the web. A basic knowledge of R, particularly using the <a href='/en/lessons/data-wrangling-and-management-in-R'>tidyverse</a>, would be very useful.
+Note that this lesson doesn't teach any coding in R, other than what's necessary to create the web application, nor does it cover publishing the finished application to the web. A basic knowledge of R, particularly using the <a href='/en/lessons/data-wrangling-and-management-in-r'>tidyverse</a>, would be very useful.
</div>

### Graphical User Interfaces and the Digital Humanities
@@ -108,7 +108,7 @@ First, however, you need to set up the correct programming environment and creat

To get started with this tutorial, you should install the latest versions of [R](https://cran.rstudio.com/) and [Rstudio](https://www.rstudio.com/products/rstudio/download/) on your local machine. The R programming language has a very popular IDE (Integrated Development Environment) called RStudio, which is often used alongside R, as it provides a large set of features to make coding in the language more convenient. We'll use RStudio throughout the lesson.

-Previous *Programming Historian* lessons have covered [working with R](/en/lessons/r-basics-with-tabular-data) and [working with the tidyverse](/en/lessons/data-wrangling-and-management-in-R). It would be useful to go through these lessons beforehand, to learn the basics of installing R and using the tidyverse for data wrangling.
+Previous *Programming Historian* lessons have covered [working with R](/en/lessons/r-basics-with-tabular-data) and [working with the tidyverse](/en/lessons/data-wrangling-and-management-in-r). It would be useful to go through these lessons beforehand, to learn the basics of installing R and using the tidyverse for data wrangling.

### Create a new RStudio Project

4 changes: 2 additions & 2 deletions es/lecciones/administracion-de-datos-en-r.md
@@ -18,7 +18,7 @@ translation-reviewer:
- Victor Gayol
review-ticket: https://github.com/programminghistorian/ph-submissions/issues/199
layout: lesson
-original: data-wrangling-and-management-in-R
+original: data-wrangling-and-management-in-r
difficulty: 2
activity: transforming
topics: [data-manipulation, data-management, distant-reading, r, data-visualization]
@@ -78,7 +78,7 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
```

## Un ejemplo de dplyr en acción
-Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.

Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".
