In this section, we reuse the Montréal cyclists data from part 1 of this cookbook:
(def bikes
  (g/read-csv! bikes-data-path {:delimiter ";"
                                :encoding "ISO-8859-1"
                                :kebab-columns true}))
First, note that the :date column is parsed as a string:
(g/dtypes bikes)
=> {:date "StringType",
:st-urbain-donnees-non-disponibles "StringType",
:du-parc "IntegerType",
:maisonneuve-1 "IntegerType",
:rachel-1 "IntegerType",
:pierre-dupuy "IntegerType",
:berri-1 "IntegerType",
:cote-sainte-catherine "IntegerType",
:brebeuf-donnees-non-disponibles "StringType",
:maisonneuve-2 "IntegerType"}
To parse the :date column into the date type, we use g/to-date and specify the date format as "dd/M/yyyy". The function g/with-column adds (or, in this case, replaces) the :date column with the parsed values. To derive the day of the week, we use a similar trick: g/date-format renders the parsed date using the weekday-only pattern "EEEE":
(def berri-bikes
  (-> bikes
      (g/with-column :date (g/to-date :date "dd/M/yyyy"))
      (g/with-column :weekday (g/date-format :date "EEEE"))
      (g/select :date :weekday :berri-1)))
(g/dtypes berri-bikes)
=> {:date "DateType", :weekday "StringType", :berri-1 "IntegerType"}
(g/show berri-bikes)
; +----------+---------+-------+
; |date      |weekday  |berri-1|
; +----------+---------+-------+
; |2012-01-01|Sunday   |35     |
; |2012-01-02|Monday   |83     |
; |2012-01-03|Tuesday  |135    |
; |2012-01-04|Wednesday|144    |
; |2012-01-05|Thursday |197    |
; |2012-01-06|Friday   |146    |
; |2012-01-07|Saturday |98     |
; |2012-01-08|Sunday   |95     |
; |2012-01-09|Monday   |244    |
; |2012-01-10|Tuesday  |397    |
; |2012-01-11|Wednesday|273    |
; |2012-01-12|Thursday |157    |
; |2012-01-13|Friday   |75     |
; |2012-01-14|Saturday |32     |
; |2012-01-15|Sunday   |54     |
; |2012-01-16|Monday   |168    |
; |2012-01-17|Tuesday  |155    |
; |2012-01-18|Wednesday|139    |
; |2012-01-19|Thursday |191    |
; |2012-01-20|Friday   |161    |
; +----------+---------+-------+
; only showing top 20 rows
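To see what the two patterns mean in isolation, here is a minimal sketch in plain Clojure using java.time. It is an illustration only, not what Geni or Spark runs internally, and it assumes the Spark version in use interprets "dd/M/yyyy" and "EEEE" in the same java.time style. The names date-pattern, weekday-pattern and weekday-of are hypothetical helpers, and the input strings are made-up samples rather than rows read from the CSV file.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter)
        '(java.util Locale))

;; "dd/M/yyyy": two-digit day, one- or two-digit month, four-digit year.
(def date-pattern (DateTimeFormatter/ofPattern "dd/M/yyyy"))

;; "EEEE": the full weekday name. The locale is pinned so the name is English.
(def weekday-pattern (DateTimeFormatter/ofPattern "EEEE" Locale/ENGLISH))

(defn weekday-of
  "Parses a dd/M/yyyy string and returns the full English weekday name."
  [s]
  (.format (LocalDate/parse s date-pattern) weekday-pattern))

(weekday-of "01/01/2012") ; => "Sunday"
(weekday-of "02/1/2012")  ; => "Monday"
```

Both results agree with the first two rows of the table above, where 2012-01-01 is a Sunday and 2012-01-02 is a Monday.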
To add up the cyclists by weekday, we compose g/group-by with g/sum:
(-> berri-bikes
    (g/group-by :weekday)
    (g/sum :berri-1)
    g/show)
; +---------+------------+
; |weekday  |sum(berri-1)|
; +---------+------------+
; |Wednesday|152972      |
; |Tuesday  |135305      |
; |Friday   |141771      |
; |Thursday |160131      |
; |Saturday |101578      |
; |Monday   |134298      |
; |Sunday   |99310       |
; +---------+------------+
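As a rough mental model of what grouping followed by summing computes, the following plain-Clojure sketch does the same thing on six rows copied from the berri-bikes table shown earlier. It runs on ordinary Clojure maps rather than a Spark dataset, and sample-rows and sum-by-weekday are hypothetical names introduced only for this illustration.

```clojure
;; Six rows taken from the first twenty rows of berri-bikes shown earlier.
(def sample-rows
  [{:date "2012-01-01" :weekday "Sunday"  :berri-1 35}
   {:date "2012-01-02" :weekday "Monday"  :berri-1 83}
   {:date "2012-01-03" :weekday "Tuesday" :berri-1 135}
   {:date "2012-01-08" :weekday "Sunday"  :berri-1 95}
   {:date "2012-01-09" :weekday "Monday"  :berri-1 244}
   {:date "2012-01-10" :weekday "Tuesday" :berri-1 397}])

(defn sum-by-weekday
  "Returns a map from weekday name to the total of :berri-1 for that weekday."
  [rows]
  (reduce (fn [totals {:keys [weekday berri-1]}]
            ;; fnil supplies 0 the first time a weekday is seen.
            (update totals weekday (fnil + 0) berri-1))
          {}
          rows))

(sum-by-weekday sample-rows)
;; => {"Sunday" 130, "Monday" 327, "Tuesday" 532}
```

As with the Spark result above, the grouped output carries no meaningful row order; it has to be sorted explicitly if order matters.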
We may not like the default column name sum(berri-1). We can rename it using g/select, as in part 1, or, alternatively, we can aggregate using g/agg with a rename map:
(-> berri-bikes
    (g/group-by :weekday)
    (g/agg {:n-cyclists (g/sum :berri-1)})
    g/show)
; +---------+----------+
; |weekday  |n-cyclists|
; +---------+----------+
; |Wednesday|152972    |
; |Tuesday  |135305    |
; |Friday   |141771    |
; |Thursday |160131    |
; |Saturday |101578    |
; |Monday   |134298    |
; |Sunday   |99310     |
; +---------+----------+
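The rename map can plausibly hold more than one aggregate, and the renamed column is convenient to sort on. The sketch below is hedged: it assumes that g is the Geni core alias, that berri-bikes is the dataset defined above, and that g/mean, g/order-by and g/desc behave like their Spark counterparts. The names weekday-summary and :mean-cyclists are hypothetical and do not appear elsewhere in this cookbook.

```clojure
;; Assumes `g` aliases Geni's core namespace and `berri-bikes` is defined as above.
(def weekday-summary
  (-> berri-bikes
      (g/group-by :weekday)
      ;; Each key in the map becomes the name of one aggregated column.
      (g/agg {:n-cyclists    (g/sum :berri-1)
              :mean-cyclists (g/mean :berri-1)})
      ;; Sort by the total, busiest weekday first.
      (g/order-by (g/desc :n-cyclists))))

(g/show weekday-summary)
```

Given the totals shown above, Thursday would be listed first and Sunday last.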