update readme
Robinlovelace committed Apr 9, 2024
1 parent ffe41fd commit 7235b2b
Showing 6 changed files with 154 additions and 24 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -5,3 +5,4 @@
^docs$
^pkgdown$
^LICENSE\.md$
^data$
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: opsnap
Title: Get Open Data from Operation Snap Police Records
Title: Operation Snap Police Records
Version: 0.0.0.9000
Authors@R: c(
person("Robin", "Lovelace", email = "[email protected]", role = c("aut", "cre"),
107 changes: 87 additions & 20 deletions README.md
@@ -56,35 +56,96 @@ names(d)

| mode | make | model | colour | offence | district | disposal | date | location |
|:---------------|:--------|:------|:-------|:---------------------------------------------------------|:---------|:-------------------|:-----------|:---------------------------------|
| Cyclist | Honda | JAZZ | BLUE | RT88576 Drive without reasonable consideration to others | BD | Educational Course | 2023-10-01 | A650 SIR FRED HOYLE WAY, BINGLEY |
| Cyclist | Citroen | DS3 | WHITE | RT88576 Drive without reasonable consideration to others | BD | Educational Course | 2023-10-01 | DALTON BANK ROAD, HUDDERSFIELD |
| Vehicle driver | Audi | S3 | BLACK | RT88760 Fail to comply with solid white lines | LD | Educational Course | 2023-10-01 | A1 North Wetherby, Leeds |
| cyclist | Honda | JAZZ | BLUE | rt88576 drive without reasonable consideration to others | BD | educational course | 2023-10-01 | A650 SIR FRED HOYLE WAY, BINGLEY |
| cyclist | Citroen | DS3 | WHITE | rt88576 drive without reasonable consideration to others | BD | educational course | 2023-10-01 | DALTON BANK ROAD, HUDDERSFIELD |
| vehicle driver | Audi | S3 | BLACK | rt88760 fail to comply with solid white lines | LD | educational course | 2023-10-01 | A1 North Wetherby, Leeds |

# Preliminary analysis

There are 18363 records in the data, with increasing numbers of records
over time (average n. records per month shown below):

<img src="man/figures/README-unnamed-chunk-8-1.png"
style="width:100.0%" />

As shown in the graph above, 68.9% have values for the ‘offence’ column.
Many records lack either an offence or a location, leaving only 48.1% or
8832 complete records.

There are 6478 unique location text strings (addresses) in the data,
As shown in the graph above, 100% have values for the ‘offence’ column.
Many records lack either an offence or a location, leaving only 69.6% or
12782 complete records.

The breakdown of records by mode of transport (of the observer) is shown
below:

| mode | n | percent_records |
|:------------------|-----:|:----------------|
| vehicle driver | 9167 | 49.92% |
| cyclist | 6312 | 34.37% |
| pedestrian | 1352 | 7.36% |
| vehicle passenger | 579 | 3.15% |
| unknown | 497 | 2.71% |
| horse rider | 407 | 2.22% |
| motorcyclist | 48 | 0.26% |
| NA | 1 | 0.01% |

The offence text strings are quite long, with the most common offences
shown below:

| offence | n | percent_records |
|:-------------------------------------------------------------------------------------------------------|-----:|:----------------|
| n/a | 5706 | 31.0734% |
| rt88576 drive without reasonable consideration to others | 4992 | 27.1851% |
| rt88575 drive without due care and attention | 2917 | 15.8852% |
| rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals | 1364 | 7.4280% |
| rt88971 fail to comply with red traffic light | 679 | 3.6977% |
| rt88966 motor vehicle fail to comply with endorsable s36 traffic sign | 411 | 2.2382% |
| rv86019 use a handheld phone / device whilst driving a motor vehicle on a road | 357 | 1.9441% |
| rt88760 fail to comply with solid white lines | 265 | 1.4431% |
| rt88751 contravene give way sign | 264 | 1.4377% |
| suspected contravene weight restriction. | 213 | 1.1599% |

The equivalent table excluding records with missing offence data is
shown below:

| offence | n | percent_records |
|:-------------------------------------------------------------------------------------------------------|-----:|:----------------|
| n/a | 5706 | 31.0750% |
| rt88576 drive without reasonable consideration to others | 4992 | 27.1866% |
| rt88575 drive without due care and attention | 2917 | 15.8861% |
| rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals | 1364 | 7.4284% |
| rt88971 fail to comply with red traffic light | 679 | 3.6979% |
| rt88966 motor vehicle fail to comply with endorsable s36 traffic sign | 411 | 2.2383% |
| rv86019 use a handheld phone / device whilst driving a motor vehicle on a road | 357 | 1.9442% |
| rt88760 fail to comply with solid white lines | 265 | 1.4432% |
| rt88751 contravene give way sign | 264 | 1.4378% |
| suspected contravene weight restriction. | 213 | 1.1600% |

In terms of ‘disposal’, the most common values are shown below:

| disposal | n | percent_records |
|:-------------------|-----:|:----------------|
| educational course | 9806 | 53.40% |
| nfa | 5697 | 31.02% |
| conditional offer | 2326 | 12.67% |
| court | 307 | 1.67% |
| dsit investigation | 202 | 1.10% |
| rpu investigation | 23 | 0.13% |
| fine | 1 | 0.01% |
| NA | 1 | 0.01% |

There are 8800 unique location text strings (addresses) in the data,
with the most common locations shown below:

| location | n | percent_records |
|:-----------------------------------------------|----:|:----------------|
| Meanwood Road, Leeds | 34 | 0.385% |
| Westgate J/W Park Square West, Leeds | 31 | 0.351% |
| Dewsbury Road, Ossett | 29 | 0.328% |
| Chapeltown Road, Leeds | 24 | 0.272% |
| Highgate Road, Bradford | 22 | 0.249% |
| M62 EASTBOUND, BRIGHOUSE | 19 | 0.215% |
| Clayton Road, Bradford | 18 | 0.204% |
| Tongue Lane, Leeds | 18 | 0.204% |
| WESTGATE junction with PARK SQUARE WEST, LEEDS | 18 | 0.204% |
| Manchester Road, Bradford | 17 | 0.192% |
| Meanwood Road, Leeds | 49 | 0.3834% |
| Dewsbury Road, Ossett | 48 | 0.3755% |
| Westgate J/W Park Square West, Leeds | 38 | 0.2973% |
| Chapeltown Road, Leeds | 35 | 0.2738% |
| Park Square West, Leeds | 33 | 0.2582% |
| WESTGATE junction with PARK SQUARE WEST, LEEDS | 33 | 0.2582% |
| Hollingwood Lane, Bradford | 27 | 0.2112% |
| Highgate Road, Bradford | 26 | 0.2034% |
| Cemetery Road, Bradford | 25 | 0.1956% |
| Tongue Lane, Leeds | 25 | 0.1956% |

# Geocoding

@@ -96,16 +157,22 @@ d_sf = opsnap:::op_geocode(d_sample)
mapview::mapview(d_sf)
```

# Analysis
After geocoding all records we kept only those within the boundary of
West Yorkshire, which removed another 3% of records.
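The boundary filter described above can be illustrated with a minimal sketch using `sf` spatial subsetting. The unit-square boundary and the four points are hypothetical stand-ins, not the real West Yorkshire zones or Operation Snap records:

```r
library(sf)

# Planar geometry operations, matching the sf_use_s2(FALSE) call in README.qmd
sf_use_s2(FALSE)

# Hypothetical boundary: a unit square standing in for the West Yorkshire zones
square = st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
boundary = st_sf(geometry = st_sfc(square, crs = 4326))

# Hypothetical geocoded records: three inside the square, one outside
pts = st_as_sf(
  data.frame(id = 1:4, long = c(0.2, 0.5, 0.9, 1.5), lat = c(0.2, 0.5, 0.1, 0.5)),
  coords = c("long", "lat"),
  crs = 4326
)

# Spatial subsetting keeps only the points that intersect the boundary
pts_inside = pts[boundary, ]
proportion_outside = 1 - nrow(pts_inside) / nrow(pts)
```

Computing the proportion from the row counts, rather than typing it by hand, keeps the reported share of removed records in step with the data.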

# Location of incidents

Due to inaccuracy in the geocoding, we only know the locations of the
records to within around 500m of each incident (although we can link to
specific roads). We’ll present the geographic distribution of incidents
using a 500m grid:

<img src="man/figures/README-unnamed-chunk-12-1.png"
<img src="man/figures/README-unnamed-chunk-16-1.png"
style="width:100.0%" />

The map above represents 8607 incidents in West Yorkshire with an
offence that could be geocoded.
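The 500m aggregation can be sketched without any spatial packages by snapping projected coordinates to grid cells and counting points per cell. This is a simplified base R stand-in for the `rasterize()` call used in README.qmd, and the coordinates below are hypothetical values in metres, not real records:

```r
# Hypothetical projected coordinates in metres (not real Operation Snap data)
pts = data.frame(
  x = c(430120, 430380, 430610, 431790),
  y = c(433050, 433410, 433020, 434950)
)

cell_size = 500

# Snap each point to the south-west corner of the 500 m cell that contains it
pts$cell_x = floor(pts$x / cell_size) * cell_size
pts$cell_y = floor(pts$y / cell_size) * cell_size

# Count the number of points falling in each occupied cell
pts$n = 1
grid_counts = aggregate(n ~ cell_x + cell_y, data = pts, FUN = sum)
grid_counts
```

Only occupied cells appear in `grid_counts`; a raster template, as used in the README, additionally represents the empty cells.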

<!-- The results show there is one outlier with a very high number of crashes. We can remove this and plot the data again: -->
<!-- You can query the data downloaded with `opsnap` functions, e.g. as follows (results not shown): -->
<!-- Let's make a plot of the data: -->
68 changes: 65 additions & 3 deletions README.qmd
@@ -20,6 +20,9 @@ usethis::use_package("stplanr")
usethis::use_package("ggplot2")
usethis::use_package("dplyr")
usethis::use_r("opsnap")
# Rbuildignore the data folder:
usethis::use_build_ignore("data")
# MIT license:
usethis::use_mit_license("Leeds Institute for Transport Studies")
@@ -128,11 +131,19 @@ names(d)
```{r}
#| echo: false
d_all = read_csv(file_name)
d_all = d_all |>
mutate(
mode = tolower(mode),
offence = tolower(offence),
disposal = tolower(disposal)
)
d_all |>
head(3) |>
knitr::kable()
```

# Preliminary analysis

There are `r nrow(d_all)` records in the data, with increasing numbers of records over time (average n. records per month shown below):

```{r}
@@ -169,6 +180,52 @@ d_monthly |>
As shown in the graph above, `r round(nrow(d_offence) / nrow(d_all) * 100, 1)`% have values for the 'offence' column.
Many records lack either an offence or a location, leaving only `r round(nrow(d) / nrow(d_all) * 100, 1)`% or `r nrow(d)` complete records.

The breakdown of records by mode of transport (of the observer) is shown below:

```{r}
d_all |>
count(mode, sort = TRUE) |>
mutate(percent_records = n / nrow(d_all)) |>
mutate(percent_records = scales::percent(percent_records)) |>
arrange(desc(n)) |>
knitr::kable()
```

The offence text strings are quite long, with the most common offences shown below:

```{r}
d_all |>
count(offence, sort = TRUE) |>
mutate(percent_records = n / nrow(d_all)) |>
mutate(percent_records = scales::percent(percent_records)) |>
arrange(desc(n)) |>
head(10) |>
knitr::kable()
```

The equivalent table excluding records with missing offence data is shown below:

```{r}
d_offence |>
count(offence, sort = TRUE) |>
mutate(percent_records = n / nrow(d_offence)) |>
mutate(percent_records = scales::percent(percent_records)) |>
arrange(desc(n)) |>
head(10) |>
knitr::kable()
```

In terms of 'disposal', the most common values are shown below:

```{r}
d_all |>
count(disposal, sort = TRUE) |>
mutate(percent_records = n / nrow(d_all)) |>
mutate(percent_records = scales::percent(percent_records)) |>
arrange(desc(n)) |>
knitr::kable()
```

There are `r unique(d$location) |> length()` unique location text strings (addresses) in the data, with the most common locations shown below:

```{r}
@@ -202,8 +259,8 @@ d_geocoded = opsnap:::op_geocode(d)
table(d_geocoded$address) |>
sort() |>
tail(1)
d_geometries = d_geocoded |>
filter(address != "NA, West Yorkshire") |>
# d_geometries = d_geocoded |>
# filter(address != "NA, West Yorkshire") |>
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
d_sf = sf::st_sf(
d,
@@ -212,6 +269,7 @@ d_sf = sf::st_sf(
west_yorkshire = pct::get_pct_zones("west-yorkshire")
sf::sf_use_s2(FALSE)
d_sf_wy = d_sf[west_yorkshire, ]
proportion_outside_wy = 1 - nrow(d_sf_wy) / nrow(d_sf)
# Sanity check
d_sf_wy |>
sample_n(1000) |>
@@ -221,7 +279,9 @@ d_sf_wy |>
sf::write_sf(d_sf_wy, paste0("data/west-yorkshire/operation_snap_geocoded_", date_str, ".gpkg"))
```

# Analysis
After geocoding all records we kept only those within the boundary of West Yorkshire, which removed another 3% of records.

# Location of incidents

Due to inaccuracy in the geocoding, we only know the locations of the records to within around 500m of each incident (although we can link to specific roads).
We'll present the geographic distribution of incidents using a 500m grid:
@@ -235,6 +295,8 @@ raster_count = rasterize(d_projected, raster_template, fun = "length")
plot(raster_count)
```

The map above represents `r nrow(d_sf)` incidents in West Yorkshire with an offence that could be geocoded.

<!-- The results show there is one outlier with a very high number of crashes. We can remove this and plot the data again: -->

```{r}
Binary file added man/figures/README-unnamed-chunk-16-1.png
Binary file modified man/figures/README-unnamed-chunk-8-1.png