Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce package(s) size #87

Open
pawelru opened this issue Nov 14, 2024 · 1 comment
Open

Reduce package(s) size #87

pawelru opened this issue Nov 14, 2024 · 1 comment
Labels

Comments

@pawelru
Copy link
Collaborator

pawelru commented Nov 14, 2024

Have a look at the formatters:

❯ R CMD BUILD formatters
* checking for file ‘formatters/DESCRIPTION’ ... OK
* preparing ‘formatters’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘formatters_0.5.9.9003.tar.gz’

❯ tar -tzvf formatters_0.5.9.9003.tar.gz | sort -k3 -hr
-rw-r--r--  0 ruckip staff 1847812 Dec  6  2023 formatters/man/figures/logo.png
-rw-r--r--  0 ruckip staff  925622 Nov  6  2023 formatters/data/ex_advs.rda
-rw-r--r--  0 ruckip staff  629211 Nov  6  2023 formatters/data/ex_adqs.rda
-rw-r--r--  0 ruckip staff  493724 Nov  6  2023 formatters/data/ex_adlb.rda
-rw-r--r--  0 ruckip staff   58553 Jun 10 11:05 formatters/R/pagination.R
-rw-r--r--  0 ruckip staff   58457 Nov  6  2023 formatters/data/ex_adae.rda
-rw-r--r--  0 ruckip staff   56998 Jun 10 11:05 formatters/R/tostring.R
-rw-r--r--  0 ruckip staff   50367 Nov  6  2023 formatters/data/ex_admh.rda
-rw-r--r--  0 ruckip staff   48949 Nov  6  2023 formatters/data/ex_adcm.rda
-rw-r--r--  0 ruckip staff   45794 Oct 11 10:16 formatters/R/matrix_form.R
-rw-r--r--  0 ruckip staff   43801 Nov  6  2023 formatters/data/ex_adaette.rda
-rw-r--r--  0 ruckip staff   42991 Nov  6  2023 formatters/data/ex_adrs.rda
-rw-r--r--  0 ruckip staff   42770 Nov  6  2023 formatters/data/ex_adtte.rda
-rw-r--r--  0 ruckip staff   28158 Aug 13 14:33 formatters/tests/testthat/test-formatters.R
-rw-r--r--  0 ruckip staff   25210 Nov 14 18:02 formatters/inst/doc/formatters.html
-rw-r--r--  0 ruckip staff   23892 Jun 10 11:05 formatters/R/mpf_exporters.R
(...)

when untared:

❯ total_size=$(du -sk . | awk '{print $1}')
find . -type f -exec du -sk {} + | awk -v total_size=$total_size '{printf "%s %d KB (%.2f%%)\n", $2, $1, ($1/total_size)*100}' | sort -k2,2nr
./man/figures/logo.png 1808 KB (37.05%)
./data/ex_advs.rda 904 KB (18.52%)
./data/ex_adqs.rda 616 KB (12.62%)
./data/ex_adlb.rda 484 KB (9.92%)
./R/pagination.R 60 KB (1.23%)
  • 37% of package size is consumed by the logo alone!
  • looking into the codebase of that repo - we are not using ex_ datasets at all. But it might be used elsewhere. There are few packages that are using ex_adlb but I cannot find any usage of ex_advs and ex_adqs.

Adding this in general task repo to repeat the analysis for multiple packages and find a common solution.

It's important because of:

  • R CMD CHECK note about size
  • WebR with package downloads on the client (website user) side so we aim to have small and fast installable packages. I'm seeing two ways of doing this: limiting deps (not discussed here) and reducing the size (this issue). Currently it's neither good or bad. Have a look at the below extract from a random shinylive teal app in the tlg-catalog (devtools in chrome - network tab - open shinylive tab). This is filtered to downloaded .tgz files and order by size.
    image
@llrs-roche
Copy link

llrs-roche commented Dec 10, 2024

Reducing the logo file size is relatively easy (using an online tool reduced the size by 70% to ~500KB). I also think that the size of the logo should be checked, as I found that for formatters is quite big 1299x1501 (compared it with teal logo at teal/man/figures/teal.png of 278x321 it is >20 times bigger).

Dimensions of the logos found:

teal/man/figures/teal.png
[1] 278 321
rtables/man/figures/logo.png:
[1] 1501 1501
formatters/man/figures/logo.png:
[1] 1299 1501

> file.size("../teal/man/figures/teal.png")
[1] 79859
> file.size("../rtables/man/figures/logo.png")
[1] 1929788
> file.size("../formatters/man/figures/logo.png")
[1] 498770

I think that only compressing can be enough, but we should reduce the file size of the logos for rtables and formatters.
Note also that teal and formatters' logos have the same proportion but rtables one is a square.

Checking for objects/datasets not used can be more complicated because it is harder to automate as they might be used internally or by other repositories, and I don't know a good method to automatically extract which objects are datasets from an R package (checking https://pharmaverse.r-universe.dev/datasets, reports 33 datasets from teal.* packages but it is harder to explore) .

llrs-roche added a commit to insightsengineering/formatters that referenced this issue Dec 10, 2024
[It has been
detected](insightsengineering/nestdevs-tasks#87)
that the formatters logo is quite big in comparison to other packages.
This PR compress the logo so it no longer takes so much of the package
size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants