automate generating article PDFs #6

Open
gwijthoff opened this issue Feb 10, 2020 · 8 comments
Labels
chore One-off task enhancement New feature or request

Comments

@gwijthoff
Contributor

Note: Scholars Lab uses a rakefile in Ruby for this.

@gwijthoff gwijthoff added the enhancement New feature or request label Feb 10, 2020
@thatbudakguy thatbudakguy added chore One-off task and removed enhancement New feature or request labels Mar 16, 2020
@thatbudakguy
Collaborator

I think #22 will depend on this.

@thatbudakguy thatbudakguy added this to the Issue 1 milestone Jun 16, 2020
@thatbudakguy thatbudakguy added the enhancement New feature or request label Jun 17, 2020
@thatbudakguy thatbudakguy removed this from the Issue 1 milestone Jun 17, 2020
@thatbudakguy
Collaborator

thatbudakguy commented Sep 24, 2020

took a look at the rakefile above; it provides commands for easily generating new blank content (like hugo new) and some other interesting things but doesn't deal specifically with generating pdfs. it seems that hugo doesn't have a "tasks framework" for running arbitrary steps as part of the build, or enabling new commands (creating our own hugo pdf or something of that nature, for example).

i think there are a couple options here:

  • we create a small makefile or script that automates generating pdfs using WeasyPrint, so @gwijthoff only has to run make pdf or similar to generate all the pdfs for an issue
  • we create a GitHub action that automatically generates the pdfs for each article when new markdown content is pushed to GitHub, and either adds the pdfs to the repository or stores them somewhere else
  • we do neither and continue having @gwijthoff manually create pdfs (not sure how much manual effort is involved here; maybe this is the necessary/best solution)
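A minimal sketch of the first option, written as a Python script rather than a makefile. It relies on the WeasyPrint command line (`weasyprint <input> <output>`); the function names, the output directory, and the idea of deriving the filename from the last URL segment are assumptions for illustration, not anything the project has settled on.

```python
import subprocess
from pathlib import Path


def weasyprint_command(article_url: str, out_dir: Path) -> list[str]:
    """Build the WeasyPrint CLI call for one article.

    The PDF name is taken from the last path segment of the URL
    (an assumed naming convention).
    """
    slug = article_url.rstrip("/").rsplit("/", 1)[-1]
    return ["weasyprint", article_url, str(out_dir / f"{slug}.pdf")]


def generate_pdfs(article_urls: list[str], out_dir: Path) -> list[Path]:
    """Run WeasyPrint once per article and return the paths of the PDFs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in article_urls:
        cmd = weasyprint_command(url, out_dir)
        # check=True stops the run if WeasyPrint fails on any article
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written


# Example call (hypothetical URL; requires WeasyPrint to be installed):
# generate_pdfs(["https://example.org/issues/1/sample-article/"], Path("pdfs"))
```

A `make pdf` target could then simply invoke this script, so the editor still has a single command to run.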

@gwijthoff @rlskoeser thoughts?

@gwijthoff
Contributor Author

@thatbudakguy I don't imagine there being a huge amount of manual effort for me to run the WeasyPrint command, put the PDF files in the proper directory, and update the markdown links for each article. However, I don't think there's a good way for me to manually add a PDF link alongside the TXT link in the article header, given the way themes/layouts/article/single.html is currently configured.

I think I would prefer to integrate the PDF build into GitHub actions, but how difficult would that be? Is it doable by Issue 1 launch?

Although maybe @rlskoeser has thoughts on whether relying on GH actions is less sustainable than terminal commands (Nick's make pdf makefile example) that we document in the editorial guidelines or README.

@rlskoeser
Contributor

I think it can be automated, but I also think it's a bit more complicated than you two are thinking of — which is why I think it's better for us to do it manually at first, so we get a better understanding of what's involved.

Here are some of the steps I've thought of, there may be more:

  • Use Zenodo API to create a draft record and reserve a DOI
  • Add reserved DOI to the article metadata
  • Publish the article to the site
  • Generate the PDF with weasyprint
  • Use Zenodo API to upload the generated PDF to the draft record and publish it
  • Add Zenodo PDF link to article metadata on the site
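To make the Zenodo steps in this list concrete, here is a rough, purely illustrative sketch that only describes the requests involved and sends nothing. It assumes the Zenodo REST deposit API as publicly documented (a `prereserve_doi` metadata flag, plus `bucket` and `publish` links in the deposition response); the helper names are hypothetical, authentication is omitted, and all of it should be checked against the current Zenodo documentation before any real use.

```python
ZENODO_API = "https://zenodo.org/api"  # assumption: production host; a sandbox host also exists


def create_draft_request(title: str) -> dict:
    """Describe the request that creates a draft deposition and reserves a DOI."""
    return {
        "method": "POST",
        "url": f"{ZENODO_API}/deposit/depositions",
        "json": {"metadata": {"title": title, "prereserve_doi": True}},
    }


def reserved_doi(deposition: dict) -> str:
    """Pull the reserved DOI out of the deposition returned by the create call."""
    return deposition["metadata"]["prereserve_doi"]["doi"]


def upload_pdf_request(deposition: dict, filename: str) -> dict:
    """Describe the upload of the generated PDF to the draft's file bucket."""
    return {"method": "PUT", "url": f"{deposition['links']['bucket']}/{filename}"}


def publish_request(deposition: dict) -> dict:
    """Describe the final publish action; this step cannot be undone."""
    return {"method": "POST", "url": deposition["links"]["publish"]}
```

Keeping the publish request as a separate, explicit step leaves room for a person to review the PDF before the record becomes permanent.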

PDFs need to be generated from the published site so they have correct URLs; I haven't thought of a good way around this yet. I think we should also review the PDFs — although that could be a step earlier in the editorial workflow, so that you're checking the PDF looks ok as you're finalizing the markdown version of the article and you already know the autogenerated one will be fine. (But it is a concern — if this is fully automated, we could end up with a problem in the PDF and have it published on Zenodo before we can fix it.)

Also, I don't think we should write code against the Zenodo API until we know whether we can use PUL journal publishing infrastructure at all.

@gwijthoff you do bring up a good point, which I thought we had already handled but maybe we haven't — I think the PDF link should be a URL that is set in the metadata for the article once we have it, and the article template should use that to generate the PDF link similar to the way the PDF version is displayed now. Am I right that we missed this? I don't see it in the templates.

@gwijthoff I think we said we would reference the PDFs on Zenodo — is that your recollection too?

@gwijthoff
Contributor Author

@rlskoeser yes, I think we did miss handling the PDF URL –> article template. The solution you describe makes sense to me. Also, yes, we did say that we'd reference the PDFs on Zenodo.

In general, this workflow looks good for Issue 1. We can tackle PDF automation in Issue 2.

@thatbudakguy
Collaborator

@rlskoeser it sounds like you're also describing #35, no?

@rlskoeser
Contributor

Yes indeed, looks like #35 is related — I don't see how they can be automated independently since we need a reserved DOI to generate the PDF but we need a PDF to publish the record that gives us the DOI.

@thatbudakguy would you make an issue for the PDF solution we need for Issue 1? Articles should have an optional PDF URL metadata field (maybe just pdf?), and when it's set it should render similarly to the TXT link per the design.

@thatbudakguy thatbudakguy changed the title from "integrate WeasyPrint into Hugo build for generating article PDFs" to "automate generating and depositing article PDFs" Oct 22, 2020
@thatbudakguy thatbudakguy changed the title from "automate generating and depositing article PDFs" to "automate generating article PDFs" Oct 22, 2020
@rlskoeser
Contributor

As a step toward this, we could at least streamline the creation of PDFs with a script. Steps would be something like:

  • build hugo site in pre-production mode
  • kick off http server to serve out local copy of built site
  • generate pagedjs commands to generate pdfs for all feature articles for the specified issue (probably easiest if the script takes an option for issue number); should be able to generate the desired filenames based on the hugo metadata
  • run pagedjs for each article; put the pdfs in a directory (auto or path param)

Hugo doesn't have a tasks framework, so this could be a bash script, or maybe a JavaScript CLI script so we don't add new technologies.
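The steps above could be sketched roughly as follows. This version is in Python for illustration only, and the same shape would carry over to bash or JavaScript. It assumes `hugo` and `pagedjs-cli` are on the PATH; the environment name `preproduction`, the `/issues/<n>/<slug>/` URL layout, the PDF filename pattern, and passing article slugs in by hand (rather than reading them from the Hugo metadata) are all hypothetical placeholders.

```python
import subprocess
import sys
import time
from pathlib import Path


def pagedjs_commands(issue: int, slugs: list[str], out_dir: Path,
                     base_url: str = "http://localhost:8000") -> list[list[str]]:
    """Build one `pagedjs-cli <url> -o <file>` command per feature article."""
    return [
        ["pagedjs-cli", f"{base_url}/issues/{issue}/{slug}/",
         "-o", str(out_dir / f"issue{issue}-{slug}.pdf")]
        for slug in slugs
    ]


def build_issue_pdfs(issue: int, slugs: list[str], out_dir: Path,
                     site_dir: Path = Path("public"), port: int = 8000) -> None:
    """Build the site, serve it locally, and render a PDF for each article."""
    # 1. Build the Hugo site (environment name is an assumption)
    subprocess.run(["hugo", "--environment", "preproduction"], check=True)
    # 2. Serve the built site with Python's built-in static file server
    server = subprocess.Popen(
        [sys.executable, "-m", "http.server", str(port),
         "--directory", str(site_dir)]
    )
    try:
        time.sleep(1)  # crude wait for the server to start listening
        out_dir.mkdir(parents=True, exist_ok=True)
        # 3 and 4. Generate and run one pagedjs command per article
        for cmd in pagedjs_commands(issue, slugs, out_dir,
                                    f"http://localhost:{port}"):
            subprocess.run(cmd, check=True)
    finally:
        # Always shut the local server down, even if a render fails
        server.terminate()
        server.wait()
```

Separating command generation from execution makes it easy to print the commands first and check the URLs and filenames before rendering anything.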

3 participants