Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: implement build.jobs.create_environment and build.jobs.install #11551

Closed
humitos opened this issue Aug 19, 2024 · 16 comments · Fixed by #11710
Closed

Build: implement build.jobs.create_environment and build.jobs.install #11551

humitos opened this issue Aug 19, 2024 · 16 comments · Fixed by #11710
Assignees
Labels
Accepted Accepted issue on our roadmap Feature New feature

Comments

@humitos
Copy link
Member

humitos commented Aug 19, 2024

We've been discussing multiple times a way to override pre-defined jobs. It seems we have reached a good timeframe where we can prioritize this work now that we are enabling addons by default on October 7th.

I propose to start with build.jobs.create_environment and build.jobs.install so people can start using any package manager they want: Poetry, uv, pixi, pdm, etc.

Usage example for uv

build:
  jobs:
    create_environment:  # uv doesn't need to create an environment first
    install:
      - uv pip install -r docs/requirements.txt

Important things to consider

  • How to run commands installed via a Python package? e.g. after running uv pip install sphinx, how do we run sphinx-build? is it going to be in the path already?
  • How will we install our core dependencies in the environment created by the user if build.jobs.create_environment is overwritten?
  • Would it make sense to override a job to do nothing, as build.jobs.create_enviroment does in the example?
  • How each of the mentioned package manager would work with this new approach?

Related issues

@humitos humitos added Feature New feature Accepted Accepted issue on our roadmap labels Aug 19, 2024
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Aug 19, 2024
@ddelange
Copy link

does anything speak against always creating and running from a venv? so the build.jobs.create_environment is not needed? if someone wants to do custom stuff with or without that venv, they could accomplish so by adding more commands in build.jobs.install right?

@ddelange
Copy link

ddelange commented Aug 30, 2024

  • How to run commands installed via a Python package? e.g. after running uv pip install sphinx, how do we run sphinx-build? is it going to be in the path already?
  • How each of the mentioned package manager would work with this new approach?

you could consider always 'activating' the venv docker-style:

# this env var is exported when you activate a venv
ENV VIRTUAL_ENV="/usr/share/python3/app"
# this PATH prepend is performed when you activate a venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# create the venv, it's already activated by the commands above
RUN python3.12 -m venv ${VIRTUAL_ENV}

# that's it. and now (illustrative) your build.jobs.install and onwards

# pip points at /usr/share/python3/app/bin/pip which installs stuff to the venv
RUN pip install uv
# uv points at /usr/share/python3/app/bin/uv which installs stuff to the venv
RUN uv pip install -r docs.txt
# sphinx-build points at /usr/share/python3/app/bin/sphinx-build
RUN sphinx-build ...

@ddelange
Copy link

ddelange commented Aug 30, 2024

aha I see there is already a $READTHEDOCS_VIRTUALENV_PATH used by default, but it doesn't look properly activated and usable like in my previous comment:
image

.readthedocs.yaml

@ddelange ddelange mentioned this issue Sep 4, 2024
@stsewd
Copy link
Member

stsewd commented Sep 4, 2024

So, if a user overrides the create_environment step, all other commands won't work, as they depend on the READTHEDOCS_VIRTUALENV_PATH/CONDA_ENVS_PATH paths having an environment created there.

If we ask users to override all other steps if create_enviroment is overridden, not sure if there is a difference in just letting users run any commands (as build.commands does).

We could ask users to always create an environment on those paths, but then we will likely need to provide examples for each tool on how to create an environment on a given path (if they support this easily), and/or maybe having them call the commands like this example https://docs.readthedocs.io/en/stable/build-customization.html#install-dependencies-with-poetry

And all that is for Python based projects, I see that #11216 is also linked, users will need to override all steps when using a different tool like node, and it will also be weird for those projects having to have a sphinx or mkdocs key as our config file expects now, when they will be likely using another tool.

Do we still want to go in this direction? Looks like we still have some things to decide

@humitos
Copy link
Member Author

humitos commented Sep 5, 2024

if a user overrides the create_environment step, all other commands won't work, as they depend on the READTHEDOCS_VIRTUALENV_PATH/CONDA_ENVS_PATH paths having an environment created there.

I think that's the compromise/contract the user assumes when overriding these jobs, and that's fine to me for now.


On the other hand, I think we can explore a different approach here as a kind of replacement of the build.commands which is what we are looking forward in the end: allow users to have full control of the build, but in a more structured way that allow us to install APT packages in a secure way, and also make decision based on those commands (eg. build.jobs.build.pdf won't be run on previews)

version: 2

build:
  os: ubuntu-24.04
  tools:
    python: latest
  apt_packages:
    - tree

  jobs:
    install:
      - pip install requirements.txt
    build:
      html:
        - sphinx-build -j auto -E -b html docs/ $READTHEDOCS_OUTPUT/html

Note that I'm not defining mkdocs: nor sphinx: configs here, meaning that there won't be any pre-defined build.jobs executed by default1; giving users a generic builder with full control; but also giving us a structured builder to know what's going on.

Footnotes

  1. https://app.readthedocs.org/projects/test-builds/builds/25531814/

@ericholscher
Copy link
Member

I think both of the these approaches make sense, but I generally think it's fine if folks break their build overriding stuff as Manuel said. I'm not 100% sure about a generic builder, but it does seem like a nice option for folks who want a bit more structure. I think this is what @agjohnson has been advocating for instead of build.commands for a while.

@agjohnson
Copy link
Contributor

Yeah, overall build.commands could really be a minimal build.jobs like this. I think it makes for better UX than two competing implementations. But, I'd be happy pointing in that direction, no need to get there immediately. It's good that build.commands exists, but requiring the user does everything is a worse user experience for non-expert users, so I do worry about pointing users there without a strong need.

I most interested in build.jobs.build.* and build.jobs.install is a good place to start too. create_environment does have some hangups, but I agree that's fine if users break their own builds doing something special.

@stsewd
Copy link
Member

stsewd commented Oct 14, 2024

I guess the question is what we want to achieve with this:

Allow users to use another package manager

This won't be easy to do for users, since all of our commands expect a virtual env to be created in a specific path. And uv for example has a proper interface to run commands (is there a way to even run those commands without using uv?). Other package managers don't even create a Python virtual env (node).

We install some additional "core" deps using pip python -m pip ..., which doesn't make sense if users are using another package manager. I guess we could ignore this step, and users will be responsible for installing sphinx/mkdocs. But..

We run the sphinx/mkdocs commands using python -m sphinx .../python -m mkdocs, which again, it may not work out of the box if another package manager is used (and should be a Python package manager, so no npm tools).

Allow users to do custom things but still be able to install apt-packages

I guess this depends on the first point, if there is a lot of friction for users to change the package manager, this won't matter much.

Commands run in a structured manner

Not sure if I understand why this is relevant for us, at the end we care only about the output docs, the process that was followed to generate those aren't relevant for the final output.

Other use cases?

...

My conclusion is that if we add support for this, in the current state, users will need to do a lot of work and workarounds in order to use their tools, and they will be using these tools in a special manner, not like they were designed to (like to run a command using uv run).

Using build.commands is just more explicit for me, users use their tools as they will normally use them locally, no new workarounds to learn, only output their docs in the given directories. The only problem is that they can't install apt-packages, but that's something we can solve there without having to implement a new feature (just allow sudo, or just install the listed apt-packages).

It's good that build.commands exists, but requiring the user does everything is a worse user experience for non-expert users, so I do worry about pointing users there without a strong need.

Overriding the environment creation and installation step for me is still for advanced users, as mentioned, they won't be using this tools like they are used to.

If we still want to go with this direction, there are some things we need to decide, and documentation to write for the workarounds needed for the tools we want to support.

@humitos
Copy link
Member Author

humitos commented Oct 15, 2024

Not sure if I understand why this is relevant for us, at the end we care only about the output docs, the process that was followed to generate those aren't relevant for the final output.

The more structure we have, the better for the platform. As an example, we don't want to run the PDF build on PR builder to speed up the preview. Eventually, this could also be configurable from the Admin -- that will be possible if we have a structured build.


I see the proposal we discussed useful for these different use cases:

  • Basic users: will just use sphinx YAML key and rely on the automated builder
  • Intermediate users:
    • will use sphinx and build.jobs to add extra steps
    • will use sphinx and build.jobs.create_environment for simple environment customization (eg. add an extra parameter to python -m venv)
    • will use sphinx and build.jobs.install for simple environment customization (eg. add an extra parameter to pip install)
    • will use build.jobs.create_environment for environment that require more customization (note they will need to override more build.jobs if they want to keep using sphinx and they change the environment completely -- using uv, for example)
  • Advanced users: will use build.commands and write the commands as they want (at the cost of maybe loosing some of the optimizations/features we can do with the structured build)

In my opinion, the proposed solution covers all the use cases you described but adding structure to the build process, which as I mentioned, it has some current benefits but also leaves the door open to potential features in the future.

@humitos
Copy link
Member Author

humitos commented Oct 15, 2024

My conclusion is that if we add support for this, in the current state, users will need to do a lot of work and workarounds in order to use their tools, and they will be using these tools in a special manner, not like they were designed to (like to run a command using uv run).

I don't agree with this. Users wanting to use uv would do it by running the standard and default uv commands:

builds:
  # https://docs.astral.sh/uv/getting-started/
  jobs:
    create_environment:
      - curl -LsSf https://astral.sh/uv/install.sh | sh
      - uv python install 3.12    # not needed if `build.tools.python: 3.12` is declared
    install:
      - uv pip install -r requirements.txt
    build:
      html:
        - uv run sphinx -b html ...
      pdf:   # won't be ran on PR builds
        - uv run sphinx -b pdf ...

Using build.commands is just more explicit for me, users use their tools as they will normally use them locally, no new workarounds to learn, only output their docs in the given directories. The only problem is that they can't install apt-packages, but that's something we can solve there without having to implement a new feature (just allow sudo, or just install the listed apt-packages).

I want to eventually migrate off the build process from the application and give users the ability to do whatever they want and having root access; but we are not there yet. Even if we were close to that, it still won't be a replacement of the structured build process that build.jobs gives us.

@stsewd
Copy link
Member

stsewd commented Oct 15, 2024

I don't agree with this. Users wanting to use uv would do it by running the standard and default uv commands:

In your example, you are also overriding the build steps, which is not mentioned in this initial implementation.

In case we want to also implement the customization of the build, there are some other questions, like what to do with the formats option of the config file.

@humitos
Copy link
Member Author

humitos commented Oct 15, 2024

In your example, you are also overriding the build steps, which is not mentioned in this initial implementation.

That is what we expect to implement immediately after we implement build.jobs.create_environment and build.jobs.install. We discussed this in the original issue: https://github.com/readthedocs/meta/discussions/46#discussioncomment-10361881

Implementing first .create_environment and .install was a way to split the work into digestible amount of work.

@agjohnson
Copy link
Contributor

I'll echo what @humitos is pointing out here as well, I think we're on the same page here. This pattern might have been more obvious starting with build.jobs.build perhaps.

The main goal is to provide more touch points on build.jobs so that users don't need to needlessly redefine the whole build process just to use another build tool or use install step like uv.

I'd go one step further and say there might be a future where even build.commands is not necessary, but this can be a side effect and it's not the main goal.

@stsewd
Copy link
Member

stsewd commented Oct 15, 2024

This pattern might have been more obvious starting with build.jobs.build perhaps.

Indeed. It feels like there are still bits to decide around how this integrates with the existing config options. I think we may need a design doc here, or a more specif details about how this feature integrates with the existing config.

@ericholscher
Copy link
Member

On our call today, we discussed moving forward here. It sounds like the validation might be complex, and we will need to figure out exactly what that looks like. If it's easy, we're going to support a generic builder (no sphinx or mkdocs key), along with overriding the build steps on sphinx & mkdocs builds. In general, it sounds like we want to be permissive with the generic builder (allow builds.jobs.build with no other build steps, and a build image: #11551 (comment)) and so hopefully that will simplify some of the validation logic.

Another big question is what to do with formats if you override builds.jobs.build.html, for example. We want to support a build.jobs.build.html override that outputs $READTHEDOCS_OUTPUT/pdf/foo.pdf, so there are a few ways to specify "I want to host a PDF" that we need to resolve how they all play together.

stsewd added a commit that referenced this issue Nov 25, 2024
I was replacing some of the config models with dataclasses, but I found
myself re-implementing some helpers that pydantic provides, so I'm
introducing that new dep, it has everything we need, we may be able to
use it to simplify our validation once all models are migrated to
pydantic.

### About incompatible options

I decided to just not allow formats when `build.jobs.build` is used,
seems just easier that way. But not sure if users may want to just
override the html build step while still using the default build pdf
step, if that's the case, we may need to support using formats... or
have another way of saying "use the default".

build.jobs.create_environment is kind of incompatible with python/conda.
Since users will need to manually create the environment,
but they may still want to use the defaults from build.jobs.install,
or maybe they can't even use the defaults, as they are really tied to
the default create_environment step.

Which bring us to the next point,
if build.jobs.create_environment is overridden,
users will likely need to override build.jobs.install and
build.jobs.build as well...
Maybe just save the user some time and require them to override all of
them?
Or maybe just let them figure it out by themselves with the errors?
We could gather more information once we see some real usage of this...

### Override them all

I chose to make build.html required if users override that step, seems
logical to do so.

~~Do we want to allow users to build all formats? I'm starting with html
and pdf, that seem the most used. But shouldn't be a problem to support
all of them. I guess my only question would be about naming, `htmlzip`
has always been a weird name, maybe just zip?~~ I just went ahead and
allowed all, with the same name as formats.

### Docs

I didn't add docs yet because there is the question... should we just
expose this to users? Or maybe just test it internally for now?

Closes #11551
@github-project-automation github-project-automation bot moved this from Needs review to Done in 📍Roadmap Nov 25, 2024
@ericholscher
Copy link
Member

🎉

humitos added a commit that referenced this issue Nov 28, 2024
This PR modifies the build to not run default commands when the user doesn't
specify `sphinx` or `mkdocs` configs in the YAML file.

This allows to use the following YAML file to build with Docusaurus, for
example:

```yaml
version: 2

build:
  os: ubuntu-24.04
  tools:
    nodejs: "22"
  jobs:
    install:
      - cd docs/ && npm install
    build:
      html:
        - cd docs/ && npm run build
    post_build:
      - mkdir --parents $READTHEDOCS_OUTPUT/html/
      - cp --recursive docs/build/* $READTHEDOCS_OUTPUT/html/
```

This is the structure I've been wanting to have to allow any type of doctool to
work in our platform following the `build.jobs` pattern.

The code deployed in production doesn't allow this because it runs automatically
`create_environment`, `install_core_dependencies` and `install_requirements`
even if the project is not using Python --which is incorrect. It fails with the
generic error: " Unknown problem. There was a problem with Read the Docs while
building your documentation"

- https://app.readthedocs.org/projects/test-builds/builds/26425680/
- https://read-the-docs.sentry.io/issues/4458662274/

This PR detects if the user is defining `sphinx` or `mkdocs` and if it's not, it
doesn't run those jobs and leave the user to handle the build workflow
completely.

* Related #11216
* Related #11551
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Feature New feature
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants