
How should we handle testing #9

Open
epage opened this issue Mar 2, 2019 · 9 comments
Labels: enhancement (New feature or request)

@epage
Contributor

epage commented Mar 2, 2019

Considerations

  • Emitting JUnit reports
  • Emitting coverage (gathered in which way?)
  • Maintaining support for cargo test's features that are relevant for CIs

Possibilities

  • Contribute JUnit support to libtest
  • A replacement for cargo test
  • Separate command that converts json to JUnit (see the sketch below)
  • Task to wrap all of this
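
To make the "separate command that converts json to JUnit" idea concrete, here is a minimal, hypothetical sketch (not an existing tool). It assumes libtest's nightly-only JSON output (`cargo test -- -Z unstable-options --format json`), where each event is one JSON object per line with `type`, `event`, and `name` fields, and it depends on the `serde_json` crate. XML escaping and most event types are skipped for brevity.

```rust
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    // Collected as (test name, failed?, captured output).
    let mut cases: Vec<(String, bool, String)> = Vec::new();

    for line in stdin.lock().lines() {
        let line = line?;
        // Skip anything that isn't a JSON event line (cargo progress output, etc.).
        let event: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue,
        };
        if event["type"] != "test" {
            continue;
        }
        let name = event["name"].as_str().unwrap_or("<unknown>").to_string();
        match event["event"].as_str() {
            Some("ok") => cases.push((name, false, String::new())),
            Some("failed") => {
                let output = event["stdout"].as_str().unwrap_or("").to_string();
                cases.push((name, true, output));
            }
            _ => {} // "started", "ignored", suite events, ... are ignored here
        }
    }

    let failures = cases.iter().filter(|c| c.1).count();
    let out = io::stdout();
    let mut out = out.lock();
    writeln!(out, r#"<?xml version="1.0" encoding="UTF-8"?>"#)?;
    writeln!(
        out,
        r#"<testsuite name="cargo test" tests="{}" failures="{}">"#,
        cases.len(),
        failures
    )?;
    for (name, failed, output) in &cases {
        if *failed {
            // A real converter would XML-escape `name` and `output`.
            writeln!(
                out,
                r#"  <testcase name="{}"><failure>{}</failure></testcase>"#,
                name, output
            )?;
        } else {
            writeln!(out, r#"  <testcase name="{}"/>"#, name)?;
        }
    }
    writeln!(out, "</testsuite>")?;
    Ok(())
}
```

Something like `cargo test -- -Z unstable-options --format json | junit-convert > results.xml` could then produce a file a CI task publishes (the `junit-convert` name is made up here).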
@epage
Contributor Author

epage commented Mar 2, 2019

@johnterickson on gitter

  1. I think "cargo test" maps decently to JUnit (which seems to be the most broadly supported), but I'm less sure about "cargo bench".
  2. I'm "on the fence" on whether libtest should emit its own JSON or whether JUnit should be baked in. A confounding factor is that the libtest formatters are used for both "cargo test" and "cargo bench".
  3. I'd like to avoid wrapping cargo with another executable, though I think it would make sense for a "cargo test" build task in Pipelines to also publish the test results to Azure DevOps.
  4. I'm relatively new and not running Rust in production, so I'm open to hearing other thoughts; also, do take my opinions with a grain of salt :)

(doesn't matter, your thoughts are welcome!)

@epage
Contributor Author

epage commented Mar 2, 2019

I think "cargo test" maps decently to JUnit (which seems to be the most broadly supported), but I'm less sure about "cargo bench".

From a CI perspective, a lot of people give you flak for running cargo bench from within a CI because of the variability. Sadly, my google-fu is failing me and I can't find the blog post where someone analyzed this.
(personally I still do it)

From a completeness perspective, it is probably a problem to have an output format that doesn't support all of the events.

Hmm, thinking about this: we could treat each bench as a test case and store the bench information in case properties. That way we can at least say we support it, even if people can't act on it.

(I still feel like a benchmark reporting service is something lacking in the wider programming community)

I'd like to avoid wrapping cargo with another executable, though I think it would make sense for a "cargo test" build task in Pipelines to also publish the test results to Azure DevOps

For cargo test to be programmatically useful, we effectively need to replace it, from what I've seen. If JUnit support were implemented via emitters, it would end up in the output, mixed with other text, making it messy to extract. If we use escargot, we can cleanly build and test, provide a nice UX for the viewer, and support outputting the relevant test data as JUnit.

cargo-suity effectively replaces cargo test. Especially after it migrates to escargot, it will be calling cargo build on the tests and then invoking the test executables directly. I suggested renaming it. My vote is actually cargo-ci-test (cargo ci-test), with its scope being a CI-focused version of cargo test, helping with needs like coverage, etc.
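
For illustration only (this is not cargo-suity's or escargot's actual code), here is a rough sketch of that "build the tests, then invoke the binaries directly" flow. It assumes cargo's `--message-format=json` output includes `compiler-artifact` messages carrying a `profile.test` flag and an `executable` path, and it uses `serde_json` to pick them out.

```rust
use std::process::Command;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the test binaries without running them, asking cargo for JSON messages.
    let output = Command::new("cargo")
        .args(["test", "--no-run", "--message-format=json"])
        .output()?;

    let mut test_binaries = Vec::new();
    for line in String::from_utf8(output.stdout)?.lines() {
        let msg: serde_json::Value = match serde_json::from_str(line) {
            Ok(v) => v,
            Err(_) => continue,
        };
        // Keep only artifacts built for the test profile that have an executable path.
        if msg["reason"] == "compiler-artifact" && msg["profile"]["test"] == true {
            if let Some(exe) = msg["executable"].as_str() {
                test_binaries.push(exe.to_string());
            }
        }
    }

    for bin in &test_binaries {
        // The test binary itself accepts libtest flags (e.g. `--format json` on nightly),
        // so a wrapper can capture structured results here instead of scraping text.
        let status = Command::new(bin).status()?;
        if !status.success() {
            std::process::exit(status.code().unwrap_or(1));
        }
    }
    Ok(())
}
```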

@ghost

ghost commented Mar 3, 2019

If JUnit support were implemented via emitters, it would end up in the output, mixed with other text, making it messy to extract.

This is a great clarifying point that reminds me of rust-lang/rust#57147. Zooming out for a second, I wonder if it's a hard requirement of libtest that the test output go to stdout. I think you make a good point that there will be too much mixing in stdout. It seems not very "rustic" to have to grep through a mess on stdout.

Perhaps libtest should have:

  1. Formatters for showing cargo test/bench to the console (e.g. with console coloring). These would be streaming as tests complete (pretty, terse)
  2. Formatters for writing results to a test log file. This need not be streaming. (e.g. JUnit has a count at the top IIRC)

This would also solve the question of what to do about cargo's current "JSON-per-line" output; specifically, we could deprecate it.
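
Purely as a hypothetical sketch of that split (this is not libtest's actual formatter API): a streaming trait for the console plus a buffered one for a report file that needs totals up front.

```rust
use std::io::{self, Write};

/// Streaming: invoked as each test finishes, so results show up on the
/// console immediately (the role `pretty`/`terse` play today).
trait ConsoleFormatter {
    fn test_finished(&mut self, name: &str, passed: bool) -> io::Result<()>;
}

/// Buffered: sees the complete result set at the end, which suits formats
/// like JUnit that put counts in the <testsuite> header of a log file.
trait ReportFormatter {
    fn write_report<W: Write>(&self, results: &[(String, bool)], out: W) -> io::Result<()>;
}
```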

My vote is actually cargo-ci-test (cargo ci-test), with its scope being a CI-focused version of cargo test ...

My thought against a wrapper (not having tried suity or escargot, whose name I love btw) is more about having a way to generate programmatic test output out of the box. That is, running tests in a CI shouldn't require me to "cargo install" something first. I'll have to try out suity and escargot!

@epage
Contributor Author

epage commented Mar 3, 2019

My thought against a wrapper (not having tried suity or escargot, whose name I love btw) is more about having a way to generate programmatic test output out of the box. That is, running tests in a CI shouldn't require me to "cargo install" something first.

Understandable. Maybe I'm just being pessimistic, but I expect we'll have quite a few things to cargo install as we evolve the ecosystem. Like leveraging crates instead of having a "batteries included" stdlib, I favor us iterating on a lot of these tools outside of cargo. We get them sooner and can more easily evolve them as we better understand the problem.

Speaking of which, external tools might also be an important part of getting features integrated into cargo. If we have use cases that we can show we're solving, along with a mature implementation, I expect that'll go through a lot more smoothly.

Also, to be clear: to avoid the performance overhead of literally doing cargo install, I've been encouraging people to upload binaries to github releases in a way compatible with trust's install.sh. GhInstall is the other task that I think would be useful. See #3.

@nickbabcock

From a CI perspective, a lot of people give you flak for running cargo bench from within a CI because of the variability

Criterion has a section about that. It doesn't give numbers, but it basically says that if you run cargo bench in a CI pipeline, it shouldn't fail the build.

Speaking of Criterion, if we do support benchmarking in CI (which I don't do myself, but won't stand in the way of others doing), Criterion should really be the default / recommended option. The fact that it outputs CSV benchmark data (in addition to its other features) lends itself even more to a CI use case, as one could track benchmarks across all builds.
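
As a toy example of the kind of Criterion bench that would produce such data: the standard `criterion_group!`/`criterion_main!` setup, assuming a `[[bench]]` target with `harness = false` in Cargo.toml, with reports landing under `target/criterion/` by default.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Toy function to benchmark.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fib(c: &mut Criterion) {
    // black_box keeps the compiler from optimizing the call away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, bench_fib);
criterion_main!(benches);
```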


For tests, from a UX perspective, I think I'd naturally gravitate towards seeing two steps, "1. Run tests. 2. Upload test results / (3.?) code coverage", to differentiate potential failures, but I'm flexible. As long as any potential wrapper / tool is as solid as cargo test, then it doesn't matter to me what's underneath. I wouldn't want to see a step in the UI called cargo suity, as it seems best to name a step after the semantics (hmm, maybe not the right word) and not the implementation.

@epage
Contributor Author

epage commented Mar 4, 2019

Speaking of Criterion, if we do support benchmarking in CI (which I don't do myself, but won't stand in the way of others doing), Criterion should really be the default / recommended option. The fact that it outputs CSV benchmark data (in addition to its other features) lends itself even more to a CI use case, as one could track benchmarks across all builds.

I'm on the fence on Criterion atm. Without the new testing RFC implemented and stable, it is error prone to maintain the list of benchmarks. I talked to them about adopting macros for registering tests but they want to just wait until the testing RFC is in.

See bheisler/criterion.rs#262

Otherwise, I think we're agnostic of the use of criterion.

For tests, from a UX perspective, I think I'd naturally gravitate towards seeing two steps, "1. Run tests. 2. Upload test results / (3.?) code coverage", to differentiate potential failures,

Let me see if I understand. Is (3) the uploading of coverage or re-running the tests to collect coverage data?

If the latter, the downside is we're doubling our testing time.

but I'm flexible. As long as any potential wrapper / tool is as solid as cargo test, then it doesn't matter to me what's underneath. I wouldn't want to see a step in the UI called cargo suity, as it seems best to name a step after the semantics (hmm, maybe not the right word) and not the implementation.

I agree.

@nickbabcock

Let me see if I understand. Is (3) the uploading of coverage or re-running the tests to collect coverage data?

If the latter, the downside is we're doubling our testing time.

How does tarpaulin work? Does it wrap cargo test? Ideally this would all be some sort of middleware, but I'm not sure the best way to avoid re-running the test suite for code coverage and still receive test results and code coverage to upload.

@epage
Contributor Author

epage commented Mar 4, 2019

cargo-kcov wraps cargo test, parsing the textual output

https://github.com/kennytm/cargo-kcov/blob/master/src/main.rs#L177

cargo-tarpaulin wraps cargo test, calling directly into the cargo crate

https://github.com/xd009642/tarpaulin/blob/master/src/lib.rs#L99
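
For comparison, a bare-bones sketch (not cargo-kcov's actual code) of the "wrap cargo test and scrape its text output" style: capture stdout and look for libtest's `test result:` summary lines. The fragility of this is part of the argument for structured output.

```rust
use std::process::Command;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Run the whole test suite and capture everything it prints.
    let output = Command::new("cargo").arg("test").output()?;
    let stdout = String::from_utf8_lossy(&output.stdout);

    for line in stdout.lines() {
        // libtest prints e.g. "test result: ok. 5 passed; 0 failed; 0 ignored; ..."
        if line.trim_start().starts_with("test result:") {
            println!("summary: {}", line.trim());
        }
    }

    // Preserve cargo test's exit status so CI still fails on test failures.
    std::process::exit(output.status.code().unwrap_or(1));
}
```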

@epage epage transferred this issue from crate-ci/resources Jul 12, 2019
@epage epage added the enhancement New feature or request label Jul 12, 2019
@jonhoo
Collaborator

jonhoo commented Jan 20, 2020

I think this should be closed given the recent direction of the project? Maybe with a reference in #78, though I'm not sure. #51 will remain open for how we handle testing of the templates themselves.
