Merge main into explicit fsm compilation #2381

Open
wants to merge 49 commits into
base: explicit-fsm-compilation

parthsarkar17
Contributor

No description provided.

ayakayorihiro and others added 30 commits October 29, 2024 18:42
…#2314)

*Note: This PR is still a draft because I have some questions about the
changes I made here that I wanted to ask about. Once those concerns are
resolved I will turn this into a full PR.*
## This PR contains:
- `fud2/scripts/profiler.rhai`: fud2 support for the profiler. I added a
new state `flamegraph` (the `svg` file containing the flame graph) and a
new operation called `profiler`, which takes a Calyx file and produces
the profiled flame graph.
- Updates to existing profiling scripts to read instrumented cell
signals and allow toggling between profiling optimized and non-optimized
Calyx programs.
- Updated fud2 tests to reflect new state and operation.

## Usage

First, clone https://github.com/brendangregg/FlameGraph and edit the
`fud2` configuration file to specify the location of `flamegraph.pl`:
ex)
```
[flamegraph]
script = "/home/ayaka/projects/FlameGraph/flamegraph.pl"
```

To obtain a flame graph from a Calyx program, specify the output as a
`svg` file.
ex) 
```
fud2 tests/correctness/while.futil -o while.svg -s sim.data=tests/correctness/while.futil.data
```
will produce the below flame graph.

![image](https://github.com/user-attachments/assets/b257ecb6-9f45-49b7-9555-f68e3212403e)

To obtain a flame graph for the non-compiler-optimized version of the
Calyx program, add `-s passes=no-opt`:
ex)
```
fud2 tests/correctness/while.futil -o flame.svg -s sim.data=tests/correctness/while.futil.data -s passes=no-opt
```
…ogging (#2319)

Sorry for the larger PR here, a bunch of stuff ended up getting all
tangled together.

This PR adds:
- a check for undefined guards after convergence. This is currently
disabled by default as the changes to `@control` mean that this will
basically never be relevant for normal programs.
- redoes internal error handling to make proper error messages more
enforceable by the type system
- Adds a configuration struct for runtime checks/behaviors
- Adds optional debug logging via `--debug-logging` which prints out the
assignments that fire and the results they propagate alongside the
implicit zero assignments. Might be useful for some people under some
circumstances?
- Adjust the `@control` port process to match that used by the compiler.
Consequently for most programs we simulate, every port is `@control`.
This means some error cases are no longer errors and also reintroduces
transient conflicts on some programs.
- As a spot fix to the transient conflicts, when detected we will now
check if the guard for the first assignment still evaluates to true and
if not we take the later assignment and continue onward, otherwise an
error is raised. In all likelihood this won't mitigate every instance of
the transient conflict problem but seems to address it for our current
programs.
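
The spot fix for transient conflicts described in the last bullet can be sketched roughly as follows. This is a Python illustration only, not Cider's actual Rust code: `Assignment`, `ConflictError`, and `resolve_conflict` are hypothetical names, and guards are modeled as plain callables that are re-evaluated against the current state.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assignment:
    """Hypothetical stand-in for a guarded assignment to a port."""
    guard: Callable[[], bool]  # re-evaluated against the current state
    value: int


class ConflictError(Exception):
    """Raised when two live assignments drive the same port."""


def resolve_conflict(first: Assignment, later: Assignment) -> Assignment:
    """Pick the surviving assignment when two appear to drive one port."""
    if first.guard():
        # The first assignment is still active, so the conflict is real.
        raise ConflictError("conflicting assignments to the same port")
    # The first guard no longer holds: the conflict was transient,
    # so take the later assignment and continue.
    return later
```

As the bullet notes, this only covers the case where the first guard has gone stale, so it likely will not catch every transient conflict.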
This reduces the size of BitVecValue from 32 bytes to 24 bytes.
…#2323)

A quick patch for the `fud2` flow which makes it possible to run without
supplying a `sim.data` value which would otherwise block things. This
hopefully makes testing small programs more straightforward as it no
longer requires generating an empty data file or using direct
invocations of the tools.
Extract floating-point support for `fud` from
#1928
Add a `calyx.lib_path` argument for the `fud2` stage.
This PR removes ops from `lib.rs` which are duplicated in Rhai scripts.
Changes from #2266 since author has not responded. Fixes #2253.
Removes all the old btor code that is no longer used.
Quick set of renaming to reduce the confusion with the `flags` stuff
which should make it clear where things belong
Adding `jq` stage to `fud2` but I've been having trouble getting the
stage to print out the value from the command. It seems like the `jq ...
> $out` call gobbles up the output. Not sure if I'm making a mistake.

Any thoughts @EclecticGriffin @ekiwi or @sampsyo?
Fixes #2336. Currently running into the same problem as
#2335 (comment)

@ekiwi how did you fix it on the Rust side?
Moving all tests in the root `runt.toml` to use `fud2`.
Sorry, I overlooked Rachit's comment on the previous PR.
This PR adds a documentation page for the `calyx-pass-explorer` tool,
serving as a gentle introduction to its usage as well as an overview of
its code.

- I've added `mdbook-callouts` as a way to get Obsidian-style quote
blocks. Specifically, I'm using this for the `[!TIP]` environment. I can
remove this if we don't want another dependency.
- I did some unrelated things (to `docs/contributors.md`,
`docs/github.md`, and `docs/intro.md`) that I can move into another PR
if that would be better.
This allows us to remove most uses of `jq` in the tests.
I did some silly hacking late last week that might get us marginally
closer to unifying all the data marshaling under a single tool.

This adds the ability to specify a `--to dat` (or one of the aliases)
and an output directory which will generate the hex encoded files that
verilator/icarus expect. There are some minor differences with the dat
files generated by the python flow. Mainly, the python flow truncates
leading zeroes in the encoding while I've elected to retain them in the
interest of keeping things simple. Python generates:
```
4B
53
21
5D
1E
5E
2B
B
3C
60
```
while the data-converter generates
```
0000004B
00000053
00000021
0000005D
0000001E
0000005E
0000002B
0000000B
0000003C
00000060
```
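
For illustration, the two encodings above differ only in zero padding. A minimal Python sketch, assuming 32-bit values (inferred from the eight hex digits in the second listing); the function names are made up for this example and are not part of either tool:

```python
def encode_truncated(value: int) -> str:
    """Hex without leading zeroes, matching the Python flow's output above."""
    return format(value, "X")


def encode_padded(value: int, width_bits: int = 32) -> str:
    """Hex padded to the full width, matching the data-converter's output above."""
    digits = (width_bits + 3) // 4  # hex digits needed to cover the width
    return format(value, f"0{digits}X")
```

With these, the value 11 encodes as `B` in the first style and `0000000B` in the second.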

This also adds the ability to deserialize this style of data dump but is
slightly brittle at the moment since it expects the following:
- the data header is exactly that used by the tool and is cbor encoded
in a file named `header`. The python currently json encodes things in a
file named `shape`
- the input data includes all leading zeroes

Both these assumptions can probably be relaxed in the future in the
interest of robustness.
Creates a new `BaseSimulator` object that can be used independently of
`Simulator`. Also, things are cloneable now. The rationale is we can now
write a model checker using cider by cloning cider contexts at forks.
Configured various files and edited the parser to print out nodes in the
format we talked about.
Biggest function implemented was string_path in **program_counter.rs**
I spent some more time hacking on this instead of spending my time in a
more productive way. This does the following:
- switch the `dat` deserializing from custom matching stuff to a proper
`nom` parser that accounts for comments and leading `0x` tags.
- `dat` files can now parse values with leading zeroes truncated (though
we continue to generate `dat` files with the leading zeroes included)
- Output `dat` files will be generated by default as `MEMNAME.dat`
though this can be customized with `-e` flag. I.e. `-e out` will
generate `MEMNAME.out`
- Similarly, when reading in a `dat` directory, the tool will look for
`MEMNAME.dat` which can be retargeted via `-e` flag.
- The tool will also infer the `--to json` target when given a directory
as input
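
To make the accepted input shape concrete, here is a rough Python sketch of reading one `dat` file. The real parser is written with `nom` in Rust; this is only an illustration. The `//` comment syntax is an assumption (the text above only says comments are handled), and `parse_dat` is a hypothetical name.

```python
from typing import List


def parse_dat(text: str) -> List[int]:
    """Parse hex values, one per line, from the contents of a dat file."""
    values = []
    for raw in text.splitlines():
        # Assumption: `//` starts a comment that runs to the end of the line.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        if line[:2].lower() == "0x":
            line = line[2:]  # drop an optional leading 0x tag
        # int() accepts both zero-padded and truncated hex digits.
        values.append(int(line, 16))
    return values
```

Both `0000004B` and `4B` parse to the same value, which is the property the second bullet describes.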
Closes #2344 since I think it's a relative paths issue.
Actually closes #2344; see #2361 for the failed previous attempt.
Created a parser to process the path we've described such as: ".-0-1-b"
What it says on the tin (as discussed on this [slack
thread](https://cucapra.slack.com/archives/C06CV424G94/p1727223025447109))

Due to a new [release](https://releases.rs/docs/1.83.0/) for rust, CI
catches a few errors (unrelated to this PR). For now, we simply pin to
an older version:
- remove toolchain argument for `setup-rust-toolchain@v1` in `rust.yml`
- add `rust-toolchain.toml`
Adding functions to check if values are special IEEE values (NaN,
infinity, denormalized)
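
As a rough idea of what such checks look like, here is a Python sketch over raw bit patterns. It assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 mantissa bits); the commit does not state which widths it supports, and all function names here are hypothetical.

```python
import struct


def float32_fields(bits: int):
    """Split a 32-bit pattern into (sign, exponent, mantissa)."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def is_nan(bits: int) -> bool:
    _, exponent, mantissa = float32_fields(bits)
    return exponent == 0xFF and mantissa != 0


def is_infinity(bits: int) -> bool:
    _, exponent, mantissa = float32_fields(bits)
    return exponent == 0xFF and mantissa == 0


def is_denormalized(bits: int) -> bool:
    _, exponent, mantissa = float32_fields(bits)
    return exponent == 0 and mantissa != 0


def float_to_bits(x: float) -> int:
    """Helper for trying the checks: the binary32 pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

Zero has an all-zero exponent and mantissa, so it is not classified as denormalized here.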
ethanuppal and others added 19 commits December 4, 2024 01:27
Uses interior mutability with a `RwLock` and `lazy_static!`.
This somewhat chunky PR includes the logic needed to propagate clocks
through combinational logic and amends the read checks to defer until
the value is used in a non-combinational context. The continuous
assignments now run under a single thread which is never synchronized,
so attempts to write a register in continuous logic will always result
in a race when used outside continuous logic.

There's some additional refactoring and minor changes to the primitive
interface. Additionally, a single step can now advance past multiple
control nodes, which should reduce some of the repetition when using the
debugger.
This PR contains:
- The `profiler-instrumentation` pass is now updated to contain four
types of probes in order to capture call site information.
  - `group_probe`: A probe that is high when a group is active.
  - `primitive_probe`: A probe that is high when a primitive cell is
    active.
  - `se_probe`: A probe that is high when a group that structurally
    enables another group is active.
  - `cell_probe`: A probe that is high when a group that invokes a
    non-primitive cell is active.
- `profiler-process.py`: an updated VCD postprocessing script that
generates `.folded` and `.dot` files:
  - A "flattened" flame graph, where each parallel arm gets its own
    "cycle". So, if arm A and arm B were executing on a single cycle, the
    flame graph would account for a cycle in arm A and a cycle in arm B.
  - A scaled flame graph, where a cycle is divided between the parallel
    arms in execution. So, if arm A and arm B were executing on a single
    cycle, the flame graph would account for 0.5 cycles in arm A and 0.5
    cycles in arm B.
  - `aggregate.dot`: A tree summary of the execution of the program. Nodes
    (groups and cells) are labeled with the number of times the node was a
    leaf, and edges are labeled with the number of cycles that edge was
    activated.
  - `rank{i}.dot`: A tree representation of the `i`th most active stack
    picture. `rankings.csv` lists the specific cycles that each ranked tree
    was active for.
- Updated fud2 support (`fud2/scripts/profiler.rhai`)
- Utility scripts to help convert generated `.folded` and `.dot` files
into visualizations (`svg` and `png` files respectively)
- Updated tests/cleanup of old tests that relied on scripts that no
longer exist
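
The difference between the flattened and scaled flame graphs described above comes down to how much weight each active stack receives per cycle. A minimal Python sketch of that accounting, not taken from `profiler-process.py`: the trace representation (a list of active stacks per cycle) and the stack strings are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def fold_cycles(trace: Iterable[List[str]], scaled: bool) -> Dict[str, float]:
    """Sum per-stack cycle counts over a trace.

    `trace` holds, for each cycle, the stacks active in that cycle
    (for example "main;A"). Flattened mode credits every active stack
    with a full cycle; scaled mode splits the cycle evenly among them.
    """
    totals: Dict[str, float] = defaultdict(float)
    for active_stacks in trace:
        if not active_stacks:
            continue  # nothing was active in this cycle
        weight = 1.0 / len(active_stacks) if scaled else 1.0
        for stack in active_stacks:
            totals[stack] += weight
    return dict(totals)
```

For a trace where arms A and B share one cycle and A then runs alone for a second cycle, flattened mode gives A two cycles and B one, while scaled mode gives A 1.5 and B 0.5.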

As usual, I'd really appreciate any suggestions or thoughts!!
Small PR which adds support for combinational components and makes some
minor adjustments in a few other places.

The notable difference for the convergence algorithm is that we now sort
the program counter by containment, before iterating over it. This means
we will visit parents before children during convergence and is
necessary for race detection in the presence of invoke chains. While it
is unnecessary when not doing race detection, I've currently elected to
do it unconditionally for simplicity and consistency's sake, but if it
becomes necessary we can move it behind the race detection flag.
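
One way to picture the containment ordering, as a Python sketch under stated assumptions: each program counter entry is modeled as a tuple path from the root component down to the entry, so a parent's path is a strict prefix of its children's paths. How Cider actually represents containment is not described here, and `sort_by_containment` is a hypothetical name.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def sort_by_containment(entries: List[Path]) -> List[Path]:
    """Order entries so that every parent precedes all of its children."""
    # A parent's path is a strict prefix of each child's path and is
    # therefore shorter, so ordering by length visits parents first.
    # sorted() is stable, so entries of equal depth keep their order.
    return sorted(entries, key=len)
```

Iterating over the sorted result visits `("main",)` before `("main", "a")`, which is the parents-before-children property the convergence loop relies on for race detection.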
Forgot to remove `par-to-seq` from profiler passes after implementing
profiling for parallel programs!!!
Supersedes #2369.

`rust-toolchain.toml` now determines the version of rust used for every
aspect of the CI, rather than having it change on its own or be set
separately. As it turns out, the version of rust used for the clippy
lints was also pinned separately, so I also had to address all those
lints. As a result, I had to make minor changes to a bunch of code, the
vast majority of which were either docstring indents or places where
people manually implemented `ToString` instead of using `Display`. I
don't anticipate any of these changes messing with things but if I did
break something, do let me know.

Assuming I've done things correctly, it should be the case that the rust
version will not change on its own anymore, and we can update the
version used by all of CI by editing the toolchain file.
bad branch name (oops)
This branch adds the following functionality to the VSCode Cider
debugger:

1. Displaying accurate port values
2. Continue requests now behave according to the spec by informing
VSCode why the debugger was stopped
3. Cells (scopes) displayed now only show the cells for the current
component, rather than every cell in the environment
4. Small json tweaks (name, publisher, etc)

---------

Co-authored-by: Serena <[email protected]>
Small edits to `gen_test_data.sh` to generate `.data` files with
argument-specified lengths of commands, and a specified output
directory. If no arguments are provided, the size and the output
directory are set to the default (20000 and `<SCRIPT_DIR>/../tests`).
It's a bit hacky, but since I only need it for profiler debugging
purposes, I hope it's ok?
Finally, rename the cider library to be `cider` instead of `interp`.
…#2377)

A quick touch up of the `GlobalPositionTable` to avoid UB without
getting too fancy. Main changes are use of `boxcar::Vec` instead of the
standard library Vec, so that we can now have lock-free concurrent
appends and accesses. This fits our use-case since we never de-allocate
files & positions and frees us from all the issues associated with using
locks. Since we're not using locks, we also don't need any unsafe to
extend lifetimes, and since `boxcar::Vec` doesn't reallocate stored
elements, we don't need to box the stored stuff.

I also use `LazyLock` instead of `lazy_static!` which is easier to deal
with.

The end result is a global position table that can be accessed through
associated functions on the `GlobalPositionTable` type, rather than
methods on an instance. I also changed the `String`s stored by `File` to
`Box<str>`, because we don't ever mutate them.

---
In the process of this, I ran face first in CI issues. Turns out that
because some of the tests run in the docker container, they aren't
actually building using the version of rust in `rust-toolchain.toml` and
instead were using the version of rust from the container (1.76). This
broke things since `LazyLock` was only stabilized in 1.80. I had a real
fight with the CI because it turns out that using the standard actions
in the docker container is not as straightforward as one would like. I'll
skip a recount of the entire ordeal---peruse the commits to witness my
pain---but I resolved things and made these tests consistent with the
rest of CI by specifically pulling the toolchain file before running the
rust install action. The end result is that even the docker container
tests do respect the indicated rust version, and we should be able to
bump the version of rust used by _EVERYTHING_ (if this isn't true I may
scream) by updating the file.
The WIP [PR that tries to get a `yxi`-powered end-to-end Xilinx tools
workflow up and running](#2267)
ballooned into a beast and has gotten pretty stale. This is me trying to
break things down into parts and slowly get everything merged.

This changes the `*axi_generator.py` files to use underscores, which is
needed for proper importing afaict.
Also snuck in formatting changes to `dynamic_axi_generator.py`.

These are pretty minor changes so going to merge when tests pass.
The existing instructions did not work to add `fud2` to my path (locally
on macOS). FWICT `cargo install --path fud2` did. Maybe we also want to
get rid of the previous instructions? Not sure if they are still valid.
Another part of #2267 that makes sense to live on its own. Sorry for so
many small PRs! But whittling things down like this is helping me better
remember where everything stands.
AMD decided to bork the old links to a bunch of useful docs, this
updates that.

Just comment changes so going to auto merge
Tiny patch to replace `OnceLock` since `LazyLock` works in this case and
is slightly simpler
Turns out there was no reset input port to `comb_mem_d4` in
`comb.futil`! 🤣