📝 Add Assembly information and instructions to PrimeCPP solution 3. #963

41 changes: 32 additions & 9 deletions PrimeCPP/solution_3/README.md
@@ -16,9 +16,32 @@ Since the standard library does not provide required functions, sqrt and bitfiel
*Note*: this solution is limited to numbers up to around 50,000,000 (apparently because of the stack size limit on macOS).
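
The limit in the note can be sanity-checked with a little arithmetic. The sketch below assumes the sieve is stack-allocated and stores one bit per candidate number (the `Bits` column in the results reports 1); both helper functions are hypothetical and ignore any other stack usage by the program.

```shell
# Bytes needed for a sieve that stores one bit per candidate number.
# Assumption: one bit for every number up to the limit, rounded up to whole bytes.
sieve_bytes() {
    echo $(( ($1 + 7) / 8 ))
}

# Compare the sieve size with the current soft stack limit.
# In bash, `ulimit -s` prints the limit in KiB, or the word "unlimited".
fits_in_stack() {
    local need limit_kib
    need=$(sieve_bytes "$1")
    limit_kib=$(ulimit -s)
    if [ "$limit_kib" = "unlimited" ]; then
        echo "fits"
    elif [ "$need" -lt $(( limit_kib * 1024 )) ]; then
        echo "fits"
    else
        echo "too large"
    fi
}
```

On a system where `ulimit -s` reports 8192 (8 MiB), `fits_in_stack 50000000` prints `fits` (about 6.25 MB needed), while `fits_in_stack 100000000` prints `too large` (12.5 MB needed).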

## Run instructions
### From CPP Binary

`./run.sh` requires a fairly recent version of Clang (one that supports C++20).

### From Derived Assembly

The generated assembly code of this solution can be inspected by running the following command:

```shell
clang++ $CXX_ARGS -S -masm=intel PrimeCPP_CONSTEXPR.cpp -o PrimeAssembly.s
```

This code might be further optimised by a seasoned assembly developer. To generate the binary, run:

```shell
ASM_ARGS="-pthread -O3 -m64 -mtune=native"

clang++ $ASM_ARGS PrimeAssembly.s -o primes
```

To run the solution, execute the binary:

```shell
./primes
```
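
The two build steps above can be chained in a small helper. This is only a sketch: `run_step`, `build_from_assembly`, and the `DRY_RUN` switch are hypothetical names, and the fallback value for `CXX_ARGS` is a placeholder, since the real flags are defined outside this README section.

```shell
# Assumption: CXX_ARGS is normally set elsewhere; this fallback is a
# placeholder and not necessarily the project's real flag set.
CXX_ARGS="${CXX_ARGS:--std=c++20 -O3}"
ASM_ARGS="-pthread -O3 -m64 -mtune=native"

# Print a command, then execute it unless DRY_RUN=1 is set.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

build_from_assembly() {
    # Step 1: emit assembly from the C++ source.
    # The *_ARGS variables are left unquoted on purpose so they split into flags.
    run_step clang++ $CXX_ARGS -S -masm=intel PrimeCPP_CONSTEXPR.cpp -o PrimeAssembly.s || return 1
    # Step 2: assemble and link the (possibly hand-edited) assembly file.
    run_step clang++ $ASM_ARGS PrimeAssembly.s -o primes
}
```

Setting `DRY_RUN=1` before calling `build_from_assembly` prints both commands without running them, which is a convenient way to review the flags before editing `PrimeAssembly.s` by hand.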

## Output

All results were measured on an Apple M1 (MacBook Air).
@@ -46,34 +69,34 @@ Compared to other C++ implementations:

Computing primes to 10000000 on 8 threads for 5 seconds.
Passes: 2264, Threads: 8, Time: 5.00982, Average: 0.00221282, Limit: 10000000, Counts: 664579/664579, Valid : Pass

davepl_par;2264;5.00982;8;algorithm=base,faithful=yes,bits=1

### Docker performance

Single-threaded
┌───────┬────────────────┬──────────┬────────┬────────┬──────────┬─────────┬───────────┬──────────┬──────┬───────────────┐
│ Index │ Implementation │ Solution │ Label  │ Passes │ Duration │ Threads │ Algorithm │ Faithful │ Bits │ Passes/Second │
├───────┼────────────────┼──────────┼────────┼────────┼──────────┼─────────┼───────────┼──────────┼──────┼───────────────┤
│ 1     │ cpp            │ 1        │ davepl │ 3982   │ 5.00001  │ 1       │ base      │ yes      │ 1    │ 796.39857     │
└───────┴────────────────┴──────────┴────────┴────────┴──────────┴─────────┴───────────┴──────────┴──────┴───────────────┘

Multi-threaded
┌───────┬────────────────┬──────────┬────────────┬────────┬──────────┬─────────┬───────────┬──────────┬──────┬───────────────┐
│ Index │ Implementation │ Solution │ Label      │ Passes │ Duration │ Threads │ Algorithm │ Faithful │ Bits │ Passes/Second │
├───────┼────────────────┼──────────┼────────────┼────────┼──────────┼─────────┼───────────┼──────────┼──────┼───────────────┤
│ 1     │ cpp            │ 2        │ davepl_par │ 13192  │ 5.00080  │ 4       │ base      │ yes      │ 1    │ 659.49448     │
└───────┴────────────────┴──────────┴────────────┴────────┴──────────┴─────────┴───────────┴──────────┴──────┴───────────────┘


Single-threaded
┌───────┬────────────────┬──────────┬─────────────────────┬───────────┬──────────┬─────────┬───────────┬──────────┬──────┬────────────────┐
│ Index │ Implementation │ Solution │ Label               │ Passes    │ Duration │ Threads │ Algorithm │ Faithful │ Bits │ Passes/Second  │
├───────┼────────────────┼──────────┼─────────────────────┼───────────┼──────────┼─────────┼───────────┼──────────┼──────┼────────────────┤
│ 1     │ PrimeCPP       │ 3        │ flo80_pol_constexpr │ 234051587 │ 5.00000  │ 1       │ base      │ no       │ 1    │ 46810317.40000 │
└───────┴────────────────┴──────────┴─────────────────────┴───────────┴──────────┴─────────┴───────────┴──────────┴──────┴────────────────┘
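
The `Passes/Second` values in the tables above are consistent with dividing the pass count by the duration and then by the number of threads. The helper below is a hypothetical sketch of that arithmetic, not part of the benchmark tooling; because `Duration` is shown rounded, recomputed values may differ in the last decimals.

```shell
# passes_per_second PASSES DURATION THREADS
# Assumed formula: passes / duration / threads, printed with five decimals.
passes_per_second() {
    awk -v p="$1" -v d="$2" -v t="$3" 'BEGIN { printf "%.5f\n", p / d / t }'
}

passes_per_second 13192 5.00080 4       # davepl_par row: 659.49448
passes_per_second 234051587 5.00000 1   # flo80_pol_constexpr row: 46810317.40000
```

Normalising by thread count is what makes the multi-threaded rows comparable with the single-threaded ones.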

Multi-threaded
┌───────┬────────────────┬──────────┬─────────────────────┬───────────┬──────────┬─────────┬───────────┬──────────┬──────┬───────────────┐
│ Index │ Implementation │ Solution │ Label               │ Passes    │ Duration │ Threads │ Algorithm │ Faithful │ Bits │ Passes/Second │
├───────┼────────────────┼──────────┼─────────────────────┼───────────┼──────────┼─────────┼───────────┼──────────┼──────┼───────────────┤