Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable dynamic-sized RegisterBits #515

Merged
merged 13 commits into from
Sep 9, 2024
Merged

Conversation

furuame
Copy link
Member

@furuame furuame commented Aug 29, 2024

Hello Knute,

In some vector processors, we may need registers longer than 512-bit. Was std::array used for considering C++ performance? It might work if I just upsize it to 1024-bit, but I think there could be larger ones in the future. Use std::vector instead could be once and for all.

Let me know your thoughts! I tried to make 64 as template parameter by default -- But Registerbits seems to be used heavily as return value type, and class template argument deduction can’t work for that case.

@furuame furuame added the component: sparta Issue is related to sparta framework label Aug 29, 2024
@furuame furuame requested a review from klingaard August 29, 2024 09:43
@furuame furuame self-assigned this Aug 29, 2024
@klingaard
Copy link
Member

Hey Zen! Yes, std::array was used as the base for the class mostly because the backend data was inlined with the class (better locality). But you can use a std::vector if you want as long as performance says sharp.

BTW, there' s a test: https://github.com/sparcians/map/blame/master/sparta/test/Register/reg_bit_test.cpp that you can build by hand to test it. I forgot to hook it up to regression.

@furuame
Copy link
Member Author

furuame commented Aug 30, 2024

BTW, there' s a test: https://github.com/sparcians/map/blame/master/sparta/test/Register/reg_bit_test.cpp that you can build by hand to test it. I forgot to hook it up to regression.

Will do!

Yes, std::array was used as the base for the class mostly because the backend data was inlined with the class (better locality). But you can use a std::vector if you want as long as performance says sharp.

std::vector version was down ~2% simulation performance on generating a 50M trace, that was a shame. I came up with another way to keep std::array, but dynamically allocate memory when the size > 64B ... testing it ...

If my trick works, that might be a good idea to shrink the std::array data to 8B as generally registers are 64-bit.

@knute-mips
Copy link

Look forward to your trick. :)

@furuame
Copy link
Member Author

furuame commented Sep 4, 2024

Done! The new version was measured 6.8% faster than the master build on generating a 10M+50M instruction trace 🚀

Register tests are also passed, which are also a good coverage on testing >64-bit RegisterBits.

Copy link
Member

@klingaard klingaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a memory leak here.

const uint8_t * remote_data_ = nullptr; //!< Remove data; points to local_data_ if no remote
const uint64_t num_bytes_ = 0; //!< Number of bytse
std::array<uint8_t, 8> local_storage_; //!< Local storage
std::unique_ptr<uint8_t> local_storage_alt_; //!< Alternative local storage when register size > 64B
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant std::unique_ptr<uint8_t[]> here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry -- My “C” brain thought I could use that as a uint8_t pointer!

@klingaard
Copy link
Member

My “C” brain thought I could use that as a uint8_t pointer!

I thought I trained that out of you! 🤣

@klingaard
Copy link
Member

I'll take a look at the regressions to see why it's failing. It's another irritant, I think.

@furuame
Copy link
Member Author

furuame commented Sep 5, 2024

I'll take a look at the regressions to see why it's failing. It's another irritant, I think.

Huh it’s like on me. Weird -- I passed it on my Apple Silicon box.

 58/101 Test  #58: Register_test ....................Subprocess aborted***Exception:   0.10 sec

I remember I saw a different CI failure previously though. I’ll also take a look at this one tomorrow.

@klingaard
Copy link
Member

I ran valgrind --vgdb=full --vgdb-error=0 ./Register_test and it found a few issues with RegisterBits. Might start there.

@furuame
Copy link
Member Author

furuame commented Sep 5, 2024

Thank you! Will do!

@furuame
Copy link
Member Author

furuame commented Sep 6, 2024

I ran valgrind --vgdb=full --vgdb-error=0 ./Register_test and it found a few issues with RegisterBits. Might start there.

Done! That one was tricky ... Valgrind rocks!!

Regressions still fail but that seems a Cython compilation issue on Argos ... @klingaard would you help take a look at it?

Comment on lines 1337 to 1339
if(!isPowerOf2(def->bytes)) {
return RegisterBits(nullptr);
}
Copy link
Member Author

@furuame furuame Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to just throw a SpartaException or keep this as it is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write mask is initialized by Class initializer list, but the legal register size is later checked by the constructor ...

However an odd size would cause illegal access here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should throw.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, that’s much better

@klingaard
Copy link
Member

Regressions still fail but that seems a Cython compilation issue on Argos

Sigh... will take a look

@klingaard
Copy link
Member

I'm going to remove helios/pipeview from the regression build. It's broken and we're re-writing it, so I don't care as much

{
if(num_bytes_ <= local_storage_.size()) {
local_storage_ = orig.local_storage_;
local_data_ = orig.local_data_ == nullptr ? nullptr : local_storage_.data();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, when can num_bytes_ be <= local storage size and the original still have its local_data_ pointer set to nullptr?

Copy link
Member Author

@furuame furuame Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this happen?

Actually I don’t know 😂 I just follow the same logics but take care of the original storage and alternative storage here.

Curious, when can num_bytes_ be <= local storage size and the original still have its local_data_ pointer set to nullptr?

There can be a scenario that RegisterBits is initialized by a raw pointer, which could be nullptr. That’s the only case I can think of.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if it has > 0 bytes, will it ever be nullptr?

Copy link
Member Author

@furuame furuame Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, local_data_ could be nullptr if RegisterBits is initialized by a const uint8 pointer. In that case, only remote_data_ pointer is initialized. local_data_ is nullptr until the value is changed.

@furuame furuame merged commit ce6b9dc into master Sep 9, 2024
8 checks passed
@furuame furuame deleted the zhenw/dynamic-register-bits branch September 9, 2024 04:59
github-actions bot pushed a commit that referenced this pull request Sep 9, 2024
* Enable dynamic-sized RegisterBits

* Revert "Enable dynamic-sized RegisterBits"

This reverts commit a954891.

* Keep 64B array but use heap when registers > 64B

* Enable register bit tests

* Downsize preallocated RegisterBits data space

* Fixed smart pointer type

* Fixed illegal access caused by illegal RegDef

* Fixed uninitialized values

* Remove large register from bad test

* Moved CI forward; see what happens

* Do not build helios anymore

* Throw the exception in the initialze function

---------

Co-authored-by: Knute Lingaard <[email protected]> ce6b9dc
@furuame
Copy link
Member Author

furuame commented Sep 9, 2024

I merged this into the map_v2 branch.

@knute-mips
Copy link

I haven't documented the release process fully, but in a nutshell (I will documented this officially 😄 ):

  • Merged changes into master
  • Merge changes from master into map_v2 branch
  • Tag map_v2 branch with the next tag
  • Make a release

@furuame
Copy link
Member Author

furuame commented Sep 9, 2024

I haven't documented the release process fully, but in a nutshell (I will documented this officially 😄 ):

  • Merged changes into master
  • Merge changes from master into map_v2 branch
  • Tag map_v2 branch with the next tag
  • Make a release

I am glad I do it right so far!

Would we need a process to release a tag? I haven’t done it and filed a discussion here: #519

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: sparta Issue is related to sparta framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants