-
Notifications
You must be signed in to change notification settings - Fork 792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[opentitantool] Introduce binary protocol for HyperDebug gpio monitoring #20672
Conversation
Can you confirm our understanding of the USB protocol used here (it's hard to figure it out form the device code since I am not familiar with the hyperdebug code): the device will send one or more USB bulk packets (full speed so up to 64 bytes each). The first packet contains the header and some data, and the remaining packets contain the rest of the data? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lowRISC software team reviewed this pull request together in a review session.
We have some detailed suggestions below, but also at the high level have some questions about how this is implemented.
Firstly this is missing comments. Please include comments detailing the protocol so that code readers do not have to get it from another source. If there is a canonical description of the protocol, please also include a link.
We are skeptical about the way the struct is organised. It's named header but it is clearly including more things than just the header, i.e. it contains a protocol identification byte and data bytes. The data bytes also do not reflect the actual maximum size, and if we understand correctly that's only for ensuring that USB bulk reads are not overshort.
Another thing is that this is a wire protocol, it needs to be conscious of the byte order. You can either use zerocopy's byte order primitives, or perhaps just parse the bytes using byteorder crate directly without creating a struct. Given the small number of fields it might even aid readability. Although, given that we don't really support BE machines, feel free to ignore this suggestion.
PS: In https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914, gpio.c L1168, it converts tx_buffer + 1
(with alignment 1) to struct gpio_monitoring_header_t
(with alignment 8) and is undefined behaviour.
.borrow() | ||
.write_bulk(cmsis_interface.out_endpoint, &pkt)?; | ||
|
||
let mut resp = RspGpioMonitoringHeader::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a sentinel value and reading into it, create a buffer, read into it, and then use zerocopy's FromBytes::read_from
to convert that buffer into a filled struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
match resp.status { | ||
0 => (), | ||
n => bail!(TransportError::CommunicationError(format!( | ||
"HyperDebug error: {}", | ||
n | ||
))), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a simple if is easier to read here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just happens that I added handling for one specific error code, which deserves a descriptive error message, since it can be triggered by the operator, and not only by bugs in opentitantool/HyperDebug.
n | ||
))), | ||
} | ||
let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest removing the constant and use memoffset::offset_of!(RspGpioMonitoringHeader, data)
when you need it.
Also, this code is not guarding against overflows when struct_size
is smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed both points.
let mut signal_bits = 0; | ||
while (pin_names.len() - 1) >> signal_bits != 0 { | ||
signal_bits += 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut signal_bits = 0; | |
while (pin_names.len() - 1) >> signal_bits != 0 { | |
signal_bits += 1; | |
} | |
let signal_bits = (pin_names.len() * 2 - 1).ilog2(); |
or
let mut signal_bits = 0; | |
while (pin_names.len() - 1) >> signal_bits != 0 { | |
signal_bits += 1; | |
} | |
let signal_bits = 32 - (pin_names.len() as u32 - 1).leading_zeros(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose the latter of the two suggestions, since I think the name of the method leading_zeros()
explains everything to any casual reader, who like me may not be familiar with either ilog2()
or leading_zeros()
.
let signal_mask = (1u64 << signal_bits) - 1; | ||
|
||
let mut cur_time: u64 = resp.start_timestamp; | ||
let mut cur_level = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut cur_level = Vec::new(); | |
let mut cur_level = Vec::with_capacity(pin_names.len()); |
Although I think we could just be using bitmasks in the loop itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I chose Vec<bool>
because I thought that the code in the loop would become cleaner, but it seems that it is causing more trouble than it is saving. I have changed the code to use a bitmask in the format of start_levels
.
} | ||
let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1); | ||
|
||
let mut databytes: Vec<u8> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut databytes: Vec<u8> = Vec::new(); | |
let mut data_bytes = Vec::with_capacity(resp.transcript_size as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reworked the logic to do just a single resize()
.
let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1); | ||
|
||
let mut databytes: Vec<u8> = Vec::new(); | ||
databytes.extend_from_slice(&resp.data[..bytecount - GPIO_MONITORIN_HEADER_SIZE]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just skip bytes here and not using it as starting index or loop condition later.
|
||
while databytes.len() < skip_bytes + resp.transcript_size as usize { | ||
let original_length = databytes.len(); | ||
databytes.resize(original_length + 64, 0u8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of growing and shrinking, perhaps resize to full expected size ahead of time and just keep filling it.
// since the previous event (on any signal, not necessarily on that same one). | ||
cur_time += value >> signal_bits; | ||
events.push(MonitoringEvent { | ||
signal_index: (value & signal_mask) as u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(value & signal_mask)
is used many times. Perhaps save it to a variable first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
events.push(MonitoringEvent { | ||
signal_index: (value & signal_mask) as u8, | ||
edge: if cur_level[(value & signal_mask) as usize] { | ||
cur_level[(value & signal_mask) as usize] = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a cur_level[index] ^= true;
after the push, or, cur_level ^= 1 << index;
if you opted to use bitmask instead of creating an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thank you for the very detailed review.
Yes, on the HyperDebug firmware side, I use a USB-queueing library also used with e.g. the UART USB interfaces. This library does not deal with "transfers" terminated with a short (possibly zero length) packet, but thinks of the data a a single stream, and may in theory send any number of bytes in USB packets, though as I enqueue bytes very quickly in this case, unlike a physical UART, they will practically always arrive in 64-byte packets except the last one. My code here is still prepared for short packets "in the middle" of the response, though. In the future, I may choose to change the HyperDebug firmware to provide the usual transmission boundaries, if we find that desirable. I chose the above mostly because the USB streaming code library was there already, and is well tested, allowing me to spend time on the higher level functions.
I do not have a document describing the protocol, only comments in the HyperDebug source code. I have added more comments in this file, and a reference to the HyperDebug file.
Yes, I could not do
Yes, I used to be much more conscious of wire protocols, preferring big endian on the wires. But the EC codebase makes no such attempt, at it seems that BE machines are not common anymore, so I think I will ignore this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating. This looks much better now.
|
||
/// Read 7 bits from each byte, least significant byte first. High bit of one indicates more | ||
/// bytes belong to the same value. | ||
fn decode_leb128(idx: &mut usize, databytes: &[u8]) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't guard against data bytes ending with a byte with MSB set. We would like an error with meaning message instead of panic if that error happens.
You could switch this to take a &mut impl std::io::Read
and then iterate over using std::io::Cursor::new(&databytes[header_bytes..])
. It'll make the use site cleaner but it might make this function a bit more uglier (no sure if it worths the trade-off).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified this routine to return Result<u64>
, and for good measure made it not mutate idx
in the case of errors.
vec![0u8; 1 + size_of::<RspGpioMonitoringHeader>() + USB_MAX_SIZE]; | ||
let mut bytecount = 0; | ||
|
||
while bytecount < 1 + size_of::<RspGpioMonitoringHeader>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header always fits in one packet. Would it make sense to change the code to first do an initial read_bulk
to get the header + part of the data. With this, we now know the size of the full data so we can allocate a buffer for it and read it with several read_bulk
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my interpretation of @jesultra's comment it sounds like they're using USB as a stream and doesn't guarantee the header'll be sent in a single packet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I missed @jesultra 's comment on the USB protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart a final nit
let mut shift = 0; | ||
while i < databytes.len() { | ||
let byte = databytes[i]; | ||
value |= ((byte & 0x7F) as u64) << shift; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may overflow there are 10 bytes without MSB set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have inserted a check to return an error in case of a tenth byte in the encoding, meaning that the routine accepts at most 63 bits of data for an u64. I do not think it is worth supporting the most significant 64th bit being set through the lowest bit in the tenth byte of encoding, as it would make error handling more complicated.
The Ti50 team occasionally sees "communication error" from opentitansession, most often during `transport init`, but that could be just because there are dozens of GPIOs being manipulated in a short time. This PR aims to help a litle bit in diagnosing, by making each error message unique. Change-Id: I28e804429d153c1cd527ea467729364bc7c1ec15 Signed-off-by: Jes B. Klinke <[email protected]>
*idx = i; | ||
return Ok(value); | ||
} | ||
if shift + 7 > 64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shift is already incremented by 7 above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is to error when shift == 63
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but even having a shift value of e.g. 60 for next round would allow an overflow, as a value of 0x7F shifted 60 bits would need 67 bits of storage. That is why this line ensures that you can shift any 7-bit value by shift
bits, and it still does not exceed 64 bits.
.read_bulk(cmsis_interface.in_endpoint, &mut resp)?; | ||
let resp = &resp[..bytecount]; | ||
ensure!( | ||
bytecount >= 4 && resp[0] == Self::CMSIS_DAP_CUSTOM_COMMAND_GOOGLE_INFO && resp[1] >= 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe document what the resp[1] >= 2
is? I assume that this is the number of bytes that follow the header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
When verifying the ITE bootloader waveform, 30000 events will be captured, which takes a long time to transmit inefficiently via the text console. This CL leverages the binary CMSIS-DAP protocol to allow an alternative way of retrieving a transcript of captured GPIO events. Corresponding functionality in opentitantool proposed here: lowRISC/opentitan#20672 BUG=b:266832220 TEST=make tast (with local changes to opentitantool, to use new protocol) Change-Id: Id8c472ef29100f1a89743575b1b7ec00aaef449b Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914 Code-Coverage: Zoss <[email protected]> Commit-Queue: Jes Klinke <[email protected]> Reviewed-by: Jett Rink <[email protected]> Tested-by: Jes Klinke <[email protected]>
Current code assumes that any flavor of a HyperDebug debugger which implements the CMSIS-DAP protocol, will also implement the Google extensions for I2C host and device control. Allthough this is the case now, this CL introduces code to properly query for Google extensions. Change-Id: I48a88d6917935d45d716fff6c743f9b2f0a52202 Signed-off-by: Jes B. Klinke <[email protected]>
HyperDebug supports logic analyzer functionality, in which it will record events on a given set of gpio pins, and `opentitantool` can later be used to retrieve a transcript of every level change with microsecond timestamp. This has been used by the GSC team to verify the reaction time of firmware under test. Such testing involve typically a few handfuls of events, which can easily be transmitted via the textual protocol. However, We now plan on using the functionality for cases with 30000 events to be retrieved, which would take many tens of seconds to inefficently transmit via the console (which runs slow enough that the physical UART can keep up). To improve performance, this CL introduces another Google-specific extension to the binary CMSIS-DAP protocol, for GPIO operations, and adds code to repliate the `gpio monitoring read` functionality. (Starting and stopping the monitoring can still only be done through the textual protocol, those do not carry a large amount of data. Though there may be a 80-character limit on a single command, which could impact the ability to monitor 5 or more signals at once, so in the future we may want to allow starting monitoring also through the binary protocol.) Change-Id: I3c075f2960b4d4a38bff8cd7d8e270a3a1211a9b Signed-off-by: Jes B. Klinke <[email protected]>
Updated to new HyperDebug firmware with support for binary GPIO monitoring protocol through "vendor extension" to CMSIS-DAP. Change-Id: I29fb35c39e835e59763e0d0a935b087dd14ba2ae Signed-off-by: Jes B. Klinke <[email protected]>
Successfully created backport PR for |
HyperDebug supports logic analyzer functionality, in which it will record events on a given set of gpio pins.
opentitantool
can then later be used to retrieve a transcript of every level change with microsecond timestamp.This has been used by the GSC team to verify the reaction time of firmware under test. Such testing involve typically a few handfuls of events, which can easily be transmitted via the textual protocol. However, we now plan on using the functionality for cases with 30000 events to be retrieved, which would take many tens of seconds to inefficently transmit via the console (which runs slow enough that the physical UART can keep up).
To improve performance, this CL introduces another Google-specific extension to the binary CMSIS-DAP protocol, for GPIO operations, and adds code to replicate the
gpio monitoring read
functionality. (Starting and stopping the monitoring can still only be done through the textual protocol, those do not carry a large amount of data. Though there may be a 80-character limit on a single command, which could impact the ability to monitor 5 or more signals at once, so in the future we may want to allow starting monitoring also through the binary protocol.)Corresponding HyperDebug functionality implemented here:
https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914