Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opentitantool] Introduce binary protocol for HyperDebug gpio monitoring #20672

Merged
merged 4 commits into from
Jan 26, 2024

Conversation

jesultra
Copy link
Contributor

@jesultra jesultra commented Dec 18, 2023

HyperDebug supports logic analyzer functionality, in which it will record events on a given set of gpio pins.
opentitantool can then later be used to retrieve a transcript of every level change with microsecond timestamp.

This has been used by the GSC team to verify the reaction time of firmware under test. Such testing involve typically a few handfuls of events, which can easily be transmitted via the textual protocol. However, we now plan on using the functionality for cases with 30000 events to be retrieved, which would take many tens of seconds to inefficently transmit via the console (which runs slow enough that the physical UART can keep up).

To improve performance, this CL introduces another Google-specific extension to the binary CMSIS-DAP protocol, for GPIO operations, and adds code to replicate the gpio monitoring read functionality. (Starting and stopping the monitoring can still only be done through the textual protocol, those do not carry a large amount of data. Though there may be a 80-character limit on a single command, which could impact the ability to monitor 5 or more signals at once, so in the future we may want to allow starting monitoring also through the binary protocol.)

Corresponding HyperDebug functionality implemented here:
https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914

@jesultra jesultra changed the title [opentitantool] Introduce binary protocol for HyperDebug gpio [opentitantool] Introduce binary protocol for HyperDebug gpio monitoring Dec 18, 2023
@jesultra jesultra marked this pull request as ready for review December 18, 2023 21:01
@jesultra jesultra requested a review from a team as a code owner December 18, 2023 21:01
@pamaury
Copy link
Contributor

pamaury commented Jan 9, 2024

Can you confirm our understanding of the USB protocol used here (it's hard to figure it out form the device code since I am not familiar with the hyperdebug code): the device will send one or more USB bulk packets (full speed so up to 64 bytes each). The first packet contains the header and some data, and the remaining packets contain the rest of the data?
Also, will the device send a ZLP if the total transfer length is a multiple of 64 bytes, or not? (I am assuming not since your code treats a ZLP as an error).

Copy link
Contributor

@nbdd0121 nbdd0121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lowRISC software team reviewed this pull request together in a review session.

We have some detailed suggestions below, but also at the high level have some questions about how this is implemented.

Firstly this is missing comments. Please include comments detailing the protocol so that code readers do not have to get it from another source. If there is a canonical description of the protocol, please also include a link.

We are skeptical about the way the struct is organised. It's named header but it is clearly including more things than just the header, i.e. it contains a protocol identification byte and data bytes. The data bytes also do not reflect the actual maximum size, and if we understand correctly that's only for ensuring that USB bulk reads are not overshort.

Another thing is that this is a wire protocol, it needs to be conscious of the byte order. You can either use zerocopy's byte order primitives, or perhaps just parse the bytes using byteorder crate directly without creating a struct. Given the small number of fields it might even aid readability. Although, given that we don't really support BE machines, feel free to ignore this suggestion.

PS: In https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914, gpio.c L1168, it converts tx_buffer + 1 (with alignment 1) to struct gpio_monitoring_header_t (with alignment 8) and is undefined behaviour.

.borrow()
.write_bulk(cmsis_interface.out_endpoint, &pkt)?;

let mut resp = RspGpioMonitoringHeader::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a sentinel value and reading into it, create a buffer, read into it, and then use zerocopy's FromBytes::read_from to convert that buffer into a filled struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 303 to 344
match resp.status {
0 => (),
n => bail!(TransportError::CommunicationError(format!(
"HyperDebug error: {}",
n
))),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simple if is easier to read here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just happens that I added handling for one specific error code, which deserves a descriptive error message, since it can be triggered by the operator, and not only by bugs in opentitantool/HyperDebug.

n
))),
}
let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest removing the constant and use memoffset::offset_of!(RspGpioMonitoringHeader, data) when you need it.

Also, this code is not guarding against overflows when struct_size is smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed both points.

Comment on lines 328 to 331
let mut signal_bits = 0;
while (pin_names.len() - 1) >> signal_bits != 0 {
signal_bits += 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut signal_bits = 0;
while (pin_names.len() - 1) >> signal_bits != 0 {
signal_bits += 1;
}
let signal_bits = (pin_names.len() * 2 - 1).ilog2();

or

Suggested change
let mut signal_bits = 0;
while (pin_names.len() - 1) >> signal_bits != 0 {
signal_bits += 1;
}
let signal_bits = 32 - (pin_names.len() as u32 - 1).leading_zeros();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the latter of the two suggestions, since I think the name of the method leading_zeros() explains everything to any casual reader, who like me may not be familiar with either ilog2() or leading_zeros().

let signal_mask = (1u64 << signal_bits) - 1;

let mut cur_time: u64 = resp.start_timestamp;
let mut cur_level = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut cur_level = Vec::new();
let mut cur_level = Vec::with_capacity(pin_names.len());

Although I think we could just be using bitmasks in the loop itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I chose Vec<bool> because I thought that the code in the loop would become cleaner, but it seems that it is causing more trouble than it is saving. I have changed the code to use a bitmask in the format of start_levels.

}
let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1);

let mut databytes: Vec<u8> = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut databytes: Vec<u8> = Vec::new();
let mut data_bytes = Vec::with_capacity(resp.transcript_size as usize);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reworked the logic to do just a single resize().

let skip_bytes = resp.struct_size as usize - (GPIO_MONITORIN_HEADER_SIZE - 1);

let mut databytes: Vec<u8> = Vec::new();
databytes.extend_from_slice(&resp.data[..bytecount - GPIO_MONITORIN_HEADER_SIZE]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just skip bytes here and not using it as starting index or loop condition later.


while databytes.len() < skip_bytes + resp.transcript_size as usize {
let original_length = databytes.len();
databytes.resize(original_length + 64, 0u8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of growing and shrinking, perhaps resize to full expected size ahead of time and just keep filling it.

// since the previous event (on any signal, not necessarily on that same one).
cur_time += value >> signal_bits;
events.push(MonitoringEvent {
signal_index: (value & signal_mask) as u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(value & signal_mask) is used many times. Perhaps save it to a variable first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

events.push(MonitoringEvent {
signal_index: (value & signal_mask) as u8,
edge: if cur_level[(value & signal_mask) as usize] {
cur_level[(value & signal_mask) as usize] = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a cur_level[index] ^= true; after the push, or, cur_level ^= 1 << index; if you opted to use bitmask instead of creating an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jesultra
Copy link
Contributor Author

Thank you for the very detailed review.

Can you confirm our understanding of the USB protocol used here (it's hard to figure it out form the device code since I am not familiar with the hyperdebug code): the device will send one or more USB bulk packets (full speed so up to 64 bytes each). The first packet contains the header and some data, and the remaining packets contain the rest of the data? Also, will the device send a ZLP if the total transfer length is a multiple of 64 bytes, or not? (I am assuming not since your code treats a ZLP as an error).

Yes, on the HyperDebug firmware side, I use a USB-queueing library also used with e.g. the UART USB interfaces. This library does not deal with "transfers" terminated with a short (possibly zero length) packet, but thinks of the data a a single stream, and may in theory send any number of bytes in USB packets, though as I enqueue bytes very quickly in this case, unlike a physical UART, they will practically always arrive in 64-byte packets except the last one. My code here is still prepared for short packets "in the middle" of the response, though.

In the future, I may choose to change the HyperDebug firmware to provide the usual transmission boundaries, if we find that desirable. I chose the above mostly because the USB streaming code library was there already, and is well tested, allowing me to spend time on the higher level functions.

Firstly this is missing comments. Please include comments detailing the protocol so that code readers do not have to get it from another source. If there is a canonical description of the protocol, please also include a link.

I do not have a document describing the protocol, only comments in the HyperDebug source code. I have added more comments in this file, and a reference to the HyperDebug file.

We are skeptical about the way the struct is organised. It's named header but it is clearly including more things than just the header, i.e. it contains a protocol identification byte and data bytes. The data bytes also do not reflect the actual maximum size, and if we understand correctly that's only for ensuring that USB bulk reads are not overshort.

Yes, I could not do read_bulk(size_of::<...>) without losing the remaning bytes from the first 64-byte packet being sent, which is why I added the extra data field to the header struct. Using FromBytes as you suggested is much cleaner.

Another thing is that this is a wire protocol, it needs to be conscious of the byte order. You can either use zerocopy's byte order primitives, or perhaps just parse the bytes using byteorder crate directly without creating a struct. Given the small number of fields it might even aid readability. Although, given that we don't really support BE machines, feel free to ignore this suggestion.

Yes, I used to be much more conscious of wire protocols, preferring big endian on the wires. But the EC codebase makes no such attempt, at it seems that BE machines are not common anymore, so I think I will ignore this issue.

Copy link
Contributor

@nbdd0121 nbdd0121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating. This looks much better now.


/// Read 7 bits from each byte, least significant byte first. High bit of one indicates more
/// bytes belong to the same value.
fn decode_leb128(idx: &mut usize, databytes: &[u8]) -> u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't guard against data bytes ending with a byte with MSB set. We would like an error with meaning message instead of panic if that error happens.

You could switch this to take a &mut impl std::io::Read and then iterate over using std::io::Cursor::new(&databytes[header_bytes..]). It'll make the use site cleaner but it might make this function a bit more uglier (no sure if it worths the trade-off).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified this routine to return Result<u64>, and for good measure made it not mutate idx in the case of errors.

vec![0u8; 1 + size_of::<RspGpioMonitoringHeader>() + USB_MAX_SIZE];
let mut bytecount = 0;

while bytecount < 1 + size_of::<RspGpioMonitoringHeader>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header always fits in one packet. Would it make sense to change the code to first do an initial read_bulk to get the header + part of the data. With this, we now know the size of the full data so we can allocate a buffer for it and read it with several read_bulk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my interpretation of @jesultra's comment it sounds like they're using USB as a stream and doesn't guarantee the header'll be sent in a single packet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I missed @jesultra 's comment on the USB protocol.

Copy link
Contributor

@nbdd0121 nbdd0121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart a final nit

let mut shift = 0;
while i < databytes.len() {
let byte = databytes[i];
value |= ((byte & 0x7F) as u64) << shift;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may overflow there are 10 bytes without MSB set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have inserted a check to return an error in case of a tenth byte in the encoding, meaning that the routine accepts at most 63 bits of data for an u64. I do not think it is worth supporting the most significant 64th bit being set through the lowest bit in the tenth byte of encoding, as it would make error handling more complicated.

The Ti50 team occasionally sees "communication error" from
opentitansession, most often during `transport init`, but that could be
just because there are dozens of GPIOs being manipulated in a short
time.

This PR aims to help a litle bit in diagnosing, by making each error
message unique.

Change-Id: I28e804429d153c1cd527ea467729364bc7c1ec15
Signed-off-by: Jes B. Klinke <[email protected]>
*idx = i;
return Ok(value);
}
if shift + 7 > 64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shift is already incremented by 7 above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is to error when shift == 63

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but even having a shift value of e.g. 60 for next round would allow an overflow, as a value of 0x7F shifted 60 bits would need 67 bits of storage. That is why this line ensures that you can shift any 7-bit value by shift bits, and it still does not exceed 64 bits.

@jesultra jesultra requested a review from pamaury January 18, 2024 15:01
.read_bulk(cmsis_interface.in_endpoint, &mut resp)?;
let resp = &resp[..bytecount];
ensure!(
bytecount >= 4 && resp[0] == Self::CMSIS_DAP_CUSTOM_COMMAND_GOOGLE_INFO && resp[1] >= 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe document what the resp[1] >= 2 is? I assume that this is the number of bytes that follow the header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

coreboot-org-bot pushed a commit to coreboot/chrome-ec that referenced this pull request Jan 26, 2024
When verifying the ITE bootloader waveform, 30000 events will be
captured, which takes a long time to transmit inefficiently via the text
console.

This CL leverages the binary CMSIS-DAP protocol to allow an alternative
way of retrieving a transcript of captured GPIO events.

Corresponding functionality in opentitantool proposed here:
lowRISC/opentitan#20672

BUG=b:266832220
TEST=make tast  (with local changes to opentitantool, to use new protocol)

Change-Id: Id8c472ef29100f1a89743575b1b7ec00aaef449b
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/5128914
Code-Coverage: Zoss <[email protected]>
Commit-Queue: Jes Klinke <[email protected]>
Reviewed-by: Jett Rink <[email protected]>
Tested-by: Jes Klinke <[email protected]>
Current code assumes that any flavor of a HyperDebug debugger which
implements the CMSIS-DAP protocol, will also implement the Google
extensions for I2C host and device control.  Allthough this is the case
now, this CL introduces code to properly query for Google extensions.

Change-Id: I48a88d6917935d45d716fff6c743f9b2f0a52202
Signed-off-by: Jes B. Klinke <[email protected]>
HyperDebug supports logic analyzer functionality, in which it will
record events on a given set of gpio pins, and `opentitantool` can later
be used to retrieve a transcript of every level change with microsecond
timestamp.

This has been used by the GSC team to verify the reaction time of
firmware under test.  Such testing involve typically a few handfuls of
events, which can easily be transmitted via the textual protocol.
However, We now plan on using the functionality for cases with 30000
events to be retrieved, which would take many tens of seconds to
inefficently transmit via the console (which runs slow enough that the
physical UART can keep up).

To improve performance, this CL introduces another Google-specific
extension to the binary CMSIS-DAP protocol, for GPIO operations, and
adds code to repliate the `gpio monitoring read`
functionality.  (Starting and stopping the monitoring can still only be
done through the textual protocol, those do not carry a large amount of
data.  Though there may be a 80-character limit on a single command,
which could impact the ability to monitor 5 or more signals at once, so
in the future we may want to allow starting monitoring also through the
binary protocol.)

Change-Id: I3c075f2960b4d4a38bff8cd7d8e270a3a1211a9b
Signed-off-by: Jes B. Klinke <[email protected]>
Updated to new HyperDebug firmware with support for binary GPIO
monitoring protocol through "vendor extension" to CMSIS-DAP.

Change-Id: I29fb35c39e835e59763e0d0a935b087dd14ba2ae
Signed-off-by: Jes B. Klinke <[email protected]>
@jesultra jesultra merged commit cf256a0 into lowRISC:master Jan 26, 2024
32 checks passed
@jesultra jesultra deleted the hyp_debug branch January 26, 2024 21:00
@timothytrippel timothytrippel added the CherryPick:earlgrey_es_sival This PR should be cherry-picked to earlgrey_es_sival label Feb 7, 2024
Copy link

github-actions bot commented Feb 7, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CherryPick:earlgrey_es_sival This PR should be cherry-picked to earlgrey_es_sival SW:opentitantool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants