add asInputStream to ByteString #1085
actor/src/main/scala-2.12/org/apache/pekko/util/ByteString.scala
Outdated
   * @see [[asByteBuffer]]
   * @since 1.1.0
   */
  final def asInputStream: InputStream = new ByteArrayInputStream(toArrayUnsafe())
The toArrayUnsafe documentation states:
  If the ByteString is backed by a single array it is returned without any copy. If it is backed by a rope of multiple ByteString instances a new array will be allocated and the contents will be copied into it before returning it
So we could avoid that allocation either by:
- defining def asInputStream: InputStream as an abstract method, and adding different implementations that avoid allocations:
  - using new ByteArrayInputStream(bytes) for ByteString1C,
  - using new ByteArrayInputStream(bytes, start, length) for ByteString1,
  - using new SequenceInputStream(bytestrings.map(_.asInputStream)) for ByteStrings
- implementing it in the superclass using SequenceInputStream(asByteBuffers.map(bb => new ByteBufferBackedInputStream(bb)))
WDYT?
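The first option above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Bytes`, `Compact`, `Slice`, `Rope` and `AsInputStreamSketch` are hypothetical stand-ins for the three ByteString variants, not the real Pekko classes, and the rope case builds its enumeration with `java.util.Collections.enumeration` rather than a Scala converter so that the sketch does not depend on a particular Scala version.

```scala
import java.io.{ ByteArrayInputStream, ByteArrayOutputStream, InputStream, SequenceInputStream }
import java.util.Collections

// Hypothetical, simplified stand-ins for the ByteString variants discussed above.
sealed trait Bytes {
  def asInputStream: InputStream
}

// Compact case: the whole array is the content, so it can be wrapped directly.
final case class Compact(bytes: Array[Byte]) extends Bytes {
  def asInputStream: InputStream = new ByteArrayInputStream(bytes)
}

// Slice case: only `length` bytes starting at `start` belong to this value.
final case class Slice(bytes: Array[Byte], start: Int, length: Int) extends Bytes {
  def asInputStream: InputStream = new ByteArrayInputStream(bytes, start, length)
}

// Rope case: chain the parts' streams instead of copying them into one new array.
final case class Rope(parts: Vector[Bytes]) extends Bytes {
  def asInputStream: InputStream = {
    val streams = new java.util.ArrayList[InputStream](parts.size)
    parts.foreach(p => streams.add(p.asInputStream))
    new SequenceInputStream(Collections.enumeration(streams))
  }
}

object AsInputStreamSketch {
  // Drains a stream into a fresh array; used only to inspect the result.
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8)
    var n = in.read(buf)
    while (n != -1) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
    out.toByteArray
  }
}
```

None of the three implementations copies the backing arrays; the only allocations are the stream wrappers themselves and, for the rope, the small list of sub-streams.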
Was about to write the same, this should be the implementation
good point - I have committed a change based on @jtjeferreira's suggestions
Co-Authored-By: João Ferreira <[email protected]>
352558e to 356c46c
actor/src/main/scala-2.12/org/apache/pekko/util/ByteString.scala
Outdated
LGTM, but 2 notes:
- mima is complaining
- should some benchmarks be written?
This is a forward compatibility error and is expected, i.e. MiMa complains when you add a new method to a public interface
Yes they should
this is still WIP
If it's not too pushy, it would be good to get this out for M1. I suggested on the mailing list that we should do M1 around the middle of next week.
I have a basic benchmark that shows that ByteString.asInputStream is faster than new ByteArrayInputStream(ByteString.toArray). The latter is the current safe way to create an InputStream for a ByteString. I have also tested ByteArrayInputStream(ByteString.toArrayUnsafe) which is about as fast as ByteString.asInputStream for simple ByteStrings (single arrays) but is slow like ByteArrayInputStream(ByteString.toArray) when you have byte strings composed of many smaller byte strings.
@@ -566,6 +572,9 @@ object ByteString {

    def asByteBuffers: scala.collection.immutable.Iterable[ByteBuffer] = bytestrings.map { _.asByteBuffer }

    override def asInputStream: InputStream =
      new SequenceInputStream(bytestrings.map(_.asInputStream).iterator.asJavaEnumeration)
bytestrings.iterator.map(_.asInputStream).asJavaEnumeration
@pjfanning I will send a suggestion tomorrow about this line, currently on my phone.
I just checked in Netty, there is a
I checked that already. Unfortunately, the Scala conversion to JavaEnumeration only seems to work on Scala Iterators and not on Scala Collections. With this change:
We are dealing with wrapping byte arrays - the perfect InputStream for wrapping a byte array is ByteArrayInputStream.
@He-Pin I think
It will defer the mapping if you don't read. There is no need to rerun the benchmark.
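The difference between the two orderings of map and iterator can be checked in isolation. The sketch below is not Pekko code and not a benchmark; `MapOrderSketch` and `callsBeforeConsumption` are hypothetical names, and plain strings stand in for the ByteString parts. It only counts how often the mapping function has run before anything is consumed.

```scala
object MapOrderSketch {
  // Returns how many times the mapping function ran before the iterator was consumed.
  def callsBeforeConsumption(lazily: Boolean): Int = {
    var calls = 0
    val parts = Vector("a", "b", "c")
    val f = (s: String) => { calls += 1; s.toUpperCase }

    val it: Iterator[String] =
      if (lazily) parts.iterator.map(f) // Iterator.map: f runs when next() is called
      else parts.map(f).iterator        // Vector.map: f runs for every element right away

    val before = calls
    it.foreach(_ => ()) // drain the iterator; both variants end up calling f three times
    assert(calls == 3)
    before
  }
}
```

With `lazily = true` the function has not run at all before consumption; with `lazily = false` it has already run once per element and an intermediate Vector has been built. Once the stream is read to the end, both variants have done the same mapping work.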
Lgtm
it is being read - there is nothing gained here by thinking there is lazy eval anywhere here. SequenceInputStream takes an evaluated JavaEnumeration. For me, if we adjust this, we could write a custom function that maps a Vector to a JavaEnumeration without requiring the intermediate iterator.
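A custom function of the kind mentioned here might look like the sketch below. It is an assumption about what such a helper could be, not code from the PR: `IndexedEnumerationSketch`, `mappedEnumeration` and `concat` are hypothetical names, and plain byte arrays stand in for the ByteString parts.

```scala
import java.io.{ ByteArrayInputStream, InputStream, SequenceInputStream }
import java.util.{ Enumeration => JEnumeration }

object IndexedEnumerationSketch {
  // Hypothetical helper: walks an indexed sequence by position and applies `f`
  // to each element when it is requested, with no Scala Iterator in between.
  def mappedEnumeration[A, B](xs: IndexedSeq[A])(f: A => B): JEnumeration[B] =
    new JEnumeration[B] {
      private var i = 0
      override def hasMoreElements(): Boolean = i < xs.length
      override def nextElement(): B = {
        if (i >= xs.length) throw new NoSuchElementException("no more elements")
        val next = f(xs(i))
        i += 1
        next
      }
    }

  // Chains one wrapping stream per array; the arrays themselves are not copied.
  def concat(parts: Vector[Array[Byte]]): InputStream =
    new SequenceInputStream(mappedEnumeration(parts)(b => new ByteArrayInputStream(b)))
}
```

Whether this is measurably better than going through a Scala iterator would need to be confirmed with the benchmark; the sketch only shows that the intermediate iterator is not required.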
actor/src/main/scala-2.12/org/apache/pekko/util/ByteString.scala
Outdated
actor/src/main/scala-2.13/org/apache/pekko/util/ByteString.scala
Outdated
@@ -579,6 +583,9 @@ object ByteString {

    def asByteBuffers: scala.collection.immutable.Iterable[ByteBuffer] = bytestrings.map { _.asByteBuffer }

    override def asInputStream: InputStream =
      new SequenceInputStream(bytestrings.map(_.asInputStream).iterator.asJavaEnumeration)
- new SequenceInputStream(bytestrings.map(_.asInputStream).iterator.asJavaEnumeration)
+ new SequenceInputStream(bytestrings.iterator.map(_.asInputStream).asJavaEnumeration)
Lgtm
See #995
This needs tests but raising it for discussion.
This is probably not that useful for Pekko usage but general ByteString users might appreciate it.
This method returns an InputStream without copying the array. Unlike toArrayUnsafe, it does not expose the underlying array in a way that lets callers change it: the InputStream wraps the underlying array and only allows it to be read.
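The read-only wrapping can be illustrated with java.io.ByteArrayInputStream on its own. This is a minimal sketch using a plain array rather than a ByteString; `WrapSketch` and `demo` are hypothetical names.

```scala
import java.io.ByteArrayInputStream

object WrapSketch {
  // Returns (first byte of the backing array, first byte of the reader's buffer).
  def demo(): (Int, Int) = {
    val backing = Array[Byte](1, 2, 3)
    val in = new ByteArrayInputStream(backing) // wraps `backing`; the array is not copied
    val out = new Array[Byte](3)
    in.read(out, 0, out.length) // copies bytes out of the stream into the reader's buffer
    out(0) = 42                 // changing the reader's buffer...
    (backing(0).toInt, out(0).toInt) // ...leaves the backing array untouched
  }
}
```

The InputStream API has no write operation, so a caller holding only the stream can read the bytes but has no handle on the backing array.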
I've done some experimentation and ByteArrayInputStream works just as well as alternative implementations in commons-io and fastutil.
apache/pekko-http#424 is a good enough solution for Pekko HTTP.