Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3 Problems in WARequest>>#bodyDecoded #1357

Open
JoachimTuchel opened this issue Mar 11, 2023 · 4 comments
Open

3 Problems in WARequest>>#bodyDecoded #1357

JoachimTuchel opened this issue Mar 11, 2023 · 4 comments

Comments

@JoachimTuchel
Copy link

JoachimTuchel commented Mar 11, 2023

I think there are 3 Problems with WARequest>>bodyDecoded

  1. The comment says the method returns nil if no body is present. That's not true, it doesn't check for a body at all, and that may even be unnecessary
  2. I think it is reasonable to assume the body is encoded in utf-8 if no other charset ist stated the contentType parameters. In my case the contentType id text/plain without any parameters. I don't think this should throw an error, but assume it is utf-8
  3. If the throwing of an error is intended, either the word "no" or "not" should be removed from the error message, because it is now doubly negated:
    "**_no_** character set of request body can **_not_** be determined"
@eMaringolo
Copy link
Contributor

We discussed this with @JoachimTuchel and I proposed than instead of falling back to a default codec if no charset is specified in the content type, we could fall back to the request context codec, which will likely be UTF-8.

Something like:

WARequest>>#bodyDecoded
  "Comment TBD."

  | contentType charSet |
  contentType := self contentType.
  charSet := contentType isNil ifFalse: [contentType charSet].
  ^ charSet isNil
      ifTrue: [self requestContext codec
        ifNil: [WAIllegalStateException signal: 'no character set of request body can not be determined']
        ifNotNil: [:requestCodec | body isNil ifFalse: [requestCodec decode: body]]]
      ifFalse: [self bodyDecodeUsing: charSet]

@jbrichau
Copy link
Member

jbrichau commented Jul 14, 2024

Getting back to this old issue/question...

First off:

  • bodyDecodeUsing: returns nil when there is no body. As such, the method bodyDecoded will returnnil as well.
  • I corrected the error message to 'character set of request body cannot be determined'

About using a default fallback then. From the spec (https://www.rfc-editor.org/rfc/rfc9110.html#field.content-type):

A sender that generates a message containing content SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type.
So, while it is not required for a request to have the 'Content-Type' header, it may be omitted and then the spec says we may assume it's arbitrary binary data.

In the situation that no 'Content-Type' header was present, we would also default to the fallback, but I think that should not be allowed to happen. I suggest to only fallback to the encoding of the adaptor (i.e. the requestContext codec) if a 'Content-Type' header was present with a non-binary media type (text/plain, text/html,...). So, only use a default charSet in the case of non-binary mediatypes. I think we will want to error if the specified mimetype is "non-decodable" anyway, rather than silently trying to decode something that will not make sense.

What do you think @eMaringolo ?

@jbrichau
Copy link
Member

Suggestion:

bodyDecoded
	"Answer the body decoded using the character set in the request header. Answer nil if no body is present. Signal an error if no character set is present in the request header."

	| contentType charSet |
	contentType := self contentType.
	contentType ifNil: [ WAIllegalStateException signal: 'Content type of request body cannot be determined.' ].
	charSet := contentType charSet.
	^ charSet 
		ifNil: [ 
			self requestContext codec
				ifNil: [ WAIllegalStateException signal: 'Character set of request body cannot be determined.' ]
				ifNotNil: [ :requestCodec | body ifNotNil: [ requestCodec decode: body ] ] ]
		ifNotNil: [ self bodyDecodeUsing: charSet ]

@marschall
Copy link
Contributor

Just assuming a character set / encoding when one was not provided and the contents is not binary will just open a back door to introduce the kind of encoding problems we have been trying to fix for year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants