Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972

Open
wants to merge 64 commits into
base: develop
Choose a base branch
from

Conversation

GPortas
Copy link
Contributor

@GPortas GPortas commented Oct 28, 2024

What this PR does / why we need it:

This PR includes a mechanism for registering new users in Dataverse through a Bearer Token.

BearerTokenAuthMechanism

The Bearer Token authentication functionality (available through feature flag) has been extended to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token. See the current development version of BearerTokenAuthMechanism. (https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/auth/BearerTokenAuthMechanism.java#L58).

The BearerTokenAuthMechanism has been extended to return a meaningful error in the scenario described above. Specifically, the calling user would receive a 403 error, indicating that the token is validated but the user is not registered in Dataverse.

Additionally, the entire BearerTokenAuthMechanism class has been refactored, moving authentication and token verification logic to AuthService, since this logic is also utilized by the new user registration functionality included in this PR, and it also results in a simpler version of BearerTokenAuthMechanism, more testable and with more limited responsability.

User registration endpoint

The new endpoint takes the bearer token and a UserDTO object with the required properties to create a user account in Dataverse. It only works if the bearer token feature flag is enabled. A RegisterOidcUserCommand handles the validation of the bearer token and user data, and the account creation process. It can return various errors, including a new InvalidFieldsCommandException, which lists any validation errors in the UserDTO. This exception is now handled by AbstractApiBean like other command exceptions and can be used for future purposes.

Which issue(s) this PR closes:

Special notes for your reviewer:

  • See Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972 (comment) to understand why the OIDC UserInfo endpoint is not used in this design

  • The IT for the new registration endpoint is disabled because it depends on the containerized environment, which is not available in Jenkins. Additionally, the feature flag would not be active, so it would fail for this reason as well. Changes have been made to the Keycloak realm JSON to suit the needs of this IT, particularly the modification of roles in the test realm to allow an admin user to create a random test user in the IT setup.

Suggestions on how to test this:

Visual inspection of the API integration tests. Overall, this PR is extensively tested with unit and integration tests, but it is in the ITs where we can see how the endpoint calls work and the different responses that can be obtained.

We can also run this branch locally and reproduce the same calls made in the ITs using curl.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Yes, attached.

Additional documentation:

None

This comment has been minimized.

This comment has been minimized.

…earer token auth feature flags compatibility

This comment has been minimized.

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

@GPortas GPortas assigned qqmyers and unassigned GPortas Nov 20, 2024
@GPortas GPortas requested a review from qqmyers November 20, 2024 16:02
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the updates. I'll pass it back with a couple questions.

This comment has been minimized.

This comment has been minimized.

@GPortas
Copy link
Contributor Author

GPortas commented Nov 21, 2024

Thank you for your detailed review (This is a big PR!). We’ve implemented several improvements. Please take a look at the latest changes and let me know if you have any additional suggestions. @qqmyers

@GPortas GPortas requested a review from qqmyers November 21, 2024 10:39
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10959-bearer-token-auth-ext
ghcr.io/gdcc/configbaker:10959-bearer-token-auth-ext

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. As discussed in other comments, I think/agree getting more field types from the claims makes sense and will probably be needed at some point, but this code solves the immediate issue and gives code that can be built on.

curl -H "Authorization: Bearer $TOKEN" -X POST http://localhost:8080/api/users/register --data '{"termsAccepted":true}'
It is essential to send a JSON that includes the property ``termsAccepted`` set to true, which indicates that you accept the terms of service of Dataverse. Otherwise, you will not be able to create an account.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think some config option like that for terms, affiliation, and role would make sense, but I'd definitely suggest a new PR rather holding this up and making it even bigger.

@cmbz cmbz added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 21, 2024
@johannes-darms
Copy link
Contributor

johannes-darms commented Nov 22, 2024

Looks good. As discussed in other comments, I think/agree getting more field types from the claims makes sense and will probably be needed at some point, but this code solves the immediate issue and gives code that can be built on.

I totally agree. This PR improves OIDC compatibility quite a bit and builds a solid foundation for further improvements (getting more information from an authorisation server, possibly via configurable claims/scopes per provider).

The only conceptual problem I see with the approach is that the user is not asked for consent before the information flow. This is part of the specification . The required processes and attributes are part of the standard and could be "just" used instead of reinventing the wheel with custom json properties. Terms and conditions, policies and other information that the authorisation server can display can be provided during client registration (https://openid.net/specs/openid-connect-registration-1_0.html#toc) (see Keyclack OID Client -> Advanced>Fine grained OIDC configuration).

IHMO: As stated above, if the required properties (scopes/claims) are not provided by the OIDC server (can be confiured per client) or released by the user for use by a dataverse, the OIDC server configuration is simply not compatible or the user is not willing to participate. Implementing UI/APIs around it to obtain information via another channel does work, but requires additional code, circumvents the standard and somehow questions the use of a protocol with the sole purpose of obtaining identity information.

@qqmyers
Copy link
Member

qqmyers commented Nov 22, 2024

I think we do still ask the user for consent, it is just that the full terms are not presented there. In some sense, I see this discussion about whether it is better to have Keycloak ask for missing information, or whether the Dataverse client can do it. I'm not sure why one is better than the other (though both are possible). In some sense the argument that you should use Keycloak for this can just be applied further - if you need Keycloak to add info not supplied by, for example, ORCID, you should just require ORCID to add the fields you need or not use it. My guess is that the use case where a Dataverse instance would like to use an OIDC service that is not under their control enough for it to be configured to supply all the fields we'd like is real and having the API/SPA able to handle that case w/o requiring Keycloak seems reasonable. That said, we should definitely support a fully configured Keycloak in the middle too.

I think that means, minimally to make the terms acceptance in json be configurable by flag now, and, in the future, add support for the optional affiliation/role fields to come from claims?

@johannes-darms
Copy link
Contributor

I think we do still ask the user for consent, it is just that the full terms are not presented there.

It does not ask if Dataverse is allowed to copy personal information (name, email,...) from an OIDC server to Dataverse. Nor, as said, are the ToS displayed. The OIDC protocol has concepts to solve this, but they aren't used. I'm happy with this implementation, as it's certainly an improvement. However, it can be improved in this respect by implementing the standard and the protocol. Yes, not all OIDC authentication servers support all features, support all requirements, and always provide meaningful responses. Solution for these cases have to be worked out and implemented later, and some of the solutions in the PR already solve them, but deviations from the standard should be avoided where possible.

My guess is that the use case where a Dataverse instance would like to use an OIDC service that is not under their control enough for it to be configured to supply all the fields we'd like is real and having the API/SPA able to handle that case w/o requiring Keycloak seems reasonable.

Yes, it makes perfect sense. However, you need to make sure that the OIDC providers that support these features can be used without additional code, properties and configuration. This is already partly done with the feature flags.

I think that means, minimally to make the terms acceptance in json be configurable by flag now, and, in the future, add support for the optional affiliation/role fields to come from claims?

Yes, of course this is the way to go. However, it lacks the behaviour that should happen when terms acceptance is disabled in JSON. This is basically an implementation of the protocol (i.e. creating an additional authentication request with a promt=consent option (see spec)).

@g-saracca
Copy link

g-saracca commented Dec 2, 2024

@ofahimIQSS I think we should wait for this one, while doing the SPA integration and forcing to send username when the OIDC token has the preferred_username present the endpoint should fail and is not failing.
See https://github.com/IQSS/dataverse/pull/10972/files#diff-0811c7940f163c5ba98554fe91242de94213ea2b586a4e8aa4142234df70bc9aR103

@g-saracca
Copy link

I think we should wait for this one, while doing the SPA integration and forcing to send username when the OIDC token has the preferred_username present the endpoint should fail and is not failing.

This was sort of a mistake, there was a typo in the docs, the dataverse flag is not api-bearer-auth-json-claims but api-bearer-auth-provide-missing-claims

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) GREI Re-arch Issues related to the GREI Dataverse rearchitecture Original size: 30 Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) SPA.Q4.4 OIDC login + API authentication SPA These changes are required for the Dataverse SPA User Role: API User Makes use of APIs
Projects
Status: Ready for QA ⏩
Development

Successfully merging this pull request may close these issues.

Handle unregistered users in BearerTokenAuthMechanism
6 participants