Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow strings in field/2 #4384

Merged
merged 11 commits into from
Dec 25, 2024

Conversation

greg-rychlewski
Copy link
Member

I'm worried I'm missing something obvious. But I think with these 2 simple changes to the schema functions we can allow string names in field/2.

The __schema__(:type, field)__ functions will now catch string names and get the right type. And the __schema__(:field_source, field)__ functions will ensure the field is converted back into an atom during normalization so that the adapters don't have to change.

@josevalim
Copy link
Member

Interesting. What happens if there is no schema?

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Feb 24, 2024

Ah good catch, I knew I forgot something. Then it fails but it can be fixed by changing the guard in the adapter here: https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L835 and here https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L841

@josevalim
Copy link
Member

I think if we want to go down this route, we should rather go in the other direction. Have field_source return strings and then send it all the way down to the adapter as strings. If there is no field source, we normalize it to strings on the spot?

As long as it is a small change on the adapter side, we should be good. :)

@josevalim
Copy link
Member

Although converting to strings on adapters such as etso and ecto_mnesia will likely cause other problems.

@greg-rychlewski
Copy link
Member Author

It makes the inspection a bit weird too if it's normalized to string. I had to make a special condition to make the inspect look normal. I just pushed a change

@josevalim
Copy link
Member

Thanks @greg-rychlewski. I am honestly a bit unsure. It feels we are at the worst of both words: sometimes they are atom, sometimes they are strings. What do you think about:

  1. If there is a schema, we always normalize them to atoms in the planner (as long as the schema module has been loaded because we called a previous operation in it, String.to_existing_atom should be enough).

  2. For schemaless queries, we always keep them as strings.

Perhaps we can start with step 2 for now (and raise if a string is given to a schema source)?

@greg-rychlewski
Copy link
Member Author

That sounds like a good compromise. I'll update the PR.

There was one thing I was having trouble figuring out for inspect. I believe the dot notation for the fields does not play nice with strings. For example, atoms produce this

query = from p in Post, select: field(p, :visit)

IO.inspect query
# "from p0 in Inspect.Post, select: p0.visit"

But strings produce this

query = from p in Post, select: field(p, "visit")

IO.inspect query
# "from p0 in Inspect.Post, select: p0 . \"visit\"()"

I was previously handling it by transforming it to field inside of the inspect. But if we will normalize to string then it could be confusing if there is an error during planning and the user didn't use field/2. The only other thing I could think is to not use the dot notation, but then the inconsistency is probably confusing.

Do you know if there is anything I can do to make the string play nice with the dot notation?

@josevalim
Copy link
Member

My only suggestion is to indeed convert it back to field(..., ...) when pretty printing/inspecting.

@greg-rychlewski
Copy link
Member Author

I had one idea. Would you be against tagging the field/2 stuff all the way down to the adapter and them have the source resolved in the adapter? That way we could normalize everything given to field/2 to string and all the non field/2 stuff will stay as atom.

The reason I'm suggesting this is because when I tried normalizing schemaless sources to string it started to affect a lot of things I didn't expect like subqueries and joins. Doing it this way would minimize the blast radius and still have some kind of internal consistency. And either way we are asking adapters to change something.

@josevalim
Copy link
Member

To be clear, you are saying this:

field/2 with a string arrives as a field in the AST all the way down to the adapter? That sounds good to me.

But we still need to decide what to do with strings and schemas. We have to options:

  1. We say string fields are not checked, typed, or converted in any way (akin to a fragment)
  2. We say string fields are validated and normalized accordingly

@greg-rychlewski
Copy link
Member Author

I'm starting to feel like I completely misunderstood one of your earlier replies. When you said this

I am honestly a bit unsure. It feels we are at the worst of both words: sometimes they are atom, sometimes they are strings. What do you think about:

If there is a schema, we always normalize them to atoms in the planner (as long as the schema module has been loaded because we called a previous operation in it, String.to_existing_atom should be enough).

For schemaless queries, we always keep them as strings.

Were you talking about field/2 specifically? I read it as any field notation, for example p.title as well but now I think I was mistaken.

@josevalim
Copy link
Member

At the time I did not mean field/2 but later on I certainly thought it should be only about field/2. :)

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Feb 27, 2024

Let me know what you think of this:

field/2 with string name

  • always normalizes its name as an atom when dealing with schemas
  • keeps its name as string when not dealing with schemas
  • `has its type validated when dealing with schemas. it's handled by adding guards to the reflection functions here:

{args, when_expr, body} ->

types_quoted =

field/2 with atom name

  • no changes. though i'm not sure if you'd prefer to normalize its name to string when dealing with schemaless sources

@greg-rychlewski
Copy link
Member Author

we could also make it so that field/2 with a string always emits a string and field/2 with an atom always emits an atom. for both schema/schemaless.

@josevalim
Copy link
Member

My thinking would be that, if you pass a string, then it is kept as is and it would always be the field source. We never transform or manipulate it in any case.

@greg-rychlewski
Copy link
Member Author

Do you think it would still be ok to check its type against the schema? For instance, if someone wanted to do this

field = get_user_input_as_string(...)
query = from p in Post, where field(p, ^field) == ^"some_string"

To work properly the parameter would have to be cast to the type of field(p, ^field).

Or alternatively they could wrap field(p, ^field) inside of type/2 but then it's kind of the asme problem where they have to dynamically find the type they want given a string field name.

@josevalim
Copy link
Member

josevalim commented Feb 28, 2024

It really depends which problem we want to solve. Do we want to make it easier for people receiving parameters to create custom queries or do we want to make it possible for you to dynamically query databases without generating tons of atoms?

@greg-rychlewski
Copy link
Member Author

I would say this is the bigger issue right now:

make it possible for you to dynamically query databases without generating tons of atoms

In which case I think you are saying this is not an issue currently for schemas. Because if you are using schemas then you know up front which strings are allowed and can handle it without this change?

If my above statement is not mistaken, then I think even though it's a bigger problem, it might be a bit confusing why we are not solving this problem at the same time if it's not too hard to do so:

make it easier for people receiving parameters to create custom queries

@greg-rychlewski
Copy link
Member Author

I think I get it though. Basically it's about staying consistent using atoms when referring to schema fields right? If so it's ok to me. I could change the PR to raise if string is used for schemas.

@josevalim
Copy link
Member

The problem I think is that we have two acceptable semantics. One is to make it a direct field source access and the other is to make it automatically cast. When we pick one, we exclude the other. For schemaless fields they are the same, however.

@josevalim
Copy link
Member

In which case I think you are saying this is not an issue currently for schemas. Because if you are using schemas then you know up front which strings are allowed and can handle it without this change?

Yes, you could do String.to_existing_atom, althogh we can argue that we could have a more convenient API. The question is if it should be field or something else.

@josevalim
Copy link
Member

For example, just for brainstorming purposes, it could be called field! or cast_field or something. Given the field may not exist and the query my fail.

@greg-rychlewski
Copy link
Member Author

The question is if it should be field or something else.

One reason I can think to keep it as field/2 is because I'm seeing other areas that users might want opened up. For example, if someone has a need for field/2 with string then probably someone would also have a need for map/2 with a string list.

If we go down the route of making a new function for each one it might be cumbersome for us to maintain and for users to use. So it might be worthwhile to lax the definition to include stings in the existing functions and make them have the same semantics as atoms.

@josevalim
Copy link
Member

I think we can go ahead with this one? Perhaps we start by simply allowing strings on field/2 which is then sent as is to the adapter?

@Schultzer
Copy link
Contributor

I think we can go ahead with this one? Perhaps we start by simply allowing strings on field/2 which is then sent as is to the adapter?

I think this makes sense, the columns received from the database is strings as well.

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Dec 23, 2024

I forgot a lot about this but please let me know if my understanding below is correct.

When someone uses a string in field/2 we pass it as is to the adapter. That's not to say we pass the entire AST of field/2, but that the field name itself does not do anything like this:

  1. Does not get changed to an atom (if using a schema)
  2. Does not get changed to the value of :source (if using a schema)
  3. Does not have any type associated with it. That means the auto-casting in situations like this where: field(p, ^field) == ^"some_string" will consider the left side to have type :any. A consequence of this is that query parameters might not be casted to the right type unless special care is taken.

If ^ is correct then this PR is good to review. I will just need to update the docs for field/2 to especially callout (3) above. It will also require this ecto_sql PR to work: https://github.com/elixir-ecto/ecto_sql/pull/596/files

lib/ecto/query/builder.ex Outdated Show resolved Hide resolved
test/ecto/query/builder_test.exs Show resolved Hide resolved
@greg-rychlewski
Copy link
Member Author

It feels kind of weird to lose all the type behaviour and other things when querying a schema. Maybe raising if used with schema is better?

@Schultzer
Copy link
Contributor

It feels kind of weird to lose all the type behaviour and other things when querying a schema. Maybe raising if used with schema is better?

Why would we lose type behaviour, would it be to expensive to convert the atoms to strings, or even keep them around, a-kind to how Ecto Enum works?

@greg-rychlewski
Copy link
Member Author

My understanding is the debate is between these two choices:

  1. Keep strings as strings and treat them like the field source. They are essentially bypassing everything done to schema fields in the planner like transforming to :source, checking for virtual fields, type behaviour, etc.
  2. Normalize strings to atoms when dealing with schemas and don't let them bypass the schema field behaviour. Keep it as (1) for schemaless queries.

I understand Jose's last response to mean he's in favour of (1). So if you keep the type behaviour but not other things then the semantics are not consistent.

@josevalim
Copy link
Member

Agreed. I prefer 1. We should not try to guess the user intent and this way they also behave consistently. For example, you can imagine that a string field could be used to access a source (or field_source) of a schema directly.

@greg-rychlewski
Copy link
Member Author

@josevalim That makes sense to me. I have updated the docs so this PR and the Ecto SQL one are both ready for review whenever you have the time.

As a side discussion:

I think the use case getting the biggest benefit from this change is when you are dynamically managing a lot of schemaless queries at run time. i.e. there's not a tonne of structure to your queries that you know up front.

There is probably another use case, which is you have a lot of structure up front but the query is being dictated by outside input. It would probably be good to have a way for users to intentionally try to use an outside string as a schema field.

If you also think the above is a valid use case, would you be open to a follow-up PR or would you rather see how people use the change in this PR and what kind of feedback they give before going down that route?

@josevalim
Copy link
Member

I would say that those later use cases are not blocked by this PR, right? They can use String.to_existing_atom and the existing reflection API to convert fields from strings to atoms. Perhaps we can add functionality to make their lives easier? Make a function like Ecto.cast_fields(schema, list_of_fields) or similar?

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful 😍

@greg-rychlewski greg-rychlewski merged commit 4990abd into elixir-ecto:master Dec 25, 2024
6 checks passed
@greg-rychlewski greg-rychlewski deleted the string_field_name branch December 25, 2024 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants