Expose ogr_fdw_info functions via SQL #260

Open: rcoup wants to merge 3 commits into master

@rcoup rcoup commented Aug 8, 2024

I've found it useful to be able to introspect OGR datasources as the PostgreSQL server can see it — SQL equivalents of ogr_fdw_info commands.

  1. Add a function to list available OGR layers from an FDW server:

    SELECT ogr_fdw_layers('myserver');
     ogr_fdw_layers
    ----------------
     Cities
     Countries
    
    (2 rows)
    
  2. Add a function to get the CREATE FOREIGN TABLE SQL for a particular OGR layer, the same as IMPORT FOREIGN SCHEMA uses. This can help for issues where OGR is reporting column types wrongly, or some columns aren't needed but it's mostly correct.

    ogr_fdw_table_sql(server_name, ogr_layer_name, table_name=NULL, launder_column_names=TRUE, launder_table_name=TRUE)

    SELECT ogr_fdw_table_sql('myserver', 'pt_two');
            ogr_fdw_table_sql
    ---------------------------------
    CREATE FOREIGN TABLE "pt_two" (
      fid integer,
      "geom" geometry(Point, 4326),
      "name" varchar,
      "age" integer,
      "height" real,
      "birthdate" date
    ) SERVER "myserver"
    OPTIONS (layer 'pt_two');
    

Not sure on preferences whether you'd prefer the functions to take OIDs (eg: SELECT ogr_fdw_layers(myserver);) instead of names; and suggestions also very welcome on improvements for naming or docs.
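
As a sketch of the second use case (trimming the generated definition), the output of ogr_fdw_table_sql could be pasted into an editor and adjusted before running it. The table name pt_two_trimmed and the choice of columns to drop are made up for illustration; the server, layer, and remaining column definitions are copied from the sample output above.

```sql
-- Hand-edited copy of the generated statement (sketch only).
-- "pt_two_trimmed" is a hypothetical table name; "age" and "height"
-- have been removed on the assumption that they are not needed.
CREATE FOREIGN TABLE "pt_two_trimmed" (
  fid integer,
  "geom" geometry(Point, 4326),
  "name" varchar,
  "birthdate" date
) SERVER "myserver"
OPTIONS (layer 'pt_two');
```

A wrongly reported column type would presumably be corrected the same way, by editing the type in the column list, provided ogr_fdw can convert the OGR field to the chosen type.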

This returns the SQL for foreign table creation for a given OGR layer. It accepts the server name, layer name, table name, and options for laundering table and column names.

Returns a list of layers in the specified FDW server.

@pramsey
Owner

pramsey commented Aug 8, 2024

I think this is useful but I'm +0 on the API.

  • Layer listing is good.
  • Need some way to play with connection strings to find one that works.
  • Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.
  • How crazy would it be to just have ogr_fdw_info(server text, layer text default null) returns text, as an exact analogue of the CLI?
  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?
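
For context on the name-versus-OID point: PostgreSQL's reg* alias types such as regclass resolve relation names (including foreign tables) to OIDs and deal with quoting, but there is no equivalent alias type for foreign servers, so a server name has to be looked up in the pg_foreign_server catalog. A minimal sketch, assuming a server called myserver and a foreign table called "My Layer" already exist (both names are placeholders):

```sql
-- Look up a foreign server's OID by name (no reg* alias type covers servers).
SELECT oid, srvname
FROM pg_foreign_server
WHERE srvname = 'myserver';

-- For relations, regclass does the lookup; mixed-case or spaced names
-- need double quotes inside the string literal.
SELECT '"My Layer"'::regclass::oid;
```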

@rcoup
Author

rcoup commented Aug 8, 2024

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?

Yes, so you don't need ogr-fdw built+installed on both the client and the PG server. And also to fix random schema issues by getting the SQL since ogr_fdw_info doesn't support launder options.

I think this is useful but I'm +0 on the API.

👍 Happy to iterate.

  • Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.

Yeah, that's what I was leaning towards.

  • Need some way to play with connection strings to find one that works.
  • How crazy would it be to just have ogr_fdw_info(server text, layer text default null) returns text, as an exact analogue of the CLI?

I feel like it might get confusingly overloaded?

ogr_fdw_info(datasource text) returns text; -- CREATE SERVER... 
ogr_fdw_info(server oid, layer text) returns text; -- CREATE FOREIGN TABLE...

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.

@pramsey
Owner

pramsey commented Aug 12, 2024

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great I don't think, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic).

  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.

Yeah, I hadn't thought about the extent to which the FDW API was providing some free security cover. I mean, we could just make the ogr_read be a superuser-only function, people can grant security definer on it if they want to swim naked. (BTW, you don't have to write ogr_read, that's on my personal todo/wish list.)
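
The superuser-only idea could be approximated with ordinary function privileges. A sketch, assuming the ogr_read(text, text) signature proposed in this thread (the function does not exist yet) and a hypothetical role named gis_admin:

```sql
-- Functions are executable by PUBLIC by default, so remove that first.
REVOKE EXECUTE ON FUNCTION ogr_read(text, text) FROM PUBLIC;

-- Then opt in specific roles; "gis_admin" is a made-up role name.
GRANT EXECUTE ON FUNCTION ogr_read(text, text) TO gis_admin;
```

Superusers bypass privilege checks, so they would keep access after the REVOKE.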

@rcoup
Author

rcoup commented Aug 12, 2024

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great I don't think, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic).

I kinda assumed it'd be the difference between:

SELECT ogr_fdw_info("myserver"); -- oid
SELECT ogr_fdw_info(myserver); -- oid
SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- text

But maybe there's some special TIL typing coercion that happens by default?

I mean, SELECT lower('foo') and SELECT lower("foo")/lower(foo) are fundamentally different, that's SQL 101?

(I'll accept "it's still too confusing, don't do it" 😁 )

I mean, we could just make the ogr_read be a superuser-only function

ok. Maybe ogr_fdw_info() & ogr_fdw_layers() too? They're still doing remote network requests.
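
One way to settle the overloading question is to try it with throwaway functions. The sketch below uses a hypothetical overload_demo name rather than the proposed ogr_fdw functions, and assumes a foreign server called myserver exists for the second query:

```sql
-- Two stand-in overloads, mirroring the text and oid signatures above.
CREATE FUNCTION overload_demo(datasource text) RETURNS text
  LANGUAGE sql AS $$ SELECT 'text variant'::text $$;
CREATE FUNCTION overload_demo(srv oid) RETURNS text
  LANGUAGE sql AS $$ SELECT 'oid variant'::text $$;

-- A quoted string literal starts out as type "unknown"; the resolver
-- prefers the string-category candidate, so the text variant is chosen.
SELECT overload_demo('WFS:https://wfs.example.com/wfs');

-- An argument that is already of type oid matches the oid variant exactly.
SELECT overload_demo(s.oid)
FROM pg_foreign_server AS s
WHERE s.srvname = 'myserver';
```

In stock PostgreSQL a double-quoted or bare myserver is parsed as a column reference rather than a server name, so the oid form would need a lookup like the second query or a literal OID.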

@pramsey
Owner

pramsey commented Aug 12, 2024

(I'll accept "it's still too confusing, don't do it" 😁 )

Nope I just have my head up my ass. Ignore me and proceed.

@robe2
Contributor

robe2 commented Sep 12, 2024

How about:

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

What is the expected output of ogr_fdw_info_layers? If it's simply a set of layer names, I think output being a

SETOF text

makes more sense.

@rcoup
Author

rcoup commented Sep 12, 2024

I've been somewhat bogged down in other stuff, but I'll get back to finishing this soon hopefully :-)

If it's simply a set of layer names, I think output being a SETOF text makes more sense.

TIL SETOF text for returning multiple rows of a single column; my previous understanding was you needed SETOF record for anything returning multiple rows.

So, yes it does 👍

@robe2
Contributor

robe2 commented Oct 3, 2024

SETOF text or SETOF anytype will return multiple records, and the column will just be named after the function if no alias is provided.

The main difference is SETOF record or RETURNS TABLE would allow multiple columns per row but you have to deal with that messy business of using RETURNS TABLE (defining the columns) or using OUT params, which is kinda pointless if you are only returning one column.

Take for example the postgis function ST_SubDivide, it returns a SETOF geometry and you use it like any other table function.

CREATE OR REPLACE FUNCTION st_subdivide(
	geom geometry,
	maxvertices integer DEFAULT 256,
	gridsize double precision DEFAULT '-1.0'::numeric)
    RETURNS SETOF geometry 
    LANGUAGE 'c'
    COST 5000
    IMMUTABLE STRICT PARALLEL SAFE 
    ROWS 1000

AS '$libdir/postgis-3', 'ST_Subdivide'
;

But that said, I don't think there is any performance penalty doing it your way, just that you have a slightly longer definition.
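
To make the SETOF text behaviour concrete, here is a small sketch with a hypothetical demo_layers function that returns the two layer names from the example at the top of the thread:

```sql
-- Hypothetical single-column set-returning function.
CREATE FUNCTION demo_layers() RETURNS SETOF text
  LANGUAGE sql AS $$ SELECT unnest(ARRAY['Cities', 'Countries']) $$;

-- With no alias, the output column is named after the function.
SELECT * FROM demo_layers();

-- An alias renames the column.
SELECT layer_name FROM demo_layers() AS layer_name;
```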
