
columns are hard coded. #27

Open
srinivas-gurani opened this issue Aug 21, 2024 · 3 comments

Comments

@srinivas-gurani

https://github.com/timescale/python-vector/blob/34e51abff2f401f0e8aea71d5537b193a6fe34cb/timescale_vector/client.py#L768

    query = '''
    SELECT
        id, metadata, contents, embedding, {distance} as distance
    FROM
       {table_name}
    WHERE 
       {where}
    {order_by_clause}
    LIMIT {limit}
    '''.format(distance=distance, order_by_clause=order_by_clause, where=where, table_name=self.table_name, limit=limit)
    return (query, params)

Can we select which columns we need, or pass different column names?

@cevian
Collaborator

cevian commented Aug 21, 2024

@srinivas-gurani what's the use case here? Are all the same columns present and they just have different names, or is there a different schema altogether?

The library creates the tables as well and assumes a certain table layout. It's easy to make the names configurable, but I am wondering whether you actually need some deeper changes.
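A sketch of what configurable names could look like follows. It is not the timescale_vector API: `build_search_query` and `quote_ident` are hypothetical, the `$1` placeholder assumes asyncpg-style positional parameters, and only pgvector's `<->` (L2), `<=>` (cosine), and `<#>` (negative inner product) distance operators are accepted. Because identifiers cannot be passed as bound query parameters, caller-supplied names are checked against a conservative pattern and double-quoted before being interpolated.

```python
import re

# Hypothetical helpers, not part of timescale_vector: the caller supplies the
# table and column names instead of relying on a fixed schema.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_DISTANCE_OPS = {"<->", "<=>", "<#>"}  # pgvector: L2, cosine, negative inner product


def quote_ident(name: str) -> str:
    """Double-quote a simple SQL identifier; reject anything else."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def build_search_query(table: str, embedding_col: str, select_cols: list[str],
                       limit: int, distance_op: str = "<=>") -> str:
    """Build a nearest-neighbour query; the query vector is bound to $1."""
    if distance_op not in _DISTANCE_OPS:
        raise ValueError(f"unsupported distance operator: {distance_op!r}")
    if not select_cols:
        raise ValueError("select_cols must not be empty")
    if limit <= 0:
        raise ValueError("limit must be positive")
    cols = ", ".join(quote_ident(c) for c in select_cols)
    distance = f"{quote_ident(embedding_col)} {distance_op} $1"
    return (f"SELECT {cols}, {distance} AS distance "
            f"FROM {quote_ident(table)} "
            f"ORDER BY {distance} "
            f"LIMIT {int(limit)}")
```

For instance, `build_search_query("documents", "vec", ["doc_id", "body"], limit=5)` returns `SELECT "doc_id", "body", "vec" <=> $1 AS distance FROM "documents" ORDER BY "vec" <=> $1 LIMIT 5`. Schema-qualified names such as `public.documents` are rejected by this simple check and would need a more careful quoting rule.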

@lucasgadams

@cevian I agree with @srinivas-gurani here; it would be better if these things were a lot more customizable. The library's assumption about the layout of the tables it operates on honestly makes it pretty unusable for many real-world scenarios. It would be great if the different parts were more modular so people could fit them into their existing applications.

@lucasgadams

lucasgadams commented Oct 4, 2024

I think this library overall takes a much too high-level approach that, while it might make things easy for some people, hurts its usability in a production environment. Overall, it would be nice if it provided a more isolated, minimal, and clean interface for the parts relevant to vectorscale, which are mainly creating the specific indexes and building queries with some tuning knobs for search. It shouldn't include things like adding WHERE filters, predicates, table schemas, etc. A nice library to look to is the Python pgvector library, which provides just the basics plus some nice interfaces for the common Python DB libraries (SQLAlchemy, asyncpg, etc.).
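A schema-agnostic layer of the kind described here could stop at generating SQL text for the index and the query-time tuning knobs, leaving table layout, filters, and the database driver to the application. The sketch below assumes pgvectorscale's `diskann` index method, pgvector's `vector_cosine_ops` operator class, and `diskann.*` session settings such as `query_rescore`; these names come from our reading of the pgvectorscale documentation and should be checked against the installed version. `create_index_sql` and `query_settings_sql` are hypothetical names, not an existing API.

```python
import re

# Hypothetical helpers, not part of timescale_vector or pgvectorscale: they only
# build SQL text, so the caller keeps its own schema and driver.
# ASSUMPTION: the "diskann" index method and the diskann.* query settings follow
# the pgvectorscale README; verify the names against the version you install.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Accept only simple SQL identifiers/keywords; return the name unchanged."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def _num(value: object) -> str:
    """Render an int or float setting; reject anything else (including bools)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {value!r}")
    return str(value)


def create_index_sql(table: str, column: str, method: str = "diskann",
                     opclass: str = "vector_cosine_ops", **params: float) -> str:
    """Return a CREATE INDEX statement; keyword args become WITH (...) build options."""
    sql = (f'CREATE INDEX ON "{_check(table)}" USING {_check(method)} '
           f'("{_check(column)}" {_check(opclass)})')
    if params:
        opts = ", ".join(f"{_check(k)} = {_num(v)}" for k, v in sorted(params.items()))
        sql += f" WITH ({opts})"
    return sql


def query_settings_sql(prefix: str = "diskann", **settings: float) -> list[str]:
    """Return SET LOCAL statements for query-time knobs; run them inside a transaction."""
    return [f"SET LOCAL {_check(prefix)}.{_check(k)} = {_num(v)}"
            for k, v in sorted(settings.items())]
```

For example, `create_index_sql("docs", "embedding", num_neighbors=50)` yields `CREATE INDEX ON "docs" USING diskann ("embedding" vector_cosine_ops) WITH (num_neighbors = 50)`. Because the helpers return plain strings, the statements can be executed through asyncpg or SQLAlchemy without adding a psycopg2 dependency. `SET LOCAL` only lasts until the end of the current transaction, so the settings should be sent in the same transaction as the search query.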

I'd like to use this library mainly so I know what the parameters of the different indexes are, and so I notice if any of them change or get enhancements. Instead, I think I will just copy and paste some code, because unfortunately psycopg2 is a pain in the ass to install on a Mac and I would rather not introduce that dependency. Our current setup uses pgvector, asyncpg, and SQLAlchemy, and we have had success with that.

I do think the code for automatically adding new embeddings and generally managing embedding tables is nice, but if it is built on the assumption that we structure our tables in a specific way, it is not going to be useful. Just my 2 cents.
