Merge branch 'main' into willtai/sphinx-documentation-pipeline
willtai authored Apr 29, 2024
2 parents ca14349 + 6a87979 commit 4bcd425
Showing 20 changed files with 1,192 additions and 67 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/cla-check.yaml
@@ -0,0 +1,32 @@
name: "CLA Check"

on:
  pull_request_target:
    branches:
      - main

jobs:
  cla-check:
    if: github.event.pull_request.user.login != 'renovate[bot]'

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
        with:
          repository: neo-technology/whitelist-check
          token: ${{ secrets.CLA_CHECK_TOKEN }}
      - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
        with:
          python-version: 3
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - run: |
          owner=$(echo "$GITHUB_REPOSITORY" | cut -d/ -f1)
          repository=$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)
          ./bin/examine-pull-request "$owner" "$repository" "${{ secrets.CLA_CHECK_TOKEN }}" "$PULL_REQUEST_NUMBER" cla-database.csv
        env:
          PULL_REQUEST_NUMBER: ${{ github.event.number }}
5 changes: 5 additions & 0 deletions .snyk
@@ -0,0 +1,5 @@
# Snyk (https://snyk.io) policy file

exclude:
  code:
    - tests/**
69 changes: 69 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,69 @@
# Contributing to the Neo4j Ecosystem

At [Neo4j](https://neo4j.com/), we develop our software in the open on GitHub.
This provides transparency for you, our users, and allows you to fork the software to make your own additions and enhancements.
We also provide areas specifically for community contributions, in particular the [neo4j-contrib](https://github.com/neo4j-contrib) space.

There's an active [Neo4j Online Community](https://community.neo4j.com/) where we work directly with the community.
If you're not already a member, sign up!

We love our community and wouldn't be where we are without you.


## Need to raise an issue?

Where you raise an issue depends largely on the nature of the problem.

Firstly, if you are an Enterprise customer, you might want to head over to our [Customer Support Portal](https://support.neo4j.com/).

There are plenty of public channels available too, though.
If you simply want to get started or have a question on how to use a particular feature, ask a question in [Neo4j Online Community](https://community.neo4j.com/).
If you think you might have hit a bug in our software (it happens occasionally!) or you have a specific feature request, use the issue feature on the relevant GitHub repository.
Check first though as someone else may have already raised something similar.

[StackOverflow](https://stackoverflow.com/questions/tagged/neo4j) also hosts a ton of questions and might already have a discussion around your problem.
Make sure you have a look there too.

Include as much information as you can in any request you make:

- Which versions of our products are you using?
- Which language (and which version of that language) are you developing with?
- What operating system are you on?
- Are you working with a cluster or on a single machine?
- What code are you running?
- What errors are you seeing?
- What solutions have you tried already?


## Want to contribute?

If you want to contribute a pull request, we have a little bit of process you'll need to follow:

- Do all your work in a personal fork of the original repository
- [Rebase](https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request), don't merge (we prefer to keep our history clean)
- Create a branch (with a useful name) for your contribution
- Make sure you're familiar with the appropriate coding style (this varies by language so ask if you're in doubt)
- Include unit tests if appropriate (obviously not necessary for documentation changes)
- Take a moment to read and sign our [CLA](https://neo4j.com/developer/cla)

We can't guarantee that we'll accept pull requests and may ask you to make some changes before they go in.
Occasionally, we might also have logistical, commercial, or legal reasons why we can't accept your work but we'll try to find an alternative way for you to contribute in that case.
Remember that many community members have become regular contributors and some are now even Neo employees!


## Specifically for this project
Setting up the development environment:

1. Install Python 3.9.1+
2. Install poetry (see https://python-poetry.org/docs/#installation)
3. Install dependencies:

```shell
poetry install
```

4. Install the pre-commit hook, which runs code-format checks every time you commit.

```shell
pre-commit install
```
2 changes: 1 addition & 1 deletion LICENSE.PYTHON.txt
@@ -20,7 +20,7 @@ analyze, test, perform and/or display publicly, prepare derivative works,
 distribute, and otherwise use Python alone or in any derivative version,
 provided, however, that PSF's License Agreement and PSF's notice of copyright,
 i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022 Python Software Foundation;
+2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024 Python Software Foundation;
 All Rights Reserved" are retained in Python alone or in any derivative version
 prepared by Licensee.

167 changes: 167 additions & 0 deletions README.md
@@ -1,3 +1,170 @@
# Neo4j GenAI package for Python

This repository contains the official Neo4j GenAI features for Python.

This package is a first-party library for developers: Neo4j commits to
maintaining it over the long term while shipping new features and
high-performing patterns and methods quickly.

Docs are coming soon!

# Usage

## Installation

This package requires Python (>=3.8.1).

To install the latest stable version, use:

```shell
pip install neo4j-genai
```

## Examples

While the library has more retrievers than shown here, the following examples should be able to get you started.

### Performing a similarity search

Assumption: Neo4j running with populated vector index in place.

```python
from neo4j import GraphDatabase
from neo4j_genai import VectorRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "embedding-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Initialize the retriever
retriever = VectorRetriever(driver, INDEX_NAME)

# Run the similarity search
query_text = "How do I do similarity search in Neo4j?"
response = retriever.search(query_text=query_text, top_k=5)
```
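
The example scripts in this repository (for instance `examples/hybrid_search.py`) pass the retriever an embedder object with an `embed_query(text) -> list[float]` method. As a sketch under that assumption, a deterministic stand-in for local experiments could look like the following. `HashEmbedder` is a hypothetical name, it does not subclass the package's `Embedder` base class so that the snippet stays self-contained, and its vectors carry no semantic meaning.

```python
import hashlib
from typing import List


class HashEmbedder:
    """Deterministic placeholder embedder (hypothetical, for illustration only)."""

    def __init__(self, dimensions: int = 1536) -> None:
        self.dimensions = dimensions

    def embed_query(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Hash the text with an increasing counter until enough bytes are available.
        while len(values) < self.dimensions:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            # Scale each byte into the range [0.0, 1.0].
            values.extend(byte / 255 for byte in digest)
            counter += 1
        return values[: self.dimensions]


embedder = HashEmbedder(dimensions=1536)
vector = embedder.embed_query("How do I do similarity search in Neo4j?")
```

Unlike a random embedder, the same text always maps to the same vector, which makes test runs repeatable.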

### Creating a vector index

When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions the embeddings have.

Assumption: Neo4j running

```python
from neo4j import GraphDatabase
from neo4j_genai.indexes import create_vector_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "chunk-index"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    property="textProperty",
    dimensions=1536,
    similarity_fn="euclidean",
)

```
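
The dimension requirement described above can be checked before any data is written. The following is a minimal sketch and not part of `neo4j-genai`: `check_embedding_dimensions` is a hypothetical helper, and `INDEX_DIMENSIONS` is assumed to hold the value passed as `dimensions` to `create_vector_index`.

```python
from typing import Sequence

# Assumed to match the `dimensions` argument used when the index was created.
INDEX_DIMENSIONS = 1536


def check_embedding_dimensions(
    embedding: Sequence[float], expected: int = INDEX_DIMENSIONS
) -> None:
    """Raise ValueError if the embedding length differs from the index dimensions.

    Hypothetical helper for illustration; not part of neo4j-genai.
    """
    if len(embedding) != expected:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, "
            f"but the vector index expects {expected}."
        )


# Usage: validate a vector before storing it.
check_embedding_dimensions([0.0] * 1536)  # passes silently
```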

### Populating the Neo4j Vector Index

This library does not write to the database; that is up to you.
See below for how to write using Cypher via the Neo4j driver.

Assumption: Neo4j running with a defined vector index

```python
from neo4j import GraphDatabase
from random import random

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector
DIMENSION = 1536  # must match the dimensions of the vector index
vector = [random() for _ in range(DIMENSION)]
insert_query = (
    "MERGE (n:Document {id: $id}) "
    "WITH n "
    "CALL db.create.setNodeVectorProperty(n, 'textProperty', $vector) "
    "RETURN n"
)
parameters = {
    "id": 0,
    "vector": vector,
}
driver.execute_query(insert_query, parameters)
```
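
When more than one node needs a vector, the same procedure call can be driven by `UNWIND` so that all rows are sent in one query. This is a sketch rather than a feature of this library: `upsert_vectors`, `build_rows` and `UPSERT_QUERY` are hypothetical names, the `Document` label and `textProperty` key are taken from the example above, and `driver` is assumed to be a Neo4j Python driver instance that exposes `execute_query`.

```python
from typing import Any, Dict, List, Sequence

# Cypher assumed for illustration; label and property key follow the example above.
UPSERT_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:Document {id: row.id}) "
    "WITH n, row "
    "CALL db.create.setNodeVectorProperty(n, 'textProperty', row.vector) "
    "RETURN count(n) AS upserted"
)


def build_rows(
    vectors: Sequence[Sequence[float]], start_id: int = 0
) -> List[Dict[str, Any]]:
    """Pair each vector with a sequential id for use as the $rows parameter."""
    return [
        {"id": start_id + offset, "vector": list(vector)}
        for offset, vector in enumerate(vectors)
    ]


def upsert_vectors(driver: Any, vectors: Sequence[Sequence[float]]) -> Any:
    """Send all vectors to the database in a single query."""
    return driver.execute_query(UPSERT_QUERY, {"rows": build_rows(vectors)})
```

A call such as `upsert_vectors(driver, [[0.1] * 1536, [0.2] * 1536])` would then write two `Document` nodes with ids 0 and 1.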

# Development

## Install dependencies

```bash
poetry install
```

## Getting started

### Issues

If you have a bug to report or feature to request, first
[search to see if an issue already exists](https://docs.github.com/en/github/searching-for-information-on-github/searching-on-github/searching-issues-and-pull-requests#search-by-the-title-body-or-comments).
If a related issue doesn't exist, please raise a new issue using the relevant
[issue form](https://github.com/neo4j/neo4j-genai-python/issues/new/choose).

If you're a Neo4j Enterprise customer, you can also reach out to [Customer Support](http://support.neo4j.com/).

If you don't have a bug to report or a feature to request but need a hand with
the library, community support is available via the [Neo4j Online Community](https://community.neo4j.com/)
and/or [Discord](https://discord.gg/neo4j).

### Make changes

1. Fork the repository.
2. Install Python and Poetry. For more information, see [the development guide](./docs/contributing/DEVELOPING.md).
3. Create a working branch from `main` and start with your changes!

### Pull request

When you're finished with your changes, create a pull request, also known as a PR.

- Ensure that you have [signed the CLA](https://neo4j.com/developer/contributing-code/#sign-cla).
- Ensure that the base of your PR is set to `main`.
- Don't forget to [link your PR to an issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)
if you are solving one.
- Enable the checkbox to [allow maintainer edits](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
so that maintainers can make any necessary tweaks and update your branch for merge.
- Reviewers may ask for changes to be made before a PR can be merged, either using
[suggested changes](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request)
or normal pull request comments. You can apply suggested changes directly through
the UI, and any other changes can be made in your fork and committed to the PR branch.
- As you update your PR and apply changes, mark each conversation as [resolved](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations).

## Run tests

Open a new virtual environment and then run the tests.

```bash
poetry shell
pytest
```

## Further information

- [The official Neo4j Python driver](https://github.com/neo4j/neo4j-python-driver)
- [Neo4j GenAI integrations](https://neo4j.com/docs/cypher-manual/current/genai-integrations/)
59 changes: 59 additions & 0 deletions examples/hybrid_search.py
@@ -0,0 +1,59 @@
from neo4j import GraphDatabase

from random import random
from neo4j_genai.embedder import Embedder
from neo4j_genai.indexes import create_vector_index, create_fulltext_index
from neo4j_genai.retrievers import HybridRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "embedding-name"
FULLTEXT_INDEX_NAME = "fulltext-index-name"
DIMENSION = 1536

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)


# Create Embedder object
class CustomEmbedder(Embedder):
    def embed_query(self, text: str) -> list[float]:
        return [random() for _ in range(DIMENSION)]


embedder = CustomEmbedder()

# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    property="propertyKey",
    dimensions=DIMENSION,
    similarity_fn="euclidean",
)
create_fulltext_index(
    driver, FULLTEXT_INDEX_NAME, label="Document", node_properties=["propertyKey"]
)

# Initialize the retriever
retriever = HybridRetriever(driver, INDEX_NAME, FULLTEXT_INDEX_NAME, embedder)

# Upsert the query
vector = [random() for _ in range(DIMENSION)]
insert_query = (
    "MERGE (n:Document {id: $id})"
    "WITH n "
    "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
    "RETURN n"
)
parameters = {
    "id": 0,
    "vector": vector,
}
driver.execute_query(insert_query, parameters)

# Perform the similarity search for a text query
query_text = "Who are the fremen?"
print(retriever.search(query_text=query_text, top_k=5))
4 changes: 3 additions & 1 deletion examples/openai_search.py
@@ -34,13 +34,15 @@

 # Upsert the query
 vector = [random() for _ in range(DIMENSION)]
+
 insert_query = (
-    "MERGE (n:Document)"
+    "MERGE (n:Document {id: $id})"
     "WITH n "
     "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
     "RETURN n"
 )
 parameters = {
+    "id": 0,
     "vector": vector,
 }
 driver.execute_query(insert_query, parameters)
8 changes: 4 additions & 4 deletions examples/similarity_search_for_text.py
@@ -1,4 +1,3 @@
-from typing import List
 from neo4j import GraphDatabase
 from neo4j_genai import VectorRetriever

@@ -16,9 +15,9 @@
 driver = GraphDatabase.driver(URI, auth=AUTH)


-# Create Embedder object
+# Create CustomEmbedder object with the required Embedder type
 class CustomEmbedder(Embedder):
-    def embed_query(self, text: str) -> List[float]:
+    def embed_query(self, text: str) -> list[float]:
         return [random() for _ in range(DIMENSION)]


@@ -40,12 +39,13 @@ def embed_query(self, text: str) -> List[float]:
 # Upsert the query
 vector = [random() for _ in range(DIMENSION)]
 insert_query = (
-    "MERGE (n:Document)"
+    "MERGE (n:Document {id: $id})"
     "WITH n "
     "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
     "RETURN n"
 )
 parameters = {
+    "id": 0,
     "vector": vector,
 }
 driver.execute_query(insert_query, parameters)