Fix broken links and mistakes (#426)
NataliaIvakina authored Apr 8, 2022
1 parent e8fc5ce commit 143fca0
Showing 4 changed files with 75 additions and 71 deletions.
15 changes: 6 additions & 9 deletions doc/docs/modules/ROOT/pages/gds.adoc
@@ -14,18 +14,17 @@ GDS algorithms are bucketed into five groups:
== GDS operates via Cypher

All of the link:{url-neo4j-gds-manual}[functionality of GDS] is used by issuing Cypher queries. As such, it is easily
All of the link:{url-neo4j-gds-manual}[functionality of GDS^] is used by issuing Cypher queries. As such, it is easily
accessible via Spark, because the Neo4j Connector for Apache Spark can issue Cypher queries and read their results back. This combination means
that you can use Neo4j and GDS as a graph co-processor in an existing ML workflow that you may implement in Apache Spark.
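
For instance, the following minimal sketch (in Scala; the URL and the procedure call are illustrative placeholders, not taken from the manual) shows that calling a GDS procedure is just another `query`-mode read:

[source,scala]
----
// Illustrative sketch: any GDS call is simply a Cypher query issued through the connector.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val gdsGraphs = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  // Stored procedures must end with a RETURN clause (see the notes below).
  .option("query", "CALL gds.graph.list() YIELD graphName, nodeCount RETURN graphName, nodeCount")
  .option("partitions", "1")
  .load()

gdsGraphs.show()
----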

== Example

In the link:{url-gh-spark-notebooks}[sample Zeppelin Notebook repository], there is a GDS example that can be run against
a Neo4j Sandbox, showing how to use the two together.
In the link:{url-gh-spark-notebooks}[sample Zeppelin Notebook repository^], there is a GDS example that can be run against a Neo4j Sandbox, showing how to use the two together.

=== Create a virtual graph in GDS using Spark

This is very simple, straightforward code; it constructs the right Cypher statement to link:https://neo4j.com/docs/graph-data-science/current/common-usage/creating-graphs/[create a virtual graph in GDS] and returns the results.
This is very simple, straightforward code; it constructs the right Cypher statement to link:https://neo4j.com/docs/graph-data-science/current/common-usage/projecting-graphs/[create a virtual graph in GDS^] and returns the results.

[source,python]
----
@@ -50,13 +49,11 @@ df = spark.read.format("org.neo4j.spark.DataSource") \
.load()
----

[NOTE]
If you get a `A graph with name [name] already exists` error, take a look at this xref:faq.adoc#graph-already-exists[FAQ].
[TIP]
If you get the _A graph with name [name] already exists_ error, take a look at this xref:faq.adoc#graph-already-exists[FAQ].

[NOTE]
**Ensure that option `partitions` is set to 1. You do not want to execute this query in parallel; it should be executed only once.**

[NOTE]
**When you use stored procedures, you must include a `RETURN` clause.**
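
Putting the two notes together, a hypothetical graph-creation call issued through the connector could look like the following sketch (the graph name, node label, and relationship type are placeholders, and newer GDS releases use `gds.graph.project` in place of `gds.graph.create`):

[source,scala]
----
// Sketch only: create the virtual graph exactly once (partitions = 1)
// and end the stored procedure call with a RETURN clause.
val created = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("partitions", "1")
  .option("query",
    """CALL gds.graph.create('myGraph', 'Person', 'KNOWS')
      |YIELD graphName, nodeCount, relationshipCount
      |RETURN graphName, nodeCount, relationshipCount""".stripMargin)
  .load()

created.show()
----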

=== Run a GDS analysis and stream the results back
@@ -90,7 +87,7 @@ df.show()

=== Streaming versus persisting GDS results

When link:https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/[running GDS algorithms], the library gives you the choice
When link:https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/[running GDS algorithms^], the library gives you the choice
of either streaming the algorithm results back to the caller, or mutating the underlying graph. Using GDS together with Spark provides an
additional option of transforming or otherwise using a GDS result. Ultimately, either modality works with the Neo4j Connector for Apache
Spark, and you choose what's best for your use case.
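
As an illustrative sketch of the two modalities (the procedure names are from the GDS manual; the graph name, property name, and connection details are placeholders):

[source,scala]
----
// Stream the algorithm results straight back into a Spark DataFrame...
val streamed = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("partitions", "1")
  .option("query",
    """CALL gds.pageRank.stream('myGraph')
      |YIELD nodeId, score
      |RETURN gds.util.asNode(nodeId).name AS name, score""".stripMargin)
  .load()

// ...or persist the scores into the underlying graph and read back only a summary row.
val persisted = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("partitions", "1")
  .option("query",
    """CALL gds.pageRank.write('myGraph', {writeProperty: 'pagerank'})
      |YIELD nodePropertiesWritten
      |RETURN nodePropertiesWritten""".stripMargin)
  .load()
----
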
30 changes: 15 additions & 15 deletions doc/docs/modules/ROOT/pages/overview.adoc
@@ -1,31 +1,30 @@

= Project Overview
= Project overview

:description: This chapter provides an introduction to the Neo4j Connector for Apache Spark.
:description: This chapter provides an introduction to the Neo4j Connector for Apache Spark.

== Overview
The Neo4j Connector for Apache Spark is intended to make integrating graphs with Spark easy.

The Neo4j Connector for Apache Spark is intended to make integrating graphs together with Spark easy. There are effectively two ways of using the connector:
There are effectively two ways of using the connector:

- **As a data source**: read any set of nodes or relationships as a DataFrame in Spark.
- **As a sink**: write any DataFrame to Neo4j as a collection of nodes or relationships, or alternatively; use a
Cypher statement to process records in a DataFrame into the graph pattern of your choice.
- **As a data source**: you can read any set of nodes or relationships as a DataFrame in Spark.
- **As a sink**: you can write any DataFrame to Neo4j as a collection of nodes or relationships or use a Cypher statement to process records in a DataFrame into the graph pattern of your choice.
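
A minimal sketch of both modes, assuming an existing `SparkSession` named `spark` (the URL and the `:Person` label are placeholders):

[source,scala]
----
import org.apache.spark.sql.SaveMode

// As a data source: read Person nodes into a DataFrame.
val people = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", ":Person")
  .load()

// As a sink: append the rows of a DataFrame back to Neo4j as Person nodes.
people.write.format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Append)
  .option("url", "bolt://localhost:7687")
  .option("labels", ":Person")
  .save()
----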

== Multiple languages support

Because the connector is based on the new Spark DataSource API, it also works with Spark interpreters for other languages, such as Python and R.

The API remains the same, and mostly only slight syntax changes are necessary to accomodate the differences between (for example) Python
The API remains the same, and mostly only slight syntax changes are necessary to accommodate the differences between (for example) Python
and Scala.

== Compatibility

=== Neo4j compatibility
This connector works with Neo4j 3.5, and the entire 4.x series of Neo4j, whether run as a single instance,
in Causal Cluster mode, or run as a managed service in Neo4j AuraDB. The connector does not rely on enterprise features, and as
such works with Neo4j Community as well, with the appropriate version number.
This connector works with Neo4j 3.5 and the entire 4.x series of Neo4j, whether run as a single instance,
in Causal Cluster mode, or run as a managed service in Neo4j AuraDB. The connector does not rely on Enterprise Edition features and as
such works with Neo4j Community Edition as well, with the appropriate version number.

[NOTE]
[TIP]
**Neo4j versions prior to 3.5 are not supported.**

=== Spark and Scala compatibility
@@ -36,7 +35,8 @@ This connector currently supports:
- Spark 3.0+ with Scala 2.12.

Depending on the combination of Spark and Scala versions, you need a different JAR.
JARs are named in the form `neo4j-connector-apache-spark_${scala.version}_${connector.version}_for_${spark.version}`
JARs are named in the form:
`neo4j-connector-apache-spark_${scala.version}_${connector.version}_for_${spark.version}`

Ensure that you have the appropriate JAR file for your environment.
Here's a compatibility table to help you choose the correct JAR.
@@ -67,6 +67,6 @@ This connector is provided under the terms of the Apache 2.0 license, which can

== Support

For Neo4j Enterprise and Neo4j AuraDB customers, official releases of this connector are supported under the terms of your existing Neo4j support agreement. This support extends only to regular releases, and excludes
alpha, beta, and pre-releases. If you have any questions about the support policy, please get in touch with
For Neo4j Enterprise and Neo4j AuraDB customers, official releases of this connector are supported under the terms of your existing Neo4j support agreement. This support extends only to regular releases and excludes
alpha, beta, and pre-releases. If you have any questions about the support policy, get in touch with
Neo4j.
75 changes: 41 additions & 34 deletions doc/docs/modules/ROOT/pages/reading.adoc
@@ -3,11 +3,11 @@

:description: The chapter explains how to read data from a Neo4j database.

Neo4j Connector for Apache Spark allows you to read data from a Neo4j innstance in 3 different ways:
Neo4j Connector for Apache Spark allows you to read data from a Neo4j instance in three different ways:

* By node labels.
* By relationship type.
* By Cypher query.
* By node labels
* By relationship type
* By Cypher query

== Getting started

@@ -45,8 +45,8 @@ spark.read.format("org.neo4j.spark.DataSource")
|Yes^*^

|`labels`
|List of node labels separated by `:`.
The first label is to be the primary label
|List of node labels separated by a colon.
The first label is interpreted as the primary label.
|_(none)_
|Yes^*^

@@ -56,24 +56,24 @@ The first label is to be the primary label
|Yes^*^

|`schema.flatten.limit`
|Number of records to be used to create the Schema (only if APOC is not installed)
|Number of records to be used to create the Schema (only if APOC is not installed).
|`10`
|No

|`schema.strategy`
|Strategy used by the connector in order to compute the Schema definition for the Dataset.
Possibile values are `string`, `sample`.
Possible values are `string`, `sample`.
When `string` is set, it coerces all the properties to String; otherwise, it tries to sample the Neo4j dataset.
|`sample`
|No

|`pushdown.filters.enabled`
|Enable or disable the PushdownFilters support
|Enable or disable the PushdownFilters support.
|`true`
|No

|`pushdown.columns.enabled`
|Enable or disable the PushdownColumn support
|Enable or disable the PushdownColumn support.
|`true`
|No

@@ -111,12 +111,12 @@ every single node property as column prefixed by `source` or `target`
|No

|`relationship.source.labels`
|List of source node labels separated by `:`
|List of source node labels separated by a colon.
|_(empty)_
|Yes

|`relationship.target.labels`
|List of target node labels separated by `:`
|List of target node labels separated by a colon.
|_(empty)_
|Yes

@@ -126,7 +126,7 @@

== Read data

Reading data from a Neo4j Database can be done in 3 ways:
Reading data from a Neo4j Database can be done in three ways:

* <<read-query,Custom Cypher query>>
* <<read-node,Node>>
@@ -145,7 +145,7 @@ val spark = SparkSession.builder().getOrCreate()
spark.read.format("org.neo4j.spark.DataSource")
.option("url", "bolt://localhost:7687")
.option("query", "MATCH (n:Person) WITH n LIMIT 2 RETURN id(n) as id, n.name as name")
.option("query", "MATCH (n:Person) WITH n LIMIT 2 RETURN id(n) AS id, n.name AS name")
.load()
.show()
----
@@ -158,14 +158,18 @@ spark.read.format("org.neo4j.spark.DataSource")
|1|Jane Doe
|===

[NOTE]
We recommend individual property fields to be returned, rather than returning graph entity (node, relationship, and path) types.
This best maps to Spark's type system and yields the best results.
So instead of writing:
`MATCH (p:Person) RETURN p`
write the following:
`MATCH (p:Person) RETURN id(p) as id, p.name as name`.
[TIP]
====
We recommend returning individual property fields rather than whole graph entities (nodes, relationships, and paths). This maps best to Spark's type system and yields the best results. So instead of writing:
`MATCH (p:Person) RETURN p`
write the following:
`MATCH (p:Person) RETURN id(p) AS id, p.name AS name`.
If your query returns a graph entity, use the `labels` or `relationship` modes instead.
====

The structure of the Dataset returned by the query is influenced by the query itself.
In this particular context, it could happen that the connector isn't able to sample the Schema from the query,
@@ -216,7 +220,7 @@ This does not cause any problems since you have no data in your dataset.
For example, you have this query:
[source]
----
MATCH (n:NON_EXISTENT_LABEL) RETURN id(n) as id, n.name, n.age
MATCH (n:NON_EXISTENT_LABEL) RETURN id(n) AS id, n.name, n.age
----

The created schema is the following:
@@ -229,39 +233,42 @@ The created schema is the following:
|n.age|String
|===

[NOTE]
[TIP]
====
The returned column order is not guaranteed to match the RETURN statement for Neo4j 3.x and Neo4j 4.0.
Starting from Neo4j 4.1 the order is the same.
====

[[limit-query]]
==== Limit the results

This connector does not permit using `SKIP` or `LIMIT` at the end of a Cypher query.
Attempts to do this result in errors, such as the message:
"SKIP/LIMIT are not allowed at the end of the query".
This connector does not permit using `SKIP` or `LIMIT` at the end of a Cypher query. +
Attempts to do this result in errors, such as the message: +
_SKIP/LIMIT are not allowed at the end of the query_.

This is not supported because the connector internally uses SKIP/LIMIT pagination to break the result set into multiple partitions and thus support partitioned reads.
As a result, user-provided SKIP/LIMIT clashes with what the connector itself adds to your query to support parallelism.
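
Purely as an illustration of where the clash comes from (the exact pagination the connector generates is an internal detail), the following sketch shows a partitioned read in `labels` mode, where the connector adds the SKIP/LIMIT windows itself:

[source,scala]
----
// With a partitioned read, the connector splits the result set into SKIP/LIMIT
// windows (roughly "SKIP 0 LIMIT n", "SKIP n LIMIT n", ...), one per partition.
// A trailing user-written SKIP/LIMIT would collide with that generated suffix.
val people = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", ":Person")
  .option("partitions", "4")
  .load()
----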

There is a workaround, though: you can still accomplish the same result by using `SKIP / LIMIT` inside the query, rather than after the final `RETURN` clause of the query.

Here's a simple example.
Here's an example.
This first query is rejected and fails:

[source,cypher]
----
MATCH (p:Person)
RETURN p.name as name
RETURN p.name AS name
ORDER BY name
LIMIT 10
----

However this query can be reformulated and works:
However, you can reformulate this query to make it work:

[source,cypher]
----
MATCH (p:Person)
WITH p.name as name
WITH p.name AS name
ORDER BY name
LIMIT 10
RETURN name
Expand Down Expand Up @@ -303,7 +310,7 @@ spark.read.format("org.neo4j.spark.DataSource")
----

[NOTE]
Label list can be specified both with starting colon or without it:
The label list can be specified with or without the starting colon: +
`Person:Customer` and `:Person:Customer` are considered the same thing.
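
For example, both of the reads in the following sketch (connection details assumed) select the same set of nodes:

[source,scala]
----
// With the starting colon...
val withColon = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", ":Person:Customer")
  .load()

// ...and without it: both select nodes carrying all the listed labels.
val withoutColon = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person:Customer")
  .load()
----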

==== Columns
@@ -391,7 +398,7 @@ The result format can be controlled by the `relationship.nodes.map` option (defa
When it is set to `false`, source and target nodes properties are returned in separate columns
prefixed with `source.` or `target.` (i.e., `source.name`, `target.price`).

When it is set to `true`, the source and target nodes properties are returned as Map[String, String] in two columns named `source`and `target`.
When it is set to `true`, the source and target nodes properties are returned as Map[String, String] in two columns named `source` and `target`.
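
A sketch of both settings, assuming a `BOUGHT` relationship between `:Person` and `:Product` nodes (these names are placeholders) and an existing `SparkSession` named `spark`:

[source,scala]
----
// Flattened columns such as `source.name` and `target.price` (default behavior).
val flatDf = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Person")
  .option("relationship.target.labels", ":Product")
  .option("relationship.nodes.map", "false")
  .load()

// Two map columns named `source` and `target` instead.
val mapDf = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Person")
  .option("relationship.target.labels", ":Product")
  .option("relationship.nodes.map", "true")
  .load()
----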

[[rel-schema-no-map]]
.Nodes map set to `false`
@@ -518,7 +525,7 @@ Use the correct prefix:
If `relationship.nodes.map` is set to `false`:

* ``\`source.[property]` `` for the source node properties.
* ``\`rel.[property]` `` for the relation property.
* ``\`rel.[property]` `` for the relationship property.
* ``\`target.[property]` `` for the target node property.

[source,scala]
@@ -541,7 +548,7 @@ df.where("`source.id` = 14 AND `target.id` = 16")
If `relationship.nodes.map` is set to `true`:

* ``\`<source>`.\`[property]` `` for the source node map properties.
* ``\`<rel>`.\`[property]` `` for the relation map property.
* ``\`<rel>`.\`[property]` `` for the relationship map property.
* ``\`<target>`.\`[property]` `` for the target node map property.

In this case, all the map values are strings, so the filter value must be a string too.
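
For example, a sketch of the equivalent filter in this mode, reusing the `df` from the previous example and quoting the values because the map values are strings:

[source,scala]
----
// Map-based prefixes require backticks around each segment and string-typed values.
df.where("`<source>`.`id` = '14' AND `<target>`.`id` = '16'")
----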
