Fix and check links with hyperlink (#271)
* Fix and check links with hyperlink

* Add CI jobs

* Trigger if documentation changed

* Build on fork

* Remove temporary build on fork repo

* Remove comment

* Remove console.log
ggrossetie authored Feb 17, 2021
1 parent f1ccd65 commit 0f58d2c
Showing 12 changed files with 143 additions and 23 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,28 @@
name: Docs

on:
  push:
    branches:
      - '4.0'
      - 'master'
  pull_request:
    branches:
      - '*'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Use Node.js 14
        uses: actions/setup-node@v1
        with:
          node-version: '14'
      - run: npm install
        working-directory: 'doc'
      - run: npm run build:docs
        working-directory: 'doc'
      - run: npm run lint:links
        working-directory: 'doc'
20 changes: 20 additions & 0 deletions .github/workflows/notify.yml
@@ -0,0 +1,20 @@
name: Trigger Publish

on:
  push:
    paths:
      - 'doc/docs'
    branches:
      - '4.0'

jobs:
  trigger_publish:
    runs-on: ubuntu-latest

    steps:
      - name: Trigger Developer Event
        uses: peter-evans/repository-dispatch@master
        with:
          token: ${{ secrets.BUILD_ACCESS_TOKEN }}
          repository: neo4j-documentation/docs-refresh
          event-type: spark-connector
17 changes: 14 additions & 3 deletions doc/docs.yml
@@ -1,16 +1,27 @@
site:
  title: Neo4j Connector for Apache Spark User Guide
  url: /neo4j-spark-docs

content:
  sources:
  - url: ../
    branches: HEAD
    start_path: doc/docs

output:
  dir: ./build/site/developer

ui:
  bundle:
    url: https://s3-eu-west-1.amazonaws.com/static-content.neo4j.com/build/ui-bundle.zip
    snapshot: true

urls:
  html_extension_style: indexify

asciidoc:
  attributes:
-    page-theme: docs
-    page-cdn: /_/
+    experimental: ''
+    page-cdn: /static/assets
+    page-theme: developer
+    page-canonical-root: /developer
+    page-disabletracking: true
3 changes: 3 additions & 0 deletions doc/docs/antora.yml
@@ -10,3 +10,6 @@ asciidoc:
    theme: docs
    connector-version: 4.0.0
    copyright: Neo4j Inc.
+    url-neo4j-product-gds-lib: https://neo4j.com/product/graph-data-science-library/
+    url-gh-spark-notebooks: https://github.com/utnaf/neo4j-connector-apache-spark-notebooks
+    url-neo4j-gds-manual: https://neo4j.com/docs/graph-data-science/current/
8 changes: 4 additions & 4 deletions doc/docs/modules/ROOT/pages/architecture.adoc
@@ -168,7 +168,7 @@ MERGE (c)-[:BOUGHT { quantity: event.quantity }]->(p);
```

Notice that in this case the entire job can be done by a single Cypher statement. As data frames get complex,
these Cypher statements too can get quite complex.
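As an illustration, here is a minimal PySpark sketch of the single-statement approach, assuming the connector's write `query` option (each dataframe row is bound to the query as `event`, as in the `MERGE` above); the column names are hypothetical:

```python
# Hedged sketch: push the whole event through one Cypher statement.
# Columns (product_id, customer_id, quantity) are illustrative.
df.write.format("org.neo4j.spark.DataSource") \
    .mode("Append") \
    .option("url", "bolt://localhost:7687") \
    .option("query", """
        MERGE (p:Product {id: event.product_id})
        MERGE (c:Customer {id: event.customer_id})
        MERGE (c)-[:BOUGHT {quantity: event.quantity}]->(p)
    """) \
    .save()
```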

==== Pros

@@ -248,7 +248,7 @@ available on the server**.
It's impossible to pick a single batch size that works for everyone, because how much memory your transactions
take up depends on the number of properties & relationships, and other factors. A good, reasonably aggressive value
to try is around 20,000 - but you can increase this number if your data is small, or if you have a lot of memory
on the server. Lower the number if it's a small database server, or if the data you're pushing has many large
properties.
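For example, a hedged sketch of a write that raises the batch size (the label and connection details are illustrative; `batch.size` is the option discussed above):

```python
# Hedged sketch: larger batches mean fewer transactions for the same data.
df.write.format("org.neo4j.spark.DataSource") \
    .mode("Append") \
    .option("url", "bolt://localhost:7687") \
    .option("labels", ":Person") \
    .option("batch.size", "20000") \
    .save()
```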

=== Tune your Neo4j Memory Configuration
@@ -266,7 +266,7 @@ At the Neo4j Cypher level, it's very common to use the Spark connector in a way
In Neo4j, this looks up a node by some "key" and then creates it only if it does not already exist.

[NOTE]
**It is strongly recommended to assert indexes or constraints on any graph property that you use as part of
`node.keys`, `relationship.source.node.keys`, `relationship.target.node.keys` or other similar key options**
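A minimal sketch of a keyed write that benefits from such a constraint, assuming `:Person(id)` is already constrained (e.g. `CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE`); names are illustrative:

```python
# Hedged sketch: Overwrite mode merges on the declared key, which is only
# fast when :Person(id) is backed by an index or uniqueness constraint.
df.write.format("org.neo4j.spark.DataSource") \
    .mode("Overwrite") \
    .option("url", "bolt://localhost:7687") \
    .option("labels", ":Person") \
    .option("node.keys", "id") \
    .save()
```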

A common source of poor performance is to write Spark code that generates `MERGE` cypher, or otherwise tries
@@ -314,7 +314,7 @@ extreme cases with too much parallelism, Neo4j may reject the writes with lock c
[NOTE]
**You can use as many partitions as there are cores in the Neo4j server, if you have properly partitioned your data to avoid Neo4j locks**

There is an exception to the "1 partition" rule above; if your data writes are partitioned ahead of time to avoid locks, you
can generally do as many write threads to Neo4j as there are cores in the server. Suppose we want to write a long list of `:Person` nodes, and we know they are distinct by the person `id`. We might stream those into Neo4j in 4 different partitions, as there will not be any lock contention.
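A minimal sketch of that pattern (the dataframe and column names are hypothetical):

```python
# Hedged sketch: 4 writer partitions are safe here because rows are
# distinct by `id`, so no two partitions touch the same node.
people_df.repartition(4, "id") \
    .write.format("org.neo4j.spark.DataSource") \
    .mode("Append") \
    .option("url", "bolt://localhost:7687") \
    .option("labels", ":Person") \
    .save()
```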

== Schema Considerations
8 changes: 4 additions & 4 deletions doc/docs/modules/ROOT/pages/faq.adoc
@@ -3,9 +3,9 @@

== How can I speed up writes to Neo4j?

The Spark connector fundamentally writes data to Neo4j in batches. Neo4j is a transactional
database, and so all modifications are made within a transaction. Those transactions in turn
have overhead.

The two simplest ways of increasing write performance are:

* Increase the batch size (option `batch.size`). The larger the batch, the fewer transactions are executed to write all of your data, and the less transactional overhead is incurred.
@@ -35,7 +35,7 @@ environment operates in terms of DataFrames as it always did, and this connector

== Can this connector be used for pre-processing of data and loading into Neo4j?

-Yes. This connector enables spark to be used as a good method of loading data directly into Neo4j. See link:architecture.adoc[the architecture section] for a detailed discussion of
+Yes. This connector enables spark to be used as a good method of loading data directly into Neo4j. See xref:architecture.adoc[the architecture section] for a detailed discussion of
"Normalized Loading" vs. "Cypher Destructuring" and guidance on different approaches for how to do performant data loads into Neo4j.

== My writes are failing due to Deadlock Exceptions
@@ -46,7 +46,7 @@ link:https://neo4j.com/developer/kb/explanation-of-error-deadlockdetectedexcepti

Typically this is caused by too much parallelism in writing to Neo4j. For example, when you
write a relationship `(:A)-[:REL]->(:B)`, this creates a "lock" in the database on both nodes.
If another thread is simultaneously attempting to write to those nodes, deadlock
exceptions can result and a transaction will fail.

In general, the solution is to repartition the dataframe prior to writing it to Neo4j, to avoid
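A minimal sketch of that idea, assuming the write `query` option (rows bound as `event`) and the `(:A)-[:REL]->(:B)` shape from the example above; collapsing to one partition trades parallelism for zero lock contention:

```python
# Hedged sketch: serialize relationship writes to avoid deadlocks on
# shared nodes. Labels, keys, and columns (a_id, b_id) are illustrative.
rels_df.coalesce(1) \
    .write.format("org.neo4j.spark.DataSource") \
    .mode("Append") \
    .option("url", "bolt://localhost:7687") \
    .option("query", """
        MATCH (a:A {id: event.a_id}), (b:B {id: event.b_id})
        MERGE (a)-[:REL]->(b)
    """) \
    .save()
```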
10 changes: 5 additions & 5 deletions doc/docs/modules/ROOT/pages/gds.adoc
@@ -5,7 +5,7 @@
This chapter provides information on using the Neo4j Connector for Apache Spark with Neo4j's Graph Data Science Library.
--

-link:https://neo4j.com/graph-data-science-library/[Neo4j's Graph Data Science (GDS) Library] lets data scientists benefit from powerful graph algorithms. It provides unsupervised machine learning methods and heuristics that learn and describe the topology of your graph. The GDS Library includes hardened graph algorithms with enterprise features, like deterministic seeding for consistent results and reproducible machine learning workflows.
+link:{url-neo4j-product-gds-lib}[Neo4j's Graph Data Science (GDS) Library] lets data scientists benefit from powerful graph algorithms. It provides unsupervised machine learning methods and heuristics that learn and describe the topology of your graph. The GDS Library includes hardened graph algorithms with enterprise features, like deterministic seeding for consistent results and reproducible machine learning workflows.

GDS Algorithms are bucketed into 5 "families":

@@ -17,13 +17,13 @@ GDS Algorithms are bucketed into 5 "families":
== GDS Operates via Cypher

-All of the link:https://neo4j.com/docs/graph-data-science/current/[functionality of GDS] is used by issuing cypher queries. As such, it is easily
+All of the link:{url-neo4j-gds-manual}[functionality of GDS] is used by issuing cypher queries. As such, it is easily
accessible via Spark, because the Neo4j Connector for Apache Spark can issue Cypher queries and read their results back. This combination means
that you can use Neo4j & GDS as a graph co-processor in an existing ML workflow that you may implement in Apache Spark.

== Example

-In the link:https://github.com/utnaf/spark-connector-notebooks[sample Zeppelin Notebook repository], there is a GDS example that can be run against
+In the link:{url-gh-spark-notebooks}[sample Zeppelin Notebook repository], there is a GDS example that can be run against
a Neo4j Sandbox, showing how to use the two together.

=== Create a Virtual Graph in GDS Using Spark
@@ -69,7 +69,7 @@ To run an analysis, the result is just another Cypher query, executed as a spark
%pyspark
query = """
CALL gds.pageRank.stream('got-interactions')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
"""
@@ -92,7 +92,7 @@ df.show()
=== Streaming versus Persisting GDS Results

When link:https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/[running GDS algorithms] the library gives you the choice
of either streaming the results of the algorithm back to the caller, or mutating the underlying graph. Using GDS together with Spark provides an
additional option of transforming or otherwise using a GDS result. Ultimately, either modality will work with the Neo4j Connector for Apache
Spark, and it is left up to you to decide what's best for your use case.
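For the persisting side, a hedged sketch: the write-mode variant of an algorithm can be invoked through the same read mechanism, since the connector only requires that the query return something (the procedure and YIELD column follow the GDS manual; the graph name comes from the example above):

```python
# Hedged sketch: persist PageRank scores into the graph, then confirm.
write_query = """
CALL gds.pageRank.write('got-interactions', {writeProperty: 'pageRank'})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
"""

result = spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", "bolt://localhost:7687") \
    .option("query", write_query) \
    .load()

result.show()
```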

2 changes: 1 addition & 1 deletion doc/docs/modules/ROOT/pages/quickstart.adoc
@@ -312,7 +312,7 @@ RETURN count(p) AS count

=== Examples

-You can find examples on how to use the Neo4j Connector for Apache Spark at link:https://github.com/utnaf/spark-connector-notebooks[this repository].
+You can find examples on how to use the Neo4j Connector for Apache Spark at link:{url-gh-spark-notebooks}[this repository].
It's a collection of Zeppelin Notebooks with different usage scenarios, along with a getting started guide.

The repository is under constant development, so feel free to submit your own examples.
4 changes: 2 additions & 2 deletions doc/docs/modules/ROOT/pages/reading.adoc
@@ -32,7 +32,7 @@ spark.read.format("org.neo4j.spark.DataSource")

.List of available read options
|===
|Setting Name |Description |Default Value |Required

|`query`
|Cypher query to read the data
@@ -161,7 +161,7 @@ If your query returns a graph entity please use the `labels` or `relationship` m

The struct of the Dataset returned by the query is influenced by the query itself;
in this particular context it could happen that the connector won't be able to sample the schema from the query,
-in these cases we suggest trying with the option `schema.strategy` set to `string` as described <<bookmark-string-strategy,here>>.
+in these cases we suggest trying with the option `schema.strategy` set to `string` as described xref:quickstart.adoc#bookmark-string-strategy[here].
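A hedged sketch of that fallback (the query and connection details are illustrative):

```python
# Hedged sketch: force sampled columns to strings when inference fails.
df = spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", "bolt://localhost:7687") \
    .option("schema.strategy", "string") \
    .option("query", "MATCH (p:Person) RETURN p.name AS name, p.age AS age") \
    .load()
```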

[NOTE]
Read query *must always* return some data (read: *must always* have a return statement).
8 changes: 5 additions & 3 deletions doc/package.json
@@ -15,7 +15,8 @@
"server": "forever start server.js",
"start": "npm run server && npm-watch",
"stop": "forever stop server.js",
"build:docs": "antora --fetch --stacktrace docs.yml"
"build:docs": "antora --fetch --stacktrace docs.yml",
"lint:links": "node tasks/lint-links.js"
},
"license": "ISC",
"dependencies": {
@@ -25,7 +26,8 @@
  },
  "devDependencies": {
    "express": "^4.17.1",
-    "npm-watch": "^0.7.0",
-    "forever": "^3.0.2"
+    "forever": "^3.0.2",
+    "hyperlink": "^4.6.0",
+    "npm-watch": "^0.7.0"
  }
}
4 changes: 3 additions & 1 deletion doc/server.js
@@ -3,6 +3,8 @@ const express = require('express')
const app = express()
app.use(express.static('./build/site'))

-app.get('/', (req, res) => res.redirect('/spark'))
+app.use('/static/assets', express.static('./build/site/developer/_'))
+
+app.get('/', (req, res) => res.redirect('/developer/spark'))

app.listen(8000, () => console.log('📘 http://localhost:8000'))
54 changes: 54 additions & 0 deletions doc/tasks/lint-links.js
@@ -0,0 +1,54 @@
const path = require('path')
const hyperlink = require('hyperlink')
const TapRender = require('@munter/tap-render')

const root = path.join(__dirname, '..')

;(async () => {
  const tapRender = new TapRender()
  tapRender.pipe(process.stdout)
  try {
    const skipPatterns = [
      // initial redirect
      'load index.html',
      'load docs/index.html',
      // google fonts
      'load https://fonts.googleapis.com/',
      // static resources
      'load static/assets',
      // external links
      // /
      'load try-neo4j',
      // /developer
      'load developer',
      // /labs
      'load labs',
      // /docs
      'load docs',
      // rate limit on twitter.com (will return 400 code if quota exceeded)
      'external-check https://twitter.com/neo4j',
      // workaround: not sure why the following links are not resolved properly by hyperlink :/
      'load build/site/developer/spark/quickstart/reading',
      'load build/site/developer/spark/quickstart/writing'
    ]
    const skipFilter = (report) => {
      return Object.values(report).some((value) => {
        return skipPatterns.some((pattern) => String(value).includes(pattern))
      })
    }
    await hyperlink(
      {
        root,
        inputUrls: ['build/site/developer/spark/index.html'],
        skipFilter: skipFilter,
        recursive: true
      },
      tapRender
    )
  } catch (err) {
    console.log(err.stack)
    process.exit(1)
  }
  const results = tapRender.close()
  process.exit(results.fail ? 1 : 0)
})()
