Arachnid is a simple service that allows your JavaScript-powered application to be fully indexed by SEO spiders.
You must configure your server to redirect traffic from search bots to your Arachnid instance.
Arachnid works by inspecting a custom HTTP header, `x-original-uri`, and then hitting the configured hostname at the URL that header provides. A PhantomJS instance then executes all your JavaScript code and returns the final page HTML.
Optionally, Arachnid can save scraped pages to a folder of your choice, so that subsequent requests for the same resource are faster.
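As a rough sketch of the setup this implies, here is a minimal Node.js front-end proxy that routes search-bot traffic to Arachnid and everything else to your application. The ports, the user-agent pattern, and the idea that Arachnid only needs the `x-original-uri` header (not the request path) are assumptions for illustration; in production you would more likely do this routing in your web server:

```js
// Illustrative only -- ports, bot list, and header handling are assumptions.
var http = require('http');

var BOT_UA = /googlebot|bingbot|yandex|baiduspider/i;

http.createServer(function (req, res) {
  var isBot = BOT_UA.test(req.headers['user-agent'] || '');

  // Bots are sent to Arachnid (assumed on port 3000) with the original URI
  // in the custom header; other clients hit the app (assumed on port 8080).
  var options = {
    host: 'localhost',
    port: isBot ? 3000 : 8080,
    path: isBot ? '/' : req.url,
    headers: isBot ? { 'x-original-uri': req.url } : req.headers
  };

  http.get(options, function (upstream) {
    res.writeHead(upstream.statusCode, upstream.headers);
    upstream.pipe(res);
  }).on('error', function () {
    res.writeHead(502);
    res.end();
  });
}).listen(80); // front port; adjust as needed
```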
For more info, check out our blog post on Arachnid at the Clubjudge blog.
Arachnid expects a `config.js` file to be present in the project root. It ships with a `config.js.example` file listing all available options. These are:
| Option | Type | Description |
| --- | --- | --- |
| `folder` | String | The folder path where Arachnid should save scraped pages. Only has any effect if `writeToFile` is set to `true`. |
| `host` | String | The hostname to query URLs against. This value must be a valid URL for Arachnid to run correctly. |
| `port` | Number | The port where the service should run. |
| `timeout` | Number | The maximum time, in milliseconds, that Arachnid should wait for your page to finish rendering before it returns the current HTML snapshot. |
| `writeToFile` | Boolean | Whether Arachnid should write scraped pages to disk. Works in tandem with the `folder` option. |
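For reference, a filled-in `config.js` built from the options above might look like the following. This is a sketch only (the export shape and all values are placeholder assumptions); defer to the shipped `config.js.example`:

```js
// config.js -- placeholder values; copy config.js.example and adjust instead.
module.exports = {
  folder: '/var/arachnid/snapshots', // only used when writeToFile is true
  host: 'http://www.example.com',    // site whose pages Arachnid renders
  port: 3000,                        // port the service listens on
  timeout: 5000,                     // max render wait before snapshotting, in ms
  writeToFile: true                  // persist snapshots to `folder`
};
```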
Install Node.js (v0.8.11 works fine).
```
npm install -g phantomjs
npm install -g forever
npm install
```
```
npm start
```

Starting will spin up an instance of `bin/arachnid` using Forever. Any CLI arguments will be passed along to Forever.
```
npm stop
```
Over time, the folder where Arachnid saves its scraped pages can become too large. There's an included utility script that will clear this folder for you:

```
npm run-script pruneFiles
```
An example of how to set up this task to run regularly through cron would be:

```
min hour dayOfMonth month dayOfWeek cd /PATH/TO/ARACHNID/; /PATH/TO/NPM run-script pruneFiles
```
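For instance, to prune every night at 3:00 AM (the schedule and both paths below are purely illustrative):

```
# illustrative schedule and paths -- substitute your own
0 3 * * * cd /var/apps/arachnid; /usr/local/bin/npm run-script pruneFiles
```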