---

copyright:
  years: 2017

lastupdated: "2017-08-18"

---
{:shortdesc: .shortdesc}
{:new_window: target="_blank"}
{:tip: .tip}
{:pre: .pre}
{:codeblock: .codeblock}
{:screen: .screen}
{:javascript: .ph data-hd-programlang='javascript'}
{:java: .ph data-hd-programlang='java'}
{:python: .ph data-hd-programlang='python'}
{:swift: .ph data-hd-programlang='swift'}
The Data Crawler lets you automate the upload of content to the {{site.data.keyword.discoveryshort}} Service.
{: shortdesc}
The Data Crawler is a command-line tool that helps you take your documents from the repositories where they reside (for example, file shares, databases, or Microsoft SharePoint®) and push them to the cloud, where they can be used by the {{site.data.keyword.discoveryshort}} Service.
Use the Data Crawler if you want a managed upload of a significant number of files from a remote system, or if you want to extract content from a supported repository (such as a DB2 database).
The Data Crawler is not intended to be a solution for uploading files from your local drive. To upload files from a local drive, use the tooling or direct API calls.
{: tip}
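For example, a single document on your local drive can be added to a collection with one direct call to the {{site.data.keyword.discoveryshort}} API. The following sketch assumes placeholder service credentials, environment and collection IDs, and a local file path; replace these values, and `{version-date}`, with the values for your own service instance.

```bash
curl -X POST -u "{username}":"{password}" \
  -F "file=@/path/to/sample-document.html" \
  "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents?version={version-date}"
```
{: pre}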
To add content with the Data Crawler, you complete the following high-level tasks:

- Configure the {{site.data.keyword.discoveryshort}} service.
- Download and install the Data Crawler on a supported Linux system that has access to the content that you want to crawl.
- Connect the Data Crawler to your content.
- Configure the Data Crawler to connect to the {{site.data.keyword.discoveryshort}} Service.
- Crawl your content, as shown in the sketch after this list.
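After the Data Crawler is installed and configured, you start a crawl from the command line on the Linux system where it is installed. The following sketch assumes a typical setup in which the `crawler` script is on your PATH and your configuration file is at `config/config.conf`; adjust the command and paths for your environment.

```bash
crawler crawl --config config/config.conf
```
{: pre}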
You can get started quickly with the Data Crawler by following the example in Getting started with the Data Crawler.