The Hadoop connector reads HDFS files in batch scenarios. Its main features are:
- Reading files from multiple HDFS directories at the same time
- Reading HDFS files in several formats
<dependency>
  <groupId>com.bytedance.bitsail</groupId>
  <artifactId>bitsail-connector-hadoop</artifactId>
  <version>${revision}</version>
</dependency>
- Basic data types supported by the Hadoop connector:
  - Integer type:
    - short
    - int
    - long
    - biginteger
  - Float type:
    - float
    - double
    - bigdecimal
  - Time type:
    - timestamp
    - date
    - time
  - String type:
    - string
  - Bool type:
    - boolean
  - Binary type:
    - binary
- Composite data types supported by the Hadoop connector:
  - map
  - list
The parameters described below belong in the job.reader block of the job configuration, for example:
{
  "job": {
    "reader": {
      "path_list": "hdfs://test_path/test.csv"
    }
  }
}
Param name | Required | Optional value | Description |
---|---|---|---|
class | Yes | | Class name of the Hadoop connector: com.bytedance.bitsail.connector.hadoop.source.HadoopInputFormat |
path_list | Yes | | Path of the file to read. Multiple paths can be given, separated by ',' |
content_type | Yes | JSON, CSV | Format of the file to read. For details, refer to Supported file formats |
columns | Yes | | Names and types of the fields |
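
As a rough sketch, a job.reader block that sets all four required parameters might look like the following. The HDFS paths and the column names (id, name, created_at) are hypothetical, and the columns layout (an array of name/type objects) is an assumption based on the description above; JSON does not allow comments, so these assumptions are stated here rather than inline. The hadoop-connector-example remains the authoritative reference.

```json
{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.hadoop.source.HadoopInputFormat",
      "path_list": "hdfs://test_path/dir_a/test.json,hdfs://test_path/dir_b/test.json",
      "content_type": "JSON",
      "columns": [
        { "name": "id", "type": "long" },
        { "name": "name", "type": "string" },
        { "name": "created_at", "type": "timestamp" }
      ]
    }
  }
}
```

Here path_list carries two comma-separated paths, which is how a single job reads from more than one HDFS directory.
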

Param name | Required | Optional value | Description |
---|---|---|---|
hadoop_conf | No | | Hadoop read configuration, given as a standard JSON format string |
reader_parallelism_num | No | | Reader parallelism |
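
A minimal sketch of the optional parameters, which would sit alongside the required ones in job.reader. It assumes that hadoop_conf takes an escaped JSON string of Hadoop properties; dfs.client.use.datanode.hostname is a standard HDFS client property chosen only for illustration, and the parallelism value is arbitrary.

```json
{
  "job": {
    "reader": {
      "hadoop_conf": "{\"dfs.client.use.datanode.hostname\":\"true\"}",
      "reader_parallelism_num": 2
    }
  }
}
```

Because the value is a string rather than a nested object, the inner quotes have to be escaped.
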
The following formats are supported: JSON and CSV.
JSON: text files in JSON format can be parsed. Each line must be a standard JSON string.
The following parameters adjust the JSON parsing style:
Parameter name | Default value | Description |
---|---|---|
job.common.case_insensitive | true | Whether keys in the JSON fields are matched case-insensitively |
job.common.json_serializer_features | | Parsing mode used by 'FastJsonUtil'. The format is a ',' separated string, for example "QuoteFieldNames,UseSingleQuotes" |
job.common.convert_error_column_as_null | false | Whether to set a field that fails to parse to null |
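
The JSON options above could be set as in the sketch below. It assumes that the dotted parameter names map onto nested keys under job.common; the feature string is the one from the table, and the other values are illustrative rather than recommended settings.

```json
{
  "job": {
    "common": {
      "case_insensitive": true,
      "json_serializer_features": "QuoteFieldNames,UseSingleQuotes",
      "convert_error_column_as_null": true
    }
  }
}
```

With convert_error_column_as_null set to true, a field that cannot be parsed would be read as null instead of using the default behavior.
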
CSV: text files in CSV format can be parsed. Each line must be a standard CSV string.
The following parameters adjust the CSV parsing style:
Parameter name | Default value | Description |
---|---|---|
job.common.csv_delimiter | ',' | CSV delimiter |
job.common.csv_escape | | Escape character |
job.common.csv_quote | | Quote character |
job.common.csv_with_null_string | | Conversion value for null fields. Not converted by default |
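
A sketch of the CSV options for a pipe-delimited file, again assuming that the dotted names map onto nested keys under job.common. The delimiter, escape, quote, and null-string values are hypothetical choices for illustration, not defaults.

```json
{
  "job": {
    "common": {
      "csv_delimiter": "|",
      "csv_escape": "\\",
      "csv_quote": "\"",
      "csv_with_null_string": "NULL"
    }
  }
}
```

The backslash and the double quote are escaped here only because they appear inside JSON strings; each value is a single character once parsed.
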
Configuration examples: hadoop-connector-example