Job Configuration Guide

English | 简体中文

A BitSail job is configured with a JSON document. The following example shows the complete structure:

{
    "job":{
        "common":{
        ...
        },
        "reader":{
        ...
        },
        "writer":{
        ...
        }
    }
}
| Module name | Description |
| --- | --- |
| common | General job settings, such as job metadata and plugin settings. |
| reader/readers | Parameters for the source side. For a MySQL source, for example, the reader section holds the JDBC connection information and the database and table to read from. |
| writer/writers | Parameters for the target side. For a Hive target, for example, the writer section holds the Hive metastore connection information and the Hive database, table, and partition to write to. |
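
As a rough illustration of the structure above, the following Python sketch splits a job configuration into its three modules. The helper name `split_modules` is hypothetical and not part of BitSail, and treating the plural names `readers` and `writers` as alternative keys is an assumption based only on the module names in the table.

```python
import json

# Module names taken from the table above. Accepting the plural forms as
# alternative keys is an assumption of this sketch.
READER_KEYS = ("reader", "readers")
WRITER_KEYS = ("writer", "writers")


def split_modules(config_text):
    """Return the (common, reader, writer) sections of a job configuration."""
    job = json.loads(config_text)["job"]

    def pick(keys):
        # Return the first section found under any of the given names.
        for key in keys:
            if key in job:
                return job[key]
        raise KeyError("missing module, expected one of: " + ", ".join(keys))

    return job["common"], pick(READER_KEYS), pick(WRITER_KEYS)


example = """
{
    "job": {
        "common": {"job_name": "demo"},
        "reader": {"class": "ExampleReader"},
        "writer": {"class": "ExampleWriter"}
    }
}
"""
common, reader, writer = split_modules(example)
print(common["job_name"], reader["class"], writer["class"])
```

The class names `ExampleReader` and `ExampleWriter` are placeholders; real configurations use the connector class names shown in the reader and writer sections.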

Common Module

Example:

{
    "job":{
        "common":{
            "user_name":"bytedance_dts",
            "instance_id":-1,
            "job_id":-1,
            "job_name":"",
            "min_parallelism":1,
            "max_parallelism":5,
            "parallelism_chain":false,
            "max_dirty_records_stored_num":50,
            "dirty_records_count_threshold":-1,
            "dirty_records_percentage_threshold":-1
        }
    }
}

Description:

Metadata Parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| user_name | TRUE | - | The job's submitter. | bitsail |
| job_id | TRUE | - | The job's unique ID. | 12345 |
| instance_id | TRUE | - | The job's instance ID; may be used by some scheduler systems. | 12345 |
| job_name | TRUE | - | The job's name. | bitsail_conf |
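
Since all four metadata parameters are marked as required, a configuration can be checked for them before submission. The sketch below is only an illustration; `missing_metadata` is a hypothetical helper, not a BitSail function.

```python
# Required metadata parameters, as listed in the table above.
REQUIRED_METADATA = ("user_name", "job_id", "instance_id", "job_name")


def missing_metadata(common):
    """Return the required metadata keys absent from the common section."""
    return [key for key in REQUIRED_METADATA if key not in common]


common = {"user_name": "bitsail", "job_id": 12345}
print(missing_metadata(common))
```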

Parallelism parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| min_parallelism | FALSE | 1 | The minimum parallelism of the job. The automatically calculated parallelism is always greater than or equal to this value. | 2 |
| max_parallelism | FALSE | 512 | The maximum parallelism of the job. The automatically calculated parallelism is always less than or equal to this value. | 2 |
| parallelism_chain | FALSE | FALSE | Whether to chain the reader and writer operators. When enabled, the smaller of the reader and writer parallelism is used as the final parallelism. | true |
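
The relationship between these three parameters can be sketched in a few lines of Python. This only restates the descriptions in the table; it is not BitSail's actual implementation, and both function names are hypothetical.

```python
def resolve_parallelism(computed, min_parallelism=1, max_parallelism=512):
    """Clamp an automatically calculated parallelism to [min, max]."""
    return max(min_parallelism, min(computed, max_parallelism))


def chained_parallelism(reader_parallelism, writer_parallelism):
    """With parallelism_chain enabled, the smaller of the two values wins."""
    return min(reader_parallelism, writer_parallelism)


# A calculated value above max_parallelism is capped at the maximum.
print(resolve_parallelism(1000, min_parallelism=1, max_parallelism=5))
# A calculated value below min_parallelism is raised to the minimum.
print(resolve_parallelism(1, min_parallelism=2, max_parallelism=5))
# Chaining a reader at 4 with a writer at 2 yields a final parallelism of 2.
print(chained_parallelism(4, 2))
```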

Dirty record settings (batch mode only):

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| max_dirty_records_stored_num | FALSE | 50 | The maximum number of dirty records each task collects. | 50 |
| dirty_records_count_threshold | FALSE | -1 | Threshold on the total number of dirty records. If the number of dirty records exceeds the threshold, the job fails at the end. | -1 |
| dirty_records_percentage_threshold | FALSE | -1 | Threshold on the percentage of dirty records. If the percentage of dirty records exceeds the threshold, the job fails at the end. | -1 |
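
The two thresholds can be read as a final check over the job's record counters. The sketch below makes two assumptions that the table does not state: a negative threshold (the default `-1`) disables the corresponding check, and the percentage threshold is expressed on a 0 to 100 scale. The function name is hypothetical, and BitSail's own check may differ.

```python
def exceeds_dirty_thresholds(dirty, total, count_threshold=-1,
                             percentage_threshold=-1):
    """Return True if the job should fail because of dirty records."""
    # Assumption: a negative threshold means the check is disabled.
    if count_threshold >= 0 and dirty > count_threshold:
        return True
    # Assumption: the percentage threshold uses a 0-100 scale.
    if percentage_threshold >= 0 and total > 0:
        return dirty / total * 100 > percentage_threshold
    return False


# With both defaults, dirty records never fail the job.
print(exceeds_dirty_thresholds(dirty=10, total=1000))
# 10 dirty records exceed a count threshold of 5.
print(exceeds_dirty_thresholds(dirty=10, total=1000, count_threshold=5))
# 30 of 1000 records is 3 percent, which exceeds a threshold of 2.
print(exceeds_dirty_thresholds(dirty=30, total=1000, percentage_threshold=2))
```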

Reader Module

Examples:

{
    "job":{
        "reader":{
            "class":"com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat",
            "columns":[
                {
                    "name":"id",
                    "type":"bigint"
                },
                {
                    "name":"name",
                    "type":"varchar"
                }
            ],
            "table_name":"your table name",
            "db_name":"your database name",
            "password":"your database connection password",
            "user_name":"your database connection username",
            "split_pk":"your table primary key",
            "connections":[
                {
                    "slaves":[
                        {
                            "port":"your connection's port",
                            "db_url":"your connection's url",
                            "host":"your connection's host"
                        }
                    ],
                    "shard_num":0,
                    "master":{
                        "port":"your connection's port",
                        "db_url":"your connection's url",
                        "host":"your connection's host"
                    }
                }
            ]
        }
    }
}

Common parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| class | TRUE | - | The connector's class name. | com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat |
| reader_parallelism_num | FALSE | - | The parallelism of the reader operator. | 2 |

For all other parameters, refer to the documentation of the specific connector.

Writer Module

Example:

{
    "job":{
        "writer":{
            "class":"com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat",
            "db_name":"your Hive database name",
            "table_name":"your Hive table name",
            "partition":"the partition you want to add",
            "metastore_properties":"{\"hive.metastore.uris\":\"thrift://localhost:9083\"}",
            "columns":[
                {
                    "name":"id",
                    "type":"bigint"
                }
            ],
            "write_mode":"overwrite",
            "writer_parallelism_num":1
        }
    }
}

Common Parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| class | TRUE | - | The connector's class name. | com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat |
| writer_parallelism_num | FALSE | - | The parallelism of the writer operator. By default, BitSail calculates the writer parallelism for the job. | 2 |

For all other parameters, refer to the documentation of the specific connector.
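
In the writer example, `metastore_properties` is a JSON object serialized into a string, which is why its inner quotes are escaped. When a configuration is generated by a script, the escaping is easiest to get right by serializing twice. The following Python sketch shows one way to do this; it uses only the standard `json` module, and the variable names are illustrative.

```python
import json

# Inner map that must end up as a JSON-encoded string value.
metastore = {"hive.metastore.uris": "thrift://localhost:9083"}

writer = {
    "class": "com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat",
    # First pass: serialize the inner map to a compact JSON string.
    "metastore_properties": json.dumps(metastore, separators=(",", ":")),
}

# Second pass: serializing the whole job escapes the inner quotes.
text = json.dumps({"job": {"writer": writer}}, indent=4)
print(text)

# Reading the value back requires decoding twice as well.
decoded = json.loads(json.loads(text)["job"]["writer"]["metastore_properties"])
```

Writing the escaped string by hand works too, but a missing backslash produces a configuration that fails to parse, so generating it is usually less error-prone.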