JDBC Data Source Multi-Split Management

Overview

This function applies to JDBC data sources. A data table to be read is divided into multiple splits, and multiple worker nodes in the cluster read the splits in parallel to speed up data reading.
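The idea of dividing a table into splits can be sketched as range partitioning on an integer column. The following is only an illustration under stated assumptions, not the connector's actual algorithm: the helper names `split_ranges` and `split_predicates` are hypothetical, and the even division of the value range is an assumed strategy.

```python
def split_ranges(min_value: int, max_value: int, split_count: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [min_value, max_value] into contiguous sub-ranges.

    Assumption: splits are formed by cutting the value range of the split
    column into pieces of near-equal width. The real connector may differ.
    """
    if split_count < 1:
        raise ValueError("split_count must be at least 1")
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")

    total = max_value - min_value + 1
    count = min(split_count, total)  # never produce an empty range
    base, extra = divmod(total, count)

    ranges = []
    start = min_value
    for i in range(count):
        # The first `extra` ranges take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_predicates(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Render each range as a SQL predicate that one worker could use."""
    return [f"{column} BETWEEN {low} AND {high}" for low, high in ranges]


if __name__ == "__main__":
    # Values taken from the first example entry: id from 1 to 10000, 5 splits.
    for predicate in split_predicates("id", split_ranges(1, 10000, 5)):
        print(predicate)
```

Each predicate selects a disjoint slice of the table, so the slices can be read independently. A column with few duplicate values keeps the row counts of the slices close to one another.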

Properties

Multi-split management is configured per connector. To enable this function for a data table, add the following properties to the configuration file of the connector to which the data table belongs. For example, the configuration file of the mysql connector is etc/catalog/mysql.properties.

Property list (configure the properties as follows):

jdbc.table-split-enabled=true
jdbc.table-split-stepCalc-refresh-interval=10s
jdbc.table-split-stepCalc-threads=2
jdbc.table-split-fields=[{"catalogName":"test_catalog", "schemaName":null, "tableName":"test_table", "splitField":"id","dataReadOnly":"true", "calcStepEnable":"false", "splitCount":"5","fieldMinValue":"1","fieldMaxValue":"10000"},{"catalogName":"test_catalog1", "schemaName":"test_schema1", "tableName":"test_table1", "splitField":"id", "dataReadOnly":"false", "calcStepEnable":"true", "splitCount":"5", "fieldMinValue":"","fieldMaxValue":""}]

Descriptions of the properties:

  • jdbc.table-split-enabled: whether to enable the multi-split data read function. The default value is false.
  • jdbc.table-split-stepCalc-refresh-interval: interval for dynamically updating splits. The default value is 5 minutes.
  • jdbc.table-split-stepCalc-threads: number of threads for dynamically updating splits. The default value is 4.
  • jdbc.table-split-fields: split configuration of each data table. For details, see section “Split Configuration”.
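Because jdbc.table-split-fields holds a JSON array on a single line, a typo in it is easy to miss. The sketch below is a hypothetical pre-deployment check, not part of the connector: `read_properties` and `check_split_fields` are illustrative names, the parser handles only simple `key=value` lines (no escapes or line continuations), and requiring all nine documented sub-properties in every entry is an assumption.

```python
import json

# Sub-property names as documented for jdbc.table-split-fields.
DOCUMENTED_KEYS = {
    "catalogName", "schemaName", "tableName", "splitField", "dataReadOnly",
    "calcStepEnable", "splitCount", "fieldMinValue", "fieldMaxValue",
}

SAMPLE = """
jdbc.table-split-enabled=true
jdbc.table-split-stepCalc-refresh-interval=10s
jdbc.table-split-stepCalc-threads=2
jdbc.table-split-fields=[{"catalogName":"test_catalog", "schemaName":null, "tableName":"test_table", "splitField":"id", "dataReadOnly":"true", "calcStepEnable":"false", "splitCount":"5", "fieldMinValue":"1", "fieldMaxValue":"10000"}]
"""


def read_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split at the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_split_fields(props: dict[str, str]) -> list[str]:
    """Return a list of problems found in the split configuration."""
    entries = json.loads(props["jdbc.table-split-fields"])
    problems = []
    for index, entry in enumerate(entries):
        unknown = sorted(set(entry) - DOCUMENTED_KEYS)
        missing = sorted(DOCUMENTED_KEYS - set(entry))
        if unknown:
            problems.append(f"entry {index}: unknown keys {unknown}")
        if missing:
            problems.append(f"entry {index}: missing keys {missing}")
    return problems


if __name__ == "__main__":
    problems = check_split_fields(read_properties(SAMPLE))
    print(problems or "split configuration looks well-formed")
```

A malformed JSON value makes `json.loads` raise `json.JSONDecodeError`, which points at the offending character position.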

Split Configuration

The configuration of each data table is a JSON object that consists of the following sub-properties:

| Sub-property | Description | Suggestion |
|---|---|---|
| catalogName | Name of the catalog to which the data table belongs in the data source. It corresponds to the TABLE_CAT field returned by the standard JDBC API DatabaseMetaData.getTables. | Set this sub-property to the actual value. If the value is empty, set it to null. |
| schemaName | Name of the schema to which the data table belongs in the data source. It corresponds to the TABLE_SCHEM field returned by the standard JDBC API DatabaseMetaData.getTables. | Set this sub-property to the actual value. If the value is empty, set it to null. |
| tableName | Name of the data table in the data source. It corresponds to the TABLE_NAME field returned by the standard JDBC API DatabaseMetaData.getTables. | Set this sub-property to the actual value. |
| splitField | Name of the column used to split the table. | Select a column whose values are integers. A column with few duplicate values is recommended so that the data is divided into even splits. |
| calcStepEnable | Whether to dynamically adjust the split range. | Set this sub-property to true for data tables whose data changes. |
| dataReadOnly | Whether the data table is read-only. | Set this sub-property to true for read-only data tables. |
| splitCount | Number of concurrent reads of data splits. | Set this sub-property to the value that performs best in your environment. |
| fieldMinValue | Minimum value of the splitField column. | For read-only data tables, set this sub-property based on the query result. Otherwise, leave it empty or set it to null. |
| fieldMaxValue | Maximum value of the splitField column. | For read-only data tables, set this sub-property based on the query result. Otherwise, leave it empty or set it to null. |
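For a read-only table, fieldMinValue and fieldMaxValue come from a query on the split column. The sketch below shows one way to derive them and assemble a configuration entry. It rests on several assumptions: an in-memory SQLite database stands in for the real JDBC data source, `build_read_only_entry` is a hypothetical helper, and the table and column names are the ones from the example configuration.

```python
import json
import sqlite3


def build_read_only_entry(conn, catalog, schema, table, split_field, split_count):
    """Query MIN/MAX of the split column and build one split-fields entry.

    Identifiers are interpolated into the SQL text, so pass trusted names only.
    """
    query = f"SELECT MIN({split_field}), MAX({split_field}) FROM {table}"
    min_value, max_value = conn.execute(query).fetchone()
    if min_value is None:
        raise ValueError(f"table {table} has no values in column {split_field}")
    return {
        "catalogName": catalog,
        "schemaName": schema,  # None is serialized as JSON null
        "tableName": table,
        "splitField": split_field,
        "dataReadOnly": "true",
        "calcStepEnable": "false",
        "splitCount": str(split_count),
        "fieldMinValue": str(min_value),
        "fieldMaxValue": str(max_value),
    }


if __name__ == "__main__":
    # Stand-in data source: 10000 rows with id from 1 to 10000.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO test_table (id, name) VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, 10001)],
    )
    entry = build_read_only_entry(conn, "test_catalog", None, "test_table", "id", 5)
    print("jdbc.table-split-fields=" + json.dumps([entry]))
```

The fixed bounds stay valid only while the table is not modified. For tables whose data changes, the suggestions above apply instead: leave both values empty and set calcStepEnable to true.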
