Horizontal Scaling

Use cases

Auto or manual scaling

Nodes can be added to or removed from openLooKeng clusters dynamically, to support scaling scenarios. While it’s the responsibility of the Resource Provider to determine when to scale and, in the case of scaling-in, which nodes to remove, openLooKeng ensures that these changes function properly. In particular, during a scale-in, removal of a node will not impact workloads on that node.

For auto-scaling scenarios, the resource provider may want to base the decision on:

  • System metrics, e.g. CPU, memory, and I/O, or
  • Node workload status (see status API below)

As an example, openLooKeng can be deployed in a Kubernetes environment, and use HorizontalPodAutoscaler to auto-scale the cluster size. You can find some sample deployment data under Kubernetes of the hetu-samples module.

Node maintenance

If a node needs to be separated from an openLooKeng cluster temporarily, for example to perform some maintenance tasks, it can be isolated first, then added back to the cluster afterwards.

Node status API

Workload information about a particular node can be obtained by invoking its status API. It includes CPU and memory allocation and usage information. For details about structure of the returned value, please refer to the io.prestosql.server.NodeStatus class.

For example, with curl and assuming the node address 1.2.3.4:8080

$ curl http://1.2.3.4:8080/v1/status | jq

{
  "nodeId": "...",
  ...
  "memoryInfo": {
    "availableProcessors": ...,
    "totalNodeMemory": "...",
    ...
  },
  "processors": ...,
  "processCpuLoad": ...,
  "systemCpuLoad": ...,
  "heapUsed": ...,
  "heapAvailable": ...,
  "nonHeapUsed": ...
}

Node state management API

Scaling and isolation can be achieved through the node state management API.

Shutdown a node

If a cluster does not need a node anymore (e.g. as part of a scaling down operation), the node can be shutdown.

Such process is graceful, in that

  • the cluster does not assign new workload to this node, and
  • the node attempts to finish all existing workloads before shutting down its main process.

The shutdown process is irreversible. It can be initiated through the REST API, by putting the SHUTTING_DOWN state to the info/state endpoint of the node. For example:

$ curl -X PUT -H "Content-Type: application/json" http://1.2.3.4:8080/v1/info/state \
    -d '"SHUTTING_DOWN"'

Isolate a node

Isolation takes a node off of the cluster temporarily.

  • Isolation can be graceful in a way similar to shutdown: wait for workloads to finish first. Nodes in this waiting period are in the isolating state.
  • It can also be non-graceful, to enter the isolated directly.
  • Isolating and isolated nodes are not assigned new workloads.

Nodes’ isolation status are changed via the REST API, with different target states. For example:

# To gracefully isolate a node
$ curl -X PUT -H "Content-Type: application/json" http://1.2.3.4:8080/v1/info/state \
    -d '"ISOLATING"'

# To isolate a node immediately
$ curl -X PUT -H "Content-Type: application/json" http://1.2.3.4:8080/v1/info/state \
    -d '"ISOLATED"'

# To make the node available again
$ curl -X PUT -H "Content-Type: application/json" http://1.2.3.4:8080/v1/info/state \
    -d '"ACTIVE"'

Node state transitions

An openLooKeng node (coordinator or worker) can be in one of 5 states:

  • Inactive
  • Active
  • Isolating
  • Isolated
  • Shutting down

Shutdown and isolation operations move nodes among these states.

node-state-transitions

  1. Other than INACTIVE, transitions to current state are allowed, but have no effect
  2. Request to isolate the node, by not assigning new workload on it, and waiting for existing workloads to finish
  3. Immediately isolate the node, by not assigning new workload. If there are existing workloads, they are deemed unimportant and may not finish
  4. Automatic transition when a node is ISOLATING and active workloads finish
  5. Restore the node back to normal operation
  6. Request to shut down the node, by not assigning new workload on it, and waiting for existing workloads to finish
  7. Automatic transition when a node is SHUTTING_DOWN and active workloads finish

有奖捉虫

“有虫”文档片段

0/500

存在的问题

文档存在风险与错误

● 拼写,格式,无效链接等错误;

● 技术原理、功能、规格等描述和软件不一致,存在错误;

● 原理图、架构图等存在错误;

● 版本号不匹配:文档版本或内容描述和实际软件不一致;

● 对重要数据或系统存在风险的操作,缺少安全提示;

● 排版不美观,影响阅读;

内容描述不清晰

● 描述存在歧义;

● 图形、表格、文字等晦涩难懂;

● 逻辑不清晰,该分类、分项、分步骤的没有给出;

内容获取有困难

● 很难通过搜索引擎,openLooKeng官网,相关博客找到所需内容;

示例代码有错误

● 命令、命令参数等错误;

● 命令无法执行或无法完成对应功能;

内容有缺失

● 关键步骤错误或缺失,无法指导用户完成任务,比如安装、配置、部署等;

● 逻辑不清晰,该分类、分项、分步骤的没有给出

● 图形、表格、文字等晦涩难懂

● 缺少必要的前提条件、注意事项等;

● 描述存在歧义

0/500

您对文档的总体满意度

非常不满意
非常满意

请问是什么原因让您参与到这个问题中

您的邮箱

创Issue赢奖品
根据您的反馈,会自动生成issue模板。您只需点击按钮,创建issue即可。
有奖捉虫