String Functions and Operators

String Operators

The || operator performs concatenation.

String Functions

Note

These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8. Invalid UTF-8 data can be corrected with from_utf8.

Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.

The lower and upper functions do not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages.

Specifically, this will return incorrect results for Lithuanian, Turkish and Azeri.

chr(n) -> varchar

Returns the Unicode code point n as a single character string.

codepoint(string) -> integer

Returns the Unicode code point of the only character of string.

concat(string1, ..., stringN) -> varchar

Returns the concatenation of string1, string2, ..., stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).

hamming_distance(string1, string2) -> bigint

Returns the Hamming distance of string1 and string2, i.e. the number of positions at which the corresponding characters are different.

Note that the two strings must have the same length.

length(string) -> bigint

Returns the length of string in characters.

levenshtein_distance(string1, string2) -> bigint

Returns the Levenshtein edit distance of string1 and string2, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to change string1 into string2.

lower(string) -> varchar

Converts string to lowercase.

lpad(string, size, padstring) -> varchar

Left pads string to size characters with padstring. If size is less than the length of string, the result is truncated to size characters. size must not be negative and padstring must be non-empty.

ltrim(string) -> varchar

Removes leading whitespace from string.

replace(string, search) -> varchar

Removes all instances of search from string.

replace(string, search, replace) -> varchar

Replaces all instances of search with replace in string.

reverse(string) -> varchar

Returns string with the characters in reverse order.

rpad(string, size, padstring) -> varchar

Right pads string to size characters with padstring. If size is less than the length of string, the result is truncated to size characters. size must not be negative and padstring must be non-empty.

rtrim(string) -> varchar

Removes trailing whitespace from string.

split(string, delimiter) -> array(varchar)

Splits string on delimiter and returns an array.

split(string, delimiter, limit) -> array(varchar)

Splits string on delimiter and returns an array of size at most limit. The last element in the array always contain everything left in the string. limit must be a positive number.

split_part(string, delimiter, index) -> varchar

Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

split_to_map(string, entryDelimiter, keyValueDelimiter) -> map<varchar, varchar>

Splits string by entryDelimiter and keyValueDelimiter and returns a map. entryDelimiter splits string into key-value pairs.

keyValueDelimiter splits each pair into key and value.

split_to_multimap(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar))

Splits string by entryDelimiter and keyValueDelimiter and returns a map containing an array of values for each unique key.

entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. The values for each key will be in the same order as they appeared in string.

strpos(string, substring) -> bigint

Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

position(substring IN string) -> bigint

Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

substr(string, start) -> varchar

Returns the rest of string from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

substr(string, start, length) -> varchar

Returns a substring from string of length length from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

trim(string) -> varchar

Removes leading and trailing whitespace from string.

upper(string) -> varchar

Converts string to uppercase.

word_stem(word) -> varchar

Returns the stem of word in the English language.

word_stem(word, lang) -> varchar

Returns the stem of word in the lang language.

Unicode Functions

normalize(string) -> varchar

Transforms string with NFC normalization form.

normalize(string, form) -> varchar

Transforms string with the specified normalization form. form must be be one of the following keywords:

FormDescription
NFDCanonical Decomposition
NFCCanonical Decomposition, followed by Canonical Composition
NFKDCompatibility Decomposition
NFKCCompatibility Decomposition, followed by Canonical Composition

Note

This SQL-standard function has special syntax and requires specifying form as a keyword, not as a string.

to_utf8(string) -> varbinary

Encodes string into a UTF-8 varbinary representation.

from_utf8(binary) -> varchar

Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with the Unicode replacement character U+FFFD.

from_utf8(binary, replace) -> varchar

Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with replace. The replacement string replace must either be a single character or empty (in which case invalid characters are removed).

有奖捉虫

“有虫”文档片段

0/500

存在的问题

文档存在风险与错误

● 拼写,格式,无效链接等错误;

● 技术原理、功能、规格等描述和软件不一致,存在错误;

● 原理图、架构图等存在错误;

● 版本号不匹配:文档版本或内容描述和实际软件不一致;

● 对重要数据或系统存在风险的操作,缺少安全提示;

● 排版不美观,影响阅读;

内容描述不清晰

● 描述存在歧义;

● 图形、表格、文字等晦涩难懂;

● 逻辑不清晰,该分类、分项、分步骤的没有给出;

内容获取有困难

● 很难通过搜索引擎,openLooKeng官网,相关博客找到所需内容;

示例代码有错误

● 命令、命令参数等错误;

● 命令无法执行或无法完成对应功能;

内容有缺失

● 关键步骤错误或缺失,无法指导用户完成任务,比如安装、配置、部署等;

● 逻辑不清晰,该分类、分项、分步骤的没有给出

● 图形、表格、文字等晦涩难懂

● 缺少必要的前提条件、注意事项等;

● 描述存在歧义

0/500

您对文档的总体满意度

非常不满意
非常满意

请问是什么原因让您参与到这个问题中

您的邮箱

创Issue赢奖品
根据您的反馈,会自动生成issue模板。您只需点击按钮,创建issue即可。
有奖捉虫