Skip to content

Latest commit

 

History

History
 
 

community

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Community UDFs

This directory contains community contributed user-defined functions to extend BigQuery for more specialized usage patterns. Each UDF within this directory will be automatically synchronized to the bqutil project within the fn dataset for reference in queries.

For example, if you'd like to reference the int function within your query, you can reference it like the following:

SELECT bqutil.fn.int(1.684)

UDFs

Documentation

Take a list of comma separated key-value pairs and creates a struct. Input: strList: string that has map in the format a:b,c:d.... Output: struct for the above map.

WITH test_cases AS (
  SELECT NULL as s
  UNION ALL
  SELECT '' as s
  UNION ALL
  SELECT ',' as s
  UNION ALL
  SELECT ':' as s
  UNION ALL
  SELECT 'a:b' as s
  UNION ALL
  SELECT 'a:b,c:d' as s
  UNION ALL
  SELECT 'a:b' as s
)
SELECT key, value from test_cases as t, UNNEST(bqutil.fn.csv_to_struct(t.s)) s;

results:

key value
a b
a b
c d
a b

Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3. Input: str: string to search for. strList: string in which to search. Output: Position of str in strList

WITH test_cases AS (
  SELECT 'ab' as str, 'abc,b,ab,c,def' as strList
  UNION ALL
  SELECT 'ab' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
  UNION ALL
  SELECT 'mobile' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
  UNION ALL
  SELECT 'mobile,' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
)
SELECT bqutil.fn.find_in_set(str, strList) from test_cases

results:

f0_
3
NULL
1
0

Construct a frequency table (histogram) of an array of elements. Frequency table is represented as an array of STRUCT(value, freq)

SELECT bqutil.fn.freq_table([1,2,1,3,1,5,1000,5]) ft

results:

Row ft.value ft.freq
1 1 3
2 1
3 1
5 2
1000 1

Given a key and a map, returns the ARRAY type value. This is same as get_value except it returns an ARRAY type. This can be used when the map has multiple values for a given key.

WITH test AS (
  SELECT ARRAY(
    SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
    UNION ALL
    SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
    UNION ALL
    SELECT STRUCT('a' AS key, 'AAA' AS value) AS s
    UNION ALL
    SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
  ) AS a
)
SELECT bqutil.fn.get_array_value('b', a), bqutil.fn.get_array_value('a', a), bqutil.fn.get_array_value('c', a) from test;

results:

f0_ f1_ f2_
["bbb"] ["aaa","AAA"] ["ccc"]

Given a key and a list of key-value maps in the form [{'key': 'a', 'value': 'aaa'}], returns the SCALAR type value.

WITH test AS (
  SELECT ARRAY(
    SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
    UNION ALL
    SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
    UNION ALL
    SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
  ) AS a
)
SELECT bqutil.fn.get_value('b', a), bqutil.fn.get_value('a', a), bqutil.fn.get_value('c', a) from test;

results:

f0_ f1_ f2_
bbb aaa ccc

Convience wrapper which can be used to convert values to integers in place of the native CAST(x AS INT64).

SELECT bqutil.fn.int(1) int1
  , bqutil.fn.int(2.5) int2
  , bqutil.fn.int('7') int3
  , bqutil.fn.int('7.8') int4

1, 2, 7, 7

Note that CAST(x AS INT64) rounds the number, while this function truncates it. In many cases, that's the behavior users expect.

Returns the type of JSON value. It emulates json_typeof of PostgreSQL.

SELECT
       bqutil.fn.json_typeof('{"foo": "bar"}'),
       bqutil.fn.json_typeof(TO_JSON_STRING(("foo", "bar"))),
       bqutil.fn.json_typeof(TO_JSON_STRING([1,2,3])),
       bqutil.fn.json_typeof(TO_JSON_STRING("test")),
       bqutil.fn.json_typeof(TO_JSON_STRING(123)),
       bqutil.fn.json_typeof(TO_JSON_STRING(TRUE)),
       bqutil.fn.json_typeof(TO_JSON_STRING(FALSE)),
       bqutil.fn.json_typeof(TO_JSON_STRING(NULL)),

object, array, string, number, boolean, boolean, null

Get the date representing the last day of the month.

SELECT bqutil.fn.last_day(DATE("1987-12-25"))
  , bqutil.fn.last_day(DATE("1998-09-04"))
  , bqutil.fn.last_day(DATE("2020-02-21")) -- leap year
  , bqutil.fn.last_day(DATE("2019-02-21")) -- non-leap year

results:

f0_ f1_ f2_ f3_
1987-12-31 1998-09-30 2020-02-29 2019-02-28

Interpolate the current positions value from the preceding and folllowing coordinates

SELECT 
  bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(10 AS x, 10.0 AS y)),
  bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(20 AS x, 10.0 AS y))

results:

f0_ f1_
2.0 1.0

Get the median of an array of numbers.

SELECT bqutil.fn.median([1,1,1,2,3,4,5,100,1000]) median_1
  , bqutil.fn.median([1,2,3]) median_2
  , bqutil.fn.median([1,2,3,4]) median_3

3.0, 2.0, 2.5

Parse numbers from text.

SELECT bqutil.fn.nlp_compromise_number('one hundred fifty seven')
  , bqutil.fn.nlp_compromise_number('three point 5')
  , bqutil.fn.nlp_compromise_number('2 hundred')
  , bqutil.fn.nlp_compromise_number('minus 8')
  , bqutil.fn.nlp_compromise_number('5 million 3 hundred 25 point zero 1')

157, 3.5, 200, -8, 5000325.01

Extract names out of text.

SELECT bqutil.fn.nlp_compromise_people(
  "hello, I'm Felipe Hoffa and I work with Elliott Brossard - who thinks Jordan Tigani will like this post?"
) names

["felipe hoffa", "elliott brossard", "jordan tigani"]

Calculate the percentage change (increase/decrease) between two numbers.

SELECT bqutil.fn.percentage_change(0.2, 0.4)
  , bqutil.fn.percentage_change(5, 15)
  , bqutil.fn.percentage_change(100, 50)
  , bqutil.fn.percentage_change(-20, -45)

results:

f0_ f1_ f2_ f3_
1.0 2.0 -0.5 -1.125

Calculate the percentage difference between two numbers.

SELECT bqutil.fn.percentage_difference(0.2, 0.8)
  , bqutil.fn.percentage_difference(4.0, 12.0)
  , bqutil.fn.percentage_difference(100, 200)
  , bqutil.fn.percentage_difference(1.0, 1000000000)

results:

f0_ f1_ f2_ f3_
1.2 1.0 0.6667 2.0

Convert values into radian.

SELECT bqutil.fn.radians(180) is_this_pi

3.141592653589793

Generate random integers between the min and max values.

SELECT bqutil.fn.random_int(0,10) randint, COUNT(*) c
FROM UNNEST(GENERATE_ARRAY(1,1000))
GROUP BY 1
ORDER BY 1

Returns a random value from an array.

SELECT
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe'])

'tino', 'julie', 'jordan'

For a given expression, replaces all occurrences of specified characters with specified substitutes. Existing characters are mapped to replacement characters by their positions in the characters_to_replace and characters_to_substitute arguments. If more characters are specified in the characters_to_replace argument than in the characters_to_substitute argument, the extra characters from the characters_to_replace argument are omitted in the return value.

SELECT bqutil.fn.translate('mint tea', 'inea', 'osin')

most tin

Generate a timestamp array associated with each key

SELECT *
FROM 
  UNNEST(bqutil.fn.ts_gen_keyed_timestamps(['abc', 'def'], 60, TIMESTAMP '2020-01-01 00:30:00', TIMESTAMP '2020-01-01 00:31:00))
series_key tumble_val
abc 2020-01-01 00:30:00 UTC
def 2020-01-01 00:30:00 UTC
abc 2020-01-01 00:31:00 UTC
def 2020-01-01 00:31:00 UTC

Interpolate the positions value using timestamp seconds as the x-axis

select bqutil.fn.ts_linear_interpolate(
  TIMESTAMP '2020-01-01 00:30:00', 
  STRUCT(TIMESTAMP '2020-01-01 00:29:00' AS x, 1.0 AS y),
  STRUCT(TIMESTAMP '2020-01-01 00:31:00' AS x, 3.0 AS y)
)
f0_
2.0

Function to compare two timestamp as being within the same session window. A timestamp in the same session window as its previous timestamp will evaluate as NULL, otherwise the current row's timestamp is returned. The "LAST_VALUE(ts IGNORE NULLS)" window function can then be used to stamp all rows with the starting timestamp for the session window.

--5 minute (300 seconds) session window
WITH ticks AS (
  SELECT 'abc' as key, 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
  UNION ALL
  SELECT 'abc', 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 4.0, CAST('2020-01-01 01:09:01 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 5.0, CAST('2020-01-01 01:24:01 UTC' AS TIMESTAMP)
)
SELECT
  * EXCEPT(session_group),
  LAST_VALUE(session_group IGNORE NULLS)
    OVER (PARTITION BY key ORDER BY ts ASC) AS session_group
FROM (
  SELECT
    *,
    bqutil.fn.ts_session_group(
      ts,
      LAG(ts) OVER (PARTITION BY key ORDER BY ts ASC),
      300
    ) AS session_group
  FROM ticks 
)
key price ts sesssion_group
abc 1.0 2020-01-01 01:04:59 UTC 2020-01-01 01:04:59 UTC
abc 2.0 2020-01-01 01:05:00 UTC 2020-01-01 01:04:59 UTC
abc 3.0 2020-01-01 01:05:01 UTC 2020-01-01 01:04:59 UTC
abc 4.0 2020-01-01 01:09:01 UTC 2020-01-01 01:04:59 UTC
abc 5.0 2020-01-01 01:24:01 UTC 2020-01-01 01:24:01 UTC

Calculate the sliding windows the ts parameter belongs to.

-- show a 15 minute window every 5 minutes and a 15 minute window every 10 minutes
WITH ticks AS (
  SELECT 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
  UNION ALL
  SELECT 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
)
SELECT
  price,
  ts,
  bqutil.fn.ts_slide(ts, 300, 900) as _5_15,
  bqutil.fn.ts_slide(ts, 600, 900) as _10_15,
FROM ticks
price ts _5_15.window_start _5_15.window_end _5_15.window_start _5_15.window_end
1.0 2020-01-01 01:04:59 UTC 2020-01-01 00:50:00 UTC 2020-01-01 01:05:00 UTC 2020-01-01 00:50:00 UTC 2020-01-01 01:05:00 UTC
2020-01-01 00:55:00 UTC 2020-01-01 01:10:00 UTC 2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2.0 2020-01-01 01:05:00 UTC 2020-01-01 00:55:00 UTC 2020-01-01 01:10:00 UTC 2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2020-01-01 01:05:00 UTC 2020-01-01 01:20:00 UTC
3.0 2020-01-01 01:05:01 UTC 2020-01-01 00:55:00 UTC 2020-01-01 01:10:00 UTC 2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2020-01-01 01:00:00 UTC 2020-01-01 01:15:00 UTC
2020-01-01 01:05:00 UTC 2020-01-01 01:20:00 UTC

Calculate the tumbling window the input_ts belongs in

SELECT
  fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 900) AS min_15,
  fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 600) AS min_10,
  fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 60) As min_1
min_15 min_10
2020-01-01 00:15:00 UTC 2020-01-01 00:10:00 UTC 2020-01-01 00:17:00 UTC

Return the type of input or 'UNKNOWN' if input is unknown typed value.

SELECT
  bqutil.fn.typeof(""),
  bqutil.fn.typeof(b""),
  bqutil.fn.typeof(1.0),
  bqutil.fn.typeof(STRUCT()),

STRING, BINARY, FLOAT64, STRUCT

Get an array of url param keys.

SELECT bqutil.fn.url_keys(
  'https://www.google.com/search?q=bigquery+udf&client=chrome')

["q", "client"]

Get the value of a url param key.

SELECT bqutil.fn.url_param(
  'https://www.google.com/search?q=bigquery+udf&client=chrome', 'client')

"chrome"

Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL For example, url_parse('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.

WITH urls AS (
  SELECT 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1' as url
  UNION ALL
  SELECT 'rpc://facebook.com/' as url
)
SELECT bqutil.fn.url_parse(url, 'HOST'), bqutil.fn.url_parse(url, 'PATH'), bqutil.fn.url_parse(url, 'QUERY'), bqutil.fn.url_parse(url, 'REF'), bqutil.fn.url_parse(url, 'PROTOCOL') from urls

results:

f0_ f1_ f2_ f3_ f4_
facebook.com path1/p.php k1=v1&k2=v2#Ref1 Ref1 http
facebook.com NULL NULL NULL rpc

Convert a STRING formatted as a YYYYMMDD to a DATE

SELECT bqutil.fn.y4md_to_date('20201220')

"2020-12-20"

Normalize a variable so that it has zero mean and unit variance.

with r AS (
  SELECT 10 AS x
  UNION ALL SELECT 20
  UNION ALL SELECT 30
  UNION ALL SELECT 40
  UNION ALL SELECT 50
),
stats AS (
  SELECT AVG(x) AS meanx, STDDEV(x) AS stddevx
  FROM r
)
SELECT x, bqutil.fn.zeronorm(x, meanx, stddevx) AS zeronorm
FROM r, stats;

returns:

Row x zeronorm
1 10 -12.649110640673518
2 20 -6.324555320336759
3 30 0.0
4 40 6.324555320336759
5 50 12.649110640673518