uddsketch() and percentile_agg() functions

TimescaleDB Toolkit functions are available under Timescale Community Edition. They are automatically included with Timescale, but must be installed separately for self-hosted TimescaleDB.

Estimate the value at a given percentile, or the percentile rank of a given value, using the UddSketch algorithm. This estimation is more memory- and CPU-efficient than an exact calculation using PostgreSQL's percentile_cont and percentile_disc functions.

uddsketch is one of two advanced percentile approximation aggregates provided in TimescaleDB Toolkit. It produces stable estimates within a guaranteed relative error.

The other advanced percentile approximation aggregate is tdigest, which is more accurate at extreme quantiles, but is somewhat dependent on input order.

If you aren't sure which aggregate to use, try the default percentile estimation method, percentile_agg. It uses the uddsketch algorithm with some sensible defaults.

For more information about percentile approximation algorithms, see the algorithms overview.

Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.
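
As a minimal sketch of the pattern, the query below builds one intermediate aggregate and then applies several accessors to it. The CTE name agg and the column alias sketch are illustrative only, and the input is a generated series rather than a real table:

```sql
-- Step 1: build the intermediate aggregate once.
WITH agg AS (
    SELECT percentile_agg(data) AS sketch
    FROM generate_series(0, 100) data
)
-- Step 2: reuse the same aggregate with several accessors.
SELECT
    approx_percentile(0.5, sketch)  AS median,
    approx_percentile(0.99, sketch) AS p99,
    mean(sketch)                    AS mean,
    num_vals(sketch)                AS n
FROM agg;
```

Because the accessors all read the same sketch, the raw data is scanned only once.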

Functions in this group

Aggregate

uddsketch: Aggregate data in a uddsketch for further calculation of percentile estimates

Alternate aggregate

percentile_agg: Aggregate data in a uddsketch, using some reasonable default values, for further calculation of percentile estimates

Accessor

approx_percentile: Estimate the value at a given percentile from a uddsketch
approx_percentile_array: Estimate the values for an array of given percentiles from a uddsketch
approx_percentile_rank: Estimate the percentile of a given value from a uddsketch
error: Get the maximum relative error for a uddsketch
mean: Calculate the exact mean from values in a uddsketch
num_vals: Get the number of values contained in a uddsketch

Rollup

rollup: Roll up multiple uddsketches

Function details

uddsketch(
    size INTEGER,
    max_error DOUBLE PRECISION,
    value DOUBLE PRECISION
) RETURNS UddSketch

This is the first step for calculating approximate percentiles with the uddsketch algorithm. Use uddsketch to create an intermediate aggregate from your raw data. This intermediate form can then be used by one or more accessors in this group to compute final results.

Optionally, multiple such intermediate aggregate objects can be combined using rollup() before an accessor is applied.

If you aren't sure what values to set for size and max_error, try using the alternate aggregate function, percentile_agg(). percentile_agg also creates a UddSketch, but it sets some sensible default values for size and max_error that should work for many use cases.

Required arguments

Name	Type	Description
`size`	`INTEGER`	Maximum number of buckets in the `uddsketch`. Providing a larger value here makes it more likely that the aggregate is able to maintain the desired error, but potentially increases the memory usage.
`max_error`	`DOUBLE PRECISION`	The desired maximum relative error of the sketch. The true error may exceed this if too few buckets are provided for the data distribution. You can get the true error using the `error` function.
`value`	`DOUBLE PRECISION`	The column to aggregate for further calculation.
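
One way to check whether a given size was large enough is to compare the requested max_error with the value reported by the error accessor. The following is a sketch with arbitrary illustrative parameters (20 buckets, a requested error of 0.001, and a generated series as input); the reported error depends on the data distribution:

```sql
-- Compare the requested maximum error with the error the sketch
-- actually maintained. The parameters here are illustrative only.
SELECT
    0.001 AS requested_error,
    error(uddsketch(20, 0.001, data)) AS actual_error
FROM generate_series(1, 100000) data;
```

If actual_error is larger than requested_error, a larger size is probably needed for that data.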

Returns

Column	Type	Description
`uddsketch`	`UddSketch`	A percentile estimator object created to calculate percentiles using the `uddsketch` algorithm

Examples

Given a table called samples, with a column called data, build a uddsketch using the data column. Use a maximum of 100 buckets and a relative error of 0.01:

SELECT uddsketch(100, 0.01, data) FROM samples;
                         
percentile_agg(
    value DOUBLE PRECISION
) RETURNS UddSketch

This is an alternate first step for calculating approximate percentiles. It provides some added convenience by using some sensible defaults to create a UddSketch. Internally, it calls uddsketch with 200 buckets and a maximum error rate of 0.001.

Use percentile_agg to create an intermediate aggregate from your raw data. This intermediate form can then be used by one or more accessors in this group to compute final results.

Optionally, multiple such intermediate aggregate objects can be combined using rollup() before an accessor is applied.
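
To illustrate the relationship between the two aggregates, the sketch below computes the same estimate both ways. It assumes the defaults stated above (200 buckets and a maximum error of 0.001), in which case the two columns would be expected to match. The column aliases are illustrative:

```sql
-- percentile_agg(x) is described above as uddsketch with
-- 200 buckets and a maximum error of 0.001.
SELECT
    approx_percentile(0.5, percentile_agg(data))        AS via_percentile_agg,
    approx_percentile(0.5, uddsketch(200, 0.001, data)) AS via_uddsketch
FROM generate_series(0, 100) data;
```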

Required arguments

Name	Type	Description
`value`	`DOUBLE PRECISION`	Column of values to aggregate for percentile calculation

Returns

Column	Type	Description
`percentile_agg`	`UddSketch`	A percentile estimator object created to calculate percentiles using the `UddSketch` algorithm

Examples

Create a continuous aggregate that stores percentile aggregate objects. These objects can later be used with multiple accessors for retrospective analysis:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    percentile_agg(value) as pct_agg
FROM foo
GROUP BY 1;
                         
approx_percentile(
    percentile DOUBLE PRECISION,
    sketch UddSketch
) RETURNS DOUBLE PRECISION

Estimate the approximate value at a percentile from a uddsketch aggregate.

Required arguments

Name	Type	Description
`percentile`	`DOUBLE PRECISION`	The percentile to compute. Must be within the range `[0.0, 1.0]`.
`sketch`	`UddSketch`	The `uddsketch` aggregate.

Returns

Column	Type	Description
`approx_percentile`	`DOUBLE PRECISION`	The estimated value at the requested percentile.

Examples

Estimate the value at the first percentile, given a sample containing the numbers from 0 to 100:

SELECT
  approx_percentile(0.01, percentile_agg(data))
FROM generate_series(0, 100) data;

approx_percentile
-------------------
            0.999
                         
approx_percentile_array(
    percentiles DOUBLE PRECISION[],
    sketch UddSketch
) RETURNS DOUBLE PRECISION[]

Estimate the approximate values of an array of percentiles from a uddsketch aggregate.

Required arguments

Name	Type	Description
`percentiles`	`DOUBLE PRECISION[]`	Array of percentiles to compute. Must be within the range `[0.0, 1.0]`.
`sketch`	`UddSketch`	The `uddsketch` aggregate.

Returns

Column	Type	Description
`approx_percentile_array`	`DOUBLE PRECISION[]`	The estimated values at the requested percentiles.

Examples

Estimate the value at the 90th, 50th, and 20th percentiles, given a sample containing the numbers from 0 to 100:

SELECT
  approx_percentile_array(array[0.9,0.5,0.2], uddsketch(100,0.005,data))
FROM generate_series(0, 100) data;

approx_percentile_array
-------------------
 {90.0,50.0,20.0}
                         
approx_percentile_rank(
    value DOUBLE PRECISION,
    sketch UddSketch
) RETURNS DOUBLE PRECISION

Estimate the percentile at which a given value would be located.

Required arguments

Name	Type	Description
`value`	`DOUBLE PRECISION`	The value to estimate the percentile of.
`sketch`	`UddSketch`	The `uddsketch` aggregate.

Returns

Column	Type	Description
`approx_percentile_rank`	`DOUBLE PRECISION`	The estimated percentile associated with the provided value.

Examples

Estimate the percentile rank of the value 99, given a sample containing the numbers from 0 to 100:

SELECT
  approx_percentile_rank(99, percentile_agg(data))
FROM generate_series(0, 100) data;

approx_percentile_rank
----------------------------
        0.9851485148514851
                         
error(
    sketch UddSketch
) RETURNS DOUBLE PRECISION

Get the maximum relative error of a uddsketch. For a percentile p, the correct (non-estimated) value falls within the range approx_percentile(p, sketch) +/- (approx_percentile(p, sketch) * error(sketch)).
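
The range described above can be computed directly from the sketch. The following is a minimal sketch for the median of a generated series. The CTE name agg and the column aliases are illustrative, and the lower and upper labels assume a positive estimate:

```sql
WITH agg AS (
    SELECT percentile_agg(data) AS sketch
    FROM generate_series(0, 100) data
)
SELECT
    approx_percentile(0.5, sketch)                       AS estimate,
    -- estimate - estimate * error
    approx_percentile(0.5, sketch) * (1 - error(sketch)) AS lower_bound,
    -- estimate + estimate * error
    approx_percentile(0.5, sketch) * (1 + error(sketch)) AS upper_bound
FROM agg;
```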

Required arguments

Name	Type	Description
`sketch`	`UddSketch`	The `uddsketch` to determine the error of.

Returns

Column	Type	Description
`error`	`DOUBLE PRECISION`	The maximum relative error of any percentile estimate.

Examples

Calculate the maximum relative error when estimating percentiles using a uddsketch built with percentile_agg:

SELECT error(percentile_agg(data))
FROM generate_series(0, 100) data;
                         
mean(
    sketch UddSketch
) RETURNS DOUBLE PRECISION

Calculate the exact mean of the values in a uddsketch. Unlike percentile calculations, the mean calculation is exact. This accessor allows you to calculate the mean alongside percentiles, without needing to create two separate aggregates from the same raw data.

Required arguments

Name	Type	Description
`sketch`	`UddSketch`	The `uddsketch` to extract the mean from.

Returns

Column	Type	Description
`mean`	`DOUBLE PRECISION`	The mean of the values in the `uddsketch`.

Examples

Calculate the mean of the integers from 0 to 100:

SELECT mean(percentile_agg(data))
FROM generate_series(0, 100) data;
                         
num_vals(
    sketch UddSketch
) RETURNS DOUBLE PRECISION

Get the number of values contained in a uddsketch. This accessor allows you to calculate a count alongside percentiles, without needing to create two separate aggregates from the same raw data.

Required arguments

Name	Type	Description
`sketch`	`UddSketch`	The `uddsketch` to extract the number of values from.

Returns

Column	Type	Description
`num_vals`	`DOUBLE PRECISION`	The number of values in the `uddsketch`.

Examples

Count the number of integers from 0 to 100:

SELECT num_vals(percentile_agg(data))
FROM generate_series(0, 100) data;

num_vals
-----------
    101
                         
rollup(
    sketch UddSketch
) RETURNS UddSketch

Combine multiple intermediate uddsketch aggregates, produced by uddsketch or percentile_agg, into a single intermediate uddsketch aggregate. For example, you can use rollup to combine uddsketches from 15-minute buckets into daily buckets.

Required arguments

Name	Type	Description
`sketch`	`UddSketch`	The `uddsketch` aggregates to roll up.

Returns

Column	Type	Description
`rollup`	`UddSketch`	A new `uddsketch` aggregate created by combining the input `uddsketch` aggregates.
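
As a small self-contained sketch of rollup, the query below builds one uddsketch per group of 25 consecutive integers, then combines the four sketches before applying an accessor. The CTE name per_bucket, the grouping expression, and the sketch parameters are illustrative only:

```sql
WITH per_bucket AS (
    SELECT
        data / 25 AS bucket,                 -- integer division: groups 0 to 3
        uddsketch(100, 0.01, data) AS sketch -- one sketch per group
    FROM generate_series(0, 99) data
    GROUP BY 1
)
-- Combine the per-group sketches, then estimate the overall median.
SELECT approx_percentile(0.5, rollup(sketch)) AS overall_median
FROM per_bucket;
```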

Extended examples

Aggregate and roll up percentile data to calculate daily percentiles using `percentile_agg`

Create an hourly continuous aggregate that contains a percentile aggregate:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    percentile_agg(value) as pct_agg
FROM foo
GROUP BY 1;

You can use accessors to query directly from the continuous aggregate for hourly data. You can also roll the hourly data up into daily buckets, then calculate approximate percentiles:

SELECT
    time_bucket('1 day'::interval, bucket) as bucket,
    approx_percentile(0.95, rollup(pct_agg)) as p95,
    approx_percentile(0.99, rollup(pct_agg)) as p99
FROM foo_hourly
GROUP BY 1;
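
For the hourly case, no rollup is needed, because each row of the continuous aggregate already holds one sketch. A minimal sketch, assuming the foo_hourly view defined above (the ORDER BY is added only for readability):

```sql
-- Apply accessors directly to the stored hourly sketches.
SELECT
    bucket,
    approx_percentile(0.95, pct_agg) AS p95,
    approx_percentile(0.99, pct_agg) AS p99
FROM foo_hourly
ORDER BY bucket;
```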

Aggregate and roll up percentile data to calculate daily percentiles using `uddsketch`

Create an hourly continuous aggregate that contains a percentile aggregate. This example uses a maximum of 100 buckets and a relative error of 0.01:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    uddsketch(100, 0.01, value) as uddsketch
FROM foo
GROUP BY 1;

You can use accessors to query directly from the continuous aggregate for hourly data. You can also roll the hourly data up into daily buckets, then calculate approximate percentiles:

SELECT
    time_bucket('1 day'::interval, bucket) as bucket,
    approx_percentile(0.95, rollup(uddsketch)) as p95,
    approx_percentile(0.99, rollup(uddsketch)) as p99
FROM foo_hourly
GROUP BY 1;
