PostgreSQL performance: UUID vs. VARCHAR

Hi,

I am using Camunda BPM in a “clustered” environment with multiple execution nodes (process engine + job executor + BPMN/DMN deployer inside Apache Karaf) and therefore use a UUID generator for providing identifiers.

My database of choice is PostgreSQL in a replicated high-availability (and read-scalability) setup using pgpool2. I recently did an upgrade from the (almost ancient) Camunda version 7.4.0 to 7.7.0.

The main issue for me here: executing the SQL upgrade scripts took more than 4 hours, most of which was consumed by creating new indexes on history data.

Looking a bit closer, I see that all identifier fields are created as VARCHAR, which makes things much slower than they could be. If I could use PostgreSQL’s native UUID data type, the identifier columns would consume a fixed 16 bytes each, which is much smaller and fits into the “plain” row storage, which is much more efficient to scan than the “extended” storage used for variable-length columns.

Read more on this matter here: PostgreSQL: Documentation: 15: 73.2. TOAST
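To illustrate the size difference, here is a minimal sketch (my own example, not taken from the Camunda schema) comparing the native uuid type with the same identifier stored as varchar:

CREATE TABLE ids_varchar (id_ varchar(64) PRIMARY KEY);
CREATE TABLE ids_uuid    (id_ uuid        PRIMARY KEY);

-- 16 bytes for the native type vs. 37 bytes (36 characters + 1-byte header)
-- for the textual form of the very same identifier:
SELECT pg_column_size('6f32c9ac-5d2b-11e7-b114-b2f933d5fe66'::uuid)        AS uuid_bytes,
       pg_column_size('6f32c9ac-5d2b-11e7-b114-b2f933d5fe66'::varchar(64)) AS varchar_bytes;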

In addition, column ordering does have an impact: database design - Calculating and saving space in PostgreSQL - Stack Overflow
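As a minimal sketch of the ordering effect (again my own illustration, not the actual Camunda DDL): on a 64-bit platform an 8-byte column gets aligned to an 8-byte boundary, so placing narrow columns in front of it wastes padding in every row.

-- 1-byte boolean followed by an 8-byte bigint: 7 padding bytes per row
CREATE TABLE row_padded (flag_ boolean, value_ bigint, name_ varchar(255));

-- widest fixed-size columns first: no padding before value_
CREATE TABLE row_packed (value_ bigint, flag_ boolean, name_ varchar(255));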

The question for me now is: is there any reason (other than historical) to stick with simple integer IDs converted to Strings?

I mean, you could switch to UUIDs completely (drop the default hi-lo generator) and at the same time optimize the DDL for the different databases to store them most efficiently (especially for the ones that do not have a native UUID type). In addition, if the Java code were adapted to use UUID instead of String, the memory footprint would go down a bit and benchmarks would be faster anyway due to space and CPU-cycle savings in all layers: memory, indexes, storage.
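For PostgreSQL, a sketch of what such optimized DDL could look like (column names borrowed from ACT_HI_PROCINST, the uuid types are my assumption; the shipped DDL declares the id columns as varchar(64)):

CREATE TABLE act_hi_procinst_sketch (
    id_            uuid NOT NULL PRIMARY KEY,
    proc_inst_id_  uuid NOT NULL,
    business_key_  varchar(255),
    start_time_    timestamp NOT NULL,
    end_time_      timestamp
);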

That’s an interesting finding. If we were to write a new engine, your proposal might be a better design choice than using Strings and varchar columns. However, any changes to Camunda 7 need to ensure that the following things are not violated:

  • Backwards compatibility of public API, e.g. we can’t simply change the return type of Task#getId from String to another class.
  • Backwards compatibility of the database schema for the rolling update use case. That means a 7.X engine must be able to work on a 7.Y database schema, where X < Y. I have the feeling that this would be violated if we changed the type of the id columns.

So my current thoughts are that this cannot be easily changed.

Cheers,
Thorben


I totally agree with you on the semantic versioning part. Nevertheless, this could be an option for a version 8.0, no?

I also recently tried to delete individual (obsolete) process definitions, including their historic data, using the Cockpit UI and/or the REST API, only to run into timeouts at the JDBC level (which I configured because I don’t want any thread to get stuck for more than 5 minutes).

Then I wrote myself a simple SQL history cleanup script, executed by Jenkins with timestamp logging enabled. An excerpt from a first execution (carried out before the upgrade to 7.7):

2017-06-21T01:54:42+0200 CREATE MATERIALIZED VIEW v_cleanup_procinst AS (
2017-06-21T01:54:42+0200     SELECT
2017-06-21T01:54:42+0200         id_
2017-06-21T01:54:42+0200     FROM
2017-06-21T01:54:42+0200         act_hi_procinst
2017-06-21T01:54:42+0200     WHERE
2017-06-21T01:54:42+0200         end_time_ < (CURRENT_TIMESTAMP - '6 months'::interval)
2017-06-21T01:54:42+0200 );
2017-06-21T01:54:47+0200 SELECT 634546
...
2017-06-21T02:33:55+0200 DELETE FROM
2017-06-21T02:33:55+0200     act_hi_actinst
2017-06-21T02:33:55+0200 WHERE
2017-06-21T02:33:55+0200     proc_inst_id_ IN (
2017-06-21T02:33:55+0200         SELECT
2017-06-21T02:33:55+0200             id_
2017-06-21T02:33:55+0200         FROM
2017-06-21T02:33:55+0200             v_cleanup_procinst
2017-06-21T02:33:55+0200     )
2017-06-21T02:33:55+0200 ;
2017-06-21T03:49:50+0200 DELETE 9040134

Deleting ~9M rows took more than 1h 15m!!!

Of course, there are ways to optimize that in PostgreSQL (and the system is a pretty slow one in terms of I/O), but it illustrates the underlying problem quite well.
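For example (a sketch of what I would try next, not something I have benchmarked on this system): give the planner statistics on the freshly created materialized view and an index on the probed column, and run the big DELETE as a join instead of an IN subquery:

ANALYZE v_cleanup_procinst;
CREATE INDEX idx_cleanup_procinst_id ON v_cleanup_procinst (id_);

DELETE FROM act_hi_actinst a
USING v_cleanup_procinst p
WHERE a.proc_inst_id_ = p.id_;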

The new history time-to-live (TTL) comes to the rescue here. Really nice to see that it also accounts for the basic business-critical aspects like the batch window and batch size.

Nevertheless, I somehow get the impression that the DDL has a lot of room for improvement (at least for PostgreSQL).

Cheers,

ancoron

Coming back to this topic: I have failed to rewrite the Java part so far, because the interface/class hierarchy prevents the id attribute from being converted from String to UUID in the entity classes alone.

I am now concentrating on extending only the MyBatis templates and the DDL SQL, so that I can at least use a more appropriate data type in PostgreSQL.

The main reason to do this is table and index bloat in PostgreSQL: UUIDs stored as varchar are treated like arbitrary text and look random to the index, even version 1 UUIDs, which semantically encode time and space, but that semantics is lost with varchar. History cleanups don’t help much in this regard.
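The conversion step I have in mind for PostgreSQL would be a sketch like this (assuming every stored id is a valid UUID string, which is only the case when the engine runs with a UUID id generator; referencing columns like proc_inst_id_ would need the same treatment):

ALTER TABLE act_hi_procinst
    ALTER COLUMN id_ TYPE uuid USING id_::uuid;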

I’ll post back when I have some results here.