Page Content

I implemented an optimisation for default worklist system. I will describe here its rationale.

Table of Contents

Problem
Ideas
Implementation
Worklist caching
Related Articles

Problem¶

The worklist system get slower as the number of existing worklist increases. This is because each worklist issue an independant SQL query.

Ideas¶

By looking at the queries, one will notice that they are very similar: all filter on one or more portal types, some filter on simulation_state, others on validation_state, and all have a security-related part. This brings up the idea of factorizing queries: do all lookups for compatible worklists in one query. Compatible worklists are worklists which filter on the same criterions: (portal_type, simulation_state) is incompatible with (portal_type, validation _state), and also incompatible with (portal_type, simulation_state, security_uid).

Implementation¶

As query generation was done at worklist object level, a new method extracting interesting information is added to allow offloading this job at workflow tool level.

Interesting information extracted from worklists¶

Now that everything we need to know at workflow tool level can be extracted from each worklist, the merging mechanism can be implemented. (TODO)

First step: Grouping worklists¶

For each worklist, get the criterions (the catalog columns they use) to find all the existing combinations. For each possible combination, list each matching worklists with their criterions and criterion values. In the process, strip criterions which cannot be grabbed by SQL and warn about them: they are configuration errors.

The following 3 steps are executed for each entry of the returned list.

Second step: Query generation¶

Now that we have a list of worklists with common criterions, we can generate an SQL query. If the worklists look like:


    Validated Foos:
      validation_state = 'validated'
      portal_type = 'Foo'
    Draft Bars:
      validation_state = 'draft'
      portal_type = 'Bar'
    Validated Bars:
      validation_state = 'validated'
      portal_type = 'Bar'
    Validated Baz:
      validation_state = 'validated'
      portal_type = 'Baz'

The resulting query will look like:


    SELECT
      COUNT(*), validation_state, portal_type
    FROM
      catalog
    WHERE
      (validation_state = 'validated' AND portal_type IN ('Foo', 'Bar', 'Baz'))
      OR
      (validation_state = 'draft' AND portal_type = 'Bar')

This is done by estimating the number of possible value for each criterion, and sorting the list of criterions with the lowest number of possible value first: here, by using 'validation_state' criterion before 'portal_type', we use one 'OR', two 'AND', three '=' and one 'IN'. Otherwise, it would be like:


    SELECT
      COUNT(*), validation_state, portal_type
    FROM
      catalog
    WHERE
      (portal_type = 'Foo' AND validation_state = 'validated')
      OR
      (portal_type = 'Bar' AND validation_state IN ('validated', 'draft'))
      OR
      (portal_type = 'Baz' AND validation_state = 'validated')

This contains two 'OR', three 'AND', five '=' and one 'IN'. It is also longer, character-wise. All in all, it's worse than the sorted version.

Third step: Query execution¶

Not much to be said here. Note that as we use portal_catalog to execute the query, it adds some overhead at SQL level: it selects useless columns, does a useless "DISTINCT". These have shown of neglectible cost in measured cases. You mileage should not vary. If it does, feel free to improve this section and please send an example on the development mailing list.

Fourth step: Query result dispatching¶

Query result is composite: it has been generated for multiple worklists, and must be distribued back to them. It is done by generating a dictionnary providing for every criterion value combination the worklist(s) "interested" in the object count. Once this dictionnary is built, the result set is scanned once. For each line the value combination forms the dict key, and the object count is added to worklist's object count.

Fifth step: Action entry generation¶

Once the object count for all worklists is known, actions must be generated to be returned to caller. There actions are generated using worklist metadata and the computed object count.

Worklist caching¶

SQL table cache¶

Stores pre-computed and "anonimized" (grouped by all security criterions available to catalog, such as security_uid and owner) object counts in a table periodicaly, and fetch data from this table instead of counting object in catalog at worklist generation time. This is very efficient, but requires a special update mechanism.

CachingMethod cache¶

Stores action provider method result into a CachingMethod cache. Efficiency unknown.

HTML render cache¶

Stores already rendered HTML into a CachingMethod cache. This is very efficient since worklists can take around the half of the page source size. But it is incidentaly believed to be quite memory-consuming.

Most Powerful Open Source ERP

Technical Note on Worklists Optimisation