This document lists ideas for improving response time for users in relation to worklists.
Worklist calculation on large ERP5 production systems is one of the most I/O intensive calculations. Because worklists are calculated at login time, it can lead to a slow login.
What is a worklist¶
A worklist is calculated by counting the number of documents which match certain criteria, for example with a query such as:
SELECT COUNT(DISTINCT uid)
FROM catalog
WHERE security_uid IN (...)  -- a list of security UIDs
AND validation_state = 'draft'
-- AND whatever else is needed to define the predicate
Each calculation is relatively fast (e.g. 0.1 second). But with 100+ worklists in ERP5 and hundreds of concurrent users, worklist calculation can make the SQL backend of ERP5 collapse.
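To get a feel for the orders of magnitude, here is a back-of-the-envelope estimate. The 0.1 second per query and the 100 worklists come from the text above; the 300 concurrent users and the 5-minute refresh period are illustrative assumptions, and the function name is hypothetical.

```python
def sql_load(worklists, seconds_per_query, users, refresh_period_s):
    """Return (SQL seconds per user refresh, average number of busy SQL workers)."""
    per_user = worklists * seconds_per_query
    # Assumption: every user triggers one full recalculation per refresh period.
    busy_workers = users * per_user / refresh_period_s
    return per_user, busy_workers


per_user, busy = sql_load(worklists=100, seconds_per_query=0.1,
                          users=300, refresh_period_s=300)
print(f"{per_user:.0f} s of SQL time per user, "
      f"{busy:.0f} SQL workers busy on average")
```

Under these assumptions, worklists alone keep about ten SQL workers permanently busy, before any other query is served.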
The parameters to this problem are:
- N: the number of users of the ERP5 production instance
- C: the number of items in the user worklist
- P: the number of documents which match the given predicate (e.g. state)
- S: the number of documents which match the security group of a user
- G: the number of security groups of a user
Depending on these figures, the approach to increase speed can be different. For this reason, any optimization in ERP5 core should preserve both a flexible, declarative way of defining worklists and a flexible approach to optimization.
If P or S is small (e.g. < 100), then each individual worklist calculation will be extremely fast.
If P or S is big (e.g. > 100,000), C is small (e.g. 10) and G is big (e.g. 1000), the individual worklist calculation can be quite slow.
If N is large (e.g. > 1000), there is little hope of precalculating or caching anything easily.
Here are some typical cases:
- N very large, C, P, S, G small: social network
- N small, C, P large, S, G small: web site of a small company with many documents, accounting site of a small company with many transactions
- N small, C, P large, S, G large: accounting site of a group of companies with many transactions
Idea -1: update a worklist registry at each workflow transition¶
This goes against the ERP5 principle "never store results of calculation". So, forget it.
Idea 0: calculate worklists for groups¶
The current optimization of worklist calculation is based on the idea of precalculating certain values of worklists for certain groups of users and predicates (e.g. draft state for ZPPB security group). Final worklist calculation is then done in near real time.
Still, this fails in some cases. Apparently, it does not work for all kinds of predicates. In situations where P, S and G are large, it fails to provide good performance.
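The two-step idea can be sketched as follows. This is only an illustration, not the actual ERP5 implementation. It assumes a simplified catalog table in an in-memory SQLite database, and that each catalog row carries exactly one security_uid, so that per-group counts can be summed without counting a document twice. The function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog ("
    "uid INTEGER PRIMARY KEY, security_uid INTEGER, validation_state TEXT)"
)
conn.executemany(
    "INSERT INTO catalog (uid, security_uid, validation_state) VALUES (?, ?, ?)",
    [(1, 10, "draft"), (2, 10, "draft"), (3, 11, "draft"),
     (4, 11, "validated"), (5, 12, "draft")],
)

# Step 1 (done in advance, shared by all users): one count per group and state.
group_counts = {
    (security_uid, state): count
    for security_uid, state, count in conn.execute(
        "SELECT security_uid, validation_state, COUNT(*) "
        "FROM catalog GROUP BY security_uid, validation_state"
    )
}


def worklist_count(user_security_uids, state):
    """Step 2 (near real time): sum the precalculated counts of the user's groups."""
    return sum(group_counts.get((s, state), 0) for s in user_security_uids)


print(worklist_count([10, 11], "draft"))  # 3
```

The second step costs one dictionary lookup per security group of the user, so it still grows with G, which is consistent with the weakness noted above.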
Idea 1: calculate worklists in advance¶
Worklists could be calculated in advance.
This is already what we have with a caching approach (5 minutes). It could be improved easily by having a caching policy similar to what is used for the web: display the cached content (even if old), then trigger an update of the cached content.
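A minimal sketch of such a "serve stale, then refresh" policy is shown below. The class and its methods are hypothetical and unrelated to the actual ERP5 cache API. The caller is expected to schedule the recalculation itself whenever needs_refresh is true.

```python
import time


class StaleWhileRevalidateCache:
    """Serve the cached value immediately and tell the caller when to refresh."""

    def __init__(self, max_age_s, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.clock = clock
        self._entries = {}  # key -> (value, time stored)

    def get(self, key):
        """Return (value, needs_refresh); value is None if nothing is cached yet."""
        entry = self._entries.get(key)
        if entry is None:
            return None, True
        value, stored_at = entry
        return value, self.clock() - stored_at > self.max_age_s

    def put(self, key, value):
        self._entries[key] = (value, self.clock())


# Usage with a fake clock so that the example is deterministic.
now = [0.0]
cache = StaleWhileRevalidateCache(max_age_s=300, clock=lambda: now[0])
cache.put(("user_a", "draft_documents"), 7)
now[0] = 400.0
print(cache.get(("user_a", "draft_documents")))  # (7, True)
```

The old value is still displayed after 400 seconds, but the flag tells the caller that an update should be triggered in the background.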
The main issue here is thus: which cache content update should be triggered, for which user and when.
If the number of users is very small (< 10), it does not really matter. But if it is large, then this becomes more complex. A good algorithm for predictive cache update could use the following principles:
- never trigger updates for users who never connect
- often trigger updates for users who recently connected
- very often trigger updates for users who are connected
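These principles could translate into a refresh interval per user, roughly as sketched below. The function name and all thresholds (1 minute, 15 minutes, 7 days) are assumptions chosen for illustration, as is the choice to treat long-inactive users like users who never connect.

```python
from datetime import datetime, timedelta


def refresh_interval(last_seen, now, connected, recent=timedelta(days=7)):
    """Return how often to refresh a user's worklists, or None for never."""
    if last_seen is None:
        return None                       # never connected: never update
    if connected:
        return timedelta(minutes=1)       # connected: update very often
    if now - last_seen <= recent:
        return timedelta(minutes=15)      # recently connected: update often
    # Assumption: long-inactive users are treated like users who never connect.
    return None


now = datetime(2024, 1, 10, 12, 0)
print(refresh_interval(now - timedelta(days=1), now, connected=False))  # 0:15:00
```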
Idea 2: do not calculate worklists at display time¶
When rendering the front page of ERP5, do not calculate worklists.
Either display an empty menu and let the calculation happen later through AJAX (when the user clicks on the menu), or display what was previously calculated (non-AJAX).
Idea 3: update worklist menu at click time¶
Whenever the user clicks on the menu, fetch the most recent worklists.
Idea 4: push worklist updates¶
Whenever a calculation finishes, push the new menu values to the browser (using WebSockets, long polling or hookbox), as happens for example on Facebook (worklists are updated while the user stays on the same page). Change the color of the menu to show that something happened.
Idea 5: trigger update at each workflow transition¶
Each time a workflow transition happens, trigger the update of all worklists for relevant users.
The key here is to make sure that not too many updates are triggered. Imagine for example that N=1,000,000. The following tactics can be considered:
- if a user never had a document in this worklist, do not trigger worklist update (quite aggressive)
- if a user never had a document in a worklist of this workflow, do not trigger worklist update (less aggressive)
- only consider users who can view the document (i.e. this requires knowing which user is in which security group and keeping an index)
- only consider users who can view the document and are connected
- only consider users who are connected
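As an illustration of the fourth tactic (users who can view the document and are connected), here is a minimal sketch. It assumes a hypothetical in-memory index from security group to users and a set of connected users; how ERP5 would maintain either is outside the scope of this example.

```python
def users_to_update(document_security_uid, group_members, connected_users):
    """Users whose worklists are worth refreshing after a transition on a document."""
    can_view = group_members.get(document_security_uid, set())
    # Intersecting with connected users keeps the number of updates bounded.
    return can_view & connected_users


group_members = {10: {"alice", "bob"}, 11: {"carol"}}  # security group -> users
connected = {"bob", "carol"}
print(users_to_update(10, group_members, connected))  # {'bob'}
```

With N=1,000,000, the cost of one transition then depends on the size of one security group and on the number of connected users, not on N.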
Idea 6: use a worklist database¶
Keep a dedicated database or cluster to calculate worklists. It does not matter if it is not in perfect sync with the main database for the catalog. Multiple databases can be used.
Idea 7: guess the new worklist value¶
Instead of triggering a worklist cache update for a given user, try to guess what the next value of the worklist should be. For this:
- calculate the worklist predicates before the workflow transition
- calculate the worklist predicates after the workflow transition
- compare the two results and adjust the cached count of each worklist that the document entered or left
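The before/after comparison could be turned into a counter adjustment roughly as follows. This is a sketch under assumptions: worklist predicates are modelled as plain Python functions on a dictionary describing the document, the user is assumed to be able to view the document both before and after the transition, and all names are hypothetical.

```python
def guess_counts(cached_counts, predicates, before, after):
    """Adjust cached worklist counts from a document's state before and after."""
    guessed = dict(cached_counts)
    for name, predicate in predicates.items():
        # +1 if the document entered the worklist, -1 if it left, 0 otherwise.
        delta = int(predicate(after)) - int(predicate(before))
        guessed[name] = max(0, guessed.get(name, 0) + delta)
    return guessed


predicates = {
    "draft": lambda doc: doc["validation_state"] == "draft",
    "submitted": lambda doc: doc["validation_state"] == "submitted",
}
cached = {"draft": 4, "submitted": 1}
print(guess_counts(cached, predicates,
                   before={"validation_state": "draft"},
                   after={"validation_state": "submitted"}))
# {'draft': 3, 'submitted': 2}
```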
Of course, this will not work for worklists which are based on a date value (e.g. all events which must be processed within 24 hours). So, it can only be considered as a kind of "guess".
Idea 8: guess update frequency¶
By tracking the result of cache content updates, and tracking the "time to click" of users to read the content of a worklist, automatically decide the relevant update frequency for caches.
- do not often recalculate worklists which do not change often
- do not often recalculate worklists which the user does not care about
- often recalculate worklists which the user "jumps on" when they change
- often recalculate worklists which show a lot of activity
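One possible way to turn these rules into numbers is a multiplicative adjustment of the refresh interval, sketched below. The function name, the factors (0.5 and 2.0), the bounds and the 120-second "eager click" threshold are all assumptions made for illustration.

```python
def next_interval(current_s, changed, time_to_click_s,
                  min_s=60, max_s=3600, eager_click_s=120):
    """Return the next refresh interval, in seconds, for one user's worklist.

    changed: whether the last refresh found a different count.
    time_to_click_s: how long the user took to open the worklist after its
    last change, or None if the user never opened it.
    """
    if changed and time_to_click_s is not None and time_to_click_s <= eager_click_s:
        factor = 0.5   # active worklist the user jumps on: refresh more often
    elif not changed or time_to_click_s is None:
        factor = 2.0   # static worklist, or one the user ignores: back off
    else:
        factor = 1.0   # changes, but the user is in no hurry: keep the pace
    return min(max_s, max(min_s, current_s * factor))


print(next_interval(600, changed=True, time_to_click_s=30))    # 300.0
print(next_interval(600, changed=False, time_to_click_s=30))   # 1200.0
```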
Idea 9: reify worklists¶
This is especially useful for social networks. Generate an acknowledgement document for each document of a worklist and for each user. Then calculate the worklist based on acknowledgement documents.
This is useful to track whether or not a user has checked their worklist.
Old acknowledgements can be erased.
NOTE: acknowledgement documents should be part of portal_acknowledgements, not of event_module
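The principle can be illustrated with plain Python objects standing in for acknowledgement documents. The class, its fields and the helper functions are hypothetical, and counting only acknowledgements that have not been checked yet is one possible reading of "calculate the worklist based on acknowledgement documents".

```python
from dataclasses import dataclass


@dataclass
class Acknowledgement:
    """Hypothetical stand-in for one acknowledgement document."""
    user: str
    document_uid: int
    acknowledged: bool = False


def reify(users, document_uids):
    """Create one acknowledgement per (user, document) pair of a worklist."""
    return [Acknowledgement(user, uid) for user in users for uid in document_uids]


def pending_count(acknowledgements, user):
    """Worklist count for a user: acknowledgements not checked yet."""
    return sum(1 for a in acknowledgements
               if a.user == user and not a.acknowledged)


acks = reify(["alice", "bob"], [101, 102])
acks[0].acknowledged = True  # alice has checked document 101
print(pending_count(acks, "alice"), pending_count(acks, "bob"))  # 1 2
```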
Architecture¶
Appropriate worklist optimization management must be based on a proper API and plugin architecture, a bit like portal_caches.
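What such an API might look like is sketched below. This is purely hypothetical: none of these classes exist in ERP5, and the sketch does not reproduce the portal_caches interface. It only shows the general shape of a plugin chain in which each optimization strategy is a separate, replaceable class.

```python
from abc import ABC, abstractmethod


class WorklistOptimizer(ABC):
    """Hypothetical plugin interface: one subclass per optimization strategy."""

    @abstractmethod
    def get_count(self, user, worklist_id):
        """Return the count to display, or None if this plugin cannot answer."""

    def on_transition(self, document_uid, state_before, state_after):
        """Optional hook called after a workflow transition; no-op by default."""


class CachedCounts(WorklistOptimizer):
    """Example plugin that answers from a plain in-memory dictionary."""

    def __init__(self, counts):
        self.counts = counts

    def get_count(self, user, worklist_id):
        return self.counts.get((user, worklist_id))


class OptimizerChain:
    """Ask each registered plugin in order and return the first answer."""

    def __init__(self, plugins):
        self.plugins = list(plugins)

    def get_count(self, user, worklist_id):
        for plugin in self.plugins:
            count = plugin.get_count(user, worklist_id)
            if count is not None:
                return count
        return None


chain = OptimizerChain([CachedCounts({("alice", "draft"): 3})])
print(chain.get_count("alice", "draft"))  # 3
```

A plugin for group precalculation, for guessed values or for a dedicated worklist database could then be added or removed per site without changing the declarative worklist definitions.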