Big data Odoo optimization
Big data Odoo optimization is a very serious issue indeed. Because Odoo can be provided to big companies, this also implies that they will use the system extensively and that goes in pairs with a very fast and sustained data growth. Obviously, the most impacted standard tables will be the ones we call the “_lines” tables such as: sales_order_line ; account_move_line ; stock_move_line etc….etc….
To evaluate the impact of generic data growth on these tables, let’s imagine a standard company with 10 branches and 2 accounting persons validating 100 invoices each per open day. Let’s be fair and set the average number of sales_invoice_line per invoice at 10 items per invoice. The results would be the following:
- 100 invoices X 10 items per invoice = 1000 invoice lines.
- 1000 invoice lines will generate 1000 product lines + 100 tax lines + 100 total invoice amount lines inside the Sales Journal if we configured the latter to write account_moves PER PRODUCT, which is the only good way of pulling later good Dashboarding values per account or per product category etc…etc…. Anyways, this means 1200 account_move_lines per user per day.
- Multiplied by 2 users, this gives us 2400 account_move_line entries per branch per day.
- Multiplied by 10 branches, this gives us 24 000 account_move_line entries per company per day.
- Multiplied by 25 open days, we get 24 000 X 25 = 600 000 account_move_lines per month.
- Yearly, we get 600 000 x 12 = 7 200 000
Beyond 3 million lines, you will start feeling an important impact on performance.
Beyond 5 to 6 million lines, you will be heavily penalized. Of course, this depends on your hardware, disk technology, CPU availibility, queuing strategy….. but all in all, in our present case, there are more bad news coming:
- 7 200 000 account_move_line entries stands for purely validated invoices….. but this means that the sales_order_lines table can be 2 to 10 times bigger within the same period because it also contains quotations. Let’s be fair and assess you convert 30% of your quotations into sales: your sales_order_line table will be about 3 times bigger: that’s a rough 21 000 000 entries per year we are talking about.
- You have just as much coming into stock move lines tables.
- I’m not even talking about MRP which can make even bigger tables along with lot numbers etc… if this client is doing production as well.
- I’m assuming this is a MONO COMPANY system …. otherwise, you will end up with data from different companies inside the same tables… Picture this multiplied by 3 companies inside the same tables!
- I’m assuming you have a dedicated server and dedicated resources per company…. but it is not always the case since many people mutualize hosting….etc…
Now, is we consider you have around 30 users accessing these basic and most important tables simultaneously and a couple are trying to do Dashboarding while the rest try to work….well…. Forget it! Go buy yourself some coffee or subscrive to Zen classes.
Measuring the performance gap
Many of our clients including IT DEPARTMENTS, other Odoo Partners or Independant developpers have asked us to optimize their implementations because the system was slowing down. I spent some time playing with the following in order to give you a true insight on Odoo Big data odoo optimization. My test kit consisted in the following:
- 1 XEON E5 – 6 cores CPU 2Ghz
- 8 Gigs of RAM
- SSD Hard Drive 256 Gb
- Ubuntu 16.04
- PostgreSQL 9.5
- Test done with and without PgBouncer
- Fake data generated for the occasion inside the account_move / account_move_line tables
Here are the results of my measuring the following:
- Selecting all Journal Items from All journals
- Grouped by period
- Effective date greater than…
- Effective date lesser than…
- With total SUM calculated for Credit / Debit columns at the bottom
From the above, one can clearly understand that:
- Indexing helps but is not enough in the long run
- Partitioning works a lot better
- Coupling Partitioning and indexing works even better
- Coupling Partitioning and indexing and associating them with PgBouncer gives the best results.
For 1 000 000 000 rows, difference between standard out of the box PostgreSQL Odoo structure and Optimized with Partitioning, indexing and PgBouncer drops from 30 000 ms (30 seconds) to less than 3 000 ms (3 seconds). Yet, keep in mnid this is only one request from one single user. Put this in perspective with 30 or more concurrent users and you understand this is the difference between a system that runs and one that just systematically times out and causes error until it ultimately crashes.
If you are sceptical about the data I collected above for Big data Odoo optimization, please, take a look at this post: PostgreSQL: Partitioned Table vs Non Partitioned Table
This is not dedicated to Odoo and deals with simpler tables inside PostgreSQL but the approach is very straightforward and accurate… and conclusions are quite identical.
Implementing your Big data Odoo optimization strategy
If you are reading this post, it means that you either consider providing big data solutions with Odoo or PosgreSQL as a base or that you are already experiencing slowness.
Be aware that slowness can come from different reasons:
- You left your PostgreSQL configured by default setup. This is a huge mistake… and here are a few clues to get you started on basic postgreSQL setup
- You have a session queuing and HBA auth issue or both: lean to use PgBouncer
If your case if none of these two applies, then you probably need to step into Indexing and Partitionning indeed and my recommendations are the following:
- Beware that indexing can become expensive in resource, especialy if you play with strings values.
- Putting Indexing everywhere is a WRONG approach. Only put it where you know the ORM and your module are prone to push and pull values from.
- Define a partitioning strategy from the start and don’t wait until your tables get huge. You KNOW your client’s profile, turnover, users and this MUST be anticipated.
- Your partitioning strategy must follow data growth orientations. Every client is different. This is always a “per case” process.
- Apply partitioning table-structure logic that follows the triggers’ logic.
- Beware of wrong-defined trigger rules that can cause data-overlap or infinite loops.
- Partitioning Rules work… but from experience, Triggers just work BETTER with Odoo. So, just stick to triggers
- Be gentle with Triggers. If you need complexity, put it into the function instead.
- Stretch your child-partitions according to the Odoo or Data pattern.
- Use Constraint Exclusions to optimize your query speed. To be able to do so, remember to structure your partitionning according to these too.
If you already have a slow system and need to resegment your data through a Big data Odoo optimization partitioning process, you should follow the below guide-lines:
- Apply the above recommentations too.
- Remember Odoo does not know you are modifying data structure inside PostgreSQL. Doing it the right way implies that you preserve all parent-table names, columns, data format and inter-table constraints.
- Checksum your data before and after resegmentation / partitioning. YES, it is basic… but you need to be sure everything is there.
- Remember to aim for the future as well. Splitting data per year for past-data for example but not splitting it for the years to come is a BIG mistake. Remember PostgreSQL will go as slow as the slowest requests of all. So, if you consider splitting nicely the pasts 5 heavy completed years and then let a big mess as an active table for all the rest that keeps growing, you solved absolutely nothing… and only made things messy. Performance will be degraded just as much.
Last but not least, if you feel this Big data Odoo optimization is too hot and too specialized, then you’d better have this done and setup by people who have this experience in both Odoo and PostgreSQL. One single mistake and the entire Odoo would be ruined forever.