katratxo on Software Development

tail -f /var/log/brain | grep -i software

Archive for September 2009

Learning from the big ones


On September 1st Gmail had a service outage, which you can see on its service availability page [1]. After fixing the problem, Google published a blog post explaining what happened and what the next steps were [2]. I’d like to quote one sentence from that post. Please keep that idea in mind, because I’ll come back to it later.

(we take monitoring very seriously)

Taking this as an example of transparency, here is what happened with 2.50MP4.

On August 31st we published 2.50MP4, and everything was uploaded and ready on sourceforge.net. That same day Asier Zabaleta (part of Openbravo’s consulting team) asked in the #openbravo IRC channel for early access to the 2.50MP4 obx file so he could test the upgrade process; you can see our conversation in the IRC logs [4]. That night he tried to update his 2.50MP3 instance using this obx and the Module Management Console (MMC), and he was the first one to notice the problem. You can see his comments from the next day [5].

On September 1st the Product Development team became aware of the problem, and we deleted everything related to 2.50MP4 from sourceforge.net, even though the virtual appliances, source code, and the rest of the deliverables were fine. In fact, a fresh install of 2.50MP4 does not have any problem. That day we confirmed that everything was fine and that only an upgrade performed through the MMC could fail, so on September 2nd we published (re-uploaded) 2.50MP4 with a known issue [3]: you cannot upgrade your instance through the MMC using Core’s obx file. That’s why we started the release of 2.50MP5 right after releasing MP4.

Want technical details? Here’s what happened:

The problem

Part of 2.50MP4 is a core enhancement that makes it possible to add extension points to PL/SQL procedures. This enhancement added a new database-dependent function, ad_get_rdbms() (e.g. [6]), which is used in those extension points.

Usually DBSourceManager takes care of translating all your PL/SQL code, so you don’t need to worry about it. But when you have database-dependent functions (that is, functions whose definition is not the same in PostgreSQL and Oracle), we use the Prescript and Postscript files. These files contain SQL/DDL statements for a specific database and perform all the operations your database needs in order to run Openbravo.

The ad_get_rdbms() function was declared in the Prescript for each database. It is simple: it only returns ‘POSTGRE’ when using PostgreSQL and ‘ORACLE’ when using Oracle.

DBSourceManager keeps a list of the database-dependent functions available in your database. When you create a new function and place it in the Prescript, you must register it in a filter [7] so that DBSourceManager leaves it untouched while performing an update.database; otherwise you’ll lose it in the process. That’s what happened when upgrading to 2.50MP4.
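The role of the filter can be illustrated with a small Java sketch. It is not DBSourceManager’s code: the classes FunctionExcludeFilter and UpdateDatabaseSketch, the synchronize method and the function name my_model_function are all hypothetical, and the sketch assumes a simplified update.database that drops every function found in the database that is neither part of the exported model nor listed in the filter.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical filter: names of database-dependent functions created by the
// Prescript that update.database must leave untouched.
class FunctionExcludeFilter {
    private final Set<String> excludedFunctions = new LinkedHashSet<>();

    FunctionExcludeFilter(Set<String> names) {
        // Store names in lower case so lookups are case-insensitive.
        for (String name : names) {
            excludedFunctions.add(name.toLowerCase(Locale.ROOT));
        }
    }

    boolean isExcluded(String functionName) {
        return excludedFunctions.contains(functionName.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical, simplified update step (an assumption, not the real algorithm).
class UpdateDatabaseSketch {
    // Keeps a function only if it is part of the model or listed in the filter;
    // every other function found in the database is dropped.
    static Set<String> synchronize(Set<String> functionsInDatabase,
                                   Set<String> functionsInModel,
                                   FunctionExcludeFilter filter) {
        Set<String> kept = new LinkedHashSet<>();
        for (String fn : functionsInDatabase) {
            if (functionsInModel.contains(fn) || filter.isExcluded(fn)) {
                kept.add(fn);
            }
        }
        return kept;
    }
}

class ExcludeFilterDemo {
    public static void main(String[] args) {
        Set<String> database = Set.of("ad_get_rdbms", "my_model_function");
        Set<String> model = Set.of("my_model_function");

        FunctionExcludeFilter registered = new FunctionExcludeFilter(Set.of("ad_get_rdbms"));
        FunctionExcludeFilter notRegistered = new FunctionExcludeFilter(Set.of());

        // With the filter entry, ad_get_rdbms survives the update: prints true.
        System.out.println(UpdateDatabaseSketch.synchronize(database, model, registered).contains("ad_get_rdbms"));
        // Without it, ad_get_rdbms is dropped: prints false.
        System.out.println(UpdateDatabaseSketch.synchronize(database, model, notRegistered).contains("ad_get_rdbms"));
    }
}
```

In this model a Prescript function is never part of the exported model, so the filter entry is the only thing that keeps it alive.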

A further technical explanation

Because we use DBSourceManager to update your database when you install a module, the OpenbravoExcludeFilter class is loaded and cached in Tomcat. When you then try to update Openbravo using an obx, Tomcat uses the old cached class, and the changes to this class that come with the new version of DBSourceManager are ignored.

In the upgrade from 2.50MP3 to 2.50MP4, the cached OpenbravoExcludeFilter didn’t know anything about the ad_get_rdbms() function. So the Prescript created the function, and a few steps later the update process removed it because it was not among the filtered functions.
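The interaction between the cached class and the Prescript can be simulated in a few lines of Java. This is a deliberately simplified model, not how Tomcat or DBSourceManager are implemented: CachingLoader is a hypothetical stand-in for a class loader that keeps the first definition it loads under a given name, a filter is reduced to its set of excluded function names, the MP3 filter is shown as empty for brevity, and the upgrade only tracks functions created by the Prescript.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a class loader: the first definition loaded under a
// name is cached and returned on every later request, even if a newer
// definition is offered.
class CachingLoader {
    private final Map<String, Set<String>> cache = new HashMap<>();

    Set<String> load(String name, Supplier<Set<String>> definition) {
        return cache.computeIfAbsent(name, key -> definition.get());
    }
}

class UpgradeSimulation {
    // Returns the Prescript functions still present after the upgrade,
    // given the set of excluded functions the update actually used.
    static Set<String> upgrade(Set<String> excludedFunctions) {
        Set<String> functions = new LinkedHashSet<>();
        functions.add("ad_get_rdbms"); // the Prescript creates the function
        // A later step drops every Prescript function the filter does not list.
        functions.removeIf(fn -> !excludedFunctions.contains(fn));
        return functions;
    }
}

class CachedFilterDemo {
    public static void main(String[] args) {
        CachingLoader tomcat = new CachingLoader();
        // Installing a module earlier loaded the MP3 filter (contents omitted).
        tomcat.load("OpenbravoExcludeFilter", () -> Set.of());
        // The MP4 obx brings a filter that knows ad_get_rdbms, but the cached one wins.
        Set<String> used = tomcat.load("OpenbravoExcludeFilter", () -> Set.of("ad_get_rdbms"));

        System.out.println(UpgradeSimulation.upgrade(used));                   // prints []
        System.out.println(UpgradeSimulation.upgrade(Set.of("ad_get_rdbms"))); // prints [ad_get_rdbms]
    }
}
```

The sketch makes the failure mode visible: the new filter is correct, but it never runs because the lookup returns the definition that was loaded earlier.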

Why didn’t the QA team catch the problem?

While testing the upgrade from 2.50MP3 to 2.50MP4 they ran into an issue (I don’t know exactly what) that forced them to perform a step from the command line. Doing that updated their version of DBSourceManager, so when they later tested the upgrade process through the MMC everything went smoothly.

We take the upgrade process very seriously

We want the upgrade process to be painless, and we are working to make it so. In previous versions of Openbravo ERP, upgrading meant merging code, so everybody hesitated every time we released a Maintenance Pack. Now, in 2.50 with Modularity, the process is simpler: since the Core module is untouchable (you can always customize it with a template), you can upgrade without worrying about what’s coming.

What’s Next:

We fixed the database function issue and started releasing 2.50MP5.

We are working on and testing several approaches to solve the OpenbravoExcludeFilter class cache problem, including turning the functions filter into a properties file, among others. But our main effort is on making the update database process reload all the DBSourceManager classes.
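One of the approaches mentioned, keeping the function list in a properties file, could look roughly like the following Java sketch. It is only an illustration under assumptions: the class PropertiesExcludeFilter and the key excluded.functions are hypothetical. The idea is that data read fresh at the start of every update is not subject to the class cache, so a new list shipped with a new version would be picked up without reloading any class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical filter whose function list lives in a properties file instead of
// being compiled into a class, so it can be re-read on every update.
class PropertiesExcludeFilter {
    // Hypothetical key name, not an actual Openbravo property.
    static final String KEY = "excluded.functions";

    // Parses a comma-separated list of function names from the given source.
    static Set<String> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> names = new TreeSet<>();
        for (String name : props.getProperty(KEY, "").split(",")) {
            String trimmed = name.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }
}

class PropertiesFilterDemo {
    public static void main(String[] args) {
        // A string stands in for the file that a new version would ship.
        String contents = "excluded.functions = ad_get_rdbms\n";
        System.out.println(PropertiesExcludeFilter.load(new StringReader(contents))); // prints [ad_get_rdbms]
    }
}
```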

We’ll keep you posted about this and about how we solve the problem. Remember that you can upgrade your instance to 2.50MP4 through our SCM (Mercurial) [8].

You should also know that you can work safely with 2.50MP4. 2.50MP5 will only include the fix for this database function issue.

2.50MP5 has been released [9], and a fix for the main issue has been pushed to the PI repository [10].

[1] http://www.google.com/appsstatus#di=1&ddo=2&hl=en
[2] http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html
[3] https://issues.openbravo.com/view.php?id=10444
[4] http://irc.openbravo.com/logs/openbravo/2009.08.31.log
[5] http://irc.openbravo.com/logs/openbravo/2009.09.01.log
[6] https://code.openbravo.com/erp/devel/pi/diff/e75e87a75552/src-db/database/model/prescript-PostgreSql.sql
[7] https://code.openbravo.com/erp/devel/pi/file/tip/src-db/src/com/openbravo/db/OpenbravoExcludeFilter.java
[8] https://code.openbravo.com/erp/devel/main
[9] http://forge.openbravo.com/plugins/espnews/browse.php?group_id=100&news_id=151
[10] https://code.openbravo.com/erp/devel/pi/rev/20e3a339d7df


Written by katratxo

September 4, 2009 at 6:11 pm

Posted in Openbravo
