Monday, 26 February 2007

Database triggers are bad

Triggers are quite special to relational databases. Things magically happen when you change the data in a table. You can maintain cache tables and check the integrity of data before inserting. They sound great but cause all sorts of problems.

My experience


20 years ago I was part of team that producing an intelligent database where triggers automatically maintained the database, my role was simply to test that everything worked as it should. My issue was that after changing any data in the database I needed to check every table in the database to see what might of been affected. While these triggers might of sounded magical to the customers they were starting to look demonic to me.

Take an ASP program that inserts a record into a table. Testing that this happened is quite simple but how do you check what other tables might of been afected? Obviously just looking at the original insert statement isn't enough, you need to see what triggers may have been called, then check to see if other tables are affected by this trigger and whether in turn these tables have triggers. In the end I just wrote a program to take a snapshot of the database before and after each test and then see what tables were altered. It was a nightmare.

Use stored procedures instead


Don't let your 3GL programs change data directly but instead call a stored procedure to do whatever is required. This way it is obvious what code is called while the activation of a trigger is hidden from the user.

You can easily follow the complete logic of the code being called without needing to search for other hidden triggers.

Getting your 3GL programs to call stored procedures rather than updating the data directly ensures that database changes can happen without needing to rewrite all your 3GL programs.

Limitations of cache tables


Limited reporting: Cache tables are used to store summary data for quick reporting (i.e. monthly trading by branch). The type of reporting from these tables is very limited i.e. you can pick which months and which branches but that's it. So if your users want to see total trading by your female employees then you're out of luck unless that is then built specifically into the cache table.

Duplicate code: So perhaps you offer two types of reporting

  • one on the original trade table offering detailed reporting on recent trades
  • another on the cache table offering historical, but limited, reporting
The problem is you now have two sets of code each offering reporting but with different capablities and performance. Over time these differences will get larger.

Additional batches are now required to check that the cache table and original trade table are perfectly aligned. I know your triggers are supposed to ensure the two cache table holds identical totals but I'll guarantee that over time they'll differ. This means yet more code.

Users get confused when they can get detailed reporting over the last month but are limited to standard reporting before that. You can explain about triggers, cache tables and the amount of disk space saved but if your customer wants to see all his trades over the last year then he's not going to be impressed.


Removing cache tables


If you do all your reporting from the main trade table then you don't need
  • triggers to update the cache tables
  • batches to check cache vs main table
  • 2 sets of reporting programs
If you're removing wasted code and allowing detailed reports for any month then the cost of producing your system go down while its functionality is going up.

Because you have removed all that wasted code then the cost of maintaining the code will be less and bugs are less likely to appear.

Reporting performance is likely to degrade but with proper indexing there is no reason that detailed reporting shouldn't be nearlly instant over 100 million records.

The space require by your system is going to go up if you were totally replacing your older trade data with the cache table. This is unavoidable but seeing as my mp3 player is 40GB then I figure you could probably aford a little more space.

Database locking may raise it's head if we're running large reports of a table while rapidly inserting trades. Proper indexing should be able to avoid locking the active pages but you can allways duplicate the database and report of the duplicate table.

No comments: