Part 1 – Intro and Storage
After having this argument over and over with other folks in IT, I decided to write down some notes on how the Force.com DB is “different”. This is not an Introduction to the Force.com Database. That article is already written. And it’s not a deep dive into how the data is defined and stored. Check out that section (starting on page 5) of the whitepaper on the Force.com Multitenant Architecture. I’m talking about the points I find difficult to explain to people familiar with traditional databases like Oracle, MySQL, and even DB2.
Disclaimer: I’m going to make some statements here about the Force.com database as I understand it, so bear with me. Let’s get the heavy one out of the way first.
“Salesforce.com doesn’t store data the way traditional databases do.” Or at least they don’t count the space used like everyone else does. We all learned that to figure out how much space you need, you multiply the number of bytes in a row by the number of rows. Then you add some factor, say 20%, for indexes. You repeat this for each table in your database, and when you’re done you know how much storage to ask the DBAs for. I’m purposely leaving out things like future growth and archiving to keep it simple.
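That back-of-the-envelope sizing can be sketched as a quick calculation. The 20% index allowance and the example row size here are just illustrative numbers, not anything from a real DBA runbook:

```python
def traditional_table_size(row_bytes, row_count, index_factor=0.20):
    """Classic sizing: bytes per row times rows, plus an index allowance."""
    data_bytes = row_bytes * row_count
    return data_bytes * (1 + index_factor)

# e.g. a 500-byte row across 1,000,000 rows, plus 20% for indexes
total = traditional_table_size(500, 1_000_000)
print(f"{total / 1e9:.2f} GB")  # 0.60 GB
```

You would repeat this per table and sum the results to get the number you take to the DBAs.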
So how does Salesforce do it? They’ve created a layer on top of their actual DB, which I dub “the magic DB layer”, that stores each row as 2 KB. That by itself has a huge impact when you’re trying to normalize your data model. Check out Database.com’s pricing and you’ll see they’re charging by number of records and transactions.
Let’s take an example. If you’re keeping track of customer information and you know every customer will have two phone numbers, one home and one mobile, then you would likely create two tables (sObjects in Force.com): Customer and Phone. On the Phone object you would add a reference to the Customer. This is the traditional parent-child relationship and is future-ready for when the requirement comes along for additional phone numbers. If you insert a single customer (1 record) and their home and mobile numbers (2 records), you have 3 records and have used 6 KB. That may not seem like a lot, but when you have 4.2 million customers (as a certain company I’m familiar with does), that’s now 25.2 GB of data. But we can flatten that out by adding the home and mobile phone to the Customer object, which eliminates the separate object and brings our usage back down to 8.4 GB.
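The arithmetic behind that comparison can be sketched like this, assuming a flat 2 KB per record as described above (GB here means 10^9 bytes):

```python
KB_PER_RECORD = 2  # every Force.com record counts as 2 KB of storage

def storage_gb(records):
    """Storage usage in decimal GB at 2 KB per record."""
    return records * KB_PER_RECORD * 1000 / 1e9  # KB -> bytes -> GB

customers = 4_200_000

# Normalized: 1 Customer record + 2 Phone records per customer
normalized = storage_gb(customers * 3)

# Flattened: the phone numbers are just fields on the Customer record
flattened = storage_gb(customers * 1)

print(normalized, flattened)  # 25.2 8.4
```

Note that the flattened model’s usage depends only on the record count; the number of fields on the object doesn’t change the 2 KB accounting.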
Wow, that’s some savings, but what happens when the requirement comes along to add another phone per customer for a fax number? You would just add the fax number as another field on the Customer object, and that doesn’t require any additional storage usage. How? The answer to that is, “I don’t care. Salesforce.com takes care of that.” Had I used the normalized model with a separate object for Phone, I would have added (assuming every customer has a fax number) another 8.4 GB of usage.
What else have you impacted by de-normalizing? What about the UI? There has to be an impact there because now you need to make that field value available to the users. That’s ok, though, because in my example I’m using the object’s standard page layouts (I kept that a secret), and I can just update the page layout and add the fax number to the customer page.
My point with this is not to avoid normalization, just to consider the impact. Your storage capacity on Salesforce.com is based on your Salesforce.com Edition and the number of licensed users. For most editions each user gets 20 MB (approximately 10,000 records), and if you need to store more than your capacity you may find yourself calling up your account rep trying to negotiate additional users or storage.
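As a rough sketch of that capacity math, using the per-user allotment quoted above (the 20 MB figure is edition-dependent, so treat these numbers as illustrative):

```python
MB_PER_USER = 20   # storage allotment per licensed user (most editions)
KB_PER_RECORD = 2  # every record counts as 2 KB

def record_capacity(licensed_users):
    """Approximate record capacity for an org: 20 MB per user at 2 KB per record."""
    total_kb = licensed_users * MB_PER_USER * 1000  # MB -> KB (decimal)
    return total_kb // KB_PER_RECORD

print(record_capacity(1))   # 10000 records per user
print(record_capacity(50))  # 500000 records for a 50-user org
```

This is why a denormalized model that cuts record counts translates directly into headroom before that call to your account rep.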
Finally, the example I used was just to get the point across. There are standard objects like Account and Contact that already have those phone fields flattened on them.