Saturday, January 3, 2009

Star Schemas - Miles to the gallon.. How far can you go ?

SAP BI has been implemented in a lot of customer locations and many of them have matured data models and have carefully built their data warehouses which slowly but steadily inch towards the multi terabyte mark and start filling up with data ... valid data that :

1. Can be used for data mining
2. Actively used by a lot of users for day to day reporting
3. Business critical functions are built on the reports that are served out of SAP BI

now it is a picture of everyone happy ... but then problems start cropping up in the backend....

1. Reports start slowing down because of too much data
2. Data models cannot be changed overnight and lead to more complicated multiproviders and reports to serve complex reports
3. User requirements start becoming more demanding and simple row and column reports are no longer in tune....
4. Reports start having more RKFs than direct Key Figures

Some of them are innate to the star schema itself.. some of which are ...

Brittleness - Changing a data model is something like changing a glass mould..... you will have to basically redo it in terms of melting and recasting the same to redo it some other way - there is no simple accessory way out. Even though you have tools like remodeling the process is still the same even though it happens out of sight to you....

A change of data model would mean

  • Lot of synchronized transports
  • Loads of testing
  • Downtime for reports


Models are not fully extendible - Multiproviders , Infosets , Navigation attributes etc can take you only so far - more like buying time fully knowing that these will not suffice in the long term.

Changes in view are not permitted : The models are built with something in mind, you set out to build a cathedral and two years hence , to look at it as a chapel ...? In many cases the person who envisioned the architecture in the first place is not there after two years and the people who come next might view it as a series of mistakes rather than a vision that has not got completed , leading to deviations and changes which might lead to mixed results. Also it is not possible for one person to view the EDW as the person before viewed it. Also the EDW cannot satisfy everyone.

Multiple views of the same data : One cube for Query A , One cube for Query B.. and so on , you end up with a lot of cubes and a lot of redundant data. Often a lot of Navigational attributes for one query alone and dedicating a lot of resources like Disk Space , CPU , App Servers ( yes it is possible ) for a particular set of reports... And then there is the issue of data reconciliation and disk space by way of cubes and aggregates etc...

Changing granularities : Cross functional reports would mean having data at the same granularities. This will lead to creating more summarized cubes for report sake and different cubes at different granularities. Multiproviders can do the trick but then performance is something to worry about. Leading to building more relational data models.


Star schemas are not very good in the long run for a data warehouse. When there are a lot of data and a lot of users, with a lot of diversity, the only way to make the star schema approach work is to make sure the data is non-redundant and model it at the lowest level of granularity.

Even then, when the requirements change, the star schema may still have to be revised or replaced. However, in an environment in which there is little diversity in the way users perceive data, if the requirements do not change over time, and if there aren’t many users, then it is possible that a star schema might serve as a basis for a data warehouse.

These are somethings I have observed across some implementations and again these need not be the order of the day. Just wanted to highlight some things I felt would affect EDWs in the long range of time.

I observed some of the points mentioned in a book by William Inmom , Derek Strauss and Genia Neushloss. Called DW 2.0 - The Architecture for the Next Generation of Data Warehousing. Looked at how the points mentioned against star schema stacked up against the extended star schema of SAP BI.

More of an attempt to draw parallels between traditional data warehousing and SAP BI.

No comments:

Post a Comment