I bet over the Internet can be found number of posts with comparison of workbook size with different formats. However, nothing is better than own data :).
Assume that we have an Excel file with quite big amount of data
~250k rows
As we already know XLSB format makes file size much smaller than XLSX
But what if we load data into Data Model of empty workbook – will it be smaller than xlsb?
To check this I created new Excel workbook, went to PowerPivot tab and opened Data Model window
I need import from Excel
It is stored under “From Other Source” – scroll down to the end
Browse for my source file (I took xlsx as a source)
On the next step I can make “Preview and Filter” to exclude unnecessary data.
You may see checkboxes and usual dropdown filter buttons on columns – like in Excel I can select what I need in filter menu
For this “size challenge” I should take everything. So simply press “Finish” and wait while PowerPivot imports data.
After few seconds we may see result. All rows were imported – no errors.
Result in PowerPivot window
Save this file as xlsx and xlsb
Champion still is XLSB with simple table. PP version size is 11% greater.
And we may see minimal difference between PP versions in different formats. xlsx and xlsb have almost the same size.
And what interesting – xlsb is slightly bigger than xlsx in this case.
Let’s make one more test. Delete text column from initial file. It contains comments for transactions.
This column contains huge number of unique values, in addition it is text – the most heaviest data type.
I repeated same actions with file without text column and result is opposite
PP version has 11% less size than initial “xlsb without text column” and 4.5 times less than initial Without text column.xlsx.
Thoughts that come in my mind after all
Choice where to store data, in Excel table or in PowerPivot, depends on data. There are pros and cons. In the end, PowerPivot don’t lose. Usually data models contain more numbers and less text.
Work with 250k rows in Excel so slow that close to impossible, several SUMIFs will raise a “lack of memory error”. Instead of SUMIFS PivotTables can be used and then GETPIVOTDATA function… All of this currently “old style” modelling that helped us a lot before PowerPivot came – should stay in the past.
What if we want analyze R12M and take 12 files? We will have >1M rows, that is not supported by Excel sheets, BUT PowerPivot supports and still has quick performance of calculation.
Happy modelling in 2016!