Recently I helped my colleague with a report. The task was to consolidate data from multiple Excel files, each contains 100k-500k rows. Hence, size of each Excel file is relatively big.
I proposed to use xlsb to store those files. One file per month, report needs R24m, so 50 MB is less than 150 MB.
In the end I finished with deep analysis what is better as Excel-files database for Power Query – XLSB or XLSX.
Resume
For PQ solutions use XLSX instead of XLSB
PQ from XLSX twice faster than from XLSB
Engine handling XLSB from time to time return error – cannot read file
“UseHeaders” parameter of Excel.Workbook function works differently for xlsx and xlsb
If sheet and table names are equal in Excel workbook – Navigator generates unique name of object.
If you interested how I revealed all this – continue reading.
Files for tests
I thought that PQ works faster with smaller xlsb files, but I was wrong.
In the same time I knew that some people had problems with xlsb, e.g. issue.
Nevertheless, I started with xlsx with idea in mind to switch to xlsb later.
First surprise for me was that if sheet and table on this sheet have same name – PQ engine renames table
Don’t worry, despite in Navigator we see DATA1, M-code access table object with name DATA.
let
Source = Excel.Workbook(File.Contents(“C:\Temp\PQ Samples\2015 09.xlsx”), null, true),
DATA_Table = Source{[Item=”DATA”,Kind=”Table”]}[Data],
#”Changed Type” = Table.TransformColumnTypes(DATA_Table,{{” Lcurr.”, type text}})
in
#”Changed Type”
Then I saved xlsx file as xlsb
And tried to apply same code
let
Source = Excel.Workbook(File.Contents(“C:\Temp\PQ Samples\2015 09.xlsb”), null, true),
DATA_Table = Source{[Item=”DATA”,Kind=”Table”]}[Data],
#”Changed Type” = Table.TransformColumnTypes(DATA_Table,{{” Lcurr.”, type text}})
in
#”Changed Type”
Got an error and analyzed query step-by-step.
First step – get Excel file structure – similar formula – different result
xlsx | ![]() |
xlsb | ![]() |
Structure is different. We see “DATA” object for xlsb, but it is not clear – is it a table of sheet.
I renamed sheet and checked this step once again
OK – column Name contains sheet names.
Second step – Navigation – didn’t work for xlsb
xlsx | ![]() |
xlsb | ![]() |
Error, as expected, because first step for xlsb doesn’t contain Kind and Item columns.
Not a big problem, just need to fix navigation step.
Nevertheless, this step that doesn’t allow simply switch from xlsx to xlsb.
One click on Table in Data column and we see result:
Now you probably want to apply Table.PromoteHeaders function, but wait, read description of Excel.Workbook function
Second parameter – useHeaders, will help to avoid additional transformation step. Turn it on and query XLSB
Wow, this was not expected.
Be careful with this parameter – some characters will be replaced by # if you work with xlsb.
Dot – definitely, rest special characters – I didn’t test.
Let’s save solution with PromotedHeaders and check speed.
Transform query into function
Let
fGetXlsb = (path as text) =>
let
Source = Excel.Workbook(File.Contents( path ), null, true),
sheet = Source{[Name=”DATA”]}[Data],
#”Promoted Headers” = Table.PromoteHeaders(sheet)
in
#”Promoted Headers”
in
fGetXlsb
Do the same for XLSX and load data to Data Model
xlsx | First try – 43 sec
Next try: 38 sec |
xlsb | Oops, first try gave me
I saw such error several times in my practice of work with XLSB Second try – 84 sec. Next try: 82 sec |
Simple VBA code that check Power Query refresh with data load time
Sub test_time()
Dim start as Single
start = Timer
ThisWorkbook.Connections(“Query – QueryXLSB”).Refresh
Debug.Print “xlsb: “; Timer – start
start = Timer
ThisWorkbook.Connections(“Query – QueryXLSX”).Refresh
Debug.Print “xlsx: “; Timer – start
End Sub
What I also noticed – cosmetic difference in PQ editor
xlsx | PQ editor shows MB
|
xlsb | PQ editor shows ROWS
|
I saw somewhere – “DAX is easy but not simple”, same is relevant to Power Query.
Ivan, excellent post!
I’ve been working on a solution that combines data from several files with macros in them. My first choice was XLSB, because it can contain macros, and it is usually smaller than XLSM.
I ended up switching to XLSM (essentially XLSX with macros support, right?), because of the above-mentioned “External table is not in the expected format”.
Also, occasionally my consolidating file was opening one of the consolidated files! I still have no idea why one of my consolidated files (always the first one alphabetically) was being open.
Thanks for the post!
Hi Daniil, thanks for positive feedback. Advice is simple, if you want to combine data with Power Query, use xlsx or xlsm. Seems PQ works better with these two formats.
However, if you want to continue with VBA, than xlsb is a better option, as it has smaller size and is opening faster.
What can open a file is usually a macro… maybe you have it on Workbook_Open event… Hard to guess.
Good luck with your solution!
Thank you so much for the post! I understand that PQ works better with XLSX and Xlsm. But, what if we use XLSB instead due to fear of the overall size (Xlsb) creates less MBs. What issues and problems can PQ face?
Evan – Great post. Its really useful to know about this different. It saved a lot time wastage due to file format issues.
You are welcome! Glad this old post is still valid 🙂
Thanks for the info.. really helpful
wow, thanks for posting
Greetings sir! hope all is well!
Need your help and advice please.
I have a folder with 14 CSV files; they are 1.4GB in size. I created an excel sheet with power query & PowerPivot that is reading from those 14 files (combine from folder via PQ, to PowerPivot. The size of the excel became 105MB.
Is the size reasonable? Can I make it smaller? How?
Would it make a difference if I saved all 14 files as xlsm? I tried that and total size became 700MB) but the Excel is still 100MB.
Your experience, feedback and help is truly appreciated