In my previous post I wrote about one interesting technique used by my colleague Zoltán Kaszaki-Krsjak.
Categoty_tmp = Table.AddColumn(Buffer, "Category_tmp", each Buffer[Category]{[ID2]}?), // It helps to shift values of column [Category] one row upwards.
It is a trick that you most probably will not use in any of your solutions. However, worth to know how it works and how to do this without adding new columns.
“each Buffer[Category]{[ID2]}?” is just a function, so we can combine it with methods described in one of my posts (Transform Column Using Custom Function).
Here is the code.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Duplicated Column" = Table.DuplicateColumn(Source, "Category", "Method1"), #"Duplicated Column1" = Table.DuplicateColumn(#"Duplicated Column", "Category", "Method2"), #"Duplicated Column2" = Table.DuplicateColumn(#"Duplicated Column1", "Category", "Method3"), Buffer = Table.Buffer( #"Duplicated Column2" ), Method1 = Table.FromRecords( Table.TransformRows( Buffer, each [Index = [Index], Category = [Category], Method1 = Buffer[Method1]{[Index]}?, Method2 = [Method2], Method3 = [Method3] ] ) ), Method2 = Table.FromRecords( Table.TransformRows( Method1, (row) => Record.TransformFields( row, {"Method2", each Buffer[Method2]{ row[Index] }? } ) ) ), // method offered by Miguel Escobar in comment to previous post Method3 = Table.ReplaceValue(Method2, each [Method3], each Buffer[Method3]{[Index]}?, Replacer.ReplaceValue, {"Method3"}) in Method3
File with methods is here.
But which method is faster?
Let’s make a series of tests with fake data.
I used sample table with 5000, 10000 and 20000 rows, with
- buffering (Table.Buffer) and without.
- With clearing of cache and without.
I cleared cache – no impact
Code that I used for tests
Sub Exec() Call TestSpeed("Query - Method1") Call TestSpeed("Query - Method2") Call TestSpeed("Query - Method3") End Sub Sub TestSpeed(query_name As String) Dim timer_start timer_start = Timer ThisWorkbook.Connections(query_name).OLEDBConnection.Refresh Debug.Print query_name, ThisWorkbook.Sheets("Data").ListObjects("Table1").DataBodyRange.Rows.Count, Timer - timer_start End Sub
What makes sense is order of refreshing.
1-2-3, 1-3-2, 2-3-1 and so on – show different timing
Here are the results
Test conditions: nothing special, just daily usage of laptop in parallel while waiting.
All in all, such “shift of cells” is a very slow operation, I wouldn’t recommend to use is for massive tables.
Used methods are very close in time of execution, maybe in some conditions results can be different.
You may download workbook with queries and macro if you want to check described methods by yourself.
Hi Ivan!
Can you try the next one:
= Table.Join(Buffer[[Index]], “Index”, Table.AddIndexColumn(Buffer[[Category]], “Index2”,0,1), “Index2”)[[Index],[Category]]
Is it slower?
@Maxim
amazing speed, somehow M seems to be SQL under the hood …
Query – Method1 10000 34,89453
Query – Method2 10000 34,96875
Query – Method3 10000 34,99219
Query – Method4 10000 0,03125
Thanks, Maxim! This one is much better, for sure. Just added a “JoinKind” argument
Table.Join(Buffer[[Index]], “Index”,
Table.AddIndexColumn(Buffer[[Category]], “Index2”,0,1),
“Index2”, JoinKind.LeftOuter )[[Index],[Category]]