This is a short story of how I ruined one of my Power BI Desktop files. Luckily I stored it on SharePoint where I have versioning enabled, so I could restore a working version.
In that Power BI report I used an Excel file as a source and decided to turn it into a parameter for other queries – Binary parameter.
BTW, I faced that issue using Power BI Desktop June 2019 version from MS Store.
Parameter
When you create a parameter in Power BI Desktop, you can choose type Binary, but cannot set a value – fields are greyed out
In my case, I wanted to link a parameter to another query. There is no such option in the UI.
So, I created a parameter and opened it in the Advanced Editor to change type to Binary and replace value with a reference to another query #”SourceFile” .
My Mistake
I wrongly entered parameter’s Type, used “binary” instead of “Binary”. I don’t usually use this type, so didn’t remember the correct spelling, I though it should be in lowcase as other frequently used types “text” or “number”.
And got an error message
I reported an issue and clicked Cancel.
Then by mistake clicked on the Close button which closed window with parameter’s code with no questions
For some reason I decided to “Close and Apply” at certain point.
I didn’t know this will be the end of story for my report.
Soon, I tried to get back to the queries but couldn’t. Yes, I couldn’t open Power Query Editor window anymore – lost access to this area of Power BI Desktop file.
Every time I’m trying to open it I see this error window
Conclusion
Those who watched Chernobyl by HBO, probably remember old good Russian proverb “Trust but verify” (Доверяй, но проверяй) said by the head of KGB in one of the episodes. This applies to your files and the way you work as well. “Trust” that everything will be OK with your files, but “verify” by storing them in the locations with versioning enabled such as OneDrive or SharePoint and don’t forget to save frequently during active development phase.
(url as text) as table =>
let
response = Web.Contents(url, [Headers=[Authorization=#"Authorization"]]),
body = Json.Document(response),
moreavailable = try Logical.From( body[more_available] ) otherwise false,
users = body[users],
#"Converted to Table" = Table.FromList(users, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1",
{"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"},
{"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"}),
#"Expanded stats" = Table.ExpandRecordColumn(#"Expanded Column1", "stats", {"following", "followers"}, {"following", "followers"})
in
#"Expanded stats" meta [ MoreAvailable = moreavailable ]
This function returns a page with list of users (up to 50 items) and some of their attributes available through API. On top of this, response will contain a meta parameter – MoreAvailable, which takes true/false value. This meta flag will help us to stop “do-while” loop.
Pagination
I prefer to use List.Generate function for loops in Power Query, so I ended up with the following code that iterates through pages with Group Members and returns a table with users and their parameters
let
Delay = 1,
GroupID = Text.From( GroupID ),
url_base = "https://www.yammer.com/api/v1/users/in_group/" & GroupID & ".json",
Source = List.Generate(
()=> [
i = 2,
url = url_base,
Page = fGetUsersPage(url_base),
more = true,
last_page = false // change to true when last page reached
],
// arg2 = do while
each [last_page] = false and [i]<100, // hardcoded limit, increase for groups with more than 5000 members
// arg3 - iteration
each
[ i = [i] + 1,
Page = try Function.InvokeAfter(
()=>fGetUsersPage(url_base & "?page=" & Text.From([i])), #duration(0,0,0,Delay)
) otherwise null,
more = try Value.Metadata(Page)[MoreAvailable]? = true otherwise false,
url = url_base & "?page=" & Text.From([i]),
last_page = if not [more] then true else false // [more] in square brackets is a reference to the result of previous iteration
],
// arg4 - output of each iteration
each [[i], [url], [Page], [more], [last_page]]
),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"i", "url", "Page", "more", "last_page"}, {"i", "url", "Page", "more", "last_page"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Column1",{"Page"}),
#"Expanded Page" = Table.ExpandTableColumn(#"Removed Other Columns", "Page", {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}, {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Page",{{"id", Int64.Type}, {"state", type text}, {"full_name", type text}, {"job_title", type text}, {"mugshot_url", type text}, {"web_url", type text}, {"email", type text}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"activated_at", type datetimezone}}),
#"Extracted Date" = Table.TransformColumns(#"Changed Type1",{{"activated_at", DateTime.Date, type date}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Extracted Date",{{"following", Int64.Type}, {"followers", Int64.Type}}),
#"Inserted Age" = Table.AddColumn(#"Changed Type2", "Age", each Date.From(DateTime.LocalNow()) - [activated_at], type duration),
#"Inserted Total Years" = Table.AddColumn(#"Inserted Age", "Total Years", each Duration.TotalDays([Age]) / 365, type number),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Total Years",{"Age"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Total Years", "UserYears"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","48x48","100x100",Replacer.ReplaceText,{"mugshot_url"})
in
#"Replaced Value"
Highlights
I’d like to highlight some parts of this code as you might want to re-use these techniques.
1. To add a delay between API queries we can use Function.InvokeAfter
2. Use “try … otherwise …” to suppress potential errors, unless you really want to fail entire refresh process when one of the pages cannot be returned.
3. And the last tip I find really helpful – return all record fields in 4th argument of List.Generate
// arg4 - output of each iteration
each [[i], [url], [Page], [more], [last_page]]
It helps a lot with debugging as I may see steps of List.Generate: page index, URL, returned page table and “loop stopping” parameters (“more” and “last_page” in this case):
You probably know that you can manually set data type for custom column created in Power Query with help of Table.AddColumn function (using 4th argument).
Recently I found a PowerQuery gem, trick with Java/VB Script that allows to export data from Power Query to CSV without R / DAX Studio / SMS and Registration. However, related with risk. As everything else in our life.
Shi proposes to use following code (extended with my comments)
Let
// reference to a query you wish to export to CSV
Source = ReferenceToYourTableOrQuery,
// demote headers to have headers in resulting CSV
// if you don't need headers, remove Table.DemoteHeaders
Json = Text.FromBinary(Json.FromValue(Table.ToRows(Table.DemoteHeaders(Source)))),
// trigger execution of script
Export = Web.Page("
var fso=new ActiveXObject('Scripting.FileSystemObject');
var f1=fso.CreateTextFile('C:/Temp/test.csv',true);
var arr=" & Json & ";
f1.WriteLine(arr.join('\n'));
f1.WriteBlankLines(1);
f1.Close();
")
in Export
All great, but this method doesn’t work with default settings of Internet Explorer.
This is my second post about List.Zip. First one was about general usage of List.Zip, where I touched question of transforming column types in Power Query.
Another scenario where List.Zip can be used – renaming columns in Power Query.
When you rename columns manually, auto-generated function looks like
There is one useful Power Query M function – List.Zip, but with poor documentation on MSDN.
I hope, at some point, library of M functions will be available on Github like it is done for VBA. Power Query enthusiasts then would get a chance to contribute. E.g. from MSDN page of Workbook object we can go to Github and make a pull request for changes.
I plan 2-3 posts about application of List.Zip, this is the first one.
How does List.Zip work
Let’s start from “Help” in Power Query editor, it shows simple sample
Note: to get help on function – type name of function in formula bar and press Enter. Pay attention to register, M is case sensitive.
Having short documentation directly in power query editor is great idea! However, it is hard to show all scenarios with function and keep documentation short. In this particular case, it might be not obvious what happens when we have list of lists with more than 2 elements, or lists with different number of elements, or with more than two lists. (more…)
Just a short post today. I’ve updated my Power Query cheat sheet and created repository on Github so anyone can now contribute.
Good news for Russian-speaking readers. Шпаргалка теперь доступна на русском языке. Ура!
If you want to have this Cheat Sheet in any other language – just translate it and send me a pull request. Too difficult? Then send me an email with translated .docx attached.
Have several ideas for “PQ shortcuts” section, so stay tuned.
Go to Github and get your copy of Power Query cheat sheet.
Usually, I use Power Pivot and VBA in Excel to measure Power Query performance by comparing refresh time.
But, I suppose Power BI Desktop refresh process may be different, therefore would be nice to have something that would allow measure time between start and end of refresh.
Unfortunately, we do not have VBA in Power BI Desktop, nor can trigger and monitor refresh of Power BI Desktop from another application. So, we have only M and queries.
For the experiment I’ve created a new Power BI Desktop file, three queries in it, and put queries in order Start-Delay-End
The idea is naive – hope that Power BI Desktop will execute queries in the same order as in the list of queries.
If so – query “Start” will load start time, Delay will make a pause, then “End” will load time of the end of refresh.
However, by default, Power BI Desktop loads tables in parallel, to optimize load time.
This property can be found in Options -> Current File ( Data Load ) -> Enable parallel loading of tables
This really works. I was receiving same time for Start and End tables while this property was enabled.
When I disabled it, I finally got desired difference between Start and End
Then I changed order of queries in the list of queries to check whether it impacts on execution order
Result shows that “Yes – order does matter“:
Of course, Power Query engine must be generating Execution plan when user press Refresh and then follows it. But in simple scenarios, when “Parallel loading of tables” is disabled, seems like Power BI Desktop follows order of queries from Query Editor.
I have no groups of queries, have no references between queries etc. It allowed me to check load time.
What about complex models
I tried to use same technique in more complex file – with groups of queries (but with no references).
I created query “Start” and placed it into the first group.
Similar for “End” query – but in the last group.
Result seems correct
Conclusion
Above is, of course, not a serious solution. Mainly, because you won’t want to disable parallel loading of tables, and won’t rely on order or queries.
Nevertheless, it would be good to have total refresh time directly in the model. It would allow to monitor refresh time of growing datasets.
More sophisticated way of query engine processing analysis is hidden in diagnostics of trace logs. You may read about this in several posts from Chris Webb here, here, and here.