Yammer Analytics in Power BI Part 3

List.Generate Power Query

This is the third post in the series “Yammer Analytics in Power BI”

1 – Intro to Yammer Analytics in Power BI

2 – Paging through Yammer group messages

3 – Paging through Yammer group members

4 – Paging through Yammer user messages

Group Members

Querying group users (members) is a relatively simple task as we have an API for that

https://developer.yammer.com/docs/usersin_groupidjson

Yammer Users in Group API parameters

Note that API returns only 50 users per page, meaning we have to build a loop through unknown number of pages.

Basic Patterns

For getting users from Yammer group we can use following pattern:
https://www.yammer.com/api/v1/users/in_group/[groupid].json

or to get response in XML format
https://www.yammer.com/api/v1/users/in_group/[groupid].xml

For paging we need to add “page=N” as a query option
https://www.yammer.com/api/v1/users/in_group/[groupid].json?page=1

Using paging is as important as breathing, because as we may see from documentation, Yammer returns only 50 records per query.

It is not said in the documentation but API returns [more_available] tag

So we can use it to exit loop.

First Step

Let’s firstly build a “response catcher”. A function that will transform web response into a meaningful form.

We will be providing URLs with a following pattern

url_base = "https://www.yammer.com/api/v1/users/in_group/" & GroupID & ".json"

as an argument to a function fGetUsersPage:

(url as text) as table =>
let    
    response = Web.Contents(url, [Headers=[Authorization=#"Authorization"]]),
    body = Json.Document(response),
    moreavailable = try Logical.From( body[more_available] ) otherwise false,
    users = body[users],
    #"Converted to Table" = Table.FromList(users, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", 
        {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"}, 
        {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"}),
    #"Expanded stats" = Table.ExpandRecordColumn(#"Expanded Column1", "stats", {"following", "followers"}, {"following", "followers"})
in
    #"Expanded stats" meta [ MoreAvailable = moreavailable ]

This function returns a page with list of users (up to 50 items) and some of their attributes available through API. On top of this, response will contain a meta parameter – MoreAvailable, which takes true/false value. This meta flag will help us to stop “do-while” loop.

Pagination

I prefer to use List.Generate function for loops in Power Query, so I ended up with the following code that iterates through pages with Group Members and returns a table with users and their parameters

let
    Delay = 1,
    GroupID = Text.From( GroupID ),
    url_base = "https://www.yammer.com/api/v1/users/in_group/" & GroupID & ".json",
    Source = List.Generate(
            ()=> [
                i = 2,
                url = url_base,
                Page = fGetUsersPage(url_base),
                more = true,
                last_page = false // change to true when last page reached
            ],
            // arg2 = do while
            each [last_page] = false and [i]<100, // hardcoded limit, increase for groups with more than 5000 members
            // arg3 - iteration
            each 
            [ i = [i] + 1,
            Page = try Function.InvokeAfter(
                            ()=>fGetUsersPage(url_base & "?page=" & Text.From([i])), #duration(0,0,0,Delay) 
                        ) otherwise null,
            more = try Value.Metadata(Page)[MoreAvailable]? = true otherwise false,
            url = url_base & "?page=" & Text.From([i]),
            last_page = if not [more] then true else false // [more] in square brackets is a reference to the result of previous iteration
            ],
            // arg4 - output of each iteration
            each [[i], [url], [Page], [more], [last_page]]
    ),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"i", "url", "Page", "more", "last_page"}, {"i", "url", "Page", "more", "last_page"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Column1",{"Page"}),
    #"Expanded Page" = Table.ExpandTableColumn(#"Removed Other Columns", "Page", {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}, {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Page",{{"id", Int64.Type}, {"state", type text}, {"full_name", type text}, {"job_title", type text}, {"mugshot_url", type text}, {"web_url", type text}, {"email", type text}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"activated_at", type datetimezone}}),
    #"Extracted Date" = Table.TransformColumns(#"Changed Type1",{{"activated_at", DateTime.Date, type date}}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Extracted Date",{{"following", Int64.Type}, {"followers", Int64.Type}}),
    #"Inserted Age" = Table.AddColumn(#"Changed Type2", "Age", each Date.From(DateTime.LocalNow()) - [activated_at], type duration),
    #"Inserted Total Years" = Table.AddColumn(#"Inserted Age", "Total Years", each Duration.TotalDays([Age]) / 365, type number),
    #"Removed Columns" = Table.RemoveColumns(#"Inserted Total Years",{"Age"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Total Years", "UserYears"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","48x48","100x100",Replacer.ReplaceText,{"mugshot_url"})
in
    #"Replaced Value"

Highlights

I’d like to highlight some parts of this code as you might want to re-use these techniques.

1. To add a delay between API queries we can use Function.InvokeAfter

Page = try Function.InvokeAfter(
                            ()=>fGetUsersPage(url_base & "?page=" & Text.From([i])), #duration(0,0,0,Delay) 
                        ) otherwise null,

2. Use “try … otherwise …” to suppress potential errors, unless you really want to fail entire refresh process when one of the pages cannot be returned.

3. And the last tip I find really helpful – return all record fields in 4th argument of List.Generate

        // arg4 - output of each iteration
        each [[i], [url], [Page], [more], [last_page]]

It helps a lot with debugging as I may see steps of List.Generate: page index, URL, returned page table and “loop stopping” parameters (“more” and “last_page” in this case):

List.Generate output
List.Generate result expanded

That’s all for today.

Stay tuned. New posts are coming hopefully soon.

Yammer Analytics in Power BI Part 2

This is the second post in the series “Yammer Analytics in Power BI”

1 – Intro to Yammer Analytics in Power BI

2 – Paging through Yammer group messages

3 – Paging through Yammer user messages

In the first post I shed some light on simple Yammer authorization approach and general Yammer API URLs.

In this post, let’s try to build a function to get group messages with their attributes.

(more…)

Data type conversion in custom columns

data type conversion

You probably know that you can manually set data type for custom column created in Power Query with help of Table.AddColumn function (using 4th argument).

Rick de Groot has recently published a good post about this at Excelgorilla.com.

Just be sure you use right data type for the result of your calculation.

Once I faced strange Power Query behavior, as I thought, but later I understood that everything works fine (probably).

(more…)

Export PowerQuery query to CSV

Recently I found a PowerQuery gem, trick with Java/VB Script that allows to export data from Power Query to CSV without R / DAX Studio / SMS and Registration. However, related with risk. As everything else in our life.

Kudos to user Shi Yang from Stack Overflow who replied to How to write to DATA sources with Power Query?.

Shi proposes to use following code (extended with my comments)

Let
     // reference to a query you wish to export to CSV
    Source = ReferenceToYourTableOrQuery,
     // demote headers to have headers in resulting CSV
     // if you don't need headers, remove Table.DemoteHeaders
     Json = Text.FromBinary(Json.FromValue(Table.ToRows(Table.DemoteHeaders(Source)))),
     // trigger execution of script
  Export = Web.Page("
 var fso=new ActiveXObject('Scripting.FileSystemObject');
 var f1=fso.CreateTextFile('C:/Temp/test.csv',true);
 var arr=" & Json & ";
 f1.WriteLine(arr.join('\n'));
 f1.WriteBlankLines(1);
 f1.Close();
 ")
 in Export

All great, but this method doesn’t work with default settings of Internet Explorer.

(more…)

Transform Columns with List.Zip

This post continues series of articles about M Function List.Zip ( first post, second post).

Table.TransformColumns is another function that requires list of pairs if you want to transform several columns in Power Query


When we want to transform all fields we can simply use

Table.TransformColumns(#"Changed Type", {}, Text.Trim) 

// would be great to see same behaviour for Table.TransformColumnTypes

However, when we need to change only part of the table we have to generate list of pairs {column name, function}.

Text.Trim highlighted in last sample is a good example of transformation function.

In general, transformation function can contain any logic and can transform objects of any type.

(more…)

How List.Zip helps with Renaming Columns in Power Query

This is my second post about List.Zip. First one was about general usage of List.Zip, where I touched question of transforming column types in Power Query.

Another scenario where List.Zip can be used – renaming columns in Power Query.

When you rename columns manually, auto-generated function looks like

Table.RenameColumns( Source, 
     { {"Column1", "Col 1"}, 
     {"Column2", "Col 2"}, 
     {"Column3", "Col 3"}} )

As well as Table.TransformColumnTypes, it requires list of pairs {“Old Name” , “New Name” } for its second argument.

List.Zip helps to create list of pairs:

Table.RenameColumns( Source, List.Zip( { Table.ColumnNames( Source ), 
    { "Col 1", "Col 2", "Col 3" } } ) )

Result:


Dynamic column names

When we have, so called, “RenamingTable”, which contains two columns, first – with old name, second – with new name, we can use following pattern

Table.RenameColumns( TargetTable, 
    Table.ToColumns( Table.Transpose( RenamingTable ) ), 
    MissingField.Ignore )

You can read more detailed explanation in one of my previous posts.

Using List.Zip, we don’t need RenamingTable, as we can generate new names on the fly by using following pattern

(more…)

How to use List.Zip in Power Query

There is one useful Power Query M function – List.Zip, but with poor documentation on MSDN.

I hope, at some point, library of M functions will be available on Github like it is done for VBA. Power Query enthusiasts then would get a chance to contribute. E.g. from MSDN page of Workbook object we can go to Github and make a pull request for changes.

I plan 2-3 posts about application of List.Zip, this is the first one.

How does List.Zip work

Let’s start from “Help” in Power Query editor, it shows simple sample


Note: to get help on function – type name of function in formula bar and press Enter. Pay attention to register, M is case sensitive.

Having short documentation directly in power query editor is great idea! However, it is hard to show all scenarios with function and keep documentation short. In this particular case, it might be not obvious what happens when we have list of lists with more than 2 elements, or lists with different number of elements, or with more than two lists. (more…)

Power Query Cheat Sheet Update

Power Query cheat sheet repository

Just a short post today. I’ve updated my Power Query cheat sheet and created repository on Github so anyone can now contribute.

Good news for Russian-speaking readers. Шпаргалка теперь доступна на русском языке. Ура!

If you want to have this Cheat Sheet in any other language – just translate it and send me a pull request. Too difficult? Then send me an email with translated .docx attached.

Have several ideas for “PQ shortcuts” section, so stay tuned.

Go to Github and get your copy of Power Query cheat sheet.

Power Query cheat sheet repository

How to track refresh time in Power BI Desktop

Usually, I use Power Pivot and VBA in Excel to measure Power Query performance by comparing refresh time.

But, I suppose Power BI Desktop refresh process may be different, therefore would be nice to have something that would allow measure time between start and end of refresh.

Unfortunately, we do not have VBA in Power BI Desktop, nor can trigger and monitor refresh of Power BI Desktop from another application. So, we have only M and queries.

For the experiment I’ve created a new Power BI Desktop file, three queries in it, and put queries in order Start-Delay-End

Start

= DateTime.LocalNow()

Delay

= Function.InvokeAfter(()=> DateTime.LocalNow(), #duration(0,0,0,3))

End

= DateTime.LocalNow()

“Delay” must make a 3-seconds delay. You may read about Function.InvokeAfter in old good post from Chris Webb.

The idea is naive – hope that Power BI Desktop will execute queries in the same order as in the list of queries.

If so – query “Start” will load start time, Delay will make a pause, then “End” will load time of the end of refresh.

However, by default, Power BI Desktop loads tables in parallel, to optimize load time.

This property can be found in Options -> Current File ( Data Load ) -> Enable parallel loading of tables

This really works. I was receiving same time for Start and End tables while this property was enabled.

When I disabled it, I finally got desired difference between Start and End

Then I changed order of queries in the list of queries to check whether it impacts on execution order

Result shows that “Yes – order does matter“:

Of course, Power Query engine must be generating Execution plan when user press Refresh and then follows it. But in simple scenarios, when “Parallel loading of tables” is disabled, seems like Power BI Desktop follows order of queries from Query Editor.

I have no groups of queries, have no references between queries etc. It allowed me to check load time.

What about complex models

I tried to use same technique in more complex file – with groups of queries (but with no references).

I created query “Start” and placed it into the first group.

Similar for “End” query – but in the last group.

Result seems correct

Conclusion

Above is, of course, not a serious solution. Mainly, because you won’t want to disable parallel loading of tables, and won’t rely on order or queries.

Nevertheless, it would be good to have total refresh time directly in the model. It would allow to monitor refresh time of growing datasets.

More sophisticated way of query engine processing analysis is hidden in diagnostics of trace logs. You may read about this in several posts from Chris Webb here, here, and here.

Set of CSVs as a database for Power BI

What if part of your reporting database is a set of CSV files?

Apart from possible problems with different number of columns, data types, delimiters, encoding etc., you have to care about performance.

According to my practice, large number of files kills productivity. It is better to firstly combine CSV / TXT files into one, then use Power Query to load it.

(more…)