Yammer Analytics in Power BI Part 3

List.Generate Power Query

This is the third post in the series “Yammer Analytics in Power BI”

1 – Intro to Yammer Analytics in Power BI

2 – Paging through Yammer group messages

3 – Paging through Yammer group members

4 – Paging through Yammer user messages

Group Members

Querying group users (members) is a relatively simple task as we have an API for that

https://developer.yammer.com/docs/usersin_groupidjson

Yammer Users in Group API parameters

Note that API returns only 50 users per page, meaning we have to build a loop through unknown number of pages.

Basic Patterns

For getting users from Yammer group we can use following pattern:
https://www.yammer.com/api/v1/users/in_group/[groupid].json

or to get response in XML format
https://www.yammer.com/api/v1/users/in_group/[groupid].xml

For paging we need to add “page=N” as a query option
https://www.yammer.com/api/v1/users/in_group/[groupid].json?page=1

Using paging is as important as breathing, because as we may see from documentation, Yammer returns only 50 records per query.

It is not said in the documentation but API returns [more_available] tag

So we can use it to exit loop.

First Step

Let’s firstly build a “response catcher”. A function that will transform web response into a meaningful form.

We will be providing URLs with a following pattern

url_base = "https://www.yammer.com/api/v1/users/in_group/" & GroupID & ".json"

as an argument to a function fGetUsersPage:

(url as text) as table =>
let    
    response = Web.Contents(url, [Headers=[Authorization=#"Authorization"]]),
    body = Json.Document(response),
    moreavailable = try Logical.From( body[more_available] ) otherwise false,
    users = body[users],
    #"Converted to Table" = Table.FromList(users, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", 
        {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"}, 
        {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "stats", "email"}),
    #"Expanded stats" = Table.ExpandRecordColumn(#"Expanded Column1", "stats", {"following", "followers"}, {"following", "followers"})
in
    #"Expanded stats" meta [ MoreAvailable = moreavailable ]

This function returns a page with list of users (up to 50 items) and some of their attributes available through API. On top of this, response will contain a meta parameter – MoreAvailable, which takes true/false value. This meta flag will help us to stop “do-while” loop.

Pagination

I prefer to use List.Generate function for loops in Power Query, so I ended up with the following code that iterates through pages with Group Members and returns a table with users and their parameters

let
    Delay = 1,
    GroupID = Text.From( GroupID ),
    url_base = "https://www.yammer.com/api/v1/users/in_group/" & GroupID & ".json",
    Source = List.Generate(
            ()=> [
                i = 2,
                url = url_base,
                Page = fGetUsersPage(url_base),
                more = true,
                last_page = false // change to true when last page reached
            ],
            // arg2 = do while
            each [last_page] = false and [i]<100, // hardcoded limit, increase for groups with more than 5000 members
            // arg3 - iteration
            each 
            [ i = [i] + 1,
            Page = try Function.InvokeAfter(
                            ()=>fGetUsersPage(url_base & "?page=" & Text.From([i])), #duration(0,0,0,Delay) 
                        ) otherwise null,
            more = try Value.Metadata(Page)[MoreAvailable]? = true otherwise false,
            url = url_base & "?page=" & Text.From([i]),
            last_page = if not [more] then true else false // [more] in square brackets is a reference to the result of previous iteration
            ],
            // arg4 - output of each iteration
            each [[i], [url], [Page], [more], [last_page]]
    ),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"i", "url", "Page", "more", "last_page"}, {"i", "url", "Page", "more", "last_page"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Column1",{"Page"}),
    #"Expanded Page" = Table.ExpandTableColumn(#"Removed Other Columns", "Page", {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}, {"id", "state", "full_name", "job_title", "mugshot_url", "web_url", "activated_at", "following", "followers", "email"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Page",{{"id", Int64.Type}, {"state", type text}, {"full_name", type text}, {"job_title", type text}, {"mugshot_url", type text}, {"web_url", type text}, {"email", type text}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"activated_at", type datetimezone}}),
    #"Extracted Date" = Table.TransformColumns(#"Changed Type1",{{"activated_at", DateTime.Date, type date}}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Extracted Date",{{"following", Int64.Type}, {"followers", Int64.Type}}),
    #"Inserted Age" = Table.AddColumn(#"Changed Type2", "Age", each Date.From(DateTime.LocalNow()) - [activated_at], type duration),
    #"Inserted Total Years" = Table.AddColumn(#"Inserted Age", "Total Years", each Duration.TotalDays([Age]) / 365, type number),
    #"Removed Columns" = Table.RemoveColumns(#"Inserted Total Years",{"Age"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Total Years", "UserYears"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","48x48","100x100",Replacer.ReplaceText,{"mugshot_url"})
in
    #"Replaced Value"

Highlights

I’d like to highlight some parts of this code as you might want to re-use these techniques.

1. To add a delay between API queries we can use Function.InvokeAfter

Page = try Function.InvokeAfter(
                            ()=>fGetUsersPage(url_base & "?page=" & Text.From([i])), #duration(0,0,0,Delay) 
                        ) otherwise null,

2. Use “try … otherwise …” to suppress potential errors, unless you really want to fail entire refresh process when one of the pages cannot be returned.

3. And the last tip I find really helpful – return all record fields in 4th argument of List.Generate

        // arg4 - output of each iteration
        each [[i], [url], [Page], [more], [last_page]]

It helps a lot with debugging as I may see steps of List.Generate: page index, URL, returned page table and “loop stopping” parameters (“more” and “last_page” in this case):

List.Generate output
List.Generate result expanded

That’s all for today.

Stay tuned. New posts are coming hopefully soon.

Data type conversion in custom columns

data type conversion

You probably know that you can manually set data type for custom column created in Power Query with help of Table.AddColumn function (using 4th argument).

Rick de Groot has recently published a good post about this at Gorilla.BI.

Just be sure you use right data type for the result of your calculation.

Once I faced strange Power Query behavior, as I thought, but later I understood that everything works fine (probably).

(more…)

Shift cells up/down in same column in Power Query

In my previous post I wrote about one interesting technique used by my colleague Zoltán Kaszaki-Krsjak.

Categoty_tmp = Table.AddColumn(Buffer, "Category_tmp", each Buffer[Category]{[ID2]}?),
// It helps to shift values of column [Category] one row upwards.

It is a trick that you most probably will not use in any of your solutions. However, worth to know how it works and how to do this without adding new columns.

“each Buffer[Category]{[ID2]}?” is just a function, so we can combine it with methods described in one of my posts (Transform Column Using Custom Function).

Here is the code.

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Duplicated Column" = Table.DuplicateColumn(Source, "Category", "Method1"),
#"Duplicated Column1" = Table.DuplicateColumn(#"Duplicated Column", "Category", "Method2"),
#"Duplicated Column2" = Table.DuplicateColumn(#"Duplicated Column1", "Category", "Method3"),
Buffer = Table.Buffer( #"Duplicated Column2" ),

Method1 = Table.FromRecords( Table.TransformRows( Buffer, each [Index = [Index],
    Category = [Category],
    Method1 = Buffer[Method1]{[Index]}?,
    Method2 = [Method2],
    Method3 = [Method3] ] ) ),

Method2 = Table.FromRecords( Table.TransformRows( Method1, (row) =>
    Record.TransformFields( row,
    {"Method2", each Buffer[Method2]{ row[Index] }? } ) ) ),
    // method offered by Miguel Escobar in comment to previous post

Method3 = Table.ReplaceValue(Method2, each [Method3], each Buffer[Method3]{[Index]}?, Replacer.ReplaceValue, {"Method3"})
in
Method3

File with methods is here.

But which method is faster?

(more…)

Transform Column using custom function in Power Query (part 2)

Year ago I wrote a post Transform table column using own function in Power Query. According to stats, it is quite popular topic, and I’m not surprised.

However, I think that old post is quite complex and too long. Advice should be shorter.

In addition, that post do not cover another very important scenario, when you need to transform column using value from another column.

Consider following case


Task:

  1. We need to Trim column [Name]
  2. We need to divide [Amount] by [Counter] (and do not want to add additional column)

Let’s do this.

(more…)

Decode Active Directory field UserAccountControl using Power Query

Recently my colleague had to identify inactive accounts in Active Directory and use this data in automated report on regular basis, to quickly react on new switched off users.

There are plenty of VBA solutions or Vbscripts that work with Active Directory and pull data from it.

However, for automated reports developed in Excel or Power BI it will be more convenient to use Power Query (Get & Transform) to get data directly from AD. In addition this helps to reduce mirroring of data.

Information about user status is stored in field UserAccountControl.

According to documentation on MSDN, field value is not very user friendly

“To disable a user’s account, set the UserAccountControl attribute to 0x0202 (0x002 + 0x0200). In decimal, this is 514 (2 + 512).”


Resulting value is a sum of different values of multiple properties.

E.g. what 2146 could mean?

Not easy to say when number is in decimal notation.

However, it is much easier if value of UserAccountControl is represented in binary string – of 0 and 1.

2146 = 2048 + 64 + 32 + 2 = > 100001100010

For example, if we need to check property ACCOUNTDISABLE, we only need to check second digit (from right).

Unfortunately, there is no standard function in Power Query that converts decimal number to binary notation, so I had to create own function.

As usually, it is available on Github: https://github.com/IvanBond/pquery/blob/master/Number.ToBinaryString.m

// Number.ToBinaryString( 1026 )

// result: 10000000010

Function code is quite short, although it is recursive


When number is in binary notation, we can use Text.End, Text.Start function to get needed digit.

Just an idea. Can be used with Power BI alerts to unblock self-blocked users even before they call to IT, for audit, or with Microsoft Flow somehow…

Changed Type step in Power Query

My work is connected with data coming from different countries, which have different local formats of date, numbers etc.

To be able to consolidate data in Excel models, I have to convert data accordingly.

In Power Query and Get & Transform we can set default Locale in Query Options

However, when I load data from different sources and apply Changed Type ( Table.TransformColumnTypes ) operation I must use corresponding locale for argument “culture”, which is optional.

I noticed interesting behavior. When I change type

New step “Changed Type” appears in Applied Steps without gear

Of course, when Power Query engine created this step it used default Regional Locale, which is “en-US”, as a result values are wrong. I have to use Russian locale for this sample.

When I manually add argument “culture” in formula bar – gear appears near the “Changed Type” step:

It shows nice menu – Change Type with Locale

It even shows sample format when I select Locale

Conclusion

For me it is fine to add locale manually, but when I explained this to my colleagues they were not happy to hear this.

Maybe I don’t know something… But how to call this “Change Type with Locale” by default?

E.g. when I change Data Type – ask about locale I want to use.

Probably, worth to add something like checkbox “Always ask about locale when I change data type” in the area of Current Workbook options.

Or just display gear for step Changed Type by default to let user go the and select local from list.

There is also an idea with close meaning on Power BI uservoice forum

https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/17155889-add-using-locale-option-to-new-data-type-menu-bu

UPD:

The option was found. Thanks to Frank and to Maxim Zelensky (@Hohlick)

change-type-right-click

To be honest, I’ve never used this menu to change type. Much faster was to change it from ribbon and from  new small button in column header, and then manually add locale.

Good to know that at least option exists, although is very hidden :-(.

Query folding and dynamic parameters in SQL Queries in Power Query

Before you start reading this article, I recommend you to go through very good post on Ken Puls’s blog – Pass Parameters to SQL Queries.

If it is fine for you to use native queries to database then most probably scenario described below is not so interesting for you.

Here I’ll talk about workaround allowing to use query folding and pass parameters in Power Query without building of native SQL query.

Level: intermediate

Task definition

Generate N workbooks with data models populated with limited data scope.

Then each workbook will be shared with certain group of people, that allowed to see only relevant set of data (e.g. department OPEX of particular subsidiary).

This is dictated by fact, that if you share Excel workbook with someone – user immediately has access to all info inside data model. So, I cannot create one common data model and hide slicer somewhere with activated department. I must prepare separated workbooks.

Conditions

Data: Table with millions of records in SQL database – main data source for report (query folding required for obvious reasons)

Tools: Excel + Power Query + PowerPivot

Support data: table in Excel, defining datasets – data scopes, let’s call it Control Table

(more…)

Power Query: Xlsb vs Xlsx

Recently I helped my colleague with a report. The task was to consolidate data from multiple Excel files, each contains 100k-500k rows. Hence, size of each Excel file is relatively big.

I proposed to use xlsb to store those files. One file per month, report needs R24m, so 50 MB is less than 150 MB.

In the end I finished with deep analysis what is better as Excel-files database for Power Query – XLSB or XLSX.

Resume

For PQ solutions use XLSX instead of XLSB

PQ from XLSX twice faster than from XLSB

Engine handling XLSB from time to time return error – cannot read file

“UseHeaders” parameter of Excel.Workbook function works differently for xlsx and xlsb

If sheet and table names are equal in Excel workbook – Navigator generates unique name of object.

If you interested how I revealed all this – continue reading.

(more…)

Query SAP Business ByDesign Web Services via Power Query

In this post I would like to share my experience about work with web services using Power Query.

Unfortunately, some very important information is not available in data sources, e.g. Accounts Relationships and Pricing Conditions (price lists, customer specific discount lists etc.). Hope someday we will get this in reports.

However, currently, we can get this info only from

  • UI manually, almost impossible to do if you have X tenants and XXX customers with specific price lists
  • Web services – Jedi path, require knowledge and skills, or trustable tools, that can be checked

To our luck, with every new release we get more and more web services.

Basic idea of web service usage

  1. Make POST http request with XML
  2. Parse XML response

(more…)

Load Power Query function from GitHub

Why scenario of “Loading Power Query function from GitHub” is interesting?

Assume that you work in big company, or small, doesn’t matter.

You and your colleagues use shared library of Power Query functions, published on shared drive.

You also have separated workstation or server, that has no access to the internet, only can access shared folders in corporate network. This workstation updating workbooks with Power Query all day and night long.

Typical model
101615_0737_LoadPowerQu1.png


From time to time, you also have to collect data from web using PQ functions. E.g. if you are using cloud ERP, like SAP Business ByDesign.

When you are in the office, everything is OK, your laptop is in corporate network, you simply take functions from shared folder.

Of course, company’s network resources are not accessible from the outside – from home, from partner’s office or other place.

You probably can use VPN, it solves problem, however, partially. Because VPN reduces speed to some sites.

What would be good in such situation

  • library on the web that can be used by you and colleagues from everywhere
  • library on corporate network for isolated server

If you have such two libraries then following scenario becomes possible :

When you or your colleague in the office – you simply take function from corporate network

When you outside of company network, you load same function from web library

This simple function, Load.m published to GitHub, will help you

https://github.com/IvanBond/pquery/blob/master/Load.m

(fnName as text, optional BasePath as text) as function =>
let
//If you wish to hardcode the path to load the queries from, you can edit the following line:
DefaultPath = “C:\PQuery\”,
GitHubPath = “https://raw.githubusercontent.com/IvanBond/pquery/master/“,

BasePath = if (BasePath <> null) then BasePath else DefaultPath,
Path = BasePath & (if Text.End(BasePath, 1) <> “\” then “\” else “”),
File = Path & fnName & “.m”,

Function = try Expression.Evaluate(Text.FromBinary(Binary.Buffer(File.Contents(File))), #shared)
otherwise Expression.Evaluate( Text.FromBinary(Binary.Buffer(Web.Contents(GitHubPath & fnName & “.m”))), #shared)
in
Function

If you provide BasePath as parameter – function will take code from there.

If BasePath is empty then Load.m will use hardcoded DefaultPath.

If Load.m failed to load function from BasePath and DefaultPath – then it goes to hardcoded GitHubPath.

GitHub library opens wide opportunities, because it is shared, open-source, accessible from any place.