This is the second post in the series “Yammer Analytics in Power BI”
1 – Intro to Yammer Analytics in Power BI
2 – Paging through Yammer group messages
3 – Paging through Yammer user messages
In the first post I shed some light on simple Yammer authorization approach and general Yammer API URLs.
In this post, let’s try to build a function to get group messages with their attributes.
As I mentioned in earlier post, Yammer API query to group messages has following URL pattern
https://www.yammer.com/api/v1/users/in_group/[groupid].json
It returns only 50 messages per query, which include both – thread starting messages and comments. We will need a loop to get all messages we need.
And we are interested in the messages that are older than a certain date.
Current API is very limited and only allows to use following parameters
No “date”-field, unfortunately.
As you may see, current API is not an Odata-kind of API.
We only have a “newer_than” parameter, which can help to restrict # of messages.
It accepts a number – “message ID”. Therefore, we need to query messages page by page until we get a message with “creation_date” older than required.
Query Yammer group messages
Firstly, I decided to get an ID of the last message in the group to have a starting point for further queries.
Even before that, I had to create two parameters that will be re-used in queries:
// GroupID - numeric Id is expected
"1234567" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]
// Authorization - expected string "Bearer value"
"Bearer BearerValue" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]
And another parameter to limit messages by date
// DateFrom -
#date(2013, 1, 1) meta [IsParameterQuery=true, Type="Date", IsParameterQueryRequired=true]
Query to get the latest message ID:
// NewestGroupMessageId
let
Source = Web.Contents("https://www.yammer.com/api/v1/messages/in_group/" &
GroupID & ".json?threaded=true&limit=1", [Headers=[Authorization= Authorization ] ] ),
buffer = Binary.Buffer(Source),
json = Json.Document(buffer),
#"meta" = json[meta],
newest_message_details = #"meta"[newest_message_details],
id = newest_message_details[id]
in
id
Pretty simple function: query, buffer, unpack json.
Yammer API for group messages returns sorted list of messages, sorted by ID descending. So, Top 1 message is the latest published.
It helped in the main query with paging: last message in the list is always the oldest.
After playing around a bit, I ended up with following two functions:
First: query pre-defined URL and return json (or null)
// fQueryPageJson
(url as text) =>
let
Source = Web.Contents( url, [ Headers=[Authorization= Authorization ] ] ),
buffer = Binary.Buffer( Source ),
json = try Json.Document( buffer ) otherwise null
in
json
Second: main iterator going through pages (using List.Generate) – extended with fields for debugging
// GroupMessagesBase
let
LatestMessageId = try Number.From( NewestGroupMessageId ) otherwise 0,
listOfPages = List.Generate(
// arg1 - set starting point
() => [ i=0, url="", Tbl=null, LastMessageId=LatestMessageId,
FirstMessageDate = Date.From( DateTime.FixedLocalNow() ),
LastMessageDate = Date.From( DateTime.FixedLocalNow() ),
MinMessageId = LatestMessageId,
more_available = true,
found_last = false,
found_last_test = false
],
// arg2 - loop exit condition
// arg2 uses value from arg3's result
// so arg3 is calculated before arg2
// do while
each [i] <= 1000 // limit number of queries just in case
and [found_last] = false,
// [field_name] references to the current step ot list.generate
// test: and [found_last_test] = false,
// [field_name] references to the current step ot list.generate
// arg3 - step's result calculation
each
let
// build a query string
url = "https://www.yammer.com/api/v1/messages/in_group/" & GroupID &
".json?older_than=" & Text.From( [LastMessageId] ) & "&limit=50",
page = fQueryPageJson( url ) // query data
in
[ i = [i]+1,
url = url,
Tbl = page,
LastMessageId = try Number.From( List.Last( page[messages] )[id] ) otherwise [LastMessageId],
FirstMessageDate = try Date.From( DateTime.From( List.First( page[messages] )[created_at] ) )
otherwise Date.From(DateTime.FixedLocalNow()),
LastMessageDate = try Date.From( DateTime.From( List.Last( page[messages] )[created_at] ) )
otherwise Date.From(DateTime.FixedLocalNow()),
more_available = try page[meta][older_available]? = true otherwise false,
found_last = ( [LastMessageDate] < Date.From( DateFrom ) ) or [more_available] = false,
// reference to previous step of list.generate
found_last_test = ( LastMessageDate < Date.From( DateFrom ) ) or more_available = false
// reference to current step to list.generate
],
// arg4 - output (depends on arg3 results)
each [
[i], [url], [Tbl],
[LastMessageId], [FirstMessageDate], [LastMessageDate],
[more_available], [found_last], [found_last_test]
]
),
#"Converted to Table" = Table.FromList(listOfPages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
// keep all columns for debugging
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1",
{"i", "url", "Tbl", "LastMessageId", "FirstMessageDate", "LastMessageDate", "MinMessageId", "more_available", "found_last", "found_last_test" },
{"i", "url", "Tbl", "LastMessageId", "FirstMessageDate", "LastMessageDate","MinMessageId", "more_available" , "found_last", "found_last_test" } ),
// keep only necessary data
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Column1",{"Tbl"}),
#"Removed Blank Rows" = Table.SelectRows(#"Removed Other Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
// drill down to messages
#"Expanded Tbl" = Table.ExpandRecordColumn(#"Removed Blank Rows", "Tbl", {"messages"}, {"messages"}),
#"Expanded messages" = Table.ExpandListColumn(#"Expanded Tbl", "messages"),
// expand message attributes
#"Expanded messages1" = Table.ExpandRecordColumn(#"Expanded messages", "messages",
{"id", "sender_id", "created_at", "message_type", "thread_id", "client_type", "attachments", "liked_by", "content_excerpt"},
{"id", "sender_id", "created_at", "message_type", "thread_id", "client_type", "attachments", "liked_by", "content_excerpt"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded messages1",{{"created_at", type datetime}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"created_at", type date}}),
// filter out messages older that required date
#"Filtered Rows" = Table.SelectRows(#"Changed Type1", each [created_at] >= DateFrom)
in
#"Filtered Rows"
What was tricky is to include the last page with messages, where we find message that meets condition “LastMessageDate < Date.From( DateFrom )”. That page can contain messages with creation_date older than DateFrom and newer. Therefore, it shouldn’t be ignored.
However, List.Generate behaves in a way: when condition is met, step is ignored.
Can’t say what would be the ideal solution here, but I ended up with calculating a “variable” that referencing to previous step
found_last = ( [LastMessageDate] < Date.From( DateFrom ) )
or [more_available] = false, // reference to previous step of list.generate
Then I check this “variable” in condition
// do while
each [i] <= 1000 // limit number of queries just in case
and [found_last] = false,
// [field_name] references to the current step ot list.generate
Result
Information about Likes (Liked users) and Attachments comes as records from API. Those records can be expanded into list, what we usually do not want to do. So I had to load messages into one table, Likes and Attachments – into two separate tables.
“Liked_by” comes as a record which can be expanded into list of UserId values
“Attachment” is a bit more complex object than “Like”, but they have something in common. Message may have several of both:
Depending on attachment type list of record fields is different. For a “page” is probably the simplest
For image or video attachment records have more fields.
That’s all for today.
Next time I’ll describe my approach for getting group members, which is also based on List.Generate but slightly different from used for messages.