Unfold Child-Parent hierarchy in Power Query

In this post I consider two types of hierarchies.


The first table defines a typical hierarchy of companies.

Usually, such a hierarchy is used in financial reports to group key figures.

Most probably, we know the number of hierarchy levels upfront, but not always.

Task: Expand the hierarchy and automatically detect the number of levels in the resulting table, using Power Query (Get & Transform) / Power BI

Stay close to the scenario described in the pattern: http://www.daxpatterns.com/parent-child-hierarchies/
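
To make the task concrete, here is a minimal sketch of the idea in Python rather than Power Query M, so it shows only the logic and not the solution from this post. The function name `unfold_hierarchy`, the column names `Node` and `Level1..LevelN`, and the sample companies are all assumptions made for illustration. Each node is walked up to its root, and the longest path defines how many level columns the result gets.

```python
from typing import Dict, List, Optional


def unfold_hierarchy(parent_of: Dict[str, Optional[str]]) -> List[Dict[str, Optional[str]]]:
    """Return one row per node with Level1..LevelN columns, root first.

    N is not passed in: it is detected from the deepest path found.
    """
    paths: Dict[str, List[str]] = {}
    for node in parent_of:
        path: List[str] = []
        seen = set()
        current: Optional[str] = node
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at {current!r}")
            seen.add(current)
            path.append(current)
            # A parent that is not listed as a child is treated as a root.
            current = parent_of.get(current)
        paths[node] = list(reversed(path))

    # Number of levels = length of the longest root-to-node path.
    depth = max((len(p) for p in paths.values()), default=0)

    rows: List[Dict[str, Optional[str]]] = []
    for node, path in paths.items():
        row: Dict[str, Optional[str]] = {"Node": node}
        for i in range(depth):
            row[f"Level{i + 1}"] = path[i] if i < len(path) else None
        rows.append(row)
    return rows


# Hypothetical child -> parent table (None marks the top company).
companies = {
    "Holding": None,
    "Europe": "Holding",
    "Germany": "Europe",
    "Asia": "Holding",
}

for row in unfold_hierarchy(companies):
    print(row)
```

Shorter paths are padded with empty values, so every row has the same set of level columns.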

Goal:


The second table defines substitution of products.

Case: sometimes we are required to get the sales history of all predecessor products and group the result by Actual Product. There can be multiple substitutions, and we never know how many.

Task: In this case we need to find the latest successor, in other words the Actual Product, for each product.
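
As an illustration of this task, the sketch below follows a substitution chain until it reaches a product that has no successor. It is written in Python, not Power Query M, and the function name `actual_product` and the product codes are hypothetical. It assumes every product has at most one direct successor.

```python
from typing import Dict


def actual_product(successor_of: Dict[str, str], product: str) -> str:
    """Follow predecessor -> successor links until the chain ends."""
    seen = {product}
    current = product
    while current in successor_of:
        current = successor_of[current]
        if current in seen:
            # Guard against a circular substitution such as A -> B -> A.
            raise ValueError(f"circular substitution at {current!r}")
        seen.add(current)
    return current


# Hypothetical substitution table: predecessor -> successor.
substitutions = {"A": "B", "B": "C", "X": "Y"}

for product in ["A", "B", "C", "X", "Y"]:
    print(product, "->", actual_product(substitutions, product))
```

A product that was never substituted is its own Actual Product, so sales history can be grouped on the returned value for every product.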

Goal:



Refresh Excel Data Models on schedule

UPD: Excel-based version of Power Refresh (Reports Controller) is here.

I know what you think.

Yet another solution to refresh Excel files with connections and Data Model (aka PowerPivot).

Plenty of them can be found on the Internet, in different programming languages and at different levels of complexity.

C#:

  1. https://github.com/alapolloni/ExcelRefresh
  2. Excel and PowerPivot refresh suite: https://excelrefresher.codeplex.com/

Perl: http://search.cpan.org/~ctbrown/Win32-Excel-Refresh-0.02/Refresh.pm

PowerShell: https://rpmachado.wordpress.com/2015/07/10/powershell-auto-refresh-excel-powerpivot-data/

VBA: http://velin-georgiev-blog.appspot.com/blog/fully-automated-excel-reporting-by-using-windows-task-scheduler-vba-and-simple-vba-functions-library/

And, of course – Power Update

http://exceleratorbi.com.au/power-update-software/

http://www.powerpivotpro.com/2015/02/introducing-power-update/

A year ago, when I decided to write my own refresh program, Power Update was already on the market. However, I couldn’t get it for myself, for several reasons. In addition, I wanted to slightly adjust the process of refreshing and saving.


Query folding and dynamic parameters in SQL Queries in Power Query

Before you start reading this article, I recommend going through a very good post on Ken Puls’s blog – Pass Parameters to SQL Queries.

If it is fine for you to use native queries against the database, then most probably the scenario described below is not so interesting for you.

Here I’ll talk about a workaround that allows using query folding and passing parameters in Power Query without building a native SQL query.

Level: intermediate

Task definition

Generate N workbooks, each with a data model populated with a limited data scope.

Then each workbook will be shared with a certain group of people who are allowed to see only the relevant set of data (e.g. department OPEX of a particular subsidiary).

This is dictated by the fact that if you share an Excel workbook with someone, the user immediately has access to all info inside the data model. So, I cannot create one common data model and hide a slicer somewhere with a department preselected. I must prepare separate workbooks.

Conditions

Data: a table with millions of records in a SQL database – the main data source for the report (query folding is required so that filtering runs on the server instead of pulling millions of rows into Excel)

Tools: Excel + Power Query + PowerPivot

Support data: a table in Excel defining datasets (data scopes); let’s call it the Control Table


Bulk extract data from SAP Business ByDesign

This post describes how Power Query (Get & Transform) can be used as an ETL tool for SAP Business ByDesign (SAP ByD).

A purely self-service BI solution built on Excel for SAP Business ByDesign (SAP ByD).

Level: Advanced

How to extract master and operational data from SAP ByD using Power Query

General information can be found in my previous posts, which I recommend going through:

Several ways of data extraction from SAP Business ByDesign

Query SAP Business ByDesign Web Services via Power Query

Of all possible methods of data extraction, I chose Excel + Power Query as the simplest one for Modern Excel data analysts (non-IT pros).

What do we need to start this journey?

What can help?

Massive data extraction from SAP ByDesign

Let’s start with the most interesting part – extracting large data volumes from SAP ByD.


Power Query: Xlsb vs Xlsx

Recently I helped my colleague with a report. The task was to consolidate data from multiple Excel files, each containing 100k-500k rows. Hence, each Excel file is relatively big.

I proposed storing those files as xlsb. There is one file per month and the report needs R24M (rolling 24 months), so 50 MB is better than 150 MB.

In the end I did a deep analysis of which is better as an Excel-file database for Power Query – XLSB or XLSX.

Summary

For PQ solutions use XLSX instead of XLSB

PQ reads from XLSX twice as fast as from XLSB

The engine handling XLSB returns an error from time to time – cannot read file

The “UseHeaders” parameter of the Excel.Workbook function works differently for xlsx and xlsb

If a sheet and a table have the same name in an Excel workbook, Navigator generates a unique name for the object.

If you are interested in how I found all this out – continue reading.


Restore Scheduled Tasks

Recently, one of our workstations used for automated report refreshes “died”. The OS stopped loading after a reboot.

Fortunately, everything necessary had been copied to a backup.

We used a “Self-Service BI for the poor” approach on that machine: Excel + VBScripts, scheduled in Windows Task Scheduler.

Its typical task was to handle the following simplified schema:

  • Run Excel
  • Open workbook
  • RefreshAll (Power Query connections + Data Model)
  • Save workbook
  • Kill Excel

Repeat for all reports scheduled in Task Scheduler.


SAP ByDesign + Power BI = cloud friends


In this post you will find an idea of how to pull data from SAP ByDesign directly into Power BI. This is a totally self-service solution, which can be built if you have literally nothing. Only you, Excel, a free Power BI license and SAP ByDesign tenant(s).

The samples are based on data from test tenants that were opened for participants of the openSAP course “Reporting with SAP Business ByDesign”.

A couple of words about SAP ByD for those who are not aware of what it is. It is a cloud ERP system with its own real-time analytics engine, quite powerful, but poor: poor calculation capabilities, poor visualization capabilities. It handles only basic things and obvious calculations.

Moreover, if you are responsible for several SAP ByDesign systems, you most probably have to consolidate data somewhere and build sub-regional and regional reports etc.


Several ways of data extraction from SAP Business ByDesign

There are several reasons to extract data from a corporate ERP system.

The first and most important one is reporting: nicely visualized data, usually pre-calculated using specific logic. We all want to see beautiful reports.

The second, I think, is data transfer to BI system(s), a DWH etc. Maybe for some companies this is the first priority instead of reporting.

The third is rare, but it happens in my practice: data migration due to system “end-of-life”. Yes, this happens. What can I say? At the moment I’m involved in such a project, the second in my practice.

I would like to share my experience. I hope it will simplify the life of many other reporting specialists working with SAP Business ByDesign.

First of all, basic ways of getting data:

Rename columns of nested tables in Power Query

Typical scenario

You have several data sources – Table1, Table2 and Table3

You want to combine data


Unfortunately, column names are different in those data sources, but the structure is similar.

One possible solution is to create three separate queries, apply the necessary transformations there and combine them afterwards.

And this would be a fine solution. Moreover, it is the only possible solution when the data extraction function is different for each data source.

Scalability: when we get another source, e.g. Table4, we create a fourth query and add it to Table.Combine() alongside the previous three.

But! We can slightly simplify scaling when the same extraction function is used for each data source.
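
The general idea of "rename first, then combine" can be sketched as follows. This is a Python illustration of the logic only, not the Power Query solution from this post; the source names, column names and the `RENAMES` mapping are hypothetical. In M the corresponding building blocks would be functions such as Table.RenameColumns and Table.Combine.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rename_columns(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename keys listed in `mapping`; leave all other columns untouched."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def combine_sources(sources: Dict[str, List[Row]],
                    renames: Dict[str, Dict[str, str]]) -> List[Row]:
    """Apply each source's rename mapping, then append everything into one table."""
    combined: List[Row] = []
    for name, rows in sources.items():
        combined.extend(rename_columns(rows, renames.get(name, {})))
    return combined


# Hypothetical sources with similar structure but different column names.
SOURCES = {
    "Table1": [{"Cust": "Alpha", "Amt": 10}],
    "Table2": [{"Client": "Beta", "Value": 20}],
    "Table3": [{"Customer": "Gamma", "Amount": 30}],
}

# One mapping per source; Table3 already uses the target names.
RENAMES = {
    "Table1": {"Cust": "Customer", "Amt": "Amount"},
    "Table2": {"Client": "Customer", "Value": "Amount"},
}

for row in combine_sources(SOURCES, RENAMES):
    print(row)
```

With this layout, adding Table4 means adding one entry to the sources and one entry to the mapping, with no new query logic.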


Xlsb vs Xlsx PowerPivot model

I bet a number of posts comparing workbook sizes in different formats can be found on the Internet. However, nothing is better than your own data :).

Assume that we have an Excel file with quite a big amount of data

~250k rows


As we already know, the XLSB format makes the file size much smaller than XLSX


But what if we load the data into the Data Model of an empty workbook – will it be smaller than xlsb?

To check this I created a new Excel workbook, went to the PowerPivot tab and opened the Data Model window


I need to import from Excel

It is stored under “From Other Sources” – scroll down to the end of the list


Browse for my source file (I took xlsx as a source)


On the next step I can use “Preview and Filter” to exclude unnecessary data.

You can see checkboxes and the usual dropdown filter buttons on columns – like in Excel, I can select what I need in the filter menu


For this “size challenge” I should take everything. So simply press “Finish” and wait while PowerPivot imports the data.

After a few seconds we can see the result. All rows were imported – no errors.


Result in PowerPivot window


Save this file as xlsx and xlsb


The champion is still XLSB with a simple table. The PP (PowerPivot) version is 11% bigger.

And we can see a minimal difference between the PP versions in different formats. xlsx and xlsb have almost the same size.

And what is interesting – xlsb is slightly bigger than xlsx in this case.

Let’s make one more test. Delete the text column from the initial file. It contains comments for transactions.


This column contains a huge number of unique values; in addition, it is text – the heaviest data type.

I repeated the same actions with the file without the text column, and the result is the opposite


The PP version is 11% smaller than the initial “xlsb without text column” and 4.5 times smaller than the initial “Without text column.xlsx”.

Thoughts that come to my mind after all this

The choice of where to store data, in an Excel table or in PowerPivot, depends on the data. There are pros and cons. In the end, PowerPivot doesn’t lose. Usually data models contain more numbers and less text.

Working with 250k rows in Excel is so slow that it is close to impossible; several SUMIFS will raise a “lack of memory” error. Instead of SUMIFS, PivotTables can be used and then the GETPIVOTDATA function… All of this is now “old style” modelling that helped us a lot before PowerPivot came, and it should stay in the past.

What if we want to analyze R12M and take 12 files? We will have >1M rows, which is more than the 1,048,576 rows an Excel sheet supports, BUT PowerPivot supports it and still calculates quickly.

Happy modelling in 2016!