References in Power Query are incredibly useful. I often use it to organize my queries. But when you are using them you could trigger extra calls against your data sources.

The problem

Let say you created two Power Queries:

  • DatabaseData – This fetch data from a database
  • FileData – This fetch data from a file

Now you want to merge all this data into a single large table. You are creating a new query named AllData that is defined like this:

AllData definition
let
    Source = Table.Combine({FileData, DatabaseData})
in
    Source

Now when you update the data in all your queries this happens:

  • DatabaseData is updated by fetching data from the database.
  • FileData is updated by fetching data from the file.
  • AllData is updated by using the already fetched data from the two previous queries.

Wrong! This I what I thought would happen. Instead, what happens is that when AllData is updated a new call is made to the database and the file is read a second time.

The reason for this, as I understand it, is that Power Query is designed to push as much work against the data sources as possible. So, if AllData instead did some grouping on the database data this work would be pushed to the database. This could be a good thing.

But, in the situation above this is not what we want.

The solution

There are two potential solutions for this.

Block updating data in the queries

If we do not intend to use the data in DatabaseData and FileData we could block the update for these queries.

In Excel you right click on the queries, select properties, and then uncheck the option Refresh this connection on Refresh All.

Refresh this connection on Refresh All

If you are using PowerBI, you instead in the Power Query editor right click on the query and uncheck Include in report refresh.

Include in report refresh

Note when you do this AllData will still be using the latest data that is in the data sources. The difference is that DatabaseData and FileData will not be updated when you refresh all data in your document.

Use a calculated table

Another solution is to create a calculated table instead of having merging data in Power Query, this is only possible in PowerBI. In the Power BI editor select Table tools > New Table.

Add calculated table

Then you are entering this DAX command:

DAX
AllData = UNION(DatabaseData, FileData)

When all data is being updated, DatabaseData and FileData are updated first. Then the calculated table is created from the result from these queries. No additional calls are done against your database.

You could do a lot more than just union tables. You could do joins with different tables, add more columns and much more. There is more information about this in the documentation

Summary

I learned all this when I had a Power BI solution that was updating surprisingly slow. After some trouble shooting, I started to suspect that it made more request against my data sources than expected and this made the update process very slow. I wished I had learned this earlier.