Skip to main content

PowerBI

There are 2 sources that provide integration with PowerBI

Source ModuleDocumentation

powerbi

This plugin extracts the following:

  • Power BI dashboards, tiles and datasets
  • Names, descriptions and URLs of dashboard and tile
  • Owners of dashboards Read more...

powerbi-report-server

Use this plugin to connect to PowerBI Report Server. It extracts the following:

Metadata that can be ingested:

  • report name
  • report description
  • ownership(can add existing users in DataHub as owners)
  • transfer folders structure to DataHub as it is in Report Server
  • webUrl to report in Report Server

Due to limits of PBIRS REST API, it's impossible to ingest next data for now:

  • tiles info
  • datasource of report
  • dataset of report

Next types of report can be ingested:

  • PowerBI report(.pbix)
  • Paginated report(.rdl)
  • Mobile report
  • Linked report Read more...

Module powerbi

Certified

Important Capabilities

CapabilityStatusNotes
Extract OwnershipOn by default but can disabled by configuration

This plugin extracts the following:

  • Power BI dashboards, tiles and datasets
  • Names, descriptions and URLs of dashboard and tile
  • Owners of dashboards

Configuration Notes

  1. Refer Microsoft AD App Creation doc to create a Microsoft AD Application. Once Microsoft AD Application is created you can configure client-credential i.e. client_id and client_secret in recipe for ingestion.

  2. Enable admin access only if you want to ingest dataset, lineage and endorsement tags. Refer section Admin Ingestion vs. Basic Ingestion for more detail.

    Login to PowerBI as Admin and from Admin API settings allow below permissions

    • Allow service principals to use read-only admin APIs
    • Enhance admin APIs responses with detailed metadata
    • Enhance admin APIs responses with DAX and mashup expressions

Concept mapping

PowerBIDatahub
DashboardDashboard
Dataset's TableDataset
TileChart
Report.webUrlChart.externalUrl
WorkspaceN/A
ReportDashboard
PageChart

If Tile is created from report then Chart.externalUrl is set to Report.webUrl.

Lineage

This source extract table lineage for tables present in PowerBI Datasets. Lets consider a PowerBI Dataset SALES_REPORT and a PostgreSQL database is configured as data-source in SALES_REPORT dataset.

Consider SALES_REPORT PowerBI Dataset has a table SALES_ANALYSIS which is backed by SALES_ANALYSIS_VIEW of PostgreSQL Database then in this case SALES_ANALYSIS_VIEW will appear as upstream dataset for SALES_ANALYSIS table.

You can control table lineage ingestion using extract_lineage configuration parameter, by default it is set to true.

PowerBI Source extracts the lineage information by parsing PowerBI M-Query expression.

PowerBI Source supports M-Query expression for below listed PowerBI Data Sources

  1. Snowflake
  2. Oracle
  3. PostgreSQL
  4. Microsoft SQL Server
  5. Google BigQuery

Native SQL query parsing is only supported for Snowflake data-source and only first table from FROM clause will be ingested as upstream table. Advance SQL construct like JOIN and SUB-QUERIES in FROM clause are not supported.

For example refer below native SQL query. The table OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGET will be ingested as upstream table.

let
Source = Value.NativeQuery(
Snowflake.Databases(
"sdfsd788.ws-east-2.fakecomputing.com",
"operations_analytics_prod",
[Role = "OPERATIONS_ANALYTICS_MEMBER"]
){[Name = "OPERATIONS_ANALYTICS"]}[Data],
"select #(lf)UPPER(REPLACE(AGENT_NAME,\'-\',\'\')) AS Agent,#(lf)TIER,#(lf)UPPER(MANAGER),#(lf)TEAM_TYPE,#(lf)DATE_TARGET,#(lf)MONTHID,#(lf)TARGET_TEAM,#(lf)SELLER_EMAIL,#(lf)concat((UPPER(REPLACE(AGENT_NAME,\'-\',\'\'))), MONTHID) as AGENT_KEY,#(lf)UNIT_TARGET AS SME_Quota,#(lf)AMV_TARGET AS Revenue_Quota,#(lf)SERVICE_QUOTA,#(lf)BL_TARGET,#(lf)SOFTWARE_QUOTA as Software_Quota#(lf)#(lf)from OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS#(lf)#(lf)where YEAR_TARGET >= 2020#(lf)and TEAM_TYPE = \'foo\'#(lf)and TARGET_TEAM = \'bar\'",
null,
[EnableFolding = true]
),
#"Added Conditional Column" = Table.AddColumn(
Source,
"Has PS Software Quota?",
each
if [TIER] = "Expansion (Medium)" then
"Yes"
else if [TIER] = "Acquisition" then
"Yes"
else
"No"
)
in
#"Added Conditional Column"

M-Query Pattern Supported For Lineage Extraction

Lets consider a M-Query which combine two PostgreSQL tables. Such M-Query can be written as per below patterns.

Pattern-1

let
Source = PostgreSQL.Database("localhost", "book_store"),
book_date = Source{[Schema="public",Item="book"]}[Data],
issue_history = Source{[Schema="public",Item="issue_history"]}[Data],
combine_result = Table.Combine({book_date, issue_history})
in
combine_result

Pattern-2

let
Source = PostgreSQL.Database("localhost", "book_store"),
combine_result = Table.Combine({Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]})
in
combine_result

Pattern-2 is not supported for upstream table lineage extraction as it uses nested item-selector i.e. {Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]} as argument to M-QUery table function i.e. Table.Combine

Pattern-1 is supported as it first assign the table from schema to variable and then variable is used in M-Query Table function i.e. Table.Combine

Extract endorsements to tags

By default, extracting endorsement information to tags is disabled. The feature may be useful if organization uses endorsements to identify content quality.

Please note that the default implementation overwrites tags for the ingested entities, if you need to preserve existing tags, consider using a transformer with semantics: PATCH tags instead of OVERWRITE.

Admin Ingestion vs. Basic Ingestion

PowerBI provides two sets of API i.e. Basic API and Admin API.

The Basic API returns metadata of PowerBI resources where service principal has granted access explicitly on resources whereas Admin API returns metadata of all PowerBI resources irrespective of whether service principal has granted or doesn't granted access explicitly on resources.

The Admin Ingestion (explain below) is the recommended way to execute PowerBI ingestion as this ingestion can extract most of the metadata.

Admin Ingestion: Service Principal As Admin in Tenant Setting and Added as Member In Workspace

To grant admin access to the service principal, visit your PowerBI tenant Settings.

If you have added service principal as member in workspace and also allowed below permissions from PowerBI tenant Settings

  • Allow service principal to use read-only PowerBI Admin APIs
  • Enhance admin APIs responses with detailed metadata
  • Enhance admin APIs responses with DAX and mashup expressions

PowerBI Source would be able to ingest below listed metadata of that particular workspace

  • Lineage
  • PowerBI Dataset
  • Endorsement as tag
  • Dashboards
  • Reports
  • Dashboard's Tiles
  • Report's Pages

Lets consider user don't want (or doesn't have access) to add service principal as member in workspace then you can enable the admin_apis_only: true in recipe to use PowerBI Admin API only. if admin_apis_only is set to true then report's pages would not get ingested as page API is not available in PowerBI Admin API.

Basic Ingestion: Service Principal As Member In Workspace

If you have added service principal as member in workspace then PowerBI Source would be able ingest below metadata of that particular workspace

  • Dashboards
  • Reports
  • Dashboard's Tiles
  • Report's Pages

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[powerbi]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
type: "powerbi"
config:
# Your Power BI tenant identifier
tenant_id: a949d688-67c0-4bf1-a344-e939411c6c0a
# Ingest elements of below PowerBi Workspace into Datahub
workspace_id_pattern:
allow:
- 4bd10256-e999-45dd-8e56-571c77153a5f
deny:
# Workspaces dataset environments (PROD, DEV, QA, STAGE)
env: DEV
# Azure AD Application identifier
client_id: foo
# Azure AD App client secret
client_secret: bar
# Enable / Disable ingestion of user information for dashboards
extract_ownership: true
# Enable/Disable extracting workspace information to DataHub containers
extract_workspaces_to_containers: true
# Enable / Disable ingestion of endorsements.
# Please notice that this may overwrite any existing tags defined to ingested entities!
extract_endorsements_to_tags: false
# dataset_type_mapping is fixed mapping of Power BI datasources type to equivalent Datahub "data platform" dataset
dataset_type_mapping:
PostgreSql: postgres
Oracle: oracle
Sql: mssql
GoogleBigQuery: bigquery

sink:
# sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
FieldRequiredTypeDescriptionDefault
envstringThe environment that all assets produced by this connector belong toPROD
platform_instancestringThe instance of the platform that all assets produced by this recipe belong toNone
tenant_idstringPowerBI tenant identifierNone
workspace_idstring[deprecated] Use workspace_id_pattern insteadNone
dataset_type_mappingGeneric dictMapping of PowerBI datasource type to DataHub supported data-sources. See Quickstart Recipe for mappingNone
client_idstringAzure app client identifierNone
client_secretstringAzure app client secretNone
scan_timeoutintegertimeout for PowerBI metadata scanning60
extract_ownershipbooleanWhether ownership should be ingested. Admin API access is required if this setting is enabled. Note that enabling this may overwrite owners that you've added inside DataHub's web application.False
extract_reportsbooleanWhether reports should be ingestedTrue
extract_lineagebooleanWhether lineage should be ingested between X and Y. Admin API access is required if this setting is enabledTrue
extract_endorsements_to_tagsbooleanWhether to extract endorsements to tags, note that this may overwrite existing tags. Admin API access is required is this setting is enabledFalse
extract_workspaces_to_containersbooleanExtract workspaces to DataHub containersTrue
native_query_parsingbooleanWhether PowerBI native query should be parsed to extract lineageTrue
convert_urns_to_lowercasebooleanWhether to convert the PowerBI assets urns to lowercaseFalse
convert_lineage_urns_to_lowercasebooleanWhether to convert the urns of ingested lineage dataset to lowercaseTrue
admin_apis_onlybooleanRetrieve metadata using PowerBI Admin API only. If this is enabled, then Report Pages will not be extracted. Admin API access is required if this setting is enabledFalse
stateful_ingestionStatefulStaleMetadataRemovalConfig (see below for fields)PowerBI Stateful Ingestion Config.
stateful_ingestion.enabledbooleanThe type of the ingestion state provider registered with datahub.False
stateful_ingestion.ignore_old_statebooleanIf set to True, ignores the previous checkpoint state.False
stateful_ingestion.ignore_new_statebooleanIf set to True, ignores the current checkpoint state.False
stateful_ingestion.remove_stale_metadatabooleanSoft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.True
workspace_id_patternAllowDenyPattern (see below for fields)Regex patterns to filter PowerBI workspaces in ingestion{'allow': ['.*'], 'deny': [], 'ignoreCase': True}
workspace_id_pattern.allowArray of stringList of regex patterns to include in ingestion['.*']
workspace_id_pattern.denyArray of stringList of regex patterns to exclude from ingestion.[]
workspace_id_pattern.ignoreCasebooleanWhether to ignore case sensitivity during pattern matching.True

Code Coordinates

  • Class Name: datahub.ingestion.source.powerbi.powerbi.PowerBiDashboardSource
  • Browse on GitHub

Module powerbi-report-server

Incubating

Important Capabilities

CapabilityStatusNotes
Extract OwnershipEnabled by default

Use this plugin to connect to PowerBI Report Server. It extracts the following:

Metadata that can be ingested:

  • report name
  • report description
  • ownership(can add existing users in DataHub as owners)
  • transfer folders structure to DataHub as it is in Report Server
  • webUrl to report in Report Server

Due to limits of PBIRS REST API, it's impossible to ingest next data for now:

  • tiles info
  • datasource of report
  • dataset of report

Next types of report can be ingested:

  • PowerBI report(.pbix)
  • Paginated report(.rdl)
  • Mobile report
  • Linked report

Configuration Notes

See the

  1. Microsoft Grant user access to a Report Server doc
  2. Use your user credentials from previous step in yaml file

Concept mapping

Power BI Report ServerDatahub
Paginated ReportDashboard
Power BI ReportDashboard
Mobile ReportDashboard
Linked ReportDashboard
Dataset, DatasourceN/A

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[powerbi-report-server]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
type: powerbi-report-server
config:
# Your Power BI Report Server Windows username
username: username
# Your Power BI Report Server Windows password
password: password
# Your Workstation name
workstation_name: workstation_name
# Your Power BI Report Server host URL, example: localhost:80
host_port: host_port
# Your alias for Power BI Report Server host URL, example: local_powerbi_report_server
server_alias: server_alias
# Workspace's dataset environments, example: (PROD, DEV, QA, STAGE)
env: DEV
# Your Power BI Report Server base virtual directory name for reports
report_virtual_directory_name: Reports
# Your Power BI Report Server base virtual directory name for report server
report_server_virtual_directory_name: ReportServer
# Enable/Disable extracting ownership information of Dashboard
extract_ownership: True
# Set ownership type
ownership_type: TECHNICAL_OWNER


sink:
# sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
FieldRequiredTypeDescriptionDefault
envstringThe environment that all assets produced by this connector belong toPROD
usernamestringWindows account usernameNone
passwordstringWindows account passwordNone
workstation_namestringWorkstation namelocalhost
host_portstringPower BI Report Server host URLNone
server_aliasstringAlias for Power BI Report Server host URL
graphql_urlstring[deprecated] Not usedNone
report_virtual_directory_namestringReport Virtual Directory URL nameNone
report_server_virtual_directory_namestringReport Server Virtual Directory URL nameNone
extract_ownershipbooleanWhether ownership should be ingestedTrue
ownership_typestringOwnership type of ownerNONE
platform_namestringpowerbi
platform_urnstringurn:li:dataPlatform:powerbi
report_patternAllowDenyPattern (see below for fields){'allow': ['.*'], 'deny': [], 'ignoreCase': True}
report_pattern.allowArray of stringList of regex patterns to include in ingestion['.*']
report_pattern.denyArray of stringList of regex patterns to exclude from ingestion.[]
report_pattern.ignoreCasebooleanWhether to ignore case sensitivity during pattern matching.True
chart_patternAllowDenyPattern (see below for fields){'allow': ['.*'], 'deny': [], 'ignoreCase': True}
chart_pattern.allowArray of stringList of regex patterns to include in ingestion['.*']
chart_pattern.denyArray of stringList of regex patterns to exclude from ingestion.[]
chart_pattern.ignoreCasebooleanWhether to ignore case sensitivity during pattern matching.True

Code Coordinates

  • Class Name: datahub.ingestion.source.powerbi_report_server.report_server.PowerBiReportServerDashboardSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for PowerBI, feel free to ping us on our Slack