DataHub Releases

Summary

| Version | Release Date | Links |
| --- | --- | --- |
| v0.10.0 | 2023-02-07 | Release Notes, View on GitHub |
| v0.9.6.1 | 2023-01-31 | Release Notes, View on GitHub |
| v0.9.6 | 2023-01-13 | Release Notes, View on GitHub |
| v0.9.5 | 2022-12-23 | Release Notes, View on GitHub |
| v0.9.4 | 2022-12-20 | Release Notes, View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |
| v0.8.41 | 2022-07-15 | View on GitHub |
| v0.8.40 | 2022-06-30 | View on GitHub |
| v0.8.39 | 2022-06-24 | View on GitHub |
| v0.8.38 | 2022-06-09 | View on GitHub |
| v0.8.37 | 2022-06-09 | View on GitHub |
| v0.8.36 | 2022-06-02 | View on GitHub |
| v0.8.35 | 2022-05-18 | View on GitHub |
| v0.8.34 | 2022-05-04 | View on GitHub |
| v0.8.33 | 2022-04-15 | View on GitHub |
| v0.8.32 | 2022-04-04 | View on GitHub |
| v0.8.31 | 2022-03-17 | View on GitHub |
| v0.8.30 | 2022-03-17 | View on GitHub |
| v0.8.29 | 2022-03-10 | View on GitHub |
| v0.8.28 | 2022-03-07 | View on GitHub |
| v0.8.27 | 2022-02-23 | View on GitHub |

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

Release Highlights

Potential Downtime

This release introduces substantial improvements to search functionality, which require reindexing the search indices.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from start-up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, please expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
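To make the rule of thumb above concrete, here is a minimal shell sketch that extrapolates linearly from the stated rate of roughly 1 hour per 2.3 million entities. The function name `estimate_reindex_minutes` is hypothetical and not part of DataHub, and the actual duration will depend on your cluster, so treat the result as a rough planning number only.

```shell
#!/bin/sh
# Rough reindex-time estimate based on the release notes' rule of thumb:
# about 1 hour (60 minutes) per 2.3 million entities.
# estimate_reindex_minutes is a hypothetical helper, not a DataHub command.
estimate_reindex_minutes() {
  entities=$1
  # Integer ceiling of entities * 60 / 2,300,000.
  echo $(( ($entities * 60 + 2300000 - 1) / 2300000 ))
}

estimate_reindex_minutes 2300000    # prints 60
estimate_reindex_minutes 10000000   # prints 261
```

The second call corresponds to a catalog of 10 million entities, which the linear estimate puts at a little under four and a half hours.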

If you are deploying containers yourself

If you're deploying the Docker containers yourself (without Helm or the Docker Compose Quickstart), you'll need to first run the acryldata/datahub-upgrade Docker image (v0.10.0 tag) with the required environment variables set.

Run the container with this command:

docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate

For the full set of environment variables required, check out the default docker.env provided for Docker Compose deployments.

This will run the required reindex against your Elasticsearch instance, after which the other DataHub components should start correctly. If the datahub-upgrade container does not run successfully, the other components in the stack will fail to start.
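One way to supply the environment variables is Docker's `--env-file` option. The sketch below only prints the resulting command so it can be reviewed before you run it. The `./docker.env` path, the `ENV_FILE` variable, and the `build_upgrade_cmd` helper are assumptions for illustration; your deployment may also need networking options so the container can reach Elasticsearch and your other backing services.

```shell
#!/bin/sh
# Assumption: the required variables live in a local env file.
# Override the location with ENV_FILE=/path/to/file if yours differs.
ENV_FILE="${ENV_FILE:-./docker.env}"

# build_upgrade_cmd is a hypothetical helper: it prints the docker command
# (a dry run) instead of executing it.
build_upgrade_cmd() {
  echo "docker run --env-file $1 acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate"
}

build_upgrade_cmd "$ENV_FILE"
```

Once the printed command looks right for your environment, run it directly in your shell.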

User Experience

We have some really exciting improvements to the DataHub user experience in this release!

Improved documentation editor, contributed by @ngamanda and the Grab Team. This work provides a much more intuitive documentation editing experience within the UI, providing “what you see is what you get” formatting & removing the need for markdown expertise.

Additionally, you can easily:

  • Add links to other entities/users within DataHub
  • Embed and resize tables & images
  • Toggle between font sizes and formats
  • Embed syntax-highlighted code blocks

<img src="https://user-images.githubusercontent.com/114954101/217367791-3d392ae4-f422-4188-8d3c-768cb7c120ea.png" width="800">

Filter lineage graphs based on time windows. You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot past data issues.

Improvements in Search. As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:

  • Stemming & Synonyms
  • Search by full or partial URN
  • Autocomplete improvements
  • Quoted search analyzer for exact & prefix match

Metadata Ingestion

Here are some of the most notable ingestion-related improvements:

  • Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
  • PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
  • BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
  • Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias

Developer Experience

  • This release introduces DataHub Lite, a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use cases, such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. Check out the docs here.

Breaking Changes

[#7103](https://github.com/datahub-project/datahub/pull/7103) This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more info, see our docs on Configuring Kafka in DataHub. These variables were previously suffixed with _TOPIC, whereas the correct suffix is now _TOPIC_NAME. This change should not affect any user who is using the default Kafka topic names.
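If you keep custom topic settings in an env file, the rename could be sketched roughly as follows. This is a minimal illustration, assuming plain `KEY=VALUE` lines; the `rename_topic_vars` helper and the `EXAMPLE_*` variable names are hypothetical placeholders, so check the Kafka configuration docs for the actual variable names before applying anything like this.

```shell
#!/bin/sh
# rename_topic_vars is a hypothetical helper: it reads KEY=VALUE lines on
# stdin and appends _NAME to any key that ends in _TOPIC.
# Keys already ending in _TOPIC_NAME are left alone, because the pattern
# requires "_TOPIC" to be followed directly by "=".
rename_topic_vars() {
  sed -E 's/^([A-Za-z0-9_]+_TOPIC)=/\1_NAME=/'
}

# EXAMPLE_* names are placeholders, not real DataHub variables.
printf '%s\n' \
  'EXAMPLE_EVENT_TOPIC=my_custom_topic' \
  'EXAMPLE_OTHER_TOPIC_NAME=kept_as_is' | rename_topic_vars
# prints:
#   EXAMPLE_EVENT_TOPIC_NAME=my_custom_topic
#   EXAMPLE_OTHER_TOPIC_NAME=kept_as_is
```

Reviewing the output before replacing the original file keeps the change easy to revert.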

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.6...v0.10.0

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

Release Highlights

Please upgrade from 0.9.6 ASAP to avoid ongoing issues creating and using secrets.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

Bug fix for secrets encryption

  • Prevents decryption errors for existing secrets
  • Affects reading ingestion secrets created with a previous release
  • Affects native user password validation

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.6...v0.9.6.1

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.


⚠️ This Release has been patched. Please upgrade to 0.9.6.1 ⚠️

As of January 19th, 2023, 0.9.6.1 is the official release build and should be used instead of 0.9.6. Upgrade to 0.9.6.1 when possible to avoid issues creating and using secrets.

Release Highlights

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

User Experience

  • We now support embedding Dashboards, Charts, and Datasets. This allows us to do things like directly embed Looker / Tableau / Mode / Redash Looks, Dashboards, Explores into the Dataset pages themselves.

  • [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity

  • Improved error messaging for bulk editing via the UI

Metadata Ingestion

  • Data profiling now supports a configurable number of sample values to be returned
  • Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
  • Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
  • Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
  • Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!

Developer Experience

  • Fixes quickstart/docker compose issues for M1 machines
  • Improvements in reliability and performance of the Restli Service endpoints for ingestion:
    • Scale Restli Service thread pool based on CPU
    • Add retry (exp backoff) to Restli Entity Client
    • MCE no longer relies on GMS for Restli service
    • Converted Restli Service from standalone servlet to Spring injectable
    • Docker build externalized (significantly faster on M1, <7 minute build times, based on this)
    • Frontend asset generation refactor (causing tests to fail intermittently)

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.5...v0.9.6

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

Release Highlights

Notice: This release includes a fix for a Single Sign-On (OIDC) issue that was introduced in the previous release, v0.9.4.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

User Experience
  • Manual Lineage is LIVE! You can now add and remove lineage between entities in the Lineage Visualization screen, making it easier than ever to manage the complex relationships between your data resources.

  • Our new Views feature makes it easy to create curated sets of Entities within DataHub. This is a great way to start to isolate the entities that matter most, and provide your DataHub end-users with a streamlined view of the assets that are relevant to their use cases. See the original demo video.

  • In-App Product Tours are here! When logging into DataHub and/or visiting a new page type for the first time, new users will be prompted with a helpful walkthrough of core functionality to get them familiar with the platform. We’ll continue to add modules as we roll out new features!

  • Automatically send updates to Slack and/or Microsoft Teams when changes are made within DataHub by leveraging our new Slack and Teams Actions.

Metadata Ingestion

We’re continuing to improve the user experience for UI-based ingestion for the following sources:

  • Databricks Unity Catalog
  • dbt Cloud
  • MySQL
  • Trino/Presto
  • Microsoft SQL Server
  • MariaDB

If you’re just getting started with UI-based Ingestion, check out our new BigQuery & Snowflake guides.

Stateful ingestion is now supported for Iceberg (thanks for the contrib, @cccs-Dustin!) and LDAP (thanks for the contrib, @bda618!)

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.4...v0.9.5

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

Known Issues

In this release, our OIDC SSO library received a major version upgrade. There is an issue with how the newer version of the library interacts with OIDC providers, which we have addressed in v0.9.5. We recommend not upgrading to this version if your organization is actively using OIDC to manage user authentication.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.3...v0.9.4

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.

v0.8.41

Released on 2022-07-15 by @anshbansal.

View the release notes for v0.8.41 on GitHub.

v0.8.40

Released on 2022-06-30 by @gabe-lyons.

View the release notes for v0.8.40 on GitHub.

v0.8.39

Released on 2022-06-24 by @maggiehays.

View the release notes for v0.8.39 on GitHub.

[!] DataHub v0.8.38

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.38 on GitHub.

[!] DataHub v0.8.37

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.37 on GitHub.

DataHub V0.8.36

Released on 2022-06-02 by @treff7es.

View the release notes for DataHub V0.8.36 on GitHub.

[!] DataHub v0.8.35

Released on 2022-05-18 by @dexter-mh-lee.

View the release notes for [!] DataHub v0.8.35 on GitHub.

v0.8.34

Released on 2022-05-04 by @maggiehays.

View the release notes for v0.8.34 on GitHub.

DataHub v0.8.33

Released on 2022-04-15 by @dexter-mh-lee.

View the release notes for DataHub v0.8.33 on GitHub.

DataHub v0.8.32

Released on 2022-04-04 by @dexter-mh-lee.

View the release notes for DataHub v0.8.32 on GitHub.

DataHub v0.8.31

Released on 2022-03-17 by @dexter-mh-lee.

View the release notes for DataHub v0.8.31 on GitHub.

Datahub v0.8.30

Released on 2022-03-17 by @rslanka.

View the release notes for Datahub v0.8.30 on GitHub.

DataHub v0.8.29

Released on 2022-03-10 by @shirshanka.

View the release notes for DataHub v0.8.29 on GitHub.

DataHub v0.8.28

Released on 2022-03-07 by @shirshanka.

View the release notes for DataHub v0.8.28 on GitHub.

DataHub Release Candidate v0.8.28 (rc1)

Released on 2022-03-05 by @shirshanka.

View the release notes for DataHub Release Candidate v0.8.28 (rc1) on GitHub.

Release Candidate v0.8.28

Released on 2022-03-05 by @shirshanka.

View the release notes for Release Candidate v0.8.28 on GitHub.

DataHub v0.8.27

Released on 2022-02-23 by @shirshanka.

View the release notes for DataHub v0.8.27 on GitHub.