
Build a Data Warehouse in Azure

Updated: Mar 23, 2023

D365 Finance and Supply Chain Management is a market leader, but that does not mean it has no limitations. Data visibility is one of them: finance users need more visibility to control and reclassify GL entries, for instance.


Power BI can give users that visibility, once the data is exported out of D365.


Export data to Azure data lake


I won’t spend much time on this step, as there are plenty of resources that already explain it.

Log in to D365 F&SCM


Navigate to System administration > Setup > Export data to Azure data lake > Export to Data Lake


The form lists all available tables.


Activate the tables that need to be exported to Azure Data Lake Gen2. There is an option to export all the tables used by a data entity: go to the Choose using Entities tab, then select Add tables using Entities.


A section shows all the tables included in a data entity.




Use the Filter by status field to display all tables that are exported to the Azure data lake.



The data is exported to Azure Data Lake Gen2 in near real time, as CSV files with an accompanying .json metadata file.



There is one root folder for D365 Finance and Operations and one for Dataverse tables, with one sub-folder per environment. Below that sit several folders; the most important one is Tables.


Tables are structured following the Common Data Model schema. The sync is near real time thanks to Change Data Capture (CDC) capabilities.



Metadata properties (including column names and data types) are stored in the manifest file.

The partitioning of tables is also managed automatically by Microsoft. For Dataverse tables, the partitioning is monthly.


Whenever data is created or updated in D365 F&O, the CSV files are updated as well.

You may also have data in Dataverse, for instance if you use mapping tables for integrations.


Azure Synapse Link for Dataverse


Azure Synapse Link for Dataverse provides a continuous pipeline of data from Dataverse to Azure Synapse Analytics. Any data or metadata changes in Dataverse are automatically pushed to the Azure Synapse Analytics meta store and Azure Data Lake.


Select the Azure Data Lake Gen2 storage account



Figure 1: Active tables

Only tables that have change tracking enabled can be exported


Click Manage tables to add or remove tables that need to be synchronized with the Azure Data Lake Gen2 and Synapse.


This feature creates a data lakehouse. Data is stored in the data lake, and tables can be queried directly via T-SQL (serverless SQL pool) or from Apache Spark. It follows a serverless model, which means no SQL database is provisioned.
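For example, the exported CSV files can be queried straight from the serverless SQL pool with OPENROWSET. This is a minimal sketch; the storage account, container, and folder names are placeholders to adapt to your own lake:

-- Query the raw CSV files in the lake from the serverless SQL pool
-- (the URL below is a placeholder, not the actual export path)
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/mycontainer/myenvironment/Tables/GeneralJournalAccountEntry/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS rows;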


One lake database is created for fetching data from Dataverse


If you don’t want to use the serverless option in Azure Synapse Analytics, you can create a dedicated SQL pool. Dedicated SQL pools represent a collection of resources (CPU, memory, and IO) that leverages a scale-out architecture to distribute computational processing of data across multiple nodes (parallel processing). Dedicated SQL pools need to be provisioned.
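To illustrate the scale-out model, here is what provisioning a distributed fact table could look like in a dedicated SQL pool. The column list is a sketch; only the table name comes from the data model described later:

-- Minimal sketch of a hash-distributed fact table (illustrative columns)
CREATE TABLE [dwh].[Fact_Financialtransactions]
(
    [RECID]                    BIGINT         NOT NULL,
    [ACCOUNTINGDATE]           DATE           NOT NULL,
    [MAINACCOUNTKEY]           BIGINT         NOT NULL,
    [ACCOUNTINGCURRENCYAMOUNT] DECIMAL(32, 6) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH([RECID]),  -- spreads rows across the pool's 60 distributions for parallel processing
    CLUSTERED COLUMNSTORE INDEX    -- compressed columnar storage suited to large fact tables
);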


Data architecture


You can use two schemas: one for ingesting raw data and one for your data warehouse model.
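For example (the schema names below are the ones used by the stored procedure further down):

-- One schema for raw data ingested from the lake, one for the warehouse model
CREATE SCHEMA [ods];
GO
CREATE SCHEMA [dwh];
GO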

I am using dataflows for large tables, with incremental refresh to write only new rows.


Two sources of data need to be fetched:


· The original files (ADLSOURCE)

· The target table (QueryOdsGeneralJournalAccountEntry)

1) Retrieve target table (Synapse table)


Select Query input


Use a SELECT statement to fetch the data. Don’t use *; list all the fields you need to retrieve.
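Something like the sketch below, assuming the target table sits in the [ods] schema; the column names are illustrative:

-- List only the fields the dataflow actually needs (no SELECT *)
SELECT RECID,
       JOURNALNUMBER,
       ACCOUNTINGDATE,
       ACCOUNTINGCURRENCYAMOUNT
FROM [ods].[GeneralJournalAccountEntry]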


Select Import projection


Activate Enable staging


· Add hash columns to apply a row signature to data coming from the Synapse table (= target table)


Add a derived column component


Name the new column “Fingerprint1” and use the following expression, which computes a SHA-256 hash over all columns of the row:

sha2(256, columns())



2) Ingest data from the Azure Data Lake Gen2


Select Inline dataset type = Common Data Model


Select the linked service you configured to connect to the Azure data lake Gen2

Go to Source options


Metadata format = Manifest


Select the location where the manifest file is stored


Activate Change data capture to ingest only the files that were modified since the last run. This avoids reading all the files again and improves performance.


Entity = table name

Add a SELECT component to only select the columns you need from the source files

Add a derived column component for the source file to add a hash column


The purpose is to have a row signature for data coming from the source file.

Name the new column “Fingerprint” and use the following expression:

sha2(256, columns())

Search for new rows by comparing the row signatures from both data sources


Add an Exists component


Exist type = Doesn’t exist

Condition: Fingerprint1 = Fingerprint
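Conceptually, the Exists component configured with Doesn’t exist behaves like a T-SQL NOT EXISTS. A sketch with hypothetical source and target names:

-- Keep only source rows whose signature is absent from the target
SELECT src.*
FROM SourceFiles AS src                       -- data lake rows, with the Fingerprint column
WHERE NOT EXISTS (
    SELECT 1
    FROM TargetTable AS tgt                   -- Synapse table rows, with the Fingerprint1 column
    WHERE tgt.Fingerprint1 = src.Fingerprint
);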

Write new rows in the sink table


Configure a dataset for the target table


Go to Settings


Tick Allow insert


Data Warehouse


Large fact tables were created to store transactional or measured data, and dimension tables were created to store attributes about that data.

One pipeline is dedicated to loading the dimension tables and another to loading the fact tables.

Stored procedures were chosen for dimension tables:


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROC [dwh].[SP_Load_Dim_OMBusinessUnit] AS

-- Empty the table
TRUNCATE TABLE [dwh].[Dim_OMBusinessUnit]

-- Reload the table
INSERT INTO [dwh].[Dim_OMBusinessUnit]
(
 [KEY_]
,[VALUE]
,[NAME]
,[OPENINGDATE]
,[CLOSINGDATE]
,[BRAND]
,[BUSINESSUNITTYPES]
,[CHAIN]
,[CONCEPT]
,[ADDRESSSTREET]
,[ADDRESSSTATE]
,[ADDRESSCITY]
,[ADDRESSZIPCODE]
,[ADDRESSCOUNTRYREGIONID]
,[REPORTINGADDRESS]
,[ISA_EDICORNERGLN]
,[ISASTORENUMBER]
,[HANSTOREGLN]
)
SELECT T1.RECID AS KEY_,
       T1.OMOPERATINGUNITNUMBER AS VALUE,
       T2.NAME AS NAME,
       OMB.HANOPENINGDATE,
       OMB.HANCLOSINGDATE,
       OMB.HANBRAND,
       OMB.HANBUSINESSUNITTYPES,
       OMB.HANCHAIN,
       OMB.HANCONCEPT,
       OMB.[ADDRESSSTREET],
       OMB.[ADDRESSSTATE],
       OMB.[ADDRESSCITY],
       OMB.[ADDRESSZIPCODE],
       OMB.[ADDRESSCOUNTRYREGIONID],
       -- the original snippet was cut off here; the last four columns are
       -- assumed to come from OMOperatingUnit like the address fields above
       OMB.[REPORTINGADDRESS],
       OMB.[ISA_EDICORNERGLN],
       OMB.[ISASTORENUMBER],
       OMB.[HANSTOREGLN]
FROM [ods].[DirPartyTable] AS T1
CROSS JOIN [ods].[DirPartyTable] AS T2
LEFT JOIN [ods].[OMOperatingUnit] AS OMB
       ON OMB.OPERATINGUNITNUMBER = T1.OMOPERATINGUNITNUMBER
WHERE T1.OMOPERATINGUNITTYPE = 4         -- keep business units only
  AND T1.RECID = T2.RECID                -- the CROSS JOIN plus this filter acts as a self-join on RECID
  AND T1.INSTANCERELATIONTYPE IN (10397) -- restrict DirPartyTable to operating unit records
GO
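Each procedure is then called from a Stored procedure activity in the dimension pipeline, or can be tested manually:

EXEC [dwh].[SP_Load_Dim_OMBusinessUnit];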


Dataflows and stored procedures were chosen for fact tables.


Another article will explain that in detail.

Data model


We applied a star schema to build the accounting data mart. A sample query after the table list below shows how the fact table joins to the dimensions.


The following tables are loaded:


1 fact table:


· Fact_Financialtransactions


10 dimension tables:


· Dim_FiscalCalendar

· Dim_Anaplan_MainAccount

· Dim_MainAccount

· Dim_Customer

· Dim_Vendor

· Dim_FiscalCalendar

· Dim_OMBusinessUnit

· Dim_OMDepartment

· Dim_OMFinancialtag

· LedgerPosting
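To illustrate the star schema, here is a sample query; the fact table’s foreign-key column names are assumptions, as is the [VALUE] column on Dim_MainAccount:

-- Amounts by date, business unit, and main account (illustrative keys)
SELECT f.[ACCOUNTINGDATE],
       bu.[NAME]  AS BusinessUnit,
       ma.[VALUE] AS MainAccount,
       SUM(f.[ACCOUNTINGCURRENCYAMOUNT]) AS Amount
FROM [dwh].[Fact_Financialtransactions] AS f
LEFT JOIN [dwh].[Dim_OMBusinessUnit] AS bu ON bu.[KEY_] = f.[BUSINESSUNITKEY]  -- assumed FK
LEFT JOIN [dwh].[Dim_MainAccount]    AS ma ON ma.[KEY_] = f.[MAINACCOUNTKEY]   -- assumed FK
GROUP BY f.[ACCOUNTINGDATE], bu.[NAME], ma.[VALUE];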




