
Build a Data Warehouse in Azure

Updated: Mar 23, 2023

D365 Finance and Supply Chain Management is a market leader, but that does not mean it has no limitations. Data visibility is one of them: finance users need more visibility to control and reclassify GL entries, for instance.


Power BI can give users that visibility, once the data is exported out of D365.


Export data to Azure data lake


I won’t spend much time on this step, as there are plenty of resources that already explain it.

Log in to D365 F&SCM


Navigate to System administration > Setup > Export data to Azure data lake > Export to Data Lake


The form lists all available tables.


Activate the tables that need to be exported to Azure Data Lake Gen2. There is an option to export all the tables used by a data entity: go to the Choose using Entities tab, then select Add tables using Entities.


A section shows all the tables included in a data entity.




Use the Filter by status field to display all tables that are exported to the Azure data lake.



The data is exported to Azure Data Lake Gen2 in near real time, as CSV files with an accompanying .json metadata file.



There is one root folder for D365 Finance and Operations and one for Dataverse tables, with one sub-folder per environment. Below that sit several folders; the most important one is Tables.


Tables are structured following the Common Data Model schema. The sync is near real time thanks to Change Data Capture (CDC) capabilities.



Metadata properties (including column names and data types) are stored in the manifest file.

The partitioning of tables is also managed automatically by Microsoft. For Dataverse tables, the partitioning is monthly.


Whenever data is created or updated in D365 F&O, the CSV files are updated as well.

You may also have data in Dataverse, for instance if you use mapping tables for integrations.


Azure Synapse Link for Dataverse


Azure Synapse Link for Dataverse provides a continuous pipeline of data from Dataverse to Azure Synapse Analytics. Any data or metadata changes in Dataverse are automatically pushed to the Azure Synapse Analytics meta store and Azure Data Lake.


Select the Azure Data Lake Gen2 storage account



Figure 1: Active tables

Only tables that have change tracking enabled can be exported


Click Manage tables to add or remove tables that need to be synchronized with the Azure Data Lake Gen2 and Synapse.


This feature creates a data lakehouse. Data is stored in the data lake, and tables can be queried directly via T-SQL (serverless SQL pool) or from Apache Spark. It follows a serverless model, which means no SQL database is provisioned.
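For example, the exported CSV files can be queried straight from the serverless SQL pool with OPENROWSET. This is a minimal sketch; the storage account, container, and folder names are placeholders to adapt to your own lake:

-- Query the raw CSV files in the lake from the serverless SQL pool
-- (the URL below is a placeholder, not the actual export path)
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/mycontainer/myenvironment/Tables/GeneralJournalAccountEntry/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS rows;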


One lake database is created for fetching data from Dataverse


If you don’t want to use the serverless option in Azure Synapse Analytics, you can create a dedicated SQL pool. Dedicated SQL pools represent a collection of resources (CPU, memory, and IO) that leverages a scale-out architecture to distribute computational processing of data across multiple nodes (parallel processing). Dedicated SQL pools need to be provisioned.
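To illustrate the scale-out model, here is what provisioning a distributed fact table could look like in a dedicated SQL pool. The column list is a sketch; only the table name comes from the data model described later:

-- Minimal sketch of a hash-distributed fact table (illustrative columns)
CREATE TABLE [dwh].[Fact_Financialtransactions]
(
    [RECID]                    BIGINT         NOT NULL,
    [ACCOUNTINGDATE]           DATE           NOT NULL,
    [MAINACCOUNTKEY]           BIGINT         NOT NULL,
    [ACCOUNTINGCURRENCYAMOUNT] DECIMAL(32, 6) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH([RECID]),  -- spreads rows across the pool's 60 distributions for parallel processing
    CLUSTERED COLUMNSTORE INDEX    -- compressed columnar storage suited to large fact tables
);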


Data architecture


You can use two schemas: one for ingesting raw data and one for your data warehouse model.
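For example (the schema names below are the ones used by the stored procedure further down):

-- One schema for raw data ingested from the lake, one for the warehouse model
CREATE SCHEMA [ods];
GO
CREATE SCHEMA [dwh];
GO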

I am using dataflows for large tables, with incremental refresh to write only new rows.


Two sources of data need to be fetched:


· The original files (ADLSOURCE)

· The target table (QueryOdsGeneralJournalAccountEntry)

1) Retrieve target table (Synapse table)


Select Query input


Use a SELECT statement to fetch the data. Don’t use *; list all the fields you need to retrieve.
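Something like the sketch below, assuming the target table sits in the [ods] schema; the column names are illustrative:

-- List only the fields the dataflow actually needs (no SELECT *)
SELECT RECID,
       JOURNALNUMBER,
       ACCOUNTINGDATE,
       ACCOUNTINGCURRENCYAMOUNT
FROM [ods].[GeneralJournalAccountEntry]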


Select Import projection


Activate Enable staging


· Add hash columns to apply a row signature to data coming from the Synapse table (= target table)


Add a derived column component


Name the new column “Fingerprint1” and use the following expression, which computes a SHA-256 hash over all columns of the row:

sha2(256, columns())



2) Ingest data from the Azure Data Lake Gen2


Select Inline dataset type = Common Data Model


Select the linked service you configured to connect to the Azure data lake Gen2

Go to Source options


Metadata format = Manifest


Select the location where the manifest file is stored


Activate Change data capture to ingest only the files that were modified since the last run. This avoids reading all the files again and improves performance.


Entity = table name

Add a SELECT component to only select the columns you need from the source files

Add a derived column component for the source file to add a hash column


The purpose is to have a row signature for data coming from the source file.

Name the new column “Fingerprint” and use the following expression:

sha2(256, columns())

Search for new rows by comparing the row signatures from both data sources


Add an Exists component


Exist type = Doesn’t exist

Condition: Fingerprint1 = Fingerprint
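Conceptually, the Exists component configured with Doesn’t exist behaves like a T-SQL NOT EXISTS. A sketch with hypothetical source and target names:

-- Keep only source rows whose signature is absent from the target
SELECT src.*
FROM SourceFiles AS src                       -- data lake rows, with the Fingerprint column
WHERE NOT EXISTS (
    SELECT 1
    FROM TargetTable AS tgt                   -- Synapse table rows, with the Fingerprint1 column
    WHERE tgt.Fingerprint1 = src.Fingerprint
);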

Write new rows in the sink table


Configure a dataset for the target table


Go to Settings


Tick Allow insert


Data Warehouse


Large fact tables were created to store transactional or measured data, and dimension tables were created to store attributes about that data.

One pipeline is dedicated to loading the dimension tables and another to loading the fact tables.

Stored procedures were chosen for dimension tables:


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROC [dwh].[SP_Load_Dim_OMBusinessUnit] AS

-- Empty the table
TRUNCATE TABLE [dwh].[Dim_OMBusinessUnit]

-- Reload the table
INSERT INTO [dwh].[Dim_OMBusinessUnit]
(
 [KEY_]
,[VALUE]
,[NAME]
,[OPENINGDATE]
,[CLOSINGDATE]
,[BRAND]
,[BUSINESSUNITTYPES]
,[CHAIN]
,[CONCEPT]
,[ADDRESSSTREET]
,[ADDRESSSTATE]
,[ADDRESSCITY]
,[ADDRESSZIPCODE]
,[ADDRESSCOUNTRYREGIONID]
,[REPORTINGADDRESS]
,[ISA_EDICORNERGLN]
,[ISASTORENUMBER]
,[HANSTOREGLN]
)
SELECT T1.RECID AS KEY_,
       T1.OMOPERATINGUNITNUMBER AS VALUE,
       T2.NAME AS NAME,
       OMB.HANOPENINGDATE,
       OMB.HANCLOSINGDATE,
       OMB.HANBRAND,
       OMB.HANBUSINESSUNITTYPES,
       OMB.HANCHAIN,
       OMB.HANCONCEPT,
       OMB.[ADDRESSSTREET],
       OMB.[ADDRESSSTATE],
       OMB.[ADDRESSCITY],
       OMB.[ADDRESSZIPCODE],
       OMB.[ADDRESSCOUNTRYREGIONID],
       -- the original snippet was cut off here; the last four columns are
       -- assumed to come from OMOperatingUnit like the address fields above
       OMB.[REPORTINGADDRESS],
       OMB.[ISA_EDICORNERGLN],
       OMB.[ISASTORENUMBER],
       OMB.[HANSTOREGLN]
FROM [ods].[DirPartyTable] AS T1
CROSS JOIN [ods].[DirPartyTable] AS T2
LEFT JOIN [ods].[OMOperatingUnit] AS OMB
       ON OMB.OPERATINGUNITNUMBER = T1.OMOPERATINGUNITNUMBER
WHERE T1.OMOPERATINGUNITTYPE = 4         -- keep business units only
  AND T1.RECID = T2.RECID                -- the CROSS JOIN plus this filter acts as a self-join on RECID
  AND T1.INSTANCERELATIONTYPE IN (10397) -- restrict DirPartyTable to operating unit records
GO
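Each procedure is then called from a Stored procedure activity in the dimension pipeline, or can be tested manually:

EXEC [dwh].[SP_Load_Dim_OMBusinessUnit];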


Dataflows and stored procedures were chosen for fact tables.


Another article will explain that in detail.

Data model


We applied a star schema to build the accounting data mart. A sample query after the table list below shows how the fact table joins to the dimensions.


The following tables are loaded:


1 fact table:


· Fact_Financialtransactions


10 dimension tables:


· Dim_FiscalCalendar

· Dim_Anaplan_MainAccount

· Dim_MainAccount

· Dim_Customer

· Dim_Vendor

· Dim_FiscalCalendar

· Dim_OMBusinessUnit

· Dim_OMDepartment

· Dim_OMFinancialtag

· LedgerPosting
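To illustrate the star schema, here is a sample query; the fact table’s foreign-key column names are assumptions, as is the [VALUE] column on Dim_MainAccount:

-- Amounts by date, business unit, and main account (illustrative keys)
SELECT f.[ACCOUNTINGDATE],
       bu.[NAME]  AS BusinessUnit,
       ma.[VALUE] AS MainAccount,
       SUM(f.[ACCOUNTINGCURRENCYAMOUNT]) AS Amount
FROM [dwh].[Fact_Financialtransactions] AS f
LEFT JOIN [dwh].[Dim_OMBusinessUnit] AS bu ON bu.[KEY_] = f.[BUSINESSUNITKEY]  -- assumed FK
LEFT JOIN [dwh].[Dim_MainAccount]    AS ma ON ma.[KEY_] = f.[MAINACCOUNTKEY]   -- assumed FK
GROUP BY f.[ACCOUNTINGDATE], bu.[NAME], ma.[VALUE];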




