Connecting Microsoft Business Central to an Azure Data Lake — Part 4

BC to Azure Data Lake

Jesper Theil Hansen · Apr 6, 2025 · 3 min read · Business Central

Series: Part 1 — Scheduling of Export and Sync | Part 2 — Avoiding Sync Collisions | Part 3 — Duplicate Records After Deadlock Error | Part 4 — Archiving Data to Speed Up Sync

After running the data lake solution for some time, it became apparent that both execution times and costs on the lake side were increasing slowly.

Note: This walkthrough is for the Synapse-based data lake flow. Archiving when using Fabric and delta data backing isn't as important since the notebook-based sync is more efficient and doesn't copy and consolidate all existing data every time.

How the Default Sync Works

The default sync pipeline and dataflow in BC2ADLS works like this:

BC exports new data to /delta folder
Pipeline / Dataflow copies delta files to /staging
Pipeline / Dataflow copies all existing current data to /staging
Deltas are merged with / added to existing data in /staging
Resulting new dataset is copied back from /staging to /data

As more and more data gets added, all new syncs copy and merge with the complete dataset — growing over time.

Archiving Solution

Final synced data is in the /data folder. To archive, split the files into two subfolders:

/data/GLEntry-17
  /data
  /archive

The archiving is done with a dataflow called SplitArchiveData and a pipeline called ArchiveData. It also uses an integration dataset called data_dataset_split.

Parameters:

Parameter	Description
`DateSplitColumn`	The name of a date field that determines if the record should be archived
`ArchiveDate`	The cutoff date — records before this date are moved to archive

Test results on 3299 GLEntry records:

After archiving: still 3299 records total when reading recursively from /GLEntry-17 folder
101 records in /data (recent)
2198 records in /archive

Important AL Code Change

The manifest rootLocation needs to be updated so the Consolidation dataflow reads only from /data (not /archive) during syncs.

Original:

DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName);

Changed to:

if (Folder = 'data') then
  DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName + '/data')
else
  DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName);

What to Archive and When

Archive all transaction tables/ledgers after final close of the fiscal year. Tables that rarely change don't need archiving since the delta-check means consolidation only runs when there are changes.

Files Available

All files are available in the Archive branch of the BC2ADLS fork at github.com/jespertheil/bc2adls/tree/Archive:

File	Location	Description
`CDMUtil.Codeunit.al`	`businessCentral\app\src`	The AL code change
`data_dataset_parquet.json`	`synapse\dataset`	Changed dataset pointing to live data for consolidation pipelines
`SplitArchiveData.json`	`synapse\dataflow`	New dataflow that does the actual split
`ArchiveData.json`	`synapse\pipeline`	New pipeline that prepares data and calls the split dataflow
`data_dataset_split.json`	`synapse\dataset`	New dataset pointing to the split folder