
Connecting Microsoft Business Central to an Azure Data Lake — Part 4


Jesper Theil Hansen · Apr 6, 2025 · 3 min read · Business Central

Series: Part 1 — Scheduling of Export and Sync | Part 2 — Avoiding Sync Collisions | Part 3 — Duplicate Records After Deadlock Error | Part 4 — Archiving Data to Speed Up Sync


After running the data lake solution for some time, it became apparent that both execution times and costs on the lake side were increasing slowly.

Note: This walkthrough is for the Synapse-based data lake flow. Archiving when using Fabric and delta data backing isn't as important since the notebook-based sync is more efficient and doesn't copy and consolidate all existing data every time.

How the Default Sync Works

The default sync pipeline and dataflow in BC2ADLS work like this:

  1. BC exports new data to the /delta folder
  2. Pipeline / Dataflow copies delta files to /staging
  3. Pipeline / Dataflow copies all existing current data to /staging
  4. Deltas are merged with / added to existing data in /staging
  5. Resulting new dataset is copied back from /staging to /data

Every sync therefore copies and merges the complete dataset, so the work done per sync grows with the total amount of data, even when the delta itself is small.
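To make the growth concrete, here is a minimal Python sketch of the five steps above. It is not the BC2ADLS dataflow, just a toy model: the `sync` function, the `systemId` key, and the batch sizes are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def sync(data: List[Row], delta: List[Row], key: str = "systemId") -> Tuple[List[Row], int]:
    """Toy model of one sync run.

    Returns the new dataset and the number of rows copied to staging.
    """
    # Steps 2 and 3: both the delta and ALL existing data are copied to staging.
    staging = list(delta) + list(data)
    rows_copied = len(staging)

    # Step 4: rows from the delta replace or extend the existing rows.
    merged: Dict[object, Row] = {row[key]: row for row in data}
    for row in delta:
        merged[row[key]] = row

    # Step 5: the merged result becomes the new content of /data.
    return list(merged.values()), rows_copied


data: List[Row] = []
costs: List[int] = []
for batch in range(3):
    # Hypothetical delta of 100 new rows per sync.
    delta = [{"systemId": f"{batch}-{i}", "amount": i} for i in range(100)]
    data, copied = sync(data, delta)
    costs.append(copied)

print(costs)  # [100, 200, 300]
```

The delta stays at 100 rows per run, but the number of rows copied keeps climbing because the existing data is copied along every time.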


Archiving Solution

The final synced data lives in the /data folder. To archive, the files for each entity are split into two subfolders, /data and /archive. For GLEntry-17 the layout becomes:

/data/GLEntry-17
    /data
    /archive

The archiving is done with a dataflow called SplitArchiveData and a pipeline called ArchiveData. It also uses an integration dataset called data_dataset_split.

Parameters:

| Parameter | Description |
| --- | --- |
| DateSplitColumn | The name of a date field that determines if the record should be archived |
| ArchiveDate | The cutoff date; records before this date are moved to archive |
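The effect of the two parameters can be sketched in a few lines of Python. This is only an illustration of the rule described in the table, not the SplitArchiveData dataflow itself; the `split_rows` function, the `PostingDate` column name, and the sample rows are assumptions.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_rows(rows: List[Row], date_split_column: str, archive_date: date) -> Tuple[List[Row], List[Row]]:
    """Split rows into (live, archive) using the cutoff date."""
    live: List[Row] = []
    archive: List[Row] = []
    for row in rows:
        # Records strictly before the cutoff are archived; the cutoff date itself stays live.
        if row[date_split_column] < archive_date:
            archive.append(row)
        else:
            live.append(row)
    return live, archive


# Hypothetical sample rows.
rows = [
    {"EntryNo": 1, "PostingDate": date(2023, 12, 31)},
    {"EntryNo": 2, "PostingDate": date(2024, 1, 1)},
    {"EntryNo": 3, "PostingDate": date(2024, 6, 15)},
]

live, archive = split_rows(rows, "PostingDate", date(2024, 1, 1))
print([r["EntryNo"] for r in live])     # [2, 3]
print([r["EntryNo"] for r in archive])  # [1]
```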

Test results on 3299 GLEntry records:

  • After archiving: still 3299 records total when reading recursively from /GLEntry-17 folder
  • 101 records in /data (recent)
  • 2198 records in /archive
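A similar sanity check can be sketched locally in Python: count the rows recursively under the entity folder and compare that with the counts in the two subfolders. The CSV format, file names, and row counts below are made up for illustration and are not the figures from the test above.

```python
import csv
import tempfile
from pathlib import Path


def count_rows(folder: Path) -> int:
    """Count data rows in every CSV file below `folder`, recursively."""
    total = 0
    for path in folder.rglob("*.csv"):
        with path.open(newline="") as handle:
            # Each file in this sketch has one header line, which is not a record.
            total += sum(1 for _ in csv.reader(handle)) - 1
    return total


def write_csv(path: Path, n_rows: int) -> None:
    """Write a small CSV file with a header and `n_rows` records."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["EntryNo"])
        writer.writerows([[i] for i in range(n_rows)])


# Hypothetical entity folder with a live and an archive subfolder.
root = Path(tempfile.mkdtemp()) / "GLEntry-17"
write_csv(root / "data" / "part-0.csv", 4)
write_csv(root / "archive" / "part-0.csv", 6)

total = count_rows(root)
live_count = count_rows(root / "data")
archive_count = count_rows(root / "archive")
print(total, live_count, archive_count)  # 10 4 6
```

If the recursive total does not equal the sum of the two subfolder counts, records were lost or duplicated during the split.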

Important AL Code Change

The manifest rootLocation needs to be updated so the Consolidation dataflow reads only from /data (not /archive) during syncs.

Original:

DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName);

Changed to:

// Live data now sits in a /data subfolder of the entity folder, so point the manifest there.
if (Folder = 'data') then
    DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName + '/data')
else
    DataPartitionPattern.Add('rootLocation', Folder + '/' + EntityName);
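To see which paths the changed condition produces, here is the same logic mirrored in a small Python function. The function name `root_location` is hypothetical, and the `delta` folder name is assumed from the /delta folder mentioned earlier.

```python
def root_location(folder: str, entity_name: str) -> str:
    """Mirror of the changed AL condition, for illustration only."""
    if folder == "data":
        # Only the live subfolder is read during consolidation.
        return f"{folder}/{entity_name}/data"
    return f"{folder}/{entity_name}"


print(root_location("data", "GLEntry-17"))   # data/GLEntry-17/data
print(root_location("delta", "GLEntry-17"))  # delta/GLEntry-17
```

The archive subfolder never appears in a rootLocation, which is what keeps archived records out of the regular sync.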

What to Archive and When

Archive all transaction tables and ledgers after the final close of the fiscal year. Tables that rarely change don't need archiving, since the delta check means consolidation only runs when there are changes.


Files Available

All files are available in the Archive branch of the BC2ADLS fork at github.com/jespertheil/bc2adls/tree/Archive:

| File | Location | Description |
| --- | --- | --- |
| CDMUtil.Codeunit.al | businessCentral\app\src | The AL code change |
| data_dataset_parquet.json | synapse\dataset | Changed dataset pointing to live data for consolidation pipelines |
| SplitArchiveData.json | synapse\dataflow | New dataflow that does the actual split |
| ArchiveData.json | synapse\pipeline | New pipeline that prepares data and calls the split dataflow |
| data_dataset_split.json | synapse\dataset | New dataset pointing to the split folder |

Screenshots

Folder structure before archiving

SplitArchiveData dataflow

Archive pipeline configuration

Data and archive folder contents

Integration dataset configuration

AL code change for manifest rootLocation

