TLDR: Long-running batch operations (invoice generation, document export) that process thousands of items need explicit transaction scope management, not implicit per-item commits. Use IUnitOfWorkManager to batch 50–100 items per SaveChanges, Hangfire for orchestration, Service Bus for async distribution, and Durable Functions for multi-phase workflows. Report progress every 5% to avoid database thrashing. We cut invoice generation from 45 minutes to 8 minutes for 50,000 orders—and made progress visible without killing the database.
The Problem: Implicit Transactions Don’t Scale
A naive batch operation to generate billing documents:
```csharp
public async Task GenerateBillingDocumentsAsync(int[] orderIds, CancellationToken cancellationToken)
{
    foreach (var orderId in orderIds)
    {
        var order = await _context.Orders.FindAsync(orderId);
        var document = new BillingDocument { OrderId = orderId, Amount = order.Total };
        _context.BillingDocuments.Add(document);
        await _context.SaveChangesAsync(cancellationToken); // One SaveChanges per order!
    }
}
```
For 50,000 orders:
- 50,000 SaveChanges calls
- 50,000 database roundtrips
- 50,000 audit log entries written per-item
- Each `SaveChanges` acquires a transaction, validates constraints, writes the audit trail, and releases locks
- Total time: 45 minutes

The API request times out. The background job thrashes the database. Progress is invisible.
The Solution: Explicit Batching with IUnitOfWorkManager
Define a unit of work manager that groups operations:
```csharp
public interface IUnitOfWorkManager
{
    Task<T> ExecuteInUnitOfWorkAsync<T>(Func<Task<T>> work);
    Task ExecuteInUnitOfWorkAsync(Func<Task> work);
    void DisableAudit();
    void EnableAudit();
}

public class UnitOfWorkManager : IUnitOfWorkManager
{
    private readonly BillingContext _context;
    private bool _auditEnabled = true;

    public UnitOfWorkManager(BillingContext context) => _context = context;

    public async Task ExecuteInUnitOfWorkAsync(Func<Task> work)
    {
        try
        {
            _context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.TrackAll;
            await work();
            await _context.SaveChangesAsync();
        }
        catch
        {
            _context.ChangeTracker.Clear(); // Drop partial state so the next batch starts clean
            throw;
        }
    }

    public async Task<T> ExecuteInUnitOfWorkAsync<T>(Func<Task<T>> work)
    {
        try
        {
            _context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.TrackAll;
            var result = await work();
            await _context.SaveChangesAsync();
            return result;
        }
        catch
        {
            _context.ChangeTracker.Clear();
            throw;
        }
    }

    public void DisableAudit() => _auditEnabled = false;
    public void EnableAudit() => _auditEnabled = true;
    public bool IsAuditEnabled => _auditEnabled;
}
```
Now refactor the batch operation:
```csharp
public async Task GenerateBillingDocumentsAsync(
    int[] orderIds,
    IProgress<BatchProgress> progress,
    CancellationToken cancellationToken)
{
    const int batchSize = 100;
    var lastProgressReport = 0;

    for (int i = 0; i < orderIds.Length; i += batchSize)
    {
        var batch = orderIds.Skip(i).Take(batchSize).ToArray();

        await _unitOfWorkManager.ExecuteInUnitOfWorkAsync(async () =>
        {
            foreach (var orderId in batch)
            {
                var order = await _context.Orders
                    .AsNoTracking()
                    .FirstOrDefaultAsync(o => o.Id == orderId, cancellationToken);
                if (order == null) continue;

                var document = new BillingDocument
                {
                    OrderId = orderId,
                    Amount = order.Total,
                    CreatedAt = DateTime.UtcNow
                };
                _context.BillingDocuments.Add(document);
            }
        });

        // Report progress at 5% intervals only, not per-item
        var completed = Math.Min(i + batchSize, orderIds.Length);
        var currentProgress = (int)(completed / (double)orderIds.Length * 100);
        if (currentProgress - lastProgressReport >= 5)
        {
            progress?.Report(new BatchProgress
            {
                CompletedItems = completed,
                TotalItems = orderIds.Length,
                PercentComplete = currentProgress
            });
            lastProgressReport = currentProgress;
        }
    }
}
```
Result:
- 500 SaveChanges calls (instead of 50,000)
- 500 database roundtrips (100× fewer)
- 8 minutes (5.6× faster)
- Progress updated every 5%, not per-item
Architecture: Three Phases
For truly large batches (100,000+ items), separate into phases:
Phase 1: Hangfire (API → Background Job)
The API request initiates the job, then returns immediately:
```csharp
[ApiController]
[Route("api/billing")]
public class BillingController : ControllerBase
{
    private readonly IBackgroundJobClient _jobClient;

    public BillingController(IBackgroundJobClient jobClient) => _jobClient = jobClient;

    [HttpPost("generate-documents")]
    public ActionResult QueueBillingGeneration([FromBody] GenerateBillingRequest req)
    {
        var jobId = _jobClient.Enqueue<BillingService>(
            x => x.GenerateBillingDocumentsAsync(req.OrderIds, null, default));
        return Accepted(new { jobId });
    }
}
```
The Hangfire job batches 100 orders per UoW, writes documents to the database:
```csharp
// Hangfire injects PerformContext when it appears as a method parameter,
// which gives the job access to its own id.
public async Task GenerateBillingDocumentsAsync(
    int[] orderIds,
    PerformContext performContext,
    IJobCancellationToken cancellationToken)
{
    var jobId = performContext.BackgroundJob.Id;
    var progressStore = new HangfireProgressStore(_jobStore);
    var progress = new Progress<BatchProgress>(p =>
    {
        // Report to the Hangfire job store every 5%
        progressStore.UpdateProgress(jobId, p.PercentComplete);
    });

    try
    {
        await GenerateBillingDocumentsAsync(orderIds, progress, cancellationToken.ShutdownToken);
    }
    catch (Exception ex)
    {
        progressStore.MarkFailed(jobId, ex.Message);
        throw;
    }
}
```
Phase 2: Service Bus (Batch → Distribution)
Once documents are created, enqueue them for distribution to external carriers:
```csharp
public async Task EnqueueDocumentsForDistributionAsync(
    List<int> documentIds,
    CancellationToken cancellationToken)
{
    const int batchSize = 50;
    await using var sender = _serviceBusClient.CreateSender("carrier-distribution");

    for (int i = 0; i < documentIds.Count; i += batchSize)
    {
        var batch = documentIds.Skip(i).Take(batchSize).ToArray();
        var messages = batch.Select(docId => new ServiceBusMessage
        {
            Body = new BinaryData(new { DocumentId = docId }),
            CorrelationId = Guid.NewGuid().ToString(),
            TimeToLive = TimeSpan.FromHours(24)
        }).ToArray();
        await sender.SendMessagesAsync(messages, cancellationToken);
    }
}
```
Service Bus handles fanout, retry, and dead-lettering. If a carrier is temporarily down, the message stays in the queue.
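On the consuming side, a processor completes messages on success and dead-letters poison messages explicitly. A minimal sketch against the same `carrier-distribution` queue; the `_carrierClient` call, the `DistributionPayload` type (matching the anonymous `{ DocumentId = ... }` body enqueued above), and the delivery-count threshold of 5 are assumptions:

```csharp
// Hypothetical payload type matching the enqueued body:
// public sealed record DistributionPayload(int DocumentId);
var processor = _serviceBusClient.CreateProcessor(
    "carrier-distribution",
    new ServiceBusProcessorOptions { MaxConcurrentCalls = 10 });

processor.ProcessMessageAsync += async args =>
{
    var payload = args.Message.Body.ToObjectFromJson<DistributionPayload>();
    try
    {
        await _carrierClient.PushAsync(payload.DocumentId);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (Exception ex) when (args.Message.DeliveryCount >= 5)
    {
        // After repeated redeliveries, dead-letter instead of retrying forever
        await args.DeadLetterMessageAsync(args.Message, "CarrierPushFailed", ex.Message);
    }
};

processor.ProcessErrorAsync += args =>
{
    _logger.LogError(args.Exception, "Service Bus processing error");
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();
```

If the handler throws before the delivery-count threshold, the message is abandoned and redelivered, so transient carrier outages resolve themselves without code.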
Phase 3: Durable Functions (Orchestration + Retry)
For multi-step workflows (create docs → validate → push to carriers → post accounting entries):
```csharp
[FunctionName("DocumentDistributionOrchestrator")]
public static async Task RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var documentIds = context.GetInput<int[]>();
    var activityOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(5),
        maxNumberOfAttempts: 3)
    {
        // Activity failures surface wrapped in FunctionFailedException,
        // so check the inner exception when deciding whether to retry
        Handle = ex => ex.InnerException is HttpRequestException
    };

    try
    {
        // Phase 1: Push to external carriers (with 3 retries)
        await context.CallActivityWithRetryAsync(
            "PushToCarriersAsync",
            activityOptions,
            documentIds);

        // Phase 2: Post accounting entries (with 3 retries)
        await context.CallActivityWithRetryAsync(
            "PostAccountingEntriesAsync",
            activityOptions,
            documentIds);

        // Phase 3: Update document status (no retries; fail fast on DB issues)
        await context.CallActivityAsync("MarkDocumentsCompleteAsync", documentIds);
    }
    catch (Exception ex)
    {
        await context.CallActivityAsync("MarkDocumentsFailedAsync",
            new { DocumentIds = documentIds, Error = ex.Message });
        throw;
    }
}

[FunctionName("PushToCarriersAsync")]
public async Task PushToCarriers([ActivityTrigger] int[] documentIds)
{
    foreach (var docId in documentIds)
    {
        var document = await _context.BillingDocuments.FindAsync(docId);
        await _carrierClient.PushAsync(document);
    }
}
```
Disabling Audit During Batch Operations
Audit logging on every item multiplies database load. Disable during batching:
```csharp
public async Task GenerateBillingDocumentsAsync(int[] orderIds, CancellationToken cancellationToken)
{
    _unitOfWorkManager.DisableAudit(); // Skip audit writes per-item
    try
    {
        // ... batching loop ...
    }
    finally
    {
        _unitOfWorkManager.EnableAudit();
    }

    // Post-batch: create a single audit entry for the entire operation
    _auditService.LogBulkOperation(new BulkAuditEntry
    {
        Operation = "BillingDocumentGeneration",
        ItemCount = orderIds.Length,
        CompletedAt = DateTime.UtcNow
    });
}
```
This keeps the audit trail (a compliance requirement) while eliminating per-item overhead.
Real Scenario: 50,000-Order Monthly Invoice Run
Timeline (end-to-end):
- T+0m: API call queues Hangfire job, returns `202 Accepted`
- T+2m: Hangfire reads 50,000 orders from the analytics DB, batches by 100
- T+8m: Creates 50,000 billing documents in the primary DB (500 units of work)
- T+8.5m: Enqueues all documents to Service Bus (fire-and-forget)
- T+8.5m–T+12m: Service Bus distributes to 500 carriers in parallel
- T+12m–T+18m: External carriers process, return EDI confirmations
- T+18m–T+22m: Durable Functions post accounting entries per carrier batch
- T+22m: Mark documents complete, update billing account status

Total: 22 minutes end-to-end, versus 45 minutes for document generation alone before batching and the async phases.
Lessons Learned: The Hard Way
1. Batch size is not universal
We used batch size = 50 initially. For orders with many charges, creating 50 documents per UoW overwhelmed the change tracker. Switched to:
- Simple entities (no navigation properties): 200 per batch
- Complex entities (many relationships): 50 per batch
- Measure with `context.ChangeTracker.Entries().Count()` logging
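To find the right threshold, log the tracked-entry count just before each unit of work commits. A minimal sketch inside the batching loop from earlier; the `_logger` field is an assumption:

```csharp
await _unitOfWorkManager.ExecuteInUnitOfWorkAsync(async () =>
{
    // ... add this batch's documents ...

    // Log how much the change tracker is holding; if this climbs into the
    // thousands per unit of work, shrink the batch size for this entity type.
    _logger.LogDebug(
        "Tracked entries before SaveChanges: {Count}",
        _context.ChangeTracker.Entries().Count());
});
```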
2. Progress reporting at the wrong granularity kills performance
Reporting progress per-item meant 50,000 database writes to the progress table. Switched to 5% intervals: 20 writes for the entire operation. Progress UI polls the job status endpoint instead of hitting the table per-item.
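The polling endpoint can read the same store the Hangfire job writes to. A sketch, assuming `HangfireProgressStore` exposes a `GetProgress` read method as the counterpart of `UpdateProgress` (not shown earlier):

```csharp
[HttpGet("jobs/{jobId}/status")]
public ActionResult GetJobStatus(string jobId)
{
    // GetProgress is the assumed read side of the progress store
    var status = _progressStore.GetProgress(jobId);
    return status is null ? NotFound() : Ok(status);
}
```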
3. Async operations need explicit timeouts
Service Bus enqueue has no timeout; if the queue backs up, the entire job hangs. Added explicit CancellationToken with timeout after 5 minutes.
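A linked `CancellationTokenSource` gives the enqueue step a hard deadline without replacing the job-wide token:

```csharp
// Cap the Service Bus enqueue at 5 minutes; job-wide cancellation still works
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
    cancellationToken, timeoutCts.Token);

await EnqueueDocumentsForDistributionAsync(documentIds, linkedCts.Token);
```

Whichever fires first, the timeout or an external cancellation, cancels the send and surfaces as an `OperationCanceledException`.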
4. Rollback at scale is expensive
A constraint violation in item #47,382 rolls back the unit of work that contains it: SaveChanges is atomic, so the whole 100-item batch fails, not just that item. And if the job restarts from scratch, you regenerate work for items 1–47,381. Better: validate before batching, or skip invalid items with logging.
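Validating up front is one set-based query instead of a mid-run failure. A sketch; the `Total > 0` rule stands in for whatever constraint actually bit, and `_logger` is an assumption:

```csharp
// Pre-filter to IDs that will pass the constraint (assumed here: Total > 0),
// and log the rest instead of letting one bad order abort a unit of work.
var validIds = await _context.Orders
    .Where(o => orderIds.Contains(o.Id) && o.Total > 0)
    .Select(o => o.Id)
    .ToArrayAsync(cancellationToken);

foreach (var skipped in orderIds.Except(validIds))
    _logger.LogWarning("Skipping order {OrderId}: failed pre-validation", skipped);
```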
5. Load distribution matters
Running all 50,000 generates in a single Hangfire worker overloaded the DB. Split into 10 parallel jobs (5,000 items each) with staggered start times.
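The split-and-stagger can be done at enqueue time with Hangfire's `Schedule`. A sketch; the 30-second stagger interval is an assumption to tune:

```csharp
const int workers = 10;
var chunkSize = (orderIds.Length + workers - 1) / workers;

for (int w = 0; w < workers; w++)
{
    var chunk = orderIds.Skip(w * chunkSize).Take(chunkSize).ToArray();
    if (chunk.Length == 0) break;

    // Stagger each worker's start by 30 seconds to smooth database load
    _jobClient.Schedule<BillingService>(
        x => x.GenerateBillingDocumentsAsync(chunk, null, default),
        TimeSpan.FromSeconds(30 * w));
}
```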
Gotchas and Disclaimers
- EF Core 8+: earlier versions handle the change tracker differently; batching thresholds vary.
- Distributed transactions: Service Bus + Durable Functions + database are not ACID. Expect eventual consistency and implement idempotent operations.
- Memory leaks: `context.ChangeTracker` in loops holds references. Call `context.ChangeTracker.Clear()` after each batch.
- .NET 9+: recommended for TimeProvider improvements and better async cancellation.
Next Steps
- Measure current batch operation duration and database impact.
- Introduce explicit UoW batching with size = 100 (tune from there).
- Add progress reporting at 5% intervals, not per-item.
- Offload async distribution to Service Bus + Durable Functions.