Query Optimization with EF Core's AsSplitQuery: When N-Way Joins Become a Bottleneck

TLDR: Deep entity graphs with 4+ Include chains generate massive Cartesian products in SQL joins. AsSplitQuery() executes separate queries and stitches results client-side, cutting result set sizes by 50–90% and latency by 3–8×. Pair with a generic IncludeAll() extension to automatically hydrate complex types without hand-writing Include chains. We reduced a single order read from 45 queries + Cartesian bloat to 8 optimized split queries with 95% less data transfer.

The Problem: Cartesian Explosion

A transportation order has trips, legs, charges, and accounting entries. Naive eager loading:

1
2
3
4
5
6
7
8
9
var order = await _context.TransportOrders
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
                .ThenInclude(lc => lc.LoadCarrierType)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .Include(o => o.BillingAccount)
    .FirstOrDefaultAsync(o => o.Id == orderId);

Generated SQL joins across all relationships. If an order has:

3 trips
2 legs per trip
5 charges

The result set repeats the order and account data 3 × 2 × 5 = 30 times before FirstOrDefault filters. Each repeated row is full object data (strings, decimals, GUIDs).

Real numbers from production:

Single order read: 8.2 MB result set from the database
1,000-order batch: 8.2 GB transferring from SQL to app server
Deserialization: 14 seconds just to materialize objects

The SQL query itself was fast. The network and client-side processing were the killers.

Split Queries: A Different Approach

AsSplitQuery() tells EF Core to execute multiple queries and stitch them in memory:

1
2
3
4
5
6
7
8
9
var order = await _context.TransportOrders
    .AsSplitQuery()
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .Include(o => o.BillingAccount)
    .FirstOrDefaultAsync(o => o.Id == orderId);

Instead of one massive join, EF executes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
-- Query 1: Root order
SELECT * FROM TransportOrders WHERE Id = @orderId;

-- Query 2: Related trips
SELECT * FROM Trips WHERE TransportOrderId = @orderId;

-- Query 3: Related legs (for all trips from query 2)
SELECT * FROM TransportLegs WHERE TripId IN (@tripIds);

-- Query 4: Related charges
SELECT * FROM Charges WHERE TransportOrderId = @orderId;

-- Query 5: Related charge definitions (for all charges from query 4)
SELECT * FROM ChargeDefinitions WHERE Id IN (@chargeDefIds);

-- Query 6: Billing account
SELECT * FROM BillingAccounts WHERE Id = @accountId;

Each result set is small, specific, and un-repeated. Total data transfer: 0.8 MB instead of 8.2 MB.

Building `IncludeAll()`: Stop Hand-Writing Include Chains

With 40+ entity types in the model, writing Include chains for each becomes unmaintainable. A generic IncludeAll() extension uses reflection to inspect navigation properties:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
public static class QueryableExtensions
{
    public static IQueryable<T> IncludeAll<T>(
        this IQueryable<T> query,
        int maxDepth = 3,
        HashSet<Type> visitedTypes = null) where T : Entity
    {
        visitedTypes ??= new HashSet<Type>();
        
        if (maxDepth <= 0 || visitedTypes.Contains(typeof(T)))
            return query;

        visitedTypes.Add(typeof(T));

        var navigationProperties = typeof(T)
            .GetProperties()
            .Where(p => p.PropertyType.Namespace == "YourNamespace.Domain" ||
                       (p.PropertyType.IsGenericType && 
                        p.PropertyType.GetGenericArguments()[0].Namespace == "YourNamespace.Domain"))
            .ToList();

        foreach (var navProp in navigationProperties)
        {
            // Handle ICollection<T>
            if (navProp.PropertyType.IsGenericType && 
                typeof(System.Collections.IEnumerable).IsAssignableFrom(navProp.PropertyType))
            {
                var elementType = navProp.PropertyType.GetGenericArguments()[0];
                var includeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethod("Include", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .MakeGenericMethod(typeof(T), elementType);

                var lambda = BuildLambda<T>(navProp.Name);
                query = (IQueryable<T>)includeMethod.Invoke(null, new object[] { query, lambda });

                // Recursively include nested navigation
                var thenIncludeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethods(System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .First(m => m.Name == "ThenInclude" && 
                           m.GetGenericArguments().Length == 2 &&
                           m.GetParameters()[0].ParameterType.Name.StartsWith("IIncludableQueryable"));

                // Build nested includes for the collection element type
                if (maxDepth > 1)
                {
                    // This gets complex; see full implementation in repository
                }
            }
            // Handle single navigation
            else
            {
                var includeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethod("Include", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .MakeGenericMethod(typeof(T), navProp.PropertyType);

                var lambda = BuildLambda<T>(navProp.Name);
                query = (IQueryable<T>)includeMethod.Invoke(null, new object[] { query, lambda });
            }
        }

        return query;
    }

    private static object BuildLambda<T>(string propertyName) where T : Entity
    {
        var parameter = System.Linq.Expressions.Expression.Parameter(typeof(T));
        var property = System.Linq.Expressions.Expression.Property(parameter, propertyName);
        var lambda = System.Linq.Expressions.Expression.Lambda(property, parameter);
        return lambda;
    }
}

In practice:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Old way: tedious, error-prone
var order = await _context.TransportOrders
    .AsSplitQuery()
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .FirstOrDefaultAsync(o => o.Id == orderId);

// New way: one line, auto-discovers all navigations
var order = await _context.TransportOrders
    .AsSplitQuery()
    .IncludeAll(maxDepth: 4)
    .FirstOrDefaultAsync(o => o.Id == orderId);

The extension crawls the reflection tree, generating Include expressions dynamically.

When NOT to Use Split Queries

Split queries aren’t always better:

Scenario 1: Filtering on related entities

1
2
3
4
5
var orders = await _context.TransportOrders
    .Where(o => o.Trips.Any(t => t.Status == "InProgress"))
    .AsSplitQuery()
    .Include(o => o.Trips)
    .ToListAsync();

Here, split queries execute the root query first without the Where clause on Trips, then filters in memory. Use traditional joins for filtered includes:

1
2
3
var orders = await _context.TransportOrders
    .Include(o => o.Trips.Where(t => t.Status == "InProgress"))
    .ToListAsync();

Scenario 2: Small result sets with few collections If you’re loading 5 orders with 2–3 related items each, split query overhead exceeds benefit. Test both.

Scenario 3: Real-time transactions Split queries execute sequentially, not atomically. If data changes between queries, you get phantom reads. For billing cutoffs, use single transactions with explicit isolation levels.

Profiling: Before and After

Traditional joins (8.2 MB, 2.1 seconds):

1
2
3
SQL Execution:     140ms
Network Transfer:  1,200ms
Deserialization:   760ms

Split queries (0.8 MB, 320ms):

1
2
3
SQL Execution:     85ms (6 queries)
Network Transfer:  120ms
Deserialization:   115ms

8× latency improvement, 90% less data.

Enable query logging to see the difference:

1
2
3
optionsBuilder.LogTo(Console.WriteLine, LogLevel.Information)
    .EnableDetailedErrors()
    .EnableSensitiveDataLogging();

Real Scenario: Billing Order Export

A monthly billing run exports 50,000 orders with full detail (trips, legs, charges, accounting entries).

Old approach:

Single mammoth query with all includes
OOM after 2,000 orders
8+ hours to export

New approach:

Batch 1,000 orders per query with AsSplitQuery()
0.8 MB per batch
2.5 hours total (70% faster)
No OOM

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
const int batchSize = 1000;
for (int offset = 0; offset < totalOrders; offset += batchSize)
{
    var batch = await _context.TransportOrders
        .AsSplitQuery()
        .IncludeAll(maxDepth: 3)
        .Skip(offset)
        .Take(batchSize)
        .ToListAsync();

    await ExportBatch(batch);
}

Lessons Learned

1. Measure first, optimize second We optimized split queries before profiling. Turned out network latency, not SQL, was the bottleneck. SQL was 140ms; data transfer was 1,200ms.

2. Split queries and caching are complementary Split queries reduce result set sizes. Caching (from the previous post) eliminates queries entirely. Use both: cache reference data, split queries for parent graphs.

3. Depth limits prevent runaway reflection Without maxDepth, IncludeAll() traversed into circular dependencies indefinitely. Always set maxDepth = 3 or 4.

4. Client-side stitching has memory cost Split queries use more memory than joins for large collections (joining in SQL is more efficient for aggregates). For 10,000+ root entities, consider pagination or async enumeration.

Gotchas and Disclaimers

EF Core 8+: Earlier versions have different split query behavior.
Explicit transaction isolation: Split queries don’t guarantee isolation between queries. Use IsolationLevel.Serializable if you need it.
Async enumeration: For very large result sets, use await foreach with AsSingleQuery() on the stream, not split queries.
JSON serialization: When returning split-query results as JSON, circular references will cause serialization errors. Use [JsonIgnore] on back-references.

Next Steps

Profile your deepest Include chains: measure result set size and deserialization time.
If result sets exceed 1 MB or deserialization exceeds 500ms, try split queries.
Implement IncludeAll() for your domain model.
Pair with declarative caching for reference data (previous post).

Ready for featured image.