Featured image of post Query Optimization with EF Core's AsSplitQuery: When N-Way Joins Become a Bottleneck

Query Optimization with EF Core's AsSplitQuery: When N-Way Joins Become a Bottleneck

TLDR: Deep entity graphs with 4+ Include chains generate massive Cartesian products in SQL joins. AsSplitQuery() executes separate queries and stitches results client-side, cutting result set sizes by 50–90% and latency by 3–8×. Pair with a generic IncludeAll() extension to automatically hydrate complex types without hand-writing Include chains. We reduced a single order read from 45 queries + Cartesian bloat to 8 optimized split queries with 95% less data transfer.


The Problem: Cartesian Explosion

A transportation order has trips, legs, charges, and accounting entries. Naive eager loading:

1
2
3
4
5
6
7
8
9
var order = await _context.TransportOrders
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
                .ThenInclude(lc => lc.LoadCarrierType)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .Include(o => o.BillingAccount)
    .FirstOrDefaultAsync(o => o.Id == orderId);

Generated SQL joins across all relationships. If an order has:

  • 3 trips
  • 2 legs per trip
  • 5 charges

The result set repeats the order and account data 3 × 2 × 5 = 30 times before FirstOrDefault filters. Each repeated row is full object data (strings, decimals, GUIDs).

Real numbers from production:

  • Single order read: 8.2 MB result set from the database
  • 1,000-order batch: 8.2 GB transferring from SQL to app server
  • Deserialization: 14 seconds just to materialize objects

The SQL query itself was fast. The network and client-side processing were the killers.


Split Queries: A Different Approach

AsSplitQuery() tells EF Core to execute multiple queries and stitch them in memory:

1
2
3
4
5
6
7
8
9
var order = await _context.TransportOrders
    .AsSplitQuery()
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .Include(o => o.BillingAccount)
    .FirstOrDefaultAsync(o => o.Id == orderId);

Instead of one massive join, EF executes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
-- Query 1: Root order
SELECT * FROM TransportOrders WHERE Id = @orderId;

-- Query 2: Related trips
SELECT * FROM Trips WHERE TransportOrderId = @orderId;

-- Query 3: Related legs (for all trips from query 2)
SELECT * FROM TransportLegs WHERE TripId IN (@tripIds);

-- Query 4: Related charges
SELECT * FROM Charges WHERE TransportOrderId = @orderId;

-- Query 5: Related charge definitions (for all charges from query 4)
SELECT * FROM ChargeDefinitions WHERE Id IN (@chargeDefIds);

-- Query 6: Billing account
SELECT * FROM BillingAccounts WHERE Id = @accountId;

Each result set is small, specific, and un-repeated. Total data transfer: 0.8 MB instead of 8.2 MB.


Building IncludeAll(): Stop Hand-Writing Include Chains

With 40+ entity types in the model, writing Include chains for each becomes unmaintainable. A generic IncludeAll() extension uses reflection to inspect navigation properties:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
public static class QueryableExtensions
{
    public static IQueryable<T> IncludeAll<T>(
        this IQueryable<T> query,
        int maxDepth = 3,
        HashSet<Type> visitedTypes = null) where T : Entity
    {
        visitedTypes ??= new HashSet<Type>();
        
        if (maxDepth <= 0 || visitedTypes.Contains(typeof(T)))
            return query;

        visitedTypes.Add(typeof(T));

        var navigationProperties = typeof(T)
            .GetProperties()
            .Where(p => p.PropertyType.Namespace == "YourNamespace.Domain" ||
                       (p.PropertyType.IsGenericType && 
                        p.PropertyType.GetGenericArguments()[0].Namespace == "YourNamespace.Domain"))
            .ToList();

        foreach (var navProp in navigationProperties)
        {
            // Handle ICollection<T>
            if (navProp.PropertyType.IsGenericType && 
                typeof(System.Collections.IEnumerable).IsAssignableFrom(navProp.PropertyType))
            {
                var elementType = navProp.PropertyType.GetGenericArguments()[0];
                var includeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethod("Include", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .MakeGenericMethod(typeof(T), elementType);

                var lambda = BuildLambda<T>(navProp.Name);
                query = (IQueryable<T>)includeMethod.Invoke(null, new object[] { query, lambda });

                // Recursively include nested navigation
                var thenIncludeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethods(System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .First(m => m.Name == "ThenInclude" && 
                           m.GetGenericArguments().Length == 2 &&
                           m.GetParameters()[0].ParameterType.Name.StartsWith("IIncludableQueryable"));

                // Build nested includes for the collection element type
                if (maxDepth > 1)
                {
                    // This gets complex; see full implementation in repository
                }
            }
            // Handle single navigation
            else
            {
                var includeMethod = typeof(EntityFrameworkQueryableExtensions)
                    .GetMethod("Include", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public)
                    .MakeGenericMethod(typeof(T), navProp.PropertyType);

                var lambda = BuildLambda<T>(navProp.Name);
                query = (IQueryable<T>)includeMethod.Invoke(null, new object[] { query, lambda });
            }
        }

        return query;
    }

    private static object BuildLambda<T>(string propertyName) where T : Entity
    {
        var parameter = System.Linq.Expressions.Expression.Parameter(typeof(T));
        var property = System.Linq.Expressions.Expression.Property(parameter, propertyName);
        var lambda = System.Linq.Expressions.Expression.Lambda(property, parameter);
        return lambda;
    }
}

In practice:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Old way: tedious, error-prone
var order = await _context.TransportOrders
    .AsSplitQuery()
    .Include(o => o.Trips)
        .ThenInclude(t => t.Legs)
            .ThenInclude(l => l.LoadCarrier)
    .Include(o => o.Charges)
        .ThenInclude(c => c.ChargeDefinition)
    .FirstOrDefaultAsync(o => o.Id == orderId);

// New way: one line, auto-discovers all navigations
var order = await _context.TransportOrders
    .AsSplitQuery()
    .IncludeAll(maxDepth: 4)
    .FirstOrDefaultAsync(o => o.Id == orderId);

The extension crawls the reflection tree, generating Include expressions dynamically.


When NOT to Use Split Queries

Split queries aren’t always better:

Scenario 1: Filtering on related entities

1
2
3
4
5
var orders = await _context.TransportOrders
    .Where(o => o.Trips.Any(t => t.Status == "InProgress"))
    .AsSplitQuery()
    .Include(o => o.Trips)
    .ToListAsync();

Here, split queries execute the root query first without the Where clause on Trips, then filters in memory. Use traditional joins for filtered includes:

1
2
3
var orders = await _context.TransportOrders
    .Include(o => o.Trips.Where(t => t.Status == "InProgress"))
    .ToListAsync();

Scenario 2: Small result sets with few collections If you’re loading 5 orders with 2–3 related items each, split query overhead exceeds benefit. Test both.

Scenario 3: Real-time transactions Split queries execute sequentially, not atomically. If data changes between queries, you get phantom reads. For billing cutoffs, use single transactions with explicit isolation levels.


Profiling: Before and After

Traditional joins (8.2 MB, 2.1 seconds):

1
2
3
SQL Execution:     140ms
Network Transfer:  1,200ms
Deserialization:   760ms

Split queries (0.8 MB, 320ms):

1
2
3
SQL Execution:     85ms (6 queries)
Network Transfer:  120ms
Deserialization:   115ms

8× latency improvement, 90% less data.

Enable query logging to see the difference:

1
2
3
optionsBuilder.LogTo(Console.WriteLine, LogLevel.Information)
    .EnableDetailedErrors()
    .EnableSensitiveDataLogging();

Real Scenario: Billing Order Export

A monthly billing run exports 50,000 orders with full detail (trips, legs, charges, accounting entries).

Old approach:

  • Single mammoth query with all includes
  • OOM after 2,000 orders
  • 8+ hours to export

New approach:

  • Batch 1,000 orders per query with AsSplitQuery()
  • 0.8 MB per batch
  • 2.5 hours total (70% faster)
  • No OOM
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
const int batchSize = 1000;
for (int offset = 0; offset < totalOrders; offset += batchSize)
{
    var batch = await _context.TransportOrders
        .AsSplitQuery()
        .IncludeAll(maxDepth: 3)
        .Skip(offset)
        .Take(batchSize)
        .ToListAsync();

    await ExportBatch(batch);
}

Lessons Learned

1. Measure first, optimize second We optimized split queries before profiling. Turned out network latency, not SQL, was the bottleneck. SQL was 140ms; data transfer was 1,200ms.

2. Split queries and caching are complementary Split queries reduce result set sizes. Caching (from the previous post) eliminates queries entirely. Use both: cache reference data, split queries for parent graphs.

3. Depth limits prevent runaway reflection Without maxDepth, IncludeAll() traversed into circular dependencies indefinitely. Always set maxDepth = 3 or 4.

4. Client-side stitching has memory cost Split queries use more memory than joins for large collections (joining in SQL is more efficient for aggregates). For 10,000+ root entities, consider pagination or async enumeration.


Gotchas and Disclaimers

  • EF Core 8+: Earlier versions have different split query behavior.
  • Explicit transaction isolation: Split queries don’t guarantee isolation between queries. Use IsolationLevel.Serializable if you need it.
  • Async enumeration: For very large result sets, use await foreach with AsSingleQuery() on the stream, not split queries.
  • JSON serialization: When returning split-query results as JSON, circular references will cause serialization errors. Use [JsonIgnore] on back-references.

Next Steps

  1. Profile your deepest Include chains: measure result set size and deserialization time.
  2. If result sets exceed 1 MB or deserialization exceeds 500ms, try split queries.
  3. Implement IncludeAll() for your domain model.
  4. Pair with declarative caching for reference data (previous post).

Ready for featured image.

All rights reserved
Built with Hugo
Theme Stack designed by Jimmy