Today, the Entity Framework Core team announces the fourth preview release of Entity Framework Core 6.0. The main theme of this release is performance, and we’ll concentrate on that below; details on getting EF Core 6.0 preview 4 are at the end of this blog post.
The short and sweet summary:
- EF Core 6.0 performance is now 70% faster on the industry-standard TechEmpower Fortunes benchmark, compared to 5.0.
- This is a full-stack perf improvement, including improvements in the benchmark code, the .NET runtime, etc. EF Core 6.0 itself is 31% faster at executing queries.
- Heap allocations have been reduced by 43%.
The runtime perf push
When the EF Core team started the planning process for version 6.0, we knew it was finally time to address an important area. After several years spent delivering new EF Core features, stabilizing the product and progressively narrowing the feature gap with previous versions of Entity Framework, we wanted to put an emphasis on performance and see exactly how far we could go. In previous iterations, a lot of attention was given to the lower layers of the stack: on the industry-standard TechEmpower Fortunes benchmark, .NET already scored very high, reaching 12th place overall (running on Linux against the PostgreSQL database). But while performance was always on our minds while working on EF Core, we hadn’t done a proper optimization push at that level of the stack.
When working on perf, it’s usually a good idea to have a target you can work towards, even if it’s a somewhat arbitrary one. For 6.0, the goal we set ourselves was to get as close as possible to the performance of Dapper on the Fortunes benchmark. For those unfamiliar with it, Dapper is a popular, lightweight, performance-oriented .NET object mapper maintained (and used) by the folks over at Stack Overflow; it requires you to write your own SQL and doesn’t have many of the features of EF Core – it is sometimes referred to as a “micro-ORM” – but it is an extremely useful data access tool. The EF Core team loves it, and we don’t think EF Core should be the answer to every .NET data need out there. Being lightweight and performance-oriented, Dapper provided us with inspiration and a number to strive for.
At the end of this iteration, the gap between Dapper and EF Core in the TechEmpower Fortunes benchmark narrowed from 55% to a little under 5%. We hope this shows that EF Core can be a good option for performance-aware applications, and that ORMs and data layers aren’t necessarily “inefficient beasts” which should be avoided. It’s worth mentioning that the benchmark executes a LINQ query – not raw SQL – so many of the benefits of EF Core are being preserved (e.g. statically-typed queries!) while sacrificing very little perf.
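To make that difference concrete, here is a rough sketch; the `Fortune` entity and `FortuneContext` names are illustrative, not the actual benchmark code:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// With Dapper, you write the SQL yourself:
//     var fortunes = await connection.QueryAsync<Fortune>(
//         "SELECT id, message FROM fortune");
//
// With EF Core, the same read is a statically-typed LINQ query,
// and EF Core generates the SQL for you:

public class Fortune
{
    public int Id { get; set; }
    public string Message { get; set; }
}

public class FortuneContext : DbContext
{
    public FortuneContext(DbContextOptions<FortuneContext> options) : base(options) { }
    public DbSet<Fortune> Fortunes => Set<Fortune>();
}

public static class FortuneQueries
{
    // A non-tracking query, as used in the benchmark scenario described above.
    public static Task<List<Fortune>> LoadAsync(FortuneContext context)
        => context.Fortunes.AsNoTracking().ToListAsync();
}
```

A typo in the LINQ version fails at compile time; a typo in the SQL string fails at runtime.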
A final, general note on performance. The numbers and improvements reported in this article are for a very specific scenario (TechEmpower Fortunes), using non-tracking queries only (no change tracking, no updates); the benchmarks were executed on a high-performance, low-latency setup. Real-world application scenarios will most probably show very different results, as the runtime overhead of executing queries would be dominated by network or database I/O times. In fact, we believe that for many real-world applications, the runtime overhead added by something like EF Core is likely to play a very minor role next to other, more important factors influencing perf. Keep this in mind when thinking about your data access.
With that out of the way, let’s dive into some of the optimizations done for EF Core 6.0. For those interested, the full list of improvements done in this optimization round is available, along with detailed measurements.
Pooling and recycling, DbContext and beyond
Recycling and pooling are central to good performance: they reduce the work needed to create and dispose resources, and typically lower heap allocations as well, which reduces pressure on the garbage collector. All EF Core users are familiar with the DbContext class – this is the main entry point for performing most operations; you can instantiate one (or get one from dependency injection), use it to perform a few database operations (“unit of work”), and then dispose it. While instantiating new DbContexts is fine in a typical application, the overhead of doing so in high-performance scenarios can be significant: DbContext works with a whole set of internal services – via an internal dependency injection mechanism – which coordinate together to make everything work; setting all that up takes time.
For this reason, EF Core has supported DbContext pooling for quite a while. The idea is simple: when you’re done with a DbContext, rather than disposing it (and all its dependent services), EF Core resets its state and then allows it to be reused later. And of course, our benchmark implementation for TechEmpower Fortunes already had this feature turned on; so… why was my profiler showing me considerable time spent creating and wiring together new DbContext instances?
In almost all cases, you want to place an upper bound on the number of instances you pool. An unbounded pool can suddenly fill up with a huge number of objects, which can consume considerable resources (memory or otherwise) and which may stick around indefinitely, depending on your pruning strategy. In the EF Core case, the default upper bound for pooled DbContext instances was 128 – going beyond that number meant falling back to instantiation and disposal. Now, while 128 should be fine for most applications – it’s not common to have 128 contexts active simultaneously – it definitely wasn’t enough for TechEmpower Fortunes; and hiking that number up to 1024 yielded a 23% improvement in benchmark throughput. This, of course, isn’t an improvement in EF Core itself, but it did lead us to increase the default, and we will probably start emitting a warning if the upper bound is surpassed. Finally, since DbContext pooling proved to be so important in this case, we made the feature accessible to applications not using dependency injection as well. I think this is a nice example of where even a minor benchmark misconfiguration can feed into useful product improvements.
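In code, the two pooling styles look roughly like the following sketch; `BlogContext` and the connection string are placeholders:

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;
using Microsoft.Extensions.DependencyInjection;

public class BlogContext : DbContext
{
    public BlogContext(DbContextOptions<BlogContext> options) : base(options) { }
}

public static class PoolingSetup
{
    // With dependency injection: register a pooled context and raise the
    // pool's upper bound explicitly.
    public static void ConfigureWithDI(IServiceCollection services)
        => services.AddDbContextPool<BlogContext>(
            options => options.UseSqlServer("<connection string>"),
            poolSize: 1024);

    // Without dependency injection: a standalone pooling factory.
    public static PooledDbContextFactory<BlogContext> CreateFactory()
    {
        var options = new DbContextOptionsBuilder<BlogContext>()
            .UseSqlServer("<connection string>")
            .Options;
        return new PooledDbContextFactory<BlogContext>(options);
    }
}
```

Disposing a context obtained from the pool resets its state and returns it for reuse, rather than tearing down its internal services.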
But DbContext isn’t everything. When executing a query, EF Core makes use of various objects, including ADO.NET objects such as DbDataReader, as well as various internal objects wrapping these. When all these instances showed up high in memory profiling, more recycling was clearly in order! As a result, each DbContext now has its own dedicated set of these instances, which it reuses on every execution. This reusable graph of objects – rooted at the DbContext pool – extends all the way down into the PostgreSQL database provider (Npgsql), a good demonstration of an optimization that reaches across the layers of the stack. This change alone reduces the total bytes allocated for query execution by 22%.
EF Core includes a lot of extension points, which allow users to get information about – and hook into – various stages of query execution. For example, to execute the SQL query against a relational database, EF Core calls DbCommand.ExecuteReaderAsync; it can log an event both before and after this call (allowing users to see SQL statements before they are executed, and their running times afterwards), write a DiagnosticSource event, and call into a user-configured command interceptor, which lets the user manipulate the command before it is executed. While this provides a powerful and flexible set of extension points, it doesn’t come cheap: a single query execution has 7 events, each with 2 extension points (one before, one after). The cost of continuously checking whether logging is enabled or whether a DiagnosticListener is registered started showing up in profiling sessions!
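As one example of these extension points, a command interceptor can observe (or rewrite) the command just before ExecuteReaderAsync runs; a minimal sketch:

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

public class CommandLoggingInterceptor : DbCommandInterceptor
{
    public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command,
        CommandEventData eventData,
        InterceptionResult<DbDataReader> result,
        CancellationToken cancellationToken = default)
    {
        // Inspect (or modify) the SQL before it is sent to the database.
        Console.WriteLine($"Executing: {command.CommandText}");
        return base.ReaderExecutingAsync(command, eventData, result, cancellationToken);
    }
}

// Registered on the context options:
//     optionsBuilder.AddInterceptors(new CommandLoggingInterceptor());
```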
One initial idea we considered was a global flag to disable all logging; this would be the simplest solution and would also provide the best performance possible. However, this approach had two drawbacks:
- It would be a sort of high-perf opt-in: it needs to be discovered and turned on. Wherever possible, we prefer to improve EF Core for everyone – out of the box.
- It would be all or nothing. If you, say, just want to get SQL statements logged, you can’t do that without paying the price for all the other extension points as well.
The solution we ended up implementing was to check whether any sort of logging or interception is enabled, and if not, suppress logging for that event for 1 second by default. This improved benchmark throughput by around 7% – very close to the global flag solution – while at the same time bringing the perf benefit to all EF Core users, without an opt-in. If, say, a DiagnosticListener is registered at some point during program execution, it may take up to a second for events to start appearing there; that seemed like a very reasonable trade-off for the speed-up.
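The idea can be sketched as follows; this is a simplified illustration, not EF Core’s actual internal code, and `AnyListenerRegistered` is a hypothetical stand-in for the real checks (loggers, DiagnosticSource listeners, interceptors):

```csharp
using System;

public class SuppressibleEvent
{
    private DateTime _suppressedUntil = DateTime.MinValue;

    // Hypothetical hook standing in for the real listener checks.
    public Func<bool> AnyListenerRegistered { get; set; } = () => false;

    public bool ShouldEmit()
    {
        var now = DateTime.UtcNow;
        if (now < _suppressedUntil)
            return false; // cheap fast path: nobody was listening recently

        if (AnyListenerRegistered())
            return true;  // a listener showed up again: resume emitting

        _suppressedUntil = now.AddSeconds(1); // suppress checks for 1 second
        return false;
    }
}
```

The trade-off described above falls out directly: while suppressed, the expensive listener checks are skipped entirely, and a newly registered listener is noticed only on the next one-second boundary.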
Opting out of thread-safety checks
While logging suppression offered an internal optimization that’s transparent to users, our third case was different.
As hopefully everyone knows, EF Core’s DbContext isn’t thread-safe; for one thing, it encapsulates a database connection, which itself almost never allows concurrent usage. Now, although concurrent access of a DbContext instance is a programmer bug, EF Core includes an internal thread safety mechanism, which tries to detect when this happens, and throws an informative exception. This goes a long way to help EF Core users find accidental bugs, and also to make new users aware that DbContext isn’t thread-safe. This mechanism works on a best-effort basis – we made no attempt to make it detect all possible concurrency violations, since that would probably hurt performance in a significant way.
Now, the thread safety check mechanism itself didn’t show up as significant when profiling – something else did. To support some query scenarios, this check needs to be reentrant: it’s OK to start a 2nd query, as long as it’s part of the 1st query. And since EF Core supports asynchronous query execution, an AsyncLocal is used to flow the locking state across the threads that participate in the query. It turned out that using this AsyncLocal caused quite a few heap allocations, and reduced benchmark throughput considerably.
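In spirit, such a reentrant guard looks something like the sketch below (not EF Core’s actual implementation); the key point is that writing to an AsyncLocal forks the ambient ExecutionContext, which is where the extra heap allocations came from:

```csharp
using System;
using System.Threading;

public static class ConcurrencyGuard
{
    private static readonly AsyncLocal<bool> _inOperation = new AsyncLocal<bool>();

    public static IDisposable Enter()
    {
        if (_inOperation.Value)
            return EmptyScope.Instance; // reentrant from the same logical flow: allowed

        // Setting an AsyncLocal value copies the ExecutionContext -- this
        // write is the allocation hot spot described above.
        _inOperation.Value = true;
        return new ResetScope();
    }

    private sealed class ResetScope : IDisposable
    {
        public void Dispose() => _inOperation.Value = false;
    }

    private sealed class EmptyScope : IDisposable
    {
        public static readonly EmptyScope Instance = new EmptyScope();
        public void Dispose() { }
    }
}
```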
After some discussion, we decided to introduce an opt-out flag from thread-safety checks. Unlike with logging, there is no way for EF Core to know when the checks are needed, and when they aren’t; and we definitely want to prioritize reliability and easier debugging, so turning the check off by default was out of the question. Once users have tested that their application works well in production and they are confident that no concurrency bugs exist, they can choose to disable this particular protection; for our TechEmpower Fortunes benchmark, doing so yielded a 6.7% throughput improvement.
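As of these previews, the opt-out is configured on the context options and looks roughly like this (the connection string is a placeholder):

```csharp
using Microsoft.EntityFrameworkCore;

public class BlogContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("<connection string>")
            // Only disable this after verifying in production that no
            // concurrency bugs exist -- the check is on by default.
            .EnableThreadSafetyChecks(enableChecks: false);
}
```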
None of the above is a dramatic change, or a fundamental redesign of EF Core’s internal architecture. Fortunately, the EF Core query pipeline was already conceived with performance in mind: when a query is first seen, EF Core performs an expensive one-time “compilation”, caching both the query’s SQL and a code-generated materializer – the piece of code responsible for reading results from the database and instantiating your objects from them. This means that once an application reaches steady state, the heavy lifting has already been done and EF Core has very little work left to do; that’s how it is able to perform well.
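For applications that want to shave off even the remaining per-execution work (the query-cache lookup), EF Core also offers explicitly compiled queries; a brief sketch, with illustrative entity and context names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class BlogContext : DbContext
{
    public BlogContext(DbContextOptions<BlogContext> options) : base(options) { }
    public DbSet<Blog> Blogs => Set<Blog>();

    // Compiled once, up front; each invocation skips the query-cache lookup.
    private static readonly Func<BlogContext, int, IAsyncEnumerable<Blog>> _blogsByMinId =
        EF.CompileAsyncQuery((BlogContext context, int minId) =>
            context.Blogs.Where(b => b.Id >= minId));

    public IAsyncEnumerable<Blog> GetBlogsFrom(int minId) => _blogsByMinId(this, minId);
}
```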
I hope the above has been an interesting read into the optimizations that have gone into EF Core 6.0, and has provided a glimpse into the internals; the full list of optimizations is available for those who want to dive deeper. To make your EF Core application perform better, please take a look at our performance docs, including this new guidance for high-perf scenarios based on this optimization effort.
What’s next? Well, performance work is never done. In addition to the above, EF Core 6.0 will also deliver other types of performance improvements, including various SQL generation improvements and optimized models, which should improve startup times for applications with lots of entities. We also have plans for continued future improvements, especially in areas of EF Core which weren’t covered in this optimization cycle (e.g. the update pipeline, change tracking).
Finally, I’d like to thank Sébastien Ros and the Crank performance infrastructure, without which this optimization work wouldn’t have been possible, and the EF Core team for their patience with me making their code more convoluted.
How to get EF Core 6.0 previews
EF Core is distributed exclusively as a set of NuGet packages. For example, to add the SQL Server provider to your project, you can use the following dotnet CLI command:
dotnet add package Microsoft.EntityFrameworkCore.SqlServer --version 6.0.0-preview.4.21253.1
The following table links to the preview 4 versions of the EF Core packages and describes what they are used for.
| Package | Purpose |
| --- | --- |
| Microsoft.EntityFrameworkCore | The main EF Core package that is independent of specific database providers |
| Microsoft.EntityFrameworkCore.SqlServer | Database provider for Microsoft SQL Server and SQL Azure |
| Microsoft.EntityFrameworkCore.SqlServer.NetTopologySuite | SQL Server support for spatial types |
| Microsoft.EntityFrameworkCore.Sqlite | Database provider for SQLite that includes the native binary for the database engine |
| Microsoft.EntityFrameworkCore.Sqlite.Core | Database provider for SQLite without a packaged native binary |
| Microsoft.EntityFrameworkCore.Sqlite.NetTopologySuite | SQLite support for spatial types |
| Microsoft.EntityFrameworkCore.Cosmos | Database provider for Azure Cosmos DB |
| Microsoft.EntityFrameworkCore.InMemory | The in-memory database provider |
| Microsoft.EntityFrameworkCore.Tools | EF Core PowerShell commands for the Visual Studio Package Manager Console; use this to integrate tools like scaffolding and migrations with Visual Studio |
| Microsoft.EntityFrameworkCore.Design | Shared design-time components for EF Core tools |
| Microsoft.EntityFrameworkCore.Proxies | Lazy-loading and change-tracking proxies |
| Microsoft.EntityFrameworkCore.Abstractions | Decoupled EF Core abstractions; use this for features like extended data annotations defined by EF Core |
| Microsoft.EntityFrameworkCore.Relational | Shared EF Core components for relational database providers |
| Microsoft.EntityFrameworkCore.Analyzers | C# analyzers for EF Core |