dotnet / datalab

This repo is for experimentation and exploring new ideas involving ADO.NET, EF Core, and other areas related to .NET data.


SqlServer.Core: Performance-oriented SQL Server .NET driver

roji opened this issue

An experiment in collaboration with the community to determine what potential there is for using modern .NET features in a highly performant SQL Server driver.

@ErikEJ @Wraith2 @NickCraver @mgravell @Drawaes: Let's get this party started!

@smitpatel, @roji, and I discussed how to get started with Woodstar, and we have created some initial issues for the ideas we came up with. We want this to be collaborative, so please add your ideas, either by commenting on issues or in new issues as appropriate.

Initial ideas:

/cc @davidfowl

Would it be useful to list current pain points, especially w.r.t. performance in the current driver structure? Discussing connection pooling and what we are/aren't able to see or get metrics on today due to what's exposed and what's internal is something that doesn't hurt so much per connection but is a pain at scale in various ways. Today, we're wrapping connections to make best-guess workarounds for how pooling is behaving, which commands are in flight when a stall happens, etc.

I know our goal is performance, but I think it's worth considering the driver/implementation surface area along these lines as we go, and maybe current pain points are a good thing to have as a starting point on that front?

  • Async strings and blobs. The current internal structure makes their performance non-linear. Fixing it is a long and complicated project, and there is no mitigation other than using SequentialAccess mode and constructing the string yourself.
  • The Task-based internal architecture uses almost entirely imperative task classes rather than language-supported async/await, and this generates a lot of garbage in Tasks, TaskCompletionSources, ContinueWith closures etc. This could all be language- and ValueTask-based and perform faster and cleaner.
  • The codebase mixes sync and async paths liberally and confusingly. Along with lots of locking, this makes it hard to reason about safety. Atomic state changes would be a lot easier to work with.
  • The code is complex and hard to follow, with very little explanation for internal choices. E.g., why, when connecting to a named address, does it choose the last IPv4 address returned from DNS to try first? If we change it, do we break someone?
  • Parameters being non-generic forces the use of boxing object containers and leads to reflection-based runtime type discovery and conversion.
  • Connection pool keys being canonicalized and stored as strings incurs overhead for every connection because the string has to be generated each time. A dedicated immutable type would be better.
  • The SNI (SQL Network Interface) layer and the TDS layer are disconnected loops; every packet received is copied at least twice, and at least half of the byte arrays used are dropped to the GC because their lifetime isn't tracked, so they aren't known to be safe to re-use.

Plus a whole load of other random bits and pieces that, when/if we encounter the same problem in a new implementation, I will hopefully spot and warn about so we can avoid the same implementation choice. In general it's about what you'd expect from a 20-year-old codebase at this point. Decisions which were correct at the time are now outdated, but the weight of the codebase is so high that changing direction carries unacceptably high risk. Writing from a high-level, async-first perspective will give us an easier codebase to work with.

Should async-only be a design goal here, like HttpClient in .NET Core before...the happening?

I'd prefer to support as much of the existing external surface area as possible because it will enable the widest adoption if we get to production stages. I'd say async-first but not async-only, if that's possible; I think it is possible because postgres manages it afaik.

I'd hope we'd build the core engine with a different API shape than what ADO.NET wants. We can write a slow adapter for that layer but the new APIs shouldn't be bound by it. Also take a look at this issue dotnet/runtime#28882 as a way to avoid boxing for DbParameters.

The problem with using another shape is knowing what it is. There can be no consumers until we write one, so we don't know what shape is required without a specific and well-defined set of consumer requirements. Who's the first consumer?

Well, you have SO people commenting right here. The point, I thought, is to push the boundaries of performance and not support the widest number of use cases. As David has said, slower compatibility could be built on top. In reality though, in the case of sync you are just hiding what is really going on, as all db operations are async in nature if there is network or any disk access involved.

Ok, pure perf it is and then we'll see. That'll allow ignoring all the existing public surface area defined in terms of Task and using ValueTask throughout which will be lovely.

You say that all db operations are async. All IO operations are async, and if you meant that then I agree. However, because TDS is packetized there can be multiple columns and even rows in a single packet, so a single data fetch may not be async; it could just be fetched from memory. At the moment in SqlClient you either have to allocate a task every time or you have to use sync and live with the thread blocking sometimes. ValueTask<T> GetFieldValueAsync(int index) will make that problem go away.
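
To make that concrete, here is a minimal sketch (not SqlClient code; the type and member names are hypothetical, and the object[] buffer is just a stand-in for decoded packet data) of how a ValueTask-based accessor can complete synchronously when the column is already buffered and only pay for async machinery when another packet is needed:

using System;
using System.Threading.Tasks;

sealed class SketchDataReader
{
	private object[] _bufferedRow = Array.Empty<object>();

	public ValueTask<T> GetFieldValueAsync<T>(int ordinal)
	{
		// Fast path: the value is already in memory, so nothing is allocated.
		if (ordinal < _bufferedRow.Length && _bufferedRow[ordinal] is T value)
			return new ValueTask<T>(value);

		// Slow path: only here do we pay for the async state machine.
		return FetchSlowAsync<T>(ordinal);
	}

	private async ValueTask<T> FetchSlowAsync<T>(int ordinal)
	{
		_bufferedRow = await FetchNextPacketAsync().ConfigureAwait(false);
		return (T)_bufferedRow[ordinal];
	}

	// Stand-in for the real network read; a real driver would decode TDS here.
	private Task<object[]> FetchNextPacketAsync()
		=> Task.FromResult(new object[] { 1, "hello" });
}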

We may also want to think about being clever with pre-dispatching packet requests. Currently you run through the packet until you run out of data and then request more and wait for the response. If you can guess, based on reading a string for example, that you won't have enough data, you could pre-dispatch the next packet request and then it might be ready when you need it. A fundamental limitation of any library like this is network latency, and we might be able to hide some of that.
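
As a rough illustration of the pre-dispatch idea (a sketch only, under assumed buffer bookkeeping and hypothetical names; a real reader would also need to handle characters split across packet boundaries), the read for the next packet can be issued before decoding the bytes already on hand:

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

sealed class PrefetchingParser
{
	private readonly Stream _stream;
	private readonly byte[] _current = new byte[8192];
	private readonly byte[] _next = new byte[8192];
	private int _pos, _len; // current-packet bookkeeping, elided here

	public PrefetchingParser(Stream stream) => _stream = stream;

	public async ValueTask<string> ReadStringAsync(int byteLength)
	{
		ValueTask<int> pending = default;
		bool prefetched = false;

		// The string won't fit in what's buffered: pre-dispatch the next packet
		// read now, but don't await it yet.
		if (byteLength > _len - _pos)
		{
			pending = _stream.ReadAsync(_next);
			prefetched = true;
		}

		// Decode what we already have while the network read is in flight.
		int available = Math.Min(byteLength, _len - _pos);
		string head = Encoding.Unicode.GetString(_current, _pos, available);
		_pos += available;

		if (!prefetched)
			return head;

		int read = await pending.ConfigureAwait(false); // often already complete
		return head + Encoding.Unicode.GetString(_next, 0, Math.Min(read, byteLength - available));
	}
}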

My vote here would be to only do a non-ADO.NET provider if figures clearly show ADO.NET is a serious blocker for performance.
My Npgsql experience is telling me that ADO.NET in itself shouldn't be a (significant) blocker for high-perf scenarios, and doing a non-ADO.NET driver basically means it's unusable by all existing .NET layers (Dapper, EF, LLBLGen Pro...). Of course we can always build an ADO.NET adapter on top of a non-ADO.NET driver, but I wouldn't necessarily go in the direction of "break the ecosystem" until we see a good reason to.

Note that while doing optimization work on EF Core, I've recently seen that saving an allocation here and there has less of an impact on end RPS perf than expected (by me at least), and in at least some cases ADO.NET can be improved upon too (e.g. dotnet/runtime#17446 for parameter boxing). In other words, if some aspects of ADO.NET do turn out to block perf, I'd rather this effort helped us identify and fix them, rather than doing something completely different.

However because TDS is packetized there can be multiple columns and even rows in a single packet so a single data fetch may not be async it could just be fetched from memory. At the moment in SqlClient you either have to allocate a task every time or you have to use sync and live with the thread blocking sometimes. ValueTask GetFieldValueAsync(int index) will make that problem go away.

This is a good example. If the Task allocation from DbDataReader.GetFieldValue turns out to be significant (and let's see that it's the case, rather than assuming it is), we could simply add a new ValueTask-returning API alongside the existing one. We could then promote that into ADO.NET.

Regardless of the above, I think an async-only driver definitely makes sense, at least as a first phase (but maybe in general). Needing to support sync already creates various problems in today's world (e.g. no Pipelines support, at least last time I checked).

If the Task allocation from DbDataReader.GetFieldValue turns out to be significant (and let's see that it's the case, rather than assuming it is)

Well, I think so. This is from the TE Fortunes benchmark test project that I've been using for perf improvements over the last couple of years.

[dotTrace allocation profile screenshot from the Fortunes benchmark run]

And since you're a bunch of people who'll know what you're looking at here's the dotTrace file for that so you can explore.
BenchmarkDb.zip

The thing we actually want is the Fortune object and the strings it contains; look how far down the list it is. I'm pretty sure that the entire query result is inside a single packet response, so it's all in memory at the point when fields are being fetched. Allocating a Task for every int32 read from an in-memory packet isn't the worst thing happening, but because of the public surface area, even if I managed to make the internals ValueTask-based I'd still have the allocation at the user surface. I've considered proposing ValueTask<T> ValueGetFieldValueAsync(int i) a couple of times, but the name makes me have second thoughts, let alone the approval process and the years it would take to get it adopted.

Anyway. You raise another good point @roji. What do we mean by high performance? The reason I started on this was that I wrote a really nicely performing server app and then the perf got destroyed by simply trying to save data to the database. I'd like to be able to plug in the database layer without it being the worst performing thing in the application.

In my experience so far query speed is dominated by network latency and we can't really do a lot about that. This is why memory makes a very small difference quite a lot of the time. What we miss in this scenario is that there will be higher layers doing other work like layout etc.

I think we're looking to get a driver which is as CPU- and memory-clean as possible in as many aspects as possible, so that machine resources can be used simultaneously: don't block threads, don't waste memory causing time to be lost to GC. Get work done and get out of the way so higher layers can get higher throughput per unit of hardware per unit of time.

This is from the TE Fortunes benchmark test project that I've been using for perf improvements over the last couple of years.

We're getting down into the little details, but... I can't see any reason for GetFieldValueAsync to be used in TE Fortunes (or generally a big majority of ADO.NET usage). Unless one is reading very big rows and using CommandBehavior.SequentialAccess, the row is pretty much guaranteed to be buffered in memory; after all, without SequentialAccess one can read columns in any order. In that scenario, using GetFieldValueAsync is pretty much useless, and GetFieldValue should be used instead.

But if we want to add another API just to remove the Task allocation for when huge rows and SequentialAccess are being used, then it really isn't that complicated - mostly a matter of finding the right name. I think it's a pretty low-priority scenario (if you're reading huge rows, that extra allocation isn't likely to matter), but it's trivial to push through.

Re the rest... I do hope Npgsql shows that very good perf is possible while sticking to ADO.NET.

This is why memory makes a very small difference quite a lot of the time.

I'd want to see some backing to that assumption. Of course, I don't mean egregious useless allocations inside the driver (which is what SqlClient does in some cases) - that should simply be avoided or cleaned up. But again, my recent experience with EF Core shows that hunting down every single reasonable allocation has quickly diminishing returns, and therefore probably doesn't justify dumping ADO.NET.

To summarize, .NET does have a standard database API. While it's not the best it could be, there's an entire ecosystem around it, and very good performance has been achieved without dumping it. Could a bit more perf be squeezed by saving a couple allocations that ADO.NET forces us to do? Maybe. Does that justify doing something completely non-standard, where maybe ADO.NET itself could be fixed? I think that remains to be shown.

I don't understand the point of this repo if you just want to improve the sql client under ado.net? You might as well work on fixing that under the hood. SslStream was a similar age, very risky, a mix of sync and async and promise based code and we fixed it from the inside. There is no point just making another ado.net sql lib with all those restrictions.

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

In MySqlConnector, I've considered providing a "direct" API that exposes the raw MySQL protocol through a .NET API to see how much overhead ADO.NET contributes. However, it always seemed extremely unlikely that it would be used anywhere except the TE benchmarks due to the established ADO.NET ecosystem.

I have no objections to this project pursuing a focused SQL Server solution to see what the maximum possible performance is, but would personally appreciate the effort going into directions that could benefit all ADO.NET providers/users. But please just take this as my 2c, and not a strong attempt to change the project direction.

Some ADO.NET pain points I'd like to see fixed:

  • No first-class "connection pool" concept. Every ADO.NET user has to create (and throw away) a DbConnection object that behind-the-scenes borrows an actual connection from a pool (and returns it when DbConnection.Dispose is called). This results in a lot of temporary garbage objects when it would be nicer to write pool.BorrowConnection (or similar) instead. Obviously that would be a big change in how ADO.NET is consumed.
    • Having a common, well-tested, low-overhead, lock-free connection pool available to use as an ADO.NET driver implementer would be very nice.
  • Connection Strings. Related to above, the pool is usually implicitly selected by setting DbConnection.ConnectionString. This is frequently parsed with DbConnectionStringBuilder, which adds some unavoidable overhead: mysql-net/MySqlConnector#238. If a "connection pool" object were created, connection string parsing would only have to happen once, instead of per-DbConnection. (Drivers can do some optimisations right now based on connection string equality, of course.)
  • DbCommand, DbParameterCollection, etc. are a lot of ephemeral objects created for simple SQL statements. Dapper provides an ExecuteScalarAsync<T> extension method on DbConnection; it might be nice to have something similar as part of ADO.NET that could do some optimised parameter binding, execute the SQL, and return the single scalar result.
  • DbCommand.Prepare doesn't make a lot of sense on an ephemeral object if it's intended to create a reusable server-side object. Usually this ends up storing some state on the underlying object that represents an open server connection.

OTOH, if a lot of these (and similar) suggestions get implemented, maybe it's "not ADO.NET" anymore, and trying to preserve ADO.NET was a false goal? 😀

This is my 2c as well. But I see it as an opportunity to write a fresh new DB surface area that, while focused on Sql Server initially, could just as well have a MySql etc version. Connection pooling is a very good example of something that I think shoehorning into the existing ado.net would cause extra pain.

I would love to see a break from tradition, in the manner of pipelines vs streams: initially built on their own, with bridging code added eventually to help you use pipelines with stream APIs. But for pure perf you want pipelines all the way through.

To be clear, I'm not the boss here or anything - am just stating my opinions!

I don't understand the point of this repo if you just want to improve the sql client under ado.net? You might as well work on fixing that under the hood. SslStream was a similar age, very risky, a mix of sync and async and promise based code and we fixed it from the inside. There is no point just making another ado.net sql lib with all those restrictions.

There are various reasons why this isn't feasible - lots of backwards-compatibility, legacy and stability constraints simply don't make this possible. As a very simple example, SqlClient needs to continue supporting .NET Framework.

But I see it as an opportunity to write a fresh new DB surface area that, while focused on Sql Server initially, could just as well have a MySql etc version.

So there really are two efforts being discussed here:

  1. Writing a new, modern and efficient driver for SQL Server; this indeed resembles rewriting SslStream while retaining the same public API. There are indications that this would be of immense value in the ecosystem - and this is the original/main thing discussed here.
  2. Writing a new .NET database API, as an alternative to ADO.NET.

I see these as two very separate and orthogonal efforts, each being quite huge. For example, discussing a first-class, cross-database pooling component as @bgrainger mentioned (see dotnet/runtime#24856) is just on its own a potentially huge debate. I believe having focus on one of the above two is important (trying to tackle too much frequently backfires), but again, that's just my opinion.

I guess I'm also a bit confused on why we're proposing to do a new database API. Are we convinced ADO.NET is a significant blocker to database driver perf, and if so, that we can't fix those problems incrementally? Do we have any data to support that? Or do we just hate the API and want a modern, new API (also legitimate)? Or do we just want to play around without any constraints (also legitimate)? The important point is that any new DB API breaks the ecosystem in both directions - DB drivers would need to be rewritten to support it (Npgsql, MySqlConnector, Oracle), and upper layers would have to do the work to support it as well (EF Core, Dapper, LLBLGen Pro...). The effort would be immense - and I'm not clear on exactly why we'd be proposing that.

@bgrainger FWIW I agree with a lot of your points above, though some things can be handled without API changes. For example, Npgsql does avoid parsing the connection string more than once; DbCommand instances can be recycled by their owning DbConnection, and similarly DbDataReader instances can be recycled by their owning DbCommand (Npgsql does these things); we can possibly do something similar.

But to reiterate - I'm just a guy expressing my opinions. If we end up with a non-ADO.NET driver, and an ADO.NET shim over that, maybe that's fine too.

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

FWIW I'm pretty sure 90+% of these allocations are the SqlClient implementation, rather than anything necessarily having to do with ADO.NET.

and doing a non-ADO.NET driver basically means it's unusable by all existing .NET layers (Dapper, EF, LLBLGen Pro...)

FWIW, the resurrection of this thread a few days ago is part of what prompted me to get started on DapperAOT. Pretty sure I'll have code working in the flavor of Dapper (although using the new DapperAOT) in lock-step with any-and-all progress here.

The current SqlClient codebase is too dangerous for a "move fast and break things" approach. The number of users, and the fact that we want people to move from the System to the Microsoft versions, requires a slow and measured approach. The unfortunate side effect of that is that the speed of change is slow. I've been tinkering for two years and that trace is still abysmal compared to modern code. It is also important to note that the codebase is difficult to work with.

The existing SqlClient library will not go away. It will not change with sufficient speed to improve performance for the majority of users. I'm fairly sure that the performance of SqlClient lags behind that of other database providers; even if it doesn't, I'm sure there can be a better performing implementation.

In writing an SqlCore library we can:

  1. identify if there are new patterns that would be better than existing ADO approaches for high throughput, low overhead, high concurrency, etc scenarios
  2. provide a good non-surprising (no async over sync) async way to access Sql Server data
  3. provide a viable mitigation for existing problems with high string and binary use scenarios

I think all of these things have value.

Things that aren't current goals as I see it:

  1. replace or deprecate ADO, just not practical nor demonstrably of value
  2. create an ADO capable SqlCore driver
  3. create a driver which implements the entire breadth of the existing SqlClient driver (SqlNotification? not now).

We do not have to choose to ignore the existing ADO api surface because we can probably provide most of it built on a new implementation. This doesn't have to be an explicit goal but it may be something to be aware of as a long term nice-to-have so we can flag up changes which would prevent it being possible.

The existing ADO surface has problems such as Task<T> GetFieldValueAsync<T>(int i), which can be worked around trivially with a new implementation built using ValueTasks where appropriate. So perhaps we should work on a new implementation regardless of the API shape, and then, when we have something that works for trivial cases, review what we've got and discuss further what direction we should take?

I'm happy with working towards specific use cases from StackOverflow if they can be provided. It's a point to work towards and learn about the problem space. Once we have something that is of worth or we collectively decide that the goal is impossible or improbable we can re-assess.

I don't see any problem with the general pattern of access which ado provides. As a sketch

public async IAsyncEnumerable<Fortune> GetFortunes()
{
	await using (var connection = GetSqlCoreConnection())
	{
		using (var command = GetSqlCoreCommand(connection /*, command properties? */))
		{
			using (var reader = await command.GetReader(/* reader properties */))
			{
				while (await reader.GetNextRow())
				{
					yield return new Fortune { Id = await reader.GetValue<int>(0), Message = await reader.GetValue<string>(1) };
				}
			}
		}
	}
}

but that doesn't mean we have to start off with implementing all the interfaces that are in the System.Data namespace to do it.

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

FWIW I'm pretty sure 90+% of these allocations are the SqlClient implementation, rather than anything necessarily having to do with ADO.NET.

Yup, but the Task<int> and Task<string> are directly due to Task<T> GetFieldValueAsync<T>(int i) not being ValueTask-based. You should always use the async call when getting a string because it can easily cross a packet boundary, but if you do that you pay for the task even if the string is in memory and doesn't cross a packet boundary.

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

Working on it, slowly... This is the good version; it used to GC every 16ms when I started.
The waste is, as @roji said, largely due to implementation issues, but it will take me at least another 2 years to make a dent in them. I'm going to do that work, but while I do, we can also investigate better ways to approach the required functionality with modern techniques. Things learnt on both projects may improve the other.

One more possible idea... If we don't want to be constrained by ADO.NET, but also aren't aiming to create a new abstraction, we could just write a driver with a public API surface that fits SQL Server, TDS, etc. This would be a low-level API which doesn't pretend or aim to be a general database API in any way (and once again, a shim could be built on top of that). It's all really a question of what we're trying to achieve here.

I think that would be a good place to start.

Great suggestion @roji !

Bottom line is that we need to measure this. If ADO.NET is adding significant overhead, then we either address that or don't ultimately use ADO.NET. Both of these options are on the table.

However, our current .NET/PostgreSQL solution on TechEmpower is #12 overall and uses ADO.NET. Analysis of this (which @roji has done extensively) indicates that, at least for queries, ADO.NET is not limiting. We have also found this with other experiments done in the past. So even though a few years ago my gut feeling was that we would have to drop ADO.NET, I now think I was wrong about that because the data we have just doesn't show it.

The idea behind issue #12 is to get the upper limit. Once we have that we can put that code into ADO.NET and get an initial read of the overhead. Until we have that data I think we're all guessing.

@Wraith2 I should point out again that the goal of this project is not to replace Microsoft.Data.SqlClient. Rather it is to complement it with a fast driver that most likely doesn't support many of the features of Microsoft.Data.SqlClient. Microsoft.Data.SqlClient will always be an option for those who need ADO.NET or backwards compatibility.

When playing with NativeAOT and source generators, I would like to provide some feedback:

  • I notice that when the DataReader reads from TDS it reads the data in native format, then places it in an object[], and the typed versions of GetValue reading from that object cause boxing/unboxing. I'm not sure that I want direct TDS access, but maybe that would work for my scenario.
  • SqlConnection covers a lot of different scenarios which not everybody needs in a specific application. Maybe it would be possible to have some API which controls what code is used via the API, not the connection string. For example you should opt in to connection pooling, Azure, local instances, AD support, etc.

Hopefully this is not far off the goals of this project. Just my 2c.

I notice that when the DataReader reads from TDS it reads the data in native format, then places it in an object[], and the typed versions of GetValue reading from that object cause boxing/unboxing. I'm not sure that I want direct TDS access, but maybe that would work for my scenario.

This is an old .NET 1.1 API, and because it uses object[] the boxing of value types is unavoidable. Direct access through GetFieldValue<T> should not box for any supported value type. I don't expect GetValues to be something we would use in a high-perf scenario.

Thanks for the explanation. Will go home and do my homework then.

@Wraith2 another consequence of old APIs: the entirety of the async API is Task[<T>] based; for the non-T versions, and the Task<bool> versions (ReadAsync(), NextResultAsync(), etc), that's tolerable - they can be optimized in the "woot, we had the value synchronously" scenario, but the field-based async API is horrible for allocations (GetFieldValueAsync<T>() etc) - to the point where I'm pretty sure most consumers just ignore it, and just use ReadAsync(), then switch to synchronous inside a row. Which might actually be fine, in most scenarios - although ironically, the 2 APIs where you might genuinely expect to need to wait for data - GetChars and GetBytes - don't offer an async API :p

@mgravell I agree that GetFieldValueAsync (and IsDBNullAsync) are problematic in that they return Task (and that's indeed been mentioned as a problem in ADO.NET that we could easily fix by introducing an alternative API), and the lack of something like GetCharsAsync/GetBytesAsync too.

Though you're also right that in most cases, sync column access seems to be the norm as the row is assumed to be already buffered in memory (when CommandBehavior.SequentialAccess isn't specified). This is how Npgsql works - without SequentialAccess, entire rows are pre-buffered in memory when Read/ReadAsync is called, allowing subsequent column access to be memory-only and very fast. But if we want to improve support for large rows, there's definitely some stuff missing.

(I'd just rather we distinguished between missing/problematic APIs which we can incrementally fix by adding new ones (if we feel like we have to), and an approach where we dump ADO.NET from the get-go)

@roji you're right, the "do we don't we" on ADO.NET is the big question. Unfortunately, it is one of those places where you kinda need a minimal "this is what it could do" feature-set comparison to get an idea of how much impact the ADO.NET API is having. It seems like a minimal viable API for talking to a database is still a pretty big API, even if it is just "run this non-parameterized query that returns 1 row of 4 columns". I think until we have a baseline there, we could talk in circles, and: to directly address your concern - we could be trying to solve the wrong problem.

@mgravell sure, you're right. I've been thinking we'd take something like the TechEmpower Fortunes scenario as our first, minimal viable surface area: simple querying bringing back ints and strings, no parameters (yet), no transactions and definitely no exotic ADO.NET stuff nobody wants. That doesn't seem huge to me - in ADO.NET it would mean a basic DbConnection that can open/close (and a trivial pooling implementation), a DbCommand with ExecuteReaderAsync and a DbDataReader with ReadAsync and GetFieldValue.
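
For reference, that minimal surface expressed against the standard ADO.NET base classes is roughly the following sketch (table and column names are just illustrative):

using System.Collections.Generic;
using System.Data.Common;
using System.Threading.Tasks;

static class FortunesMinimal
{
	public static async Task<List<(int Id, string Message)>> QueryAsync(DbConnection connection)
	{
		await connection.OpenAsync();

		await using DbCommand command = connection.CreateCommand();
		command.CommandText = "SELECT id, message FROM fortune";

		var results = new List<(int Id, string Message)>();
		await using DbDataReader reader = await command.ExecuteReaderAsync();
		while (await reader.ReadAsync())
			results.Add((reader.GetFieldValue<int>(0), reader.GetFieldValue<string>(1)));
		return results;
	}
}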

@roji that would give us an indication of what the gap is between what SqlClient currently does, and what SqlClient (or a replacement) could do, inside the limits of ADO.NET; however, it doesn't give us any indication of what the baseline is for what a replacement could do without the constraints of ADO.NET. So if someone is taking on the challenge of an MVP of SqlClient, then the perfect outcome would be to have those exact same implementations available in a non-ADO.NET wrapper.

I would imagine that something like the following should be reasonably achievable in short order; the only problem I have is that the code is a bit dated (around spans and pipelines now, as the API shifted) and that ... well, SslStream isn't my friend

https://gist.github.com/Drawaes/27311bcee54e3ef524823f1c9bed7179

it doesn't give us any indication of what the baseline is for what a replacement could do without the constraints of ADO.NET

That's a very abstract sentence, as if we were dealing with an unknown, amorphous API 🤣

We all know ADO.NET quite well here, and we know that for a Fortunes-style experiment, the perf issues may be the allocations for the 3 types (DbConnection/DbCommand/DbDataReader, though recent experience shows a few short-lived allocations simply don't have that much impact). Npgsql already mitigates the last two allocations via recycling: disposing NpgsqlDataReader stores the instance on its NpgsqlConnection for the next ExecuteReader, and disposing NpgsqlCommand stores the instance to be returned for the next NpgsqlConnection.CreateCommand. DbConnection remains the non-recycled one; in EF we don't have that issue, since we pool DbContexts which themselves own a DbConnection, so all ADO.NET objects are cached all the way down. For non-EF this could also be mitigated by exposing a pooling API which would return DbConnection instances, rather than requiring users to new them up (dotnet/runtime#24856).

So if we really want to argue against ADO.NET, it would be nice to have a concrete discussion rather than general "we don't know what perf we could achieve if we just freed ourselves of all constraints". Or we could go through the exercise of implementing an ADO.NET and a non-ADO.NET Fortunes prototype, which IMHO would most likely show that the single DbConnection allocation (which itself can be mitigated) isn't worth creating a whole new DB API for.

@Drawaes yeah, that looks like a good start (I'd definitely forget about SslStream in a first minimal prototype...). Do you already have something establishing the connection and performing that query against SQL Server?

You can't forget about SslStream, that's kinda the point - you need TLS in the SQL protocol

@Drawaes are you saying it's not possible to communicate with SQL Server at all without TLS? I was just suggesting excluding TLS in initial prototyping, assuming that's possible.

@Drawaes are you saying it's not possible to communicate with SQL Server at all without TLS? I was just suggesting excluding TLS in initial prototyping, assuming that's possible.

Only local servers. Any Azure server requires TLS. Practically speaking, TLS is a hard requirement for any non-toy implementation.

I think setting up a working prototype against a local server can be a good starting point (for simplicity, we probably have enough work), but if you guys want to tackle TLS immediately I'm not objecting...

Just saying... if TLS is the problem and the purpose is to assess the impact of the ADO.NET layer, it doesn't actually need to be SQL Server at the bottom. A "raw PostgreSQL vs PostgreSQL in ADO.NET" might serve just as well to give an indication of what the slack is that we're playing with from the ADO.NET layer :) (and no, that's not meant as a "so @roji can go and do it all")

The problem with ignoring TLS is the way TLS is interleaved with TDS

It is better to know how you are going to put it in from day one (you have to handshake, hand off to TLS, and then hand back to TDS)

Requiring TLS influences the way all IO is performed. Without the need for it we can go pure pipelines over a socket and have really nice code, with it we have to go up to SslStream. We know we will need SslStream so we should start there. It'd be really nice if we had an SslSocket but we don't...

As someone who advocated for a TlsPipeline for years, and even wrote a demo one at the start of pipelines, I wholeheartedly agree it would be nice.

dotnet/corefxlab#991

dotnet/corefxlab#1062

Aside from a lot of nit-picking it looks like it was generally a viable approach. So perhaps something to advocate for in .net7 or beyond?

Without the need for it we can go pure pipelines over a socket and have really nice code

last time I checked, we still didn't have a supported client pipelines API (although David has teased me with the idea a few times); another pet gripe of mine, hence the intentionally passive-aggressive name of "Pipelines.Sockets.Unofficial", but my point is: (unless things have changed and I missed it) you might not even get that out-of-the-box

To pick up on the main discussions: Is it a reasonable idea to implement a "trivial" client akin to the fortunes query with a similar api shape ( connections, commands, readers) but without sticking to the ado interface directly (so we could have ValueTask<bool> ReadNextRow()) and see what that takes and where it gets us?

Without the need for it we can go pure pipelines over a socket and have really nice code

last time I checked, we still didn't have a supported client pipelines API (although David has teased me with the idea a few times); another pet gripe of mine, hence the intentionally passive-aggressive name of "Pipelines.Sockets.Unofficial", but my point is: (unless things have changed and I missed it) you might not even get that out-of-the-box

From what I've seen if we can demonstrate a need for those official apis by making aspnetcore capable of being faster and throw in a TE benchmarks buzzword then it's a no-brainer as far as getting the change into net7.

@Wraith2 if this is about ReadNextRow, note that ADO.NET ReadAsync returns non-generic Task, which is already relatively allocation-friendly (no allocation for sync completion), unlike e.g. GetFieldValueAsync. But sure, we can definitely go with new standard APIs and see where that leads. If we still have a similar API shape, we can even have non-standard methods alongside the standard ADO.NET ones, for easy comparison.

From what I've seen if we can demonstrate a need for those official apis by making aspnetcore capable of being faster and throw in a TE benchmarks buzzword then it's a no-brainer as far as getting the change into net7.

We shouldn't expect any work like this in .NET 7. For the moment, if we want these things, then we'll need to implement them.

ValueTask<bool> by itself isn't a huge win - bools are already easy to optimize and even if the code forgets: this is now done automatically in Task.FromResult from .NET 6 onwards; the real win to me is the general API shape - in particular, how commands are specified, how parameter values are supplied (for example, there's no reason that the command definition and parameter values need to be in the same instance, and splitting the two may have big wins), making the schema API queryable without allocations (IDbColumnSchemaGenerator is better than the DataTable API, but still not free), etc.

If we have ValueTask<T> GetValue<T>(int index) then making that ADO compatible is as simple as AsTask(). So we can do the valuetask version for direct consumption and not block out ADO with a wrapper. The point I'm trying to make is that I think we should be unconstrained by ADO to start with and add it later with as thin a possible conversion layer.
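
A tiny sketch of what that conversion layer could look like (SqlCoreReader and its GetValue<T> are hypothetical placeholders for the new implementation; the Task allocation happens only inside the shim):

using System.Threading.Tasks;

// Hypothetical ValueTask-based core reader.
public sealed class SqlCoreReader
{
	public ValueTask<T> GetValue<T>(int ordinal) => default; // stand-in for the real decoder
}

// ADO-style shim: forwards to the core and pays the Task cost only at this layer.
public sealed class SqlCoreShimReader
{
	private readonly SqlCoreReader _inner;
	public SqlCoreShimReader(SqlCoreReader inner) => _inner = inner;

	public Task<T> GetFieldValueAsync<T>(int ordinal) => _inner.GetValue<T>(ordinal).AsTask();
}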

there's no reason that the command definition and parameter values need to be in the same instance, and splitting the two may have big wins

Interesting, I hadn't got as far as considering that but I can see what you mean. swapping out parameter sets with a single assignment would be far less trouble than it is now.

swapping out parameter sets with a single assignment would be far less trouble than it is now.

indeed, and no "here's a spare DbCommand I left around that's already configured with the correct 26 properties and 12 parameter instances, I'll swap it in/out when I see it" - just a static readonly field for the definition, and you're done, just supply parameter values (in and out) at runtime; the how for that gets very tricky, of course - it is hard (but not impossible) to avoid object without getting into a bit of a mess

it is hard (but not impossible) to avoid object without getting into a bit of a mess

Simply having a Parameter<T> to type the storage location gets a lot of object code out of the way.
There's a lot that can be simplified. We don't need a whole set of INullable primitive types now that we have language null support and for the first pass we don't need to support UDT or xml types which would leave us with statically known types. I'm not sure how you're going to have a single object containing the strongly typed parameter values without needing to generate a poco/entity for each but even then it's still interesting.

I'm not sure how you're going to have a single object containing the strongly typed parameter values without needing to generate a poco/entity for each but even then it's still interesting.

It is a shame that neither ITuple nor the ValueTuple<...> family has a GetValue<T>(int index), or you could just use that; i.e. cmd.Execute(connection, ("abc", 12, DateTime.Now));
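
Pulling those threads together, here is a hedged sketch of what splitting a reusable command definition from the per-call parameter values might look like; every type and member below is hypothetical, including the tuple-based binding:

using System;
using System.Threading.Tasks;

// The definition is static and reusable; only the values vary per call.
public sealed class CommandDefinition<TArgs>
{
	public CommandDefinition(string sql) => Sql = sql;
	public string Sql { get; }

	public ValueTask<int> ExecuteAsync(SqlCoreConnection connection, TArgs args)
		=> connection.ExecuteAsync(Sql, args); // binding of args happens inside the driver
}

public sealed class SqlCoreConnection
{
	public ValueTask<int> ExecuteAsync<TArgs>(string sql, TArgs args)
		=> new ValueTask<int>(0); // stand-in for the real wire-protocol work
}

public static class Demo
{
	private static readonly CommandDefinition<(string Name, int Age, DateTime When)> InsertPerson =
		new("insert into people(name, age, created) values(@p0, @p1, @p2)");

	public static ValueTask<int> SaveAsync(SqlCoreConnection connection)
		=> InsertPerson.ExecuteAsync(connection, ("abc", 12, DateTime.Now));
}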

I've been observing here, but for what it's worth I completely agree that an unconstrained replacement approach seems like the best first path, then seeing about an ADO.NET adapter later. It can answer the questions we want about ADO.NET overhead and potentially provide the best of both worlds in the end as well.

I've been going through the Dapper backlog the past few days seeing what people are having trouble with across various ADO.NET providers and will try and find time to summarize that in an issue this week. For SQL Server, it's mostly hooking and custom type conversion coming back off the reader, but thus far we can't (as an agnostic consumer) use the optimized version of certain APIs because they're inconsistent across providers today. I think that unfortunately ADO.NET is less uniform than we'd like to think it is, because we still observe many issues with sync-over-async and such that confuse users today.

For SQL Server, it's mostly hooking and custom type conversion coming back off the reader,

Can you open issues for those on the SqlClient repo? That way, if we ever get to the end of the backlog and end up being able to add features sensibly, we'll have the info on what's needed.

we can't (as an agnostic consumer) use the optimized version of certain APIs because they're inconsistent across providers today.

Somewhere there's a feature matrix that I think bgrainger set up which had details of the major ADO providers and their capabilities. It sounds like it might be worth expanding or annotating that information with any unpleasant surprises that users may have, do you remember where that was @roji? Again, if there are any SqlClient surprises please make sure there's a tracking issue on the repo; I can't fix huge problems, but if it's little stuff I may be able to improve it.

Do we have general agreement on:

I've been observing here, but for what it's worth I completely agree that an unconstrained replacement approach seems like the best first path, then seeing about an ADO.NET adapter later.

And if so how do we move forward with implementing it?

Somewhere there's a feature matrix that I think bgrainger set up which had details of the major ADO providers and their capabilities. It sounds like it might be worth expanding or annotating that information with any unpleasant surprises that users may have, do you remember where that was @roji?

That would be https://github.com/mysql-net/AdoNetApiTest (latest results are here).

And if so how do we move forward with implementing it?

That's the question :) @Drawaes you mention at some point that you may have some prototype that can serve as a starting point?

I'm seeing a lot of discussion about query parameters and single-page results. I haven't seen anything regarding SqlBulkCopy. Getting more than 10 rows into SQL Server requires an ancient, massive-allocation API (it takes a DataTable or primitives boxed in an object[] per row). Modern apps deal with more than 1 row at a time.

I'd like to see the dotnet team build a list of MSSQL's highest-perf features, like inserting into in-memory tables with persistence turned off, and design a brand new set of APIs for those scenarios, designed for Span, Memory, low allocation (including nullable without DBNull) and modern serialization like serde.

I see Fortunes as a lowest-common-denominator scenario. By ignoring the high-performance potential of MSSQL I think you're losing high-performance app devs to other languages and databases. Npgsql has a much better bulk-insert API than SqlBulkCopy. Python has Apache Arrow (the heart of the pandas big data library, used heavily by the Dremio.io data lake engine). The Postgres ecosystem has Materialize.io (all in-memory).

I'm already starting to see that my job as an enterprise developer is increasingly to bridge CRUD app data with big data. But, as a C# developer I'm researching which 3rd-party nuget library can generate a DataTable the fastest.

I think with foundational work RE ValueTask, Span, pooling, codegen tooling, etc. .NET is well positioned to be a leader in high-perf data access (non-trivial size). But it looks like this thread so far lacks the imagination to get there. If we only get 10% of the way there every three years .NET will never catch up to perf we see in other ecosystems, like pandas for example.

Scenarios

  • fastest possible ingest (probably using in-memory tables)
  • MSSQL modern serde to/from entities (probably codegen)
  • reduce allocations: MSSQL to/from JSON
  • MSSQL to/from protobuf (gRPC)
  • MSSQL to/from compressed columnar data formats like parquet, Apache Arrow Flight
  • MSSQL to/from nd-json (very little support in .NET, but used widely in big data)
  • near-real-time CDC and/or sync with temporal tables (like Debezium), with goal of reducing latency as close to zero as possible to post an event to Kafka or apply delta to another DB

Most of those items are above driver level, going to another format is something that would be handled outside the driver itself. However if there are facilities that are needed to implement those features then those will need to be identified and made available.

But, as a C# developer I'm researching which 3rd-party nuget library can generate a DataTable the fastest.

You don't need a DataTable; an IDataReader is sufficient, and tools like FastMember can do a List<T> to IDataReader map for you. It could be hugely improved, of course, but: no DataTable required.

How about data compression between SQL-client and SQL-server?

I did some (non representative) tests:

  • I read 7000 records from our employee database. This took 2.1 seconds with EF.Core 5 non-tracked.
  • I converted those 7000 records to json, compressed them with LZ4 level 12 (K4os nuget package) and stored them in another table in a single field. This took 200ms.
  • I read the compressed data, decompressed it and deserialized it. This took 20ms.

I know this test is awfully skewed, because the 2 read queries do different things. But the point is that I got the data I needed 100 times faster and this is mainly due to compression and the resulting reduced network overhead.

(There are expensive commercial products available making similar claims when using compression.)

If SQL Server could compress the data, if asked by the client to do so, huge performance gains could be possible.
It would be a small performance cost on the SQL Server side, but a bit of it is recovered because less data needs to be TLS encoded.

Of course one should look at all possible optimisations, but the slowest part in queries is often the network.

If SQL Server could compress the data, if asked by the client to do so, huge performance gains could be possible.

I think adding new features to SQL Server is so far from the field of play here that we can't see it with a telescope.

It's a nice idea but as @mgravell said it's one that needs to start with the SQL Server team and then come to the driver once they've implemented the functionality. The data would need to be compressed only in flight or indexing would suffer.

If SQL Server could compress the data, if asked by the client to do so, huge performance gains could be possible.

I think adding new features to SQL Server is so far from the field of play here that we can't see it with a telescope.

Agreed... This should (at least for now) be scoped to client-side improvements. There's definitely more than enough work on that side before we begin proposing SQL Server changes.

Sorry to risk diluting a deep discussion, but ... could someone change the title of this item so that it reads more like a feature or a user story? Seeing it on the .NET themes roadmap (https://themesof.net/) as just "SqlServer.Core #6" when all the others are fully-spelled-out (features, tasks, stories, or epics) is kind of jarring. Should be simple to do, right? Or is this still just an investigation that cannot be pinned down to feature/story yet?

@kwbskf done

(1)

I wonder how it would be possible to quantify the potential throughput gains. Internal driver overheads in SqlClient might stem from design choices, and those overheads might be smeared out across the code base. The profiler call tree might not highlight any particular hot point, yet an entirely different library design could potentially deliver big gains.

An example of this is JSON serialization. If you were to profile Newtonsoft.Json, you'd probably find it rather optimized. Yet, the new BCL JSON serializer achieves a 2x gain on Newtonsoft with a fresh design.

The IDataReader interface feels like it would architecturally add quite a bit of overhead because it forces you to operate row-by-row and column-by-column. The library has no way to optimize the whole operation. A while ago I made a proposal for how expression trees could be used to create highly efficient deserialization code. The key point is that the library is put in a position where it can optimize reading an entire row or even multiple rows in one call, placing the result directly into an instance of a user class. I'm linking to it here: dotnet/runtime#25739
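
For context on the general technique (this is not the API from that issue, just an illustration of the expression-tree approach): a materializer can be compiled once per result shape and reused for every row. A sketch, mapping columns to properties by ordinal:

using System;
using System.Data.Common;
using System.Linq.Expressions;
using System.Reflection;

static class RowMaterializer
{
	// Compiles a delegate that maps the current row of a DbDataReader onto a new T,
	// matching the given properties to columns by ordinal.
	public static Func<DbDataReader, T> Create<T>(params string[] propertyNames)
	{
		ParameterExpression reader = Expression.Parameter(typeof(DbDataReader), "reader");
		var bindings = new MemberBinding[propertyNames.Length];
		for (int i = 0; i < propertyNames.Length; i++)
		{
			PropertyInfo property = typeof(T).GetProperty(propertyNames[i])
				?? throw new ArgumentException($"No property '{propertyNames[i]}' on {typeof(T)}");
			// reader.GetFieldValue<TProperty>(i)
			MethodInfo getter = typeof(DbDataReader)
				.GetMethod(nameof(DbDataReader.GetFieldValue))!
				.MakeGenericMethod(property.PropertyType);
			bindings[i] = Expression.Bind(property, Expression.Call(reader, getter, Expression.Constant(i)));
		}
		Expression body = Expression.MemberInit(Expression.New(typeof(T)), bindings);
		return Expression.Lambda<Func<DbDataReader, T>>(body, reader).Compile();
	}
}

A delegate built once via something like RowMaterializer.Create<Fortune>("Id", "Message") would then be invoked per row, keeping reflection off the hot path (with the trimming/AOT caveats discussed further down the thread).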


(2)

Even a highly optimized async codepath has considerable CPU overhead. The async machinery can be quite expensive. In my estimation, it will be necessary to provide a truly synchronous code path. This code path will be the one providing the lowest CPU usage for API users that want to optimize for that.

With database access, the degree of parallelism usually is low because the database cannot take more than a handful of parallel queries or else it will become overloaded. For that reason, the thread saving benefits of async are less relevant in common scenarios. It seems that the main point of this new library would be to lower CPU usage which makes a synchronous codepath quite important.

To make an example: If your database server has 8 cores (and disregarding disks for simplicity) it cannot help throughput to make more than 8 parallel queries. Having a thread pool of 8 threads with synchronous IO is a perfectly adequate solution in this scenario, and it will be the fastest solution. (I'm simplifying a lot here to make the general point.)

The BCL team recently found the need to add synchronous APIs to HttpClient for two reasons: Satisfying synchronous interfaces, and it plainly uses less CPU.


(3)

There's probably value in exposing a zero-overhead TDS connection class that has no additional layers. No pooling, no retry, no threading, not even a connection string. Its usage would basically be:

new TdsConnection(ip, port, new TdsConnectionOptions() { ... })

This API would provide raw access to some abstraction of the protocol. I'm sure there are some attractive features in there that are not widely known. For example, the recently exposed SqlCommandSet is a TDS feature. Also, the bulk insert API has some features that are not exposed.

I wonder how it would be possible to quantify the potential throughput gains. Internal driver overheads in SqlClient might stem from design choices, and those overheads might be smeared out across the code base. The profiler call tree might not highlight any particular hot point, yet an entirely different library design could potentially deliver big gains.

One approach we've been thinking about is to get an upper bound for performance by doing an absolute-minimum implementation of a minimal "driver" (#12) - send a minimal query and provide some sort of bare API for getting the results. See here for an Npgsql experiment; we've got a similar prototype in the works for SQL Server but nothing public yet.

The IDataReader interface feels like it would architecturally add quite a bit of overhead because it forces you to operate row-by-row and column-by-column. The library has no way to optimize the whole operation.

This isn't really true... when the database sends back a resultset, it typically first sends some sort of metadata on the resultsets, which allows the driver to parse them (e.g. which column has which type); the driver can use that information to pregenerate any sort of code that will make subsequent row/column accesses faster. In fact, when a statement is prepared, this can even happen once for multiple query executions. This seems quite orthogonal to deserializing user POCOs vs. providing a DbDataReader-like interface to rows and columns.

A while ago I made a proposal for how expression trees could be used to create highly efficient deserialization code. The key point is that the library is put in a position where it can optimize reading an entire row or even multiple rows in one call, placing the result directly into an instance of a user class. I'm linking to it here: dotnet/runtime#25739

Here are some problems I see with this:

  • There are various dynamic scenarios where there's no user POCO to map to, because the query result shape isn't known in advance. Since these need to be supported, some sort of row/column API will be necessary, and so the POCO mapping you're proposing would be an extra thing.
  • See above for a discussion on whether such a high-perf driver should implement ADO.NET or not. Even if the driver doesn't implement ADO.NET (a bit along the lines of your proposal in (3) above), I feel it's important to at least have an optional ADO.NET adapter on top of it; anything else would basically make it unusable in the .NET ecosystem, as all ORMs would need to support the custom API (EF Core, Dapper, LLBLGen Pro...). A driver mapping to a user POCO doesn't seem like it would be adaptable to ADO.NET via a layer on top, so once again a separate row/column API would be needed.
  • In general, mapping to user POCO always opens up lots of questions around customizations - do the database column names correspond exactly to the POCO property names? If not, how does one specify a mapping? All that stuff is typically handled by upper layers (e.g. EF Core, Dapper), and this proposal would introduce them to the lowest level of database interaction - this doesn't seem like a good idea.
  • Runtime code generation is generally something that's a bit "on the way out", with the recent focus on trimming and platforms where it isn't supported (e.g. iOS). To support those you'd have to move the code generation to build-time, e.g. with source generators, which is quite a different kind of project; see DapperAOT for a possibly similar approach.
  • Finally and most importantly, as I wrote in dotnet/runtime#25739, I'm not convinced simply introducing expression trees and deserializing to user POCOs necessarily provides a meaningful perf advantage here. I won't repeat the arguments from the other issue, but in general most of the work of decoding wire results from the database would need to be done either way.
    • However, if you're convinced of this direction, the best way forward is to prove the perf advantage via a prototype. The Npgsql.Core prototype can serve as a basis for such an experiment, or if you'd prefer to work against SqlServer, we'll have a similar thing up at some point. If you can show an expression-tree POCO-deserializing driver working considerably faster than an equivalent rows/columns one, that would provide good motivation to go in that direction.

I wonder how it would be possible to quantify the potential throughput gains. Internal driver overheads in SqlClient might stem from design choices, and those overheads might be smeared out across the code base. The profiler call tree might not highlight any particular hot point, yet an entirely different library design could potentially deliver big gains.

The SqlClient CPU call graphs are a bit deeper than I'd like, but in general CPU isn't a major problem (outside some known async problems). We spend most of the time waiting for network IO. If I see ways to conserve CPU I implement them, but in general avoiding allocations to reduce time spent outside user code yields far better returns at the moment. Fundamentally, reading a forward-only stream of self-describing data isn't a hard CPU problem.

Anecdotally, calculating connection pool keys is one of the most CPU-expensive things we do, because of some pinvokes.

Providing a forward only reader implementation that requires the user to operate as if it were in sequential mode would significantly reduce internal complexity as would generics in some places.

Even a highly optimized async codepath has considerable CPU overhead.

It's also a tradeoff of CPU against time, where CPU is a resource you can get more of relatively easily but time is linear and constrained. We usually judge that it is better to do as much work in a given allocation of time as possible.

One approach we've been thinking about is to get an upper bound for performance by doing an absolute-minimum implementation of a minimal "driver" (#12) - send a minimal query and provide some sort of bare API for getting the results. See here for an Npgsql experiment; we've got a similar prototype in the works for SQL Server but nothing public yet.
...

  • However, if you're convinced of this direction, the best way forward is to prove the perf advantage via a prototype. The Npgsql.Core prototype can serve as a basis for such an experiment, or if you'd prefer to work against SqlServer, we'll have a similar thing up at some point. If you can show an expression-tree POCO-deserializing driver working considerably faster than an equivalent rows/columns one, that would provide good motivation to go in that direction.

If you happen to be more familiar with MySQL, I have a small prototype of what a "direct" .NET connection to MySQL would look like here: https://github.com/bgrainger/FrameworkBenchmarks/tree/mysql-direct/frameworks/CSharp/aspnetcore/Benchmarks/MySqlDirect; it could possibly be used as a base for experiments.

There may also be useful code in https://github.com/neuecc/MySqlSharp.

@Wraith2

Anecdotally, calculating connection pool keys is one of the most CPU-expensive things we do, because of some pinvokes.

Interesting...! Shameless hijacking: what are these pinvokes about? I have an ADO.NET thing I'd like to push in .NET 7.0 which may help a lot here... Will keep you posted.

They're the pinvokes behind Environment.UserName and Environment.DomainName on windows. It is possible to get both bits of information at once from one call, but the BCL doesn't expose a way to do it, so we call the same api twice and waste buffers and cycles. It also always causes 3 string allocations to get a single string back. It's on my private trello list of "things to investigate" but trying to argue that we should do direct pinvokes into windows security-related apis is going to get some pushback and it isn't worth taking my time trying to fight it at the moment.
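For reference, the Win32 GetUserNameEx function can return "DOMAIN\user" in a single call. A hedged sketch of wrapping it directly is below; the fixed buffer size and error handling are simplifications for illustration, not SqlClient code.

using System;
using System.Runtime.InteropServices;

// Sketch only: GetUserNameExW with the NameSamCompatible format returns
// "DOMAIN\user" from one call, instead of separate Environment.UserName and
// Environment.DomainName lookups. A real implementation would grow the buffer
// on failure; this version uses a fixed size for brevity.
static class UserNameInterop
{
    private const int NameSamCompatible = 2;

    [DllImport("secur32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.U1)]
    private static extern bool GetUserNameExW(int nameFormat, char[] nameBuffer, ref uint size);

    public static (string Domain, string User) GetDomainAndUser()
    {
        var buffer = new char[1024];
        var size = (uint)buffer.Length;
        if (!GetUserNameExW(NameSamCompatible, buffer, ref size))
            throw new InvalidOperationException("GetUserNameEx failed.");

        var combined = new string(buffer, 0, (int)size); // "DOMAIN\user"
        var separatorIndex = combined.IndexOf('\\');
        return (combined.Substring(0, separatorIndex), combined.Substring(separatorIndex + 1));
    }
}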

@Wraith2 ok, thanks - do these at least only get called when the Username isn't present in the connection string?

It's not the database username, it's the calling user. It depends on the SNI strategy: in native we use the windows identity SID, in managed we use the current username and password. We can't cache because of impersonation, so we must always create and compare; even if we find we've got the same user as last time, we still have to invoke and allocate the strings.
The pool implementation is probably the most complicated and confusing code in SqlClient; while I've wrestled with some of the other parts and had some success, the pool beats me every time I go near it.

OK, thanks for the details @Wraith2. I plan to work on a proposal to basically allow users to directly reference a "DbDataSource" (naming temporary!), which typically would represent a pool; this would allow applications to resolve the pool from the connection string exactly once at startup, and then request connections from that. However, if you really want to check the system username on every open (because of impersonation), this may not help so much.
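To make the shape concrete, consumption of such a data source might look roughly like the sketch below; SqlCoreDataSource, its Create factory, OpenConnectionAsync, and the connection string are all placeholders, not a real or agreed API.

// Hypothetical sketch of the "resolve the pool once" idea. The connection
// string is parsed and canonicalized a single time here, at startup.
await using var dataSource = SqlCoreDataSource.Create("Server=...;Database=...");

// Later, per request: no connection-string parsing or pool-key lookup,
// just rent a connection from the already-resolved pool.
await using var connection = await dataSource.OpenConnectionAsync();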

Can you first check whether the thread is impersonating, and if it is not, then use the previously allocated strings (or no strings at all)? Something with GetCurrentThreadToken and GetTokenInformation, perhaps. Whether that would help depends on how common impersonation is, I guess. And this would still involve "direct pinvokes into windows security related apis".
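A managed route to that check already exists: WindowsIdentity.GetCurrent(ifImpersonating: true) returns null when the thread is not impersonating. A hedged sketch of the caching idea follows; the cache field and BuildKey helper are hypothetical, and the benign race on the cache is ignored for brevity.

using System.Security.Principal;

// Sketch: only rebuild the identity strings when the thread is impersonating.
static class PoolIdentity
{
    private static string? _cachedKey;

    public static string GetKey()
    {
        // Returns null when the current thread is not impersonating.
        using var impersonated = WindowsIdentity.GetCurrent(ifImpersonating: true);
        if (impersonated is not null)
            return BuildKey(impersonated); // impersonating: cannot safely cache

        using var current = WindowsIdentity.GetCurrent();
        return _cachedKey ??= BuildKey(current); // non-impersonating: cache once
    }

    private static string BuildKey(WindowsIdentity identity) => identity.Name;
}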

Impersonation and MARS are things that could be left out of the Core implementation quite easily. They'll still have to be supported in SqlClient, but if you're looking for a fast minimal sql implementation I don't think you want them.

// Sketch of a hypothetical SqlServer.Core API; the Get* methods are placeholders.
public async IAsyncEnumerable<Fortune> GetFortunes()
{
	await using (var connection = GetSqlCoreConnection())
	{
		using (var command = GetSqlCoreCommand(connection, /* command properties? */))
		{
			using (var reader = await command.GetReader(/* reader properties */))
			{
				while (await reader.GetNextRow())
				{
					yield return new Fortune { Id = await reader.GetValue<int>(0), Message = await reader.GetValue<string>(1) };
				}
			}
		}
	}
}

We know that a single row is nearly always small enough to hold in memory and that the user often knows their rows are small. I would normally rather not have await reader.GetNextRow() complete until all fields are in memory ready to read. This would totally avoid the overhead of having GetValue() return a task that then needs checking to see if it is completed. (Think of what these unnecessary condition checks do to the CPU instruction pipeline.)

We know that a single row is nearly always small enough to hold in memory

How do we know this? I know that mine often are, but working with the sqlclient driver has shown me bug reproductions where people do things I would never consider.

the user often knows their rows are small.

Yes. However, through maintenance this has a habit of changing. If a small row is 3 items and the driver optimizes for that, what happens when 2 years later someone adds 3 more fields? Do we see a slowdown? Is that consequence going to be obvious from the code?

This would totally avoid the overhead of having GetValue() return a task that then needs checking to see if it is completed

I'm almost certain we would use ValueTask as the return type. If for some reason we could not, then a custom poolable awaitable would likely be used. Any non-null data type, even a single byte, in the tds stream can span multiple packets (because there is always a header, so all records are multibyte). That means that every read can cause a network delay to be observable; the faster you can consume the data, the more likely you are to hit an io wait. This needs to be a task-like return type.
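A rough sketch of that fast/slow split is below; TryReadBuffered and ReadAcrossPacketsAsync are hypothetical stand-ins for real TDS parsing, and only the ValueTask shape is the point.

using System.Threading.Tasks;

// Hedged sketch: GetValue<T> returns ValueTask<T> so reads served from the
// already-buffered packet complete synchronously, and only values spanning a
// packet boundary pay the async cost.
public sealed class RowValueReader
{
    public ValueTask<T> GetValue<T>(int ordinal)
        => TryReadBuffered<T>(ordinal, out var value)
            ? new ValueTask<T>(value)               // synchronous: no Task allocation
            : ReadAcrossPacketsAsync<T>(ordinal);   // rare path: value spans packets

    private bool TryReadBuffered<T>(int ordinal, out T value)
    {
        // Placeholder: a real reader would decode the value from its packet buffer here.
        value = default!;
        return false;
    }

    private async ValueTask<T> ReadAcrossPacketsAsync<T>(int ordinal)
    {
        // Placeholder: a real reader would await more packets from the network here.
        await Task.Yield();
        return default!;
    }
}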

I agree that the majority use case is small rows which can be easily buffered in memory, and that this scenario should be optimized for (of course while keeping good support for large rows). This is also the reasoning for ADO.NET's CommandBehavior.SequentialAccess, which is an opt-in behavior for large rows.

I would normally rather not have await reader.GetNextRow() complete until all fields are in memory ready to read. This would totally avoid the overhead of having GetValue() return a task that then needs checking to see if it is completed.

I'm a bit confused... If GetNextRow (ADO.NET's DbDataReader.Read) doesn't wait to buffer all columns, then the subsequent GetValue calls (ADO.NET's DbDataReader.GetFieldValue) do potentially have to do I/O (since the row hasn't been completely read and buffered yet). So either you (possibly) wait longer up-front in GetNextRow - and then you're guaranteed that subsequent GetValue calls just access buffered results - or GetNextRow returns early without buffering, and then subsequent GetValue calls may need to wait.

Stepping back... I think the low-level approach here would be to not do (implicit) row buffering. Crucially, this means it's the user's responsibility to read column values in the order in which they get delivered from the database; you can't read value 3 and then read value 1 (since we're just streaming columns). You also can't read the same value twice. This corresponds pretty much exactly to how CommandBehavior.SequentialAccess works today in ADO.NET.

Row buffering can be considered a "convenience" feature layered on top of this, allowing users to seek back in the row and read the same value multiple times. I agree it's still not necessary to pre-buffer the entire row in GetNextRow: I/O can simply be performed later - as needed - from GetValue, which would return ValueTask (so calls to values which happen to be already buffered are very cheap).
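For comparison, this is roughly how today's opt-in streaming mode looks in ADO.NET with Microsoft.Data.SqlClient; the connection string and query are placeholders.

using System.Data;
using Microsoft.Data.SqlClient;

// With CommandBehavior.SequentialAccess, columns must be read in ordinal order
// and cannot be revisited - the same contract described above.
await using var connection = new SqlConnection("Server=localhost;Integrated Security=true");
await connection.OpenAsync();

await using var command = new SqlCommand("SELECT Id, Message FROM Fortune", connection);
await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);

while (await reader.ReadAsync())
{
    var id = await reader.GetFieldValueAsync<int>(0);          // must read ordinal 0 before 1
    var message = await reader.GetFieldValueAsync<string>(1);  // cannot go back to 0 afterwards
}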

the user often knows their rows are small.

Yes. However, through maintenance this has a habit of changing. If a small row is 3 items and the driver optimizes for that, what happens when 2 years later someone adds 3 more fields? Do we see a slowdown? Is that consequence going to be obvious from the code?

I do think that programmers can be expected to know the general shape of their data (does a certain table contain huge rows or not), and that it's reasonable to expect stability here. If you don't want to make any assumptions (e.g. a dynamic data scenario), you can always use "sequential access" and stream columns - I don't think there's any downside to doing so, aside from a more difficult API to work with. Otherwise, if you use an API which does buffering (the ADO.NET default), and you encounter a huge row, your memory requirements will increase accordingly.

We know that a single row is nearly always small enough to hold in memory

How do we know this? I know that mine often are, but working with the sqlclient driver has shown me bug reproductions where people do things I would never consider.

the user often knows their rows are small.

Yes. However, through maintenance this has a habit of changing. If a small row is 3 items and the driver optimizes for that, what happens when 2 years later someone adds 3 more fields? Do we see a slowdown? Is that consequence going to be obvious from the code?

I am thinking of a mode of operation where the user states upfront that the rows are small and an exception is thrown if the rows are large.

"Small row" being defined as under the size of a network packet, so it is known that a row will never need more than two network packets to be buffered in a "zero copy" implementation.

=======
Another option would be a generic ReadRowAndAllValue() method that returns a tuple of all the values. I don't know if this is practical to implement.

Should an optional callback-based interface be provided so the user implements OnNextRow() and OnNextStringValue(id, span) etc.?

This would make it cheaper to read into arrays of structs for code that needs fast bulk loading. At present it can be faster to get SQL Server to write to a file and then read the file from .NET; this should never be the case with a database access API.
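A hypothetical sketch of what such a push-style consumer could look like (all names invented for illustration, not a proposed API):

using System;

// Hedged sketch of the callback ("push") interface floated above; the driver
// would invoke these as it decodes each packet, letting bulk loaders fill
// arrays of structs without awaiting a task per value.
public interface IRowConsumer
{
    void OnNextRow();
    void OnNextInt32Value(int ordinal, int value);
    void OnNextStringValue(int ordinal, ReadOnlySpan<char> value);
    void OnResultSetComplete();
}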

I am thinking of a mode of operation where the user states upfront that the rows are small and an exception is thrown if the rows are large.

"Small row" being defined as under the size of a network packet, so it is known that a row will never need more than two network packets to be buffered in a "zero copy" implementation.

I think it's a good idea to stop and think exactly what kind of special optimizations we're thinking about for small rows vs. large rows. I think that the existing ADO.NET model isn't too bad here: an opt-in mode (SequentialAccess) for streaming the row and a default mode for, basically, buffering it. I don't know what "zero copy" means in this context: the driver needs to read raw bytes in any case, and decode them to a user-readable value (int, string); there shouldn't be any extra copying beyond that going on regardless of whether the row is small or not. The only impact that should have is whether we keep earlier columns in memory so the user can access them after later ones.

Another option would be a generic ReadRowAndAllValue() method that returns a tuple of all the values. I don't know if this is practical to implement.

That sounds like a very allocate-y API to get the row.

Should an optional callback-based interface be provided so the user implements OnNextRow() and OnNextStringValue(id, span) etc.?
This would make it cheaper to read into arrays of structs for code that needs fast bulk loading. [...]

Not sure exactly why this is inherently faster than an API where application code calls ReadRow (it's basically just push vs. pull).

Should an optional callback-based interface be provided so the user implements OnNextRow() and OnNextStringValue(id, span) etc.?
This would make it cheaper to read into arrays of structs for code that needs fast bulk loading. [...]

Not sure exactly why this is inherently faster than an API where application code calls ReadRow (it's basically just push vs. pull).

It avoids C# having to feed as many returned tasks into the generated state machine.

"Small row" being defined as under the size of a network packet so it is known that a row will never need more then two networks packets being buffered in a "zero copy" implementation.

The number of row values in a single packet depends on the data values, not just the types. Determining whether a specific set of data will avoid spanning a packet is prohibitively difficult. You can also change the packet size in the connection string.

I agree with @roji that the current high-level way of accessing the data is reasonable and covers most scenarios that we need. All other methods considered have clear downsides, like requiring sequential access at all times or a large upfront investment of developer time to ensure they avoid packet boundaries, which should be handled by the driver transparently.

"Small row" being defined as under the size of a network packet so it is known that a row will never need more then two networks packets being buffered in a "zero copy" implementation.

The number of row values in a single packet depends on the data values, not even the types. The ability to identify if you had a specific set of data which did not span a packet is prohibitively difficult to calculate. You can also change the packet size in the connection string.

@Wraith2 at least from my perspective, the idea here wouldn't be to automatically calculate anything, but rather for the developer to use one API or another based on whether they want to stream the row or not, based on their knowledge of their data domain. In other words, pretty much what CommandBehavior.SequentialAccess already means in ADO.NET.

I definitely do think we need to provide a row sequential/streaming mode, which avoids having to buffer entire rows in memory; otherwise huge rows would require huge amounts of memory, which isn't feasible. This is already supposed to be done in all major ADO.NET drivers (via CommandBehavior.SequentialAccess).

Should an optional callback-based interface be provided so the user implements OnNextRow() and OnNextStringValue(id, span) etc.?
This would make it cheaper to read into arrays of structs for code that needs fast bulk loading. [...]

Not sure exactly why this is inherently faster than an API where application code calls ReadRow (it's basically just push vs. pull).

It avoids C# having to feed as many returned tasks into the generated state machine.

I'm not sure we should shape the entire row-reading API around saving on the async state machine box for when ReadRow needs to complete asynchronously (note that at least in reasonable scenarios, many/most invocations of ReadRow return synchronously since (small) rows are already buffered in memory).

See #22 for a summary of the Woodstar experiment.