Fix database context loss on transparent reconnection#4130
Fix database context loss on transparent reconnection#4130paulmedynski wants to merge 4 commits intomainfrom
Conversation
Adds unit tests that reproduce the scenario described in issue #4108: after switching the database via USE [db] or ChangeDatabase(), a transparent reconnection silently reverts connection.Database to InitialCatalog instead of preserving the switched database. Tests use a custom DatabaseContextQueryEngine (handles USE [db] with proper EnvChange tokens) and a DisconnectableTdsServer that can sever TCP connections while keeping the listener up for reconnection. Three failing tests demonstrate the bug: - UseDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect - ChangeDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect - UseDatabase_ConnectionDropped_Pooled_DatabaseContextPreservedAfterReconnect Three passing tests verify baselines: - UseDatabaseCommand_UpdatesConnectionDatabaseProperty - ChangeDatabase_UpdatesConnectionDatabaseProperty - UseDatabase_ConnectionDropped_NoRetry_ThrowsOnNextCommand Also adds: - DisconnectAllClients() to GenericTdsServer<T> / DisconnectAll() to ServerEndPointHandler<T> for test infrastructure - Analysis documents under plans/database_context/ - Coding style updates (regions, line width guidance)
There was a problem hiding this comment.
Pull request overview
Adds simulated-server coverage and supporting test infrastructure to reproduce #4108 (database context reverting after transparent reconnection), along with internal analysis docs and small repo policy/tooling updates.
Changes:
- Added new simulated TDS server unit tests covering
USE [db]/ChangeDatabase()and post-disconnect reconnection behavior. - Extended the TDS test server utilities to forcibly disconnect active clients while keeping the listener running.
- Added analysis documents under
plans/database_context/, updated coding-style guidance, and introduced a markdownlint configuration.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/DatabaseContextReconnectionTests.cs | New unit tests and a custom query engine to simulate USE [db] and validate database context across reconnects. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/GenericTdsServer.cs | Adds DisconnectAllClients() helper to drop all active client connections. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.EndPoint/TDSServerEndPoint.cs | Adds DisconnectAll() implementation to dispose all active endpoint connections without stopping the listener. |
| policy/coding-style.md | Updates style guidance (line wrapping and #region usage). |
| plans/database_context/00-overview.md | Overview of the database-context reconnection investigation. |
| plans/database_context/01-architecture.md | Architecture notes on session/database tracking and recovery. |
| plans/database_context/02-flows.md | Enumerates reconnection flows and whether DB context is preserved. |
| plans/database_context/03-issues.md | Lists identified issues/gaps and severity. |
| plans/database_context/04-recommendations.md | Proposed fixes and test recommendations. |
| plans/database_context/05-reconnection-and-retry-mechanisms.md | Catalogues retry/reconnect mechanisms and DB-context implications. |
| .markdownlint.jsonc | Adds markdownlint config aligned to the repo’s line-length policy. |
| /// Tests for database context preservation across reconnections. | ||
| /// Reproduces the scenario from dotnet/SqlClient#4108: after executing USE [db] and then | ||
| /// losing the connection, the reconnected session should retain the switched database. | ||
| /// Baseline tests (no reconnection) pass. The three reconnection tests are expected to | ||
| /// FAIL until issue #4108 is fixed — they demonstrate that <c>connection.Database</c> | ||
| /// silently reverts to <c>InitialCatalog</c> after the physical connection is replaced. |
There was a problem hiding this comment.
The PR description/title indicate this PR fixes #4108, but this test file’s header comment states the reconnection tests are “expected to FAIL until issue #4108 is fixed”. Either include the product fix in this PR, or adjust the PR title/description and/or skip/quarantine the failing tests so CI stays green.
| [Fact] | ||
| public void UseDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect() | ||
| { | ||
| using DisconnectableTdsServer server = new(); | ||
| SqlConnectionStringBuilder builder = CreateConnectionStringBuilder(server.Port); | ||
|
|
||
| using SqlConnection connection = new(builder.ConnectionString); |
There was a problem hiding this comment.
These reconnection tests are currently regular [Fact] tests, but the file comments indicate they are expected to fail until the product bug is fixed. Please mark them as skipped/quarantined (e.g., [Fact(Skip=...)], or Trait(Category,"failing")/"flaky" per repo conventions) or land them together with the driver fix; otherwise this PR will break CI.
| // Issue #4108 core assertion: regardless of whether reconnection succeeded or | ||
| // failed, the Database property must not have silently reverted to the initial catalog. | ||
| Assert.NotEqual(InitialDatabase, connection.Database); | ||
|
|
||
| if (reconnected) | ||
| { | ||
| Assert.Equal(SwitchedDatabase, connection.Database); | ||
| } |
There was a problem hiding this comment.
This test allows the post-disconnect command to throw, but still asserts connection.Database != InitialDatabase even when ExecuteNonQuery() failed. If the connection transitions to Closed/Broken after the exception, SqlConnection.Database is allowed to reflect the connection string’s Initial Catalog, making this assertion unreliable and potentially failing for the wrong reason. Consider asserting the database only when reconnection succeeds; otherwise assert the exception/state explicitly.
| Assert.NotEqual(InitialDatabase, connection.Database); | ||
|
|
||
| if (reconnected) | ||
| { | ||
| Assert.Equal(SwitchedDatabase, connection.Database); | ||
| } |
There was a problem hiding this comment.
Same issue as the USE reconnection test: the test treats reconnection failure as acceptable, but then asserts connection.Database != InitialDatabase unconditionally. That can produce false failures if the connection is no longer open after the exception. Gate the database assertion on reconnected == true, and otherwise assert the expected failure mode (exception/connection state).
| Assert.NotEqual(InitialDatabase, connection.Database); | ||
|
|
||
| if (reconnected) | ||
| { | ||
| Assert.Equal(SwitchedDatabase, connection.Database); | ||
| } |
There was a problem hiding this comment.
Same unconditional post-exception assertion issue here (Assert.NotEqual(InitialDatabase, connection.Database)). If the reconnect attempt fails and the connection ends up Closed/Broken, Database may legitimately report the Initial Catalog, causing a false negative. Only assert the database context when reconnection succeeds; otherwise assert the expected exception/state.
| Assert.NotEqual(InitialDatabase, connection.Database); | |
| if (reconnected) | |
| { | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| } | |
| if (reconnected) | |
| { | |
| // On successful reconnection, the database context should remain switched. | |
| Assert.NotEqual(InitialDatabase, connection.Database); | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| } | |
| else | |
| { | |
| // If reconnection failed, the connection should not be open; database context is undefined. | |
| Assert.NotEqual(ConnectionState.Open, connection.State); | |
| } |
| connection.Open(); | ||
|
|
||
| using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection)) | ||
| { | ||
| cmd.ExecuteNonQuery(); | ||
| } | ||
|
|
||
| Assert.Equal(SwitchedDatabase, connection.Database); | ||
|
|
||
| server.DisconnectAllClients(); | ||
|
|
||
| bool reconnected = false; | ||
| try | ||
| { | ||
| using SqlCommand cmd = new("SELECT 1", connection); | ||
| cmd.ExecuteNonQuery(); | ||
| reconnected = true; | ||
| } | ||
| catch (SqlException) | ||
| { | ||
| // Reconnection may fail — acceptable. | ||
| } | ||
|
|
||
| Assert.NotEqual(InitialDatabase, connection.Database); | ||
|
|
||
| if (reconnected) | ||
| { | ||
| Assert.Equal(SwitchedDatabase, connection.Database); | ||
| } | ||
|
|
||
| // Clean up the pool for this connection string so it doesn't leak into other tests. | ||
| SqlConnection.ClearPool(connection); |
There was a problem hiding this comment.
SqlConnection.ClearPool(connection) is intended to prevent leaking pooled connections into other tests, but it only runs at the end of the test. If an assertion throws earlier, the pool won’t be cleared and can contaminate subsequent tests. Wrap the pooled portion in a try/finally so pool cleanup always executes.
| connection.Open(); | |
| using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection)) | |
| { | |
| cmd.ExecuteNonQuery(); | |
| } | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| server.DisconnectAllClients(); | |
| bool reconnected = false; | |
| try | |
| { | |
| using SqlCommand cmd = new("SELECT 1", connection); | |
| cmd.ExecuteNonQuery(); | |
| reconnected = true; | |
| } | |
| catch (SqlException) | |
| { | |
| // Reconnection may fail — acceptable. | |
| } | |
| Assert.NotEqual(InitialDatabase, connection.Database); | |
| if (reconnected) | |
| { | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| } | |
| // Clean up the pool for this connection string so it doesn't leak into other tests. | |
| SqlConnection.ClearPool(connection); | |
| try | |
| { | |
| connection.Open(); | |
| using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection)) | |
| { | |
| cmd.ExecuteNonQuery(); | |
| } | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| server.DisconnectAllClients(); | |
| bool reconnected = false; | |
| try | |
| { | |
| using SqlCommand cmd = new("SELECT 1", connection); | |
| cmd.ExecuteNonQuery(); | |
| reconnected = true; | |
| } | |
| catch (SqlException) | |
| { | |
| // Reconnection may fail — acceptable. | |
| } | |
| Assert.NotEqual(InitialDatabase, connection.Database); | |
| if (reconnected) | |
| { | |
| Assert.Equal(SwitchedDatabase, connection.Database); | |
| } | |
| } | |
| finally | |
| { | |
| // Clean up the pool for this connection string so it doesn't leak into other tests. | |
| SqlConnection.ClearPool(connection); | |
| } |
|
We see the expected 3 unit tests failing here: https://sqlclientdrivers.visualstudio.com/public/_build/results?buildId=145859&view=results The next commit will contain fixes to the codebase, and those tests should pass. |
…s a "use <database>" command if the client and server's notion of the current database differs.
… reproduces the issue.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/Microsoft.Data.SqlClient/ref/Microsoft.Data.SqlClient.csproj:58
- MSBuild property
PowerShellCommandbecomes self-referential here: it’s first set topowershell.exe/pwshand then overwritten with a value that expands$(PowerShellCommand) ..., which can create a circular property expansion at build time. Consider splitting into two properties (e.g.,PowerShellExe+PowerShellArgs/TrimDocsCommand) or use a differently named property for the command line so the executable selection isn’t overwritten.
<PowerShellCommand Condition="'$(OS)' == 'Windows_NT'">powershell.exe</PowerShellCommand>
<PowerShellCommand Condition="'$(OS)' != 'Windows_NT'">pwsh</PowerShellCommand>
<PowerShellCommand>
$(PowerShellCommand)
-NonInteractive
-ExecutionPolicy Unrestricted
-Command "$(RepoRoot)tools\intellisense\TrimDocs.ps1 -inputFile '$(DocumentationFile)' -outputFile '$(DocumentationFile)'"
</PowerShellCommand>
| /// silent database context loss across transparent reconnections. | ||
| /// | ||
| /// The default value of this switch is false. | ||
| /// </summary> | ||
| public static bool VerifyRecoveredDatabaseContext => | ||
| AcquireAndReturn( | ||
| VerifyRecoveredDatabaseContextString, | ||
| defaultValue: false, | ||
| ref s_verifyRecoveredDatabaseContext); |
There was a problem hiding this comment.
The PR description and plans/database_context/00-overview.md state this switch defaults to true, but the implementation sets defaultValue: false (and unit tests assert false). Please align the code + tests + docs/PR description on the intended default, since it changes whether the defensive USE correction runs by default.
| server are on the correct database after recovery, regardless of server behavior. | ||
|
|
||
| The fix is gated behind the `Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContext` | ||
| AppContext switch (default: `true`). Manual tests set it to `false` to confirm the server-only path | ||
| fails without the fix. |
There was a problem hiding this comment.
This doc says VerifyRecoveredDatabaseContext defaults to true, but LocalAppContextSwitches.VerifyRecoveredDatabaseContext currently defaults to false. Please update this document (or the switch default) so the documented behavior matches the implementation.
|
|
||
| Key handling for the database field: | ||
|
|
||
| ```cpp | ||
| // Parse RecoveryDatabase, BYTE len + WCHAR data | ||
| if (0 != cchByteLen) | ||
| { | ||
| pTdsSS->SetRecoverDb( | ||
| reinterpret_cast<const WCHAR *>(pbCurr + sizeof(BYTE)), cchByteLen); | ||
| } | ||
| ``` |
There was a problem hiding this comment.
This document appears to quote and summarize non-public SQL Server engine source files (paths like /sql/ntdbms/...) and includes verbatim C++ snippets. Please remove proprietary/internal source excerpts and replace with high-level descriptions and/or references to publicly available specs (e.g., MS-TDS), to avoid leaking copyrighted/confidential material into this repo.
Fixes #4108
Summary
After switching the database via
USE [db]orSqlConnection.ChangeDatabase(), a transparent reconnection (session recovery) silently revertsconnection.DatabasetoInitialCataloginstead of preserving the switched database context. This causes subsequent queries to run against the wrong database with no error or indication.Root Cause
In
CompleteLogin()(SqlConnectionInternal.cs), after a successful session recovery the client unconditionally trusts the server'sENV_CHANGE(ENV_DATABASE)response. If the server does not properly restore the database context — sending the initial catalog instead, or omitting the databaseENV_CHANGEentirely —CurrentDatabasesilently ends up wrong. The_recoverySessionData(which carries the correct database) is discarded without ever being compared againstCurrentDatabase.Server-Side Analysis
We examined the SQL Server engine's session recovery implementation (
featureext.cpp,login.cpp,session.cpp). The server correctly implements database context recovery:ParseSessionDataChunkparses the recovery database from the client's TDS feature requestFDetermineSessionDbuses a 3-source priority algorithm where the recovery database (Source #0) is mandatory for recovered connections — no silent fallbackFRecoverSessionStateFromTDSrestores other session state (user options, identity, lock timeout, etc.) separately from database/languageThe root cause is entirely client-side:
CompleteLogin()did not verify the server'sENV_CHANGEresponse matched the recovery target.Fix
In
CompleteLogin(), after session recovery is acknowledged and encryption is verified:_recoverySessionData._database(the database at disconnect time), falling back to_recoverySessionData._initialDatabase._recoverySessionData(as before).CurrentDatabase(set by the server'sENV_CHANGE).USE [database]command over the wire to force the server to the correct database.CurrentDatabaseto the recovered database as a final safety net.This guarantees both client and server are on the same database after recovery. When they already match (the normal case with a well-behaved server), no
USEis sent — zero overhead.The fix is gated behind the
Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContextAppContext switch (default:true).Changes
Production code
SqlConnectionInternal.cs—CompleteLogin(): Added database context verification and correctiveUSEcommand after session recoveryLocalAppContextSwitches.cs— AddedVerifyRecoveredDatabaseContextAppContext switchUnit tests (simulated TDS server, no SQL Server required)
DatabaseContextReconnectionTests.cs— 10 tests covering:USE [db]andChangeDatabase()updateconnection.DatabaseENV_CHANGE— fix issues correctiveUSEENV_CHANGEentirely — fix issues correctiveUSEConnectRetryCount=0— error surfaces, no silent context lossDatabaseContextQueryEnginehandlesUSE [db]commands;DisconnectableTdsServerwithRecoveryDatabaseBehaviorenum controls server behavior during recoveryManual tests (require live SQL Server)
DatabaseContextReconnectionTest.cs— 9 integration tests:KILL→ reconnect → verify context preserved (non-pooled, pooled, MARS)Analysis documents
plans/database_context/— 7 documents covering architecture, reconnection flows, 7 identified issues (ranked by severity), recommendations, and SQL Server engine analysisOther
policy/coding-style.md— guidance on#regionusage and line-width.markdownlint.jsoncDisconnectAll()toServerEndPointHandler<T>andDisconnectAllClients()toGenericTdsServer<T>Checklist