
Fix database context loss on transparent reconnection#4130

Draft
paulmedynski wants to merge 4 commits into main from dev/paul/issue-4108


@paulmedynski paulmedynski commented Apr 2, 2026

Fixes #4108

Summary

After switching the database via USE [db] or SqlConnection.ChangeDatabase(), a transparent reconnection (session recovery) silently reverts connection.Database to InitialCatalog instead of preserving the switched database context. This causes subsequent queries to run against the wrong database with no error or indication.

Root Cause

In CompleteLogin() (SqlConnectionInternal.cs), after a successful session recovery the client unconditionally trusts the server's ENV_CHANGE(ENV_DATABASE) response. If the server does not properly restore the database context — sending the initial catalog instead, or omitting the database ENV_CHANGE entirely — CurrentDatabase silently ends up wrong. The _recoverySessionData (which carries the correct database) is discarded without ever being compared against CurrentDatabase.

Server-Side Analysis

We examined the SQL Server engine's session recovery implementation (featureext.cpp, login.cpp, session.cpp). The server correctly implements database context recovery:

  • ParseSessionDataChunk parses the recovery database from the client's TDS feature request
  • FDetermineSessionDb uses a 3-source priority algorithm where the recovery database (Source #0) is mandatory for recovered connections — no silent fallback
  • FRecoverSessionStateFromTDS restores other session state (user options, identity, lock timeout, etc.) separately from database/language

The root cause is entirely client-side: CompleteLogin() did not verify the server's ENV_CHANGE response matched the recovery target.

Fix

In CompleteLogin(), after session recovery is acknowledged and encryption is verified:

  1. Read the expected database from _recoverySessionData._database (the database at disconnect time), falling back to _recoverySessionData._initialDatabase.
  2. Null _recoverySessionData (as before).
  3. Compare the expected database against CurrentDatabase (set by the server's ENV_CHANGE).
  4. If they differ, issue a USE [database] command over the wire to force the server to the correct database.
  5. Set CurrentDatabase to the recovered database as a final safety net.

This guarantees both client and server are on the same database after recovery. When they already match (the normal case with a well-behaved server), no USE is sent — zero overhead.
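
The five steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the driver's actual code: `RecoverySnapshot`, `RecoveredDatabaseVerifier`, and the `sendBatch` callback are hypothetical stand-ins for `_recoverySessionData` and the path that sends a batch over the wire. The ordinal comparison and the doubling of `]` when quoting the name are assumptions.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for _recoverySessionData; only the two fields
// named in the description are modelled here.
public sealed record RecoverySnapshot(string? Database, string? InitialDatabase);

public static class RecoveredDatabaseVerifier
{
    // Returns the database the client should report after recovery.
    // serverReported: the value the server's ENV_CHANGE left behind
    //                 (null if the server omitted the token).
    // sendBatch:      hypothetical callback that sends a T-SQL batch.
    public static string? VerifyAndCorrect(
        RecoverySnapshot snapshot,
        string? serverReported,
        Action<string> sendBatch)
    {
        // Step 1: prefer the database at disconnect time, else the initial one.
        string? expected = string.IsNullOrEmpty(snapshot.Database)
            ? snapshot.InitialDatabase
            : snapshot.Database;

        if (string.IsNullOrEmpty(expected))
        {
            // Nothing to verify against; keep what the server reported.
            return serverReported;
        }

        // Steps 3 and 4: send a corrective USE only when the two disagree.
        // Assumption: ordinal comparison. A mismatch that differs only by
        // case would cost one extra USE, which is harmless.
        if (!string.Equals(expected, serverReported, StringComparison.Ordinal))
        {
            // Assumption: quote as a bracketed identifier, doubling ']'.
            sendBatch($"USE [{expected.Replace("]", "]]")}]");
        }

        // Step 5: the client-side value always ends as the expected database.
        return expected;
    }
}
```

With a well-behaved server, `serverReported` already equals the expected database, so `sendBatch` is never invoked; that corresponds to the zero-overhead case described above.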

The fix is gated behind the Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContext AppContext switch (default: true).
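
As an illustration of how an application might control this switch, the sketch below uses the standard `System.AppContext` API. The switch name is taken from the description above; the `RecoveredDatabaseSwitch` helper class is hypothetical. Setting the value explicitly avoids depending on whichever default ships.

```csharp
using System;

// Hypothetical helper; only the switch name comes from the PR description.
public static class RecoveredDatabaseSwitch
{
    public const string Name =
        "Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContext";

    // Sets the switch explicitly for the current process.
    public static void Set(bool enabled) => AppContext.SetSwitch(Name, enabled);

    // Returns the explicitly configured value, or null if the switch is unset.
    public static bool? TryRead() =>
        AppContext.TryGetSwitch(Name, out bool value) ? (bool?)value : null;
}
```

A caller would typically invoke `RecoveredDatabaseSwitch.Set(true)` once at startup, before the first connection is opened, on the assumption that the driver reads the switch once and caches the result.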

Changes

Production code

  • SqlConnectionInternal.cs, CompleteLogin(): Added database context verification and corrective USE command after session recovery
  • LocalAppContextSwitches.cs — Added VerifyRecoveredDatabaseContext AppContext switch

Unit tests (simulated TDS server, no SQL Server required)

  • DatabaseContextReconnectionTests.cs — 10 tests covering:
    • Baseline (2 tests): USE [db] and ChangeDatabase() update connection.Database
    • Proper recovery (3 tests): Server correctly restores DB — client reflects it (non-pooled via USE, non-pooled via ChangeDatabase, pooled)
    • Buggy recovery — wrong DB (2 tests): Server sends wrong DB in ENV_CHANGE — fix issues corrective USE
    • Buggy recovery — omitted ENV_CHANGE (2 tests): Server omits DB ENV_CHANGE entirely — fix issues corrective USE
    • No retry (1 test): ConnectRetryCount=0 — error surfaces, no silent context loss
  • Test infrastructure: DatabaseContextQueryEngine handles USE [db] commands; DisconnectableTdsServer with RecoveryDatabaseBehavior enum controls server behavior during recovery
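
To make the three recovery scenarios in the test list concrete, here is a rough sketch of how an enum like `RecoveryDatabaseBehavior` could drive a simulated server's response. The member names and the `RecoveryResponsePlanner` helper are hypothetical; the description only states that the enum controls server behavior during recovery.

```csharp
#nullable enable
using System;

// Hypothetical member names, one per scenario in the test list above.
public enum RecoveryDatabaseBehavior
{
    RestoreRecoveredDatabase, // well-behaved server
    SendInitialCatalog,       // wrong database in ENV_CHANGE
    OmitEnvChange             // no database ENV_CHANGE at all
}

public static class RecoveryResponsePlanner
{
    // Returns the database name to place in the ENV_CHANGE token,
    // or null when no database token should be sent.
    public static string? DatabaseToAnnounce(
        RecoveryDatabaseBehavior behavior,
        string recoveryDatabase,
        string initialCatalog) =>
        behavior switch
        {
            RecoveryDatabaseBehavior.RestoreRecoveredDatabase => recoveryDatabase,
            RecoveryDatabaseBehavior.SendInitialCatalog => initialCatalog,
            RecoveryDatabaseBehavior.OmitEnvChange => null,
            _ => throw new ArgumentOutOfRangeException(nameof(behavior))
        };
}
```

In the two buggy modes the client-side check is the only thing that can put the connection back on the switched database, which is what the corresponding tests exercise.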

Manual tests (require live SQL Server)

  • DatabaseContextReconnectionTest.cs — 9 integration tests:
    • USE and ChangeDatabase via KILL → reconnect → verify context preserved (non-pooled, pooled, MARS)
    • Stress loops (100 iterations) for USE and ChangeDatabase
    • CREATE TABLE after reconnect — DDL lands in correct database (strongest proof)
    • Stress CREATE TABLE loop (50 iterations) with variable workload and session state pollution
    • Multiple database switches before kill — last switch wins
    • Double kill — recovery data re-snapshotted correctly on second reconnection
    • Async code paths (ExecuteNonQueryAsync, ExecuteScalarAsync)

Analysis documents

  • plans/database_context/ — 7 documents covering architecture, reconnection flows, 7 identified issues (ranked by severity), recommendations, and SQL Server engine analysis

Other

  • Updated policy/coding-style.md — guidance on #region usage and line-width
  • Added .markdownlint.jsonc
  • Added DisconnectAll() to ServerEndPointHandler<T> and DisconnectAllClients() to GenericTdsServer<T>

Checklist

  • Tests added or updated
  • Public API changes documented (N/A — no public API changes)
  • Verified against customer repro
  • Ensure no breaking changes introduced
Adds unit tests that reproduce the scenario described in issue #4108:
after switching the database via USE [db] or ChangeDatabase(), a
transparent reconnection silently reverts connection.Database to
InitialCatalog instead of preserving the switched database.

Tests use a custom DatabaseContextQueryEngine (handles USE [db] with
proper EnvChange tokens) and a DisconnectableTdsServer that can sever
TCP connections while keeping the listener up for reconnection.

Three failing tests demonstrate the bug:
- UseDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect
- ChangeDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect
- UseDatabase_ConnectionDropped_Pooled_DatabaseContextPreservedAfterReconnect

Three passing tests verify baselines:
- UseDatabaseCommand_UpdatesConnectionDatabaseProperty
- ChangeDatabase_UpdatesConnectionDatabaseProperty
- UseDatabase_ConnectionDropped_NoRetry_ThrowsOnNextCommand

Also adds:
- DisconnectAllClients() to GenericTdsServer<T> / DisconnectAll() to
  ServerEndPointHandler<T> for test infrastructure
- Analysis documents under plans/database_context/
- Coding style updates (regions, line width guidance)
@paulmedynski paulmedynski added this to the 7.1.0-preview1 milestone Apr 2, 2026
Copilot AI review requested due to automatic review settings April 2, 2026 11:43
@github-project-automation github-project-automation bot moved this to To triage in SqlClient Board Apr 2, 2026

Copilot AI left a comment

Pull request overview

Adds simulated-server coverage and supporting test infrastructure to reproduce #4108 (database context reverting after transparent reconnection), along with internal analysis docs and small repo policy/tooling updates.

Changes:

  • Added new simulated TDS server unit tests covering USE [db] / ChangeDatabase() and post-disconnect reconnection behavior.
  • Extended the TDS test server utilities to forcibly disconnect active clients while keeping the listener running.
  • Added analysis documents under plans/database_context/, updated coding-style guidance, and introduced a markdownlint configuration.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/DatabaseContextReconnectionTests.cs | New unit tests and a custom query engine to simulate USE [db] and validate database context across reconnects. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/GenericTdsServer.cs | Adds DisconnectAllClients() helper to drop all active client connections. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.EndPoint/TDSServerEndPoint.cs | Adds DisconnectAll() implementation to dispose all active endpoint connections without stopping the listener. |
| policy/coding-style.md | Updates style guidance (line wrapping and #region usage). |
| plans/database_context/00-overview.md | Overview of the database-context reconnection investigation. |
| plans/database_context/01-architecture.md | Architecture notes on session/database tracking and recovery. |
| plans/database_context/02-flows.md | Enumerates reconnection flows and whether DB context is preserved. |
| plans/database_context/03-issues.md | Lists identified issues/gaps and severity. |
| plans/database_context/04-recommendations.md | Proposed fixes and test recommendations. |
| plans/database_context/05-reconnection-and-retry-mechanisms.md | Catalogues retry/reconnect mechanisms and DB-context implications. |
| .markdownlint.jsonc | Adds markdownlint config aligned to the repo’s line-length policy. |
Comment on lines +18 to +23
/// Tests for database context preservation across reconnections.
/// Reproduces the scenario from dotnet/SqlClient#4108: after executing USE [db] and then
/// losing the connection, the reconnected session should retain the switched database.
/// Baseline tests (no reconnection) pass. The three reconnection tests are expected to
/// FAIL until issue #4108 is fixed — they demonstrate that <c>connection.Database</c>
/// silently reverts to <c>InitialCatalog</c> after the physical connection is replaced.

Copilot AI Apr 2, 2026

The PR description/title indicate this PR fixes #4108, but this test file’s header comment states the reconnection tests are “expected to FAIL until issue #4108 is fixed”. Either include the product fix in this PR, or adjust the PR title/description and/or skip/quarantine the failing tests so CI stays green.

Comment on lines +200 to +206
[Fact]
public void UseDatabase_ConnectionDropped_DatabaseContextPreservedAfterReconnect()
{
    using DisconnectableTdsServer server = new();
    SqlConnectionStringBuilder builder = CreateConnectionStringBuilder(server.Port);

    using SqlConnection connection = new(builder.ConnectionString);

Copilot AI Apr 2, 2026

These reconnection tests are currently regular [Fact] tests, but the file comments indicate they are expected to fail until the product bug is fixed. Please mark them as skipped/quarantined (e.g., [Fact(Skip=...)], or Trait(Category,"failing")/"flaky" per repo conventions) or land them together with the driver fix; otherwise this PR will break CI.

Comment on lines +238 to +245
// Issue #4108 core assertion: regardless of whether reconnection succeeded or
// failed, the Database property must not have silently reverted to the initial catalog.
Assert.NotEqual(InitialDatabase, connection.Database);

if (reconnected)
{
    Assert.Equal(SwitchedDatabase, connection.Database);
}

Copilot AI Apr 2, 2026

This test allows the post-disconnect command to throw, but still asserts connection.Database != InitialDatabase even when ExecuteNonQuery() failed. If the connection transitions to Closed/Broken after the exception, SqlConnection.Database is allowed to reflect the connection string’s Initial Catalog, making this assertion unreliable and potentially failing for the wrong reason. Consider asserting the database only when reconnection succeeds; otherwise assert the exception/state explicitly.

Comment on lines +315 to +320
Assert.NotEqual(InitialDatabase, connection.Database);

if (reconnected)
{
    Assert.Equal(SwitchedDatabase, connection.Database);
}

Copilot AI Apr 2, 2026

Same issue as the USE reconnection test: the test treats reconnection failure as acceptable, but then asserts connection.Database != InitialDatabase unconditionally. That can produce false failures if the connection is no longer open after the exception. Gate the database assertion on reconnected == true, and otherwise assert the expected failure mode (exception/connection state).

Comment on lines +366 to +371
Assert.NotEqual(InitialDatabase, connection.Database);

if (reconnected)
{
    Assert.Equal(SwitchedDatabase, connection.Database);
}

Copilot AI Apr 2, 2026

Same unconditional post-exception assertion issue here (Assert.NotEqual(InitialDatabase, connection.Database)). If the reconnect attempt fails and the connection ends up Closed/Broken, Database may legitimately report the Initial Catalog, causing a false negative. Only assert the database context when reconnection succeeds; otherwise assert the expected exception/state.

Suggested change
- Assert.NotEqual(InitialDatabase, connection.Database);
- if (reconnected)
- {
-     Assert.Equal(SwitchedDatabase, connection.Database);
- }
+ if (reconnected)
+ {
+     // On successful reconnection, the database context should remain switched.
+     Assert.NotEqual(InitialDatabase, connection.Database);
+     Assert.Equal(SwitchedDatabase, connection.Database);
+ }
+ else
+ {
+     // If reconnection failed, the connection should not be open; database context is undefined.
+     Assert.NotEqual(ConnectionState.Open, connection.State);
+ }
Comment on lines +343 to +374
connection.Open();

using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection))
{
    cmd.ExecuteNonQuery();
}

Assert.Equal(SwitchedDatabase, connection.Database);

server.DisconnectAllClients();

bool reconnected = false;
try
{
    using SqlCommand cmd = new("SELECT 1", connection);
    cmd.ExecuteNonQuery();
    reconnected = true;
}
catch (SqlException)
{
    // Reconnection may fail — acceptable.
}

Assert.NotEqual(InitialDatabase, connection.Database);

if (reconnected)
{
    Assert.Equal(SwitchedDatabase, connection.Database);
}

// Clean up the pool for this connection string so it doesn't leak into other tests.
SqlConnection.ClearPool(connection);

Copilot AI Apr 2, 2026

SqlConnection.ClearPool(connection) is intended to prevent leaking pooled connections into other tests, but it only runs at the end of the test. If an assertion throws earlier, the pool won’t be cleared and can contaminate subsequent tests. Wrap the pooled portion in a try/finally so pool cleanup always executes.

Suggested change
- connection.Open();
- using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection))
- {
-     cmd.ExecuteNonQuery();
- }
- Assert.Equal(SwitchedDatabase, connection.Database);
- server.DisconnectAllClients();
- bool reconnected = false;
- try
- {
-     using SqlCommand cmd = new("SELECT 1", connection);
-     cmd.ExecuteNonQuery();
-     reconnected = true;
- }
- catch (SqlException)
- {
-     // Reconnection may fail — acceptable.
- }
- Assert.NotEqual(InitialDatabase, connection.Database);
- if (reconnected)
- {
-     Assert.Equal(SwitchedDatabase, connection.Database);
- }
- // Clean up the pool for this connection string so it doesn't leak into other tests.
- SqlConnection.ClearPool(connection);
+ try
+ {
+     connection.Open();
+     using (SqlCommand cmd = new($"USE [{SwitchedDatabase}]", connection))
+     {
+         cmd.ExecuteNonQuery();
+     }
+     Assert.Equal(SwitchedDatabase, connection.Database);
+     server.DisconnectAllClients();
+     bool reconnected = false;
+     try
+     {
+         using SqlCommand cmd = new("SELECT 1", connection);
+         cmd.ExecuteNonQuery();
+         reconnected = true;
+     }
+     catch (SqlException)
+     {
+         // Reconnection may fail — acceptable.
+     }
+     Assert.NotEqual(InitialDatabase, connection.Database);
+     if (reconnected)
+     {
+         Assert.Equal(SwitchedDatabase, connection.Database);
+     }
+ }
+ finally
+ {
+     // Clean up the pool for this connection string so it doesn't leak into other tests.
+     SqlConnection.ClearPool(connection);
+ }
@paulmedynski paulmedynski moved this from To triage to In progress in SqlClient Board Apr 2, 2026
@paulmedynski paulmedynski added the Hotfix 7.0.1 label (When this PR merges, automatically open a PR to cherry-pick to the 7.0.1 branch) Apr 2, 2026
@paulmedynski

We see the expected 3 unit tests failing here:

https://sqlclientdrivers.visualstudio.com/public/_build/results?buildId=145859&view=results

The next commit will contain fixes to the codebase, and those tests should pass.

mdaigle and others added 2 commits April 2, 2026 15:34
…s a "use <database>" command if the client and server's notion of the current database differs.
@paulmedynski paulmedynski removed the Hotfix 7.0.1 label (When this PR merges, automatically open a PR to cherry-pick to the 7.0.1 branch) Apr 2, 2026
Copilot AI review requested due to automatic review settings April 2, 2026 21:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/Microsoft.Data.SqlClient/ref/Microsoft.Data.SqlClient.csproj:58

  • MSBuild property PowerShellCommand becomes self-referential here: it’s first set to powershell.exe/pwsh and then overwritten with a value that expands $(PowerShellCommand) ..., which can create a circular property expansion at build time. Consider splitting into two properties (e.g., PowerShellExe + PowerShellArgs/TrimDocsCommand) or use a differently named property for the command line so the executable selection isn’t overwritten.
      <PowerShellCommand Condition="'$(OS)' == 'Windows_NT'">powershell.exe</PowerShellCommand>
      <PowerShellCommand Condition="'$(OS)' != 'Windows_NT'">pwsh</PowerShellCommand>
      <PowerShellCommand>
        $(PowerShellCommand)
          -NonInteractive
          -ExecutionPolicy Unrestricted
          -Command "$(RepoRoot)tools\intellisense\TrimDocs.ps1 -inputFile '$(DocumentationFile)' -outputFile '$(DocumentationFile)'"
      </PowerShellCommand>
Comment on lines +648 to +656
/// silent database context loss across transparent reconnections.
///
/// The default value of this switch is false.
/// </summary>
public static bool VerifyRecoveredDatabaseContext =>
    AcquireAndReturn(
        VerifyRecoveredDatabaseContextString,
        defaultValue: false,
        ref s_verifyRecoveredDatabaseContext);

Copilot AI Apr 2, 2026

The PR description and plans/database_context/00-overview.md state this switch defaults to true, but the implementation sets defaultValue: false (and unit tests assert false). Please align the code + tests + docs/PR description on the intended default, since it changes whether the defensive USE correction runs by default.

Comment on lines +39 to +43
server are on the correct database after recovery, regardless of server behavior.

The fix is gated behind the `Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContext`
AppContext switch (default: `true`). Manual tests set it to `false` to confirm the server-only path
fails without the fix.

Copilot AI Apr 2, 2026

This doc says VerifyRecoveredDatabaseContext defaults to true, but LocalAppContextSwitches.VerifyRecoveredDatabaseContext currently defaults to false. Please update this document (or the switch default) so the documented behavior matches the implementation.

Comment on lines +45 to +55

Key handling for the database field:

```cpp
// Parse RecoveryDatabase, BYTE len + WCHAR data
if (0 != cchByteLen)
{
    pTdsSS->SetRecoverDb(
        reinterpret_cast<const WCHAR *>(pbCurr + sizeof(BYTE)), cchByteLen);
}
```

Copilot AI Apr 2, 2026

This document appears to quote and summarize non-public SQL Server engine source files (paths like /sql/ntdbms/...) and includes verbatim C++ snippets. Please remove proprietary/internal source excerpts and replace with high-level descriptions and/or references to publicly available specs (e.g., MS-TDS), to avoid leaking copyrighted/confidential material into this repo.
