The Wayback Machine - https://web.archive.org/web/20201011072331/https://github.com/MagicStack/asyncpg/pull/295
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch executemany #295

Open
wants to merge 1 commit into
base: master
from
Open

Conversation

@fantix
Copy link
Member

@fantix fantix commented May 29, 2018

Refs #289, this is a rewrite of bind_execute_many():

  • Batch bind args into 4 x 32KB buffers, sent with writelines() once flow control turns green.
  • Send one SYNC at the very last - making use of the implicit transaction for the whole call to executemany().
  • Abort immediately on either client error (with explicit rollback) or server error.
  • Support async iterable.
  • Add executemany() to prepared statement.
  • test_server_failure_during_writes is failing on Windows
  • Fix unstable test_timeout (test_execute.TestExecuteMany)

This is a breaking change due to the implicit transaction.

executemany

The 3-level infinite loop keeps filling buffers and send them under flow control, the only exit is raising StopIteration with 3 possible values:

  • True means there was data sent in the last loop
  • False means the last loop sent nothing
  • exception means the data source raised an exception

Please note, during the loop, if server-side error is detected before exhausting the data source, it is also considered as StopIteration(True), ending the process early with a SYNC_MESSAGE.

(I wrote a post about the details of implicit transaction.)

UPDATED - pgbench results of inserting 1000 rows per query with executemany() on Python 3.6 of 2.2GHz 2015 MacBook Air (best out of 5 runs):

asyncpg 0.18.2

710 queries in 30.31 seconds
Latency: min 341.88ms; max 636.29ms; mean 425.022ms; std: 39.782ms (9.36%)
Latency distribution: 25% under 401.67ms; 50% under 414.26ms; 75% under 435.37ms; 90% under 478.39ms; 99% under 576.638ms; 99.99% under 636.299ms
Queries/sec: 23.42
Rows/sec: 23424.32

This PR:

4125 queries in 30.02 seconds
Latency: min 23.14ms; max 734.91ms; mean 72.723ms; std: 49.226ms (67.69%)
Latency distribution: 25% under 59.958ms; 50% under 65.414ms; 75% under 71.538ms; 90% under 80.95ms; 99% under 175.375ms; 99.99% under 734.912ms
Queries/sec: 137.39
Rows/sec: 137389.64

Most branches are covered by tests, overall coverage 85% (1352 miss) -> 85% (1356 miss).

@fantix fantix changed the title [WIP] Batch executemany Batch executemany May 29, 2018
asyncpg/protocol/coreproto.pyx Outdated Show resolved Hide resolved
asyncpg/protocol/coreproto.pyx Outdated Show resolved Hide resolved
fantix added a commit to fantix/asyncpg that referenced this pull request May 30, 2018
asyncpg/protocol/coreproto.pyx Outdated Show resolved Hide resolved
Copy link
Member

@elprans elprans left a comment

Other than the transaction status check, LGTM! Thanks!

asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
@fantix
Copy link
Member Author

@fantix fantix commented Jun 23, 2018

Update - I've met a racing condition issue during fixing tests. Trying to find time to fix that.

@elprans
Copy link
Member

@elprans elprans commented Aug 28, 2018

Hey @fantix! Are you still working on this?

@fantix
Copy link
Member Author

@fantix fantix commented Aug 28, 2018

Hi! Yes I’m trying to find some time to fix the racing issue, sorry for the delay!

@elprans
Copy link
Member

@elprans elprans commented Sep 18, 2018

@fantix, I just merged #361 and #362, which may help improve the situation with the race condition you were seeing.

@fantix
Copy link
Member Author

@fantix fantix commented Oct 25, 2018

Working on this now - rebased and I'll review the changes, and deal with the racing issue.

@fantix fantix force-pushed the fantix:t289_batch_executemany branch 5 times, most recently from f62a347 to 06b412b Nov 26, 2018
@fantix
Copy link
Member Author

@fantix fantix commented Nov 26, 2018

Sorry for the delay! This PR is updated now. Benchmark shows the improvement is consistent. I've also added a diagram to explain the details.

@fantix fantix force-pushed the fantix:t289_batch_executemany branch 7 times, most recently from 46ecc07 to 6c206e9 Nov 28, 2018
@elprans elprans mentioned this pull request Mar 22, 2019
@elprans
Copy link
Member

@elprans elprans commented May 8, 2019

@fantix The test_timeout test is flaky because of pipelining. All commands, including the final Sync can actually go in under 500ms, so the changes are persisted despite the TimeoutError. You might get better luck by passing a generator with time.sleep() in it to executemany().

@elprans
Copy link
Member

@elprans elprans commented Oct 3, 2019

Hey @fantix, I made a fix to the timeout issue here: bbf4d55 Take a look when you have time, squash into your PR, and and let's merge this! I'm planning to cut 0.19.0 this week ideally.

@fantix
Copy link
Member Author

@fantix fantix commented Oct 3, 2019

Got it, thanks a lot!

Now `Bind` and `Execute` pairs are batched into 4 x 32KB buffers to take
advantage of `writelines()`. A single `Sync` is sent at last, so that
all args live in the same transaction.

Closes: #289
@fantix fantix force-pushed the fantix:t289_batch_executemany branch from 6c206e9 to 4be5198 Oct 7, 2019
@Andrei-Pozolotin
Copy link

@Andrei-Pozolotin Andrei-Pozolotin commented Dec 29, 2019

curious: what is missing to make this accepted?

@victoraugustolls
Copy link

@victoraugustolls victoraugustolls commented Jan 8, 2020

@elprans is this okay to be merged? I would be happy to help if necessary, looking forward to it as I have a similar problem as #420 !

@gjeusel
Copy link

@gjeusel gjeusel commented Feb 23, 2020

In awaiting for the PR to be merged, here is a neat trick to get some performances in bulk upsert: credits to Schinckel

It consists in:

  • creating a temporary table
  • copy_records_to_table (binary copy) onto this table
  • insert into the original table with the "ON CONFLICT" statement, selecting from the temporary one
@Syniex
Copy link

@Syniex Syniex commented Aug 2, 2020

Is there a reason it doesn't get merged?

@robd003
Copy link

@robd003 robd003 commented Sep 29, 2020

@elprans any chance this can get merged in after the conflict in asyncpg/protocol/protocol.pyx is fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
8 participants
You can’t perform that action at this time.