I recently read about database cursors and how they are often presented as a magic solution for fetching large datasets. This seems like it could solve my exact problem.

This is a confused idea on a number of levels. It's a common misconception that using a cursor is a choice you make when running a query. In reality, there's always a cursor involved. The only choice you are making is whether you control it explicitly or not. When you query a large set of results, the JDBC library will fetch results in batches. The database will keep an open cursor on your results until you finish reading all the batches or cancel the query.
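
To make that concrete, here's a minimal sketch using plain JDBC. The URL, credentials, table, and column names are hypothetical, and fetch-size behavior is driver-specific (PostgreSQL, for example, only streams when auto-commit is off), so treat this as the shape of the approach rather than a recipe:

    import java.sql.*;

    public class FetchInBatches {
        public static void main(String[] args) throws SQLException {
            // Hypothetical connection details; any JDBC driver on the classpath works.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret")) {
                conn.setAutoCommit(false); // some drivers only stream outside auto-commit
                try (Statement stmt = conn.createStatement()) {
                    stmt.setFetchSize(1000); // hint: pull ~1000 rows per round trip
                    try (ResultSet rs = stmt.executeQuery("SELECT amount FROM orders")) {
                        while (rs.next()) {
                            // rs.next() transparently fetches the next batch when needed;
                            // the database holds its cursor open until we finish or cancel
                            System.out.println(rs.getLong("amount"));
                        }
                    }
                }
            }
        }
    }

Notice that you never see 'the cursor' here; the driver manages it for you.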

More to the point, cursors have nothing to do with your issue and can't solve it. Cursors exist on the database. The 'cursor' object in your Java program is just an abstract representation of that database object; it's not really 'the cursor'.

An OutOfMemoryError (OOME) is a client-side issue. Specifically, it is thrown by the JVM when your program tries to allocate new objects and there is not enough space in the heap even after a full garbage collection. In other words, the OOME is happening because you are trying to store all the results in the heap, and either you don't have enough RAM or your max heap size has been reached.
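
For illustration, this is the shape of code that ends in an OOME (the table and column are hypothetical): every row is buffered on the client before any work happens.

    // Anti-pattern: materialize the whole result set client-side, then process.
    // Heap usage grows with the row count until an allocation finally fails
    // with java.lang.OutOfMemoryError: Java heap space.
    // (Assumes java.sql.* and java.util.* imports and an open Statement.)
    static List<Long> loadAllAmounts(Statement stmt) throws SQLException {
        List<Long> amounts = new ArrayList<>();
        try (ResultSet rs = stmt.executeQuery("SELECT amount FROM orders")) {
            while (rs.next()) {
                amounts.add(rs.getLong("amount")); // one boxed Long per row, held until the end
            }
        }
        return amounts;
    }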

The simplest possible solution is to increase your max heap size. If you have enough available memory on your system, this should resolve it.
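
For a program launched directly with java, that's the -Xmx flag; the 8g here is purely an example value, so size it to your machine:

    # Example only: allow the heap to grow to 8 GiB for this run
    java -Xmx8g -jar my-app.jar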

This may work but it's not really considered a great solution. It uses a lot of system resources (RAM) and it's really slow. Instead, if you can, you should try to process the data as you retrieve it. For example, let's say you were trying to find the sum of some field in a table. Instead of storing all the results and then calculating the sum, you could have a running total. This will be faster and use a tiny fraction of the memory. I've seen a lot of confusion around this approach where people think you need to explicitly use a cursor to do this. That might explain where you are getting this idea from.
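
As a sketch (same hypothetical table as above), the running-total version holds only one row's worth of data at a time:

    // Streaming version of the same sum: O(1) client memory instead of O(rows).
    static long sumAmounts(Statement stmt) throws SQLException {
        long total = 0;
        try (ResultSet rs = stmt.executeQuery("SELECT amount FROM orders")) {
            while (rs.next()) {
                total += rs.getLong("amount"); // fold each row in as it arrives, then drop it
            }
        }
        return total;
    }

For a plain sum you would of course push the aggregation into SQL itself (SELECT SUM(amount) ...); the pattern earns its keep when the per-row work has to happen in your application.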

If you are using some sort of tooling that takes you a step (or more) away from interacting with the DB drivers directly, take a look at the documentation for things like 'paging', 'batching', or 'streaming' results.
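
For example, if you happen to be on JPA 2.2 or later, getResultStream is the streaming counterpart to getResultList. Whether it truly streams from the database or buffers internally depends on your provider, so check its documentation. The entity and field names below are hypothetical:

    // Assumes an open EntityManager em and java.util.stream.Stream imported.
    // The Stream must be closed so the underlying resources are released.
    try (Stream<Long> amounts = em
            .createQuery("SELECT p.amount FROM Purchase p", Long.class)
            .getResultStream()) {
        long total = amounts.mapToLong(Long::longValue).sum();
    }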

If you really need all of these results in memory at one time and you simply don't have the space to store them, make sure you aren't storing unnecessary information or storing it in an inefficient way. For example, if you have a lot of UUIDs in your data and they are in string format, you could convert them to UUID objects. A String UUID in the standard 36-character format should take up around 76 bytes, while a UUID object is more like 16 (plus a little overhead). It might not seem like much, but a 75% reduction adds up if you have a lot of UUIDs. If you have a lot of repeating values, make sure you aren't storing a separate object for each one. This can be as simple as a local HashMap or, if you need to get fancy, you could use the Flyweight pattern.
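
A minimal sketch of both ideas (the interner map is a hand-rolled helper, not a library class):

    import java.util.*;

    // 1) Store the 16-byte UUID value object instead of its 36-char String form.
    UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

    // 2) Poor man's flyweight: keep one canonical instance per distinct value,
    //    so a million repeats of one UUID cost one object rather than a million.
    Map<UUID, UUID> interner = new HashMap<>();
    UUID canonical = interner.computeIfAbsent(id, k -> k);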
