Newest 'distributed-computing' Questions

Advice

2 votes

3 replies

76 views

Check correctness of a web client against multiple independent sources

Context: I am developing a high-stakes application that uses Byzantine State Machine Replication server-side, distributing a copy of the application state to multiple independent servers in such a way ...

Matteo Monti

9,150

asked May 9 at 23:09

0 votes

0 answers

80 views

Ray scheduler isn't getting assigned any tasks/actors

I have a Ray cluster, I run it through these commands: Scheduler: nohup ray start --head \ --node-ip-address=<HOST_IP> \ --port=6379 \ --resources='{"is_worker": 1}' --...

cuneyttyler

1,424

asked Mar 2 at 16:39

3 votes

1 answer

89 views

Distributed computing with Ray/Python has no parallelism at all on GCP

I am trying to set up a distributed environment for NLP processing. I use Ray and Python on GCP. I have a master and several workers. What happens is that when I run 1 worker or 8 workers, it takes ...

cuneyttyler

1,424

asked Mar 1 at 0:43

1 vote

0 answers

37 views

ChunkInTransaction error when using mr function to write to partitioned table with different partition scheme

I have a minute-level K-line table with a large amount of data, partitioned by date and stock code using value partitioning. Now I need to calculate a daily factor - computing the return skewness for ...

Jane

59

asked Feb 6 at 1:55

Advice

0 votes

2 replies

90 views

How to make devices discover each other using WIFI

What's the best way to allow programs to discover each other on the network? Let's say we are writing a system that tracks the usage of computers over the network. We have an agent program that sends ...

Isembart

13

asked Dec 10, 2025 at 20:02

Advice

2 votes

2 replies

91 views

Efficient MPI Parallelization Strategies for Localized PDE Subproblems within a Globally Decomposed Domain

I am working on a global PDE problem that is solved using a standard domain-decomposition strategy (e.g., Scotch, METIS). This part of the computation is well balanced across all MPI processes. ...

hrx71

1

asked Dec 6, 2025 at 12:46

0 votes

1 answer

48 views

Upsert! Operation Throws "A table can't contain duplicate column names" Error

I have a base table A and a result table B in DolphinDB. Table B was initially empty and is used to store calculated results based on table A. When trying to insert the calculated results into table B,...

RORO

1

asked Oct 24, 2025 at 9:52

0 votes

0 answers

237 views

vLLM + Ray multi-node tensor-parallel deployment completely blocked by pending placement groups and raylet heartbeat failures

Environment: Ray version: 2.x; vLLM version: 0.9.2; Python version: 3.9; OS / Container base: Linux (CentOS-based UBI8 in Kubernetes); Cloud / Infrastructure: AWS-based Kubernetes cluster (pods scheduled ...

NullUser

9

asked Aug 5, 2025 at 17:38

3 votes

1 answer

152 views

In Apache Ignite, Replicated mode and Partitioned mode do not work together

I’m working with Apache Ignite 2.17.0. I load database tables into Ignite caches and run SQL queries using the SQLFieldsQuery API. Recently, I modified the cache configuration for some tables to use ...

kushal Baldev

809

asked Jul 29, 2025 at 17:31

0 votes

0 answers

67 views

Get two different nodes to access and distribute the same SQL table in Apache Spark?

I have the following code to test. I created a table on worker 1. Then I tried to read the table on worker 2 and it got TABLE_OR_VIEW_NOT_FOUND. Worker 2 is on the same computer as Master. I ran the ...

Rick C. Ferreira

1

asked Jun 16, 2025 at 19:25

3 votes

2 answers

357 views

How Ray async actors handle calls to sync methods

I'm working with Ray async actors and I want to understand exactly what happens—at a deep technical level—when a synchronous method is called on such an actor. I know that calling a synchronous method ...

hegash

893

asked May 26, 2025 at 11:00

0 votes

0 answers

211 views

How to set up MS-MPI multi-machine communication between two Windows 11 systems?

I'm trying to set up a multi-machine communication environment using MS-MPI on two Windows 11 laptops, but I'm encountering some issues. Here are the details of my setup: Environment Details: ...

user29094781

1

asked Apr 5, 2025 at 6:29

1 vote

1 answer

193 views

Distributed REST API Calls using SPARK with maintaining consistency

I have a Spark DataFrame created from a Delta table, with one column of type STRUCT(JSON). For each row in this DataFrame, I need to make a REST API call using the JSON payload in the column. ...

uds0128

53

asked Mar 2, 2025 at 18:42

0 votes

1 answer

1k views

Clearing Cached Data on Databricks Cluster

The problem I am facing is that my "used" memory is only around 16GB, however the cached memory takes up so much space, that I am forced to use a compute with higher memory (64GB). So I ...

Manav Karthikeyan

53

asked Jan 17, 2025 at 14:31

1 vote

0 answers

111 views

Segmentation Fault During Validation with MirroredStrategy on Multiple GPUs

I am training a model using TensorFlow 2.18.0 with the tf.distribute.MirroredStrategy across two GPUs. The training works fine on a single GPU, but when I try to run it on two GPUs, it ends with a ...

TGD

56

asked Jan 13, 2025 at 7:42
