2,854 questions
Advice | 2 votes | 3 replies | 76 views
Check correctness of a web client against multiple independent sources
Context
I am developing a high-stakes application that uses Byzantine State Machine Replication server-side, distributing a copy of the application state to multiple independent servers in such a way ...
0 votes | 0 answers | 80 views
Ray scheduler isn't getting assigned any tasks/actors
I have a Ray cluster, which I start with these commands:
Scheduler:
nohup ray start --head \
--node-ip-address=<HOST_IP> \
--port=6379 \
--resources='{"is_worker": 1}'
--...
3 votes | 1 answer | 89 views
Distributed computing with Ray/Python has no parallelism at all on GCP
I am trying to set up a distributed environment for NLP processing. I use Ray and Python on GCP. I have a master and several workers. What happens is that whether I run 1 worker or 8 workers, it takes ...
1 vote | 0 answers | 37 views
ChunkInTransaction error when using mr function to write to partitioned table with different partition scheme
I have a minute-level K-line table with a large amount of data, partitioned by date and stock code using value partitioning. Now I need to calculate a daily factor: computing the return skewness for ...
Advice | 0 votes | 2 replies | 90 views
How to make devices discover each other using Wi-Fi
What's the best way to allow programs to discover each other on the network?
Let's say we are writing a system that tracks the usage of computers over the network.
We have an agent program that sends ...
Advice | 2 votes | 2 replies | 91 views
Efficient MPI Parallelization Strategies for Localized PDE Subproblems within a Globally Decomposed Domain
I am working on a global PDE problem that is solved using a standard domain-decomposition strategy (e.g., Scotch, METIS). This part of the computation is well balanced across all MPI processes.
...
0 votes | 1 answer | 48 views
Upsert! Operation Throws "A table can't contain duplicate column names" Error
I have a base table A and a result table B in DolphinDB. Table B was initially empty and is used to store calculated results based on table A. When trying to insert the calculated results into table B,...
0 votes | 0 answers | 237 views
vLLM + Ray multi-node tensor-parallel deployment completely blocked by pending placement groups and raylet heartbeat failures
Environment:
Ray version: 2.x
vLLM version: 0.9.2
Python version: 3.9
OS / Container base: Linux (CentOS-based UBI8 in Kubernetes)
Cloud / Infrastructure: AWS based Kubernetes cluster (pods scheduled ...
3 votes | 1 answer | 152 views
In Apache Ignite, Replicated mode and Partitioned mode do not work together
I’m working with Apache Ignite 2.17.0. I load database tables into Ignite caches and run SQL queries using the SqlFieldsQuery API.
Recently, I modified the cache configuration for some tables to use ...
0 votes | 0 answers | 67 views
How can two different nodes access and distribute the same SQL table in Apache Spark?
I have the following code to test. I created a table on worker 1. Then I tried to read the table on worker 2 and got TABLE_OR_VIEW_NOT_FOUND. Worker 2 is on the same computer as the master.
I ran the ...
3 votes | 2 answers | 357 views
How Ray async actors handle calls to sync methods
I'm working with Ray async actors and I want to understand exactly what happens, at a deep technical level, when a synchronous method is called on such an actor.
I know that calling a synchronous method ...
0 votes | 0 answers | 211 views
How to set up MS-MPI multi-machine communication between two Windows 11 systems?
I'm trying to set up a multi-machine communication environment using MS-MPI on two Windows 11 laptops, but I'm encountering some issues. Here are the details of my setup:
Environment Details:
...
1 vote | 1 answer | 193 views
Distributed REST API calls using Spark while maintaining consistency
I have a Spark DataFrame created from a Delta table, with one column of type STRUCT(JSON). For each row in this DataFrame, I need to make a REST API call using the JSON payload in the column. ...
0 votes | 1 answer | 1k views
Clearing Cached Data on Databricks Cluster
The problem I am facing is that my "used" memory is only around 16 GB; however, the cached memory takes up so much space that I am forced to use a compute with higher memory (64 GB).
So I ...
1 vote | 0 answers | 111 views
Segmentation Fault During Validation with MirroredStrategy on Multiple GPUs
I am training a model using TensorFlow 2.18.0 with the tf.distribute.MirroredStrategy across two GPUs. The training works fine on a single GPU, but when I try to run it on two GPUs, it ends with a ...