The Wayback Machine - https://web.archive.org/web/20230407122150/https://github.com/apache/spark
### What changes were proposed in this pull request?
This PR aims to upgrade rocksdbjni from 7.10.2 to 8.0.0.

### Why are the changes needed?
This version brings bug fixes for `Get` and `MultiGet`; the full release notes are as follows:

- https://github.com/facebook/rocksdb/releases/tag/v8.0.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manual test `RocksDBBenchmark`:

**7.10.2**

```
[INFO] Running org.apache.spark.util.kvstore.RocksDBBenchmark
                                                count   mean    min     max     95th
dbClose                                         4       0.362   0.307   0.510   0.510
dbCreation                                      4       70.556  3.823   272.036 272.036
naturalIndexCreateIterator                      1024    0.005   0.002   1.396   0.007
naturalIndexDescendingCreateIterator            1024    0.007   0.007   0.063   0.009
naturalIndexDescendingIteration                 1024    0.006   0.004   0.236   0.009
naturalIndexIteration                           1024    0.006   0.004   0.054   0.010
randomDeleteIndexed                             1024    0.028   0.019   0.246   0.038
randomDeletesNoIndex                            1024    0.014   0.012   0.041   0.018
randomUpdatesIndexed                            1024    0.084   0.033   30.028  0.095
randomUpdatesNoIndex                            1024    0.033   0.029   0.759   0.037
randomWritesIndexed                             1024    0.120   0.034   54.254  0.124
randomWritesNoIndex                             1024    0.038   0.032   1.918   0.043
refIndexCreateIterator                          1024    0.004   0.004   0.017   0.006
refIndexDescendingCreateIterator                1024    0.003   0.003   0.027   0.004
refIndexDescendingIteration                     1024    0.007   0.005   0.114   0.009
refIndexIteration                               1024    0.007   0.005   0.045   0.010
sequentialDeleteIndexed                         1024    0.024   0.018   1.944   0.028
sequentialDeleteNoIndex                         1024    0.015   0.012   0.039   0.019
sequentialUpdatesIndexed                        1024    0.044   0.036   0.910   0.057
sequentialUpdatesNoIndex                        1024    0.037   0.032   0.868   0.046
sequentialWritesIndexed                         1024    0.047   0.040   2.261   0.056
sequentialWritesNoIndex                         1024    0.041   0.033   3.577   0.045
```

**8.0.0**

```
[INFO] Running org.apache.spark.util.kvstore.RocksDBBenchmark
                                                count   mean    min     max     95th
dbClose                                         4       0.320   0.233   0.562   0.562
dbCreation                                      4       71.171  3.778   272.587 272.587
naturalIndexCreateIterator                      1024    0.006   0.002   1.460   0.009
naturalIndexDescendingCreateIterator            1024    0.007   0.006   0.063   0.008
naturalIndexDescendingIteration                 1024    0.008   0.004   0.377   0.013
naturalIndexIteration                           1024    0.006   0.004   0.060   0.010
randomDeleteIndexed                             1024    0.030   0.020   0.338   0.052
randomDeletesNoIndex                            1024    0.016   0.013   0.050   0.020
randomUpdatesIndexed                            1024    0.087   0.032   29.873  0.096
randomUpdatesNoIndex                            1024    0.036   0.032   0.592   0.041
randomWritesIndexed                             1024    0.121   0.033   54.702  0.123
randomWritesNoIndex                             1024    0.040   0.034   1.530   0.047
refIndexCreateIterator                          1024    0.005   0.003   0.023   0.007
refIndexDescendingCreateIterator                1024    0.003   0.003   0.026   0.005
refIndexDescendingIteration                     1024    0.007   0.005   0.051   0.009
refIndexIteration                               1024    0.007   0.005   0.052   0.010
sequentialDeleteIndexed                         1024    0.021   0.017   0.133   0.025
sequentialDeleteNoIndex                         1024    0.015   0.012   0.041   0.018
sequentialUpdatesIndexed                        1024    0.046   0.036   2.035   0.055
sequentialUpdatesNoIndex                        1024    0.040   0.028   0.798   0.050
sequentialWritesIndexed                         1024    0.049   0.042   2.578   0.055
sequentialWritesNoIndex                         1024    0.035   0.029   3.229   0.039
```
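The two benchmark tables above can be compared mechanically rather than by eye. A minimal sketch (the operation names and means are copied from the tables; the 10% regression threshold is an assumption, not a project policy):

```python
# Compare mean timings (ms) between rocksdbjni 7.10.2 and 8.0.0,
# flagging operations whose mean regressed by more than a chosen threshold.

BASELINE = {  # 7.10.2 means, copied from the first table
    "randomUpdatesIndexed": 0.084,
    "randomWritesIndexed": 0.120,
    "sequentialWritesNoIndex": 0.041,
}
CANDIDATE = {  # 8.0.0 means, copied from the second table
    "randomUpdatesIndexed": 0.087,
    "randomWritesIndexed": 0.121,
    "sequentialWritesNoIndex": 0.035,
}

def regressions(baseline, candidate, threshold=0.10):
    """Return ops whose mean grew by more than `threshold` (fractional)."""
    out = {}
    for op, base in baseline.items():
        delta = (candidate[op] - base) / base
        if delta > threshold:
            out[op] = round(delta, 3)
    return out

print(regressions(BASELINE, CANDIDATE))
```

With the sampled rows above, no operation regresses by more than 10%, which matches the PR's conclusion that the two versions perform comparably.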

- Checked core module UTs with the RocksDB live UI:

```
export LIVE_UI_LOCAL_STORE_DIR=/${basedir}/spark-ui
build/mvn clean install -pl core -am -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -fn
```

All tests passed.

Closes #40639 from LuciferYang/SPARK-43007.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Apache Spark

Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

https://spark.apache.org/


Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

Building Spark

Spark is built using Apache Maven. To build Spark and its example programs, run:

./build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)

More detailed documentation is available from the project site, at "Building Spark".

For general development tips, including info on developing Spark using an IDE, see "Useful Developer Tools".

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

./bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

./bin/pyspark

And run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()

Example Programs

Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:

./bin/run-example SparkPi

will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit them to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:

MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.

Running Tests

Testing first requires building Spark. Once Spark is built, tests can be run using:

./dev/run-tests

Please see the guidance on how to run tests for a module, or individual tests.

There is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md.

A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the protocols have changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs.

Please refer to the build documentation at "Specifying the Hadoop Version and Enabling YARN" for detailed guidance on building for a particular distribution of Hadoop, including building for particular Hive and Hive Thriftserver distributions.

Configuration

Please refer to the Configuration Guide in the online documentation for an overview on how to configure Spark.

Contributing

Please review the Contribution to Spark guide for information on how to get started contributing to the project.