The Wayback Machine - https://web.archive.org/web/20220829025438/https://github.com/apache/arrow/commits
master

Commits on Aug 28, 2022

  1. ARROW-17536: [Packaging][RPM][Gandiva] Fix build error on CentOS Stream 9 (#13984)
    
    LLVM is built with gcc-toolset-12 and it can't be used with the
    default g++. We also need to use gcc-toolset-12.
    
    Error message:
    
        /usr/bin/ld: .../libgandiva.so.1000: undefined reference to
        `std::__glibcxx_assert_fail(char const*, int, char const*, char const*)'
    
    Authored-by: Sutou Kouhei <kou@clear-code.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    kou committed Aug 28, 2022

Commits on Aug 27, 2022

  1. ARROW-17463: [R] Avoid unnecessary projections (#13954)

    Before:
    
    ```
    > mtcars |> arrow_table() |> count(cyl) |> explain()
    ExecPlan with 6 nodes:
    5:SinkNode{}
      4:ProjectNode{projection=[cyl, n]}
        3:ProjectNode{projection=[cyl, n]}
          2:GroupByNode{keys=["cyl"], aggregates=[
          	hash_sum(n, {skip_nulls=true, min_count=1}),
          ]}
            1:ProjectNode{projection=["n": 1, cyl]}
              0:TableSourceNode{}
    ```
    
    After:
    
    ```
    ExecPlan with 5 nodes:
    4:SinkNode{}
      3:ProjectNode{projection=[cyl, n]}
        2:GroupByNode{keys=["cyl"], aggregates=[
        	hash_sum(n, {skip_nulls=true, min_count=1}),
        ]}
          1:ProjectNode{projection=["n": 1, cyl]}
            0:TableSourceNode{}
    ```
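The two plans above differ only by one `ProjectNode` that repeats the projection directly beneath it. A toy sketch of why such a node is redundant follows. The list-of-tuples plan model and the helper name `drop_repeated_projections` are assumptions made for illustration; they are not Arrow APIs and not how the R package implements the change.

```python
# Toy model of an ExecPlan as a list of (node_kind, columns) pairs, source first.
# The node kinds mirror the printed plan; the helper is hypothetical.

def drop_repeated_projections(plan):
    """Remove a ProjectNode that exactly repeats the projection directly below it."""
    simplified = []
    for node in plan:
        kind, _columns = node
        if kind == "ProjectNode" and simplified and simplified[-1] == node:
            continue  # same columns in the same order: the node changes nothing
        simplified.append(node)
    return simplified


before = [
    ("TableSourceNode", ()),
    ("ProjectNode", ('"n": 1', "cyl")),
    ("GroupByNode", ("cyl",)),
    ("ProjectNode", ("cyl", "n")),
    ("ProjectNode", ("cyl", "n")),
    ("SinkNode", ()),
]
after = drop_repeated_projections(before)
print(len(before), "->", len(after))  # prints "6 -> 5"
```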
    
    Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
    Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
    nealrichardson committed Aug 27, 2022

Commits on Aug 26, 2022

  1. ARROW-17453: [Go][C++][Parquet] Inconsistent Data with Repetition Levels (#13982)
    
    Both the C++ and Go Parquet implementations assumed that if the max repetition level was 0, there were no bytes to skip when initializing decoders for `DataPageV2`. However, the Parquet files produced by Athena in this case had repetition bytes that must be skipped before reaching the definition level bytes. Since those bytes weren't skipped, the wrong values were decoded for the definition levels.
    
    In the case of the Go implementation, it made additional assumptions that proved to be incorrect on top of the same bug.
    
    This fixes both of them to properly respect the repetition level byte length reported in the DataPageV2 header.
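The fix described above can be sketched as follows. This is a minimal illustration, not the C++ or Go code from the PR: the function name `split_data_page_v2` and the sample bytes are made up, while the two length parameters are named after the `repetition_levels_byte_length` and `definition_levels_byte_length` fields of the Parquet `DataPageHeaderV2`.

```python
# Sketch: slice a DataPageV2 buffer using the byte lengths from the page header,
# instead of assuming zero repetition-level bytes when max_rep_level == 0.

def split_data_page_v2(page: bytes,
                       repetition_levels_byte_length: int,
                       definition_levels_byte_length: int):
    """Return (repetition level bytes, definition level bytes, value bytes)."""
    rep_end = repetition_levels_byte_length
    def_end = rep_end + definition_levels_byte_length
    if rep_end < 0 or def_end < rep_end or def_end > len(page):
        raise ValueError("level byte lengths do not fit in the page")
    return page[:rep_end], page[rep_end:def_end], page[def_end:]


# Made-up page: 2 repetition bytes, 3 definition bytes, then the values.
page = b"\xaa\xaa" + b"\x01\x02\x03" + b"values"

# Buggy assumption: max repetition level is 0, so no bytes are skipped.
_, wrong_def_levels, _ = split_data_page_v2(page, 0, 3)

# Respecting the header: skip the 2 reported repetition bytes first.
_, def_levels, values = split_data_page_v2(page, 2, 3)
```

With the buggy assumption the definition-level decoder starts on the repetition bytes, which matches the symptom described in the commit message.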
    
    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 26, 2022
  2. ARROW-17247: [C++][Docs] Include visibility to ExecPlan APIs in Acero Docs (#13741)
    
    This PR updates the documentation on the visibility of `exec.h` components.
    
    Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com>
    Signed-off-by: Weston Pace <weston.pace@gmail.com>
    vibhatha committed Aug 26, 2022
  3. ARROW-16340: [C++][Python] Move all Python related code into PyArrow (#13311)
    
    This PR moves the `src/arrow/python` directory into `pyarrow` and arranges for PyArrow to build it. The build on the Python side is made in two steps:
    
    1. `_run_cmake_pyarrow_cpp()`, where the C++ part of pyarrow is built first (the part that was moved in the refactoring)
    2. `_run_cmake()` where pyarrow is built as before
    
    No changes are needed in the build process from the user side to successfully build pyarrow after this refactoring. The test for PyArrow CPP will however be moved into Cython and can currently be run with:
    
    ```shell
    $ pushd python/build/dist/temp
    $ ctest
    ```
    
    Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
    Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    AlenkaF and jorisvandenbossche committed Aug 26, 2022

Commits on Aug 25, 2022

  1. ARROW-17455: [Go] Function and Kernel execution architecture (#13964)

    In addition to implementing the function execution architecture, this also shifts some files around for better package naming and scoping. A new package `exec` is created inside of `compute/internal/` to make it easier to scope other internal-only methods / functionality such as the kernel implementations themselves.
    
    This is a fairly large change because so much is interconnected, but it also includes extensive tests covering ~85% of the `internal/exec` package and nearly 80% of the entire `compute` package. More tests will be added as functionality is added.
    
    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 25, 2022
  2. MINOR: [CI][C++] Update parquet-testing submodule (#13968)

    I've realized there were some changes on the parquet-testing submodule. This PR updates it to the latest version.
    
    Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    raulcd committed Aug 25, 2022
  3. ARROW-17518: [CI][Doc][Python] Update glob to detect arrow development version from git (#13966)
    
    Reproduced locally:
    ```
    $ git describe --dirty --tags --long --match "apache-arrow-[0-9].*"
    apache-arrow-9.0.0.dev-641-g0d5bb92-dirty
    $ git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"
    apache-arrow-10.0.0.dev-114-g0d5bb92-dirty
    ```
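The difference between the two patterns can be checked with Python's `fnmatch`, whose shell-style wildcards behave like the glob patterns `git describe --match` accepts for this case. This is only an approximation of git's matcher, and the tag names are inferred from the `git describe` output above.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "apache-arrow-[0-9].*"   # exactly one digit before the first dot
NEW_PATTERN = "apache-arrow-[0-9]*.*"  # a digit, then anything, then a dot

# Tag names inferred from the describe output shown above.
tags = ["apache-arrow-9.0.0.dev", "apache-arrow-10.0.0.dev"]

old_matches = [tag for tag in tags if fnmatchcase(tag, OLD_PATTERN)]
new_matches = [tag for tag in tags if fnmatchcase(tag, NEW_PATTERN)]

print(old_matches)  # ['apache-arrow-9.0.0.dev']
print(new_matches)  # ['apache-arrow-9.0.0.dev', 'apache-arrow-10.0.0.dev']
```

The old pattern cannot match a two-digit major version, so `git describe` fell back to the older 9.0.0.dev tag.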
    
    Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    raulcd committed Aug 25, 2022
  4. ARROW-15277: [C++][Python] Use ChunkedArray::Make for chunked_array (#13950)
    
    Supersedes and will close #12096 
    
    [ARROW-15277](https://issues.apache.org/jira/browse/ARROW-15277)
    
    Lead-authored-by: Miles Granger <miles59923@gmail.com>
    Co-authored-by: Eduardo Ponce <edponce00@gmail.com>
    Co-authored-by: Antoine Pitrou <antoine@python.org>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    3 people committed Aug 25, 2022
  5. ARROW-17433: [CI][C++] Use Visual Studio 2019 on AppVeyor (#13903)

    We can use /external:I for Boost to suppress warnings from Boost
    with Visual Studio 2019 or later.
    
    
    Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
    Co-authored-by: Antoine Pitrou <antoine@python.org>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    kou and pitrou committed Aug 25, 2022
  6. ARROW-17511: [C++] Add support for xsimd 9.0.0 (#13958)

    Authored-by: Sutou Kouhei <kou@clear-code.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    kou committed Aug 25, 2022

Commits on Aug 24, 2022

  1. ARROW-17434: [Java][CI] Add build Windows support for Java (#13918)

    Authored-by: david dali susanibar arce <davi.sarces@gmail.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    davisusanibar committed Aug 24, 2022
  2. MINOR: [CI][C++] Update testing submodule (#13963)

    PR #13781 moved back the testing submodule to an old changeset by mistake, which subsequently broke multiple CI builds.
    
    (added a dummy C++ change to trigger CI)
    
    Authored-by: Antoine Pitrou <antoine@python.org>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    pitrou committed Aug 24, 2022
  3. ARROW-16949: [Doc] Add Glossary to the New Contributor's Guide (#13951)

    Authored-by: Alenka Frim <frim.alenka@gmail.com>
    Signed-off-by: Alenka Frim <frim.alenka@gmail.com>
    AlenkaF committed Aug 24, 2022
  4. ARROW-17500: [Go] Kernel and KernelContext interfaces (#13946)

    This implements the interface for Kernels along with a `KernelSignature` struct and the type matching for `InputType` and `OutputType` via a `TypeMatcher` interface, along with tests for all of the above.
    
    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 24, 2022
  5. ARROW-17431: [Java] MapBinder to bind Arrow Map type to DB column (#13941)
    
    Typical real-life Arrow datasets contain maps of primitive types. This PR introduces MapBinder, which binds maps whose entries are of primitive types.
    
    Authored-by: igor.suhorukov <igor.suhorukov@gmail.com>
    Signed-off-by: David Li <li.davidm96@gmail.com>
    igor-suhorukov committed Aug 24, 2022
  6. ARROW-17510: [CI][C++][Windows][MSVC] Use ccache (#13957)

    ccache supports MSVC since 4.6.0:
    
    https://ccache.dev/releasenotes.html#_ccache_4_6
    
    > Added support for caching calls to Microsoft Visual C++ (MSVC) and
    > clang-cl (MSVC compatibility for Clang).
    
    * no ccache: 22m23s: https://github.com/apache/arrow/runs/7983003808
    * not cached: 39m35s: https://github.com/kou/arrow/runs/7984875241
    * cached: 9m31s: https://github.com/kou/arrow/runs/7985401473
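To put the three timings above in proportion, the small calculation below converts them to seconds and compares them. The helper name `to_seconds` is an illustrative assumption; the durations are the ones listed above.

```python
import re

def to_seconds(duration: str) -> int:
    """Convert a duration such as '22m23s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

no_ccache = to_seconds("22m23s")   # 1343 seconds
not_cached = to_seconds("39m35s")  # 2375 seconds
cached = to_seconds("9m31s")       # 571 seconds

# Warm cache versus building without ccache at all.
speedup = no_ccache / cached
# Cost of the first, cache-filling run relative to no ccache.
cold_overhead = not_cached / no_ccache

print(f"warm cache speedup: {speedup:.2f}x")       # 2.35x
print(f"cold cache slowdown: {cold_overhead:.2f}x")  # 1.77x
```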
    
    Authored-by: Sutou Kouhei <kou@clear-code.com>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    kou committed Aug 24, 2022
  7. MINOR: [C++] Fix StringFormatter type error in localfs_benchmark (#13932)
    
    Since size_t is passed as the first argument to the StringFormatter, it
    needs to be templated with Int64Type instead of Int32Type.
    
    Lead-authored-by: Aldrin M <octalene.dev@pm.me>
    Co-authored-by: Aldrin Montana <octalene.dev@pm.me>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    drin committed Aug 24, 2022
  8. ARROW-17322: [Docs] Documenting issue lifecycle for bugs and feature requests (#13781)
    
    These changes aim to document the issue lifecycle - how to create new issues (largely already existed) as well as how to engage with and understand issue state later on. This also serves to document proposed processes:
    
    1. Assigned issues idle more than 90 days may be unassigned (discussed and approved on ML)
    2. Unassigned issues in "In Progress" status should be set to "Open"
    3. Establishing consistent usage of Status and Resolution fields.
    
    Authored-by: Todd Farmer <todd@fivefarmers.com>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    toddfarmer committed Aug 24, 2022
  9. ARROW-16141: [R] Update rhub/fedora-clang-devel for upstreamed changes (#12824)
    
    In ARROW-15857 (#12734) we fixed the nightly failures on rhub/fedora-clang-devel with a kludge that modifies the default makefile, and we also upstreamed the fixes (rstudio/sass#104 and r-hub/rhub-linux-builders#60). Both upstream fixes have now been released, so we can remove the kludge from the modification of the Docker image.
    
    Lead-authored-by: Dewey Dunnington <dewey@voltrondata.com>
    Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
    Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
    paleolimbot and paleolimbot committed Aug 24, 2022
  10. MINOR: [R] Remove trailing whitespace in flight tests (#13952)

    After #13267 there is some trailing whitespace left over in one of the files.
    
    Authored-by: Dewey Dunnington <dewey@voltrondata.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    paleolimbot committed Aug 24, 2022

Commits on Aug 23, 2022

  1. ARROW-17494: [C++] Fix substrait tests linkage on static builds (#13939)

    Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    raulcd committed Aug 23, 2022
  2. MINOR: [R][Docs] Fix the Rd file of infer_type (#13878)

    Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
    Signed-off-by: Jonathan Keane <jkeane@gmail.com>
    eitsupi committed Aug 23, 2022
  3. MINOR: [C++] Replace std::random_shuffle with std::shuffle (#13948)

    `std::random_shuffle` was deprecated in C++14 and removed altogether in C++17 in favor of `std::shuffle`. The codebase only uses these shuffling functions in tests and benchmarks and `std::shuffle` is used in all instances except this one.
    
    Authored-by: Bryce Mecum <petridish@gmail.com>
    Signed-off-by: Antoine Pitrou <antoine@python.org>
    amoeba committed Aug 23, 2022
  4. ARROW-17389: [Python] Properly exclude tests when PYARROW_INSTALL_TESTS=0 (#13904)
    
    Authored-by: Miles Granger <miles59923@gmail.com>
    Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
    milesgranger committed Aug 23, 2022
  5. ARROW-17451: [CI][Java] Use manylinux2014 image for JNI (#13920)

    This is because our official .jar packages are built in the manylinux2014 image.
    
    Authored-by: Sutou Kouhei <kou@clear-code.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    kou committed Aug 23, 2022

Commits on Aug 22, 2022

  1. ARROW-17496: [Go] Fix Nightly Build (#13943)

    Turns out that the `pragma_table_info` function in modernc.org/sqlite's package doesn't work correctly in go1.17 either, only in go1.18. As this is only used for testing and the example sqlite flightsql server, rather than anything needed in the flightsql package itself, the build failure is easily solved by marking the example and its tests to be built only in go1.18.
    
    As we already have a git workflow that runs with go1.18, the CI will still continue to test the example code, but mamba builds using go1.17 won't break anymore.
    
    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 22, 2022
  2. ARROW-17499: [Go] Shift MakeArrayOfNull to array Package (#13944)

    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 22, 2022
  3. ARROW-17489: [R] Nightly builds failing due to test referencing unreleased stringr functions (#13937)
    
    Authored-by: Nic Crane <thisisnic@gmail.com>
    Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
    thisisnic committed Aug 22, 2022
  4. ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method (#13267)
    
    **Summary**
    An additional parameter in Flight do_put to specify chunk size in R.
    
    **Problem**
    Currently, all data is sent through in a single message. It's a likely scenario that users will want the ability to control the batch sizes without building a custom do_put method.
    
    **Solution**
    Additional (optional) parameter to specify chunk size.
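The idea of an optional chunk-size parameter can be sketched in a few lines. This is a language-neutral illustration written in Python, not the R Flight client code; the function name `split_into_chunks` is hypothetical, and only the parameter name `max_chunksize` comes from the PR title.

```python
from typing import Iterator, Optional, Sequence

def split_into_chunks(rows: Sequence,
                      max_chunksize: Optional[int] = None) -> Iterator[Sequence]:
    """Yield rows in pieces of at most max_chunksize; one piece if it is None."""
    if max_chunksize is None:
        yield rows  # previous behaviour: everything goes out in one message
        return
    if max_chunksize <= 0:
        raise ValueError("max_chunksize must be a positive integer")
    for start in range(0, len(rows), max_chunksize):
        yield rows[start:start + max_chunksize]


rows = list(range(10))
single_message = [len(chunk) for chunk in split_into_chunks(rows)]
batched = [len(chunk) for chunk in split_into_chunks(rows, max_chunksize=4)]

print(single_message)  # [10]
print(batched)         # [4, 4, 2]
```

Keeping the parameter optional preserves the existing single-message behaviour for callers that do not set it.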
    
    Lead-authored-by: Christopher.Dunderdale <Christopher.Dunderdale@dyna-mo.com>
    Co-authored-by: Christopher Dunderdale <47271795+thatstatsguy@users.noreply.github.com>
    Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
    thatstatsguy committed Aug 22, 2022
  5. ARROW-17475: [Go] Function interface and Registry impl (#13924)

    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 22, 2022
  6. ARROW-17482: [Go] Remove ValueDescr types (#13930)

    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 22, 2022
  7. ARROW-17479: [Go] Add ArraySpan and utilities (#13929)

    Relating to the building of the functionality for Compute in Go with Arrow, this is the implementation of ArraySpan / ExecValue / ExecResult etc.
    
    It could be separated out from the function interface definitions, so I was able to make this PR while #13924 is still being reviewed.
    
    Authored-by: Matt Topol <zotthewizard@gmail.com>
    Signed-off-by: Matt Topol <zotthewizard@gmail.com>
    zeroshade committed Aug 22, 2022
  8. ARROW-17131: [Python] add StructType().field(): returns a field by name or index (#13652)
    
    Authored-by: anjakefala <anja@voltrondata.com>
    Signed-off-by: David Li <li.davidm96@gmail.com>
    anjakefala committed Aug 22, 2022

Commits on Aug 21, 2022

  1. ARROW-17476: [Release][Packaging] Make binary uploader reusable from datafusion-c (#13923)
    
    The binary uploader consists of dev/release/05-binary-upload.sh and
    dev/release/post-02-binary.sh. We need to be able to customize the
    .deb package name.
    
    This also adds missing environment variable entries.
    
    Authored-by: Sutou Kouhei <kou@clear-code.com>
    Signed-off-by: Sutou Kouhei <kou@clear-code.com>
    kou committed Aug 21, 2022