apache/arrow

Latest commit

… Arrays from MATLAB data (#36978)

### Rationale for this change

As discussed in #36855, we think it would be better to move the recommended APIs for the MATLAB Interface directly under the top-level `arrow.*` package. This should help simplify the interface, and will make it easier for users to switch between multiple language bindings. We have already moved the `type` convenience constructors to the `arrow` package. Now we want to add a gateway function that creates arrays, mirroring `PyArrow`. As part of this change, we will modify the array constructors to accept `libmexclass.proxy.Proxy` objects, similar to how the `arrow.type.<Type>` constructors accept `libmexclass.proxy.Proxy` objects.

### What changes are included in this PR?

1. Added `arrow.array()` gateway function that can be used to construct arrays:

```matlab
>> arrowArray = arrow.array([1 2 3 4]);
>> class(arrowArray)

ans =

    'arrow.array.Float64Array'

>> arrowArray = arrow.array(["A" "B" "C"]);
>> class(arrowArray)

ans =

    'arrow.array.StringArray'

```

2. Added a static `fromMATLAB()` method to all subclasses of `arrow.array.Array`.

```matlab
>> array = arrow.array.StringArray.fromMATLAB(["A" "B" "C"])

array = 

[
  "A",
  "B",
  "C"
]

>> array = arrow.array.TimestampArray.fromMATLAB(datetime(2023, 8, 1))

array = 

[
  2023-08-01 00:00:00.000000
]

```

As part of this change, users can no longer use the `arrow.array.Array` subclass constructors to create arrays. Instead, they can use either `arrow.array()` or the static `fromMATLAB` method.
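The gateway-plus-static-factory design described above can be sketched in a language-neutral way. The Python below is only an illustration of the pattern, not the MATLAB interface or PyArrow: the names `Array`, `Float64Array`, `StringArray`, `from_values`, `make_array`, and `element_type` are all hypothetical, and a plain tuple stands in for the proxy object that the real constructors accept.

```python
# Illustrative sketch only: hypothetical names, not the MATLAB interface or PyArrow.


class Array:
    """Base class whose constructor takes an already-built backing object."""

    element_type = object  # overridden by each subclass

    def __init__(self, proxy):
        # Stand-in for the proxy object that wraps the underlying array.
        self._proxy = proxy

    @classmethod
    def from_values(cls, values):
        """Static-factory analogue of fromMATLAB(): validate, then wrap."""
        values = list(values)
        if not all(isinstance(v, cls.element_type) for v in values):
            raise TypeError(f"{cls.__name__} expects {cls.element_type.__name__} elements")
        return cls(tuple(values))

    def to_list(self):
        return list(self._proxy)


class Float64Array(Array):
    element_type = float


class StringArray(Array):
    element_type = str


def make_array(values):
    """Gateway analogue of arrow.array(): pick a subclass from the input data."""
    values = list(values)
    if not values:
        raise ValueError("cannot infer an array type from empty input")
    first = values[0]
    for cls in (Float64Array, StringArray):
        if isinstance(first, cls.element_type):
            return cls.from_values(values)
    raise TypeError(f"unsupported element type: {type(first).__name__}")
```

In this sketch, `make_array([1.0, 2.0, 3.0, 4.0])` yields a `Float64Array` and `make_array(["A", "B", "C"])` yields a `StringArray`, while the constructors themselves only ever receive the wrapped backing object. Keeping validation in the factory method is one way to leave the constructor free for internal use.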

### Are these changes tested?

Updated the existing tests to account for the API changes and added the following new test classes:

1. arrow/internal/validate/tType.m
2. arrow/internal/validate/tShape.m
3. arrow/internal/validate/tRealNumeric.m
4. arrow/internal/validate/tNonsparse.m
5. arrow/internal/validate/tNumeric.m
6. arrow/array/tArray.m

### Are there any user-facing changes?

Yes, we changed the constructor signature of all `arrow.array.Array` subclasses to accept a scalar `libmexclass.proxy.Proxy` object. NOTE: The MATLAB interface is still under active development.

### Future Directions

1. In a follow-up PR, we plan on adding a new name-value pair to `arrow.array()` called `Type`, which can be set to an `arrow.type.Type` object. This will let users specify what kind of Arrow array they would like to create from MATLAB data.

* Closes: #36953

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
36ddbb5

Apache Arrow

Powering In-Memory Analytics

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.

Major components of the project include:

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow libraries contain many distinct software components:

  • Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
  • Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers library)
  • Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
  • IO interfaces to local and remote filesystems
  • Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
  • Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
  • Conversions to and from other in-memory data structures
  • Readers and writers for various widely used file formats (such as Parquet, CSV)
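
To make the first and third items above more concrete, here is a minimal standard-library Python sketch of a nullable 32-bit integer column stored as a validity bitmap plus one contiguous values buffer, followed by a zero-copy slice over that buffer. It does not use the Arrow libraries; the helper names `build_int32_column`, `read_int32_column`, and `slice_values` are hypothetical, and the buffer padding and alignment that real implementations apply are omitted.

```python
import struct

# Sketch only: hypothetical helpers, no Arrow library involved.
# A nullable int32 column is two buffers: a validity bitmap (1 bit per slot,
# least-significant bit first, 1 = valid) and little-endian 4-byte values.


def build_int32_column(values):
    """Pack a list of ints/None into (validity_bitmap, values_buffer)."""
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray(4 * len(values))
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)          # mark slot i as valid
            struct.pack_into("<i", data, 4 * i, v)  # write the value in place
    return bitmap, data


def read_int32_column(bitmap, data, length):
    """Decode the two buffers back into a list of ints/None."""
    out = []
    for i in range(length):
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(struct.unpack_from("<i", data, 4 * i)[0] if valid else None)
    return out


def slice_values(data, start, stop):
    """Return a view of slots [start, stop) that shares memory with `data`."""
    return memoryview(data)[4 * start:4 * stop]


bitmap, data = build_int32_column([1, None, 3])
# bitmap is bytearray(b"\x05"): bits 0 and 2 are set, bit 1 (the null) is clear.
window = slice_values(data, 2, 3)           # view of the third slot, no copy made
struct.pack_into("<i", data, 8, 42)         # mutate the underlying buffer...
updated = struct.unpack("<i", window)[0]    # ...and the view sees 42
```

Because `slice_values` returns a `memoryview` instead of copying bytes, a consumer can read a sub-range of the column without duplicating it, which is the idea behind the zero-copy sharing mentioned in the list.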

Implementation status

The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git main.

How to Contribute

Please read our latest project contribution guide.

Getting involved

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: