Towards a better default listlike datastructure for functional languages

Question

In most functional languages I know of, linked lists are the default datastructure of choice for many operations. The benefits are clear: they're cleanly encoded with ADTs, and can be utilised easily with pattern matching. However, the cons (no pun intended) are also clear: they are memory- and cache-inefficient, and are frankly not great for anything other than a LIFO queue/stack.

Therefore, it seems reasonable to search for an alternative. Many such datastructures exist (Chris Okasaki's work on functional data structures comes to mind), but an immediate concern arises: they're simply not as easy to use.

Therefore, the meat of this question: Are there examples of existing functional languages that are built around other datastructures? If so, what approaches do they take (if any) to effectively mitigate concerns such as ease of use and learning curve?

This question is very broad and would benefit from refining it a bit; is the core asking for instances of functional languages built around other core data structures? How is that core defined, and how do we determine which properties have been preserved? If the goal is instead ideation on conceptual designs then it doesn’t quite form an answerable question currently. What is the principal goal here? It’s possible it needs to be split into multiple questions on some axis or another (e.g. different structures, different features preserved) in any case. — Michael Homer
– Michael Homer ♦, Commented Oct 17, 2023 at 2:26
@MichaelHomer Apologies for the unclear wording. The question is indeed meant to target existing languages built on other concepts; I will reword it. — blueberry
– blueberry, Commented Oct 17, 2023 at 2:44
Clojure is using vectors instead of cons cells to represent lists. Downside - no more dot-pairs. — SK-logic
– SK-logic, Commented Oct 17, 2023 at 13:16
Regarding "easy to use", surely that is an issue of what API the language exposes for lists, not what data structure is used under the hood. Unless by this you mean "easy to implement" (for the language implementor). There would be nothing stopping you from implementing Lisp using copy-on-write arrays, for example, while still making lists look like cons lists to the user. It's also a bit unclear if you are asking about performance considerations (which you mentioned in the first paragraph) or UX considerations (mentioned in your second and third paragraphs), or both. — kaya3
– kaya3, Commented Oct 17, 2023 at 16:43
@kaya3 Apologies for the confusion. My intent for each paragraph was: 1. To quickly explain why I don't consider linked lists optimal. 2. To note that better alternatives exist, but are more difficult to use. 3. To ask for languages that make these structures easier to use. — blueberry
– blueberry, Commented Oct 17, 2023 at 21:44

Marisa Kirisame · Accepted Answer · 2023-12-23 10:42:23Z

The better default than List is... List!

While this is not exactly what the question is looking for, the performance comparison between List and Array is more nuanced than "Array stores everything consecutively in memory, so it is good". The benefits of List are also amplified in the functional world.

TLDR: Array is still faster in functional languages, but

List is a bit faster and Array is a bit slower, so the gap is not as big
List can be made faster with optimizations
List is more "functional".

List is a bit faster and Array is a bit slower, so the gap is not as big

A dynamic array is not necessarily more memory-efficient than a List. To get amortized O(1) insertion, a dynamic array allocates more storage than it needs for its current elements, and copies/moves all elements to a new, larger array whenever the capacity is exceeded. This means a dynamic array uses a constant factor more memory than needed. A linked list, on the other hand, has an overhead (the pointers) proportional to the list length. So, roughly, the larger the element size, the more memory an array takes up relative to a list (and the array can be the less efficient of the two!). You can use a rootish array, but then you lose some memory locality.
Memory strategy matters. In an unmanaged or mark-sweep language, you can't bump-allocate, so consecutive calls to malloc can return non-consecutive addresses. With a mark-copy or mark-compact collector, however, consecutive allocations typically get consecutive addresses. Some garbage collectors even try to move objects in a way that improves locality, so you may gain more.
If you already box elements or otherwise store pointers in the array, the array is not that cache-friendly and you are already paying for cache misses. This makes the cache misses induced by a list less of a bottleneck. And in functional languages, boxing happens basically everywhere.
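The memory comparison in the first point can be made concrete with a deliberately idealized model. The sketch below assumes a growth factor of 2, elements stored inline in both structures, and no allocator or object headers; the function names are hypothetical and exist only for this illustration.

```haskell
-- Idealized byte counts; real runtimes add headers and padding.

-- A singly linked list of n inline elements: each node holds one element
-- plus one "next" pointer of ptrSize bytes.
listBytes :: Int -> Int -> Int -> Int
listBytes ptrSize elemSize n = n * (elemSize + ptrSize)

-- A doubling dynamic array right after it has grown: capacity is about 2n,
-- so roughly half of the slots are unused (assumed growth factor: 2).
arrayBytesAfterGrowth :: Int -> Int -> Int
arrayBytesAfterGrowth elemSize n = 2 * n * elemSize
```

Under these assumptions the two are equal when the element size equals the pointer size: n * (s + p) = 2 * n * s exactly when s = p. For smaller elements the array wins even in its worst case, and for larger elements the freshly grown array is the bigger of the two.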

List can be made faster with optimizations

Let's look at this classical list definition:

data List0 a = Nil | Cons a (List0 a)

You can imagine the compiler loop-unrolling it, giving you a segment list:

data List1 a = Nil | Cons1 a (List1 a) | Cons2 a a (List1 a)

or

data List2 a = Nil | Cons1 a | Cons2 a a (List2 a)

Note how List2 is isomorphic with List0, but List1 may represent the same List in multiple ways.

This makes List1 better than List2: appending another List2 to an odd-length List2 forces a deep copy and a traversal of the other List2. List1 is free from this problem.

If the compiler manages to use Cons2 most of the time, there will be half as much pointer chasing. You could imagine unrolling the loop further, or going even fancier and using List (DynArray a) as the internal representation for List a. Note, however, that while loop unrolling also works for other ADTs (e.g. tree-like ones), the DynArray approach becomes much harder for them.
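To make the List1 idea concrete, here is a hand-written sketch of what such an unrolled list could look like. It is not what any existing compiler generates; the type name Seg and the helper functions are hypothetical.

```haskell
-- A two-way unrolled ("segment") list, following the List1 shape above.
data Seg a = Nil | Cons1 a (Seg a) | Cons2 a a (Seg a)
  deriving (Eq, Show)

-- Flatten back to an ordinary list; different Seg values can map to the
-- same list, which is the non-unique representation mentioned above.
toList :: Seg a -> [a]
toList Nil           = []
toList (Cons1 x r)   = x : toList r
toList (Cons2 x y r) = x : y : toList r

-- Pack as densely as possible, preferring Cons2 nodes.
fromList :: [a] -> Seg a
fromList (x : y : r) = Cons2 x y (fromList r)
fromList [x]         = Cons1 x Nil
fromList []          = Nil

-- O(1) cons: widen a leading Cons1 into a Cons2; the tail r is shared.
cons :: a -> Seg a -> Seg a
cons x (Cons1 y r) = Cons2 x y r
cons x r           = Cons1 x r

-- Append rebuilds the first list's spine but shares ys untouched,
-- whatever the length of the first list.
append :: Seg a -> Seg a -> Seg a
append Nil           ys = ys
append (Cons1 x r)   ys = Cons1 x (append r ys)
append (Cons2 x y r) ys = Cons2 x y (append r ys)
```

For example, Cons1 1 (Cons1 2 Nil) and Cons2 1 2 Nil both flatten to [1, 2], and append never has to re-pack its second argument.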

List is more "functional".

You can cons onto a List as many times as you like, but you can only append to an array in place once. Subsequent appends to that same array force a copy or get you into segment-list world.
Boxing is the bread and butter of functional programming. You want cheap 'copying' for everything? You need boxing! You want laziness? You need boxing! You have polymorphic functions but don't want to monomorphize and compile everything multiple times à la C++? You need boxing!
It is not just List, but ADTs in general. In SML/Haskell you don't actually have lots of lists, but a wide variety of ADTs of different kinds, for example ones representing ASTs. It is unclear how you could 'extend' Array to store ASTs. In fact, even in unmanaged languages like C, C++, and Rust, the canonical solution is to box them and throw locality out of the window. I would claim that in such cases performance is even worse in the unmanaged world, as the canonical solution is to use reference counting to manage memory, which is substantially slower than a good GC.

Still, you want Array in your language, as arrays are useful (just not good enough to be the default for a general-purpose functional language).

A good functional API would be as follows:

Array: Type -> Type
build: (Nat -> a) -> Nat -> Array a
get: Array a -> Nat -> a (or Maybe a, trading a bit of verbosity for safety)
length: Array a -> Nat

As an example, append could then be written as:

append x y = build (\i -> if i < lx then get x i else get y (i - lx)) (lx + ly)
  where
    lx = length x
    ly = length y
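The API above can be tried out in Haskell with a minimal sketch on top of Data.Array (from the array package that ships with GHC). Assumptions made here: Int stands in for Nat, the array is always zero-based, length is renamed len to avoid clashing with the Prelude, and getMaybe is a hypothetical name for the safe variant.

```haskell
import Data.Array (Array, bounds, elems, listArray, (!))

-- build f n creates the array [f 0, f 1, ..., f (n - 1)].
build :: (Int -> a) -> Int -> Array Int a
build f n = listArray (0, n - 1) [f i | i <- [0 .. n - 1]]

-- Unchecked access, as in "get: Array a -> Nat -> a".
get :: Array Int a -> Int -> a
get arr i = arr ! i

-- Checked access, as in the "Maybe a" alternative.
getMaybe :: Array Int a -> Int -> Maybe a
getMaybe arr i
  | i >= lo && i <= hi = Just (arr ! i)
  | otherwise          = Nothing
  where
    (lo, hi) = bounds arr

-- Number of elements (called length in the API above).
len :: Array Int a -> Int
len arr = hi - lo + 1
  where
    (lo, hi) = bounds arr

-- The append from the answer, unchanged apart from the renamed length.
append :: Array Int a -> Array Int a -> Array Int a
append x y = build (\i -> if i < lx then get x i else get y (i - lx)) (lx + ly)
  where
    lx = len x
    ly = len y
```

With these definitions, elems (append (build (* 2) 3) (build (+ 10) 2)) evaluates to [0, 2, 4, 10, 11].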

Note two facts:

It seems inefficient because, in functional style, you have to copy the array over and over, but this can be fixed via the functional-but-in-place style: see "Perceus: Garbage Free Reference Counting with Reuse". That approach can even turn the classical quicksort program into an in-place one: see "Fully in-Place Functional Programming"!
If you want mutability, either for expressiveness or because the automated methods above are not good enough, you don't need an imperative array separate from the above definition! Ref (Array (Ref a)) gets you what you want, and you can remove one of the two Refs for precise control. Even cooler: the mutable Array above can still use polymorphic functions written for the immutable form (e.g. map, filter, length)!
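As a rough illustration of the Ref (Array (Ref a)) idea, here is a Haskell sketch that uses IORef as the Ref and Data.Array as the immutable array. The name MutArray and all helper functions are hypothetical; this is one possible reading of the suggestion, not a tuned implementation.

```haskell
import Data.Array (Array, elems, listArray, (!))
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Outer IORef: lets the array itself be swapped (e.g. when it grows).
-- Inner IORefs: let individual cells be overwritten in place.
type MutArray a = IORef (Array Int (IORef a))

fromList :: [a] -> IO (MutArray a)
fromList xs = do
  cells <- mapM newIORef xs
  newIORef (listArray (0, length cells - 1) cells)

-- Overwrite one cell; the immutable spine is not copied.
writeAt :: MutArray a -> Int -> a -> IO ()
writeAt ref i x = do
  arr <- readIORef ref
  writeIORef (arr ! i) x

-- Grow by one: build a new immutable spine and swap the outer reference.
-- The existing cells are reused, not copied.
push :: MutArray a -> a -> IO ()
push ref x = do
  arr <- readIORef ref
  cell <- newIORef x
  let cells = elems arr ++ [cell]
  writeIORef ref (listArray (0, length cells - 1) cells)

-- The polymorphic, immutable-array length works on the spine directly.
size :: MutArray a -> IO Int
size ref = length <$> readIORef ref

-- Read out the current contents.
snapshot :: MutArray a -> IO [a]
snapshot ref = readIORef ref >>= mapM readIORef . elems
```

Dropping the outer IORef gives a fixed-size array with mutable cells; dropping the inner ones gives a replaceable but otherwise immutable array.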

This answer should mention difference lists; in general, there are many different types which can abstractly implement list-like APIs, not just your top-of-stack ADT. — Corbin
– Corbin, Commented Dec 24, 2023 at 21:08
@Corbin thx for the suggestion, I will include them in the answer in a bit. — Marisa Kirisame
– Marisa Kirisame, Commented Dec 24, 2023 at 23:17
-1. This answer, while very thorough, does not address my question, which explicitly asks for alternatives. — blueberry
– blueberry, Commented Jan 11, 2024 at 15:35

Moonchild · Accepted Answer · 2023-12-22 02:32:20Z

Arrays

Arrays are convenient to use and familiar to most programmers, and they admit efficient random and sequential access. We can make their operation efficient in a referentially transparent setting by doing two things (one to the language design, the other to the implementation):

Encourage the use of operations that operate on most or all of an array at a time, in order to amortise the overhead of allocating new array storage.
Detect when an array reference is unique, and when it is, perform operations on it in place rather than copying. This can be done dynamically, with reference counting, or statically. (Or both.)

These strategies are used by high-performance APLs and their relatives (e.g. Dyalog, J, Futhark), where they empirically work quite well.
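Both strategies have rough analogues in Haskell's array library, sketched below. This is only an analogy and not how Dyalog, J, or Futhark are implemented: the bulk update (//) stands in for the first strategy, and runSTArray stands in for a statically checked unique reference. The function names bulkSet and bulkSetST are hypothetical.

```haskell
import Control.Monad (forM_)
import Data.Array (Array, elems, listArray, (//))
import Data.Array.ST (runSTArray, thaw, writeArray)

-- Strategy 1: apply a whole batch of updates with a single bulk operation
-- instead of producing one fresh array per element.
bulkSet :: [(Int, a)] -> Array Int a -> Array Int a
bulkSet updates arr = arr // updates

-- Strategy 2 (static flavour): copy once, then mutate in place. The ST type
-- guarantees that nothing else can observe the mutable copy, so the
-- in-place writes are safe; the result is frozen when the block ends.
bulkSetST :: [(Int, a)] -> Array Int a -> Array Int a
bulkSetST updates arr = runSTArray (do
  marr <- thaw arr
  forM_ updates (\(i, x) -> writeArray marr i x)
  return marr)
```

Either function leaves its input array unchanged, so referential transparency is preserved while the per-element copying cost is paid only once per batch.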

Many languages' primary or "default" list data structure is backed by an array (Java's ArrayList and Vector, Swift's and Objective-C's arrays, Rust's Vec, C++'s std::vector), so this is a good take. But it's important to note that an array-list isn't just an array: it also offers amortized constant-time insertion and removal at the end, by storing the length separately from the array's capacity and reallocating/transferring to a larger array (usually of size original_array.size * SOME_GLOBAL_CONSTANT) when you insert an element and the underlying array is full (when length == capacity).
– tarzh, Commented Dec 22, 2023 at 3:17
And I should note that while these languages may not be considered "functional" by everyone (implicitly sequential and impure), some of them (Swift, Rust, Scala, Kotlin) include many functional concepts. In my experience, I rarely if ever see people using linked lists in these languages, even when the list frequently gets copied and then has data prepended or appended (a case where LinkedList would theoretically be preferred), because in practice computers are just so much faster at processing arrays.
– tarzh, Commented Dec 22, 2023 at 3:38

André L F S Bacci · Accepted Answer · 2023-10-19 13:40:09Z

`dict`

All of Lua's data structures can be said to be implemented by one and only one structure, a dictionary-like data type called table:

The type table implements associative arrays, that is, arrays which can be indexed not only with numbers, but with any value (except nil). Therefore, this type may be used not only to represent ordinary arrays, but also symbol tables, sets, records, etc. To represent records, Lua uses the field name as an index.

ML languages already demonstrated that this direct indexing is not strictly necessary, as it can be emulated with a list of pairs, for example ((key1 val1)(key2 val2)). Anything you can conveniently get from a Dictionary/HashMap structure, you can also do with the most abstract and basic list structure.

But as you noted, this is memory- and cache-inefficient, and very annoying to use. And because the outer list knows nothing about its inner nodes, it cannot pack the data in a specific, performant way, and for the same reason cannot access its elements as fast.
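The contrast between a list of pairs and a dedicated dictionary type can be sketched in Haskell. The association list uses the Prelude's lookup, which scans linearly; Data.Map.Strict (from the containers package that ships with GHC) is a balanced tree with logarithmic lookup. The sample keys and values are made up for this illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Dictionary emulated with a plain list of pairs.
assoc :: [(String, Int)]
assoc = [("key1", 1), ("key2", 2)]

-- Prelude's lookup walks the list front to back: O(n) per query.
lookupAssoc :: String -> Maybe Int
lookupAssoc k = lookup k assoc

-- The same data in a purpose-built persistent map.
dict :: Map.Map String Int
dict = Map.fromList assoc

-- Balanced-tree lookup: O(log n) per query.
lookupDict :: String -> Maybe Int
lookupDict k = Map.lookup k dict
```

Both versions give the same answers; the difference is that the map's structure knows about its keys and can organise them for faster access.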

But once your language has a dict-like type as its basic structure, one that can index inner nodes by any other primitive value (explicit strings and ints, and internal ints), you keep all the functional features of lists and are then able to implement the performance and memory-efficiency parts.

You don't even need to implement explicit indexing. Just by internally implementing lists as something like PHP's array() or Lua's table instead of linked lists, you already work around the memory and cache inefficiency. By adding indexed access, you gain the ease-of-use part.

If data structures are immutable, then using a regular array-backed dictionary as the basic data structure could make performance even worse. Updating a value would require the entire thing to be copied. You could implement it with something like a red-black tree, but then why not just have that in the standard library? Alternatively, you could force the user to only interact with the dictionary through a monad or effect handler or something, but that seems very inconvenient. — Luke LaBonte
– Luke LaBonte, Commented Oct 19, 2023 at 14:28
This answer recommends a less efficient alternative, but the question is asking about more efficient alternatives. — Corbin
– Corbin, Commented Oct 20, 2023 at 18:27
