Restructuring an OOP datatype into Haskell types

Question

Coming from an OOP background, Haskell's type system and the way data constructors and typeclasses interact is difficult to conceptualize. I can understand how each are used for simple examples, but some more complication examples of data structures that are very well-suited for an OOP style are proving non-trivial to translate into similarly elegant and understandable types.

In particular, I have a problem with organizing a hierarchy of data such as the following.

data

This is a deeply nested hierarchical inheritance structure, and the lack of support for subtyping makes it unclear how to turn this structure into a natural-feeling alternative in Haskell. It may be fine to replace something like Polygon with a sum data type, declaring it like

data Polygon
= Quad Point Point
| Triangle Point Point Point
| RegularNGon Int Radius
| ...

But this loses some of the structure, and can only really satisfactorily be done for one level of the hierarchy. Typeclasses can be used to implement a form of inheritance and substructure in that a Polygon typeclass could be a subclass of a Shape, and so maybe all Polygon instances have implementations for centroid :: Point and also vertices :: [Point], but this seems unsatisfactory. What would be a good way of capturing the structure of the picture in Haskell?

Why are typeclasses unsatisfactory? I'm not a Haskell expert, but I think that they would be the tool of choice. — Thomas
– Thomas, Commented Aug 31, 2017 at 5:45
How is that inheritance hierarchy useful? As given, Shape offers no common behaviour, so the first step should be to dismantle the hierarchy. — Mark Seemann
– Mark Seemann, Commented Aug 31, 2017 at 5:52
Additionally, I think the premise of this question ought to be challenged. We've known for more than 20 years that inheritance is bad, re Design Patterns (1994): Favor object composition over class inheritance. — Mark Seemann
– Mark Seemann, Commented Aug 31, 2017 at 6:10
It's generally the case that OOP it's better at adding new types of data while functional programming is better at adding new operations to work with those data. We functional programmers tend to think that we need new operations more often than we need new types of data. There are various ways to try to get everything, but they're not for the faint of heart. Search for "expression problem". — dfeuer
– dfeuer, Commented Aug 31, 2017 at 6:11
I agree entirely with @MarkSeemann - you are not making proper use of OOP principles, so trying to emulate them in Haskell (or even in a language with actual OOP) will be confusing at best. That being said, you can write e.g. data Polygon = Quad Quad | ...; data Quad = Rectangle Point Point Length Length | Parallelogram Point Point Length Length Length; essentially each node is a the datatype containing nodes below it; and the constructor injecting that datatype into nodes above it. — user2407038
– user2407038, Commented Aug 31, 2017 at 6:11

Ben · Accepted Answer · 2017-08-31 11:47:46Z

You can use sum types to represent the entire hierarchy, without losing structure. Something like this would do it:

data Shape = IsPoint Point
           | IsLine Line 
           | IsPolygon Polygon

data Point = Point { x :: Int, y :: Int }

data Line = Line { a :: Point, b :: Point }

data Polygon = IsTriangle Triangle
             | IsQuad Quad
             | ...

And so on. The basic pattern is you translate each OO abstract class into a Haskell sum type, with each of its immediate OO subclasses (that may themselves be abstract) as variants in the sum type. The concrete classes are product/record types with the actual data members in them.¹

The thing you lose compared to the OOP you're used to by modeling things this way isn't the ability to represent your hierarchy, but the ability to extend it without touching existing code. Sum types are "closed", where OO inheritance is "open". If you later decide that you want a Circle option for Shape, you have to add it to Shape and then add cases for it everywhere you pattern match on a Shape.

However, this kind of hierarchy probably requires fairly liberal downcasting in OO. For example, if you want a function that can tell if two shapes intersect that's probably an abstract method on Shape like Shape.intersects(Shape other), so each sub-type gets to write its own implementation. But when I'm writing Rectangle.intersects(Shape other) it's basically impossible generically, without knowing what other subclasses of Shape are out there. I'll have to be using isinstance checks to see what other actually is. But that actually means that I probably can't just add my new Circle subclass without revisiting existing code; an OO hierarchy where isinstance checks are needed is de-facto just as "closed" as the Haskell sum type hierarchy is. Basically pattern matching on one of the sum-types generated by applying this pattern is the equivalent of isinstancing and downcasting in the OO version. Only because the sum types are exhaustively known to the compiler (only possible because they're closed), if I do add a Circle case to Shape the compiler is able to tell me about all the places that I need to revisit to handle that case.²

If you have a hierarchy that doesn't need a lot of downcasting, it means that the various base classes have substantial and useful interfaces that they guarantee to be available, and you usually use things through that interface rather than switching on what it could possibly be, then you can probably use type classes. You still need all the "leaf" data types (the product types with the actual data fields), only instead of adding sum type wrappers to group them up you add type classes for the common interface. If you can use this style of translation, then you can add new cases more easily (just add the new Circle data type, and an instance to say how it implements the Shape type class; all the places that are polymorphic in any type in the Shape class will now handle Circles as well). But if you're doing that in OO you always have downcasts available as an escape hatch when it turns out you can't handle shapes generically; with this design in Haskell it's impossible.³

But my "real" answer to "how do I represent OO type hierarchies in Haskell" is unfortunately the trite one: I don't. I design differently in Haskell than I do in OO languages⁴, and in practice it's just not a huge problem. But to say how I'd design this case differently, I'd have to know more about what you're using them for. For example you could do something like represent a shape as a Point -> Bool function (that tells you whether any given point is inside the shape), and having things like circle :: Point -> Int -> (Point -> Bool) for generating such functions corresponding to normal shapes; that representation is awesome for forming composite intersection/union shapes without knowing anything about them (intersect shapeA shapeB = \point -> shapeA point && shapeB point), but terrible for calculating things like areas and circumferences.

¹ If you have abstract classes with data members, or you have concrete classes that also have further subclasses you can manually push the data members down into the "leaves", factor out the inherited data members into a shared record and make all of the "leaves" contain one of those, split a layer so that you have a product type containing the inherited data members and a sum type (where that sum type then "splits" into the options for the subclasses), stuff like that.

² If you use catch-all patterns then the warning might not be exhaustive, so it's not always bullet proof, but how bullet proof it is is up to how you code.

³ Unless you opt into runtime type information with a solution like Typeable, but that's not an invisible change; your callers have to opt into it as well.

⁴ Actually I probably wouldn't design a hierarchy like this even in OO languages. I find it doesn't turn out to be as useful as you'd think in real programs, hence the "favour composition over inheritance" advice.

data Line = Line { a :: Line, b :: Line }? Shouldn't a and b be :: Point?
This is a great comment, but the example that Rectangle.intersects(Shape other) needs ot use instanceOf (or another form of RTTI or type tags ...) is not generally true. I would consider using instanceOf a design flaw in your hierarchy. The Visitor pattern solves this problem by using double-dispatch, so you can dispatch on both types.It makes it easy to add new operations to your program, at the expense of making it harder to add new Shape types. But this is also true for the sum type where you have to add cases everywhere.
For a completely different take on this, check out the awesome diagrams package, which offers points and lines and rectangles and more. The big idea of that package is to discover a core semantics shared by all the objects of interest, or, if there is no shared semantics, admit it and use two semanticses (e.g. points and rectangles, which after all don't really share that many operations). Then there is no question of inheritance; rectangles and circles aren't different types, they are two different semantic objects in the same type.
Perhaps my strongest argument for favoring composition over inheritance is that compositional designs tend to port easily from OOP to FP (and vice versa) while inheritance-based designs do not. This leads me to believe that composition is more modular in some general sense that transcends parochial concerns like classes or ADTs.

comingstorm · Accepted Answer · 2017-08-31 09:27:06Z

You may be looking for a Haskell equivalent of dynamic dispatch, such that you could store a heterogeneous list of values supporting distinct implementations of a common Shape interface.

Haskell's existential types support this kind of usage. It's fairly rare for a Haskell program to actually need existential types -- as Ben's answer demonstrates, sum types can handle this kind of problem. However, existential types are appropriate for a large, open-ended collection of cases:

{-# LANGUAGE ExistentialQuantification #-}
...
class Shape a where
    bounds :: a -> AABB
    draw   :: a -> IO ()

data AnyShape = forall a. Shape a => AnyShape a

This lets you declare instances in an open-ended style:

data Line = Line Point Point
instance Shape Line where ...

data Circle= Circle {center :: Point, radius :: Double}
instance Shape Circle where ...

...

Then, you can build your heterogeneous list:

shapes = [AnyShape(Line a b),
          AnyShape(Circle a 3.0),
          AnyShape(Circle b 1.8)]

and use it in a uniform way:

drawIn box xs = sequence_ [draw s | AnyShape s <- xs, bounds s `hits` box]

Note that you need to unwrap your AnyShape in order to use the class Shape interface functions. Also note that you must use the class functions to access your heterogeneous data -- there is no other way to "downcast" the unwrapped existential value s! Its type only makes sense within the local scope, so the compiler will not let it escape.

If you are trying to use existential types, yet find yourself needing to "downcast" them, sum types might be a better fit.

It's worth noting that this use of existentials is widely considered an anti-pattern. They are generally much more cumbersome to deal with than properly seperable sum types. The advantage they do bring, as you say, is that they're freely extendable.
@leftaroundabout I was about to write the same thing. IMO, for classes such as Shape above, it could be better to employ data Shape = Shape AABB (IO ()) and include this inside Circle,.... The existential typeclass approach becomes more interesting, instead, on type-preserving operations like translate :: Shape a => a -> Vector -> a since we can't instead have data Shape = Shape ... (Vector -> ?(actual subtype)), provided we really need to collect shapes in a list or a map (which forces us to have a AnyShape existential).
And, of course, such type preserving operations are those which (ironically) can not be expressed in OOP. Usually we see encodings of that such as class Circle implements Translatable<Circle> {..} (AKA CRTP, AKA F-bound polymorphism). Still, those are the cases were (AFAICS) one really needs existentials+typeclass.
@chi Why can't we use a normal typeclass for translate? How is it relevant whether translate is type-preserving or not when we only use it on elements of a heterogenous collection?
Thanks for pointing out that existentials (as with other high-level features) are best avoided when they aren't necessary -- as well as articulating a use case where they are necessary! I'm still considering how (and if) I ought to update my answer to incorporate those points...

Collectives™ on Stack Overflow

Restructuring an OOP datatype into Haskell types

2 Answers 2

7 Comments

7 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

7 Comments

Related