Problems with Immutable Data in Functional Programming

Question

I am new to functional programming. As I understand it, functional programming means writing code with pure functions and without changing the values of data.

Instead of changing value of variables we create new variables in functional programming when we need to update a variable.

Suppose we have a variable x that represents the total number of HTTP requests made by the program. If we have two threads, I want each thread to increment x whenever it makes an HTTP request. If each thread makes its own copy of x, how can they synchronize its value? For example, if thread 1 makes 10 HTTP requests and thread 2 makes 11, they will print 10 and 11 respectively, but how would I print 21?

Threads are never a goal, always a tool. You normally want to forget about them as much as possible and only let some low level, very well thought machinery do the threading for you under the hood.
– Erik Kaplun, Commented Jan 16, 2016 at 20:17

leeor · Accepted Answer · 2016-01-16 15:30:19Z

I can provide an answer for the Clojure case. In Clojure, if you need to coordinate access to shared state, there are constructs in the language that are designed to deal with these situations.

In this case, you could use an atom to hold the value. Changes made to an atom are atomic: swap! applies a function to the current value and installs the result with a compare-and-set, retrying if another thread changed the value in the meantime. (Clojure's STM applies to refs, not atoms.) Atoms are one of Clojure's reference types. An atom is essentially a reference to a value, which can change over time in a controlled way via the atom's mutation functions.

See the Clojure docs for more information on atoms and the other reference types.
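An atom-style update can be sketched as follows. Note that this is written in Haskell rather than Clojure, so it is an analogy and not Clojure's API: an IORef updated with atomicModifyIORef' plays roughly the role of an atom updated with swap!. The names bump and countWith are hypothetical, and the "requests" are simulated by plain increments.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- Atomically add one to the shared counter (hypothetical helper).
bump :: IORef Int -> IO ()
bump ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- Run one thread per entry in `counts`; each bumps the counter that many times.
countWith :: [Int] -> IO Int
countWith counts = do
  ref  <- newIORef 0
  done <- newEmptyMVar
  mapM_ (\k -> forkIO (replicateM_ k (bump ref) >> putMVar done ())) counts
  mapM_ (\_ -> takeMVar done) counts   -- wait for every thread to finish
  readIORef ref

main :: IO ()
main = countWith [10, 11] >>= print   -- prints 21
```

There is a single shared cell here, and every thread updates that same cell rather than a private copy.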

Sibi · Accepted Answer · 2019-09-06 08:56:50Z


I will address the Haskell part. MVar is one of the communication mechanisms for threads. Here is an example taken from Simon Marlow's book (the program is self-explanatory):

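import Control.Concurrent

main = do
  m <- newEmptyMVar
  -- The forked thread puts 'x'; its second put blocks until 'x' is taken.
  forkIO $ do putMVar m 'x'; putMVar m 'y'
  -- The main thread takes the two values in order.
  r <- takeMVar m
  print r
  r <- takeMVar m
  print r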

The output for the above program will be:

'x'
'y'

You can see in the above example how the MVar value in variable m is shared between threads. You can learn more about these techniques in this book.
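Applied to the counter from the question, the same mechanism might look like the sketch below. It is an illustration under assumptions, not code from the book: fakeRequest is a hypothetical stand-in for a real HTTP call, and worker and totalRequests are names invented here. The MVar holds the count, and modifyMVar_ takes the value, updates it, and puts it back, so two threads cannot interleave inside an update.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM_)

-- Hypothetical stand-in for a real HTTP call; it only bumps the shared counter.
fakeRequest :: MVar Int -> IO ()
fakeRequest counter = modifyMVar_ counter (\n -> pure $! n + 1)

-- Fork a worker that makes `n` fake requests and signals `done` when finished.
worker :: MVar Int -> MVar () -> Int -> IO ()
worker counter done n = do
  _ <- forkIO (replicateM_ n (fakeRequest counter) >> putMVar done ())
  pure ()

totalRequests :: Int -> Int -> IO Int
totalRequests a b = do
  counter <- newMVar 0
  done    <- newEmptyMVar
  worker counter done a
  worker counter done b
  takeMVar done >> takeMVar done   -- wait for both workers
  readMVar counter

main :: IO ()
main = totalRequests 10 11 >>= print   -- prints 21
```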

edited Sep 6, 2019 at 8:56

answered Jan 16, 2016 at 16:27


2 Comments

Erik Kaplun Over a year ago

-1 for not addressing the real issue — a misconception in how threads should be used, especially in a language like Haskell.

Sibi Over a year ago

@ErikAllik I was just showing one of the mechanisms by which threads communicate with each other. I would be glad to read your answer.

MasterMastic · Accepted Answer · 2016-01-18 04:02:36Z

I will also address the Haskell part.

First, I want to clear something up:

Instead of changing value of variables we create new variables in functional programming when we need to update a variable.

That's not quite accurate. We create new "variables" in FP when we need them, not when we need to mutate existing ones. We don't even think in terms of mutation when we do what you describe; we may just think we're creating a new value that's similar to one we have.

What you're describing with threads is a bit different. You're actually looking for a side effect (incrementing a counter). Haskell, being pure, will not let you throw side effects around without being very explicit about it. So in this case you will need to resort to reference types / mutable cells. The simplest one is called IORef, and it's very much like a variable in this sense: you can assign a value, read the current value, and so on.

So, as you can see, when you're looking for this kind of thing, you really do have just one copy of the counter.
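A minimal single-threaded sketch of an IORef used as a counter, assuming nothing beyond Data.IORef from base. The names recordRequest and demo are hypothetical.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical helper: record one request in the shared cell.
recordRequest :: IORef Int -> IO ()
recordRequest counter = modifyIORef' counter (+ 1)

-- Create one cell, update it three times, and read it back.
demo :: IO Int
demo = do
  counter <- newIORef 0
  recordRequest counter
  recordRequest counter
  recordRequest counter
  readIORef counter

main :: IO ()
main = demo >>= print   -- prints 3
```

Every call receives the same reference, so there is only ever one counter, and the IO in the types makes the side effect explicit.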

The above is the essence of my answer, but you've asked concretely about threads, so I'll respond to that as well.
IORefs are not thread-safe when you use plain reads and writes: a separate read followed by a write can race with another thread (atomicModifyIORef covers the single-cell case). So there are MVars, as suggested. They are not like regular variables, but they are close, and they get the job done elegantly. Generally and loosely speaking, they abstract variables and locking. I think you might find TVars easier, though. They behave like IORefs/variables, except that, unlike both, they are thread-safe: you can compose their operations into one operation, and any operation done to them is done atomically (STM).
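To illustrate the point about composing operations, here is a small sketch that assumes the stm package (it ships with GHC). Each thread keeps its own tally in one TVar and also bumps a shared total in another, and both updates happen in a single transaction. The names recordRequest and runTwo are hypothetical, and the requests are simulated.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar)
import Control.Monad (replicateM_)

-- Two updates composed into one transaction: a thread's own tally and the
-- shared total always change together.
recordRequest :: TVar Int -> TVar Int -> STM ()
recordRequest mine total = modifyTVar' mine (+ 1) >> modifyTVar' total (+ 1)

-- Run two threads making `a` and `b` fake requests;
-- return (first thread's count, second thread's count, total).
runTwo :: Int -> Int -> IO (Int, Int, Int)
runTwo a b = do
  total <- newTVarIO 0
  t1    <- newTVarIO 0
  t2    <- newTVarIO 0
  done  <- newEmptyMVar
  _ <- forkIO (replicateM_ a (atomically (recordRequest t1 total)) >> putMVar done ())
  _ <- forkIO (replicateM_ b (atomically (recordRequest t2 total)) >> putMVar done ())
  takeMVar done >> takeMVar done   -- wait for both threads
  atomically ((,,) <$> readTVar t1 <*> readTVar t2 <*> readTVar total)

main :: IO ()
main = runTwo 10 11 >>= print   -- prints (10,11,21)
```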

By the way, you might find ways to avoid state altogether, and this is strongly encouraged. For example, you can have the two threads execute an asynchronous recursive function that remembers through an argument how many requests have been made and later returns that number. The total request count is the sum of the counts returned by all your threads. This avoids the side effect on the counter, but it can only give you a result once the threads have finished. That's quite limited, so sometimes you might want that side effect.
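One way to sketch this idea with only base is shown below. The function requestLoop is a hypothetical worker that carries its count as an argument, and spawn and totalRequests are invented names. An MVar is still used, but only as a one-shot box for each thread's return value, not as a shared counter.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical worker: "performs" the remaining requests recursively,
-- carrying the running count as an argument instead of mutating anything.
-- A real request would be issued before each recursive call.
requestLoop :: Int -> Int -> IO Int
requestLoop 0 made = pure made
requestLoop remaining made = requestLoop (remaining - 1) (made + 1)

-- Start a worker on its own thread; the MVar is only a one-shot result box.
spawn :: Int -> IO (MVar Int)
spawn n = do
  result <- newEmptyMVar
  _ <- forkIO (requestLoop n 0 >>= putMVar result)
  pure result

totalRequests :: [Int] -> IO Int
totalRequests plan = do
  boxes  <- mapM spawn plan
  counts <- mapM takeMVar boxes   -- blocks until each worker has returned
  pure (sum counts)

main :: IO ()
main = totalRequests [10, 11] >>= print   -- prints 21
```

As the answer notes, the total is only available after every worker has returned.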

Soniku · Accepted Answer · 2016-01-18 14:31:19Z

I will try to give a more general explanation of holding state, since I think that is what you really want to know.

Generally, you can accomplish the same thing through recursion. For example, with the functions below:

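%% Count requests until perform_http_request/0 (a hypothetical placeholder)
%% returns quit; the count is carried along as an argument.
somefun() ->
    somefun(0).

somefun(X) ->
    case perform_http_request() of
        quit -> X;
        _    -> somefun(X + 1)
    end.

%% Call somefun/0 X times, collecting each returned count in a list.
generate_thread(0, Accumulator) ->
    Accumulator;
generate_thread(X, Accumulator) ->
    Y = somefun(),
    NewAccumulator = [Y | Accumulator],
    generate_thread(X - 1, NewAccumulator).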

I typed this in a hurry and it is a very generic explanation (you won't be able to use this code directly), but I think you can see that you don't really need mutability here. The function finishes when all of the threads finish their processing. The actual thread synchronization depends on the language of your choice, since different languages handle concurrency and "threads" differently. If you are into concurrency, I would suggest taking a look at Erlang, as it has a really good concurrency model in my opinion.

Anyway, at the end you can just sum all the values returned in the accumulator and display the total. Take a look at the foldl and foldr functions as well, by the way.
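The same shape can be sketched as pure Haskell, with no threads and no real requests. The Outcome type and the names countRequests and total are hypothetical, introduced only to show a count carried through recursion and a fold over the collected counts.

```haskell
import Data.List (foldl')

-- Hypothetical outcome of one request attempt.
data Outcome = Served | Quit deriving (Eq, Show)

-- Count outcomes until the first Quit, carrying the count as an argument.
countRequests :: Int -> [Outcome] -> Int
countRequests n (Served : rest) = countRequests (n + 1) rest
countRequests n _               = n

-- Sum the per-worker counts with a left fold.
total :: [Int] -> Int
total = foldl' (+) 0

main :: IO ()
main = do
  let perWorker = map (countRequests 0) [replicate 10 Served ++ [Quit], replicate 11 Served]
  print perWorker          -- prints [10,11]
  print (total perWorker)  -- prints 21
```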

Amanuel Nega · Accepted Answer · 2019-09-03 19:24:53Z

I'm not a guru myself, but I think you might have misunderstood.

Without keeping state, you can't create a program that is of much use. There needs to be state one way or another. The goal of FP is not to avoid state; it's to keep the use of state under control.

Look at it this way: your state should be as isolated and as safe as your database entries. If you treat state as you would treat a database, I think you'll be fine.

This implies,

You won't have some logic like (inc count). Rather, you will have a function increment-count! that safely updates the count. Notice the !, which signals a side effect.
You will not have code that depends on side effects. Rather, you'll rely on functions that get everything they need from their parameters, unless they absolutely have to depend on state, like updating the count, which undeniably calls for state.
Your first choice should be to avoid state. When passing it to the function becomes impossible, you'll create state that is updated properly.
Treat state as you would treat an external API: something remote that you access through some kind of protocol.
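As a rough illustration of these points, here is a Haskell sketch (Haskell rather than the Lisp-style naming used above, so incrementCount stands in for increment-count!). The Counter type and every function name are hypothetical. The mutable cell sits behind a small set of functions, and report is pure and takes everything from its parameter.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- The mutable cell is wrapped so callers must go through the functions below.
-- (In a real module, the Counter constructor would not be exported.)
newtype Counter = Counter (IORef Int)

newCounter :: IO Counter
newCounter = Counter <$> newIORef 0

-- The only way to change the count; plays the role of increment-count!.
incrementCount :: Counter -> IO ()
incrementCount (Counter ref) = atomicModifyIORef' ref (\n -> (n + 1, ()))

currentCount :: Counter -> IO Int
currentCount (Counter ref) = readIORef ref

-- Pure: depends only on its parameter, never on the counter itself.
report :: Int -> String
report n = "requests so far: " ++ show n

main :: IO ()
main = do
  c <- newCounter
  incrementCount c >> incrementCount c
  currentCount c >>= putStrLn . report   -- prints: requests so far: 2
```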

I hope this makes sense.

Ming L. · Accepted Answer · 2019-09-05 14:32:11Z

I will address the Erlang part. There is no magic for synchronization even in Erlang; someone somewhere has to deal with synchronization. It's just that Erlang, with the benefit of immutability (that is, no mutable variables), helps prevent common synchronization mistakes in concurrent programming. Erlang/OTP behaviours like gen_server already provide the infrastructure to manage state. In fact, a gen_server is single threaded: any messages it receives are queued up in its mailbox and handled one at a time. Here is a link about Erlang's message concurrency: How Erlang processes access mailbox concurrently

In the original poster's case, to keep an HTTP request counter, you can use a single gen_server (Erlang/OTP). You will be surprised how much throughput it can handle. If the throughput of a single gen_server is really insufficient, a hierarchy of gen_servers can aggregate the counts. Erlang/OTP comes with a set of runtime APIs to measure performance in real time.
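The single-owner idea can be sketched outside Erlang too. The following is a Haskell analogy, not gen_server and not Erlang code: one thread owns the count and reads messages from a Chan that stands in for the mailbox. The Msg type and the names server and runClients are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)

-- Messages the counter "server" understands (hypothetical protocol).
data Msg = Increment | Get (MVar Int)

-- The server owns the count and handles one message at a time from its mailbox.
server :: Chan Msg -> Int -> IO ()
server mailbox n = do
  msg <- readChan mailbox
  case msg of
    Increment -> server mailbox (n + 1)
    Get reply -> putMVar reply n >> server mailbox n

-- Two clients send `a` and `b` Increment messages, then the count is queried.
runClients :: Int -> Int -> IO Int
runClients a b = do
  mailbox <- newChan
  _ <- forkIO (server mailbox 0)
  done <- newEmptyMVar
  _ <- forkIO (replicateM_ a (writeChan mailbox Increment) >> putMVar done ())
  _ <- forkIO (replicateM_ b (writeChan mailbox Increment) >> putMVar done ())
  takeMVar done >> takeMVar done   -- both clients have sent everything
  reply <- newEmptyMVar
  writeChan mailbox (Get reply)    -- queued behind all the increments
  takeMVar reply

main :: IO ()
main = runClients 10 11 >>= print   -- prints 21
```

No client ever touches the count directly; the state is just an argument of the server loop, and the mailbox serializes access to it.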
