Go blocks, threads, futures, and garbage collection


#1

Please vote. Comments are encouraged!

  • :thumbsup: Yes, please teach this!
  • :thumbsdown: No, I’m not interested.

I don’t know if this is covered by content elsewhere but I’d like to understand how Clojure’s various concurrency constructs interact with the JVM garbage collector.

My basic puzzle is this. On the one hand, objects in the JVM need to be referenced to stay alive. That is, if there is no reference to an object, then the JVM’s garbage collector can remove it at any time.

But on the other hand, many of Clojure’s asynchronous APIs seem to be “fire and forget”. They allow you to request a block of code to be executed, but don’t necessarily require you to hold onto a reference to an object that keeps that block of code alive.

I can see three possibilities:

  1. Are we at risk of every piece of asynchronous code just spontaneously stopping when the objects underneath it are GCd?
  2. Or are we required to hold onto references to some kind of object associated with every task we request to be executed (e.g., the Thread object)?
  3. Or do parts of “the system”, whether it is the Java standard libraries or Clojure, take responsibility for holding a reference for us?

Some examples. If I do

(future (while true (Thread/sleep 10000) (println "hello")))

and I don’t hold onto the returned future, then who is holding onto a reference that keeps the running code from being GCd?

You can pose a similar question for future-call, for go, and even for tasks that I submit to an ExecutorService. If I don’t hold onto the service, then are the unexecuted tasks at risk of never being executed because the executor could be GCd?

(Or alternatively, do isolated reference cycles keep themselves alive forever, and in that case why am I not responsible for explicitly disposing of resources in all of the above examples?)

Basically, I’m trying to get a deeper understanding of what a developer’s responsibilities are in order to use Clojure’s asynchronous task constructs correctly, and it seems that understanding that requires understanding better how it hooks into the GC and Java’s concurrency constructs.


#2

Hi Alexis,

This is a great question and goes deep into how the JVM treats threads. Your curiosity is great! These are the kinds of things I was curious about when I got into Clojure as well. What I found was that everything in this respect is pretty well-behaved. Clojure wouldn’t be good for concurrency without a lot of thoughtful design around everything to do with threads.

Threads are treated specially by the JVM. Running threads (that is, Threads that have been started but not run to completion) are considered to be roots for the GC. As roots, running threads are never collected themselves, and anything they reference transitively is also protected from the GC. Once a thread terminates, it will be collected like other objects. So there’s no need to hold onto references to thread objects. This is all part of the JVM, and Clojure simply inherits this behavior.
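To make that concrete, here is a minimal sketch (the names `done` and `outcome` and the 100 ms delay are just for illustration). A thread is started without keeping any reference to the Thread object, a GC is requested, and the thread still finishes its work.

```clojure
;; A promise lets the main thread observe the result; it is not what keeps
;; the worker thread alive.
(def done (promise))

;; Start a thread and immediately drop the only reference to the Thread object.
(.start (Thread. (fn []
                   (Thread/sleep 100)
                   (deliver done :finished))))

;; Request a collection while the thread is still sleeping. System/gc is only
;; a hint to the JVM, but a started thread is a GC root either way.
(System/gc)

;; Wait up to 2 seconds for the thread to deliver its value.
(def outcome (deref done 2000 :timed-out))
(println outcome)
```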

As far as ExecutorServices go, the same applies. An ExecutorService is an ordinary object, but its worker threads are GC roots for as long as they are alive, so a task that is already running will run to completion even if you drop your reference to the service. Unfortunately, there’s no reliable way to forcibly stop a running thread from another thread (interruption is cooperative). So if a task doesn’t stop on its own, it could be a leak.
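A small sketch of that, assuming a plain fixed-size pool from java.util.concurrent (the names `result`, `pool`, and `task` are made up for the example). The pool is only referenced inside the let, and the submitted task still completes after the let has returned. Calling shutdown is the polite part: it lets submitted work finish and then lets the worker thread exit.

```clojure
(import '(java.util.concurrent Executors))

(def result (promise))

(let [pool (Executors/newFixedThreadPool 1)
      task (fn []
             (Thread/sleep 100)
             (deliver result :ran))]
  ;; Hint the task as Runnable so the compiler can choose between the
  ;; overloaded submit methods.
  (.submit pool ^Runnable task)
  ;; shutdown stops accepting new tasks but lets submitted ones finish,
  ;; after which the worker thread exits.
  (.shutdown pool))

;; `pool` is out of scope here, yet the task still completes.
(def pool-outcome (deref result 2000 :timed-out))
(println pool-outcome)
```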

Clojure futures are run in threads managed by the Clojure runtime. Infinite loops in futures are a problem since the thread will never be collected and could hold references to objects, causing memory leaks.

Your responsibilities for using Threads, futures, and other threading constructs in Clojure:

  • Keep them free of infinite loops. There is no reliable way to stop them from the outside.
  • Hold onto futures and other value-holding constructs as long as you still need the value.

Besides that, I think you’re good to trust the VM underneath.
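One partial exception worth knowing about is future-cancel. Cancellation is cooperative: it interrupts the thread, which only helps if the body is blocked in something interruptible (like Thread/sleep) or checks the interrupt flag itself. A tight loop that never checks will keep running. A minimal sketch, with `sleeper` and `cancel-accepted?` as illustrative names:

```clojure
;; A future whose body is blocked in an interruptible call.
(def sleeper (future (Thread/sleep 60000) :done))

;; Give the future a moment to start, then ask for cancellation.
(Thread/sleep 100)
(def cancel-accepted? (future-cancel sleeper))

;; Both values are true when the cancellation was accepted.
(println cancel-accepted? (future-cancelled? sleeper))
```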

Thanks! This was fun!

Eric


#3

Thanks, this is very helpful!

So a running Thread is a root for GC. That explains why a running task is not at risk of suddenly stopping because it gets GC’d out of existence. In fact, it sounds like one more often encounters the opposite problem, that once a thread is running there is no way to stop it from the outside. Okay.

But let me make sure I understand what this implies for a task that has not yet been assigned to a Thread object, or that is otherwise waiting to run but is not yet a running thread.

For instance, if I call (future (task)), all Clojure gives me is an assurance that (task) will be run at some point in the future. For instance, it does not guarantee that the task will start running before future returns. So in that case there is an interval of time between when future returns (on my thread) and task is started (on its thread). My question: who is holding onto the reference that keeps the future alive in that interval, before it becomes a running Thread and has its own zombie-like powers of immortality? If I understand you right, you are saying this is managed by the Clojure runtime. Is that correct? And does the same apply for go blocks?

And this is unlike the case with an ExecutorService, where I would be responsible for holding the reference to the service, which holds the reference to the task. So if I didn’t hold the reference to the service, and the task has not yet started running, then the executor service could be GC’d, and the task along with it. Correct?

If I understand all of the above correctly, then that would lead me to guess that under the hood the Clojure runtime is managing future objects by holding a reference to its own private ExecutorService for futures, just as (I presume) it manages its own thread pool for executing go blocks.

And a follow-up question: if infinite loops in threads are a problem, then are not infinite loops in go blocks a problem for the same reason? Even if execution stops because the channels are blocked, they are still holding storage. Or perhaps there is some way to explicitly terminate them?


#4

Hey Alexis,

You’re asking very detailed and pointed questions. I think we’ll have to go deep into the source for this one. Let’s follow the trail through the source code and see where that leads us.

In the code for clojure.core/future, we see the docstring:

Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/@. If the computation has
not yet finished, calls to deref/@ will block, unless the variant of
deref with timeout is used. See also - realized?.

You said that there’s “no guarantee that the task will start running before the call to future returns”. But certainly the task has to be on some call stack somewhere for it to be run. But let’s trace it through entirely.

We see that it’s a macro that expands to a call to clojure.core/future-call. Here are the relevant lines from its source:

  (let [f (binding-conveyor-fn f)
        fut (.submit clojure.lang.Agent/soloExecutor ^Callable f)]

clojure.core/binding-conveyor-fn is not something you use every day. It wraps f in a new function that carries the calling thread’s dynamic var bindings over to whichever thread ends up running it. Vars have a root value and a per-thread value, per their thread-safe semantics. That’s not very relevant to the question. Once f has been wrapped, (.submit clojure.lang.Agent/soloExecutor f) is called. clojure.lang.Agent/soloExecutor is a standard ExecutorService.
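A quick way to see what the conveyor function does for futures is to read a dynamic var inside one. In this sketch (the var `*who*` and the name `seen-in-future` are invented for the example) the body runs on a pool thread but sees the binding from the thread that created the future.

```clojure
(def ^:dynamic *who* "root value")

(def seen-in-future
  (binding [*who* "caller's binding"]
    ;; The body runs on a pool thread, yet it sees the binding that was in
    ;; effect on the thread that created the future.
    @(future *who*)))

(println seen-in-future)
```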

So, that’s the end of its journey. .submit returns a Java Future, which Clojure then wraps. The code is just wrapped up a little in a utility function and passed off to an ExecutorService that’s part of the Clojure runtime. That executor lives in a static field of clojure.lang.Agent, so it is always reachable, and it holds onto the task until a pool thread runs it.
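You can poke at this from a REPL. The sketch below mirrors the call shape in future-call, minus the binding conveyance; `solo-is-executor?` and `direct-result` are names I made up. Submitting to soloExecutor by hand is only for exploration, since future already does it for you.

```clojure
(import '(java.util.concurrent ExecutorService))

;; The executor is a public static field on clojure.lang.Agent.
(def solo-is-executor?
  (instance? ExecutorService clojure.lang.Agent/soloExecutor))

(def direct-result
  (let [f (fn [] (+ 1 2))
        ;; Hinting f as Callable selects submit(Callable), which returns a
        ;; java.util.concurrent.Future carrying the return value.
        fut (.submit clojure.lang.Agent/soloExecutor ^Callable f)]
    ;; Block until the pool thread has run f.
    (.get fut)))

(println solo-is-executor? direct-result)
```

In a standalone script, call (shutdown-agents) when you are done so the pool’s idle threads don’t delay JVM exit.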

That was a very detailed question, and actually deals with stuff most Clojure programmers don’t deal with very often, if at all. But I appreciate the curiosity. Clojure’s concurrency/thread parallelism story is very solid. This assurance that the task will be run at some point is handled completely by the runtime.

core.async does have its own thread pool that actually runs the go blocks. Yes, you need to worry about infinite loops in go blocks. You also need to worry about deadlock. Luckily, go blocks tend to be small and easy to understand.

Now, that’s not to say you don’t want an infinite loop in your go block. If you’re waiting on a channel inside of an infinite loop, that go block will be “parked” while it’s waiting and won’t be blocking any threads. Infinite loops are fine as long as you take or put to channels inside the loop. In fact, go block infinite loops are very common for implementing local event loops.
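Here is a minimal sketch of such an event loop, assuming core.async is on the classpath (the names `events`, `seen`, and `loop-done` are illustrative). The loop parks on the take, and closing the channel is what lets it fall out of the recur and finish.

```clojure
(require '[clojure.core.async :refer [chan go-loop <! >!! <!! close!]])

(def events (chan 10))
(def seen (atom []))

;; go-loop returns a channel that closes when the loop body finishes.
(def loop-done
  (go-loop []
    ;; <! parks the go block (no thread is blocked) until a value arrives.
    ;; A closed, drained channel yields nil, which ends the loop.
    (when-some [e (<! events)]
      (swap! seen conj e)
      (recur))))

(>!! events :a)
(>!! events :b)
(close! events)

;; Wait for the loop to exit; buffered values are delivered before the nil.
(<!! loop-done)
(println @seen)
```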

Because you’ve been asking for lots of gory details, I’ll go deeper. Go blocks are essentially callback functions that are parked in the channels they are waiting on. So when the channels are out of scope and the go block is parked, the go block can be collected along with the channel. That’s a very nice feature because you almost never have to worry about ending a go block like you do in the Go language. Channels and go blocks are just objects and the GC takes care of everything.

Watch this talk by Rich Hickey if you like these details.

Eric


#5

This is awesome! Thanks for helping me understand it.

And thanks for the link. I’ll definitely check out that talk.

I think the reason these questions are occurring to me is because most of my experience with concurrency APIs is with Cocoa’s Grand Central Dispatch system, which is used on iOS and on OS X.

That system is different in a few respects from Java’s APIs and Clojure’s APIs:

  • the system provides default queues and an executor. So you don’t need to create and manage one yourself. (Here it’s like Clojure but unlike Java.)
  • you need to take special care to avoid creating reference cycles, since storage management is done not by a garbage collector but by reference counting code that the compiler automatically generates. But it doesn’t have a cycle detector, so you need to break cycles manually. This comes up almost whenever you’re creating a lambda really.
  • object disposal is deterministic. Once the references go to zero, poof, it’s gone.
  • on mobile especially, you sometimes need to worry very hard about use of RAM, since just a few screen-sized images will consume more than your working memory.

So with this background, when it’s not clear who is supposed to be holding the reference that keeps something alive, I worry that it’s being kept alive just by accident! Or I worry about holding too much.


#6

Ah, I see!

Well, let me say that Rich Hickey takes object ownership very seriously. That is one of the biggest reasons to use a GC with immutable values. You simply don’t have to think about it so much anymore. Reference cycles are not a problem with the Java GC. Java uses a generational collector, so short-lived garbage is collected very quickly, often with only a few instructions. That said, it can be a problem on mobile or embedded. I think the rule of thumb is that the heap should be about twice your peak live memory, or you can get pauses.

What you do have to think about is holding onto the head of a long lazy list. If you keep a reference to the head while you’re consuming it, every element you’ve realized stays reachable and can’t be collected. I’ve been bitten by that before. Generally, you can create a bunch of garbage and return from your function. Even the returned value is now “someone else’s” problem.
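A small sketch of the difference (the names and the size `n` are arbitrary). Both forms compute the same sum, but the first keeps the whole realized sequence reachable through a var, while the second lets the GC reclaim elements as reduce moves past them.

```clojure
(def n 1000000)

;; Holds the head: the var `all-nums` points at the start of the seq, so
;; every element realized by reduce stays reachable afterwards.
(def all-nums (map inc (range n)))
(def total-holding-head (reduce + all-nums))

;; Does not hold the head: nothing refers to the start of the seq, so
;; elements can be collected once reduce has moved past them.
(def total-streaming (reduce + (map inc (range n))))

(println (= total-holding-head total-streaming))
```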

Eric


#7

I just wanted to follow up on your observation that go blocks are callback functions owned by the channels, so they get cleaned up when the channels go out of scope.

I ran a little experiment to try to kill a go block by killing its channel. But it does not seem to work.

Is that because the GC hasn’t got around to removing it yet?

(require '[clojure.core.async :refer [go <! timeout]])
(defn go-repeatedly [f]
  (go
    (loop []
      (<! (timeout 1000))
      (f)
      (recur))))
(def output-chan (atom (go-repeatedly #(println "hello"))))

;; I'd expect this to cause the chan to be GC'd, and stop, but it doesn't.
(reset! output-chan nil)

#8

Actually, never mind. I suppose this thread is always running so it’s a GC root.

It seems like the only practical way to kill a go loop is to build it so that it is designed to be killable later, by including some conditional within the loop that checks for a closed channel, or for a flag variable.
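One way to sketch that design, assuming core.async is available: pass in a stop channel and use alts! to wait on either the stop channel or the timeout. The function name `go-repeatedly-until` and its parameters are hypothetical, a variation on go-repeatedly above rather than anything from the library.

```clojure
(require '[clojure.core.async :refer [chan go-loop alts! timeout close! <!!]])

(defn go-repeatedly-until
  "Calls f every interval-ms until stop-ch is closed or receives a value."
  [stop-ch interval-ms f]
  (go-loop []
    ;; alts! parks until one of the channels is ready and reports which one.
    (let [[_ ch] (alts! [stop-ch (timeout interval-ms)])]
      (when-not (= ch stop-ch)
        (f)
        (recur)))))

(def stop (chan))
(def ticks (atom 0))
(def finished (go-repeatedly-until stop 50 #(swap! ticks inc)))

;; Let it tick a few times, then close the stop channel to end the loop.
(Thread/sleep 300)
(close! stop)

;; The channel returned by go-loop closes once the loop has exited.
(<!! finished)
(println "ticks before stop:" @ticks)
```

Closing the stop channel (rather than putting a value on it) has the nice property that any number of loops can watch the same channel and all of them stop.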


#9

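Yeah, that’s an infinite loop and so won’t ever stop. The atom only held the channel returned by go; the block itself is parked on a timeout channel, and core.async’s timer keeps that channel reachable until it fires.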

And in general, it’s very difficult to tell when any object is actually collected. It requires a lot of faith in the GC.

I guess you could make a go loop that makes a bunch of garbage (something visible on a graph) and waits on a timeout for, say, 10 minutes. When the timeout fires, it could print something before finishing. Then you watch the graph? I’m not sure if that will work, because GC can always be deferred until you actually need the memory. Like I said, always hard.

Eric


#10

Great question, @alexisgallagher. I have yet to dig into claypoole, but that project’s README is helpful:

The project’s creator gave a talk at Clojure/West you might want to check out:
Beyond futures:

@ericnormand I think a discussion about core.async, futures, threads, and other options in Clojure for handling concurrency/parallelism could be a good deep-dive topic.

Probably tough to do in 10 minutes.