I think the reason they say that is the way supervision is structured: it is built on a process model.
The premise is that error handling is moved out into a different process, the supervisor, which decides how failures should be dealt with. The supervised process is free to crash and burn because it lives in isolation from everything else and shares no state with the outside world.
The supervisor is configured with a setting (the restart strategy) that tells it how the processes it supervises depend on each other, so it knows which ones to restart to get back to a fresh state.
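To make the dependency setting a bit more concrete, here is a rough sketch in Python (not Erlang, and not how OTP is implemented) of how I understand the three common OTP restart strategies, one_for_one, rest_for_one, and one_for_all, choose which children to restart. The function name and the child names are made up for illustration.

```python
def children_to_restart(start_order, failed, strategy="one_for_one"):
    """Return the children a supervisor would restart, in start order.

    start_order: child names in the order the supervisor started them;
                 later children are assumed to depend on earlier ones.
    failed:      name of the child that crashed.
    strategy:    "one_for_one", "rest_for_one", or "one_for_all".
    """
    index = start_order.index(failed)
    if strategy == "one_for_one":
        # Children are independent: restart only the one that crashed.
        return [failed]
    if strategy == "rest_for_one":
        # Restart the crashed child and everything started after it.
        return list(start_order[index:])
    if strategy == "one_for_all":
        # Children are tightly coupled: restart all of them.
        return list(start_order)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical children, listed in start order.
order = ["db_connection", "cache", "web_handler"]
print(children_to_restart(order, "cache", "rest_for_one"))
# prints ['cache', 'web_handler']
```

The point is that the supervisor only needs the start order and the strategy to know how far a restart has to reach; the children themselves carry no recovery logic.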
One of the metaphors I have seen for this: if you are working in Microsoft Word and it starts acting up, you kill Word and restart it. If that still fails, you restart all of Office, and if you are still having issues, you restart progressively higher-level groups of programs until you get to logging out of your account, and then rebooting the whole machine.
As far as the Try Three Times goes, in Erlang it is more of a "try X times in Y seconds", and then crash yourself and let the issue escalate.
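The "X times in Y seconds, then escalate" rule can be sketched roughly like this. Again this is Python rather than Erlang, and only an approximation of the idea under my own assumptions: `supervise`, `RestartLimitExceeded`, and the parameter names are invented for the example, and a real supervisor runs the child in a separate process instead of calling a function.

```python
import time
from collections import deque


class RestartLimitExceeded(Exception):
    """The child failed too often; whoever supervises us has to deal with it."""


def supervise(start_child, max_restarts=3, window_seconds=5.0, clock=time.monotonic):
    """Run start_child, restarting it on failure until the restart budget is spent."""
    restarts = deque()  # timestamps of the restarts still inside the window
    while True:
        try:
            return start_child()  # normal exit: nothing left to supervise
        except Exception as exc:
            now = clock()
            # Forget restarts that have fallen out of the sliding window.
            while restarts and now - restarts[0] > window_seconds:
                restarts.popleft()
            if len(restarts) >= max_restarts:
                # Budget spent: crash ourselves so the failure escalates.
                raise RestartLimitExceeded(
                    f"more than {max_restarts} restarts in {window_seconds}s"
                ) from exc
            restarts.append(now)  # record this restart and loop to try again
```

Because the exception propagates out of `supervise`, calling one `supervise` from inside another's `start_child` gives you the same escalation as the Word, then Office, then reboot metaphor: the inner one gives up, and the outer one treats that as a child failure and restarts the whole group.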
That is not to say that this whole model can’t be done in other languages, but I would guess a lot of the reason people think it is tricky has more to do with the process model than purely the actor model, although the actor model helps because the state is isolated away from everything else.
Clojure may have a better chance than some of the other languages, given some of the threading models it uses and the fact that it (mostly) deals with immutable local data. The global mutable items like refs, atoms, and agents(?) would have to be avoided to ensure a clean state and the ability to retry.
I am guessing the catch is that the logic you want to supervise, and allow to fail and be restarted, would need to be pulled out into its own tasks so it can fail and be restarted cleanly without cratering the rest of the system.
The other option Clojure could give you is to do some aspect-oriented pre/post logic by building up a macro system that abstracts away the error handling you would normally have to write, and that manages the nesting of dependencies to allow for restarts.
I think the other side of the “use Erlang” argument is not that it cannot be done in other languages, but that it is a large effort. Unless there will be a team of people dedicated to a project like that, it is better to use one of the languages that runs on the Erlang Runtime System for that part of the system, instead of trying to rewrite OTP on your own for a single project.
Another good reference I have picked up, in addition to Erlang and OTP in Action, that shows some of the intricacies of OTP is Designing for Scalability with Erlang/OTP by Francesco Cesarini and Steve Vinoski, from O’Reilly.
Happy to try and answer any other questions you may have about this.