When are Clojure's concurrency primitives useful (vs a database)?


#1

I have a fairly general question: in practice, when are Clojure’s concurrency primitives useful as opposed to a database? In what kinds of applications and contexts does an app need to correctly handle concurrent access to data, but not also need to persist that data to disk?

I’m asking this because I’m noticing a mistake I’ve made twice, and trying to reflect on the lesson of it. What I’ve done is, first, carefully implement a webapp that uses Clojure’s own collection structures to manage data, therefore using Clojure’s concurrency reference types to manage multithreaded access. Then, second, I realize that I really want the app to persist the data, so that it will survive process exits, server reboots, etc…

At this point, I realize that I need a database instead of in-memory data structures. And then I see that the care I’ve taken in using Clojure’s concurrency primitives, like the transform function defined to work with swap!, or the code in a dosync block, was wasted work: if I’m moving the data into a database, then I will need to let the database manage the synchronization of access. So I will need to rewrite my Clojure transform function as a Postgres stored procedure, or a Datomic database function, or whatever.
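
For readers who have not used these primitives, here is a minimal sketch of the kind of code in question. The account data and the names `deposit` and `transfer!` are hypothetical, made up only for illustration; they are not taken from the app described above.

```clojure
;; Hypothetical example data: balances keyed by account id.
(def accounts (atom {:alice 100 :bob 50}))

(defn deposit
  "Pure transform function: takes the old map and returns a new one."
  [accts id amount]
  (update accts id (fnil + 0) amount))

;; swap! applies the transform atomically and retries it on contention,
;; which is why the transform must be free of side effects.
(swap! accounts deposit :alice 25)

;; Refs coordinate changes to several identities in one transaction.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount from one ref to another; both changes commit together."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 40)
```

Once the data lives in a database, the retry and isolation guarantees that `swap!` and `dosync` provide here have to come from the database's own transactions instead.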

Given that I’ve made this error twice, one lesson I take from it is, I suppose, fairly obvious in retrospect: if it’s remotely likely that I’m going to want durable storage in the future, then I should just architect around a database from the beginning. I could even use an in-memory database at first and configure it to hit the disk later, only when that becomes relevant. It occurs to me, however, that taking this lesson to heart amounts to doing away with a significant part of what makes Clojure distinctive and basically adopting the strategy that apps in other languages use: punt on data synchronization issues by saving everything in a database and relying on the database to handle them.

So that’s why I’m wondering, just as a practical matter, what kinds of apps, or what parts of apps, most often have requirements to handle data concurrently but not persist it?

I’d love to hear about people’s experiences on this.


#2

Hey Alexis,

Great question!

I generally agree with you. Sometimes I regret not using a database earlier. And I also think most apps use databases a little too much. For instance, languages that aren’t good at threads often use Redis to make queues when on the JVM they’re easily done in memory.
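
As a rough sketch of what an in-memory queue can look like on the JVM, here is one worker thread draining a `java.util.concurrent.LinkedBlockingQueue`. The job payloads (numbers to square) and the `::stop` sentinel are assumptions made up for the example.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

(def jobs    (LinkedBlockingQueue.)) ; unbounded, thread-safe FIFO queue
(def results (atom []))

;; The worker blocks on .take until a job arrives, and exits when it
;; sees the ::stop sentinel (a convention invented for this sketch).
(def worker
  (future
    (loop []
      (let [job (.take jobs)]
        (when-not (= job ::stop)
          (swap! results conj (* job job))
          (recur))))))

(doseq [n [1 2 3]]
  (.put jobs n))
(.put jobs ::stop)

@worker ; wait for the worker to finish draining the queue
```

Nothing here survives a process restart, of course, which is exactly the trade-off under discussion.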

I hope you are optimistic about what happened with your changing requirements. Refs and atoms are much faster to iterate on. By starting with refs, you may have saved yourself more time than you’re losing by rewriting stuff in SQL. Also, I recommend this article http://www.brandonbloom.name/blog/2013/06/26/slurp-and-spit/ about how far you can get by saving an atom to a file asynchronously. Of course all of this depends on your environment.
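
To give a feel for the idea, here is a minimal sketch of an atom that gets written to a file in the background. This is my own approximation, not the code from the article: the file name `state.edn`, the `load-state` helper, and the choice of an agent as the background writer are all assumptions.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(def state-file "state.edn") ; hypothetical path

(defn load-state
  "Reads the saved map from path, or returns {} if nothing usable is there."
  [path]
  (or (when (.exists (io/file path))
        (edn/read-string (slurp path)))
      {}))

(def state (atom (load-state state-file)))

;; An agent runs its actions one at a time on a background thread,
;; so file writes never overlap each other.
(def writer (agent nil))

(add-watch state :persist
  (fn [_key _ref _old _new]
    ;; Deref the atom at write time so the last write always
    ;; reflects the latest state, whatever order watches fire in.
    (send-off writer (fn [_] (spit state-file (pr-str @state)) nil))))

(swap! state assoc :visits 1)
```

A crash between a `swap!` and the corresponding write loses that update, and `spit` rewrites the whole file each time, so this only suits small data where losing the last moment of changes is acceptable.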

The short answer to your question (“When are concurrency primitives better than a database?”) is when you want to work in shared memory. This is often for speed or access to the host language.

More specifically, I’ve used atoms before to cache data that lives in the database. People pay for expensive database servers when they could just live with their data always being 5 seconds old. And when you want threads to collaborate on a calculation, you can’t beat in-memory stuff.
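
A cache like that can be sketched in a few lines. The names `cached`, `ttl-ms`, and `fetch-from-db` are hypothetical, and `fetch-from-db` is only a stand-in for a real query.

```clojure
(def ttl-ms 5000) ; tolerate data that is up to 5 seconds old

(def cache (atom {})) ; key -> {:value v, :at timestamp-in-millis}

(defn cached
  "Returns the cached value for k if it is younger than ttl-ms,
  otherwise calls (load-fn k), stores the result, and returns it."
  [k load-fn]
  (let [now (System/currentTimeMillis)
        hit (get @cache k)]
    (if (and hit (< (- now (:at hit)) ttl-ms))
      (:value hit)
      (let [v (load-fn k)]
        (swap! cache assoc k {:value v :at now})
        v))))

;; Stand-in for a database query; counts how often it is called.
(def db-calls (atom 0))

(defn fetch-from-db [k]
  (swap! db-calls inc)
  (str "row-" (name k)))

(cached :a fetch-from-db) ; misses the cache and calls fetch-from-db
(cached :a fetch-from-db) ; served from the cache while still fresh
```

Two threads that miss at the same moment may both run the query. For a cache that is usually fine, since the later `swap!` simply overwrites the earlier entry.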

Now, if you’re doing a traditional CRUD web app, you might not find much use for them.

Rock on!
Eric


#3

Hi,

Glad to hear I’m not nuts, and others have gone on this run around the block as well.

It’s funny you should mention Brandon’s article. I followed the link, read it, and thought, “This is awesome. Let me bookmark it!” So I hit “pinit” to save the bookmark on Pinboard, and then I found this note I had left for myself on Pinboard in 2013, when I made this mistake the first time:

article on fs-based persistence.
good in combo with https://github.com/alandipert/enduro

So it seems like I’m a quick learner but also sort of a quick forgetter too. :confused:

In fairness, I did remember and consider using enduro this time around but I assumed (maybe wrongly) that it wouldn’t work well for the total amount of data I’m storing, which may need to expand into many megabytes.

I then considered using, instead of one durable atom containing a map, an array of durable atoms synced to a collection of files in a directory. But that felt somehow awkward, I think because I’m not used to seeing arrays of atoms in Clojure. But now I think I probably should have questioned that feeling a bit more. If I want transaction isolation at the per-record level, rather than at the per-database level, then an array of atoms is actually more appropriate than one atom holding a map of all the records. Hmm. I suppose I could add to my own pinboard note on this, but would I remember to read it? :slight_smile:
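
To make the two layouts concrete, here is a small sketch using plain (non-durable) atoms. The record ids and fields are made up, and I use a map of atoms keyed by id rather than an array, purely for convenient lookup.

```clojure
;; Layout 1: a single atom holding every record.
;; Any two concurrent updates contend, even on different records.
(def db
  (atom {:r1 {:title "first"  :views 0}
         :r2 {:title "second" :views 0}}))

(defn view-in-db! [id]
  (swap! db update-in [id :views] inc))

;; Layout 2: one atom per record.
;; Updates to different records never retry because of each other.
(def records
  {:r1 (atom {:title "first"  :views 0})
   :r2 (atom {:title "second" :views 0})})

(defn view-record! [id]
  (swap! (get records id) update :views inc))

(view-in-db! :r1)
(view-record! :r1)
(view-record! :r2)
```

The cost of the second layout is that there is no longer a consistent snapshot of all records at once, and no way to change two records atomically without moving to refs.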

But fundamentally, it sounds like you’re saying Clojure’s lovely concurrency-aware reference constructs are probably irrelevant for a plain old CRUD app, as such an app will end up needing a database anyway. And then the most likely case when they are relevant is if you want to write your own cache. This all sounds very plausible to me, but also rather disappointing.

What good are super-powers if you’re not doing superhero work?

Clearly, I need to find harder problems to work on.

cheers!
Alexis