Dealing with concurrency


#1

Hi

I’m trying to output a ‘success’ file when a job completes, but I’m having an issue.

My main calls two functions. The first calls a function that makes an API call to get a list of account numbers, then fires off an API call for each one, processes the data and writes each file to disk. The second copies an empty txt file called ‘success’ to the folder. What appears to be happening is that as soon as the first function is run, the second function is executed, the success file is written and the job ends. I’ve experimented with using delay and calling force on the second function, but it is still called as soon as the first function is called.

I need the success file because the files are being written to S3, and our DBA needs to set up an event that loads the data into Redshift automatically. He needs a trigger, and since I don’t know what order the account numbers are returned in, or when new accounts are created, I need a static file that I know will always be the last file written.

As the number of API calls and their response times can vary, I don’t want to rely on a timed method. Is there a way to make sure the second function isn’t executed until all the dependent functions of the first one have completed?

Thanks

Ben


#2

Hi @draven72,

Thanks for the question. Let me see if I can help you.

It seems like you’re not blocking where you should be. What library are you using to do the API calls? Is it asynchronous? Are you calling them from a different thread? I’d love to see some code, if possible.

Are the API calls going through? Are you sure they are being called? Without seeing the code, these are my two top guesses:

  1. Laziness is coming into play and the API calls are not actually being made.
  2. You’re using non-blocking network requests.
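If guess 2 were the cause, the fix would be to keep a handle on every in-flight request and block on all of them before writing the marker file. A minimal sketch, assuming the requests are wrapped in Clojure futures; `fetch-account!`, `write-success!` and `run-job!` are hypothetical stand-ins, not functions from Ben's code:

```clojure
;; Hypothetical sketch: each request runs on its own thread via future,
;; so the job must block on all of them before writing the marker file.
(def log (atom []))

(defn fetch-account!
  "Hypothetical stand-in for one slow API call plus file write."
  [account-id]
  (Thread/sleep 50)
  (swap! log conj [:account account-id]))

(defn write-success!
  "Hypothetical stand-in for copying the _SUCCESS file."
  []
  (swap! log conj [:success]))

(defn run-job! [account-ids]
  ;; mapv is eager, so every future is started right here.
  (let [pending (mapv #(future (fetch-account! %)) account-ids)]
    ;; deref blocks until each future has finished.
    (run! deref pending)
    (write-success!)))

(run-job! [1 2 3])
;; Let the JVM exit promptly instead of waiting on idle future threads.
(shutdown-agents)
```

The futures may finish in any order, but `write-success!` only runs after the last `deref` returns, so the marker is always the final entry.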

Show me some code and I’ll help you more.

Rock on!
Eric


#3

Hi Eric

The API calls go through when I call them on their own; the issue only arises when I try to add the second function that writes the file.

I managed to get some help with the code, and it looks like the problem was due to using a for loop. This is the code I was using:

```clojure
;; Fetch the account IDs, then request insights for each one.
(defn get-accounts [token business]
  (for [account-id (ad-list business token)]
    (adinsights token account-id "test")))

(defn -main [token business]
  (get-accounts token business)
  (write-file-to-s3 "_SUCCESS" "adinsights" "_SUCCESS" "test"))
```

This gets a list of account numbers from the Facebook API, parses the result and returns a vector of account numbers. Each one is then sent individually to another API call that gets back the stats for that account and writes them to a temp folder; the files are copied to S3, and all the temp files are deleted when the job ends.
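The behaviour of the original `for` version can be reproduced without the Facebook API. A minimal sketch, assuming hypothetical stubs: `ad-list` returns three fixed IDs and `adinsights` only records that it was called, so neither is the real implementation:

```clojure
;; Hypothetical stubs standing in for the real API functions.
(def calls (atom []))

(defn ad-list [_business _token] ["act_1" "act_2" "act_3"])

(defn adinsights [_token account-id _env]
  (swap! calls conj account-id))

(defn get-accounts [token business]
  ;; for builds a lazy sequence; no adinsights call happens until
  ;; something walks that sequence.
  (for [account-id (ad-list business token)]
    (adinsights token account-id "test")))

;; Like -main, call it and throw the result away.
(do (get-accounts "token" "business") nil)
;; @calls is still [] here: the sequence was never realized.
```

At a REPL, evaluating `(get-accounts "token" "business")` on its own behaves differently: the REPL prints the result, printing walks the sequence, and the calls run. That may be why the API calls appeared to work when called on their own.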

I was advised to change the get-accounts function to:

```clojure
(defn get-accounts [token business]
  ;; run! calls adinsights eagerly on every account ID, for side effects only.
  (run! #(adinsights token % "test")
        (ad-list business token)))
```

Now write-file-to-s3 only runs once the get-accounts function has completed.
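A sketch of the fixed flow, assuming hypothetical stubs for `ad-list`, `adinsights` and `write-file-to-s3` that only append to a log, so the order of the calls is visible:

```clojure
;; Hypothetical stubs: each one just records what it was asked to do.
(def log (atom []))

(defn ad-list [_business _token] ["act_1" "act_2" "act_3"])

(defn adinsights [_token account-id _env]
  (swap! log conj account-id))

(defn write-file-to-s3 [file-name _folder _key _env]
  (swap! log conj file-name))

(defn get-accounts [token business]
  ;; run! is eager: it calls adinsights on every ID, then returns nil.
  (run! #(adinsights token % "test")
        (ad-list business token)))

(defn -main [token business]
  (get-accounts token business)
  (write-file-to-s3 "_SUCCESS" "adinsights" "_SUCCESS" "test"))

(-main "token" "business")
;; @log is now ["act_1" "act_2" "act_3" "_SUCCESS"]
```

Because `run!` does not return until every account has been processed, `_SUCCESS` is guaranteed to be written last.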

Regards

Ben


#4

Congrats on figuring it out.

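Yep, that looks like classic laziness. `for` isn’t actually a loop; it’s a list comprehension that returns a lazy sequence. Because -main throws away the result of get-accounts, nothing ever walks that sequence, so the adinsights calls never happen. `run!` is eager and meant for side effects, which is why the new version works.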
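A minimal sketch with made-up data showing both halves of that point: `for` takes bindings and filters like a comprehension and returns an unrealized lazy sequence, and side effects over a collection belong in an eager form such as `doseq` or a forced sequence:

```clojure
;; for is a comprehension: bindings, a :when filter, and a body that
;; builds each element. It returns a lazy sequence.
(def pairs
  (for [x [1 2 3]
        y [:a :b]
        :when (odd? x)]
    [x y]))

(def before (realized? pairs))  ; false: no element computed yet
(def result (vec pairs))        ; [[1 :a] [1 :b] [3 :a] [3 :b]]
(def after (realized? pairs))   ; true: vec walked the whole sequence

;; When the body exists only for its side effects, use an eager form.
(def seen (atom []))
(doseq [x [1 2 3]] (swap! seen conj x))     ; loop-like, returns nil
(dorun (for [x [4 5]] (swap! seen conj x))) ; forces the seq, keeps nothing
;; @seen is now [1 2 3 4 5]
```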

Rock on!
Eric