Changing data types


I spend a lot of my time dealing with data in one format that I need to convert into another (e.g. converting a list of maps into a map of maps) so that I can do something with it. A topic on changing between different data formats would be really useful.
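To make the example concrete, here is a minimal sketch of the list-of-maps to map-of-maps conversion. The sample data and the `index-by` helper name are made up for illustration, and the sketch assumes each map carries a unique key to index on.

```clojure
;; Hypothetical sample data: a list of maps, each with a unique :id.
(def people
  [{:id 1 :name "Ann" :city "Leeds"}
   {:id 2 :name "Bob" :city "York"}])

;; Index a list of maps by some key function, giving a map of maps.
;; (juxt key-fn identity) turns each map into a [key map] pair.
(defn index-by [key-fn maps]
  (into {} (map (juxt key-fn identity)) maps))

(index-by :id people)
;; => {1 {:id 1, :name "Ann", :city "Leeds"},
;;     2 {:id 2, :name "Bob", :city "York"}}

;; Going back the other way: the vals of a map of maps are a list of maps.
(vals (index-by :id people))
```

If two maps share the same key, the later one silently wins, so this only suits data where the key really is unique.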

Please vote. Comments are encouraged!

  • :thumbsup: Yes, please teach this!
  • :thumbsdown: No, I’m not interested.



Great topic! Data transformation is huge! So important and so many things to know.

I have plenty to say but I want to make sure I’m hitting the things you’re having trouble with. It would help me if you had a couple of example problems you’re dealing with. Otherwise, I’ll just talk about what I think is important.

So, if you (@draven72) or anyone else has some specific things you want out of this topic, please reply!


Hi Eric,

Here are some examples of data structures I’ve been dealing with recently from the Google Analytics and Facebook APIs:

Google Analytics

{:total-results 5,
 :columns [{"columnType" "DIMENSION", "dataType" "STRING", "name" "ga:SourceMedium"}
           {"columnType" "METRIC", "dataType" "INTEGER", "name" "ga:sessions"}],
 :sampled? false,
 :records (({:name "ga:SourceMedium", :column-type "DIMENSION", :value "(direct) / (none)"}
            {:name "ga:sessions", :column-type "METRIC", :value 2})
           ({:name "ga:SourceMedium", :column-type "DIMENSION", :value "ask / organic"}
            {:name "ga:sessions", :column-type "METRIC", :value 1})
           ({:name "ga:SourceMedium", :column-type "DIMENSION", :value "bing / organic"}
            {:name "ga:sessions", :column-type "METRIC", :value 1})
           ({:name "ga:SourceMedium", :column-type "DIMENSION", :value "google / organic"}
            {:name "ga:sessions", :column-type "METRIC", :value 19})
           ({:name "ga:SourceMedium", :column-type "DIMENSION", :value "yahoo / organic"}
            {:name "ga:sessions", :column-type "METRIC", :value 2}))}


Facebook

({:name "post_story_adds_unique", :values [{:value 3}], :id "12345_67890/insights/post_story_adds_unique/lifetime"}
 {:name "post_story_adds", :values [{:value 3}], :id "12345_67890/insights/post_story_adds/lifetime"}
 {:name "post_impressions_by_paid_non_paid", :values [{:value {:total 2624, :unpaid 2624, :paid 0}}], :id "12345_67890/insights/post_impressions_by_paid_non_paid/lifetime"}
 {:name "post_video_length", :values [{:value {}}], :id "12345_67890/insights/post_video_length/lifetime"}
 {:name "post_video_avg_time_watched", :values [{:value 0}], :id "12345_67890/insights/post_video_avg_time_watched/lifetime"}
 {:name "post_consumptions_unique", :values [{:value 29}], :id "12345_67890/insights/post_consumptions_unique/lifetime"}
 {:name "post_consumptions_by_type", :values [{:value {:other clicks 6, :link clicks 27}}], :id "12345_67890/insights/post_consumptions_by_type/lifetime"}
 {:name "post_negative_feedback_unique", :values [{:value 2}], :id "12345_67890/insights/post_negative_feedback_unique/lifetime"})

The Facebook response is really unpleasant to work with: some of the keys contain spaces, and if a key hasn’t had any clicks it’s returned as an empty map or zero!

What I’ve been working on is pulling the data out into a flat format so that it can be exported as a CSV and then imported into a database. Usually I try to get the data into a map of maps, and then map juxt over the data. For large amounts of data I use Spark, so I then split the data and save it as a text file. For smaller data sets I save the file as a CSV.
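As a rough sketch of that approach for the Google Analytics data, assuming :records holds one sequence of cell maps per result row: the helper names (`record->row`, `rows->table`, `table->csv`) are made up for illustration, and the CSV step is deliberately naive (no quoting or escaping), so a real CSV library such as clojure.data.csv would be a safer choice for anything beyond simple values.

```clojure
(require '[clojure.string :as str])

;; Sample shaped like the Google Analytics :records above, assuming
;; each record is a sequence of cell maps.
(def records
  '(({:name "ga:SourceMedium" :column-type "DIMENSION" :value "(direct) / (none)"}
     {:name "ga:sessions" :column-type "METRIC" :value 2})
    ({:name "ga:SourceMedium" :column-type "DIMENSION" :value "google / organic"}
     {:name "ga:sessions" :column-type "METRIC" :value 19})))

;; Turn one record (a seq of cells) into a flat map of column name -> value.
(defn record->row [cells]
  (into {} (map (juxt :name :value)) cells))

(def columns ["ga:SourceMedium" "ga:sessions"])

;; Build one getter per column and juxt them together, so every row
;; comes out as a vector of values in the same column order.
(defn rows->table [columns rows]
  (let [pick (apply juxt (map (fn [c] #(get % c)) columns))]
    (map pick rows)))

;; Naive CSV: header line plus one comma-joined line per row.
;; No quoting or escaping is done here.
(defn table->csv [columns table]
  (str/join "\n" (map #(str/join "," %) (cons columns table))))

(def rows (map record->row records))

(println (table->csv columns (rows->table columns rows)))
;; ga:SourceMedium,ga:sessions
;; (direct) / (none),2
;; google / organic,19
```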




Hi @draven72,

This is a really good question and I haven’t had a chance to answer it to my satisfaction yet. I thought I would but I haven’t.

One pattern I use when I’m cleaning up other people’s data is to make an explicit cleanup function that just normalizes everything. I just keep adding to it until everything is clean.

(defn clean-row [row]
  (-> row
      ;; add one normalization step here for each quirk you find
      ))

Then you can map it over all of your rows and only deal with the cleaned up rows.
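Fleshed out for the Facebook rows above, that pattern might look something like the sketch below. The helper names (`tidy-key`, `lift-value`, `empty-map->nil`, `tidy-breakdown`) are hypothetical, and the choices they encode are assumptions: that each row has exactly one entry in :values, that an empty map should become nil, and that spaces in keys should become hyphens.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: replace spaces in a keyword's name with hyphens.
(defn tidy-key [k]
  (keyword (str/replace (name k) " " "-")))

;; Hypothetical helper: pull the single value out of :values into :value.
(defn lift-value [row]
  (-> row
      (assoc :value (get-in row [:values 0 :value]))
      (dissoc :values)))

;; Hypothetical helper: treat an empty map as "no data" (assumed to mean nil).
(defn empty-map->nil [row]
  (update row :value #(if (and (map? %) (empty? %)) nil %)))

;; Hypothetical helper: tidy the keys of a breakdown map, leave numbers alone.
(defn tidy-breakdown [row]
  (update row :value
          #(if (map? %)
             (into {} (map (fn [[k v]] [(tidy-key k) v])) %)
             %)))

;; Each new quirk gets its own small step in the pipeline.
(defn clean-row [row]
  (-> row
      lift-value
      empty-map->nil
      tidy-breakdown))

(def raw-rows
  [{:name "post_video_length"
    :values [{:value {}}]
    :id "12345_67890/insights/post_video_length/lifetime"}
   {:name "post_consumptions_by_type"
    :values [{:value {(keyword "other clicks") 6, (keyword "link clicks") 27}}]
    :id "12345_67890/insights/post_consumptions_by_type/lifetime"}])

(map clean-row raw-rows)
;; => ({:name "post_video_length", :id "...", :value nil}
;;     {:name "post_consumptions_by_type", :id "...",
;;      :value {:other-clicks 6, :link-clicks 27}})
```

Keeping each fix as a separate named function makes it easy to test the steps one at a time and to reorder or drop them as the source data changes.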


Thanks Eric, that’s a good idea. I’m working on building a data pipeline for various different sources, so this will come in useful. I’ll also be revisiting the Protocol videos, as I’ve already seen that each source appears to be using a different date format!