How to extract data from Mailchimp API

Our favorite email marketing monkey. Mailchimp is the heart of our marketing campaigns. But it generates data and sometimes these data are important to be together with data from services like billing or social. How can you get them and then load them to a data warehouse like Amazon Redshift for further analysis?

About Mailchimp

Mailchimp helps businesses observe their subscribers’ activities, send automated emails to them based on their behavior and preferences, optimize and target the appropriate audience for each campaign using specific tools, and monitor sales and website activity with revenue reports. Companies can add content and collaborate on campaigns that fit their brand using MailChimp’s Email Designer; edit campaigns, collaborate with their teams using MailChimp Editor; send one-to-one messages using Mandrill; collect signups from their tablet using Chimpadeedoo; and access all the services via mobiles using MailChimp Mobile.

About Mailchimp API

[update 18/2] Mailchimp updated its API to v3, for more information visit MailChimp API v3.0 documentation.

MailChimp was always a promoter of APIs and encouraged integration with other systems. It has a rich API that exposes a large number of endpoints for interacting with the resources of the applications.

There are endpoints for the following resources:

  • Automations – offers functionality related to automated tasks that we define on MailChimp. For example it llets you build a series of triggered emails that are sent to subscribers over a set period of time.
  • Batch operations – for managing batch processes on our MailChimp account
  • Campaign folders – helps you organise your campaigns into folders
  • Campaigns – for managing your campaigns
  • Conversations – helps you view subscribers’ replies to your campaigns in your MailChimp account
  • File manager files – is a place to store images, documents, and other files you include or link to in your campaigns, templates, or signup forms.
  • File manager folders – for creating folders to organise your assets
  • Lists – CRUD operations on lists of users
  • Reports – for accessing reports with statistics on your campaigns. All Reports endpoints are read-only. MailChimp’s campaign and Automation reports analyze clicks, opens, subscribers’ social activity, e-commerce data, and more.
  • Template folders – operations on creating folders for organising your templates for your emails
  • Templates – operations on templates for your emails (A template is an HTML file used to create the layout and basic design for a campaign).

Where are the users?

The MailChimp API doesn’t have a root level resource for Users or Subscribers. Users are not considered a stand alone resource but instead they exist only inside the lists.

Note: It is important to note here that every service perceives the world from a different perspective, relevant to the value it offers. So even if what we care about is information about our users, it makes completely sense for a service like MailChimp to organise everything around lists or campaigns. If we check the model of another service, like Intercom, we’ll notice that the user is at the top resources, which again makes sense because Intercom is all about one to one communication with users.

Interact with Mailchimp API

Interacting with the MailChimp REST API can be done by using tools like CURL or Postman or by using http clients for your favourite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

The MailChimp REST API supports oAuth 2.0 authentication, more information can be found in the Authorised apps section of the API documentation. After you successfully authenticate with the REST API, you have to start interacting with its resources and start fetching data from it in order to load them on your data warehouse.

Extract your user data from the MailChimp API

MailChimp and any other service that you might be using, has figured out (hopefully) the optimal model for its operations. But when we fetch data from them we usually want to answer questions or do things that are not part of the context that these services operate.

For this reason we should always keep in mind that when we work with data coming from external services we need to re-model it and bring it to the right form for our needs.

Let’s assume that we want to get all the information we got on MailChimp for our users in order to enrich our user records inside our data warehouse. To do that we need to do the following.

First we need to fetch all the lists that we have created on MailChimp, we can do this by performing a GET request to the appropriate endpoint.

curl –request GET \
–url ‘https://usX.api.mailchimp.com/3.0/lists’ \
–user ‘anystring:apikey’ \
—include

We should get back a response like the following:

{
“lists”: [
{
“id”: “57afe96172”,
“name”: “Freddie’s Jokes”,
“contact”: {
“company”: “MailChimp”,
“address1”: “675 Ponce De Leon Ave NE”,
“address2”: “Suite 5000”,
“city”: “Atlanta”,
“state”: “GA”,
“zip”: “30308”,
“country”: “US”,
“phone”: “”
},……

Using the ID we get from the response for each list, we iterate through all the lists and make requests to the appropriate end-points to get the members for list

curl –request GET \
–url ‘https://usX.api.mailchimp.com/3.0/lists/57afe96172/members’ \
–user ‘anystring:apikey’ \
—include

and we should get a response back like the following:

{
“members”: [
{
“id”: “f777bbffab8d1ceca8b757df63c47cb8”,
“email_address”: “urist.mcvankab+1@freddiesjokes.co”,
“unique_email_id”: “882e9bec19”,
“email_type”: “html”,
“status”: “subscribed”,
“status_if_new”: “”,
“merge_fields”: {
“FNAME”: “”,
“LNAME”: “”
},
“interests”: {
“9143cf3bd1”: true,
“3a2a927344”: false,
“f9c8f5f0ff”: false,
“f231b09abc”: true,
“bd6e66465f”: false
},…………

Keep in mind that a user might appear in more than one lists, which means that you also need to reduplicate your results based on this fact.

Schema & Load

After you extract all the information you need, you have to map it to the schema of your data warehouse repository and then load the data to it.

The design of the schema and loading of the data is of most importance as it can both affect the performance of your data warehouse cluster and the questions that you can answer. And of course this procedure differs in the case of Amazon Redshift or Azure SQL Data Warehouse or Google BigQuery.

Takeaways: Which option will work for you?

The way to proceed relies on the data you want to load and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. You can:

[ol]

  • Write, host and maintain a flexible data infrastructure.
  • Use a product like Blendo that can handle this kind of problems automatically for you. With Blendo you can quickly and safely move all your data from Mailchimp into data warehouse solutions like Amazon Redshift and start generating insights from your data.
  • [/ol]