
Serving Files: S3 and High Availability

At Movable Ink we heavily use Amazon S3 for storing millions of files and serving them to hundreds of millions of users. It has a number of very compelling qualities: great performance characteristics and a durability guarantee of a blistering eleven 9’s, meaning Amazon replicates our data in such a way that, in theory, 99.999999999% of our objects are retained.

However, durability and uptime are not one and the same, as many S3 customers found out when an internal configuration issue impacted services on Monday morning. The problem affected buckets in the US Standard S3 region, the most commonly used US S3 region.

We’re pretty conscious about potential single points of failure, and tend to have redundancy at multiple tiers: each layer is spread across multiple hosts which are interconnected at multiple points to the layers above and below it. This manifests as multiple load balancers, app servers, and availability zones, with the entire setup replicated across geographically separate datacenters thousands of miles apart. With all of that redundancy, of course we want our S3 serving to also be redundant.

S3 buckets are tied to a geographical location, and most correspond to one of Amazon’s datacenters. However, US Standard stores data on both the east coast and west coast. Given that it can be accessed from either coast, my first concern was around consistency: what would happen if you were to write data on one side and then immediately try to read it from the other? We tested it, and reads were consistent every time, which seemed strange if the data was really being served from two different regions.

It turns out there is no replication happening. It actually only writes to the region of the endpoint you use while writing:

Amazon S3 automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps. Amazon S3 stores object data only in the facility that received the request.

Given this, we should really be treating US Standard as a single point of failure. So how can we make it redundant?

The strategy we take is to store data in different S3 regions, then come up with a way to point users and our backend services at whichever region is currently active. AWS actually has a couple of tools to facilitate the former. S3 supports file creation notifications to SNS or SQS, and you could set up AWS Lambda to automatically copy files to a different region. But even better than that, a few months ago Amazon released Cross-Region Replication to do exactly what we want. Setup is simple (a rough boto3 sketch follows the list):

  • Turn on versioning on the source bucket. This comes at an extra cost since you pay for all previous versions of your files, but since we’ve already decided that this data is very important, it’s worth it. After all, we’re talking about doubling our storage costs here.
  • Turn on cross-region replication. As part of the setup, you’ll create another versioned bucket in the destination datacenter and an IAM policy to allow data transfer between the two.
  • Do a one-time manual copy of all of your files from the source bucket to the destination bucket. Replication only copies files that are added or changed after replication is enabled. Make sure the permissions are identical.
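
Here is roughly what the first two steps look like with boto3. This is a sketch rather than our exact configuration: the bucket names, IAM role ARN, and account ID are placeholders, and the same setup can be done from the S3 console.

import boto3

s3 = boto3.client("s3")

# Step 1: versioning must be enabled on both the source and destination buckets.
for bucket in ("my-source-bucket", "my-backup-bucket"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Step 2: replicate everything from the source bucket into the destination bucket,
# using an IAM role that lets S3 read from the source and write to the destination.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",  # empty prefix means all objects
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket"},
            }
        ],
    },
)

# Step 3, the one-time backfill copy, can be done with something like:
#   aws s3 sync s3://my-source-bucket s3://my-backup-bucket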


Now every time we add a file to the source bucket, it is (eventually) replicated to the destination bucket. If all of our access is through our backend services, this may be good enough since failing over is a simple configuration change. But many of the references to our S3 buckets are buried in HTML documents or managed by third parties. How can we make it easy to switch between the buckets?
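
Because replication is asynchronous, it can also be handy to verify that a particular object has made it across. S3 reports a per-object replication status, which you can check with a quick boto3 sketch like this (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# On the source side the status is PENDING, COMPLETED, or FAILED;
# on the destination side it shows up as REPLICA.
head = s3.head_object(Bucket="my-source-bucket", Key="path/to/file.png")
print(head.get("ReplicationStatus", "not subject to replication"))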

Our initial idea was to just set up a subdomain entry on a domain we control to CNAME to our S3 bucket, then do failover with DNS. S3 allows this, with one big caveat: your bucket must be named exactly the same as the domain. If you want to reference your S3 bucket from a particular hostname, the bucket itself has to be named after that hostname. Combined with S3’s restriction that every bucket name must be unique across all regions, only one bucket can ever be referenced from that hostname, so this doesn’t work.

Amazon has a CDN service, Cloudfront, which allows us to set an S3 bucket as an origin for our CDN distribution. We can then CNAME our subdomain to our Cloudfront distribution’s endpoint. In the event of a regional S3 failure, we can update Cloudfront to point to our backup S3 bucket. And you can either turn on caching and reap some latency benefits, or set the time-to-live cache setting to zero to act as a pass-through.
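
In practice, the failover is just an update to the distribution’s origin. A rough boto3 sketch of the switch (the distribution ID and backup bucket endpoint are placeholders; the same change can be made by hand in the console):

import boto3

cloudfront = boto3.client("cloudfront")

DISTRIBUTION_ID = "E1EXAMPLE"  # placeholder
BACKUP_ORIGIN = "my-backup-bucket.s3-us-west-2.amazonaws.com"  # placeholder

# Fetch the current configuration along with its ETag, which the update call requires.
resp = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config, etag = resp["DistributionConfig"], resp["ETag"]

# Point every origin at the backup bucket's endpoint.
for origin in config["Origins"]["Items"]:
    origin["DomainName"] = BACKUP_ORIGIN

cloudfront.update_distribution(
    Id=DISTRIBUTION_ID,
    IfMatch=etag,
    DistributionConfig=config,
)

CloudFront then propagates the updated configuration out to its edge locations before the change fully takes effect.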

We would have preferred to set up two Cloudfront distributions and switch between them with DNS, but Amazon has similar restrictions disallowing two distributions from having the same CNAME. Still, this setup lets us respond to an S3 outage in minutes, routing traffic to an unaffected region. In our tests, failover fully completes in 5-10 minutes.

Building applications in the cloud means expecting failure, but it’s not always straightforward, especially when using third-party services like S3. Even with our final setup, it’s not completely clear what Cloudfront’s dependencies and failure modes are. But importantly, we control the DNS so we can implement our own fixes rather than waiting for Amazon.

If you’re interested in working on challenging problems like this, check out Movable Ink’s careers page.

– Michael Nutt, CTO

Real-Time Content and Re-Open Tracking Return to Gmail

Back in December, Gmail made a major overhaul to the way it processes email. Images started being proxied through Gmail’s servers, which changed the way that images are cached. As a result, images could now be displayed to recipients immediately (no more “click here to view images from this sender”), but at the same time, marketers lost the ability to track email re-opens, and the ability to serve relevant, real-time content upon re-open was limited.

What Happened at Gmail?

Image caching on the web and in email is controlled through headers that are sent back with images. These headers tell the web browser or email client how long, and under which conditions, the image may be re-used before making another request to the server. In its initial rollout in December, Gmail’s proxy respected the caching headers sent by the original server, but always served images to the user with instructions to re-use the same image for 24 hours.

Due to the 24-hour caching header, recipients would see real-time content on the initial open, but subsequent re-opens showed the cached image until a day later. Since open tracking also relies on images, initial opens registered properly, but re-opens hit the cache and could not be tracked. There were some reported workarounds for re-open tracking, but they involved sending malformed data to the Gmail proxy and were not guaranteed to work.

The Return of Real-Time Content and Re-Open Tracking

Last week, the Movable Ink team noticed that Gmail had begun deploying updates to address the issues caused by its 24-hour caching. The cache still exists, but it is now overridable if you pass a no-cache header (example below).

Content-Type: image/png
Cache-Control: no-cache, max-age=0
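
As an illustration (this is not Movable Ink’s production code), an open-tracking endpoint only needs to log the request and return an image with exactly those headers. A standard-library Python sketch, where the pixel file path is a placeholder:

from http.server import BaseHTTPRequestHandler, HTTPServer

# "pixel.png" is a placeholder for any small image on disk.
with open("pixel.png", "rb") as f:
    PIXEL = f.read()

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real implementation would record the open (recipient, timestamp, etc.)
        # here before responding.
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        # Tell Gmail's proxy (and browsers) not to re-use a cached copy,
        # so each re-open results in a fresh request to this server.
        self.send_header("Cache-Control", "no-cache, max-age=0")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("", 8000), PixelHandler).serve_forever()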

This means that re-open tracking now works as it did before December’s changes. In addition, these changes fix a long-standing issue where Gmail temporarily cached entire emails as users navigated between messages. When using Movable Ink, these updates mean that whenever you open an email in Gmail, you can be sure that you are seeing the most up-to-date, real-time content on every open and re-open.

Gmail’s Recent Image Handling Changes


UPDATE: Since this post was originally published, real-time content and re-open tracking have returned to Gmail. Learn more in our latest post on how Gmail handles images.

Last week, Gmail implemented changes to the way the email service renders images, which will impact real-time content for a segment of Gmail users.

Below, we hope to clarify the Gmail changes, summarize their impact, and share what actions Movable Ink has taken and is continuing to pursue to address any concerns.

1. What changes were made in Gmail, and what is the impact to Movable Ink?
Traditionally, when a recipient views an email, images are downloaded from the server that hosts the images. This allows information to be communicated back to the image’s host source—such as the user’s current location, device, and time of day.

a.) Gmail is now requesting all images through its own proxy servers, which incorrectly situates users at Google’s headquarters in Mountain View, California when images are downloaded. This impacts the ability to geo-target image content for those Gmail users who are affected by the changes. (Note: Local Maps using zip codes appended as query parameters are unaffected.)

b.) Gmail is stripping the user-agent headers from the client request, which eliminates the ability to determine the Gmail user’s device and target image content appropriately.

c.) Gmail is removing the cache-control headers from image responses, which causes the user’s images to be re-used from cache for up to a day. This only impacts live image content if a Gmail user re-opens the email after the first open.

In summary, a limited set of Movable Ink features will not work within a segment of Gmail accounts and, in those cases, will be replaced with default content.

2. What email users are affected by the changes? How big is the impact to my list?
After analyzing our data since the changes were implemented late last week, we found that 2% – 5% of the average enterprise B2C email marketer’s subscriber list is affected by Gmail’s changes, since they only affect recipients who open emails through the Gmail desktop web client, the Android Gmail app, or the iOS Gmail app.

Not all Gmail users are impacted.

The changes have no impact on Gmail users who access their accounts through Mac Mail, the native Mail app on iOS devices, non-Gmail Android apps, non-Gmail Windows apps, Gmail via Outlook, etc. Additionally, email addresses on domains other than gmail.com are not impacted. (Update: As of 12/12, Gmail has rolled out the changes to custom domains as well.)

More Gmail recipients open email on iOS devices (iPhones and iPads) than through any other email client, including web-based Gmail itself. This greatly mitigates the impact of the changes and is the reason they only affect 2% – 5% of most email marketers’ subscribers.

Below is a summary of who is affected by the changes:

Gmail Image Caching Impact

3. How is Movable Ink responding to the affected features?
a.) Geo-targeting: We have made it possible for marketers to show default content to users whose images are served through the Gmail proxy domain. This eliminates any concerns about displaying incorrectly geo-targeted content when a user is falsely identified as being in Mountain View, California.

b.) Device targeting: If a user’s device cannot be detected for any reason, a default version of the email, configurable within the Movable Ink dashboard, will be rendered instead (a rough sketch of this fallback follows the list).

c.) All other real-time content: Other types of real-time content such as countdown timers, social feeds, web crops, and video will appear as intended on the first open of an email. Subsequent opens from an individual recipient will display the original image due to Google’s caching, which can last for up to a day.
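
As a rough sketch of the fallback in (b), device targeting can be thought of as choosing an image variant from the request’s User-Agent and falling back to a default whenever that header is missing or unrecognized, which is what happens for proxied Gmail opens. The image names and matching rules here are placeholders, not Movable Ink’s actual logic:

from typing import Optional

# Placeholder image variants; real campaigns configure these in the dashboard.
DEVICE_IMAGES = {
    "iphone": "offer-iphone.png",
    "android": "offer-android.png",
}
DEFAULT_IMAGE = "offer-default.png"

def pick_image(user_agent: Optional[str]) -> str:
    """Return a device-specific image, or the default when the device is unknown
    (for example, when Gmail's proxy has stripped the client's User-Agent)."""
    if not user_agent:
        return DEFAULT_IMAGE
    ua = user_agent.lower()
    for device, image in DEVICE_IMAGES.items():
        if device in ua:
            return image
    return DEFAULT_IMAGE

print(pick_image(None))                         # Gmail proxy case -> offer-default.png
print(pick_image("Mozilla/5.0 (iPhone; ...)"))  # -> offer-iphone.png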

Our team is in contact with representatives at Google to recommend and discuss alternatives to last week’s changes. We will be sure to share updates as we have more information. If you have any questions in the meantime, please do not hesitate to reach out to us.