Introducing Objstore Cluster, a Multi-Master Distributed Caching Layer for Amazon S3

13 Mar 2015

Amazon Simple Storage Service, or Amazon S3, is a cloud storage platform that gives developers the ability to securely collect, store, and analyze their data with almost unlimited scalability. It has proven to be fast and reliable, a managed storage service that acts as a backbone for many business applications, and it gives customers the flexibility of optimizing their cost based on the amount of storage they actually use.

“S3 is great, and it’s no wonder it’s widely used across every industry,” said Sphere Software Senior Go Engineer Maxim Kupriianov. “However, the cost of the service may become too high depending on your usage patterns. For example, if your application runs in your own datacenter, then the file transfer costs will skyrocket. Request frequency also has its limits.”

This sparked Maxim’s idea for an objstore cluster: an easy-to-use, self-organizing, multi-master caching layer for cloud storage backends such as S3.

Introducing Objstore Cluster

Written in Go, an objstore cluster combines the functionality of a simple object store with the added robustness of cross-node journal synchronization, object replication, and cluster auto-discovery. The tool aims to mitigate file transfer costs by running inside your own datacenter and acting as a near-cache for all files.
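
To make the near-cache idea concrete, here is a minimal sketch of the read path such a layer implements: check the local disk first, then ask nearby nodes, and only fall back to S3 on a full miss. The in-memory maps and the s3Get helper below are stand-ins used for illustration, not objstore's actual internals.

```go
package main

import (
	"errors"
	"fmt"
)

// In-memory stand-ins for the node's local cache and a peer's cache.
// In a real cluster these would be disk storage and calls to other nodes.
var localCache = map[string][]byte{}
var peerCache = map[string][]byte{"reports/2015.csv": []byte("cached on a peer node")}

// s3Get simulates the fallback fetch from S3 on a full cache miss.
func s3Get(key string) ([]byte, error) {
	if key == "" {
		return nil, errors.New("empty key")
	}
	return []byte("fetched from S3: " + key), nil
}

// readThrough illustrates the lookup order of a near-cache:
// local copy first, then nearby nodes, then S3.
func readThrough(key string) ([]byte, error) {
	if data, ok := localCache[key]; ok { // 1. local copy
		return data, nil
	}
	if data, ok := peerCache[key]; ok { // 2. a nearby node has it
		localCache[key] = data // keep a copy for next time
		return data, nil
	}
	data, err := s3Get(key) // 3. full cache miss
	if err != nil {
		return nil, err
	}
	localCache[key] = data
	return data, nil
}

func main() {
	for _, key := range []string{"reports/2015.csv", "images/logo.png"} {
		data, _ := readThrough(key)
		fmt.Printf("%s -> %s\n", key, data)
	}
}
```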

The cluster’s API lets you upload, head, read, and delete files by key, like any other object store, and any related metadata can be preserved alongside the files. The caching layer uploads each file to S3 and stores a copy locally, with optional replication to other nodes. The next time you access a file, it is served from the local machine or a nearby node; on a cache miss, it is fetched from S3 directly. This is the basic functionality of an objstore cluster.
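
As a rough illustration of that workflow, a client could talk to the cluster over plain HTTP through the load balancer. The base address, endpoint paths, query parameter, and metadata header below are assumptions made for the sake of example; the project's README documents the actual API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// Assumed address of the HTTP load balancer in front of the cluster.
const base = "http://objstore.example.local"

func main() {
	// Upload a file by key (the endpoint path is hypothetical).
	body := bytes.NewReader([]byte("hello, objstore"))
	req, err := http.NewRequest(http.MethodPut, base+"/api/v1/put?key=docs/hello.txt", body)
	if err != nil {
		panic(err)
	}
	// User metadata could travel as headers (an assumed convention).
	req.Header.Set("X-Meta-Owner", "maxim")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("upload status:", resp.Status)

	// Read the file back; the cluster serves it from the nearest cached copy,
	// or from S3 on a cache miss.
	resp, err = http.Get(base + "/api/v1/get?key=docs/hello.txt")
	if err != nil {
		panic(err)
	}
	data, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Println("content:", string(data))
}
```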

Using Objstore Cluster

A robust cluster yields the best results, although it does not need to reach the availability levels of traditional databases or other stores that must be highly consistent. A certain amount of fault resilience still matters: a dropped cache implies a huge (and unplanned) spike in latency and cost of service, which can hurt both your infrastructure and your wallet, and caches may take a long time to warm up again.

Objstore uses a P2P discovery mechanism: once some nodes are already running, a new node can join the cluster knowing only one physical IP address of an existing member. The cluster sets up a logical network over persistent TCP connections between nodes and uses an internal HTTP API to share events and data, which eliminates a single point of failure. Everything works with zero configuration, except for the HTTP load balancer in front of the cluster, which can be any balancer of your choice.
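
To give an idea of how that kind of join can work, here is a toy sketch of the general pattern in Go (it illustrates the concept only, not objstore's real wire protocol): a new node dials one known member, receives the list of peers that member knows about, and can then open persistent connections to all of them.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// runSeed starts a toy "seed" node that replies to any connection with the
// comma-separated list of peers it already knows about.
func runSeed(addr string, peers []string) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return
			}
			fmt.Fprintln(conn, strings.Join(peers, ","))
			conn.Close()
		}
	}()
}

// join dials a single known address and returns the peer list it learns.
func join(seedAddr string) []string {
	conn, err := net.Dial("tcp", seedAddr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		panic(err)
	}
	return strings.Split(strings.TrimSpace(line), ",")
}

func main() {
	// The seed node already knows two other cluster members.
	runSeed("127.0.0.1:7001", []string{"10.0.0.2:7001", "10.0.0.3:7001"})

	// A new node joins knowing only the seed's address and learns the rest.
	peers := join("127.0.0.1:7001")
	fmt.Println("discovered peers:", peers)
}
```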

Node disk sizes should be identical, because the overall capacity of the cluster is limited by the size of the smallest disk used for data replication. If you want to expand capacity linearly, set up another objstore cluster and tweak your HTTP load balancer to route between the two.
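
For instance, if two clusters each hold part of the key space, that split can live entirely in the balancer. Below is a hypothetical sketch of hash-based routing in a small Go reverse proxy; the cluster addresses and the "key" query parameter are placeholders, and any production balancer (nginx, HAProxy, etc.) could do the same job.

```go
package main

import (
	"hash/fnv"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Hypothetical addresses of two independent objstore clusters.
var clusters = []string{
	"http://objstore-a.example.local",
	"http://objstore-b.example.local",
}

// pick hashes the object key so each cluster ends up holding roughly half
// of the cached objects.
func pick(key string) *url.URL {
	h := fnv.New32a()
	h.Write([]byte(key))
	target, _ := url.Parse(clusters[h.Sum32()%uint32(len(clusters))])
	return target
}

func main() {
	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// Route by the object key in the query string
			// (the "key" parameter name is an assumption).
			target := pick(req.URL.Query().Get("key"))
			req.URL.Scheme = target.Scheme
			req.URL.Host = target.Host
		},
	}
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```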

Installation, Server Usage, Client Usage, and More

To see how to install an objstore cluster and learn about its server usage, client usage, and more, visit Sphere Software's objstore repository on GitHub. It provides a more detailed breakdown of objstore cluster's functionality and how to implement it. Objstore cluster is currently in an open beta stage, so be sure to test it!