
Neo4j on GCP

Current is a financial technology company that has developed a platform, comprising a debit card and an app, that connects teenagers with the people, brands, and experiences they value. As the Systems Architect at Current, my responsibility is to ensure that we maintain a reliable and high-quality experience for our users.


We rely on a Neo4j graph database to store and expose relationships among our users, their family members, and financial instruments such as their debit cards and their connected banks. Storing and exposing this data via a graph facilitates quick and low-cost retrieval. Traversing the graph is very similar to traversing an index: just as an index exposes a graph in the form of a B+ tree, giving us an ideal entry point to retrieve data in log(n) time, we query a collection of nodes by key and then successively traverse one or more relationships to find what we’re looking for.

Neo4j has proven to be a huge win for our backend development velocity because it encapsulates complex pattern matching and validation in small, easily understood queries.
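
To give a flavor of what those queries look like in practice, here is a minimal sketch using the official neo4j-driver package (1.x). The labels, relationship types, properties, and hostname are hypothetical illustrations, not Current’s actual schema or infrastructure:

// Hypothetical example -- labels, relationship types, and hostnames are illustrative.
const neo4j = require('neo4j-driver').v1;

const driver = neo4j.driver(
  'bolt://neo4j-gateway:7687',                      // placeholder address for the cluster gateway
  neo4j.auth.basic('neo4j', process.env.NEO4J_PASS)
);

async function cardsForParent (parentId) {
  const session = driver.session();
  try {
    // One pattern match expresses "the debit cards owned by this parent's teens".
    const result = await session.run(
      `MATCH (p:User {id: $parentId})-[:PARENT_OF]->(teen:User)-[:OWNS]->(card:DebitCard)
       RETURN teen.id AS teenId, card.last4 AS last4`,
      { parentId }
    );
    return result.records.map((r) => ({ teenId: r.get('teenId'), last4: r.get('last4') }));
  } finally {
    session.close();
  }
}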

In our initial version, we ran our Neo4j instance locally on our API server, which made it difficult to measure its cost in terms of required CPU time and memory footprint. We also lacked a way to robustly log and profile the database. As our user growth accelerated, we started noticing CPU bottlenecks on the API box during periods of high activity.

To address those issues, we prioritized breaking our Neo4j deployment out from our API instance. One of our first thoughts was to migrate our data to a hosted Neo4j solution. After evaluating a few competing hosted solutions, we concluded that we could not find an affordable option that offered both a good feature set and a good engineering experience.

We evaluated our existing data footprint and calculated our needs using the guidelines in the Neo4j Operations Manual, concluding that our running instances would need to be provisioned with at least 30GB of memory. We estimated that the necessary instance footprint on GCP would cost us about $600 USD, whereas a comparable footprint on GrapheneDB would cost double that and wouldn’t include multi-AZ HA deployments (which are only offered in GrapheneDB’s “Call Us” Enterprise tier).
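
As a rough illustration of that sizing exercise (the figures below are illustrative, not our actual settings), the manual’s guidance boils down to a JVM heap plus a page cache large enough to hold the store files, plus headroom for the OS:

# neo4j.conf memory settings -- illustrative values only
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
# page cache sized to the on-disk store plus expected growth
dbms.memory.pagecache.size=20g
# 8g heap + 20g page cache + OS headroom lands at roughly a 30GB instance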

After putting our heads together as an engineering team, we agreed that it was best to move forward by building our own Highly Available Neo4j Cluster. Using Google Cloud Platform’s tools, we were able to build a comparable solution for half the cost of the hosted options.

The Road to High Availability

To achieve high availability, we devised a Neo4j footprint of 3 instances: 1 master, 1 slave, and 1 arbiter. Much like a MongoDB replica set, the master replicates to the slave while the arbiter stands by to break the tie in a failover election.
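
On the Neo4j side, this topology maps onto a handful of per-instance settings in neo4j.conf. A minimal sketch with illustrative hostnames and values, not our production configuration:

# neo4j.conf sketch -- illustrative values only
dbms.mode=HA                # the arbiter runs with dbms.mode=ARBITER instead
ha.server_id=1              # unique per instance (1, 2, 3)
ha.initial_hosts=neo4j-node1:5001,neo4j-node2:5001,neo4j-node3:5001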

[Figure: Neo4j cluster topology]

Included in this cluster are the following:

  • 3 instances, each hosted in a different availability zone:
      • 2 large instances: 1 master and 1 slave.
      • 1 small instance to host the arbiter, minimizing cost.
  • Load balancing that is internal to our private network.

At this point, we know what the cluster is going to look like. Still, the following questions remain:

  1. How will our services reach Neo4j?
  2. Is failover transparent to our users?
  3. How will we monitor the performance of our database?
  4. How will we be alerted if one or more hosts become unavailable?
  5. How do we build this highly available cluster?

Hosting Neo4j / Building

What was needed?

  • Build an instance from scratch quickly.
  • Recover an instance quickly in case of failure.
  • Upgrade the Neo4j software and Java easily.
  • Keep software configuration persistent across builds.
  • Provision instances to meet the needs of our existing data set and query patterns.
  • Keep costs minimal.

The moving pieces

  • Google Storage
  • The gcloud compute instances API

We stuck close to home by leveraging GCP’s native technology to do most of the heavy lifting.

We used Google Storage to stage the software that ends up on a Neo4j instance: a bucket now hosts gzipped packages of JDK 8 (Java) and Neo4j 3.3.1, along with a templated Neo4j configuration file.
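
Staging is a one-time gsutil upload along these lines (bucket and artifact names are illustrative placeholders, not our actual buckets):

# bucket and file names are illustrative
gsutil cp jdk-8-linux-x64.tar.gz neo4j-3.3.1.tar.gz neo4j.conf.template \
  gs://my-neo4j-artifacts/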

The following had to happen to build an instance that would host Neo4j:

[Figure: Neo4j instance build steps]

We accomplished the above steps by using the gcloud command-line interface to drive the Compute API.

For example:

gcloud compute instances create \
  $name \
  --project $project_name \
  --scopes cloud-platform \
  --boot-disk-size $disk_size_in_gb \
  --disk "name=${diskname},device-name=${disk_name},auto-delete=yes" \
  --zone $zone \
  --metadata "mode=${mode},pagecache_size=${pagecache_size},heap_size=${heap_size},bootstrap=${bootstrap}" \
  --machine-type ${machine_type} \
  --metadata-from-file "startup-script=setup-neo4j.sh" \
  --no-address \
  --tags no-ip \
  --service-account "${service_account}@${project_name}.iam.gserviceaccount.com"

The above command was driven by the following invocations to build the master, slave, and arbiter:

./create-neo4j-instance.sh us-east1-d $env SINGLE neo4j-node1 bootstrap &
./create-neo4j-instance.sh us-east1-b $env HA neo4j-node2 bootstrap &
./create-neo4j-instance.sh us-central1-c $env ARBITER neo4j-node3 bootstrap &

Attributes such as instance size are driven by the environment spec:

case $env in
"stage")
  GS_BUCKET="..."; # A google-storage bucket to store configuration
  GS_BUCKET_EXT="..."; # A gs bucket that stores Neo4j and Java
  project_name="..."; # GCP Project ID for stage environment
  machine_type=${small_instance_type};
  pagecache_size=${small_page_cache_size};
  heap_size="1g";
  ;;  
"production")
  GS_BUCKET="...";  
  GS_BUCKET_EXT="...";
  project_name="...";
  machine_type=${larger_instance_type};  # Larger instance for production
  pagecache_size=${larger_page_cache_size};
  heap_size=${larger_heap_size};         # Larger heap for production
  ;;  
*)
  _help;
  ;;  
esac
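
On boot, the setup-neo4j.sh startup script (passed via --metadata-from-file above) does the actual assembly: it reads the role and memory settings from instance metadata, pulls the staged packages from Google Storage, renders the templated configuration, and starts Neo4j. A rough sketch of the idea; the paths, bucket names, and template markers below are simplified assumptions, not the script verbatim:

#!/bin/bash
# Read the role and memory settings passed as instance metadata
md="http://metadata.google.internal/computeMetadata/v1/instance/attributes"
mode=$(curl -s -H "Metadata-Flavor: Google" "${md}/mode")
pagecache_size=$(curl -s -H "Metadata-Flavor: Google" "${md}/pagecache_size")
heap_size=$(curl -s -H "Metadata-Flavor: Google" "${md}/heap_size")

# Pull the staged packages and config template from Google Storage
gsutil cp "gs://my-neo4j-artifacts/jdk-8-linux-x64.tar.gz" \
          "gs://my-neo4j-artifacts/neo4j-3.3.1.tar.gz" \
          "gs://my-neo4j-artifacts/neo4j.conf.template" /opt/
tar -xzf /opt/jdk-8-linux-x64.tar.gz -C /opt/
tar -xzf /opt/neo4j-3.3.1.tar.gz -C /opt/

# Render the templated configuration with the metadata values
sed -e "s/{{MODE}}/${mode}/" \
    -e "s/{{PAGECACHE_SIZE}}/${pagecache_size}/" \
    -e "s/{{HEAP_SIZE}}/${heap_size}/" \
    /opt/neo4j.conf.template > /opt/neo4j-3.3.1/conf/neo4j.conf

/opt/neo4j-3.3.1/bin/neo4j start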

HAProxy on Kubernetes

The Neo4j Operations Manual had answers to all of the above questions in raw form. To address question 1, Neo4j recommends using HAProxy to dispatch reads and writes between master and slave as appropriate. This solution did not seem intuitive at first sight. Why should we build an entire compute instance to host a load balancer? What could we accomplish by other means, such as GCP’s Cloud Load Balancing?

What was needed?
A means to forward client connections to the currently available master.

The moving pieces

  • A Docker Image: rafpe/docker-haproxy-rsyslog
  • haproxy.cfg
  • Kubernetes

GCP’s Load Balancing requires the target instances to be within an Instance Group, and Instance Groups that cover multiple availability zones are not eligible to be exposed via an internal LB (whether TCP or HTTP/S). Having instances of different sizes also makes our Instance Group an Unmanaged Instance Group, which kept us from fulfilling the above-mentioned requirements.

Going back to the drawing board, I started experimenting with Google Kubernetes Engine, better known as GKE. GKE exposes control of a Kubernetes cluster as a means to host services via Docker. After some research, I found that HAProxy is packaged as a Docker image on Docker Hub. Even better, I found an image that exposes logging.

Very quickly, I was able to draft a Dockerfile that pulls the HAProxy image and copies my custom HAProxy configuration onto the container at build time. The newly built image is pushed to GCR (Google Container Registry) and now runs on a Kubernetes cluster that exists within the confines of Current’s GCP project. Through Kubernetes, I exposed my instance of HAProxy, as it exists in its cocoon, via an internal IP, and then configured our services to reference this IP as our gateway to Neo4j.
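
The Dockerfile itself is tiny, and one way to get an internal IP for it is a Kubernetes Service using GKE’s internal load-balancer annotation. A sketch of both; the config path, service name, selector, and port are illustrative assumptions rather than our exact manifests:

# Dockerfile -- pull the HAProxy image and bake in our configuration
# (the config path depends on the base image)
FROM rafpe/docker-haproxy-rsyslog
COPY haproxy.cfg /etc/haproxy/haproxy.cfg

# service.yaml -- expose HAProxy on an internal IP within our network
apiVersion: v1
kind: Service
metadata:
  name: neo4j-haproxy
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: neo4j-haproxy
  ports:
    - name: bolt
      port: 7687
      targetPort: 7687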

[Figure: Neo4j HA cluster fronted by HAProxy on GKE]

At this point, HAProxy polls Neo4j’s HTTP routes on each instance to ascertain which node is slave and which is master at any given time, and it is ultimately made aware of which hosts are available. Based on the results of that HTTP polling, we can configure how TCP traffic, such as queries sent via the Bolt interface, is forwarded.
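
Concretely, haproxy.cfg follows the pattern in the Neo4j Operations Manual: an HTTP health check against the HA status endpoints decides which node receives Bolt traffic. A trimmed-down sketch of the write path only, with illustrative hostnames, ports, and timeouts rather than our production config:

# haproxy.cfg sketch -- route Bolt traffic to whichever node reports itself as master
defaults
    mode tcp
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend neo4j_bolt
    bind *:7687
    default_backend current_master

backend current_master
    option httpchk GET /db/manage/server/ha/master
    # a node only counts as "up" if the master check on port 7474 returns 200
    server neo4j-node1 neo4j-node1:7687 check port 7474
    server neo4j-node2 neo4j-node2:7687 check port 7474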

After solving the problem of proxying the connection, it only seemed natural to use GKE for monitoring and for maintaining our long-term data retention policy (performing backups).

The Neo4j Monitoring Agent

What was needed?
To be alerted when one or more of the following events happen:

  • When 1 or more Neo4j Hosts become unavailable.
  • When a failover happens.

The moving pieces

  • PagerDuty
  • Neo4j Master
  • Neo4j Slave

We use PagerDuty for most of our alerting at Current. As with our HAProxy implementation, we use the following endpoints to ascertain the ongoing state of each node:

/db/manage/server/ha/master
/db/manage/server/ha/slave
/db/manage/server/ha/available
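
Each endpoint answers with a small plain-text body (and a status code) that reflects the node’s current role, which makes it easy to poll; for example (hostname illustrative):

curl -s http://neo4j-node1:7474/db/manage/server/ha/available
# -> "master" or "slave" when the node is up; the .../master and .../slave routes answer "true" or "false"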

We poll the above endpoints on each host at a set interval of 10 seconds. If the current state of ../master doesn’t match the previous state, we know that a failover occurred. If ../available is not reachable on a given host, well, we know that the host is not reachable. In either case, we send a nicely written message to PagerDuty, using a small client library we wrote to handle the HTTP request to PagerDuty’s API.

// Excerpt from the monitoring agent: `request` is an HTTP client, `pd` is our
// small PagerDuty wrapper, and `role` tracks the last known role of each host.
setInterval(availCheck, config.timers.avail);

function availCheck () {
  config.neo4j.forEach((host) => {
    request
      .get(host.url + '/db/manage/server/ha/available')
      .then((res) => {
        /**
         * Check the global role object,
         * i.e. who was and who is master?
         */
        if (
          role[host.url] !== 'bootstrap' &&
          res.text !== role[host.url]
        ) {
          let msg = 'Neo4j failover occurred! ' +
            `${host.url} was ${role[host.url]}` +
            ` : ${host.url} now is ${res.text}`;
          console.log(msg);

          // Send alert via PagerDuty
          pd.send('neo4j-cluster-monitor: ' + msg);
        }
        /**
         * Record the current role of the host,
         * i.e. role['neo4j-node1'] = 'master'
         */
        role[host.url] = res.text;
        debug(role);
      })
      .catch((e) => {
        let msg = `Neo4j host ${host.url} is unavailable! ${e.toString()}`;

        // Send alert message to PagerDuty
        pd.send(msg);
      });
  });
}
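
The pd client used above is just a thin wrapper around PagerDuty’s API. A minimal sketch of what send might look like, assuming the v2 Events API, a routing key in an environment variable, and superagent for HTTP; this is an illustration, not the library we actually wrote:

// Hypothetical PagerDuty wrapper -- routing key and HTTP client are assumptions.
const request = require('superagent');

module.exports.send = function send (summary) {
  return request
    .post('https://events.pagerduty.com/v2/enqueue')
    .send({
      routing_key: process.env.PAGERDUTY_ROUTING_KEY, // integration key, assumed env var
      event_action: 'trigger',
      payload: {
        summary: summary,
        source: 'neo4j-cluster-monitor',
        severity: 'critical'
      }
    })
    .catch((e) => console.error('PagerDuty alert failed:', e.toString()));
};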

The Neo4j Backup Agent

What was needed?
A means to run recurring backups of our data hosted on Neo4j instances.

The moving pieces

  • Node.js client library for the gcloud API - @google-cloud/compute
  • GCP Disk Snapshots

Very similar to the Neo4j Monitoring Agent, we issue an HTTP GET request to each host to ascertain who’s who on the cluster; in this case, we want to find the slave. Performing a snapshot of a disk attached to any host risks degrading I/O performance or restarting the instance, so it’s best to perform the snapshot on the slave. We chose to take the snapshots at an hourly interval.

// kick-off the first invocation!
main(config.neo4j);

// successive invocations of main happen on configured interval
setInterval(main, config.timers.backupFreq, config.neo4j);

async function main (hostObjects) {
  try {
    // get the list of available slaves
    let slaves = await getSlaves(hostObjects);

    // pick the first slave
    let instanceName = slaves[0].split('.')[0];

    let params = {
      instanceName: instanceName,
      deviceName: config.deviceName,
      snapshotName: 'neo4j-snapshot-' + new Date().getTime()
    };

    // signs of life!
    console.log(
      new Date().toISOString(),
      'creating snapshot with params:',
      JSON.stringify(params, null, 2)
    );

    // create the snapshot
    let a = await createSnapshot(params);
    console.log('snapshot id', a[2].id);
  } catch (e) {
    console.error(e.toString());
  }
}
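
The getSlaves and createSnapshot helpers aren’t shown in the excerpt above. Roughly, the first polls the .../ha/slave endpoint on every host and the second drives GCP Disk Snapshots through @google-cloud/compute. A sketch under those assumptions; the host fields (url, hostname), the config.zones lookup, and the superagent HTTP client are invented here for illustration:

// Sketch of the helpers used above -- field names and the zone lookup are assumptions.
const request = require('superagent');
const Compute = require('@google-cloud/compute'); // older gcloud-node style client
const compute = new Compute();
const config = require('./config');               // assumed config module

// Hostnames of every node currently reporting itself as a slave.
async function getSlaves (hostObjects) {
  const checks = hostObjects.map((host) =>
    request
      .get(host.url + '/db/manage/server/ha/slave')
      .then((res) => (res.text === 'true' ? host.hostname : null))
      .catch(() => null) // non-slaves and unreachable hosts are skipped
  );
  return (await Promise.all(checks)).filter(Boolean);
}

// Snapshot the Neo4j data disk attached to the given instance.
function createSnapshot ({ instanceName, deviceName, snapshotName }) {
  const zone = compute.zone(config.zones[instanceName]); // e.g. 'us-east1-b' (assumed lookup)
  const disk = zone.disk(deviceName);
  // Resolves to [snapshot, operation, apiResponse], hence a[2].id in main().
  return disk.createSnapshot(snapshotName);
}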

Conclusion

Not only were we able to build a highly performant and highly available database cluster, we were also able to leverage the GCP toolchain to build the other essentials, such as a monitoring agent and a backup agent. Moreover, we built all of this at a fraction of the cost of a hosted solution. Please feel free to check out these code snippets and the other utilities we’ve built on our public GitHub space, OpenCurrent.