Blog: How to Handle Data Duplication in Data-Heavy Kubernetes Environments

Authors:
Augustinas Stirbis (CAST AI)

Why Duplicate Data?

It’s convenient to create a copy of your application with a copy of its state for each team.
For example, you might want a separate database copy to test significant schema changes
or to develop other disruptive operations like bulk insert/delete/update…

Duplicating data takes a lot of time. That’s because you first need to download
all the data from the source block storage provider to a compute instance and then send
it back to a storage provider again. A lot of network traffic and CPU/RAM is used in this process.
Hardware acceleration, i.e. offloading certain expensive operations to dedicated hardware, is
always a huge performance boost. It reduces the time required to complete an operation by orders
of magnitude.

Volume Snapshots to the rescue

Kubernetes introduced VolumeSnapshots as alpha in 1.12,
beta in 1.17, and the Generally Available version in 1.20.
VolumeSnapshots use specialized APIs from storage providers to duplicate a volume of data.

Since the data is already on the same storage device (or array of devices), duplicating data is usually
a metadata operation for storage providers with local snapshots (the majority of on-premises storage providers).
All you need to do is point a new disk at an immutable snapshot and only
save deltas (or let it do a full-disk copy). As an operation inside the storage back-end,
it’s much quicker and usually doesn’t involve sending traffic over the network.
Public cloud storage providers work a bit differently under the hood. They save snapshots
to Object Storage and then copy back from Object Storage to Block Storage when “duplicating” a disk.
Technically, a lot of compute and network resources are spent on the cloud provider’s side,
but from the Kubernetes user’s perspective VolumeSnapshots work the same way whether the snapshot
storage provider is local or remote, and none of your cluster’s compute or network resources are involved in this operation.
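For reference, here is what an ordinary, dynamically provisioned snapshot looks like inside a single namespace. This is a minimal sketch: the snapshot name is hypothetical and the VolumeSnapshotClass name is an assumption that depends on your CSI driver.

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snap                       # hypothetical name
  namespace: production
spec:
  volumeSnapshotClassName: csi-snapclass    # assumed; use the class your CSI driver provides
  source:
    persistentVolumeClaimName: postgres-pv-claim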

Sounds like we have our solution, right?

Actually, VolumeSnapshots are namespaced, and Kubernetes protects namespaced data from
being shared between tenants (Namespaces). This Kubernetes limitation is a conscious design
decision so that a Pod running in a different namespace can’t mount another application’s
PersistentVolumeClaim (PVC).

One way around it would be to create multiple volumes with duplicate data in one namespace.
However, you could easily reference the wrong copy.

So the idea is to separate teams/initiatives by namespaces to avoid that and generally
limit access to the production namespace.

Solution? Creating a Golden Snapshot externally

Another way around this design limitation is to create a snapshot externally (not through Kubernetes).
This is also called pre-provisioning a snapshot manually. Next, I will import it
as a multi-tenant golden snapshot that can be used in many namespaces. The illustration below uses
the AWS EBS (Elastic Block Store) and GCE PD (Persistent Disk) services.

High-level plan for preparing the Golden Snapshot

  1. Identify the disk (EBS volume / Persistent Disk) in the cloud provider that holds the data you want to clone
  2. Make a disk snapshot (in the cloud provider console)
  3. Get the disk snapshot ID

High-level plan for cloning data for each team

  1. Create Namespace “sandbox01”
  2. Import Disk Snapshot (ID) as VolumeSnapshotContent to Kubernetes
  3. Create VolumeSnapshot in the Namespace “sandbox01” mapped to VolumeSnapshotContent
  4. Create the PersistentVolumeClaim from VolumeSnapshot
  5. Install Deployment or StatefulSet with PVC

Step 1: Identify Disk

First, you need to identify your golden source. In my case, it’s a PostgreSQL database
on PersistentVolumeClaim “postgres-pv-claim” in the “production” namespace.

kubectl -n <namespace> get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'

The output will look similar to:

pvc-3096b3ba-38b6-4fd1-a42f-ec99176ed0d90

Step 2: Prepare your golden source

You need to do this once or every time you want to refresh your golden data.

Make a Disk Snapshot

Go to the AWS EC2 or GCP Compute Engine console and search for an EBS volume
(on AWS) or Persistent Disk (on GCP) that has a label matching the last output.
In this case, I saw pvc-3096b3ba-38b6-4fd1-a42f-ec99176ed0d9.

Click on Create snapshot and give it a name. You can do it manually in the console,
in AWS CloudShell / Google Cloud Shell, or in the terminal. To create a snapshot in the
terminal you must have the AWS CLI tool (aws) or Google’s CLI (gcloud)
installed and configured.

Here’s the command to create a snapshot on GCP:

gcloud compute disks snapshot <cloud-disk-id> --project=<gcp-project-id> --snapshot-names=<set-new-snapshot-name> --zone=<availability-zone> --storage-location=<region>
Screenshot: GCP snapshot creation in a terminal

GCP identifies the disk by its PVC name, so it’s a direct mapping. On AWS, you first need to
find the volume by the CSIVolumeName AWS tag (its value is the PVC name); that volume ID is then used for snapshot creation.
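If you prefer the CLI to the web console, here’s a hedged sketch of that lookup; the tag key comes from the EBS CSI driver, and the query path is just one way to extract the volume ID:

# Hypothetical lookup of the EBS volume ID by its CSIVolumeName tag
aws ec2 describe-volumes \
  --filters Name=tag:CSIVolumeName,Values=<pv-name-from-step-1> \
  --query 'Volumes[*].VolumeId' --output text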

Screenshot: identifying the EBS volume ID in the AWS web console

Note down the volume ID (here vol-00c7ecd873c6fb3ec) and either create the EBS snapshot in the AWS Console or use the aws CLI:

aws ec2 create-snapshot --volume-id '<volume-id>' --description '<set-new-snapshot-name>' --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=<set-new-snapshot-name>}]'

Step 3: Get your Disk Snapshot ID

In AWS, the command above will output something similar to:

"SnapshotId": "snap-09ed24a70bc19bbe4"

If you’re using GCP, you can get the snapshot ID from the gcloud command by querying for the snapshot’s given name:

gcloud compute snapshots --project=<gcp-project-id> describe <new-snapshot-name> | grep id:

You should get similar output to:

id: 6645363163809389170

Step 4: Create a development environment for each team

Now I have my Golden Snapshot, which is immutable data. Each team will get a copy
of this data, and team members can modify it as they see fit, given that a new EBS/persistent
disk will be created for each team.

Below I will define a manifest for each namespace. To save time, you can replace
the namespace name (such as changing “sandbox01” → “sandbox42”) using tools
such as sed or yq, with Kubernetes-aware templating tools like
Kustomize,
or using variable substitution in a CI/CD pipeline.
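For example, here is a hedged sketch of that substitution with sed; the file name is hypothetical, and the same idea works with yq or a templating step in your pipeline:

# golden-snapshot-sandbox01.yaml is a hypothetical file holding the manifests below
sed 's/sandbox01/sandbox42/g' golden-snapshot-sandbox01.yaml | kubectl apply -f -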

Here’s an example manifest:

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
 name: postgresql-orders-db-sandbox01
 namespace: sandbox01
spec:
 deletionPolicy: Retain
 driver: pd.csi.storage.gke.io
 source:
   snapshotHandle: 'gcp/projects/staging-eu-castai-vt5hy2/global/snapshots/6645363163809389170'
 volumeSnapshotRef:
   kind: VolumeSnapshot
   name: postgresql-orders-db-snap
   namespace: sandbox01
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
 name: postgresql-orders-db-snap
 namespace: sandbox01
spec:
 source:
   volumeSnapshotContentName: postgresql-orders-db-sandbox01

In Kubernetes, VolumeSnapshotContent (VSC) objects are not namespaced.
However, I need a separate VSC for each namespace that will use it, so the
metadata.name of each VSC must also be different. To make that straightforward,
I used the target namespace as part of the name.

Now it’s time to replace the driver field with the CSI (Container Storage Interface) driver
installed in your K8s cluster. Major cloud providers have CSI drivers for block storage that
support VolumeSnapshots, but quite often CSI drivers are not installed by default; consult
your Kubernetes provider.
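A quick way to check what your cluster already has; this assumes the external-snapshotter CRDs are installed alongside the CSI driver:

# List registered CSI drivers and the snapshot classes they expose
kubectl get csidrivers
kubectl get volumesnapshotclasses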

The manifest above defines a VSC that works on GCP.
On AWS, the driver and snapshotHandle values might look like:

  driver: ebs.csi.aws.com
  source:
    snapshotHandle: "snap-07ff83d328c981c98"

At this point, I need to use the Retain policy, so that the CSI driver doesn’t try to
delete my manually created EBS disk snapshot.

For GCP, you will have to build this string by hand – add a full project ID and snapshot ID.
For AWS, it’s just a plain snapshot ID.

VSC also requires specifying which VolumeSnapshot (VS) will use it, so VSC and VS are
referencing each other.

Now I can create a PersistentVolumeClaim from the VS above. It’s important to set the dataSource field:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
 name: postgres-pv-claim
 namespace: sandbox01
spec:
 dataSource:
   kind: VolumeSnapshot
   name: postgresql-orders-db-snap
   apiGroup: snapshot.storage.k8s.io
 accessModes:
   - ReadWriteOnce
 resources:
   requests:
     storage: 21Gi

If the default StorageClass has the WaitForFirstConsumer policy,
then the actual cloud disk will be created from the Golden Snapshot only when some Pod binds that PVC.
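You can check the binding mode like this; the class name placeholder is whatever your cluster defines:

kubectl get storageclass
# or inspect one class directly:
kubectl get storageclass <storage-class-name> -o jsonpath='{.volumeBindingMode}'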

Now I assign that PVC to my Pod (in my case, it’s Postgresql) as I would with any other PVC.
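For completeness, here is a minimal sketch of that mount; the image tag and mount path are assumptions rather than the exact production setup:

---
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  namespace: sandbox01
spec:
  containers:
    - name: postgres
      image: postgres:14                       # assumed image tag
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data  # assumed mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-pv-claim           # the PVC created above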

To verify that everything is wired up:

kubectl -n <namespace> get volumesnapshotcontent,volumesnapshot,pvc,pod

Both VS and VSC should be READYTOUSE true, PVC bound, and the Pod (from Deployment or StatefulSet) running.

To keep on using data from my Golden Snapshot, I just need to repeat this for the
next namespace and voilà! No need to waste time and compute resources on the duplication process.


Source: Kubernetes Blog

Announcing HashiCorp Nomad 1.2 Beta

We are excited to announce that the beta release of HashiCorp Nomad 1.2 is now available. Nomad is a simple and flexible orchestrator used to deploy and manage containers and non-containerized applications. Nomad works across on-premises and cloud environments. It is widely adopted and used in production by organizations such as Cloudflare, Roblox, Q2, Pandora, and GitHub.

Let’s take a look at what’s new in Nomad and in the Nomad ecosystem, including:

  • System Batch jobs
  • User interface upgrades
  • Nomad Pack

»System Batch Jobs

Nomad 1.2 introduces a new type of job to Nomad called sysbatch. This is short for “System Batch”. These jobs are meant for cluster-wide, short-lived tasks. System Batch jobs are an excellent option for regularly upgrading software that runs on your client nodes, triggering garbage collection or backups on a schedule, collecting client metadata, or doing one-off client maintenance tasks.

Like System jobs, System Batch jobs work without an update stanza and will run on any node in the cluster that is not excluded via constraints. Unlike System jobs, System Batch jobs will run only on clients that are ready at the time the job was submitted to Nomad.

Like Batch jobs, System Batch jobs are meant to run to completion, can be run on a scheduled basis, and support dispatch execution with per-run parameters.

If you want to run a simple sysbatch job, the job specification might look something like this:

job "sysbatchjob" {
  datacenters = ["dc1"]

  type = "sysbatch"

  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "sysbatch_job_group" {
    count = 1

    task "sysbatch_task" {
      driver = "docker"

      config {
        image = "busybox:1"

        command = "/bin/sh"
        args    = ["-c", "echo hi; sleep 1"]
      }
    }
  }
}

This will run a short-lived Docker task on every client node in the cluster that is running Linux.

To run this job at regular intervals, you would add a periodic stanza:

periodic {
  cron             = "0 0 */2 ? * *"
  prohibit_overlap = true
}

For instance, the stanza above instructs Nomad to re-run the sysbatch job every two hours.

Additionally, sysbatch jobs can be parameterized and then invoked later using the dispatch command. These specialized jobs act less like regular Nomad jobs and more like cluster-wide functions.

Adding a parameterized stanza defines the arguments that can be passed into the job. For example, a sysbatch job that upgrades Consul to a different version might have a parameterized stanza that looks like this:

parameterized {
  payload       = "forbidden"
  meta_required = ["consul_version"]
  meta_optional = ["retry_count"]
}

This sysbatch job could then be registered using the run command, and executed using the dispatch command:

$ nomad job run upgrade_consul
$ nomad job dispatch upgrade_consul -meta consul_version=1.11.0

»User Interface Upgrades

Traditional Batch jobs and System Batch jobs now include an upgraded Job Status section, which includes two new statuses: Not Scheduled and Degraded.

Not Scheduled shows the client nodes that did not run a job. This could be due to a constraint that excluded the node based on its attributes, or because the node was added to the cluster after the job was run.

The Degraded state shows jobs in which any allocations did not complete successfully.

Additionally, you can now view all the client nodes that batch and sysbatch jobs run on with the new Clients tab. This allows you to quickly assess the state of each job across the cluster.

»Nomad Pack (Tech Preview)

We are excited to announce the tech preview of Nomad Pack, a package manager for Nomad. Nomad Pack makes it easy to define reusable application deployments. This lets you quickly spin up popular open source applications, define deployment patterns that can be reused across teams within your organization, and discover job specifications from the Nomad community. Need a quick Traefik load balancer? There’s a Pack for that.

Each Pack is a group of resources that are meant to be deployed to Nomad together. In the Tech Preview, these resources must be Nomad jobs, but we expect to add volumes and ACL policies in a future release.

Let’s take a look at Nomad Pack, using the Nomad Autoscaler as an example.

Traditionally, users deploying the Nomad Autoscaler often need to deploy or configure multiple jobs within Nomad, usually Grafana, Loki, the autoscaler itself, an APM, and a load balancer.

With Nomad Pack you can run a single command to deploy all the necessary autoscaler resources to Nomad. Optionally, the deployment can be customized by passing in a variable value:

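(A hedged sketch of what this can look like with the nomad-pack CLI; the pack name and variable are illustrative rather than taken from the announcement.)

# Hypothetical invocation: deploy the autoscaler pack, overriding one variable
$ nomad-pack run nomad_autoscaler --var grafana_version=8.2.3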

This allows you to spend less time learning and writing Nomad job specs for each app you deploy. See the Nomad Pack repository for more details on basic usage.

By default, Nomad Pack uses the Nomad Pack Community Registry as its source for Packs. This registry provides a location for the Nomad community to share their Nomad configuration files, learn app-specific best practices, and get feedback and contributions from the broader community. Alternative registries and internal repositories can also be used with Nomad Pack. To view available packs, run the registry list command:

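(Again a hedged sketch; the exact output in the tech preview may differ.)

$ nomad-pack registry list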

You can easily write and customize Packs for your specific organization’s needs using Go Template, a common templating language that is simple to write but can also contain complex logic. Templates can be composed and re-used across multiple packs, which allows organizations to more easily standardize Nomad configurations, codify best practices, and make changes across multiple jobs at once.

To learn more about writing your own packs and registries, see the Writing Custom Packs guide in the repository.

A Tech Preview release of Nomad Pack will be available in the coming weeks. The Nomad team is still validating the design and specifications around the tool and packs. While we don’t expect changes to the user flows that Nomad Pack enables, some details may change based on user feedback. Until the release, to use Nomad Pack you can build from the source code. Details can be found in the repository’s contributing guide.

As you use Nomad Pack and write your own packs, please don’t hesitate to provide feedback. Issues and pull requests are welcome on the GitHub repository and Pack suggestions and votes are encouraged via Community Pack Registry issues.

»What’s Next?

We encourage you to experiment with the new features in Nomad 1.2 and Nomad Pack, but we recommend against using Nomad 1.2 in a production environment until the official GA release. We are eager to see how the new features and projects enhance your Nomad experience. If you encounter an issue, please file a new bug report in GitHub and we’ll take a look.

Finally, on behalf of the Nomad team, I’d like to thank our amazing community. Your dedication, feature requests, pull requests, and bug reports help us make Nomad better. We are deeply grateful for your time, passion, and support.


Source: HashiCorp Blog