08 Jul2020

Why Kubernetes Compares With Hash

In Kubernetes, the replica set controller uses a hashing mechanism to compare objects. Pods that are owned by a replica set have a label that is used to remember what the hash of the pod template that led to this pod spec:

kind: Pod
metadata:
  labels:
    pod-template-hash: 775855699b

In this post, I want to present the advantages and limitations of using a hash function in this context.

✅ Pros: works around the fact that child object might get mutated or defaulted, which means the reflect.DeepEqual can’t work (what about performance???)

❌ Cons: if the child gets updated, the parent cannot know that it has been changed since the hash only works in one way. (talk about replica set mapping to multiple pods)

why is it a label and not an annotation?

Looks like the hash is created in the replicaset sync func and it uses the ComputeHash func which uses fnv.New32a from the std lib:

// ComputeHash returns a hash value calculated from pod template and
// a collisionCount to avoid hash collision. The hash will be safe encoded to
// avoid bad words.
func ComputeHash(template *v1.PodTemplateSpec, collisionCount *int32) string {
    podTemplateSpecHasher := fnv.New32a()
    hashutil.DeepHashObject(podTemplateSpecHasher, *template)

    // Add collisionCount in the hash if it exists.
    if collisionCount != nil {
        collisionCountBytes := make([]byte, 8)
        binary.LittleEndian.PutUint32(collisionCountBytes, uint32(*collisionCount))
        podTemplateSpecHasher.Write(collisionCountBytes)
    }

    return rand.SafeEncodeString(fmt.Sprint(podTemplateSpecHasher.Sum32()))
}

and the DeepHashObject looks like this:

// DeepHashObject writes specified object to hash using the spew library
// which follows pointers and prints actual values of the nested objects
// ensuring the hash does not change when a pointer changes.
func DeepHashObject(hasher hash.Hash, objectToWrite interface{}) {
    hasher.Reset()
    printer := spew.ConfigState{
        Indent:         " ",
        SortKeys:       true,
        DisableMethods: true,
        SpewKeys:       true,
    }
    printer.Fprintf(hasher, "%#v", objectToWrite)
}

So the hashing mechanism uses davecgh/go-spew to turn the object into a string, and then uses the fnv std library to hash that string.

Benchmarking two hashing functions

package main

import (
    "hash"
    "hash/fnv"
    "testing"

    "github.com/davecgh/go-spew/spew"
    "github.com/mitchellh/hashstructure"
)

type ComplexStruct struct {
    Name     string
    Age      uint
    Metadata map[string]interface{}
}

var v = ComplexStruct{
    Name: "mitchellh",
    Age:  64,
    Metadata: map[string]interface{}{
        "car":      true,
        "location": "California",
        "siblings": []string{"Bob", "John"},
    },
}

func BenchmarkMitchellhHashstructure(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, _ = hashstructure.Hash(v, nil)
    }
}

func BenchmarkKubernetesComputeHash(b *testing.B) {
    for i := 0; i < b.N; i++ {
        hasher := fnv.New32a()
        DeepHashObject(hasher, v)
        _ = hasher.Sum32()
    }
}

// From https://github.com/kubernetes/kubernetes/blob/7e75a5ef/pkg/util/hash/hash.go#L25-L37
func DeepHashObject(hasher hash.Hash, objectToWrite interface{}) {
    hasher.Reset()
    printer := spew.ConfigState{
        Indent:         " ",
        SortKeys:       true,
        DisableMethods: true,
        SpewKeys:       true,
    }
    printer.Fprintf(hasher, "%#v", objectToWrite)
}

The winner seems to be the Kubernetes’ ComputeHash function, although it relies on spew to generate a string. I guess most of the time spent by hashstructure.Hash is due to the reflect package?

% go test -bench .
BenchmarkMitchellhHashstructure-8         368101              3.127 µs/op
BenchmarkKubernetesComputeHash-8          456028              2.704 µs/op

Other clues

  controller-revision-hash: 6f7768f569
  k8s-app: kube-proxy
  pod-template-generation: "7"

  k3s.io/node-config-hash

📝 Edit this page