Optimizing External Secrets Operator Traffic

In Kubernetes, a Secret is an object that stores sensitive information such as a password, token, or key. One good practice for Kubernetes secret management is to keep secrets in a third-party secrets store outside the cluster and configure pods to access them from there. Many such solutions are available; HashiCorp Vault, which is used in the examples in this article, is one of them.

These third-party solutions, also known as External Secrets Managers (ESMs), provide secure storage, secret versioning, fine-grained access control, auditing, and logging.

The External Secrets Operator (ESO) is an open-source solution for securely retrieving and synchronizing secrets from the ESM. The secrets retrieved from the ESM are injected into the Kubernetes environment as native Secret objects. ESO thus lets application developers keep using Kubernetes Secret objects while the values themselves are managed in an enterprise-grade External Secrets Manager.

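An ESO implementation in a Kubernetes cluster primarily requires two resources: a ClusterSecretStore (or its namespace-scoped counterpart, SecretStore), which describes how to connect to the ESM, and an ExternalSecret, which describes which secret to fetch and which Kubernetes Secret to create from it.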

Secret retrieval is a one-time activity, but synchronization generates traffic at regular intervals. It is therefore important to follow the practices below, which reduce ESO traffic to the external secrets management system.

Defining Refresh Interval for the ExternalSecret Object

Long-lived static secrets pose a security risk that can be addressed by adopting a secret rotation policy. Each time a secret is rotated in the ESM, the change should be reflected in the corresponding Kubernetes Secret object. ESO supports this through automatic synchronization: secrets are re-synchronized at a fixed period called the "refresh interval," which is set in the ExternalSecret resource definition.

Choose a refresh interval that matches how often the secret actually changes; e.g., a secret that is unlikely to be modified often can have a refresh interval of one day instead of one hour or a few minutes. Remember, the more aggressive the refresh interval, the more traffic it generates.
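
As a minimal sketch of this advice, the ExternalSecret below sets `refreshInterval` to 24 hours, so ESO reads the secret from the ESM once a day instead of 24 times a day with a one-hour interval. The resource name, namespace, and target name are hypothetical; the store name `vault-backend` and the remote key `imagesecret` are borrowed from the sample later in this article. The fields should be checked against the CRD reference for the ESO version in use.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: sre-ext-secret          # hypothetical name
  namespace: sre-apps           # hypothetical namespace
spec:
  # Rarely rotated secret: sync once a day rather than every hour or every few minutes
  refreshInterval: "24h"

  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore

  target:
    name: sre-k8secret          # hypothetical name of the Kubernetes Secret to create

  data:
  - secretKey: dockersecret
    remoteRef:
      key: imagesecret
      property: dockersecret
```

If one rotated secret must reach the cluster sooner than the others, it is usually cheaper to shorten the interval for that ExternalSecret alone than to lower it everywhere.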

Defining Refresh Interval for the ClusterSecretStore Object

The refresh interval defined in the ClusterSecretStore (CSS) is the frequency with which the CSS validates itself against the ESM. If no refresh interval is specified when the CSS object is defined, a default refresh interval (which is specific to the ESM API implementation) is used. This default has been found to be very aggressive; i.e., the CSS interacts with the ESM very frequently.

For example, the picture below is an excerpt from the description of a sample CSS (with HashiCorp Vault as the ESM) that has no refresh interval in its definition. The refresh interval shown in the description is five minutes, which means the resource contacts the ESM every five minutes and generates avoidable traffic.

CSS description showing five minutes as refresh interval

The refresh interval attribute gets missed in most CSS definitions because:

The significance of defining a refresh interval for a CSS can be seen by monitoring the traffic generated by a CSS object without a refresh interval in a test cluster that contains no other ESO objects.
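
A minimal sketch of a CSS with an explicit refresh interval is shown below. It assumes that `spec.refreshInterval` on the store is an integer number of seconds, which is worth confirming in the CRD reference for the ESO version in use. The Vault server URL, mount path, role, and service account are all hypothetical placeholders; only the store name `vault-backend` comes from the sample later in this article.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  # Assumption: value is in seconds; 86400 = validate against the ESM once a day
  refreshInterval: 86400

  provider:
    vault:
      server: "https://vault.example.com:8200"   # hypothetical Vault address
      path: "secret"                             # hypothetical KV mount path
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"                # hypothetical auth mount
          role: "eso-role"                       # hypothetical Vault role
          serviceAccountRef:
            name: "external-secrets"             # hypothetical service account
            namespace: "external-secrets"
```

After applying a definition like this, describing the CSS again should show the configured value rather than the five-minute default seen earlier.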

Using Cluster-Scoped External Secrets Over Namespace-Scoped External Secrets

The first ESO release was published in May 2021. Back then, the only option was the namespace-scoped ExternalSecret resource. So, even if a secret was meant to be shared across the whole cluster, an ExternalSecret object had to be defined in each namespace. The ExternalSecret objects in all namespaces would be synchronized at their defined refresh intervals, each generating traffic. The larger the number of namespaces, the more traffic they would generate.

There was a clear need for a global ExternalSecret object that could serve different namespaces. To fill this gap, the cluster-level external secret resource, ClusterExternalSecret (CES), was introduced in April 2022 (v0.5.0). Opting for ClusterExternalSecret over ExternalSecret (where applicable) can avoid redundant traffic generation.

A sample ClusterExternalSecret definition for HashiCorp Vault that creates a Kubernetes image pull secret is shown below:

YAML
 
apiVersion: external-secrets.io/v1beta1
kind: ClusterExternalSecret
metadata:
  name: "sre-cluster-ext-secret"
spec:
  # The name to be used on the ExternalSecrets
  externalSecretName: sre-cluster-es

  # This is a basic label selector to select the namespaces to deploy ExternalSecrets to.
  # you can read more about them here https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#resources-that-support-set-based-requirements
  namespaceSelector: #mandatory -- not adding this will expose the external secret
    matchLabels:
      label: try_ces
 
  # How often the ClusterExternalSecret should reconcile itself
  # This will decide how often to check and make sure that the ExternalSecrets exist in the matching namespaces
  refreshTime: "10h"

  # This is the spec of the ExternalSecrets to be created
  externalSecretSpec:
    secretStoreRef:
      name: vault-backend
      kind: ClusterSecretStore
     
    target:
      name: sre-k8secret-cluster-es
      template:
        type: kubernetes.io/dockerconfigjson
        data:
          .dockerconfigjson: "{{.dockersecret | toString}}"

    refreshInterval: "24h"

    data:
    - secretKey: dockersecret
      remoteRef:
        key: imagesecret
        property: dockersecret
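
The `namespaceSelector` in the sample above only matches namespaces that carry the label `label: try_ces`. As an illustration, a namespace such as the hypothetical `team-a` below would be selected, and the operator would create an ExternalSecret named `sre-cluster-es` in it:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a          # hypothetical namespace name
  labels:
    label: try_ces      # must match the matchLabels entry in the ClusterExternalSecret
```

Namespaces without this label are left untouched, so labeling is the switch that opts a namespace in to the shared secret.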


Conclusion

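Choosing a sensible refresh interval for ExternalSecret objects, defining one explicitly on the ClusterSecretStore, and preferring ClusterExternalSecret where a secret is shared across namespaces can significantly reduce External Secrets Operator traffic to the External Secrets Manager.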

 

 

 

 
