The failure mode of AppRole login?

kwilczynski · May 10, 2023, 7:04am

Hello,

I have a question about the failure modes when issuing a write to auth/approle/login to obtain a new client token.

For example, using the Go client and either the more low-level API per:

client.Logical().Write("auth/approle/login", map[string]interface{}{
    "role_id":   ...,
    "secret_id": ...,
})

Or the more high-level API that abstracts some operations per:

auth, _ := approle.NewAppRoleAuth(
    ...,
    &approle.SecretID{FromString: ...},
)
client.Auth().Login(context.TODO(), auth)

This operation usually returns a Secret whose Auth field (a SecretAuth) we would access to read the ClientToken attribute. This works fine almost always.

However, we have seen an occasional issue where the ClientToken attribute is absent.

This does not happen very often, but it does happen, and we have added provisions in our code to check for the presence of the relevant types to avoid a nil pointer dereference.

The typical pattern is something along the lines of the following:

secret, err := client.Logical().Write("auth/approle/login", ...)
if err != nil {
    return err
}
if secret == nil || secret.Auth == nil || secret.Auth.ClientToken == "" {
    return errors.New("...")
}
client.SetToken(secret.Auth.ClientToken)

The above would be different when using the more high-level API (the client.Auth().Login() function), as this API internally sets the new client token on the client object after performing a validation similar to the one above. Should there be an error, client.Auth().Login() returns a plain error with an appropriate error message.

This is all fine, of course.

That said, I would like to understand better why a “login” operation, which is just a request to retrieve a new client token upon prior authentication with some current token, can return no results (the authentication information will be missing).

Should this be considered a standard failure mode? Something that should be handled, recovered from and perhaps even retried?

Perhaps this is a side-effect of some of our Vault deployment issues? Maybe we need to up the compute sizing on which Vault runs as some performance issue is causing this?

I am asking here as I couldn’t find anything to address my question within the issues on GitHub or the documentation.

Thank you!

Krzysztof

maxb · May 10, 2023, 8:13am

When a Vault API operation fails, errors should be returned in the JSON response.

~~Browsing some of the code you linked to, I’m starting to think that the client library code is discarding the error information and just returning nil instead~~

I’m not in a position to test this myself right now, but you could easily check it by submitting incorrect approle credentials, and seeing whether you get nil, or a meaningful error string complaining about the credentials.

Now I’ve opened up an IDE and played with the code a bit more, I see my initial guess wasn’t correct.

Normal errors are returned by client.Logical().Write() as Go errors.

I guess there must be some odd edge case in the client code, which is causing it to return an unusual response.

If your Vault has audit logging turned on, cross-referencing the time of one of these unexpected responses with the audit log, might help you figure out what the response was from the Vault server’s perspective.

Also, it would be helpful to know which of the three unusual cases:

if secret == nil || secret.Auth == nil || secret.Auth.ClientToken == "" {

is being triggered.
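
To tell the three cases apart the next time the failure occurs, the combined nil check can be split so that each branch reports itself. The following is only a minimal sketch: Secret and SecretAuth are local stand-ins for the api package types so that the example compiles on its own, and describeMissingToken is a hypothetical helper name, not part of the Vault client.

```go
package main

import "fmt"

// SecretAuth and Secret are minimal stand-ins for api.SecretAuth and
// api.Secret, reduced to the fields this check needs.
type SecretAuth struct {
	ClientToken string
}

type Secret struct {
	Auth *SecretAuth
}

// describeMissingToken reports which of the three checks failed, or
// returns an empty string when a client token is present.
func describeMissingToken(secret *Secret) string {
	switch {
	case secret == nil:
		return "secret is nil"
	case secret.Auth == nil:
		return "secret.Auth is nil"
	case secret.Auth.ClientToken == "":
		return "secret.Auth.ClientToken is empty"
	default:
		return ""
	}
}

func main() {
	cases := []*Secret{
		nil,
		{},
		{Auth: &SecretAuth{}},
		{Auth: &SecretAuth{ClientToken: "example-token"}},
	}
	for _, c := range cases {
		fmt.Printf("%q\n", describeMissingToken(c))
	}
}
```

Logging the returned string alongside a timestamp would make it easier to match an occurrence against the server-side logs.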

There is a case where it is expected and normal for client.Logical().Write() to return a nil secret with a nil error - that is when Vault returns an HTTP 204 No Content response - however I would never expect that from auth/approle/login.

In summary, then:

- I don’t think this is a standard failure mode - it feels like a bug in Vault or the client library to me.
- Since you can’t reproduce it on demand, I think you need a way to capture the actual HTTP response from Vault when it happens.
  - If your Vault has audit logging turned on, and you can find the response in the audit log, this may suffice.
  - Otherwise, it might be necessary to modify the github.com/hashicorp/vault/api library itself to selectively provide the full response body when this happens.
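
A possible alternative to patching the library, sketched here under assumptions, is to wrap the HTTP transport that the client uses and record a summary of each login response. The sketch uses only the Go standard library; recordingTransport, loginSummary, summarize and probe are hypothetical names, and the test server merely imitates response shapes rather than real Vault behaviour. It records the status code, the body size, and whether auth.client_token is present, so that no token is written to logs.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// loginSummary holds non-sensitive facts about one login response.
type loginSummary struct {
	Status   int
	BodySize int
	HasAuth  bool // "auth" is present and not null
	HasToken bool // "auth.client_token" is a non-empty string
}

// recordingTransport wraps another RoundTripper and calls Record for every
// response to a request whose path ends in pathSuffix.
type recordingTransport struct {
	next       http.RoundTripper
	pathSuffix string
	Record     func(loginSummary)
}

func (t *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil || !strings.HasSuffix(req.URL.Path, t.pathSuffix) {
		return resp, err
	}
	body, readErr := io.ReadAll(resp.Body)
	resp.Body.Close()
	if readErr != nil {
		return nil, readErr
	}
	// Hand an unread copy of the body back to the caller.
	resp.Body = io.NopCloser(bytes.NewReader(body))
	t.Record(summarize(resp.StatusCode, body))
	return resp, nil
}

// summarize inspects the JSON body without keeping the token itself.
func summarize(status int, body []byte) loginSummary {
	s := loginSummary{Status: status, BodySize: len(body)}
	var parsed struct {
		Auth *struct {
			ClientToken string `json:"client_token"`
		} `json:"auth"`
	}
	if json.Unmarshal(body, &parsed) == nil && parsed.Auth != nil {
		s.HasAuth = true
		s.HasToken = parsed.Auth.ClientToken != ""
	}
	return s
}

// probe sends one request to a throwaway test server that answers with the
// given status and body, and returns what the transport recorded.
func probe(status int, body string) loginSummary {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		io.WriteString(w, body)
	}))
	defer srv.Close()

	var got loginSummary
	client := &http.Client{Transport: &recordingTransport{
		next:       http.DefaultTransport,
		pathSuffix: "/v1/auth/approle/login",
		Record:     func(s loginSummary) { got = s },
	}}
	resp, err := client.Post(srv.URL+"/v1/auth/approle/login", "application/json", strings.NewReader("{}"))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return got
}

func main() {
	fmt.Printf("%+v\n", probe(200, `{"auth":{"client_token":"example"}}`))
	fmt.Printf("%+v\n", probe(200, `{"auth":null}`))
	fmt.Printf("%+v\n", probe(204, ""))
}
```

With the real client, the wrapper would presumably be installed by replacing the Transport of the http.Client held in the api.Config HttpClient field before calling api.NewClient; that wiring is not shown here and should be checked against the library version in use.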

nikamaadeel · May 12, 2023, 7:59pm

The issue you’re experiencing with missing authentication information is not a standard failure mode and may be a bug in either Vault or the client library. To investigate further, you can capture the HTTP response from Vault when it occurs, either through audit logging or by modifying the client library. This will help determine which of the unusual cases is being triggered and provide more insight into the issue.

maxb · May 12, 2023, 8:54pm

That’s… just my response rephrased with less detail?

nikamaadeel · May 13, 2023, 6:15am

Okay, sorry, it’s my mistake. Should I delete it or not? Please guide me!

maxb · May 13, 2023, 1:48pm

There’s no need to delete it, just avoid doing that in future.

kwilczynski · May 15, 2023, 9:24am

@nikamaadeel and @maxb, I appreciate both responses.

I guess there must be some odd edge case in the client code, which is causing it to return an unusual response.

This would be my assumption. However, any non-200 response code is dealt with internally as an error that is then eventually bubbled up to the client as an error of sorts.

Normal errors are returned by client.Logical().Write() as Go errors.

Correct. This is where I wish that, e.g., client.Auth().Login() would return a custom error type, making it easier to ascertain whether it was missing authentication information or some other error.

If your Vault has audit logging turned on, cross-referencing the time of one of these unexpected responses with the audit log, might help you figure out what the response was from the Vault server’s perspective.

This is good advice. That said, I have only seen this issue manifest itself in our production environment, where the volume of calls is substantial - we do have a lot of clients requesting all sorts of things from our Vault cluster all the time. I am a little wary of enabling trace logging in production. The logging volume would increase substantially, and perhaps even the load on Vault itself, which I would rather avoid, if possible.

Nonetheless, I could not reproduce this locally with a local development Vault instance. Perhaps there is a benchmark you can think of I could apply to try to trigger this before I go and write something myself.

Also, it would be helpful to know which of the three unusual cases:
kwilczynski:
if secret == nil || secret.Auth == nil || secret.Auth.ClientToken == "" {
is being triggered.

That is an excellent point, and I suppose that either the entire SecretAuth type is missing or the ClientToken attribute is blank.

There is a case where it is expected and normal for client.Logical().Write() to return a nil secret with a nil error - that is when Vault returns an HTTP 204 No Content response - however I would never expect that from auth/approle/login.

If this is the case, I would consider this a bug in Vault.

For the time being, I have settled on something like the following: I treat the missing authentication information as a transient issue and simply retry the login.

An example of how I retry AppRole “login” attempts:

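import (
	"crypto/rand"
	"errors"
	"math/big"
	"time"
)

// retryError marks an error that must not be retried.
type retryError struct {
	error
}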

func RetryStop(err error) error {
	return retryError{err}
}

func Retry(attempts int, sleep time.Duration, callbackFunc func() error) error {
	var e retryError
	if err := callbackFunc(); err != nil {
		if errors.As(err, &e) {
			return e.error
		}
		if attempts--; attempts > 0 {
			sleep += jitter(sleep) / 2
			time.Sleep(sleep)
			return Retry(attempts, 2*sleep, callbackFunc)
		}
		return err
	}
	return nil
}

func jitter(t time.Duration) time.Duration {
	n, err := rand.Int(rand.Reader, big.NewInt(int64(t)))
	if err != nil {
		panic(err)
	}
	return time.Duration(n.Int64())
}

...

auth, _ := approle.NewAppRoleAuth(
    ...,
    &approle.SecretID{FromString: ...},
)

err := utils.Retry(5, 250*time.Millisecond, func() error {
	_, err := client.Auth().Login(context.TODO(), auth)
	if err != nil {
		// Retry only when the client token is missing from the response.
		const clientTokenError = `client token not set`
		if strings.Contains(err.Error(), clientTokenError) {
			return err
		}
		// Any other error is treated as permanent: stop retrying.
		return utils.RetryStop(err)
	}
	return nil
})
if err != nil {
	return err
}

Perhaps there are better ways to handle this. Nevertheless, this has proven to be a simple and practical approach.
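
One variation that may be worth considering is an iterative loop that honours context cancellation and matches a sentinel error with errors.Is instead of comparing message text. This is only a sketch under assumptions: ErrNoClientToken is a hypothetical sentinel that the caller's own nil checks would have to produce (the Vault client does not define it), retryOn and simulate are illustrative names, and jitter is left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrNoClientToken is a hypothetical sentinel for the "missing
// authentication information" case; it is not defined by the Vault client.
var ErrNoClientToken = errors.New("no client token in login response")

// retryOn calls fn up to attempts times, doubling the delay after each
// failed attempt, but only while fn's error matches target. Any other
// error, or a cancelled context, ends the loop at once.
func retryOn(ctx context.Context, target error, attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !errors.Is(err, target) {
			return err
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
	}
	return err
}

// simulate drives retryOn with a fake login that fails with
// ErrNoClientToken for the first `failures` calls and then succeeds.
func simulate(failures, attempts int) (calls int, err error) {
	err = retryOn(context.Background(), ErrNoClientToken, attempts, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return fmt.Errorf("login: %w", ErrNoClientToken)
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := simulate(2, 5)
	fmt.Println(calls, err)
}
```

In real use, the closure would perform the login and return a wrapped ErrNoClientToken only when the nil checks fail, so that permanent errors such as bad credentials are not retried.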
