Getting data resources from the real provider when using a mock provider

I think I know the answer, but suppose I have a module with a bunch of data resources like this:

data "aws_iam_policy_document" "allow_ses_send_policy" {
  statement {
    actions   = ["ses:SendEmail"]
    resources = ["*"]
  }
}

Is there a way to get Terraform to use the real provider for, say, all data resources, or all data resources of a specific type?

I know about provider aliasing, as in the example here, but in this case I want to mock the provider for all resources while using the real provider for things like generating policy documents.

This also highlights another gap: while I understand why it wasn’t built this way, it would be really nice to have some kind of plugin or feature that gives more flexibility in defining realistic fake data for mock providers.
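Part of this already exists: a mock_provider block can pin computed attributes for every data source of a given type through mock_data defaults. Below is a minimal sketch that keeps a single mocked aws provider, with no alias in the module. It assumes a fixed, hand-written policy document is good enough for your tests. Because the provider does not render this JSON, it will not catch mistakes in the module's statement blocks.

```hcl
# Sketch for a *.tftest.hcl file; the JSON below is hand-written, not provider output.
mock_provider "aws" {
  # Every aws_iam_policy_document read through this mock returns this fixed
  # json value instead of a randomly generated string.
  mock_data "aws_iam_policy_document" {
    defaults = {
      json = <<-EOT
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "ses:SendEmail",
              "Resource": "*"
            }
          ]
        }
      EOT
    }
  }
}
```

Terraform 1.7 and later also accept override_data blocks in test files. These target a single data source address rather than every data source of a type.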

Not sure I understand you correctly, but yes, you can use provider aliases in your module. For example, declare an aliased configuration in the module's required_providers:

terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      version               = ">= 5.0"
      configuration_aliases = [aws.real]
    }
  }
}

Then set provider = aws.real on data sources such as aws_iam_policy_document:

data "aws_iam_policy_document" "allow_ses_send_policy" {
  provider = aws.real 

  statement {
    actions   = ["ses:SendEmail"]
    resources = ["*"]
  }
}
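Everything else in the module can keep using the default aws configuration, which the test replaces with the mock. For example, a hypothetical policy resource that consumes the rendered document needs no provider argument:

```hcl
# Hypothetical consumer of the policy document (name is illustrative).
# With no provider argument it uses the default aws configuration (the mock in
# tests), while its input JSON still comes from the real provider via aws.real.
resource "aws_iam_policy" "allow_ses_send" {
  name   = "allow-ses-send"
  policy = data.aws_iam_policy_document.allow_ses_send_policy.json
}
```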

In your *.tftest.hcl file, define both a real provider configuration and a mocked one, then map each to the module's provider configurations in the run block:

provider "aws" {
  alias = "real"
}

mock_provider "aws" {
  alias = "mock"

  mock_data "aws_s3_bucket" {
    defaults = {
      arn = "arn:aws:s3:::fake-bucket"
    }
  }
}

run "test_with_mixed_providers" {
  command = apply

  providers = {
    aws      = aws.mock
    aws.real = aws.real
  }

  # Assertions here
  assert {
    # The json attribute is indented, and single-element lists may render as plain
    # strings, so compare decoded values instead of an exact JSON string.
    condition     = contains(flatten([jsondecode(data.aws_iam_policy_document.allow_ses_send_policy.json).Statement[0].Action]), "ses:SendEmail")
    error_message = "Policy document not generated correctly"
  }
}
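One practical catch: aws.real is a genuine AWS provider configuration, so by default it still wants a region and valid credentials, even though aws_iam_policy_document renders locally. Below is a sketch for running in CI without AWS credentials. It assumes the data sources routed through aws.real never call AWS APIs. The dummy keys and skip flags follow the pattern commonly used with LocalStack, so check them against your provider version.

```hcl
provider "aws" {
  alias  = "real"
  region = "us-east-1" # any valid region; chosen arbitrarily for this sketch

  # Dummy static credentials (assumption: nothing routed here calls AWS),
  # so the provider does not search the normal credential chain.
  access_key = "test"
  secret_key = "test"

  # Skip the STS and instance-metadata calls the provider normally makes
  # while it is being configured.
  skip_credentials_validation = true
  skip_requesting_account_id  = true
  skip_metadata_api_check     = true
}
```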

Thanks. Yes, I thought about an approach like this, but won’t this also require me to use the two different provider aliases within the module itself (and therefore to pass the aliased provider everywhere I call the module)? That’s the rabbit hole I was trying to stay out of.

Yeah, it does require setting the alias on those data sources in the module and passing both providers wherever the module is called.
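On the calling side, that looks roughly like the following. Outside of tests, both entries can point at the same provider configuration, so the alias costs one providers map per module call. The module name and source path are hypothetical.

```hcl
provider "aws" {
  region = "us-east-1" # illustrative
}

module "ses" {
  source = "./modules/ses" # hypothetical path

  # Once providers is set, implicit inheritance no longer applies, so the
  # default aws entry must be listed too. Both map to the same configuration.
  providers = {
    aws      = aws
    aws.real = aws
  }
}
```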