I am trying to provision a newly deployed EC2 instance on AWS with multiple files. I can use cloud-init to copy a few smaller text-type files (PHP code) to the ec2-user home directory, then deploy an httpd webserver and copy those files to the correct /var/www/html location. These files are correctly copied to the ec2-user home directory using the user_data argument of the aws_instance. I also need to install the AWS SDK for PHP. The SDK comes as a zip file that exceeds the file size limitation of cloud-init, so it cannot be delivered that way. I can (separately) create an S3 bucket and upload this large zip file to it. I then see that I can use an aws_s3_bucket_object to fetch the file, but this seems to require the use of the same user_data argument of the aws_instance. How can I retrieve both my files from cloud-init and the S3 bucket?
I have already created the S3 bucket with the zip file in another project, and now my aws_instance code looks like this:
In your cloud-init configuration you’d need to download the large file from the S3 bucket. So you don’t need the aws_s3_bucket_object data source: it is the EC2 instance, while it is booting, that would access the S3 bucket, rather than Terraform.
Should I then put ‘some command’ in my provisioner to download the file(s) from the S3 bucket to the instance? Would the command go in the inline = section? What is the command to download a file from an S3 bucket that I would use? Or are you suggesting that I add a new ‘part’ to my template_cloudinit_config object?
Where VMs are concerned, provisioners are a workaround for VM platforms and images that don’t support cloud-init or something like it.
Since you are already using cloud-init for some of your VM bootstrapping, you should not also use the workaround intended for when it is not available, and should use cloud-init for all of your initialisation tasks.
Cloud-init can run arbitrary commands during boot, so the download can be done by whatever command fetches the object from S3. It is up to you to figure out an appropriate one for the software available in your image and your authentication requirements.
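For example, if the AWS CLI is available in the image, the download step could be as simple as this one command (the bucket name, object key, and destination path below are placeholders, not values from your setup):

```sh
# Copy the SDK zip from the S3 bucket onto the local filesystem.
aws s3 cp s3://my-example-bucket/aws-sdk-php.zip /tmp/aws-sdk-php.zip
```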
As others have noted, there is a size limit on user_data that is imposed by EC2 itself (16 KB), and so larger files will need to be distributed out of band, such as via S3.
The exact details of this will vary depending on how your organisation does access control in AWS, but there are some topics that are likely to be relevant here and that will hopefully give you enough to research further in the AWS docs:
When one AWS service talks directly to another, AWS IAM can typically handle issuing the necessary credentials automatically as long as you have the appropriate access policies configured. For EC2 instances there is the concept of an “instance profile” which effectively attaches an EC2 instance to an IAM role. You can then grant that role access to read objects from your S3 bucket using the bucket policy.
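As a rough sketch only (the resource names and bucket name here are placeholders, and this assumes Terraform 0.12+ syntax rather than anything from your actual configuration), the pieces might look something like this:

```hcl
# An IAM role that EC2 instances can assume, and an instance profile that
# attaches it to the instance.
resource "aws_iam_role" "web" {
  name = "example-web-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "web" {
  name = "example-web-profile"
  role = aws_iam_role.web.name
}

# A bucket policy granting the role read access to the objects in the
# bucket you uploaded the SDK zip to (referenced by name, since you
# created the bucket in another project).
resource "aws_s3_bucket_policy" "sdk_bucket" {
  bucket = "my-example-bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "s3:GetObject"
      Resource  = "arn:aws:s3:::my-example-bucket/*"
      Principal = { AWS = aws_iam_role.web.arn }
    }]
  })
}

resource "aws_instance" "web" {
  # ... your existing arguments (ami, instance_type, user_data, ...) ...
  iam_instance_profile = aws_iam_instance_profile.web.name
}
```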
IAM is a broad and flexible system, so unfortunately I can’t give detailed advice beyond that but hopefully the above helps you to find the relevant source materials to learn more.
Once your EC2 instance has access to the S3 bucket, the AWS CLI and the AWS SDKs should typically be able to obtain suitable credentials automatically from the EC2 instance metadata service (the same service that provides the “user data”), so it is often sufficient then to:
1. Use “user data” to command cloud-init to run a script.
2. In that script, use the AWS CLI to download the needed files from S3 into a local directory.
3. In that same script, launch whatever actions you need to take based on those files (a rough sketch of such a script follows below).
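That script could be an extra part in your existing template_cloudinit_config. A minimal sketch, in which the bucket name, object key, and target paths are placeholders rather than values from your configuration:

```hcl
data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  # ... your existing part(s) that write the smaller PHP files ...

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOF
      #!/bin/bash
      set -eu

      # Download the SDK zip from S3 using the instance profile credentials.
      aws s3 cp s3://my-example-bucket/aws-sdk-php.zip /tmp/aws-sdk-php.zip

      # Unpack it where the PHP application expects to find it.
      # (Install unzip first if your image doesn't include it.)
      unzip -o /tmp/aws-sdk-php.zip -d /var/www/html
    EOF
  }
}
```

You would then pass data.template_cloudinit_config.config.rendered into the user_data of the aws_instance, as you are presumably already doing for your smaller files.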
The exact details of this will depend on what software is installed in your AMI. I believe the Amazon Linux AMIs include both cloud-init and the AWS CLI. On other stock Linux distribution images you may need to install the AWS CLI from the distribution’s package manager first.
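For example, on a Debian- or Ubuntu-based image the start of that same boot script might include something like the following; the package names are an assumption and vary between distributions and releases, so check yours:

```sh
# Install the AWS CLI (and unzip) from the distribution packages if the
# image doesn't already ship them.
if ! command -v aws >/dev/null 2>&1; then
  apt-get update
  apt-get install -y awscli unzip
fi
```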