I packaged a network appliance box with Vagrant 2.2.6 from the apt repo on Ubuntu 20 / WSL 1 / Windows 10.
To keep the VM from erroring out when Vagrant tries to connect to it, I use one of the available shells (none of which is fully featured) as a workaround:
config.ssh.shell = "tclsh"
When I use Vagrant 2.2.15 from the HashiCorp repo, it fails with the error below.
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
sed -i '/#VAGRANT-BEGIN/,/#VAGRANT-END/d' /etc/fstab
Stdout from the command:
'
(>&2 printf '41e57d38-b4f7-4e46-9c38-13873d338b86-vagrant-ssh')
sed -i '/#VAGRANT-BEGIN/,/#VAGRANT-END/d' /etc/fstab
exit
<HPE>sudo -E -H tclsh
^
% Unrecognized command found at '^' position.
Stderr from the command:
If I omit config.ssh.shell = "tclsh", then I get the following error:
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
The configured shell (config.ssh.shell) is invalid and unable
to properly execute commands. The most common cause for this is
using a shell that is unavailable on the system. Please verify
you're using the full path to the shell and that the shell is
executable by the SSH user.
Can anyone explain what exactly Vagrant is doing with config.ssh.shell?
Can this setting be overridden? Could you point me to the source code and the possible values?
Can Vagrant be set to ignore such errors and continue executing the Vagrantfile?
The issue you seem to be running into with Vagrant 2.2.15 has to do with synced folders getting added to /etc/fstab. It looks like the command sed -i '/#VAGRANT-BEGIN/,/#VAGRANT-END/d' /etc/fstab is not valid in the tclsh shell. You can disable this behavior with the allow_fstab_modification config option.
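For example, a minimal sketch (allow_fstab_modification is available starting with Vagrant 2.2.15):

```ruby
Vagrant.configure("2") do |config|
  # Prevent Vagrant from editing /etc/fstab on the guest,
  # which avoids the sed command that tclsh cannot run.
  config.vm.allow_fstab_modification = false
end
```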
Can anyone explain what exactly Vagrant is doing with config.ssh.shell?
The config.ssh.shell option lets users set the shell Vagrant uses when connecting to the remote machine. For example, in this snippet, Vagrant prepares a command to run on the remote machine using the configured shell. So, if the shell set in config.ssh.shell is not installed on the guest system, Vagrant will return an error about the shell being invalid.
Can this setting be overridden? Could you point me to the source code and possible values?
Yes, this setting can be overridden. The default value is bash -l, and it is overridden via the config.ssh.shell option. There is no validation of this value, so in theory you can set it to whatever you want. However, Vagrant will fail if the specified shell does not exist on the system.
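As a sketch of how the override looks in a Vagrantfile (the tclsh value mirrors the workaround described above):

```ruby
Vagrant.configure("2") do |config|
  # Default is "bash -l"; Vagrant does not validate this value,
  # so any command available on the guest can be used here.
  config.ssh.shell = "tclsh"
end
```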
Can vagrant be set to ignore such errors and continue executing the Vagrantfile?
Nope. Vagrant needs to be able to SSH into the machine to continue with its guest configuration.
config.vm.synced_folder was already disabled. Apparently, I also have to do the same for config.vm.allow_fstab_modification.
I don’t need Vagrant to SSH into the machine, as it’s not a *nix system. So I also set config.ssh.insert_key as well as vb.check_guest_additions to false.
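Taken together, a sketch of the settings described so far might look like this (option names as in the standard Vagrant and VirtualBox provider config):

```ruby
Vagrant.configure("2") do |config|
  config.vm.synced_folder ".", "/vagrant", disabled: true  # no synced folders
  config.ssh.insert_key = false                # keep the insecure key, don't replace it
  config.vm.allow_fstab_modification = false   # don't touch /etc/fstab on the guest
  config.vm.provider "virtualbox" do |vb|
    vb.check_guest_additions = false           # skip the guest additions check
  end
end
```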
Now that I have also disabled allow_fstab_modification (and uninstalled the vagrant-proxyconf and vagrant-host-shell plugins, just in case), it still errors out if config.ssh.shell is set to tclsh. Keep in mind that this is not a fully featured shell.
Failed to open an SSH channel on the remote end! This typically
means that the maximum number of active sessions was hit on the
SSH server. Please configure your remote SSH server to resolve
this issue.
The SSH server on the network appliance allows 32 concurrent SSH sessions by default, after which it denies connection requests. On the machine itself I see
%Apr 13 15:20:38:039 2021 HPE SSHS/6/SSHS_LOG: Connection closed by 10.0.2.2.
%Apr 13 15:20:38:039 2021 HPE SSHS/6/SSHS_DISCONNECT: SSH user (null) (IP: 10.0.2.2) disconnected from the server.
%Apr 13 15:20:43:678 2021 HPE SSHS/6/SSHS_LOG: Accepted publickey for vagrant from 10.0.2.2 port 52690.
%Apr 13 15:21:07:488 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:08:718 2021 HPE SSHS/6/SSHS_DISCONNECT: SSH user vagrant (IP: 10.0.2.2) disconnected from the server.
%Apr 13 15:21:11:894 2021 HPE SSHS/6/SSHS_LOG: Accepted publickey for vagrant from 10.0.2.2 port 52698.
%Apr 13 15:21:19:367 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:23:811 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:27:886 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:32:006 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:35:910 2021 HPE SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=**-User=**; Command is tclsh
%Apr 13 15:21:36:943 2021 HPE SSHS/6/SSHS_DISCONNECT: SSH user vagrant (IP: 10.0.2.2) disconnected from the server.
%Apr 13 15:21:37:041 2021 HPE SSHS/6/SSHS_LOG: Connection closed by 10.0.2.2.
%Apr 13 15:21:37:041 2021 HPE SSHS/6/SSHS_DISCONNECT: SSH user (null) (IP: 10.0.2.2) disconnected from the server.
%Apr 13 15:21:37:043 2021 HPE SSHS/6/SSHS_LOG: Connection closed by 10.0.2.2.
%Apr 13 15:21:37:043 2021 HPE SSHS/6/SSHS_DISCONNECT: SSH user (null) (IP: 10.0.2.2) disconnected from the server.
If I unset config.ssh.shell, it defaults to bash and errors out as already described.
I don’t know what else is going on or what I could disable so that the machine continues to the provisioning phase.
In a nutshell, I don’t need Vagrant to SSH into the machine to configure anything, as it’s not *nix. That is done subsequently.
Hmmm, I don’t think this is possible with Vagrant at this time. As part of the process of bringing up a machine, Vagrant will wait for a communicator to become available, for example in the virtualbox provider.
However, this kind of thing almost exists for the docker provider with the has_ssh config option.
Just to be clear: I do need to SSH into the machine, but I don’t need Vagrant probing for guest capabilities.
I cannot seem to find a setting for this. I suppose guest inheritance doesn’t make it easier.
At the moment I’m stuck with this error
DEBUG guest: Found cap: persist_mount_shared_folder in linux
INFO guest: Execute capability: persist_mount_shared_folder [#<Vagrant::Machine: default (VagrantPlugins::ProviderVirtualBox::Provider)>, nil] (atomic)
INFO persist_mount_shared_folders: clearing /etc/fstab
DEBUG ssh: Re-using SSH connection.
INFO ssh: Execute: test -f /etc/fstab (sudo=false)
ERROR warden: Error occurred: Failed to open an SSH channel on the remote end! This typically
means that the maximum number of active sessions was hit on the
SSH server. Please configure your remote SSH server to resolve
this issue.
I’m checking this with a colleague. What would be the best course of action: a new config.vm.guest, e.g. :other, or trying to patch the builtins?
We’ve seen locally that even if a new bare plugin does nothing for the guest type, some actions still check, for example, for rsync on the guest VM.
Doing a patch is probably the fastest way to get something that brings up a machine (and stops Vagrant from connecting to it). You can make the builtins that are giving you trouble no-ops, or you can patch the provider plugin to not do any operations after booting. For example, if you are using the virtualbox provider, you can modify the boot action to not wait for the communicator or do any of the normal post-boot actions.
This is kind of an extreme option, and this kind of patch would certainly not be accepted upstream.
Making a new plugin is probably the more flexible idea. My guess is that you might want to make a new communicator plugin as opposed to a guest plugin. In the WaitForCommunicator action, Vagrant waits for two conditions: (1) for the machine to be in a running state according to the provider, and (2) for the machine to be reachable by the communicator. So, you could have a new config.vm.communicator (e.g. none / ill-communication) that only does no-ops, so it never tries to connect to the guest machine.
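A rough skeleton of such a no-op communicator might look like the following. This is a sketch only: the names "none", NoneCommunicator, and NonePlugin are illustrative and not part of Vagrant; the method set follows the communicator plugin interface.

```ruby
# Sketch: a communicator that reports ready but never connects to the guest.
require "vagrant"

class NoneCommunicator < Vagrant.plugin("2", :communicator)
  def self.match?(machine)
    true # usable with any machine
  end

  def initialize(machine)
    @machine = machine
  end

  def ready?
    true # always claim the guest is reachable
  end

  def wait_for_ready(duration)
    true # WaitForCommunicator returns immediately
  end

  def execute(command, opts = nil)
    0 # pretend every command succeeded
  end
  alias sudo execute

  def test(command, opts = nil)
    false # report every capability probe as failing
  end

  def upload(from, to); end
  def download(from, to = nil); end
end

class NonePlugin < Vagrant.plugin("2")
  name "none communicator"
  communicator("none") { NoneCommunicator }
end
```

With something like this loaded, setting config.vm.communicator = "none" in the Vagrantfile would select it.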
I’m not completely sure about the details here, though I expect this approach could work. Also note that I don’t think this kind of change would necessarily resolve the GitHub issue Enhancement Request: Supporting network devices as guests in Vagrant · Issue #11771 · hashicorp/vagrant. A resolution to that issue should still allow users to run commands against the guest device.
the guest plugin does not control whether Vagrant connects to the guest or not. There are other plugin types that might also try to access the guest as part of the process of bringing up a machine.
the communicator plugins define how the guest machine gets connected to
a plugin and a builtin are different ideas; a plugin does not necessarily preempt a builtin
Right, so there are a few different types of plugins (e.g. guest, host, provider, communicator, provisioner), each of which has different types of capabilities. But these different types of plugins can use each other. So, while you can define a new guest plugin that never accesses the guest, you might still be using a different type of plugin (e.g. a provider plugin) that performs some action that accesses the guest.
So, for example, if you are on a Linux host trying to bring up a no-op guest with VirtualBox, you are actually using the no-op guest plugin, the VirtualBox provider plugin, the Linux host plugin, and the SSH communicator plugin. Any one of those plugins might try to access the guest machine depending on what action is being run.
Just having a no-op guest plugin, then, doesn’t guarantee that Vagrant will never try to connect to the guest. However, you might be able to build a no-op communicator plugin that achieves this goal, since the communicator plugin defines how Vagrant communicates with the guest.
Can’t a plugin preempt a builtin?
A plugin and a builtin are somewhat separate ideas. Builtins are “actions” provided by Vagrant that can be used across plugins, where an “action” is a step that needs to be taken in order to complete some goal. So, a provider plugin might use some builtin actions to sync a folder as part of its boot action, which is used to achieve the goal of bringing up a guest machine. But a plugin might also define its own actions as well.
If I understood you correctly, making a no-op SSH communicator plugin that only checks for connection readiness could preempt any other plugin from further entering the machine for checks and configuration; is that so?
How do other plugins handle the implementation of a communicator that they don’t know about?
Heya, sorry for the late reply here.
If you don’t need that synced folder on the guest, you can disable it, and this step should be skipped. It’ll look something like:
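For instance, disabling the default /vagrant synced folder in the Vagrantfile:

```ruby
# Disable the default synced folder so Vagrant skips mounting it on the guest
config.vm.synced_folder ".", "/vagrant", disabled: true
```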
I disabled that, as well as SSH insert key, keep-alives, hosts, and fstab. This time it seems to be the VirtualBox provider connecting to the guest, which I cannot disable.
INFO ssh: SSH is ready!
DEBUG ssh: Re-using SSH connection.
INFO ssh: Execute: (sudo=false)
DEBUG ssh: Exit status: 0
INFO guest: Autodetecting host type for [#<Vagrant::Machine: default (VagrantPlugins::ProviderVirtualBox::Provider)>]
DEBUG guest: Trying: atomic
DEBUG ssh: Re-using SSH connection.
INFO ssh: Execute: grep 'ostree=' /proc/cmdline (sudo=false)
ERROR warden: Error occurred: Failed to open an SSH channel on the remote end! This typically
means that the maximum number of active sessions was hit on the
SSH server. Please configure your remote SSH server to resolve
this issue.
Vagrant.configure("2") do |config|
config.vm.box = "hashicorp/bionic64"
config.vm.communicator = "ill"
config.vm.synced_folder ".", "/vagrant", disabled: true
end
Looks like it was able to start the VM, no-op through all the other Vagrant bits, and exit successfully.
% bundle exec vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'hashicorp/bionic64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'hashicorp/bionic64' version '1.0.282' is up to date...
==> default: Setting the name of the VM: vagrant_default_1619117556089_46789
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
default: No guest additions were detected on the base box for this VM! Guest
default: additions are required for forwarded ports, shared folders, host only
default: networking, and more. If SSH fails on this machine, please install
default: the guest additions and repackage the box to continue.
default:
default: This is not an error message; everything may continue to work properly,
default: in which case you may ignore this message.
And once the machine was up, I was able to SSH in. I’m not sure this will work for your setup, though. Also, I would still consider this approach a bit hacky.
We have a similar no-op communicator plugin in a Linux dev environment, but rsync just ignores it and opens a shell to localhost, see above.
Is your environment also linux? What host and guest types are detected for your setup?
but rsync just ignores it and opens a shell to localhost
Hmmmm, that seems suspicious to me. Vagrant plugins should not be trying to directly access the guest machine. Can you please share the relevant parts of your Vagrantfile and the full debug output?
Is your environment also linux? What host and guest types are detected for your setup?
Command: "rsync" "--verbose" "--archive" "--delete" "-z" "--copy-links" "--no-owner" "--no-group" "--rsync-path" "sudo rsync" "-e" "ssh -p 2200 -o LogLevel=FATAL -o ControlMaster=auto -o ControlPath=/tmp/vagrant-rsync-20210426-292276-45mj5j -o ControlPersist=10m -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '/home/chris/.vagrant.d/insecure_private_key'" "--exclude" ".vagrant/" "/home/chris/projekte_git/vsr1k/" "vagrant@127.0.0.1:/vagrant"
Error: protocol version mismatch -- is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(178) [sender=3.1.3]
This looks like an issue with rsync that might need to be resolved on the guest.
INFO interface: info: ==> default: Rsyncing folder: /home/chris/projekte_git/vsr1k/ => /vagrant
It looks like you do have a synced folder defined in your Vagrantfile. Vagrant will try to guess a good default synced folder type if one is not provided in the Vagrantfile. If you do need to sync this folder, you can try setting the virtualbox synced folder type.
Note that mounting the folder on the guest won’t work, because Vagrant doesn’t have the ability to run commands on the guest machine. So this folder will be available on the guest in the same way you would expect if you had mounted it using the VirtualBox UI.
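A sketch of forcing the VirtualBox synced folder type (mounting inside the guest would still require guest-side commands, which don't work here):

```ruby
# Attach the folder via VirtualBox shared folders on the host side;
# Vagrant cannot run the guest-side mount for this appliance.
config.vm.synced_folder ".", "/vagrant", type: "virtualbox"
```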