Boot stalls at SSH step

I’ve made a new VM - it’s the first time I’ve done one from scratch, so it’s very simple. Host is Windows 11, guest is Debian 12. The first time I did vagrant up it worked fine (but I don’t know how long it took, as I wasn’t watching). I successfully did vagrant ssh and installed some stuff. But the second time it timed out with this output:

C:\dev\Debian>vagrant up
Bringing machine ‘default’ up with ‘virtualbox’ provider…
==> default: Checking if box ‘debian/bookworm64’ version ‘12.20230723.1’ is up to date…
==> default: Clearing any previously set forwarded ports…
==> default: Clearing any previously set network interfaces…
==> default: Preparing network interfaces based on configuration…
default: Adapter 1: nat
default: Adapter 2: hostonly
==> default: Forwarding ports…
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM…
==> default: Waiting for machine to boot. This may take a few minutes…
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured (“config.vm.boot_timeout” value) time period.

If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.

If you’re using a custom box, make sure that networking is properly
working and you’re able to connect to the machine. It is a common
problem that networking isn’t setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.

If the box appears to be booting properly, you may want to increase
the timeout (“config.vm.boot_timeout”) value.

Unfortunately there are no errors to be those “good hints” - it just stops progressing after the “SSH auth method” line and sits there. But the computer is not idle at all - Task Manager shows “VirtualBox Headless Frontend” chewing up about 20-30% of the CPU, even after the boot times out! I have a second VM (an old CentOS 7 one I’m trying to replace), but it behaves fine - it boots quickly and uses no CPU when not being asked to do something.

vagrant status says the timed-out VM is running, so even though the boot process didn’t complete, I have to do vagrant halt. When I do, I get this:

C:\dev\Debian>vagrant halt
==> default: Attempting graceful shutdown of VM…
default: Guest communication could not be established! This is usually because
default: SSH is not running, the authentication information was changed,
default: or some other networking issue. Vagrant will force halt, if
default: capable.
==> default: Forcing shutdown of VM…

C:\dev\Debian>

As suggested, I increased the timeout value to 600 seconds. The next time it booted, it took a long time but succeeded before 10 minutes. But since then I have not been able to boot it within 10 minutes. I could increase the timeout even more and perhaps get a better batting average, but there is clearly something wrong.

The only uncommented lines in my Vagrantfile are as follows:

Vagrant.configure("2") do |config|
  config.vm.box_download_options = {"ssl-revoke-best-effort" => true}
  config.vm.box = "debian/bookworm64"
  config.vm.boot_timeout = 600
  config.vm.network "private_network", ip: "192.168.56.100"
  config.vm.synced_folder "c:/dev/www", "/var/www", owner: "www-data", group: "www-data"
end

With no feedback, I don’t know what’s going on. Any suggestions for how to troubleshoot this?
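One way to get at least some feedback while `vagrant up` sits at the "SSH auth method" line is to probe the forwarded port yourself. The sketch below is only an illustration: the helper name `ssh_banner` and its `wait:` parameter are made up for this example, and the address 127.0.0.1:2222 is taken from the output above. It tries to read the SSH identification line that sshd sends as soon as a client connects. A plain TCP connect is not a reliable signal here, because VirtualBox itself listens on the forwarded host port and may accept the connection even when the guest is not ready.

```ruby
require "socket"
require "timeout"

# Hypothetical helper: try to read the SSH identification banner from host:port.
# Returns the banner line (for example "SSH-2.0-OpenSSH_...") with the trailing
# newline stripped, or nil if the connection fails, is closed without data,
# or nothing arrives within `wait` seconds.
def ssh_banner(host, port, wait: 5)
  Timeout.timeout(wait) do
    TCPSocket.open(host, port) do |sock|
      line = sock.gets   # nil if the peer closes without sending anything
      line&.strip
    end
  end
rescue Timeout::Error, SystemCallError, IOError
  nil
end
```

Calling `ssh_banner("127.0.0.1", 2222)` every few seconds from a second console shows whether sshd in the guest ever answers. A `nil` that persists for minutes suggests the guest itself has not finished booting, as opposed to a key or authentication problem on the Vagrant side.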

Addendum: I increased config.vm.boot_timeout to 1200 (20 minutes), and it succeeded in booting the first time after that (I was not watching, so I don’t know how long it took). I then did vagrant ssh and was working on stuff for a while (trying, so far in vain, to connect two VMs to copy files between them - that’s a different issue) when the CLI stopped taking any input from the keyboard. I had to close the window and re-open it, but then vagrant ssh just hung. vagrant reload said that once again it had to do a forceful shutdown because SSH was not running (same message as above), so apparently SSH died on its own somehow while I was using it. I have no idea why, as I still get no error messages about this stuff, but I suspect it’s related to the same mysterious root cause of the struggle to boot up.

The subsequent boot timed out again - 20 minutes was not enough! I tried again at 20 minutes and it failed again, so now I’ve changed it to 30 minutes and am waiting, but this is ridiculous! The whole time it is trying (and, if it fails, afterwards too until I halt it), the process uses 20-30% of the CPU and the PC’s fan revs up much of that time. I haven’t the foggiest idea what it’s doing - calculating more digits of pi? 🙄 If it succeeds in booting, the CPU settles down to normal.

No one has any thoughts on this?

You may experience a long pause at the point in the process shown below. The VM is up and running, and Vagrant is trying to establish an SSH session to it, but sometimes it hangs indefinitely and eventually times out. The cause is unknown.

==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key

To free up the process before it times out, perform the following steps:

 1. Open `Oracle VM VirtualBox Manager`.
 2. Click the name of your VM in the left panel. The name starts with the name of the root folder of your repository.
 3. If the VM is running, you will see a green right arrow labeled `Show`. Click it to open a window where you can monitor the boot process of your VM. This should free up the Vagrant session, and booting should then continue to completion.
 4. Once the Vagrant session is up, you can close the window without shutting down the VM by choosing `Machine/Detach GUI` from the menu.
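If clicking `Show` by hand every time gets tedious, a possible variation is to have Vagrant start the VM with its window already visible. The fragment below is a minimal sketch, not a confirmed fix: `vb.gui` is a standard option of Vagrant's VirtualBox provider, but whether booting with the GUI has the same effect as attaching it mid-boot is an assumption that has not been verified for this problem. The box name is the one from the question.

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "debian/bookworm64"

  config.vm.provider "virtualbox" do |vb|
    # Start the VM with its console window visible instead of headless.
    # Assumption: this mimics clicking "Show" manually; untested as a fix here.
    vb.gui = true
  end
end
```

The provider block can be merged into an existing Vagrantfile next to the other `config.vm` settings. Setting `vb.gui = false` or removing the block returns to headless booting.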

I finally got back to this (I was busy for a couple weeks with many unrelated projects). Indeed, ilearner’s trick works - thanks!

But that’s just a workaround. Does anyone know of true causes and/or solutions? Is it particular boxes that have the problem, or particular Vagrantfile setups? My older VM never had this problem, but my new one has it almost every time.