How do you handle Proxmox clusters when you only have 1 or 2 servers?

I technically have 3 servers, but I keep one offline because I don’t need it 24/7 and there’s no point wasting power on a server I don’t need.

I believe I read somewhere that you can force Proxmox to accept a lower expected vote count, but it isn’t recommended. Has anyone done this, and if so, have you run into any issues with it?

My main issue is that I want my VMs to start no matter what. For example, I had a power outage. When the servers came back online, instead of starting they waited for the expected vote count to reach 3 (which would never happen, because the third server wasn’t turned on), so they just waited until I got home and ran

pvecm expected 2
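For context, `pvecm expected` only lowers the expected vote count until corosync restarts, so it has to be re-run after every outage. The cleaner fix would be removing the permanently offline node with `pvecm delnode`; once only two voting nodes remain, corosync’s `two_node` votequorum option makes the lower quorum persistent. A config sketch (excerpt only, not a full file):

```
# /etc/pve/corosync.conf (quorum section excerpt) -- a sketch, assuming
# the third node has already been removed so exactly two voting nodes remain.
quorum {
  provider: corosync_votequorum
  # two_node: 1 lets either node keep quorum when the other is down.
  two_node: 1
  # two_node implicitly enables wait_for_all, which would still block
  # startup until both nodes have been seen once; disable it if VMs
  # must come up after a power outage with one node missing.
  wait_for_all: 0
}
```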

  • I have 2 nodes and a raspberry pi as a qdevice.
    I can still power off 1 node (so I have 1 node and an rpi) if I want to.
    To avoid split brain, if a node can see the qdevice then it is part of the cluster. If it can’t, then the node is in a degraded state.
    Qdevices are only recommended in some scenarios, which I can’t remember off the top of my head.

    With 2 nodes, you can’t set up a Ceph cluster (well, I don’t think you can).
    But you can set up High Availability and use ZFS snapshot replication on a 5-minute interval (so if your VM’s host goes down, the other host can start it from a potentially outdated snapshot).

    This worked for my project: I had a few stateless services that could bounce between nodes, and a Postgres VM with streaming replication (Postgres, not ZFS) and failover, which led to a decently fault-tolerant setup.
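    A rough sketch of the moving parts above, in case it helps (the node name, VMID, and Pi address are placeholders, not from my actual setup; commands run on a Proxmox node unless noted):

    ```shell
    # On the Raspberry Pi: install the external vote daemon
    apt install corosync-qnetd

    # On every cluster node, install the qdevice client, then
    # register the Pi once (192.168.1.50 is a placeholder address):
    apt install corosync-qdevice
    pvecm qdevice setup 192.168.1.50

    # Replicate VM 100's disks to the other node every 5 minutes
    # ("pve2" is a placeholder node name; job id format is <vmid>-<num>):
    pvesr create-local-job 100-0 pve2 --schedule "*/5"

    # Put the VM under HA so the surviving node can start it:
    ha-manager add vm:100
    ```
    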

    • I will have to look into the qdevice. I do have an old Pi 3 set up as a software-defined radio. I might be able to also set it up as a qdevice.

      https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support

      Looking at the documentation, it isn’t recommended to use a qdevice in a cluster with an odd number of nodes, which I guess I technically have.

      If the QNet daemon itself fails, no other node may fail or the cluster immediately loses quorum. For example, in a cluster with 15 nodes, 7 could fail before the cluster becomes inquorate. But, if a QDevice is configured here and it itself fails, no single node of the 15 may fail. The QDevice acts almost as a single point of failure in this case.

      But it seems to be more of an issue in large clusters. In my situation I don’t think this is a big deal, because if the qdevice fails while my third server is offline, I’m in the same situation I’m in now.

      Just out of curiosity, do you back up your Pi at all? I’m not sure what the recovery process is if the qdevice fails, or how easy it is to replace and set up again.
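      From skimming the same wiki page, replacement looks fairly simple, since the certificates get regenerated when you re-register. A sketch (the Pi address is a placeholder):

      ```shell
      # On any cluster node: remove the dead qdevice from the config
      pvecm qdevice remove

      # On the replacement Pi: install the vote daemon
      apt install corosync-qnetd

      # Back on a cluster node: register the new qdevice
      pvecm qdevice setup 192.168.1.50
      ```
      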