Census pollution when recycling IPs

We have run into an interesting problem while developing Habitat services. I work alongside two colleagues, and each of us maintains a separate bastion ring of supervisors. The rings are deployed in AWS, in shared subnets of a shared VPC. We each routinely add and remove services from our own ring via Terraform apply/destroy.

We have observed the configuration of our services changing unexpectedly, and the census of one bastion ring containing service configuration intended for another ring.

We suspect that our bastion rings are holding on to ‘dead’ service group members by IP, and then ‘reviving’ them when they reappear, even though the IP has since been recycled by a colleague as part of their separate ring of services.
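To make the hypothesis concrete, here is a toy Python model of the behaviour we suspect. It is only a sketch of the idea and not Habitat's actual implementation: `ToyRing`, `Supervisor`, `probe_departed`, the `check_identity` flag, the member IDs, and the address `10.0.1.15` are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    """Toy stand-in for a supervisor; not Habitat's real data model."""
    member_id: str
    ip: str
    ring: str
    census: dict = field(default_factory=dict)  # service group -> config


class ToyRing:
    """Hypothetical ring that remembers departed peers by IP address."""

    def __init__(self, name):
        self.name = name
        self.alive = {}     # ip -> member_id
        self.departed = {}  # ip -> member_id last seen at that address
        self.census = {}    # service group -> config

    def join(self, sup):
        self.alive[sup.ip] = sup.member_id
        self.census.update(sup.census)

    def mark_dead(self, ip):
        self.departed[ip] = self.alive.pop(ip)

    def probe_departed(self, network, check_identity=False):
        """Re-contact remembered IPs; `network` maps ip -> Supervisor now there."""
        for ip, old_id in list(self.departed.items()):
            sup = network.get(ip)
            if sup is None:
                continue
            if check_identity and (sup.member_id != old_id or sup.ring != self.name):
                continue  # a different supervisor now owns this address
            del self.departed[ip]
            self.join(sup)  # 'revive': merges whatever census the responder carries


def simulate(check_identity):
    ring_a = ToyRing("ring-a")
    old = Supervisor("m-1", "10.0.1.15", "ring-a", {"web.default": {"port": 8080}})
    ring_a.join(old)
    ring_a.mark_dead("10.0.1.15")  # e.g. after a Terraform destroy

    # A colleague's Terraform apply is handed the same IP for a ring-b supervisor.
    new = Supervisor("m-9", "10.0.1.15", "ring-b", {"db.default": {"port": 5432}})
    ring_a.probe_departed({"10.0.1.15": new}, check_identity=check_identity)
    return sorted(ring_a.census)


print(simulate(check_identity=False))
print(simulate(check_identity=True))
```

In this model, the IP-only probe pulls `db.default` from the colleague's ring into `ring-a`'s census, while the identity check leaves the census untouched. Whether the real supervisor behaves like the first case is exactly what we would like to confirm.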

I wanted to open a conversation about this to determine whether it is expected behaviour and to see if anyone has encountered the same problem or foreseen it happening. For our intended implementation of Habitat, we will absolutely rely on recycling of IPs and the integrity of config exported by services in the ring.

I invite any comments. In the meantime, I’ll try to reproduce the issue reliably and post the steps.

If you get some repro steps, it’d be good to file a bug report.