Red Hat Ranch outage - tests failing [Errno -2] Name or service not known

November 30, 2022 at 9:00 AM UTC

Resolved 16:40 UTC - The outage has been resolved. Several issues contributed: incorrect DNS settings, code that was sensitive to DNS resolution errors, and podman not using host networking.

Update 14:40 UTC - We are testing a workaround so that we can re-enable the service. Fingers crossed.

Update 13:40 UTC - It seems we have found a setup that makes DNS resolution more reliable (removing aardvark-dns). We also found a bug in the latest release that caused this error not to be retried. As for why it worked before ….

Reopened 11:45 UTC - The IT DNS issue was a false lead. It seems we are hitting a podman DNS bug. The investigation continues.

Resolved 11:45 UTC - It turned out that the VPC we had used for years had a DNS setup prone to problems. After migrating to another VPC, the problem is gone.

Update 10:30 UTC - The problems have been identified. Currently, it appears that IT DNS problems are causing random resolution errors. We are testing a workaround and working with IT to resolve the problem.

Outage 8:00 UTC - We are investigating DNS problems on the workers. All tests might fail with 'Failed to establish a new connection: [Errno -2] Name or service not known'.

Last updated: June 2, 2025 at 11:36 PM UTC