У нас есть виртуальная машина в Compute Engine с CentOs 8. Эта виртуальная машина работает в течение длительного времени и никогда не перезагружается. Прошлой ночью мы внезапно потеряли подключение к виртуальной машине через внутренние и внешние IP-адреса. SSH также был невозможен.
На последовательном порту мы наблюдали следующие журналы:
Apr 29 15:53:18 <vm-name> google_osconfig_agent[1215]: default
Apr 29 16:05:18 <vm-name> google_osconfig_agent[1215]: default
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3120] dhcp4 (eth0): option dhcp_lease_time => '3600'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option domain_name => 'us-central1-a.c.<project-name>.internal'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option domain_name_servers => '169.254.169.254'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option domain_search => 'us-central1-a.c.<project-name>.internal c.<project-name>.internal google.internal'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option expiry => '1619716266'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option host_name => '<vm-name>.us-central1-a.c.<project-name>.internal'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option interface_mtu => '1460'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option ip_address => '10.128.0.4'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option next_server => '10.128.0.1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option ntp_servers => '169.254.169.254'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option requested_broadcast_address => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3124] dhcp4 (eth0): option requested_domain_name => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_domain_name_servers => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_domain_search => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_host_name => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_interface_mtu => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_ms_classless_static_routes => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_nis_domain => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_nis_servers => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_ntp_servers => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_rfc3442_classless_static_routes => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3125] dhcp4 (eth0): option requested_root_path => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option requested_routers => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option requested_static_routes => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option requested_subnet_mask => '1'
Apr 29 16:11:06 <vm-name> dbus-daemon[827]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.8' (uid=0 pid=907 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option requested_time_offset => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option requested_wpad => '1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option rfc3442_classless_static_routes => '10.128.0.1/32 0.0.0.0 0.0.0.0/0 10.128.0.1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option routers => '10.128.0.1'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): option subnet_mask => '255.255.255.255'
Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info> [1619712666.3126] dhcp4 (eth0): state changed extended -> extended
Apr 29 16:11:06 <vm-name> systemd[1]: Starting Network Manager Script Dispatcher Service...
Apr 29 16:11:06 <vm-name> dbus-daemon[827]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Apr 29 16:11:06 <vm-name> systemd[1]: Started Network Manager Script Dispatcher Service.
Apr 29 16:15:18 <vm-name> google_osconfig_agent[1215]: default
Apr 29 16:29:30 <vm-name> GCEGuestAgent[1269]: 2021-04-29T16:28:35.1302Z GCEGuestAgent Error main.go:181: Error watching metadata: Get http://metadata.google.internal/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=4ac15b8179731d72: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Apr 29 16:29:31 <vm-name> OSConfigAgent[1215]: 2021-04-29T16:29:03.9464Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout
Apr 29 16:29:44 <vm-name> google_osconfig_agent[1215]: default
[16281276.978048] google_guest_agent[1269]: 2021/04/29 16:41:10 logging client: context deadline exceeded
Apr 29 17:11:14 <vm-name> google_guest_agent[1269]: 2021/04/29 16:41:10 logging client: context deadline exceeded
Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: 2021/04/29 16:42:42 logging client: context deadline exceeded
Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: default
Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:03:31 logging client: context deadline exceeded
Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: default
Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T16:41:09.0306Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout
Apr 29 17:27:20 <vm-name> NetworkManager[907]: <info> [1619717130.6935] dhcp4 (eth0): option dhcp_lease_time => '3600'
Apr 29 17:27:20 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:26:00 logging client: context deadline exceeded
Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:07:12.7039Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout
Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:22:13.2740Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Apr 29 17:28:55 <vm-name> NetworkManager[907]: <info> [1619717199.5681] dhcp4 (eth0): option domain_name => 'us-central1-a.c.<project-name>.internal'
Apr 29 17:34:48 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:27:54.8816Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: default
Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:36:13 logging client: context deadline exceeded
Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: default
Apr 29 17:55:16 <vm-name> NetworkManager[907]: <info> [1619717287.9321] dhcp4 (eth0): option domain_name_servers => '169.254.169.254'
Apr 29 17:58:36 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:30:47.0872Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Apr 29 18:03:20 <vm-name> google_osconfig_agent[1215]: default
Apr 29 18:03:20 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:46:22 logging client: context deadline exceeded
Перезапуск машины восстановил сеть. Больше ничего в логах найти не можем, да и мониторинг ничего подозрительного не показывает. Что могло быть причиной этого?