This page describes tips, troubleshooting, and known issues that you might find helpful if you run into problems using Google Compute Engine.
Contents
General tips
- Viewing different response formats
- Viewing gcloud compute logs
- Selecting resource names
- Communicating between your instances and the Internet
- Accessing Google Compute Engine as a different ssh user
- Setting non-OAuth2 Access Key and secret credentials for gsutil/boto with Google Cloud Storage
Viewing different response formats
gcloud compute performs most of its actions by making REST API calls. The pretty-printed results show only the most important information returned by a command. To see the response in a different output format, use the --format flag, which displays the output in a format other than the pretty-printed version. Available output formats include json, yaml, and text.
For example, to see a list of instances in JSON, use --format json:
$ gcloud compute instances list --format json
Viewing gcloud compute logs
gcloud compute creates and stores logs in log files that you can query, located at $HOME/.config/gcloud/logs. To see the latest log file on a Linux-based operating system, run:
$ less $(find ~/.config/gcloud/logs | sort | tail -n 1)
The log file includes information about all requests and responses made using the gcloud compute tool.
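If you check the logs often, the lookup above can be wrapped in a small shell function. This is a minimal sketch: the function name latest_gcloud_log is hypothetical, and it assumes, as the command above does, that log paths sort chronologically by name. It adds -type f so that a directory is never returned.

```shell
# Print the path of the newest gcloud log file.
# Usage: latest_gcloud_log [LOG_DIR]
# LOG_DIR defaults to the log location named in the text above.
latest_gcloud_log() {
  log_dir="${1:-$HOME/.config/gcloud/logs}"
  # -type f skips directories; sort + tail picks the last path by name.
  find "$log_dir" -type f | sort | tail -n 1
}
```

To page through the newest log, you could then run less "$(latest_gcloud_log)".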
Selecting resource names
When selecting names for your resources, keep in mind that these friendly names may be visible on support and operational dashboards within Google Compute Engine. For this reason, it is recommended that you choose resource names that do not expose any sensitive information.
Communicating between your instances and the Internet
An instance has direct Internet access only if it has an external IP address. An instance with an external IP can always initiate connections to the Internet. It can also receive connections, provided that a firewall rule is configured to allow access. You can add a custom firewall rule to the default network, or add a new network with custom firewalls. In addition, you can set up a network proxy within the virtual network environment to provide proxied access from an instance without an external IP address.
Note that a system-wide "hidden" firewall rule disconnects idle TCP connections after 10 minutes. If your instance initiates or accepts long-lived connections with an external host, you can adjust TCP keep-alive settings to prevent these timeouts from dropping connections. You can configure the keep-alive settings on the Compute Engine instance, your external client, or both, depending on the host that typically initiates the connection. Set the keep-alives to less than 600 seconds to ensure that connections are refreshed before the timeout occurs. The following examples set the keep-alives to one minute (60 seconds).
Compute Engine instance or Linux client
Run the following command:
sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
To ensure that the settings survive a reboot, add the settings to your /etc/sysctl.conf file.
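One way to persist these values is to append them to the sysctl configuration file. The function below is a sketch: its name, persist_keepalive, is hypothetical, and it takes the target file as an argument so that you can try it on a scratch file before touching /etc/sysctl.conf, which requires root.

```shell
# Append the keep-alive settings from the command above to a sysctl config file.
# Usage: persist_keepalive /path/to/sysctl.conf
persist_keepalive() {
  conf_file="$1"
  # One setting per line, in the key=value form that sysctl reads.
  cat >> "$conf_file" <<'EOF'
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=5
EOF
}
```

After the lines are in /etc/sysctl.conf, running sudo /sbin/sysctl -p reloads that file without a reboot.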
Mac OS X client
Run the following command:
sudo sysctl -w net.inet.tcp.always_keepalive=1 net.inet.tcp.keepidle=60000 net.inet.tcp.keepinit=60000 net.inet.tcp.keepintvl=60000
Windows client
Under the registry path HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\, add the following settings, using the DWORD data type, or edit the values if the settings already exist:
KeepAliveInterval: 1000
KeepAliveTime: 60000
TcpMaxDataRetransmissions: 10
Accessing Google Compute Engine as a different ssh user
By default, gcloud compute uses the $USER variable to add users to the /etc/passwd file for ssh'ing. You can specify a different user by including the --ssh-key-file SSH_KEY_FILE flag when ssh'ing into your instance.
Setting non-OAuth2 Access Key and secret credentials for gsutil/boto with Google Cloud Storage
The Google Compute Engine standard images have a boto configuration that enables automatic usage of service accounts. If you want to disable this and revert to Interoperable Storage Access Keys, add this to your .boto file:
[GSUtil]
default_api_version = 1
default_project_id = PROJECT_NUMBER
For more information, see Enabling API v1.0 access for Google Cloud Storage.
Troubleshooting
- My persistent disk doesn't boot.
- What does it mean for my instance to be in TERMINATED state?
- Why was my instance terminated with status "Planned termination by system"?
- Why is network traffic to my instance being dropped?
- SSH errors
My persistent disk doesn't boot. What can I do?
Here are some tips to help troubleshoot your persistent boot disk if it doesn't boot.
- Examine your virtual machine instance's serial port output.
  An instance's BIOS, bootloader, and kernel print their debug messages to the instance's serial port output, providing valuable information about any errors or issues that the instance experienced. To get your serial port information, run:
  $ gcloud compute instances get-serial-port-output INSTANCE
  You can also access this information in the Google Developers Console.
- Validate that your disk has a valid filesystem.
  If your filesystem is corrupted or otherwise invalid, you won't be able to launch your instance. Validate your disk's filesystem:
  - Start an instance using the latest Google-provided image:
    $ gcloud compute instances create INSTANCE --image debian-7
  - Attach your disk as a non-boot disk but don't mount it:
    $ gcloud compute instances attach-disk INSTANCE --disk DISK
  - ssh into your instance:
    $ gcloud compute ssh INSTANCE
  - Run the filesystem check:
    user@myinst:~$ sudo fsck DEVICE_FILE
    fsck from util-linux 2.20.1
    e2fsck 1.42.5 (29-Jul-2012)
    /: clean, 19829/655360 files, 208111/2621184 blocks
  - Mount your disk:
    user@myinst:~$ sudo mkdir /mydisk
    user@myinst:~$ sudo mount DEVICE_FILE /mydisk
  - Check that the disk has kernel files:
    user@myinst:~$ ls /mydisk/boot/vmlinuz-*
    /mydisk/boot/vmlinuz-3.2.0-4-amd64
- Validate that your disk has a valid master boot record (MBR).
  Run the following command on an instance to which the persistent boot disk in question is attached:
  $ sudo dd if=DEVICE_FILE bs=512 count=1 | xxd
  If your MBR is valid, its last two bytes should be 0x55 0xAA.
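Reading the last two bytes out of a hex dump by eye is easy to get wrong, so the MBR check can also be scripted. The following is a minimal sketch, not an official tool: the function name has_mbr_signature is hypothetical, and it tests only the two signature bytes at offsets 510 and 511, not the rest of the boot record.

```shell
# Return success if bytes 510-511 of the given file or device are 0x55 0xAA.
# Usage: has_mbr_signature DEVICE_OR_IMAGE_FILE
has_mbr_signature() {
  # Read exactly two bytes at offset 510 and print them as lowercase hex.
  sig="$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')"
  [ "$sig" = "55aa" ]
}
```

Reading a raw device usually requires root, so you would define and call the function from a root shell, for example: has_mbr_signature DEVICE_FILE && echo "MBR signature present".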
What does it mean for my instance to be in TERMINATED state?
If you shut down your instance using sudo shutdown or sudo poweroff, it is the equivalent of terminating it. There is no way to "freeze" an instance and restart it at a later time. You must recreate your instance if you choose to shut it down. When an instance is shut down from inside, it goes into the TERMINATED state but will still appear in the API (such as when you list instances). To remove it from the list, you must delete the instance explicitly. However, uptime for a TERMINATED instance is not billed.
Why was my instance terminated with status "Planned termination by system"?
A "Planned termination by system" status means that your instance lived in a zone that was scheduled for maintenance and has been terminated since that maintenance window went into effect.
Why is network traffic to/from my instance being dropped?
Google Compute Engine only allows network traffic that is explicitly permitted by your project's firewall rules to reach your instance. By default, all projects automatically come with a default network that only allows SSH or internal Compute Engine traffic. If you add any new networks, make sure you set up the appropriate firewall rules to allow network traffic, because new networks deny all traffic by default, including SSH. For more information, see the Networking page.
In addition, you may need to adjust TCP keep-alive settings to work around the default idle connection timeout of 10 minutes. For more information, see Communicating between your instances and the Internet.
SSH Errors
Under certain conditions, it is possible a Google Compute Engine instance will no longer accept SSH connections. There are many reasons this could happen, from a full disk to an accidental misconfiguration of sshd. If this happens, accessing the instance can be quite challenging. This section describes a number of tips and approaches to troubleshoot and resolve common ssh issues.
Check your firewall rules
Google Compute Engine provisions each project with a default set of firewall rules that permit ssh traffic. If the default firewall rule that permits ssh connections is somehow removed, you'll be unable to access your instance. Check your list of firewalls with gcloud compute and ensure the default-allow-ssh rule is present. If it is missing, add it back:
user@local:~$ gcloud compute firewall-rules list
NAME NETWORK SRC_RANGES RULES SRC_TAGS TARGET_TAGS
default-allow-icmp default 0.0.0.0/0 icmp
default-allow-internal default 10.240.0.0/16 tcp:1-65535,udp:1-65535,icmp
user@local:~$ gcloud compute firewall-rules create default-allow-ssh --allow tcp:22
Test the network
You can use the netcat tool to connect to your instance on port 22 and see if the network connection is working. If you connect and see an ssh banner (e.g. SSH-2.0-OpenSSH_6.0p1 Debian-4), your network connection is working, and you can rule out firewall problems:
user@local:~$ gcloud compute instances describe example-instance --format yaml | grep natIP
natIP: 108.59.82.95
user@local:~$ nc 108.59.82.95 22 # Check for SSH banner
SSH-2.0-OpenSSH_6.0p1 Debian-4
Try a fresh user
The issue that prevents you from logging in may be limited to your account (e.g. if the permissions on your ~/.ssh/authorized_keys file were set incorrectly). The first thing to try is creating a new account on the machine. Because gcloud compute sets up keys and accounts based on your username, the easiest way to do this is to create a new instance (using an f1-micro machine type is fine), log in, add a new user, and switch to this user's account. Then, you can use gcloud compute to try to ssh to your existing instance. If this works, you will be able to use this new account to fix the permissions on your primary user's account.
user@local:~$ gcloud compute instances create temp-machine --scopes compute-rw
...
user@local:~$ gcloud compute ssh temp-machine
...
user@temp-machine:~$ sudo useradd -m tempuser
user@temp-machine:~$ sudo su - tempuser
tempuser@temp-machine:~$ gcloud compute ssh your-instance
Mount your disk on a temporary instance
If the above set of steps doesn't work for you, and the instance you're interested in is booted from a persistent disk, you can detach the persistent disk and attach this disk to another machine. On the temporary machine, you can mount it and determine what prevented your ssh connection from working, and finally recreate the original instance with the same boot disk.
user@local:~$ gcloud compute instances detach-disk INSTANCE --disk mydisk
...
user@local:~$ gcloud compute instances create debugger --disk mydisk,boot --no-boot-disk-auto-delete
....
user@local:~$ gcloud compute ssh debugger
...
user@debugger:~$ sudo su -
user@debugger:~$ mkdir /mnt/myinstance
user@debugger:~$ mount /dev/disk/by-id/scsi-0Google_PersistentDisk_boot-myinstance /mnt/myinstance
user@debugger:~$ cd /mnt/myinstance/var/log
user@debugger:~$ ls # Identify the issue preventing ssh from working
user@debugger:~$ exit
user@local:~$ gcloud compute instances delete debugger # Delete the debugging instance
user@local:~$ gcloud compute instances create myoldinstance --disk mydisk,boot # Re-add your instance and persistent disk
Inspect an instance without shutting it down
You may have an instance you can't ssh to that continues to correctly serve production traffic. In this case, you may wish to inspect its disk without interrupting its ability to serve users. The steps here are similar to the previous section, but you'll make use of persistent disk snapshots. First, take a snapshot of the instance's boot disk, then create a new disk from that snapshot, create a temporary instance, and finally attach and mount the new persistent disk to your temporary instance.
user@local:~$ gcloud compute disks snapshot example-disk --snapshot-name example-disk-snapshot
...
user@local:~$ gcloud compute disks create example-disk-debugging --source-snapshot example-disk-snapshot
...
user@local:~$ gcloud compute instances create debugger
...
user@local:~$ gcloud compute instances attach-disk debugger --disk example-disk-debugging
...
user@local:~$ gcloud compute ssh debugger
...
user@debugger:~$ sudo su -
user@debugger:~$ mkdir /mnt/myinstance
user@debugger:~$ mount /dev/disk/by-id/scsi-0Google_PersistentDisk_example-disk-debugging /mnt/myinstance
user@debugger:~$ cd /mnt/myinstance/var/log
user@debugger:~$ ls # Identify the issue preventing ssh from working
If none of the above helped, you can create a startup script to collect information right after boot time. To do this, you will need to update your instance metadata to use a startup-script-url. Afterwards, you will also need to reset your instance using gcloud compute instances reset before the metadata will take effect.
Alternatively, you can recreate your instance with a diagnostic startup script:
- Run gcloud compute instances delete with the --keep-disks flag.
  $ gcloud compute instances delete INSTANCE --keep-disks boot
- Add a new instance with the same disk and specify your startup script.
  $ gcloud compute instances create example-instance --disk name=DISK boot=yes --startup-script-url URL
As a starting point, you can use the compute-ssh-diagnostic script to collect diagnostics information for most common issues.
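If you prefer to write your own script, the following is a minimal sketch of the kind of checks a diagnostic startup script could run. It is not the compute-ssh-diagnostic script. The function name and the choice of checks are illustrative assumptions, based on the causes mentioned in this section: a full disk, sshd problems, and incorrect permissions on authorized_keys.

```shell
# Hypothetical diagnostic helper for a startup script: appends a few
# ssh-related facts to the file or device given as the first argument.
# Usage: collect_ssh_diagnostics OUTPUT_PATH
collect_ssh_diagnostics() {
  out="$1"
  {
    echo "=== disk usage ==="
    df -h
    echo "=== processes matching sshd ==="
    ps -e | grep sshd || echo "no sshd process found"
    echo "=== permissions on authorized_keys files ==="
    ls -l /home/*/.ssh/authorized_keys 2>/dev/null || echo "none found"
  } >> "$out" 2>&1
}
```

In a startup script you might call collect_ssh_diagnostics /dev/console, on the assumption that text written to the console shows up in the serial port output described earlier on this page. Verify that on your image before relying on it.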
Known Issues
CentOS image v20131120 introduced a breaking change where iptables is turned on by default.
The v20131120 release of the CentOS 6 image, centos-6-v20131120, has a breaking change where iptables is turned on by default. This prevents external traffic from reaching CentOS instances that are running centos-6-v20131120, even if there is a relevant Firewall Rule resource permitting the connection.
As a workaround, users will need to disable iptables or update iptables to permit the desired connection (in addition to permitting the traffic using firewall rules). To disable iptables, run:
# Save your iptables settings
user@centos-instance:~$ sudo service iptables save
# Stop the iptables service
user@centos-instance:~$ sudo service iptables stop
# Disable iptables on start up
user@centos-instance:~$ sudo chkconfig iptables off
To update iptables, review the iptables documentation.
Google-provided images have a known bug with the ext4/scsi driver in the stable Debian and CentOS kernels
A known ext4 bug may cause a memory leak and an eventual crash of a virtual machine instance under heavy persistent disk load. Both the centos-6-v20131120 and debian-7-wheezy-v20131120 images are affected. For details, please refer to this Linux Kernel Mailing list thread.
As a workaround, you can use the Debian backport image, which contains fixes for this bug.
Instance names longer than 32 characters can cause problems with various UNIX tools.
Date Reported: June 2012
Although instance names can be up to 63 characters, names that are longer than 32 characters may cause some tools to be unreliable, including tools that may run during boot. As a workaround, choose instance names that are no longer than 32 characters.
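If you generate instance names in scripts, a length check before creating the instance can catch this early. The function below is a small sketch with a hypothetical name, is_safe_instance_name. It checks only the length and treats names of up to 32 characters as safe, following the statement above that names longer than 32 characters cause problems.

```shell
# Return success if the given name is at most 32 characters long.
# Usage: is_safe_instance_name NAME
is_safe_instance_name() {
  [ "${#1}" -le 32 ]
}
```

For example: is_safe_instance_name "$NAME" || echo "instance name is longer than 32 characters".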