I am using Kubernetes in Google Cloud (GKE).
I have an application that is hoarding memory, and I need to take a process dump as indicated here. Kubernetes is going to kill the pod when it reaches 512 MB of RAM.
So I connect to the pod:
# kubectl exec -it stuff-7d8c5598ff-2kchk -- /bin/bash
And run:
# apt-get update && apt-get install procps && apt-get install gdb
Find the process I want:
root@stuff-7d8c5598ff-2kchk:/app# ps aux
USER        PID %CPU %MEM     VSZ    RSS TTY STAT START   TIME COMMAND
root          1  4.6  2.8 5318004 440268 ?   SLsl Oct11 532:18 dotnet stuff.Web.dll
root     114576  0.0  0.0   18212   3192 ?   Ss   17:23   0:00 /bin/bash
root     114583  0.0  0.0   36640   2844 ?   R+   17:23   0:00 ps aux
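As a rough sanity check on how close the process is to the limit, the RSS of PID 1 from the ps output can be compared with the 512 MB limit. This is only a sketch: the helper name pct_of_limit is made up for illustration, and it assumes the limit means 512 MiB and that ps reports RSS in KiB.

```shell
#!/bin/sh
# pct_of_limit RSS_KIB LIMIT_MIB
# Hypothetical helper: prints RSS as a whole-number percentage of the
# limit, rounded down. Assumes RSS is in KiB and the limit is in MiB.
pct_of_limit() {
    rss_kib=$1
    limit_mib=$2
    echo $(( rss_kib * 100 / (limit_mib * 1024) ))
}

# RSS of PID 1 taken from the ps output, against an assumed 512 MiB limit.
echo "dotnet is at $(pct_of_limit 440268 512)% of the limit"
```

With the numbers above this reports 83%, so there is not much headroom left before the pod is killed.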
But when I try to dump…
root@stuff-7d8c5598ff-2kchk:/app# gcore 1
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.1
I tried several solutions like these, which always end with the same result:
root@stuff-7d8c5598ff-2kchk:/app# echo 0 > /proc/sys/kernel/yama/ptrace_scope
bash: /proc/sys/kernel/yama/ptrace_scope: Read-only file system
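To see what is actually blocking ptrace, the container's effective capabilities and the Yama setting can be read without changing anything. The following is a minimal sketch, not a definitive diagnosis: the helper name has_cap is made up here, and the only facts it relies on are that CAP_SYS_PTRACE is capability number 19 and that /proc/self/status exposes a CapEff bitmask in hex.

```shell
#!/bin/sh
# has_cap MASK_HEX BIT
# Hypothetical helper: succeeds when bit number BIT is set in the
# hexadecimal capability mask MASK_HEX (given without a 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>.
CAP_SYS_PTRACE=19

# Effective capabilities of the current shell, if /proc is available.
if [ -r /proc/self/status ]; then
    cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
    if [ -n "$cap_eff" ] && has_cap "$cap_eff" "$CAP_SYS_PTRACE"; then
        echo "CAP_SYS_PTRACE: present"
    else
        echo "CAP_SYS_PTRACE: missing"
    fi
fi

# Yama restriction level, if the kernel exposes it (0 is the least restrictive).
if [ -r /proc/sys/kernel/yama/ptrace_scope ]; then
    echo "ptrace_scope: $(cat /proc/sys/kernel/yama/ptrace_scope)"
fi
```

If the capability is reported as missing, gcore will most likely keep failing no matter what is tried from inside the container, because the setting has to come from the pod spec.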
I cannot find a way to connect to the pod and deal with this ptrace restriction. I found that docker has a --privileged switch, but I cannot find anything similar for kubectl.
UPDATE: I found how to enable ptrace, by adding the SYS_PTRACE capability in the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: <your-pod>
spec:
  shareProcessNamespace: true
  containers:
  - name: containerB
    image: <your-debugger-image>
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
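For anyone who wants to fill in the placeholders from a script, here is a small sketch that prints the same manifest with the names substituted. The helper render_ptrace_pod and the example pod, container, and image names are all hypothetical; the function only writes text and does not talk to a cluster.

```shell
#!/bin/sh
# render_ptrace_pod POD_NAME CONTAINER_NAME IMAGE
# Hypothetical helper: prints a Pod manifest with the SYS_PTRACE
# capability added, using the three arguments as the placeholders.
render_ptrace_pod() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  shareProcessNamespace: true
  containers:
  - name: $2
    image: $3
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
EOF
}

# Example values are made up for illustration only.
render_ptrace_pod stuff-debug debugger example.invalid/debugger:latest
```

The output can be piped to kubectl apply -f - once the names point at a real image.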
Get the process dump:
root@stuff-6cd8848797-klrwr:/app# gcore 1
[New LWP 9]
[New LWP 10]
[New LWP 13]
[New LWP 14]
[New LWP 15]
[New LWP 16]
[New LWP 17]
[New LWP 18]
[New LWP 19]
[New LWP 20]
[New LWP 22]
[New LWP 24]
[New LWP 25]
[New LWP 27]
[New LWP 74]
[New LWP 100]
[New LWP 753]
[New LWP 756]
[New LWP 765]
[New LWP 772]
[New LWP 814]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
warning: target file /proc/1/cmdline contained unexpected null characters
Saved corefile core.1
Funny thing: I cannot find lldb-3.6 (apt only matches python-lldb-3.6 and installs nothing), so I installed lldb-3.8 instead:
root@stuff-6cd8848797-klrwr:/app# apt-get update && apt-get install lldb-3.6
Hit:1 http://security.debian.org/debian-security stretch/updates InRelease
Ign:2 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Hit:3 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease
Hit:4 http://cdn-fastly.deb.debian.org/debian stretch Release
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'python-lldb-3.6' for regex 'lldb-3.6'
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Find SOS plugin:
root@stuff-6cd8848797-klrwr:/app# find /usr -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so
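The directory that holds libsosplugin.so also tells which runtime version the plugin belongs to, which matters when picking a debugger build. A tiny sketch for pulling that version out of the path; the helper name sos_runtime_version is made up, and it assumes the plugin sits directly inside a directory named after the runtime version, as in the find result.

```shell
#!/bin/sh
# sos_runtime_version PLUGIN_PATH
# Hypothetical helper: prints the name of the directory containing the
# plugin, which is the runtime version in the layout shown by find.
sos_runtime_version() {
    basename "$(dirname "$1")"
}

# Path taken from the find result.
sos_runtime_version /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so
```

For the path found here this prints 2.1.5.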
Run lldb…
root@stuff-6cd8848797-klrwr:/app# lldb `which dotnet` -c core.1
(lldb) target create "/usr/bin/dotnet" --core "core.1"
But it gets stuck forever; the prompt never returns to (lldb) again…
Answer
I had a similar issue. Try installing the correct version of LLDB. The SOS plugin from a given dotnet version is linked against a specific version of LLDB. For example, dotnet 2.0.5 is linked with LLDB 3.6, and dotnet 2.1.5 is linked with LLDB 3.9. This document might also be helpful: Debugging CoreCLR
Note that not every version of LLDB is available on every OS. For example, LLDB 3.6 is unavailable on Debian but available on Ubuntu.
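The version pairing can be written down as a small lookup to run before opening the core file. This is only a sketch: the helper name lldb_for_dotnet is made up, it knows just the two pairs mentioned in this answer, and it assumes the distribution installs versioned binaries named like lldb-3.9.

```shell
#!/bin/sh
# lldb_for_dotnet DOTNET_VERSION
# Hypothetical helper: prints the LLDB version paired with that dotnet
# version. Only the two pairs named in this answer are known; anything
# else prints "unknown".
lldb_for_dotnet() {
    case "$1" in
        2.0.5) echo 3.6 ;;
        2.1.5) echo 3.9 ;;
        *)     echo unknown ;;
    esac
}

want=$(lldb_for_dotnet 2.1.5)

# Assumption: the package provides a versioned binary such as lldb-3.9.
if command -v "lldb-$want" >/dev/null 2>&1; then
    echo "lldb-$want is installed"
else
    echo "lldb-$want is not installed"
fi
```

For the 2.1.5 runtime in the question this points at lldb-3.9 rather than the lldb-3.8 that was installed.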