
Re: [Xen-devel] [PATCH v2 0/2] Add xen-crashd.



On 11/29/13 05:26, Ian Campbell wrote:
On Fri, 2013-11-15 at 14:20 -0500, Don Slutz wrote:

  Ian Campbell:
    Add 1st pass on some documentation on crash's remote protocol.
My concern with this was that we were using some sort of internal crash
protocol which has no ABI stability guarantees etc. Documenting it in
the Xen tree doesn't really do anything to alleviate that concern. It
should be a protocol which is published by the crash folks not us.
I have no issues with this. The only documentation I can find is:

 http://people.redhat.com/anderson/crash_whitepaper/


Ideally they would agree to some sort of protocol stability level, or
maybe you can show that the protocol had inbuilt backward and forward
compatibility capabilities already?
It may not have the best backwards and forwards compatibility that could be designed. However, so far I have been able to add features to a newer crash that cause no issues with older "crashd" servers, and older crash code works fine with the newer "crashd" servers. This is not the first one of these I have coded, just the first that I can release.
Even more concerning is [0] where one of the crash maintainers says:
It's been deprecated for almost 10 years now.  I don't understand how
you have been able to even get it to build, never mind work as the mail
thread indicates?
We surely don't want to be adding code which relies on a protocol which
has been deprecated for 10 years!
The main reason that I know of is that crash in active mode (i.e. running live on machine A) is just so much simpler to use than a remote crash on machine B talking to a crashd on machine A. This is because the crashd on machine A is in "live" mode, which means that slow or unresponsive systems cannot be examined using the remote protocol. And keeping the right kernel versions that you need on machine B is just overhead.

With all this in mind, I was not surprised that it had been deprecated for 10 years. However, with Xen in the mix, machine A no longer needs to be active to run "crashd"; in fact it can be paused, running, crashed, shut down, etc.


Daniel K asked about gdbsx -- can that not speak to crash somehow?
It is clearly possible to write a bridge from remote crash to a remote gdb server, but needing to run 2 servers to connect up crash is, to me, too complex. I could also embed the xen-crashd code in gdbsx by adding command line options. However, very little code would be shared. Since I based xen-crashd on xenctx, it currently uses libxc calls, whereas gdbsx uses ioctl() directly to do the hypercalls. gdbsx does not appear to support physical addresses, nor does it appear to support virtual address to physical address conversion. Quoting from the crash whitepaper:
Furthermore, to examine the contents of a live system's kernel internals from user space, the only readily available option has been to use gdb on /proc/kcore. While gdb is an incredibly powerful tool, it is designed to debug user programs, and is not at all "kernel-aware". Consequently, using gdb alone has limited usefulness when looking at kernel memory, essentially constrained to the printing of kernel data structures if the vmlinux file was built with the -g C flag, the disassembly of kernel text, and raw data dumps.
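
As a rough illustration of the virtual-to-physical conversion mentioned above, here is a minimal Python sketch. It is not code from xen-crashd, crash, or gdbsx. The layout constants are my assumptions for x86_64 kernels of the 2.6.18 era, and the function name virt_to_phys is made up for this example. The sample addresses and the phys_base value are the ones that appear in the crash session in this mail.

```python
# Assumed x86_64 layout for 2.6.18-era kernels (assumptions, not from this thread).
START_KERNEL_MAP = 0xffffffff80000000  # base of the kernel text/data mapping
MODULES_VADDR = 0xffffffff88000000     # modules start here; not linearly mapped
PAGE_OFFSET = 0xffff810000000000       # base of the direct (linear) mapping
VMALLOC_START = 0xffffc20000000000     # vmalloc area; needs a page-table walk


def virt_to_phys(vaddr: int, phys_base: int) -> int:
    """Translate a kernel virtual address that lies in a fixed linear mapping.

    Hypothetical helper for illustration only. Addresses outside the two
    linear regions (vmalloc, modules, user space) raise ValueError because
    they can only be resolved by walking the guest page tables.
    """
    if START_KERNEL_MAP <= vaddr < MODULES_VADDR:
        # Kernel text/data: offset into the mapping, shifted by phys_base.
        return vaddr - START_KERNEL_MAP + phys_base
    if PAGE_OFFSET <= vaddr < VMALLOC_START:
        # Direct mapping: physical memory appears linearly at PAGE_OFFSET.
        return vaddr - PAGE_OFFSET
    raise ValueError(f"{vaddr:#x} is not in a linear mapping; page-table walk needed")


if __name__ == "__main__":
    # "swapper" task address and eth1 net_device address from the session.
    print(hex(virt_to_phys(0xffffffff802eeae0, 0x200000)))  # 0x4eeae0
    print(hex(virt_to_phys(0xffff8100babd9000, 0x200000)))  # 0xbabd9000
```

The point of the sketch is only that a server which can read guest physical memory, plus a client that knows phys_base and the kernel layout, is enough for the common cases. Anything outside the linear mappings still needs the page tables.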

Or run on /proc/vmcore directly, or be extended to do so?
There is no /proc/vmcore in this case. Extending dom0 Linux to provide /proc/1/vmcore, /proc/2/vmcore, etc. (i.e. /proc/<domid>/vmcore) would be a big change, and designing a security model for these would also not be quick.

Maybe this will help:

[root@dcs-xen-54 tmp]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     8     r-----    3928.9
P-1-0                                        1  3080     1     -b----      18.0


[root@dcs-xen-54 tmp]# /usr/lib/xen/bin/xen-crashd 1&
[1] 1447
[root@dcs-xen-54 tmp]#  2 Dec 13 11:38:01.042 socket ready on port 5001 after 1 bind call

[root@dcs-xen-54 tmp]# crash --machdep phys_base=0x200000 localhost:5001 /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux

crash 6.1.4
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

 2 Dec 13 11:38:08.917 Accepted a connection.
WARNING: daemon cannot access /proc/version

NOTE: setting phys_base to: 0x200000

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
    DUMPFILE: /dev/mem@localhost (remote live system)
        CPUS: 1
        DATE: Mon Dec 2 11:37:02 2013
      UPTIME: 00:33:11
LOAD AVERAGE: 0.01, 0.00, 0.00
       TASKS: 81
    NODENAME: P-1-0.TC5.CloudSwitch.com
     RELEASE: 2.6.18-128.el5
     VERSION: #1 SMP Wed Jan 21 10:41:14 EST 2009
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 3 GB
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff802eeae0  [THREAD_INFO: ffffffff803dc000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)

crash> net
   NET_DEVICE     NAME   IP ADDRESS(ES)
ffffffff80321e80  lo     127.0.0.1
ffff8100babd9000  eth1   172.16.64.65
ffff8100b6c96000  sit0
crash> q

[1]+  Done                    /usr/lib/xen/bin/xen-crashd 1

This is almost the same as:

[root@dcs-xen-54 tmp]# xl dump-core 1 p-1-0.vmore
[root@dcs-xen-54 tmp]# crash p-1-0.vmore /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux

crash 6.1.4
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
    DUMPFILE: p-1-0.vmore
        CPUS: 1
        DATE: Mon Dec 2 11:05:09 2013
      UPTIME: 00:01:18
LOAD AVERAGE: 2.00, 0.70, 0.24
       TASKS: 81
    NODENAME: P-1-0.TC5.CloudSwitch.com
     RELEASE: 2.6.18-128.el5
     VERSION: #1 SMP Wed Jan 21 10:41:14 EST 2009
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 3 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff802eeae0  [THREAD_INFO: ffffffff803dc000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)
     WARNING: panic task not found

crash> net
   NET_DEVICE     NAME   IP ADDRESS(ES)
ffffffff80321e80  lo     127.0.0.1
ffff8100babd9000  eth1   172.16.64.65
ffff8100b6c96000  sit0
crash> quit

With the changes in crash 7.0.4 (yet to be released), crash can be invoked in a remote "not live" mode, which is how it runs on a vmcore file.

So if a DomU is paused, "xl dump-core;crash" and "xen-crashd;crash" will give exactly the same answers, and the xen-crashd route takes a lot less wall-clock time.


   -Don Slutz

Ian.

[0]
http://thread.gmane.org/gmane.linux.kernel.crash-dump.crash-utility/4714/focus=4736


