Task 14839124

Name	hadam3p_pnw_bo4l_1989_1_008010068_1
Workunit	8165182
Created	24 Jun 2012, 13:52:29 UTC
Sent	24 Jun 2012, 13:52:39 UTC
Report deadline	6 Jun 2013, 19:12:39 UTC
Received	20 Jul 2012, 18:34:08 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID	1211096
Run time	5 days 6 hours 32 min 25 sec
CPU time	4 days 21 hours 51 min 20 sec
Validate state	Invalid
Credit	2,755.56
Device peak FLOPS	2.14 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
08:35:14 (2840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4172, selfPID=3984, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2400, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=352, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5840, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4436, selfPID=4460, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5844, selfPID=5844, iMonCtr=2
16:00:18 (5700): No heartbeat from core client for 30 sec - exiting
16:00:19 (5700): No heartbeat from core client for 30 sec - exiting
16:00:20 (5700): No heartbeat from core client for 30 sec - exiting
16:00:21 (5700): No heartbeat from core client for 30 sec - exiting
16:00:22 (5700): No heartbeat from core client for 30 sec - exiting
16:00:23 (5700): No heartbeat from core client for 30 sec - exiting
16:00:24 (5700): No heartbeat from core client for 30 sec - exiting
16:00:25 (5700): No heartbeat from core client for 30 sec - exiting
16:00:26 (5700): No heartbeat from core client for 30 sec - exiting
16:00:27 (5700): No heartbeat from core client for 30 sec - exiting
16:00:28 (5700): No heartbeat from core client for 30 sec - exiting
16:00:29 (5700): No heartbeat from core client for 30 sec - exiting
16:00:30 (5700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
10:19:55 (3632): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=748, selfPID=748, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:14:57 (4136): No heartbeat from core client for 30 sec - exiting
08:14:58 (4136): No heartbeat from core client for 30 sec - exiting
08:14:59 (4136): No heartbeat from core client for 30 sec - exiting
08:15:00 (4136): No heartbeat from core client for 30 sec - exiting
08:15:02 (4136): No heartbeat from core client for 30 sec - exiting
08:15:03 (4136): No heartbeat from core client for 30 sec - exiting
08:15:04 (4136): No heartbeat from core client for 30 sec - exiting
08:15:05 (4136): No heartbeat from core client for 30 sec - exiting
08:15:07 (4136): No heartbeat from core client for 30 sec - exiting
08:15:08 (4136): No heartbeat from core client for 30 sec - exiting
08:15:09 (4136): No heartbeat from core client for 30 sec - exiting
08:15:10 (4136): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7320, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5536, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 9
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4544, selfPID=3464, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 10
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7000, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6920, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7360, selfPID=4996, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7176, selfPID=7540, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6984, selfPID=1176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2920, selfPID=5624, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
</stderr_txt>
]]>
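
Note on the exit status: the task fields list it both as -226 and as 0xFFFFFF1E. These are the same 32-bit value read as signed and as unsigned (two's complement). A minimal sketch of the conversion follows; the helper names `to_unsigned32` and `to_signed32` are illustrative only and are not part of BOINC.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


if __name__ == "__main__":
    print(f"0x{to_unsigned32(-226):08X}")  # hex form of the exit status
    print(to_signed32(0xFFFFFF1E))         # signed form of the exit status
```

This only shows that the two printed forms agree; it says nothing about what triggers ERR_TOO_MANY_EXITS.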

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Jul 2012 19:21:47	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	126,816	407,840	3.2160
15 Jul 2012 00:55:51	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	115,296	371,272	3.2202
05 Jul 2012 18:13:07	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	103,776	333,834	3.2169
04 Jul 2012 19:03:59	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	92,256	296,517	3.2141
03 Jul 2012 14:24:05	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	80,736	259,870	3.2188
02 Jul 2012 16:34:01	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	69,216	223,713	3.2321
02 Jul 2012 14:18:14	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	57,700	187,992	3.2581
30 Jun 2012 12:45:18	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	57,696	187,581	3.2512
28 Jun 2012 22:44:11	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	46,176	149,307	3.2334
27 Jun 2012 22:28:45	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	34,656	111,102	3.2059
26 Jun 2012 19:34:24	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	23,136	74,011	3.1990
25 Jun 2012 21:22:15	1211096	14839124	hadam3p_pnw_bo4l_1989_1_008010068_1	11,616	37,047	3.1893
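
The Average (sec/TS) column appears to be CPU Time divided by Timestep, rounded to four decimal places. The sketch below checks that assumption against three rows copied from the table above; the function name `seconds_per_timestep` is hypothetical.

```python
def seconds_per_timestep(cpu_time_sec: float, timestep: int) -> float:
    """Average CPU seconds spent per model timestep."""
    return cpu_time_sec / timestep


# (timestep, CPU time in seconds, listed average) taken from the trickle table.
ROWS = [
    (126_816, 407_840, 3.2160),
    (57_700, 187_992, 3.2581),
    (11_616, 37_047, 3.1893),
]

if __name__ == "__main__":
    for timestep, cpu_time, listed in ROWS:
        computed = seconds_per_timestep(cpu_time, timestep)
        print(f"{timestep:>8,}  computed={computed:.4f}  listed={listed:.4f}")
```

If the computed and listed values match to four decimals, the column is consistent with a simple running average over the whole task rather than a per-trickle rate.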