Task 14939168

Name	hadcm3n_zmyj_1880_40_008026549_2
Workunit	8181663
Created	18 Jul 2012, 19:36:21 UTC
Sent	18 Jul 2012, 19:36:39 UTC
Report deadline	18 Oct 2012, 3:03:50 UTC
Received	6 Oct 2012, 2:13:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1107747
Run time	25 days 21 hours 27 min 38 sec
CPU time	25 days 2 hours 46 min 19 sec
Validate state	Invalid
Credit	11,508.48
Device peak FLOPS	2.80 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 03:47:30 (2656): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 11:46:26 (5892): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:45:22 (5024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 03:44:19 (4352): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:43:14 (4320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Restart file copy failed on zmyjka.da903n0 Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold 04:59:36 (4868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 05:05:28 (3188): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 05:32:44 (2908): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 05:43:49 (1784): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 21:28:32 (1360): No heartbeat from core client for 30 sec - exiting 21:28:33 (1360): No heartbeat from core client for 30 sec - exiting 21:28:34 (1360): No heartbeat from core client for 30 sec - exiting 21:28:35 (1360): No heartbeat from core client for 30 sec - exiting 21:28:37 (1360): No heartbeat from core client for 30 sec - exiting 21:28:38 (1360): No heartbeat from core client for 30 sec - exiting 21:28:39 (1360): No heartbeat from core client for 30 sec - exiting 21:28:40 (1360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 02:44:36 (324): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:43:28 (200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:42:19 (2744): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:41:11 (5100): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:40:02 (2988): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:38:57 (5168): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:37:49 (3544): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:36:44 (5012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:35:37 (3564): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 05:34:30 (2320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:33:23 (2328): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:32:16 (4464): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:31:07 (3020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:30:00 (1192): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:28:52 (588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:27:49 (3164): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 02:26:41 (5876): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 05:25:36 (1268): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
02 Oct 2012 20:12:39	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	959,040	2,145,986	2.2376
18 Sep 2012 06:55:05	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	933,120	2,086,585	2.2361
17 Sep 2012 14:36:33	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	907,200	2,029,334	2.2369
16 Sep 2012 22:02:26	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	881,280	1,971,370	2.2369
16 Sep 2012 05:28:44	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	855,360	1,912,986	2.2365
15 Sep 2012 12:53:26	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	829,440	1,854,633	2.2360
14 Sep 2012 20:28:45	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	803,520	1,796,734	2.2361
14 Sep 2012 04:01:20	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	777,600	1,738,775	2.2361
27 Aug 2012 23:48:46	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	751,680	1,680,609	2.2358
23 Aug 2012 10:36:47	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	725,760	1,621,851	2.2347
22 Aug 2012 18:08:29	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	699,840	1,563,318	2.2338
22 Aug 2012 01:09:32	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	673,920	1,504,484	2.2324
21 Aug 2012 08:39:52	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	648,000	1,446,710	2.2326
20 Aug 2012 16:15:26	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	622,080	1,388,716	2.2324
17 Aug 2012 07:24:36	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	596,160	1,331,706	2.2338
10 Aug 2012 22:24:46	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	570,240	1,272,238	2.2311
10 Aug 2012 05:21:13	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	544,320	1,212,573	2.2277
09 Aug 2012 13:02:08	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	518,400	1,155,268	2.2285
08 Aug 2012 20:49:24	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	492,480	1,097,838	2.2292
08 Aug 2012 04:20:55	1107747	14939168	hadcm3n_zmyj_1880_40_008026549_2	466,560	1,039,703	2.2284