Task 12743440

Name	hadcm3n_o4gt_1900_40_007201120_1
Workunit	7399400
Created	28 Mar 2011, 14:10:24 UTC
Sent	30 Mar 2011, 13:14:59 UTC
Report deadline	29 Jun 2011, 20:42:10 UTC
Received	13 May 2011, 19:29:33 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	852771
Run time	24 days 8 hours 49 min 35 sec
CPU time	13 days 8 hours 29 min 23 sec
Validate state	Invalid
Credit	7,464.96
Device peak FLOPS	2.02 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4940, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
22:30:49 (3752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
21:13:05 (6008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN procesController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3864, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
15:11:05 (1052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:11:06 (1052): No heartbeat from core client for 30 sec - exiting
15:11:07 (1052): No heartbeat from core client for 30 sec - exiting
15:11:08 (1052): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3656, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2956, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=552, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2184, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3172, iMonCtr=1
Model crash detected, will try to restart...
18:37:05 (3488): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:37:12 (3488): No heartbeat from core client for 30 sec - exiting
18:37:14 (3488): No heartbeat from core client for 30 sec - exiting
18:37:16 (3488): No heartbeat from core client for 30 sec - exiting
18:37:17 (3488): No heartbeat from core client for 30 sec - exiting
18:37:18 (3488): No heartbeat from core client for 30 sec - exiting
18:37:19 (3488): No heartbeat from core client for 30 sec - exiting
18:37:20 (3488): No heartbeat from core client for 30 sec - exiting
18:37:21 (3488): No heartbeat from core client for 30 sec - exiting
18:37:22 (3488): No heartbeat from core client for 30 sec - exiting
18:37:23 (3488): No heartbeat from core client for 30 sec - exiting
18:37:24 (3488): No heartbeat from core client for 30 sec - exiting
18:37:25 (3488): No heartbeat from core client for 30 sec - exiting
18:37:26 (3488): No heartbeat from core client for 30 sec - exiting
18:37:27 (3488): No heartbeat from core client for 30 sec - exiting
18:37:28 (3488): No heartbeat from core client for 30 sec - exiting
18:37:29 (3488): No heartbeat from core client for 30 sec - exiting
18:37:30 (3488): No heartbeat from core client for 30 sec - exiting
18:37:31 (3488): No heartbeat from core client for 30 sec - exiting
18:37:33 (3488): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3668, iMonCtr=1
Model crash detected, will try to restart...
12:41:07 (3052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:41:11 (3052): No heartbeat from core client for 30 sec - exiting
12:41:12 (3052): No heartbeat from core client for 30 sec - exiting
12:41:13 (3052): No heartbeat from core client for 30 sec - exiting
12:41:14 (3052): No heartbeat from core client for 30 sec - exiting
12:41:15 (3052): No heartbeat from core client for 30 sec - exiting
12:41:16 (3052): No heartbeat from core client for 30 sec - exiting
12:41:17 (3052): No heartbeat from core client for 30 sec - exiting
12:41:18 (3052): No heartbeat from core client for 30 sec - exiting
12:41:19 (3052): No heartbeat from core client for 30 sec - exiting
12:41:20 (3052): No heartbeat from core client for 30 sec - exiting
12:41:21 (3052): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3816, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1584, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=892, iMonCtr=1
Model crash detected, will try to restart...
CSuspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
13 May 2011 01:38:18	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	622,080	1,151,157	1.8505
11 May 2011 01:34:38	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	596,160	1,102,726	1.8497
09 May 2011 19:36:47	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	570,240	1,053,588	1.8476
07 May 2011 19:13:24	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	544,320	1,005,232	1.8468
05 May 2011 13:08:22	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	518,400	956,775	1.8456
04 May 2011 00:32:08	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	492,480	905,608	1.8389
02 May 2011 13:43:26	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	466,560	856,545	1.8359
30 Apr 2011 17:20:45	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	440,640	808,153	1.8340
29 Apr 2011 10:03:40	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	414,720	761,526	1.8362
27 Apr 2011 14:12:34	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	388,800	712,533	1.8326
25 Apr 2011 15:59:14	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	362,880	665,299	1.8334
23 Apr 2011 15:12:50	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	336,960	618,207	1.8347
21 Apr 2011 13:01:11	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	311,040	569,871	1.8321
20 Apr 2011 15:37:25	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	285,120	523,561	1.8363
20 Apr 2011 15:37:25	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	259,200	475,585	1.8348
20 Apr 2011 15:37:25	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	233,280	428,210	1.8356
20 Apr 2011 15:37:25	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	207,360	381,178	1.8382
20 Apr 2011 15:37:25	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	181,440	334,336	1.8427
11 Apr 2011 19:27:42	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	155,520	287,112	1.8461
10 Apr 2011 00:32:05	852771	12743440	hadcm3n_o4gt_1900_40_007201120_1	129,600	237,710	1.8342
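
The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the timestep count. A minimal sketch that checks this against three rows copied from the table; the helper names `to_int` and `average_sec_per_timestep` are illustrative assumptions, not part of BOINC or CPDN:

```python
# Rows copied from the trickle table: (time sent, timestep, CPU time in sec, reported average).
rows = [
    ("13 May 2011 01:38:18", "622,080", "1,151,157", "1.8505"),
    ("20 Apr 2011 15:37:25", "285,120", "523,561", "1.8363"),
    ("10 Apr 2011 00:32:05", "129,600", "237,710", "1.8342"),
]


def to_int(text):
    """Parse an integer written with thousands separators, e.g. '622,080'."""
    return int(text.replace(",", ""))


def average_sec_per_timestep(timestep, cpu_seconds):
    """Assumed definition: cumulative CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


for sent, timestep, cpu_seconds, reported in rows:
    computed = average_sec_per_timestep(to_int(timestep), to_int(cpu_seconds))
    # Print the recomputed value next to the value reported in the table.
    print(f"{sent}  computed={computed:.4f}  reported={reported}")
```

For these three rows the recomputed value matches the reported one to four decimal places, which supports the assumed definition but does not confirm how the server calculates it.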