Task 13922890

Name	hadcm3n_u6uq_1980_40_007682111_1
Workunit	7837198
Created	15 Jan 2012, 21:25:36 UTC
Sent	15 Jan 2012, 21:25:41 UTC
Report deadline	16 Apr 2012, 4:52:52 UTC
Received	15 Feb 2012, 6:45:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1376550
Run time	29 days 17 hours 9 min 32 sec
CPU time	26 days 18 hours 34 min 38 sec
Validate state	Invalid
Credit	7,776.00
Device peak FLOPS	1.13 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
09:18:04 (3384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:43:18 (7856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:43:24 (7856): No heartbeat from core client for 30 sec - exiting
10:43:25 (7856): No heartbeat from core client for 30 sec - exiting
10:43:26 (7856): No heartbeat from core client for 30 sec - exiting
10:43:27 (7856): No heartbeat from core client for 30 sec - exiting
10:43:28 (7856): No heartbeat from core client for 30 sec - exiting
10:43:29 (7856): No heartbeat from core client for 30 sec - exiting
10:43:30 (7856): No heartbeat from core client for 30 sec - exiting
10:43:31 (7856): No heartbeat from core client for 30 sec - exiting
10:43:32 (7856): No heartbeat from core client for 30 sec - exiting
10:43:33 (7856): No heartbeat from core client for 30 sec - exiting
10:43:35 (7856): No heartbeat from core client for 30 sec - exiting
10:43:36 (7856): No heartbeat from core client for 30 sec - exiting
10:43:39 (7856): No heartbeat from core client for 30 sec - exiting
10:43:40 (7856): No heartbeat from core client for 30 sec - exiting
10:43:41 (7856): No heartbeat from core client for 30 sec - exiting
10:43:42 (7856): No heartbeat from core client for 30 sec - exiting
10:43:43 (7856): No heartbeat from core client for 30 sec - exiting
10:43:44 (7856): No heartbeat from core client for 30 sec - exiting
10:43:45 (7856): No heartbeat from core client for 30 sec - exiting
10:43:46 (7856): No heartbeat from core client for 30 sec - exiting
10:43:47 (7856): No heartbeat from core client for 30 sec - exiting
10:43:49 (7856): No heartbeat from core client for 30 sec - exiting
10:43:50 (7856): No heartbeat from core client for 30 sec - exiting
10:43:51 (7856): No heartbeat from core client for 30 sec - exiting
10:43:52 (7856): No heartbeat from core client for 30 sec - exiting
10:43:53 (7856): No heartbeat from core client for 30 sec - exiting
10:43:54 (7856): No heartbeat from core client for 30 sec - exiting
10:43:55 (7856): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3460, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6964, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2448, iMonCtr=1
Model crash detected, will try to restart...
01:44:16 (6672): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:44:17 (6672): No heartbeat from core client for 30 sec - exiting
01:44:19 (6672): No heartbeat from core client for 30 sec - exiting
01:44:20 (6672): No heartbeat from core client for 30 sec - exiting
01:44:21 (6672): No heartbeat from core client for 30 sec - exiting
01:44:22 (6672): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4352, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
02:11:06 (7052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2148, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
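
The stderr output for this task mixes several kinds of message: quit and suspend requests, lost heartbeats, and model crash restarts. As a rough aid for reading such a log, the sketch below tallies each kind by substring match. The function name `tally_events`, the category labels, and the short `sample` excerpt are illustrative assumptions, not part of BOINC or the CPDN client; only the matched message strings are taken from the log.

```python
import re
from collections import Counter

# Message fragments copied from the stderr text; the category names are hypothetical labels.
PATTERNS = {
    "quit_request": r"Quit request from BOINC",
    "suspend_request": r"Suspend request from BOINC",
    "heartbeat_lost": r"No heartbeat from core client",
    "model_crash": r"Model crash detected",
}

def tally_events(stderr_text):
    """Count occurrences of each message kind in a stderr dump.

    Works on both line-broken and flattened (single-line) text,
    because it counts substring matches rather than lines.
    """
    return Counter(
        {name: len(re.findall(pattern, stderr_text)) for name, pattern in PATTERNS.items()}
    )

# A short excerpt from the log, used only to demonstrate the function.
sample = (
    "CPDN Monitor - Quit request from BOINC... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "09:18:04 (3384): No heartbeat from core client for 30 sec - exiting "
    "Model crash detected, will try to restart... "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-("
)

if __name__ == "__main__":
    for kind, count in tally_events(sample).items():
        print(kind, count)
```

Pasting the full contents of the `<stderr_txt>` element into `tally_events` gives a per-category count for the whole task.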

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
14 Feb 2012 17:21:56	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	648,000	2,274,554	3.5101
13 Feb 2012 15:11:40	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	622,080	2,184,300	3.5113
12 Feb 2012 15:51:10	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	596,160	2,104,776	3.5306
11 Feb 2012 19:18:20	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	570,240	2,033,359	3.5658
10 Feb 2012 20:07:33	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	544,320	1,953,379	3.5887
09 Feb 2012 20:08:36	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	518,400	1,871,605	3.6103
08 Feb 2012 11:51:31	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	492,480	1,769,710	3.5935
07 Feb 2012 04:57:18	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	466,560	1,676,950	3.5943
05 Feb 2012 22:18:21	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	440,640	1,582,100	3.5905
04 Feb 2012 19:03:53	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	414,720	1,489,687	3.5920
03 Feb 2012 15:53:07	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	388,800	1,398,453	3.5968
02 Feb 2012 12:47:19	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	362,880	1,305,694	3.5981
01 Feb 2012 09:35:45	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	336,960	1,213,967	3.6027
31 Jan 2012 07:30:56	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	311,040	1,120,698	3.6031
29 Jan 2012 20:24:50	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	285,120	1,024,580	3.5935
28 Jan 2012 17:07:50	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	259,200	932,829	3.5989
27 Jan 2012 09:29:31	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	233,280	831,342	3.5637
26 Jan 2012 19:38:40	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	207,360	738,908	3.5634
25 Jan 2012 19:22:24	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	181,440	647,053	3.5662
23 Jan 2012 19:18:22	1119324	13922890	hadcm3n_u6uq_1980_40_007682111_1	155,520	554,043	3.5625
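
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `seconds_per_timestep` is a hypothetical helper, and the formula is inferred from the figures rather than taken from project documentation.

```python
def seconds_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds divided by timesteps completed, to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)

# (timestep, CPU time in seconds, reported average) copied from the trickle table.
ROWS = [
    (648_000, 2_274_554, 3.5101),
    (622_080, 2_184_300, 3.5113),
    (155_520, 554_043, 3.5625),
]

if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in ROWS:
        computed = seconds_per_timestep(cpu_time_sec, timestep)
        print(f"{timestep:>7} timesteps: computed {computed:.4f}, reported {reported:.4f}")
```

Because the average is cumulative, it changes slowly from one trickle to the next even if the per-trickle rate varies more.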