Task 15777990

Name	hadcm3n_3mup_1980_40_008333622_4
Workunit	8484483
Created	11 May 2013, 17:15:17 UTC
Sent	11 May 2013, 17:15:30 UTC
Report deadline	11 Aug 2013, 0:42:41 UTC
Received	23 Jun 2013, 21:06:46 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1166129
Run time	16 days 1 hours 0 min 52 sec
CPU time	12 days 10 hours 52 min 44 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.20 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
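The Run time and CPU time fields above use an "N days N hours N min N sec" format. As a minimal Python sketch, assuming that format holds for both fields, the two values can be converted to seconds and compared, which shows how much of the wall-clock run time the model actually spent on the CPU. `parse_duration` is a hypothetical helper written for this example, not part of BOINC.

```python
import re

# Seconds per unit word as it appears on the task page ("days", "hours", "min", "sec").
UNIT_SECONDS = {"day": 86_400, "hour": 3_600, "min": 60, "sec": 1}

def parse_duration(text: str) -> int:
    """Convert a string such as '16 days 1 hours 0 min 52 sec' to whole seconds."""
    total = 0
    # The optional trailing 's' accepts both singular and plural unit words.
    for value, unit in re.findall(r"(\d+)\s*(day|hour|min|sec)s?", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

run_s = parse_duration("16 days 1 hours 0 min 52 sec")
cpu_s = parse_duration("12 days 10 hours 52 min 44 sec")
print(run_s, cpu_s, round(cpu_s / run_s, 3))  # prints: 1386052 1075964 0.776

# Exit status 193 is the same value as the hexadecimal 0xC1 shown on the page.
print(193 == 0xC1, hex(193))  # prints: True 0xc1
```

For this task the ratio is about 0.776, so the model had the CPU for roughly 78% of the wall-clock run time.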
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1696, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1696, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
16:43:55 (4896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:43:56 (4896): No heartbeat from core client for 30 sec - exiting
16:43:57 (4896): No heartbeat from core client for 30 sec - exiting
16:43:58 (4896): No heartbeat from core client for 30 sec - exiting
16:43:59 (4896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4196, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5524, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6016, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4884, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4304, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CCController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4580, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4580, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4580, iMonCtr=1
Model crash detected, will try to restart...
10:10:28 (5488): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:10:29 (5488): No heartbeat from core client for 30 sec - exiting
10:10:30 (5488): No heartbeat from core client for 30 sec - exiting
10:10:31 (5488): No heartbeat from core client for 30 sec - exiting
10:10:32 (5488): No heartbeat from core client for 30 sec - exiting
10:10:33 (5488): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=1
Model crash detected, will try to restart...
23:31:35 (5272): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:31:36 (5272): No heartbeat from core client for 30 sec - exiting
23:31:37 (5272): No heartbeat from core client for 30 sec - exiting
23:31:38 (5272): No heartbeat from core client for 30 sec - exiting
23:31:39 (5272): No heartbeat from core client for 30 sec - exiting
23:31:40 (5272): No heartbeat from core client for 30 sec - exiting
23:31:41 (5272): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5120, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5396, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6028, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6016, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4712, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5500, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5388, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5388, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5828, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5864, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5148, iMonCtr=1
Model crash detected, will try to restart...
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7741EB9D read attempt to address 0x40B60602

Engaging BOINC Windows Runtime Debugger...
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_3mup_1980_40_008333622/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
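The stderr log is dominated by a handful of repeating messages: controller restarts after a detected model crash, missing heartbeats from the core client, and quit/suspend requests. A minimal Python sketch, assuming only the message texts shown in the log, tallies them and lists the distinct controller process IDs. The dictionary labels and the function names `summarize_stderr` and `controller_pids` are hypothetical, chosen for this example; they are not part of BOINC or the CPDN monitor.

```python
import re
from collections import Counter

# Message texts copied from the stderr log; the keys are hypothetical labels.
PATTERNS = {
    "model_crash_restart": r"Model crash detected, will try to restart",
    "no_heartbeat": r"No heartbeat from core client for 30 sec",
    "quit_request": r"Quit request from BOINC",
    "suspend_request": r"Suspend request from BOINC",
    "access_violation": r"Access Violation \(0xc0000005\)",
}

def summarize_stderr(text: str) -> Counter:
    """Count occurrences of each known message in a stderr dump."""
    return Counter({label: len(re.findall(pattern, text))
                    for label, pattern in PATTERNS.items()})

def controller_pids(text: str) -> list[int]:
    """Distinct selfPID values from Controller:: lines, in first-seen order."""
    return list(dict.fromkeys(int(pid) for pid in re.findall(r"selfPID=(\d+)", text)))

# A short excerpt from the start of the log, used as sample input.
sample = """\
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1696, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1696, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
16:43:55 (4896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
"""

counts = summarize_stderr(sample)
print(counts["model_crash_restart"], counts["no_heartbeat"], counts["quit_request"])  # prints: 3 1 1
print(controller_pids(sample))  # prints: [1696, 4388]
```

Passing the full text between `<stderr_txt>` and `</stderr_txt>` instead of the excerpt gives the totals for the whole run.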

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
22 Jun 2013 21:06:19	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	518,400	1,075,962	2.0755
20 Jun 2013 13:04:51	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	492,480	1,022,810	2.0769
18 Jun 2013 14:33:27	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	466,560	969,095	2.0771
16 Jun 2013 18:29:29	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	440,640	914,995	2.0765
15 Jun 2013 02:27:08	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	414,720	861,752	2.0779
11 Jun 2013 22:00:08	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	388,800	807,282	2.0763
09 Jun 2013 21:47:04	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	362,880	751,125	2.0699
08 Jun 2013 17:02:33	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	336,960	695,326	2.0635
04 Jun 2013 22:40:53	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	311,040	640,397	2.0589
02 Jun 2013 15:41:12	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	285,120	585,466	2.0534
31 May 2013 23:56:29	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	259,200	529,093	2.0413
30 May 2013 17:12:48	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	233,280	474,839	2.0355
28 May 2013 00:50:45	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	207,360	421,019	2.0304
26 May 2013 01:17:59	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	181,440	380,898	2.0993
24 May 2013 02:30:47	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	155,520	327,308	2.1046
21 May 2013 19:05:39	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	129,600	272,865	2.1054
19 May 2013 17:10:35	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	103,680	217,053	2.0935
18 May 2013 00:05:10	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	77,760	162,287	2.0870
14 May 2013 19:02:53	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	51,840	106,252	2.0496
12 May 2013 23:39:07	1166129	15777990	hadcm3n_3mup_1980_40_008333622_4	25,920	54,148	2.0890
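The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. A minimal Python sketch, assuming that definition, checks every row of the table against it and also computes the CPU seconds per timestep between consecutive trickles, which the cumulative average tends to smooth over. The function names are hypothetical, written for this example only.

```python
# Rows of the "Latest Trickles Received" table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (25_920, 54_148, 2.0890),
    (51_840, 106_252, 2.0496),
    (77_760, 162_287, 2.0870),
    (103_680, 217_053, 2.0935),
    (129_600, 272_865, 2.1054),
    (155_520, 327_308, 2.1046),
    (181_440, 380_898, 2.0993),
    (207_360, 421_019, 2.0304),
    (233_280, 474_839, 2.0355),
    (259_200, 529_093, 2.0413),
    (285_120, 585_466, 2.0534),
    (311_040, 640_397, 2.0589),
    (336_960, 695_326, 2.0635),
    (362_880, 751_125, 2.0699),
    (388_800, 807_282, 2.0763),
    (414_720, 861_752, 2.0779),
    (440_640, 914_995, 2.0765),
    (466_560, 969_095, 2.0771),
    (492_480, 1_022_810, 2.0769),
    (518_400, 1_075_962, 2.0755),
]

def average_sec_per_ts(timestep: int, cpu_seconds: int) -> float:
    """Assumed definition of the Average column: cumulative CPU time / timesteps."""
    return cpu_seconds / timestep

def interval_sec_per_ts(rows):
    """CPU seconds per timestep between each pair of consecutive trickles."""
    return [(c1 - c0) / (t1 - t0)
            for (t0, c0, _), (t1, c1, _) in zip(rows, rows[1:])]

# Rows whose reported Average differs from CPU time / timestep at 4 decimals.
mismatches = [row for row in TRICKLES
              if round(average_sec_per_ts(row[0], row[1]), 4) != row[2]]
# Set of timestep increments between consecutive trickles.
steps = {t1 - t0 for (t0, _, _), (t1, _, _) in zip(TRICKLES, TRICKLES[1:])}
rates = interval_sec_per_ts(TRICKLES)

print(mismatches)            # prints: []
print(steps)                 # prints: {25920}
print(round(min(rates), 2))  # prints: 1.55
```

With the values as transcribed, every row agrees with the Average column and the timestep advances by 25,920 between each pair of trickles. The interval rates are less uniform than the cumulative average suggests: between the 26 May and 28 May trickles the CPU time grew by only 40,121 s over 25,920 timesteps (about 1.55 s/TS), against roughly 2.0 to 2.2 s/TS for every other interval.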