Task 16441731

Name	hadam3p_anz_n0og_2012_1_008576309_2
Workunit	8722821
Created	2 Apr 2014, 7:01:56 UTC
Sent	2 Apr 2014, 7:41:17 UTC
Report deadline	15 Mar 2015, 13:01:17 UTC
Received	6 May 2014, 13:54:05 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1311870
Run time	5 days 5 hours 1 min 21 sec
CPU time	6 hours 43 min 43 sec
Validate state	Invalid
Credit	2,497.00
Device peak FLOPS	2.26 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4076, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3480, selfPID=2748, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
11:11:14 (5948): No heartbeat from core client for 30 sec - exiting
11:11:15 (5948): No heartbeat from core client for 30 sec - exiting
11:11:16 (5948): No heartbeat from core client for 30 sec - exiting
11:11:17 (5948): No heartbeat from core client for 30 sec - exiting
11:11:18 (5948): No heartbeat from core client for 30 sec - exiting
11:11:19 (5948): No heartbeat from core client for 30 sec - exiting
11:11:20 (5948): No heartbeat from core client for 30 sec - exiting
11:11:21 (5948): No heartbeat from core client for 30 sec - exiting
11:11:22 (5948): No heartbeat from core client for 30 sec - exiting
11:11:23 (5948): No heartbeat from core client for 30 sec - exiting
11:11:24 (5948): No heartbeat from core client for 30 sec - exiting
11:11:25 (5948): No heartbeat from core client for 30 sec - exiting
11:11:26 (5948): No heartbeat from core client for 30 sec - exiting
11:11:27 (5948): No heartbeat from core client for 30 sec - exiting
11:11:28 (5948): No heartbeat from core client for 30 sec - exiting
11:11:29 (5948): No heartbeat from core client for 30 sec - exiting
11:11:30 (5948): No heartbeat from core client for 30 sec - exiting
11:11:31 (5948): No heartbeat from core client for 30 sec - exiting
11:11:32 (5948): No heartbeat from core client for 30 sec - exiting
11:11:33 (5948): No heartbeat from core client for 30 sec - exiting
11:11:34 (5948): No heartbeat from core client for 30 sec - exiting
11:11:35 (5948): No heartbeat from core client for 30 sec - exiting
11:11:36 (5948): No heartbeat from core client for 30 sec - exiting
11:11:37 (5948): No heartbeat from core client for 30 sec - exiting
11:11:38 (5948): No heartbeat from core client for 30 sec - exiting
11:11:39 (5948): No heartbeat from core client for 30 sec - exiting
11:11:40 (5948): No heartbeat from core client for 30 sec - exiting
11:11:41 (5948): No heartbeat from core client for 30 sec - exiting
11:11:42 (5948): No heartbeat from core client for 30 sec - exiting
11:11:43 (5948): No heartbeat from core client for 30 sec - exiting
11:11:44 (5948): No heartbeat from core client for 30 sec - exiting
11:11:45 (5948): No heartbeat from core client for 30 sec - exiting
11:11:46 (5948): No heartbeat from core client for 30 sec - exiting
11:11:47 (5948): No heartbeat from core client for 30 sec - exiting
11:11:48 (5948): No heartbeat from core client for 30 sec - exiting
11:11:49 (5948): No heartbeat from core client for 30 sec - exiting
11:11:50 (5948): No heartbeat from core client for 30 sec - exiting
11:11:51 (5948): No heartbeat from core client for 30 sec - exiting
11:11:52 (5948): No heartbeat from core client for 30 sec - exiting
11:11:53 (5948): No heartbeat from core client for 30 sec - exiting
11:11:54 (5948): No heartbeat from core client for 30 sec - exiting
11:11:55 (5948): No heartbeat from core client for 30 sec - exiting
11:11:56 (5948): No heartbeat from core client for 30 sec - exiting
11:11:57 (5948): No heartbeat from core client for 30 sec - exiting
11:11:58 (5948): No heartbeat from core client for 30 sec - exiting
11:11:59 (5948): No heartbeat from core client for 30 sec - exiting
11:12:00 (5948): No heartbeat from core client for 30 sec - exiting
11:12:01 (5948): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5900, selfPID=4032, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5364, selfPID=5300, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5468, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4984, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:35:07 (4152): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6084, selfPID=6084, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2200, selfPID=5548, iMonCtr=1
Model crash detected, will try to restart...
Global Worker :P: CPDN pro ess is not running, eng, bRetVal = l = 1, chPckPID=0, fPID=2ID=8, iMonCtr=2 r=ode l crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4660, selfPID=5288, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
09:31:57 (3260): No heartbeat from core client for 30 sec - exiting
09:31:58 (3260): No heartbeat from core client for 30 sec - exiting
09:31:59 (3260): No heartbeat from core client for 30 sec - exiting
09:32:00 (3260): No heartbeat from core client for 30 sec - exiting
09:32:01 (3260): No heartbeat from core client for 30 sec - exiting
09:32:02 (3260): No heartbeat from core client for 30 sec - exiting
09:32:03 (3260): No heartbeat from core client for 30 sec - exiting
09:32:04 (3260): No heartbeat from core client for 30 sec - exiting
09:32:05 (3260): No heartbeat from core client for 30 sec - exiting
09:32:06 (3260): No heartbeat from core client for 30 sec - exiting
09:32:07 (3260): No heartbeat from core client for 30 sec - exiting
09:32:08 (3260): No heartbeat from core client for 30 sec - exiting
09:32:09 (3260): No heartbeat from core client for 30 sec - exiting
09:32:10 (3260): No heartbeat from core client for 30 sec - exiting
09:32:11 (3260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2908, iMonCtr=2
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2916, selfPID=2564, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
12:08:15 (3832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:08:18 (3832): No heartbeat from core client for 30 sec - exiting
12:08:19 (3832): No heartbeat from core client for 30 sec - exiting
12:08:20 (3832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=12012, selfPID=12012, iMonCtr=2
11:29:50 (3820): No heartbeat from core client for 30 sec - exiting
11:29:51 (3820): No heartbeat from core client for 30 sec - exiting
11:29:52 (3820): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4812, selfPID=2616, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4224, selfPID=3512, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3656, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5780, selfPID=6088, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
14:44:17 (4744): No heartbeat from core client for 30 sec - exiting
14:44:18 (4744): No heartbeat from core client for 30 sec - exiting
14:44:19 (4744): No heartbeat from core client for 30 sec - exiting
14:44:20 (4744): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2676, selfPID=2676, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2992, selfPID=2992, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5480, selfPID=4484, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_n0og_2012_1_008576309/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_n0og_2012_1_008576309/dataout/region_restart.day after 11 attempts
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_6.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n0og_2012_1_008576309_2_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
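
The stderr above interleaves a handful of recurring events: heartbeat timeouts, model restarts, quit requests, the final READHIST crash, and the upload errors. The following minimal Python sketch tallies them. It assumes the stderr has been saved as plain text. The event labels and the names `tally_events`, `upload_error_names` and `SAMPLE` are hypothetical. The error-code table assumes that -161 is `ERR_NOT_FOUND` in BOINC's `lib/error_numbers.h`, which matches our understanding but should be checked against the client version in use.

```python
import re
from collections import Counter

# Regexes built from messages that appear in the stderr above; labels are hypothetical.
PATTERNS = {
    "heartbeat_timeout": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "model_restart": re.compile(r"Model crash detected, will try to restart\.\.\."),
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC\.\.\."),
    "model_crashed": re.compile(r"Model crashed: (\S+):"),
    "upload_error": re.compile(r"<error_code>(-?\d+)</error_code>"),
}

# Assumption: -161 is ERR_NOT_FOUND in BOINC's lib/error_numbers.h.
BOINC_ERRORS = {-161: "ERR_NOT_FOUND"}

# A short excerpt copied from the stderr above, used as sample input.
SAMPLE = """
11:11:14 (5948): No heartbeat from core client for 30 sec - exiting
11:11:15 (5948): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO
<file_xfer_error> <file_name>hadam3p_anz_n0og_2012_1_008576309_2_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
"""


def tally_events(stderr_text: str) -> Counter:
    """Count how many times each known event pattern occurs in the stderr text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(stderr_text))
    return counts


def upload_error_names(stderr_text: str) -> list:
    """Map each upload <error_code> to a symbolic BOINC name where one is known."""
    codes = [int(code) for code in PATTERNS["upload_error"].findall(stderr_text)]
    return [BOINC_ERRORS.get(code, f"unknown ({code})") for code in codes]


if __name__ == "__main__":
    print(tally_events(SAMPLE))
    print(upload_error_names(SAMPLE))
```

When run on the full stderr, the counts show how often the model was restarted and how often the client stopped sending heartbeats before the final crash.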

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
27 Apr 2014 15:28:03	1311870	16441731	hadam3p_anz_n0og_2012_1_008576309_2	57,899	248,726	4.2959
21 Apr 2014 13:27:07	1311870	16441731	hadam3p_anz_n0og_2012_1_008576309_2	46,379	199,441	4.3002
19 Apr 2014 19:44:43	1311870	16441731	hadam3p_anz_n0og_2012_1_008576309_2	34,859	150,814	4.3264
17 Apr 2014 15:23:54	1311870	16441731	hadam3p_anz_n0og_2012_1_008576309_2	23,339	101,109	4.3322
03 Apr 2014 19:12:01	1311870	16441731	hadam3p_anz_n0og_2012_1_008576309_2	11,819	51,982	4.3982
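
The Average column is consistent with cumulative CPU time divided by timestep, rounded to four decimals. Consecutive trickles are also exactly 11,520 timesteps apart. The small Python check below tests both observations against the rows copied from the table. The function names are hypothetical, and the formula is inferred from the numbers rather than taken from project documentation.

```python
# Rows copied from the trickle table: (timestep, cumulative CPU seconds, reported average).
TRICKLES = [
    (11_819, 51_982, 4.3982),
    (23_339, 101_109, 4.3322),
    (34_859, 150_814, 4.3264),
    (46_379, 199_441, 4.3002),
    (57_899, 248_726, 4.2959),
]


def average_sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """CPU seconds per model timestep, rounded to 4 decimals as in the table."""
    return round(cpu_seconds / timestep, 4)


def timestep_intervals(rows) -> list:
    """Number of timesteps completed between consecutive trickles."""
    steps = [timestep for timestep, _, _ in rows]
    return [later - earlier for earlier, later in zip(steps, steps[1:])]


for timestep, cpu, reported in TRICKLES:
    print(timestep, average_sec_per_timestep(cpu, timestep), reported)

print(timestep_intervals(TRICKLES))
```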