Task 18363710

Name	hadam3p_anz_r897_2012_1_008740313_2
Workunit	8886291
Created	26 Apr 2015, 11:22:45 UTC
Sent	27 Apr 2015, 15:57:24 UTC
Report deadline	8 Apr 2016, 21:17:24 UTC
Received	10 May 2016, 19:22:59 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	892329
Run time	7 days 16 hours 37 min 10 sec
CPU time	5 days 21 hours 17 min 8 sec
Validate state	Invalid
Credit	3,490.64
Device peak FLOPS	2.65 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
Stderr	<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5700, iMonCtr=2
Model crash detected, will try to restart...
16:54:49 (5092): No heartbeat from core client for 30 sec - exiting
16:54:50 (5092): No heartbeat from core client for 30 sec - exiting
16:54:51 (5092): No heartbeat from core client for 30 sec - exiting
16:54:52 (5092): No heartbeat from core client for 30 sec - exiting
16:54:53 (5092): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5676, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3452, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4704,
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8628, selfPID=9884, iMonCtr=1
Model crash detected, will try to restart...
11:29:58 (4892): No heartbeat from core client for 30 sec - exiting
11:29:59 (4892): No heartbeat from core client for 30 sec - exiting
11:30:00 (4892): No heartbeat from core client for 30 sec - exiting
11:30:01 (4892): No heartbeat from core client for 30 sec - exiting
11:30:02 (4892): No heartbeat from core client for 30 sec - exiting
11:30:03 (4892): No heartbeat from core client for 30 sec - exiting
11:30:04 (4892): No heartbeat from core client for 30 sec - exiting
11:30:05 (4892): No heartbeat from core client for 30 sec - exiting
11:30:06 (4892): No heartbeat from core client for 30 sec - exiting
11:30:07 (4892): No heartbeat from core client for 30 sec - exiting
11:30:08 (4892): No heartbeat from core client for 30 sec - exiting
11:30:09 (4892): No heartbeat from core client for 30 sec - exiting
11:30:10 (4892): No heartbeat from core client for 30 sec - exiting
11:30:11 (4892): No heartbeat from core client for 30 sec - exiting
11:30:12 (4892): No heartbeat from core client for 30 sec - exiting
11:30:13 (4892): No heartbeat from core client for 30 sec - exiting
11:30:14 (4892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2948, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3376, selfPID=5752, iMonCtr=1
Model crash detected, will try to restart...
12:05:07 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:05:09 (5688): No heartbeat from core client for 30 sec - exiting
12:05:10 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5936, selfPID=4620, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4312, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
12:37:02 (5140): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=808, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5308, selfPID=2348, iMonCtr=1
Model crash detected, will try to restart...
18:58:20 (5664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:58:22 (5664): No heartbeat from core client for 30 sec - exiting
18:58:23 (5664): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5692, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5660, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6032, selfPID=5844, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=308, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5496, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6076, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5796, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6100, selfPID=5584, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
22:37:50 (5568): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:37:53 (5568): No heartbeat from core client for 30 sec - exiting
22:37:54 (5568): No heartbeat from core client for 30 sec - exiting
22:37:55 (5568): No heartbeat from core client for 30 sec - exiting
22:37:56 (5568): No heartbeat from core client for 30 sec - exiting
22:37:57 (5568): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5412, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4788, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4392, selfPID=5616, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
22:25:49 (5520): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:25:55 (5520): No heartbeat from core client for 30 sec - exiting
22:25:56 (5520): No heartbeat from core client for 30 sec - exiting
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5444, selfPID=1568, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5568, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
16:56:28 (5836): No heartbeat from core client for 30 sec - exiting
16:56:29 (5836): No heartbeat from core client for 30 sec - exiting
16:56:30 (5836): No heartbeat from core client for 30 sec - exiting
16:56:31 (5836): No heartbeat from core client for 30 sec - exiting
16:56:32 (5836): No heartbeat from core client for 30 sec - exiting
16:56:33 (5836): No heartbeat from core client for 30 sec - exiting
16:56:34 (5836): No heartbeat from core client for 30 sec - exiting
16:56:35 (5836): No heartbeat from core client for 30 sec - exiting
16:56:36 (5836): No heartbeat from core client for 30 sec - exiting
16:56:37 (5836): No heartbeat from core client for 30 sec - exiting
16:56:38 (5836): No heartbeat from core client for 30 sec - exiting
16:56:39 (5836): No heartbeat from core client for 30 sec - exiting
16:56:40 (5836): No heartbeat from core client for 30 sec - exiting
16:56:41 (5836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4660, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1512, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
18:31:04 (5980): No heartbeat from core client for 30 sec - exiting
18:31:05 (5980): No heartbeat from core client for 30 sec - exiting
18:31:06 (5980): No heartbeat from core client for 30 sec - exiting
18:31:07 (5980): No heartbeat from core client for 30 sec - exiting
18:31:08 (5980): No heartbeat from core client for 30 sec - exiting
18:31:09 (5980): No heartbeat from core client for 30 sec - exiting
18:31:10 (5980): No heartbeat from core client for 30 sec - exiting
18:31:11 (5980): No heartbeat from core client for 30 sec - exiting
18:31:12 (5980): No heartbeat from core client for 30 sec - exiting
18:31:13 (5980): No heartbeat from core client for 30 sec - exiting
18:31:14 (5980): No heartbeat from core client for 30 sec - exiting
18:31:15 (5980): No heartbeat from core client for 30 sec - exiting
18:31:16 (5980): No heartbeat from core client for 30 sec - exiting
18:31:17 (5980): No heartbeat from core client for 30 sec - exiting
18:31:18 (5980): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5224, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5572, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4864, selfPID=5480, iMonCtr=1
Model crash detected, will try to restart...
Glontobal WorkerDN process is not running, exiting, bRetVal = 1, checkPcheckPID=0, selfPID=1560, iMonC tr=2 crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_r897_2012_1_008740313/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_r897_2012_1_008740313/dataout/region_restart.day after 11 attempts
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_8.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_9.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_10.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_11.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_12.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 May 2016 20:10:17	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	80,939	498,945	6.1645
03 Jun 2015 14:46:26	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	69,419	427,583	6.1595
30 May 2015 19:28:57	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	57,899	355,063	6.1325
27 May 2015 18:35:39	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	46,379	282,143	6.0834
24 May 2015 21:29:31	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	34,859	211,232	6.0596
23 May 2015 08:11:52	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	23,339	141,342	6.0560
18 May 2015 12:15:13	892329	18363710	hadam3p_anz_r897_2012_1_008740313_2	11,819	71,141	6.0192
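The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep reached; for the newest row, 498,945 / 80,939 ≈ 6.1645. A minimal Python sketch, assuming that definition, recomputes the column from the trickle rows and also shows that consecutive trickles were sent every 11,520 timesteps. The function names are hypothetical and are not part of BOINC or CPDN.

```python
# Trickle rows copied from the table, newest first:
# (timestep, cpu_time_sec, reported_avg_sec_per_ts)
TRICKLES = [
    (80_939, 498_945, 6.1645),
    (69_419, 427_583, 6.1595),
    (57_899, 355_063, 6.1325),
    (46_379, 282_143, 6.0834),
    (34_859, 211_232, 6.0596),
    (23_339, 141_342, 6.0560),
    (11_819, 71_141, 6.0192),
]


def sec_per_timestep(timestep: int, cpu_time: int) -> float:
    """Cumulative CPU seconds per timestep (assumed definition of the Average column)."""
    return cpu_time / timestep


def check_averages(rows):
    """Return the rows whose recomputed average, rounded to 4 decimals, differs from the table."""
    return [r for r in rows if round(sec_per_timestep(r[0], r[1]), 4) != r[2]]


def timestep_intervals(rows):
    """Timesteps advanced between consecutive trickles (rows are ordered newest first)."""
    steps = [r[0] for r in rows]
    return [newer - older for newer, older in zip(steps, steps[1:])]


if __name__ == "__main__":
    print(check_averages(TRICKLES))      # → []
    print(timestep_intervals(TRICKLES))  # → [11520, 11520, 11520, 11520, 11520, 11520]
```

An empty mismatch list means every reported average matches CPU time divided by timestep to four decimal places, which supports (but does not prove) the assumed definition.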