Task 18363710

Name: hadam3p_anz_r897_2012_1_008740313_2
Workunit: 8886291
Created: 26 Apr 2015, 11:22:45 UTC
Sent: 27 Apr 2015, 15:57:24 UTC
Report deadline: 8 Apr 2016, 21:17:24 UTC
Received: 10 May 2016, 19:22:59 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 0 (0x00000000)
Computer ID: 892329
Run time: 7 days 16 hours 37 min 10 sec
CPU time: 5 days 21 hours 17 min 8 sec
Validate state: Invalid
Credit: 3,490.64
Device peak FLOPS: 2.65 GFLOPS
Application version: UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 (windows_intelx86)
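The timing fields above can be cross-checked with a short sketch. It assumes the duration strings always use the `days`/`hours`/`min`/`sec` units shown on this page and that timestamps follow the `8 Apr 2016, 21:17:24 UTC` pattern; `duration_to_seconds` and `parse_utc` are hypothetical helpers, not part of BOINC. It converts Run time and CPU time to seconds and compares Received with Report deadline.

```python
import re
from datetime import datetime

# Unit multipliers for the duration format shown on this page (assumed fixed).
UNIT_SECONDS = {"days": 86400, "day": 86400, "hours": 3600, "hour": 3600,
                "min": 60, "sec": 1}

def duration_to_seconds(text: str) -> int:
    """Convert e.g. '7 days 16 hours 37 min 10 sec' to seconds (hypothetical helper)."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s+([a-z]+)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

def parse_utc(text: str) -> datetime:
    """Parse timestamps such as '8 Apr 2016, 21:17:24 UTC' (hypothetical helper)."""
    return datetime.strptime(text, "%d %b %Y, %H:%M:%S UTC")

run_time = duration_to_seconds("7 days 16 hours 37 min 10 sec")
cpu_time = duration_to_seconds("5 days 21 hours 17 min 8 sec")
deadline = parse_utc("8 Apr 2016, 21:17:24 UTC")
received = parse_utc("10 May 2016, 19:22:59 UTC")

print(run_time, cpu_time)  # 664630 508628
print(f"CPU time / run time: {cpu_time / run_time:.3f}")
print("received after deadline:", received > deadline)  # True
print("whole days late:", (received - deadline).days)  # 31
```

Read this way, the result was reported about a month after its deadline, and the model was on the CPU for roughly three quarters of the wall-clock run time.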
Stderr
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5700, iMonCtr=2
Model crash detected, will try to restart...
16:54:49 (5092): No heartbeat from core client for 30 sec - exiting
16:54:50 (5092): No heartbeat from core client for 30 sec - exiting
16:54:51 (5092): No heartbeat from core client for 30 sec - exiting
16:54:52 (5092): No heartbeat from core client for 30 sec - exiting
16:54:53 (5092): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5676, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3452, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4704, Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8628, selfPID=9884, iMonCtr=1
Model crash detected, will try to restart...
11:29:58 (4892): No heartbeat from core client for 30 sec - exiting
11:29:59 (4892): No heartbeat from core client for 30 sec - exiting
11:30:00 (4892): No heartbeat from core client for 30 sec - exiting
11:30:01 (4892): No heartbeat from core client for 30 sec - exiting
11:30:02 (4892): No heartbeat from core client for 30 sec - exiting
11:30:03 (4892): No heartbeat from core client for 30 sec - exiting
11:30:04 (4892): No heartbeat from core client for 30 sec - exiting
11:30:05 (4892): No heartbeat from core client for 30 sec - exiting
11:30:06 (4892): No heartbeat from core client for 30 sec - exiting
11:30:07 (4892): No heartbeat from core client for 30 sec - exiting
11:30:08 (4892): No heartbeat from core client for 30 sec - exiting
11:30:09 (4892): No heartbeat from core client for 30 sec - exiting
11:30:10 (4892): No heartbeat from core client for 30 sec - exiting
11:30:11 (4892): No heartbeat from core client for 30 sec - exiting
11:30:12 (4892): No heartbeat from core client for 30 sec - exiting
11:30:13 (4892): No heartbeat from core client for 30 sec - exiting
11:30:14 (4892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2948, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3376, selfPID=5752, iMonCtr=1
Model crash detected, will try to restart...
12:05:07 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:05:09 (5688): No heartbeat from core client for 30 sec - exiting
12:05:10 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5936, selfPID=4620, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4312, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
12:37:02 (5140): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=808, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5308, selfPID=2348, iMonCtr=1
Model crash detected, will try to restart...
18:58:20 (5664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:58:22 (5664): No heartbeat from core client for 30 sec - exiting
18:58:23 (5664): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5692, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5660, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6032, selfPID=5844, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=308, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5496, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6076, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5796, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6100, selfPID=5584, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
22:37:50 (5568): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:37:53 (5568): No heartbeat from core client for 30 sec - exiting
22:37:54 (5568): No heartbeat from core client for 30 sec - exiting
22:37:55 (5568): No heartbeat from core client for 30 sec - exiting
22:37:56 (5568): No heartbeat from core client for 30 sec - exiting
22:37:57 (5568): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5412, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4788, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4392, selfPID=5616, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
22:25:49 (5520): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:25:55 (5520): No heartbeat from core client for 30 sec - exiting
22:25:56 (5520): No heartbeat from core client for 30 sec - exiting
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5444, selfPID=1568, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5568, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
16:56:28 (5836): No heartbeat from core client for 30 sec - exiting
16:56:29 (5836): No heartbeat from core client for 30 sec - exiting
16:56:30 (5836): No heartbeat from core client for 30 sec - exiting
16:56:31 (5836): No heartbeat from core client for 30 sec - exiting
16:56:32 (5836): No heartbeat from core client for 30 sec - exiting
16:56:33 (5836): No heartbeat from core client for 30 sec - exiting
16:56:34 (5836): No heartbeat from core client for 30 sec - exiting
16:56:35 (5836): No heartbeat from core client for 30 sec - exiting
16:56:36 (5836): No heartbeat from core client for 30 sec - exiting
16:56:37 (5836): No heartbeat from core client for 30 sec - exiting
16:56:38 (5836): No heartbeat from core client for 30 sec - exiting
16:56:39 (5836): No heartbeat from core client for 30 sec - exiting
16:56:40 (5836): No heartbeat from core client for 30 sec - exiting
16:56:41 (5836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4660, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1512, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
18:31:04 (5980): No heartbeat from core client for 30 sec - exiting
18:31:05 (5980): No heartbeat from core client for 30 sec - exiting
18:31:06 (5980): No heartbeat from core client for 30 sec - exiting
18:31:07 (5980): No heartbeat from core client for 30 sec - exiting
18:31:08 (5980): No heartbeat from core client for 30 sec - exiting
18:31:09 (5980): No heartbeat from core client for 30 sec - exiting
18:31:10 (5980): No heartbeat from core client for 30 sec - exiting
18:31:11 (5980): No heartbeat from core client for 30 sec - exiting
18:31:12 (5980): No heartbeat from core client for 30 sec - exiting
18:31:13 (5980): No heartbeat from core client for 30 sec - exiting
18:31:14 (5980): No heartbeat from core client for 30 sec - exiting
18:31:15 (5980): No heartbeat from core client for 30 sec - exiting
18:31:16 (5980): No heartbeat from core client for 30 sec - exiting
18:31:17 (5980): No heartbeat from core client for 30 sec - exiting
18:31:18 (5980): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5224, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5572, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4864, selfPID=5480, iMonCtr=1
Model crash detected, will try to restart...
Glontobal WorkerDN process is not running, exiting, bRetVal = 1, checkPcheckPID=0, selfPID=1560, iMonC
tr=2
 crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_r897_2012_1_008740313/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_anz_r897_2012_1_008740313/dataout/region_restart.day after 11 attempts

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_8.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_9.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_10.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_11.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_r897_2012_1_008740313_2_12.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>
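The stderr above repeats a handful of message types many times. As a reading aid, here is a minimal Python sketch that tallies them, assuming the log is available as a plain string and that the wording matches this application version; `summarize_stderr` is a hypothetical helper, not part of BOINC or CPDN. It treats each distinct PID on "No heartbeat" lines as one heartbeat-loss episode, which is an approximation (a reused PID would be merged).

```python
import re
from collections import Counter

def summarize_stderr(log: str) -> dict:
    """Tally restart, heartbeat and upload-failure events in a stderr dump.

    Assumes the message formats seen in this task's log; other
    application versions may word them differently.
    """
    # Each "No heartbeat" line carries the PID of the process that gave up.
    heartbeat_pids = Counter(
        re.findall(r"\((\d+)\): No heartbeat from core client", log))
    return {
        # Match without the "Model" prefix so lines clobbered by
        # interleaved output are still counted.
        "crash_restarts": log.count("crash detected, will try to restart"),
        "heartbeat_episodes": len(heartbeat_pids),
        "heartbeat_lines": sum(heartbeat_pids.values()),
        # In this page, <file_name> only appears inside <file_xfer_error>.
        "failed_uploads": re.findall(r"<file_name>([^<]+)</file_name>", log),
    }

sample = """\
16:54:49 (5092): No heartbeat from core client for 30 sec - exiting
16:54:50 (5092): No heartbeat from core client for 30 sec - exiting
Model crash detected, will try to restart...
12:05:07 (5688): No heartbeat from core client for 30 sec - exiting
<file_name>hadam3p_anz_r897_2012_1_008740313_2_8.zip</file_name>
"""
print(summarize_stderr(sample))
```

Run over the full log, this kind of tally makes it easier to see how many restart cycles preceded the final `READHIST` crash and which result files were never produced for upload.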
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS)
09 May 2016 20:10:17 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 80,939 | 498,945 | 6.1645
03 Jun 2015 14:46:26 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 69,419 | 427,583 | 6.1595
30 May 2015 19:28:57 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 57,899 | 355,063 | 6.1325
27 May 2015 18:35:39 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 46,379 | 282,143 | 6.0834
24 May 2015 21:29:31 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 34,859 | 211,232 | 6.0596
23 May 2015 08:11:52 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 23,339 | 141,342 | 6.0560
18 May 2015 12:15:13 | 892329 | 18363710 | hadam3p_anz_r897_2012_1_008740313_2 | 11,819 | 71,141 | 6.0192
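The Average (sec/TS) column appears to be CPU time divided by timestep. The sketch below assumes that definition, recomputes the column from the trickle rows, and also checks the spacing between trickles. `TRICKLES`, `recompute_averages` and `timestep_gaps` are illustrative names only.

```python
# Rows copied from the trickle table above, oldest first:
# (timestep, CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (11_819, 71_141, 6.0192),
    (23_339, 141_342, 6.0560),
    (34_859, 211_232, 6.0596),
    (46_379, 282_143, 6.0834),
    (57_899, 355_063, 6.1325),
    (69_419, 427_583, 6.1595),
    (80_939, 498_945, 6.1645),
]

def recompute_averages(rows):
    """Average sec/TS as CPU time / timestep, rounded to 4 dp like the table."""
    return [round(cpu / ts, 4) for ts, cpu, _ in rows]

def timestep_gaps(rows):
    """Timesteps completed between consecutive trickles."""
    return [later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])]

averages = recompute_averages(TRICKLES)
print(all(avg == reported for avg, (_, _, reported) in zip(averages, TRICKLES)))  # True
print(timestep_gaps(TRICKLES))  # [11520, 11520, 11520, 11520, 11520, 11520]
```

Every recomputed average matches the reported value, and the trickles arrive at a fixed interval of 11,520 timesteps. The last one, however, was sent almost a year after the one before it.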


©2024 climateprediction.net