Task 14669222

Name hadam3p_saf_1gqh_2000_1_006959121_1
Workunit 7162437
Created 15 May 2012, 10:21:11 UTC
Sent 15 May 2012, 10:21:24 UTC
Report deadline 27 Apr 2013, 15:41:24 UTC
Received 5 Jun 2012, 14:00:50 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1140577
Run time 5 days 7 hours 27 min 1 sec
CPU time 4 days 1 hour 2 min 1 sec
Validate state Invalid
Credit 2,057.21
Device peak FLOPS 2.56 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Southern Africa v6.09 (windows_intelx86)
Stderr
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=292, iMonCtr=2
Model crash detected, will try to restart...
14:46:01 (4552): No heartbeat from core client for 30 sec - exiting
14:46:02 (4552): No heartbeat from core client for 30 sec - exiting
14:46:03 (4552): No heartbeat from core client for 30 sec - exiting
14:46:05 (4552): No heartbeat from core client for 30 sec - exiting
14:46:06 (4552): No heartbeat from core client for 30 sec - exiting
14:46:07 (4552): No heartbeat from core client for 30 sec - exiting
14:46:08 (4552): No heartbeat from core client for 30 sec - exiting
14:46:09 (4552): No heartbeat from core client for 30 sec - exiting
14:46:10 (4552): No heartbeat from core client for 30 sec - exiting
14:46:11 (4552): No heartbeat from core client for 30 sec - exiting
14:46:12 (4552): No heartbeat from core client for 30 sec - exiting
14:46:13 (4552): No heartbeat from core client for 30 sec - exiting
14:46:14 (4552): No heartbeat from core client for 30 sec - exiting
14:46:15 (4552): No heartbeat from core client for 30 sec - exiting
14:46:17 (4552): No heartbeat from core client for 30 sec - exiting
14:46:18 (4552): No heartbeat from core client for 30 sec - exiting
14:46:19 (4552): No heartbeat from core client for 30 sec - exiting
14:46:20 (4552): No heartbeat from core client for 30 sec - exiting
14:46:21 (4552): No heartbeat from core client for 30 sec - exiting
14:46:22 (4552): No heartbeat from core client for 30 sec - exiting
14:46:23 (4552): No heartbeat from core client for 30 sec - exiting
14:46:24 (4552): No heartbeat from core client for 30 sec - exiting
14:46:25 (4552): No heartbeat from core client for 30 sec - exiting
14:46:26 (4552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4128, selfPID=4160, iMonCtr=1
Model crash detected, will try to restart...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5068, iMonCtr=2
Model crash detected, will try to restart...
lSuspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4396, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6044, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5236, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3984, selfPID=4664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3620, selfPID=4236, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4332, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2668, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4744, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDLeaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4724, selfPID=4680, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6132, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4976, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5012, selfPID=5948, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5428, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5292, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3148, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3436, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6092, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
15:00:35 (932): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:00:36 (932): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4424, selfPID=3764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5028, selfPID=5728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6060, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4960, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4512, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4204, iMonCtr=2
Model crash detected, will try to restart...

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_saf_1gqh_2000_1_006959121_1_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
04 Jun 2012 20:18:55  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1   126,816         323,260            2.5490
04 Jun 2012 20:18:55  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1   115,296         294,291            2.5525
01 Jun 2012 14:05:16  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1   103,776         264,404            2.5478
01 Jun 2012 08:27:16  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    92,256         236,108            2.5593
01 Jun 2012 08:27:16  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    80,736         208,216            2.5790
30 May 2012 16:37:09  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    69,216         178,799            2.5832
29 May 2012 07:59:00  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    57,696         149,521            2.5915
29 May 2012 07:59:00  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    46,176         119,707            2.5924
23 May 2012 10:03:20  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    34,671          89,341            2.5768
23 May 2012 09:07:07  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    34,656          88,864            2.5642
21 May 2012 15:24:12  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    23,136          58,915            2.5465
21 May 2012 08:38:38  1140577  14669222   hadam3p_saf_1gqh_2000_1_006959121_1    11,616          29,830            2.5680
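
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. The page does not state this, so treat it as an assumption; the figures above are consistent with it to four decimal places. A minimal sketch that recomputes the column from the trickle rows (the names TRICKLES and sec_per_timestep are illustrative, not part of BOINC or CPDN):

```python
# Each tuple is (timestep, cpu_time_sec, reported_avg), copied from the
# trickle table above, newest first.
TRICKLES = [
    (126816, 323260, 2.5490),
    (115296, 294291, 2.5525),
    (103776, 264404, 2.5478),
    (92256, 236108, 2.5593),
    (80736, 208216, 2.5790),
    (69216, 178799, 2.5832),
    (57696, 149521, 2.5915),
    (46176, 119707, 2.5924),
    (34671, 89341, 2.5768),
    (34656, 88864, 2.5642),
    (23136, 58915, 2.5465),
    (11616, 29830, 2.5680),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


if __name__ == "__main__":
    # Print the recomputed average next to the value the page reported.
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = sec_per_timestep(cpu_time_sec, timestep)
        print(f"{timestep:>8,}  {cpu_time_sec:>8,}  "
              f"computed {computed:.4f}  reported {reported:.4f}")
```

Under that assumption the per-timestep cost stays between roughly 2.55 and 2.59 seconds across the whole run.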

