climateprediction.net home page
Task 12225183

Task 12225183

Name hadam3p_saf_1d0g_1975_1_006944696_0
Workunit 7148012
Created 22 Nov 2010, 16:00:03 UTC
Sent 10 Mar 2011, 8:10:25 UTC
Report deadline 20 Feb 2012, 13:30:25 UTC
Received 28 Sep 2011, 14:59:22 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1003407
Run time 7 days 14 hours 48 min 16 sec
CPU time 5 days 14 hours 27 min 21 sec
Validate state Invalid
Credit 1,683.45
Device peak FLOPS 2.00 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Southern Africa v6.08
windows_intelx86
Stderr
<core_client_version>6.10.56</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=168, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5712, selfPID=5732, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2088, selfPID=2880, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3952, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1572, selfPID=4928, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2036, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
15:19:34 (4344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:19:35 (4344): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
14:26:32 (5696): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4120, selfPID=6504, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
12:42:38 (2668): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:42:40 (2668): No heartbeat from core client for 30 sec - exiting
12:42:41 (2668): No heartbeat from core client for 30 sec - exiting
12:42:42 (2668): No heartbeat from core client for 30 sec - exiting
12:42:43 (2668): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3740, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6688, selfPID=11000, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7616, iMonCtr=2
Model crash detected, will try to restart...
CCGntroller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=2
Model crash detected, will try to restart...
lobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6836, iMonCtr=2
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3192, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3332, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6620, selfPID=4584, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6476, selfPID=5044, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3232, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10564, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3888, selfPID=6156, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5492, selfPID=7128, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11592, selfPID=3456, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9556, selfPID=6264, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11388, selfPID=11408, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
14:33:54 (2904): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:33:55 (2904): No heartbeat from core client for 30 sec - exiting
14:33:56 (2904): No heartbeat from core client for 30 sec - exiting
14:33:57 (2904): No heartbeat from core client for 30 sec - exiting
14:33:58 (2904): No heartbeat from core client for 30 sec - exiting
14:34:00 (2904): No heartbeat from core client for 30 sec - exiting
14:34:01 (2904): No heartbeat from core client for 30 sec - exiting
14:34:02 (2904): No heartbeat from core client for 30 sec - exiting
14:37:17 (6596): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10212, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
16:04:22 (10684): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
16:04:23 (10684): No heartbeat from core client for 30 sec - exiting
16:04:24 (10684): No heartbeat from core client for 30 sec - exiting
16:04:30 (10684): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
20:13:44 (6800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:19:40 (7996): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4760, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6520, iMonCtr=2

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakm.pipe_dummy                                                            2048    
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8116, selfPID=8116, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8116, selfPID=7556, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
09:58:32 (7556): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_saf_1d0g_1975_1_006944696_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_1d0g_1975_1_006944696_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_1d0g_1975_1_006944696_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
27 Sep 2011 10:54:31 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 103,776 465,008 4.4809
15 Sep 2011 10:03:38 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 92,256 413,996 4.4875
31 Aug 2011 14:22:52 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 80,752 365,842 4.5304
31 Aug 2011 13:21:46 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 80,736 365,048 4.5215
25 Aug 2011 15:13:50 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 69,218 313,938 4.5355
25 Aug 2011 14:12:37 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 69,216 313,145 4.5242
06 May 2011 08:44:35 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 57,696 261,752 4.5367
27 Apr 2011 08:23:35 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 46,178 212,695 4.6060
25 Apr 2011 14:18:17 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 46,176 211,568 4.5818
01 Apr 2011 07:29:03 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 34,656 161,144 4.6498
20 Mar 2011 15:04:35 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 23,136 108,142 4.6742
18 Mar 2011 10:31:03 1003407 12225183 hadam3p_saf_1d0g_1975_1_006944696_0 11,616 53,810 4.6324


©2024 climateprediction.net