Task 13895861


Name hadam3p_eu_88jt_2000_1_007661106_1
Workunit 7816193
Created 10 Jan 2012, 22:02:44 UTC
Sent 10 Jan 2012, 22:10:26 UTC
Report deadline 23 Dec 2012, 3:30:26 UTC
Received 21 Jan 2012, 18:26:30 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1165859
Run time 3 days 20 hours 56 min 3 sec
CPU time 3 days 12 hours 2 min 15 sec
Validate state Invalid
Credit 1,790.21
Device peak FLOPS 2.20 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Europe v6.09 (windows_intelx86)
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5936, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4312, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
21:54:11 (4484): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:53:07 (5964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:51:38 (3004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:54:59 (4552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:28:43 (5312): No heartbeat from core client for 30 sec - exiting
10:28:44 (5312): No heartbeat from core client for 30 sec - exiting
10:28:45 (5312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1016, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1020, selfPID=708, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
02:03:06 (5736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:17:19 (3988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=660, selfPID=660, iMonCtr=2
18:16:15 (404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:16:16 (404): No heartbeat from core client for 30 sec - exiting
18:16:17 (404): No heartbeat from core client for 30 sec - exiting
22:14:55 (4292): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4492, selfPID=5132, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5684, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5192, selfPID=2124, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
23:12:36 (2784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:12:38 (2784): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4124, selfPID=4624, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
13:47:14 (2240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:47:16 (2240): No heartbeat from core client for 30 sec - exiting
13:47:17 (2240): No heartbeat from core client for 30 sec - exiting
13:47:18 (2240): No heartbeat from core client for 30 sec - exiting
13:47:19 (2240): No heartbeat from core client for 30 sec - exiting
13:47:20 (2240): No heartbeat from core client for 30 sec - exiting
13:47:21 (2240): No heartbeat from core client for 30 sec - exiting
15:49:42 (4404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:48:25 (2072): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2116, selfPID=2116, iMonCtr=2
17:46:55 (1284): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5904, selfPID=5904, iMonCtr=2
18:45:18 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:44:22 (6064): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5764, selfPID=5764, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5984, selfPID=5328, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5648, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_88jt_2000_1_007661106_1_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_88jt_2000_1_007661106_1_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_88jt_2000_1_007661106_1_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                         Timestep  CPU Time (sec)  Average (sec/TS)
19 Jan 2012 17:38:55  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  103,776   273,177         2.6324
18 Jan 2012 16:51:10  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  92,256    243,475         2.6391
17 Jan 2012 15:49:58  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  80,736    213,243         2.6412
16 Jan 2012 08:44:38  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  69,216    182,602         2.6381
15 Jan 2012 11:56:05  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  57,700    150,802         2.6136
15 Jan 2012 10:55:52  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  57,696    150,404         2.6068
14 Jan 2012 14:23:20  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  46,176    121,231         2.6254
13 Jan 2012 12:48:33  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  34,656    90,378          2.6079
12 Jan 2012 17:11:31  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  23,136    60,441          2.6124
12 Jan 2012 00:00:42  1165859  13895861   hadam3p_eu_88jt_2000_1_007661106_1  11,616    30,313          2.6096
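
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep reached, rounded to four decimals. The page does not state this, so it is an assumption. A minimal sketch checking it against three rows copied from the table above (the function name `average_sec_per_timestep` is hypothetical, chosen for illustration):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
trickles = [
    (103_776, 273_177, 2.6324),
    (57_700, 150_802, 2.6136),
    (11_616, 30_313, 2.6096),
]


def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to four decimals.

    Assumption: this is how the Average (sec/TS) column is derived.
    """
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in trickles:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    # Print computed and reported values side by side for comparison.
    print(f"timestep={timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
```

For these three rows the computed values agree with the reported ones to four decimals, which is consistent with the assumed definition but does not prove it.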


©2024 climateprediction.net