Task 13359220

Name hadcm3n_t33x_1940_40_007447451_1
Workunit 7644954
Created 9 Sep 2011, 18:02:43 UTC
Sent 9 Sep 2011, 18:07:25 UTC
Report deadline 10 Dec 2011, 1:34:36 UTC
Received 15 Nov 2011, 17:08:53 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x00000016) Unknown error code
Computer ID 1041608
Run time 20 days 13 hours 4 min 31 sec
CPU time 19 days 8 hours 11 min 44 sec
Validate state Invalid
Credit 8,398.08
Device peak FLOPS 1.80 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5448, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
15:01:34 (3940): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4344, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:26:22 (5960): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:26:25 (5960): No heartbeat from core client for 30 sec - exiting
18:26:26 (5960): No heartbeat from core client for 30 sec - exiting
18:26:27 (5960): No heartbeat from core client for 30 sec - exiting
18:26:28 (5960): No heartbeat from core client for 30 sec - exiting
18:26:29 (5960): No heartbeat from core client for 30 sec - exiting
18:26:30 (5960): No heartbeat from core client for 30 sec - exiting
18:26:31 (5960): No heartbeat from core client for 30 sec - exiting
18:26:32 (5960): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
10:35:24 (4828): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:35:25 (4828): No heartbeat from core client for 30 sec - exiting
10:35:26 (4828): No heartbeat from core client for 30 sec - exiting
10:35:27 (4828): No heartbeat from core client for 30 sec - exiting
Atmos Hold Restart file rename failed on atmos_restart.hold
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
17:15:55 (4672): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:55:06 (6560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:01:50 (7448): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:09:31 (6916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3340, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5840, iMonCtr=1
Model crash detected, will try to restart...
18:14:44 (5704): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:14:48 (5704): No heartbeat from core client for 30 sec - exiting
18:14:49 (5704): No heartbeat from core client for 30 sec - exiting
18:14:50 (5704): No heartbeat from core client for 30 sec - exiting
18:14:51 (5704): No heartbeat from core client for 30 sec - exiting
18:14:52 (5704): No heartbeat from core client for 30 sec - exiting
18:14:53 (5704): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5752, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5672, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
16:45:02 (5612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:45:05 (5612): No heartbeat from core client for 30 sec - exiting
16:45:06 (5612): No heartbeat from core client for 30 sec - exiting
13:59:16 (5792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:57:48 (4392): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5788, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6104, iMonCtr=1
Model crash detected, will try to restart...
16:25:51 (5572): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...

Model crashed: TEMPHIST: Failed in OPEN of history file                                                                                                                                                                                                                        tmp/pipe_dummy                                                                  2048    
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
20:59:30 (2648): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:59:31 (2648): No heartbeat from core client for 30 sec - exiting

Model crashed: TEMPHIST: Failed in OPEN of history file                                                                                                                                                                                                                        tmp/pipe_dummy                                                                  2048    
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5956, iMonCtr=1
Model crash detected, will try to restart...
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5552, iMonCtr=1
Model crash detected, will try to restart...
01:40:25 (1784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:41:27 (5116): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:18:56 (6072): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:19:00 (5864): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:54:31 (2364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...

Model crashed: TEMPHIST: Failed in OPEN of history file                                                                                                                                                                                                                        tmp/pipe_dummy                                                                  2048    
Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: TEMPHIST: Failed in OPEN of history file                                                                                                                                                                                                                        tmp/pipe_dummy                                                                  2048    
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5708, iMonCtr=1
Model crash detected, will try to restart...
20:29:58 (5664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:17:18 (5496): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:17:19 (5496): No heartbeat from core client for 30 sec - exiting
19:08:42 (3344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
18:08:48 (5000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:08:51 (5000): No heartbeat from core client for 30 sec - exiting
18:08:52 (5000): No heartbeat from core client for 30 sec - exiting
18:08:53 (5000): No heartbeat from core client for 30 sec - exiting
18:08:56 (5000): No heartbeat from core client for 30 sec - exiting
19:00:48 (5580): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6076, iMonCtr=1
Model crash detected, will try to restart...
20:08:37 (5692): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:08:39 (5692): No heartbeat from core client for 30 sec - exiting
20:08:40 (5692): No heartbeat from core client for 30 sec - exiting
20:15:33 (5052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:15:35 (5052): No heartbeat from core client for 30 sec - exiting
09:22:17 (5800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
09:55:47 (5548): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:55:49 (5548): No heartbeat from core client for 30 sec - exiting
13:51:20 (5852): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:06:17 (3712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:22:05 (3560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:22:07 (3560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:11:01 (3372): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:11:02 (3372): No heartbeat from core client for 30 sec - exiting
18:11:03 (3372): No heartbeat from core client for 30 sec - exiting
18:11:04 (3372): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
forrtl: There is not enough space on the disk.

Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
11:34:30 (5684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:34:31 (5684): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5516, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
10:57:58 (5832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:58:01 (5832): No heartbeat from core client for 30 sec - exiting
10:58:02 (5832): No heartbeat from core client for 30 sec - exiting
10:58:03 (5832): No heartbeat from core client for 30 sec - exiting
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
BUFFOUT: C I/O Error - Return code = 32

Model crashed: WRITDUMP: BAD BUFFOUT OF DATA                                                                                                                                                                                                                                   tmp/pipe_dummy                                                                  2048    
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...

Model crashed: TEMPHIST: Failed in OPEN of history file                                                                                                                                                                                                                        tmp/pipe_dummy                                                                  2048    
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5552, iMonCtr=1
Model crash detected, will try to restart...
10:58:56 (5672): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:56:26 (2452): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1588, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish

</stderr_txt>
]]>
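The stderr output above repeats a small number of message types. As a rough aid for reading it, the sketch below tallies lines by message type. The labels and the `tally` helper are hypothetical names chosen for illustration, not part of BOINC or CPDN; the match strings are copied from the log itself, and the sample lines are a short excerpt rather than the whole log.

```python
from collections import Counter

# Substrings copied from the stderr text; the labels on the left are
# illustrative names only (an assumption, not CPDN terminology).
PATTERNS = {
    "no_heartbeat": "No heartbeat from core client",
    "quit_request": "Quit request from BOINC",
    "suspend_request": "Suspend request from BOINC",
    "model_crash": "Model crashed:",
    "restart_attempt": "Model crash detected, will try to restart",
    "buffout_io_error": "BUFFOUT: C I/O Error",
    "disk_full": "not enough space on the disk",
}


def tally(lines):
    """Count how many lines match each known message type (first match wins)."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
                break
    return counts


# A short excerpt of lines taken from the log above.
sample = [
    "CPDN Monitor - Quit request from BOINC...",
    "15:01:34 (3940): No heartbeat from core client for 30 sec - exiting",
    "BUFFOUT: C I/O Error - Return code = 32",
    "Model crashed: WRITDUMP: BAD BUFFOUT OF DATA",
    "Model crash detected, will try to restart...",
    "forrtl: There is not enough space on the disk.",
]
counts = tally(sample)
for label, n in sorted(counts.items()):
    print(label, n)
```

Running the same function over the full stderr text would show which message types dominate; it does not by itself establish the cause of the failure.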
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
15 Nov 2011 17:34:19 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 699,840 1,669,783 2.3859
09 Nov 2011 20:21:36 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 673,920 1,602,500 2.3779
06 Nov 2011 18:42:19 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 648,000 1,551,085 2.3936
05 Nov 2011 12:41:48 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 622,080 1,524,257 2.4503
04 Nov 2011 21:27:18 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 596,160 1,456,056 2.4424
01 Nov 2011 20:51:18 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 570,240 1,386,147 2.4308
31 Oct 2011 20:03:32 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 544,320 1,325,111 2.4344
31 Oct 2011 18:38:44 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 518,400 1,267,273 2.4446
31 Oct 2011 17:38:03 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 492,480 1,211,235 2.4595
31 Oct 2011 17:14:46 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 466,560 1,146,243 2.4568
31 Oct 2011 17:14:45 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 440,640 1,078,114 2.4467
31 Oct 2011 17:14:45 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 414,720 1,011,028 2.4379
17 Oct 2011 15:33:56 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 388,800 954,253 2.4544
16 Oct 2011 02:27:17 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 362,880 898,521 2.4761
15 Oct 2011 10:28:30 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 336,960 842,287 2.4997
14 Oct 2011 13:18:27 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 311,040 784,375 2.5218
10 Oct 2011 16:23:56 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 285,120 732,586 2.5694
06 Oct 2011 15:26:21 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 259,200 666,814 2.5726
02 Oct 2011 13:30:03 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 233,280 601,892 2.5801
29 Sep 2011 15:21:39 1041608 13359220 hadcm3n_t33x_1940_40_007447451_1 207,360 536,763 2.5886
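In the rows above, the Average (sec/TS) column appears to be the CPU Time (sec) divided by the Timestep. The sketch below checks that reading against the most recent row. The `parse_trickle` helper and its field names are hypothetical, and the assumption that the last three whitespace-separated fields are timestep, CPU seconds, and average is inferred from the header row.

```python
def parse_trickle(row: str) -> dict:
    """Parse one trickle row, assuming the last three fields are
    timestep, CPU time in seconds, and the reported average."""
    _head, timestep, cpu_sec, avg = row.rsplit(None, 3)
    return {
        "timestep": int(timestep.replace(",", "")),
        "cpu_sec": int(cpu_sec.replace(",", "")),
        "reported_avg": float(avg),
    }


def average_sec_per_timestep(cpu_sec: int, timestep: int) -> float:
    """CPU seconds spent per model timestep."""
    return cpu_sec / timestep


# Most recent trickle row, copied from the table above.
row = ("15 Nov 2011 17:34:19 1041608 13359220 "
       "hadcm3n_t33x_1940_40_007447451_1 699,840 1,669,783 2.3859")
trickle = parse_trickle(row)
computed = average_sec_per_timestep(trickle["cpu_sec"], trickle["timestep"])
print(f"computed={computed:.4f} reported={trickle['reported_avg']}")
```

The computed value agrees with the reported one to the four decimal places shown in the table.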

