Task 13104580

Name hadcm3n_ydi2_1900_40_007350356_0
Workunit 7547786
Created 6 Jul 2011, 14:05:27 UTC
Sent 17 Jul 2011, 1:14:29 UTC
Report deadline 16 Oct 2011, 8:41:40 UTC
Received 20 Aug 2011, 21:37:51 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x00000016) Unknown error code
Computer ID 1041683
Run time 10 days 23 hours 19 min 17 sec
CPU time 9 days 10 hours 13 min 55 sec
Validate state Invalid
Credit 7,153.92
Device peak FLOPS 3.14 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1
Model crash detected, will try to restart...
17:24:17 (4164): No heartbeat from core client for 30 sec - exiting
17:24:18 (4164): No heartbeat from core client for 30 sec - exiting
17:24:19 (4164): No heartbeat from core client for 30 sec - exiting
17:24:28 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5168, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5168, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5064, iMonCtr=1
Model crash detected, will try to restart...
11:48:38 (3988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:33:36 (4012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:33:37 (4012): No heartbeat from core client for 30 sec - exiting
12:33:38 (4012): No heartbeat from core client for 30 sec - exiting
12:33:39 (4012): No heartbeat from core client for 30 sec - exiting
12:33:40 (4012): No heartbeat from core client for 30 sec - exiting
12:33:41 (4012): No heartbeat from core client for 30 sec - exiting
12:33:42 (4012): No heartbeat from core client for 30 sec - exiting
12:33:43 (4012): No heartbeat from core client for 30 sec - exiting
12:33:44 (4012): No heartbeat from core client for 30 sec - exiting
12:33:45 (4012): No heartbeat from core client for 30 sec - exiting
12:33:46 (4012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on ydi2ko.dab57q0
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
16:35:34 (3988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:35:35 (3988): No heartbeat from core client for 30 sec - exiting
16:35:36 (3988): No heartbeat from core client for 30 sec - exiting
16:35:37 (3988): No heartbeat from core client for 30 sec - exiting
16:35:38 (3988): No heartbeat from core client for 30 sec - exiting
16:35:39 (3988): No heartbeat from core client for 30 sec - exiting
16:35:40 (3988): No heartbeat from core client for 30 sec - exiting
16:35:41 (3988): No heartbeat from core client for 30 sec - exiting
16:35:42 (3988): No heartbeat from core client for 30 sec - exiting
16:35:43 (3988): No heartbeat from core client for 30 sec - exiting
16:35:44 (3988): No heartbeat from core client for 30 sec - exiting
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish

</stderr_txt>
]]>
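The stderr output above repeats a small set of messages many times. A minimal sketch of how such a log could be tallied by message type follows. The sample lines are copied verbatim from the log above; the `PATTERNS` labels and the `tally` helper are hypothetical names chosen for illustration, not part of BOINC or the CPDN application.

```python
from collections import Counter

# A few lines copied verbatim from the stderr output above (not the full log).
SAMPLE = """\
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5136, iMonCtr=1
Model crash detected, will try to restart...
17:24:17 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
"""

# Hypothetical labels mapped to substrings that identify each message type.
PATTERNS = {
    "model_crash": "Model crash detected",
    "no_heartbeat": "No heartbeat from core client",
    "quit_request": "Quit request from BOINC",
    "signal_22": "Signal 22 received",
}


def tally(log_text):
    """Count how many lines contain each known message substring."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


counts = tally(SAMPLE)
for label in PATTERNS:
    print(label, counts[label])
```

Running the same function over the full stderr text would show how often each event occurred before the task gave up with "Sorry, too many model crashes!".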
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS)
20 Aug 2011 11:12:41 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 596,160 | 786,114 | 1.3186
20 Aug 2011 00:07:28 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 570,240 | 752,149 | 1.3190
19 Aug 2011 06:39:55 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 544,320 | 718,260 | 1.3196
17 Aug 2011 15:14:54 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 492,480 | 651,654 | 1.3232
17 Aug 2011 04:51:26 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 466,560 | 619,271 | 1.3273
16 Aug 2011 01:25:48 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 440,640 | 585,225 | 1.3281
15 Aug 2011 15:35:12 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 414,720 | 552,649 | 1.3326
15 Aug 2011 05:40:35 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 388,800 | 520,121 | 1.3378
14 Aug 2011 09:34:49 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 362,880 | 484,968 | 1.3364
13 Aug 2011 20:43:36 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 336,960 | 448,671 | 1.3315
11 Aug 2011 23:28:32 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 311,040 | 414,483 | 1.3326
08 Aug 2011 22:04:32 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 285,120 | 380,533 | 1.3346
07 Aug 2011 14:56:30 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 259,200 | 345,851 | 1.3343
06 Aug 2011 18:02:38 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 233,280 | 311,383 | 1.3348
05 Aug 2011 03:30:12 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 207,360 | 277,351 | 1.3375
03 Aug 2011 22:21:29 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 181,440 | 242,874 | 1.3386
31 Jul 2011 18:35:04 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 155,520 | 207,817 | 1.3363
30 Jul 2011 19:43:44 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 129,600 | 172,780 | 1.3332
29 Jul 2011 23:26:47 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 103,680 | 137,741 | 1.3285
27 Jul 2011 01:35:15 | 1041683 | 13104580 | hadcm3n_ydi2_1900_40_007350356_0 | 77,760 | 103,032 | 1.3250
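
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. That relationship is inferred from the figures themselves, not from project documentation. A short sketch that checks it against three rows taken from the table (the function name `seconds_per_timestep` is illustrative only):

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 decimals as in the table."""
    return round(cpu_seconds / timestep, 4)


# (timestep, CPU time in seconds) pairs copied from the first two rows and the last row.
trickles = [
    (596_160, 786_114),
    (570_240, 752_149),
    (77_760, 103_032),
]

averages = [seconds_per_timestep(cpu, ts) for ts, cpu in trickles]
for (ts, cpu), avg in zip(trickles, averages):
    print(f"timestep={ts:>7,}  cpu={cpu:>7,}s  avg={avg:.4f} sec/TS")
```

The computed values match the table's 1.3186, 1.3190 and 1.3250 for those rows, which supports the assumed definition of the column.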