Name | hadcm3n_o2ql_1940_40_007743467_2 |
Workunit | 7898575 |
Created | 31 Jan 2012, 1:52:17 UTC |
Sent | 31 Jan 2012, 4:16:44 UTC |
Report deadline | 1 May 2012, 11:43:55 UTC |
Received | 8 Mar 2012, 17:42:12 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1301249 |
Run time | 28 days 13 hours 38 min 59 sec |
CPU time | 22 days 22 hours 2 min 41 sec |
Validate state | Invalid |
Credit | 11,508.48 |
Device peak FLOPS | 2.09 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 00:18:33 (4436): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... 10:57:39 (1116): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... 00:04:23 (2624): Can't acquire lockfile (32) - waiting 35s 00:04:28 (5832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... 10:22:22 (6124): Can't acquire lockfile (32) - waiting 35s 10:22:52 (5708): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 00:20:54 (4372): Can't acquire lockfile (32) - waiting 35s 00:21:23 (2476): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 01:48:27 (5660): Can't acquire lockfile (32) - waiting 35s 01:48:46 (752): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:48:47 (752): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 03:28:37 (2864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:28:38 (2864): No heartbeat from core client for 30 sec - exiting 03:28:39 (2864): No heartbeat from core client for 30 sec - exiting 03:28:41 (2864): No heartbeat from core client for 30 sec - exiting 03:28:42 (2864): No heartbeat from core client for 30 sec - exiting 03:28:43 (2864): No heartbeat from core client for 30 sec - exiting 03:28:44 (2864): No heartbeat from core client for 30 sec - exiting 03:28:45 (2864): No heartbeat from core client for 30 sec - exiting 03:28:46 (2864): No heartbeat from core client for 30 sec - exiting 03:28:47 (2864): No heartbeat from core client for 30 sec - exiting 03:28:48 (2864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3380, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 22:20:09 (3556): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:20:13 (3556): No heartbeat from core client for 30 sec - exiting 22:27:39 (2508): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:27:40 (2508): No heartbeat from core client for 30 sec - exiting 22:27:41 (2508): No heartbeat from core client for 30 sec - exiting 22:27:42 (2508): No heartbeat from core client for 30 sec - exiting 22:27:43 (2508): No heartbeat from core client for 30 sec - exiting 22:27:44 (2508): No heartbeat from core client for 30 sec - exiting 22:27:45 (2508): No heartbeat from core client for 30 sec - exiting 22:27:46 (2508): No heartbeat from core client for 30 sec - exiting 22:27:47 (2508): No heartbeat from core client for 30 sec - exiting 22:27:48 (2508): No heartbeat from core client for 30 sec - exiting 22:27:49 (2508): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold CPDN Monitor - Quit request from BOINC... 23:55:54 (4620): Can't acquire lockfile (32) - waiting 35s 23:56:12 (3624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:56:13 (3624): No heartbeat from core client for 30 sec - exiting 23:56:15 (3624): No heartbeat from core client for 30 sec - exiting 23:56:16 (3624): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Atmos Hold Restart file rename failed on atmos_restart.hold Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
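
A few messages dominate the stderr output above: missed heartbeats, lockfile waits, failed restart-file renames, and model crashes. A minimal sketch of how they could be tallied follows, assuming the log has been saved as plain text. The function name `tally_events`, the pattern labels, and the `SAMPLE` string (stitched together from messages in the log) are illustrative only and are not part of BOINC or CPDN.

```python
import re

# Hypothetical labels mapped to substrings that appear verbatim in the log above.
PATTERNS = {
    "no_heartbeat": r"No heartbeat from core client",
    "lockfile_wait": r"Can't acquire lockfile",
    "rename_failed": r"Restart file rename failed",
    "model_crash": r"Model crashed:",
}


def tally_events(stderr_text):
    """Count how often each known message occurs in the stderr text."""
    return {
        label: len(re.findall(pattern, stderr_text))
        for label, pattern in PATTERNS.items()
    }


# Short sample assembled from messages in the log; not the full output.
SAMPLE = (
    "00:04:23 (2624): Can't acquire lockfile (32) - waiting 35s "
    "00:04:28 (5832): No heartbeat from core client for 30 sec - exiting "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "Model crashed: U_MODEL: Illegal combination of submodels tmp/pipe_dummy 2048"
)

if __name__ == "__main__":
    for label, count in tally_events(SAMPLE).items():
        print(f"{label}: {count}")
```

The "CPDN Monitor - No 'heartbeat' from BOINC..." lines are deliberately not matched by the `no_heartbeat` pattern, so each missed heartbeat is not counted twice.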
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
07 Mar 2012 00:26:53 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 959,040 | 2,012,208 | 2.0981 |
06 Mar 2012 02:35:24 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 933,120 | 1,958,468 | 2.0988 |
03 Mar 2012 10:40:57 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 907,200 | 1,906,316 | 2.1013 |
02 Mar 2012 14:57:51 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 881,280 | 1,854,013 | 2.1038 |
01 Mar 2012 19:28:50 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 855,360 | 1,801,049 | 2.1056 |
01 Mar 2012 19:28:50 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 829,440 | 1,746,338 | 2.1054 |
28 Feb 2012 14:55:31 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 803,520 | 1,693,256 | 2.1073 |
27 Feb 2012 20:25:19 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 777,600 | 1,640,413 | 2.1096 |
27 Feb 2012 03:15:41 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 751,680 | 1,586,517 | 2.1106 |
26 Feb 2012 08:27:58 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 725,760 | 1,531,838 | 2.1107 |
25 Feb 2012 15:07:36 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 699,840 | 1,475,860 | 2.1089 |
24 Feb 2012 18:31:55 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 673,920 | 1,419,819 | 2.1068 |
24 Feb 2012 01:25:06 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 648,000 | 1,364,813 | 2.1062 |
23 Feb 2012 08:34:39 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 622,080 | 1,308,977 | 2.1042 |
22 Feb 2012 14:40:16 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 596,160 | 1,252,858 | 2.1015 |
21 Feb 2012 19:17:53 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 570,240 | 1,195,227 | 2.0960 |
21 Feb 2012 00:22:11 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 544,320 | 1,140,705 | 2.0957 |
20 Feb 2012 05:32:24 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 518,400 | 1,086,980 | 2.0968 |
19 Feb 2012 06:45:00 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 492,480 | 1,033,616 | 2.0988 |
18 Feb 2012 07:44:26 | 1179306 | 14035565 | hadcm3n_o2ql_1940_40_007743467_2 | 466,560 | 979,038 | 2.0984 |
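
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against three rows copied from the table; the function name `sec_per_timestep` is hypothetical, and the formula is inferred from the numbers rather than taken from CPDN documentation.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (959_040, 2_012_208, 2.0981),
    (933_120, 1_958_468, 2.0988),
    (466_560, 979_038, 2.0984),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep (assumed definition)."""
    return cpu_time_sec / timestep


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = sec_per_timestep(cpu_time_sec, timestep)
        # Reported values carry four decimal places, so compare at that precision.
        matches = abs(computed - reported) < 1e-4
        print(f"timestep={timestep} computed={computed:.4f} "
              f"reported={reported:.4f} match={matches}")
```

Consecutive rows in the table differ by 25,920 timesteps, which suggests one trickle is sent per 25,920 timesteps of model progress.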
©2024 climateprediction.net