Name | hadcm3n_o31a_2140_40_008280608_2 |
Workunit | 8431743 |
Created | 29 Dec 2012, 23:27:08 UTC |
Sent | 29 Dec 2012, 23:40:42 UTC |
Report deadline | 31 Mar 2013, 7:07:53 UTC |
Received | 21 Jan 2013, 22:22:16 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1124940 |
Run time | 17 days 9 hours 50 min 46 sec |
CPU time | 14 days 4 hours 55 min 45 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 2.92 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 11:33:07 (7524): No heartbeat from core client for 30 sec - exiting 11:33:13 (7524): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:33:14 (7524): No heartbeat from core client for 30 sec - exiting 11:33:15 (7524): No heartbeat from core client for 30 sec - exiting 11:33:16 (7524): No heartbeat from core client for 30 sec - exiting 11:33:17 (7524): No heartbeat from core client for 30 sec - exiting 11:33:18 (7524): No heartbeat from core client for 30 sec - exiting 11:33:19 (7524): No heartbeat from core client for 30 sec - exiting 11:33:21 (7524): No heartbeat from core client for 30 sec - exiting 11:33:22 (7524): No heartbeat from core client for 30 sec - exiting 11:33:23 (7524): No heartbeat from core client for 30 sec - exiting 11:33:24 (7524): No heartbeat from core client for 30 sec - exiting 11:33:25 (7524): No heartbeat from core client for 30 sec - exiting 11:33:26 (7524): No heartbeat from core client for 30 sec - exiting 11:33:27 (7524): No heartbeat from core client for 30 sec - exiting 11:33:28 (7524): No heartbeat from core client for 30 sec - exiting 11:33:29 (7524): No heartbeat from core client for 30 sec - exiting 11:33:30 (7524): No heartbeat from core client for 30 sec - exiting 11:33:32 (7524): No heartbeat from core client for 30 sec - exiting 11:33:33 (7524): No heartbeat from core client for 30 sec - exiting 11:33:34 (7524): No heartbeat from core client for 30 sec - exiting 11:33:35 (7524): No heartbeat from core client for 30 sec - exiting 11:33:37 (7524): No heartbeat from core client for 30 sec - exiting 11:36:16 (9688): No heartbeat from core client for 30 sec - exiting 11:37:03 (9688): No heartbeat from 
core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:37:58 (9688): No heartbeat from core client for 30 sec - exiting 11:37:59 (9688): No heartbeat from core client for 30 sec - exiting 11:38:00 (9688): No heartbeat from core client for 30 sec - exiting 11:38:01 (9688): No heartbeat from core client for 30 sec - exiting 11:38:02 (9688): No heartbeat from core client for 30 sec - exiting 11:38:03 (9688): No heartbeat from core client for 30 sec - exiting 11:38:05 (9688): No heartbeat from core client for 30 sec - exiting 11:38:06 (9688): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... MainError: 12:28:31 AM No files match the supplied pattern. MainError: 12:28:31 AM No files match the supplied pattern. MainError: 12:44:09 AM No files match the supplied pattern. MainError: 12:44:09 AM No files match the supplied pattern. Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... MainError: 12:04:47 AM No files match the supplied pattern. MainError: 12:04:47 AM No files match the supplied pattern. MainError: 02:44:03 PM No files match the supplied pattern. MainError: 02:44:03 PM No files match the supplied pattern. MainError: 05:27:45 AM No files match the supplied pattern. MainError: 05:27:45 AM No files match the supplied pattern. MainError: 08:12:37 PM No files match the supplied pattern. MainError: 08:12:37 PM No files match the supplied pattern. MainError: 10:48:37 AM No files match the supplied pattern. MainError: 10:48:37 AM No files match the supplied pattern. Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1200, iMonCtr=1 Model crash detected, will try to restart... MainError: 01:24:47 AM No files match the supplied pattern. MainError: 01:24:47 AM No files match the supplied pattern. MainError: 03:22:02 PM No files match the supplied pattern. MainError: 03:22:02 PM No files match the supplied pattern. MainError: 05:51:16 AM No files match the supplied pattern. MainError: 05:51:16 AM No files match the supplied pattern. Error converting file to netcdf: dataout/o31aka.ph11c10 Error converting file to netcdf: dataout/o31aka.pg11c10 Error converting file to netcdf: dataout/o31aka.pe11c10 MainError: 08:53:30 PM No files match the supplied pattern. MainError: 08:53:30 PM No files match the supplied pattern. 
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - 
Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
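The stderr text above repeats a handful of message types many times, so a tally is easier to read than the raw dump. The following is a minimal sketch, assuming the stderr has been saved as a plain string. The category labels and the `summarize_stderr` helper are hypothetical names chosen for this illustration and are not part of BOINC or CPDN; only the matched fragments come from the log itself.

```python
from collections import Counter

# Fragments copied from the stderr text above. The labels on the left are
# illustrative names for this sketch, not BOINC or CPDN terminology.
PATTERNS = {
    "suspend": "Suspend request from BOINC",
    "no_heartbeat": "No heartbeat from core client",
    "no_files_match": "No files match the supplied pattern",
    "buffin_io_error": "BUFFIN: C I/O Error",
    "model_crashed": "Model crashed:",
}


def summarize_stderr(text):
    """Count how often each known message fragment occurs in the stderr text."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(text.split())
    return Counter({label: flat.count(fragment)
                    for label, fragment in PATTERNS.items()})


if __name__ == "__main__":
    sample = (
        "11:37:03 (9688): No heartbeat from\n"
        "core client for 30 sec - exiting "
        "BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 "
        "Model crashed: STWORK : I/O error - PP fixed length header"
    )
    for label, count in summarize_stderr(sample).items():
        print(f"{label}: {count}")
```

Whitespace is collapsed before matching because the dump wraps some messages in the middle of a phrase, for example "No heartbeat from" followed by "core client" on the next line.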
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
21 Jan 2013 20:57:26 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 777,600 | 1,392,428 | 1.7907 |
21 Jan 2013 05:55:41 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 751,680 | 1,344,721 | 1.7890 |
20 Jan 2013 15:40:55 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 725,760 | 1,297,176 | 1.7873 |
20 Jan 2013 01:28:15 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 699,840 | 1,249,356 | 1.7852 |
19 Jan 2013 10:50:33 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 673,920 | 1,201,871 | 1.7834 |
18 Jan 2013 20:41:41 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 648,000 | 1,154,504 | 1.7816 |
18 Jan 2013 05:29:08 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 622,080 | 1,106,953 | 1.7794 |
17 Jan 2013 15:01:38 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 596,160 | 1,059,515 | 1.7772 |
17 Jan 2013 00:07:44 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 570,240 | 1,011,449 | 1.7737 |
16 Jan 2013 12:45:12 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 544,320 | 971,071 | 1.7840 |
16 Jan 2013 00:30:29 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 518,400 | 927,785 | 1.7897 |
15 Jan 2013 11:34:11 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 492,480 | 881,730 | 1.7904 |
14 Jan 2013 22:46:05 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 466,560 | 836,102 | 1.7921 |
14 Jan 2013 09:53:57 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 440,640 | 790,280 | 1.7935 |
13 Jan 2013 20:58:52 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 414,720 | 744,001 | 1.7940 |
13 Jan 2013 07:34:15 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 388,800 | 697,566 | 1.7942 |
12 Jan 2013 18:31:30 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 362,880 | 651,049 | 1.7941 |
12 Jan 2013 04:53:39 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 336,960 | 604,662 | 1.7945 |
11 Jan 2013 15:05:36 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 311,040 | 558,180 | 1.7946 |
11 Jan 2013 01:13:01 | 1124940 | 15517324 | hadcm3n_o31a_2140_40_008280608_2 | 285,120 | 511,896 | 1.7954 |
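The "Average (sec/TS)" column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against two rows of the table; the function names are hypothetical and chosen for this illustration, and the four-decimal rounding is an assumption based on how the column is displayed.

```python
def seconds_per_timestep(cpu_seconds, timestep):
    """Cumulative average CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


def incremental_rate(newer, older):
    """Average CPU seconds per timestep between two trickles.

    Each argument is a (timestep, cpu_seconds) pair taken from the table.
    """
    d_steps = newer[0] - older[0]
    d_cpu = newer[1] - older[1]
    return d_cpu / d_steps


if __name__ == "__main__":
    # (timestep, cpu_seconds) pairs copied from the table above.
    latest = (777_600, 1_392_428)
    previous = (751_680, 1_344_721)
    earliest = (285_120, 511_896)

    print(seconds_per_timestep(latest[1], latest[0]))
    print(seconds_per_timestep(earliest[1], earliest[0]))
    print(incremental_rate(latest, previous))
```

The cumulative figure smooths over short-term changes, so the incremental rate between consecutive trickles is the more sensitive indicator of whether the host was slowing down shortly before the task failed.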
©2024 climateprediction.net