Name | hadam3p_pnw_b1gx_1984_1_007887636_2 |
Workunit | 8042748 |
Created | 20 May 2012, 14:44:14 UTC |
Sent | 20 May 2012, 14:52:01 UTC |
Report deadline | 2 May 2013, 20:12:01 UTC |
Received | 21 Jul 2012, 8:48:58 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1178558 |
Run time | 4 days 7 hours 51 min 42 sec |
CPU time | 9 hours 32 min 3 sec |
Validate state | Invalid |
Credit | 1,754.30 |
Device peak FLOPS | 1.91 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1544, selfPID=3340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6196, selfPID=1108, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3472, selfPID=3500, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6212, selfPID=5060, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6536, selfPID=6536, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4196, selfPID=4928, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 1
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5060, selfPID=4828, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 1
23:04:05 (4788): No heartbeat from core client for 30 sec - exiting
23:04:06 (4788): No heartbeat from core client for 30 sec - exiting
23:04:08 (4788): No heartbeat from core client for 30 sec - exiting
23:04:09 (4788): No heartbeat from core client for 30 sec - exiting
23:04:10 (4788): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5752, selfPID=4812, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1704, selfPID=3784, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4772, selfPID=4828, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6056, selfPID=6056, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3980, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5252, selfPID=1100, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 3
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2360, selfPID=5108, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 3
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3404, selfPID=3404, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4244, selfPID=4464, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5752, selfPID=5752, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6612, selfPID=6612, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2496, selfPID=2496, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4392, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6652, selfPID=6652, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=340, selfPID=4628, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 4
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5864, selfPID=5864, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5476, selfPID=5476, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4316, selfPID=4552, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6056, selfPID=6056, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2272, selfPID=4604, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2388, selfPID=2388, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6300, selfPID=6300, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6796, selfPID=6796, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4548, selfPID=4548, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5992, selfPID=4452, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4248, selfPID=4248, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6300, selfPID=6300, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5296, selfPID=4956, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
zip error: Could not create output file (was replacing the original zip file)
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5096, selfPID=3960, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7504, selfPID=7504, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5844, selfPID=4604, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 6
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4940, selfPID=3860, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 6
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6012, selfPID=3864, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 7
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4176, selfPID=4668, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5388, selfPID=996, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4164, selfPID=4120, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4824, selfPID=3760, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5228, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
cpdnmonitor: cannot open input file C:\Datos\boinc/projects/climateprediction.net/hadam3p_pnw_b1gx_1984_1_007887636/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Datos\boinc/projects/climateprediction.net/hadam3p_pnw_b1gx_1984_1_007887636/dataout/region_restart.day after 11 attempts
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]> |
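A log this long is easier to read as counts. The following Python sketch is a hypothetical helper, not part of BOINC or CPDN: it assumes the stderr text has been copied into a string, counts recurring messages using prefixes taken verbatim from the log above, collects N from each "Regional yearly means requires 12 input files got N" line, and extracts the file name and error code from every `<file_xfer_error>` entry. The dictionary labels are illustrative names only.

```python
import re
from collections import Counter

# Message prefixes copied verbatim from the stderr log; the dict keys are
# illustrative labels chosen for this sketch, not CPDN terminology.
PATTERNS = {
    "controller_not_running": re.compile(r"Controller:: CPDN process is not running"),
    "worker_not_running": re.compile(r"Regional Worker:: CPDN process is not running"),
    "crash_restart": re.compile(r"Model crash detected, will try to restart"),
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC"),
    "suspend_request": re.compile(r"CPDN Monitor - Suspend request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "boinc_finish": re.compile(r"Called boinc_finish"),
}

# Captures N from "Regional yearly means requires 12 input files got N".
YEARLY_MEANS = re.compile(r"Regional yearly means requires \d+ input files got (\d+)")

# Captures (file name, error code) from each <file_xfer_error> entry,
# whether or not the entry is split across lines.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*<file_name>([^<]+)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>\s*</file_xfer_error>"
)


def tally_stderr(text: str) -> dict:
    """Summarise a CPDN stderr dump by counting known messages (sketch only)."""
    counts = Counter({label: len(p.findall(text)) for label, p in PATTERNS.items()})
    yearly_got = [int(n) for n in YEARLY_MEANS.findall(text)]
    upload_errors = [(name, int(code)) for name, code in XFER_ERROR.findall(text)]
    return {"counts": counts, "yearly_means_got": yearly_got, "upload_errors": upload_errors}


if __name__ == "__main__":
    # A short excerpt of the log above; paste the full stderr text here instead.
    excerpt = (
        "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
        "checkPID=4196, selfPID=4928, iMonCtr=1 "
        "Model crash detected, will try to restart... "
        "Leaving CPDN_Main::Monitor... "
        "Regional yearly means requires 12 input files got 1 "
        "CPDN Monitor - Quit request from BOINC... "
        "CPDN Monitor - Quit request from BOINC... "
        "<file_xfer_error> <file_name>hadam3p_pnw_b1gx_1984_1_007887636_2_8.zip</file_name> "
        "<error_code>-161</error_code> </file_xfer_error>"
    )
    print(tally_stderr(excerpt))
```

Matching on fixed message prefixes rather than parsing line by line means the helper works on both the original single-line dump and a line-broken copy.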
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 Jul 2012 10:14:24 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 80,736 | 294,938 | 3.6531 |
16 Jul 2012 15:37:53 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 69,220 | 254,290 | 3.6736 |
15 Jul 2012 21:08:22 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 69,217 | 253,842 | 3.6673 |
15 Jul 2012 17:06:08 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 69,216 | 253,386 | 3.6608 |
11 Jul 2012 06:48:19 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 57,697 | 211,459 | 3.6650 |
10 Jul 2012 21:10:51 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 57,696 | 211,037 | 3.6577 |
06 Jul 2012 15:35:26 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 46,178 | 168,589 | 3.6509 |
05 Jul 2012 15:54:43 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 46,176 | 168,119 | 3.6408 |
02 Jul 2012 17:11:27 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 34,656 | 123,764 | 3.5712 |
25 Jun 2012 08:49:49 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 23,136 | 86,504 | 3.7389 |
07 Jun 2012 09:56:08 | 1178558 | 14725190 | hadam3p_pnw_b1gx_1984_1_007887636_2 | 11,616 | 46,199 | 3.9772 |
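Every row in the table above is consistent with Average (sec/TS) being the cumulative CPU time divided by the timestep reached, rounded to four decimal places. The sketch below checks that assumption against values copied by hand from the table and lists how many timesteps elapsed between successive trickles; `TRICKLES`, `sec_per_timestep`, `mismatches` and `timestep_gaps` are hypothetical names used only for illustration.

```python
# Rows copied by hand from the "Latest Trickles Received" table:
# (time sent UTC, timestep, CPU time in seconds, reported average sec/TS)
TRICKLES = [
    ("18 Jul 2012 10:14:24", 80736, 294938, "3.6531"),
    ("16 Jul 2012 15:37:53", 69220, 254290, "3.6736"),
    ("15 Jul 2012 21:08:22", 69217, 253842, "3.6673"),
    ("15 Jul 2012 17:06:08", 69216, 253386, "3.6608"),
    ("11 Jul 2012 06:48:19", 57697, 211459, "3.6650"),
    ("10 Jul 2012 21:10:51", 57696, 211037, "3.6577"),
    ("06 Jul 2012 15:35:26", 46178, 168589, "3.6509"),
    ("05 Jul 2012 15:54:43", 46176, 168119, "3.6408"),
    ("02 Jul 2012 17:11:27", 34656, 123764, "3.5712"),
    ("25 Jun 2012 08:49:49", 23136, 86504, "3.7389"),
    ("07 Jun 2012 09:56:08", 11616, 46199, "3.9772"),
]


def sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Assumed definition of the Average column: CPU seconds per model timestep."""
    return cpu_seconds / timestep


def mismatches(rows):
    """Return the send times of rows whose reported average differs at 4 decimals."""
    return [
        sent
        for sent, timestep, cpu, reported in rows
        if f"{sec_per_timestep(cpu, timestep):.4f}" != reported
    ]


def timestep_gaps(rows):
    """Timesteps advanced between consecutive trickles, oldest first."""
    ordered = sorted(rows, key=lambda row: row[1])
    return [later[1] - earlier[1] for earlier, later in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    print("rows not matching CPU/timestep:", mismatches(TRICKLES))
    print("timestep gaps:", timestep_gaps(TRICKLES))
```

Run on these rows, `mismatches` returns an empty list. Most gaps between trickles are 11,520 timesteps, while a few trickles arrive only 1 to 3 timesteps after the previous one.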
©2024 climateprediction.net