Task 16004908

Name	hadcm3n_o11f_1980_40_008384490_2
Workunit	8535349
Created	5 Sep 2013, 20:30:45 UTC
Sent	5 Sep 2013, 21:08:24 UTC
Report deadline	6 Dec 2013, 4:35:35 UTC
Received	1 Feb 2014, 9:44:45 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	4681
Run time	67 days 8 hours 1 min 14 sec
CPU time	11 days 17 hours 45 min 53 sec
Validate state	Invalid
Credit	3,110.40
Device peak FLOPS	0.45 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2528, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2232, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2232, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2232, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2208, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2208, iMonCtr=1 Model crash detected, will try to restart... 12:54:20 (2208): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... 22:16:12 (3060): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3252, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3252, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3252, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3252, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3252, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3584, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3584, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3584, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1708, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1300, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=836, iMonCtr=1 Model crash detected, will try to restart... Signal 4 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=836, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/o11fko.pji4c10 Error converting file to netcdf: dataout/o11fko.pii4c10 Error converting file to netcdf: dataout/o11fko.pfi4c10 Error converting file to netcdf: dataout/o11fka.phi4c10 Error converting file to netcdf: dataout/o11fka.pgi4c10 Error converting file to netcdf: dataout/o11fka.pei4c10 Error converting file to netcdf: dataout/o11fka.pdi4c10 08:47:13 (2528): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/o11fko.pji4c10 Error converting file to netcdf: dataout/o11fko.pii4c10 Error converting file to netcdf: dataout/o11fko.pfi4c10 Error converting file to netcdf: dataout/o11fka.phi4c10 Error converting file to netcdf: dataout/o11fka.pgi4c10 Error converting file to netcdf: dataout/o11fka.pei4c10 Error converting file to netcdf: dataout/o11fka.pdi4c10 Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... 22:59:33 (4040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3924, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4060, iMonCtr=1 Model crash detected, will try to restart... 23:05:34 (4060): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 08:00:32 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:00:33 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3624, iMonCtr=1 Model crash detected, will try to restart... 02:43:56 (3624): No heartbeat from core client for 30 sec - exiting 02:43:57 (3624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... zip error: Could not create output file (was replacing the original zip file) cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\Documents and Settings\All Users\Application Data\BOINC/projects/climateprediction.net/hadcm3n_o11f_1980_40_008384490/dataout/ocean_restart.day after 11 attempts Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
01 Feb 2014 08:46:46	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	259,200	1,014,883	3.9154
23 Jan 2014 05:09:35	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	233,280	984,750	4.2213
15 Jan 2014 09:19:04	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	207,360	859,460	4.1448
08 Jan 2014 03:59:24	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	181,440	812,004	4.4753
31 Dec 2013 11:55:31	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	155,520	706,539	4.5431
24 Dec 2013 01:46:55	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	129,600	779,243	6.0127
16 Dec 2013 15:35:58	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	103,680	590,415	5.6946
25 Oct 2013 02:09:59	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	77,760	535,334	6.8844
09 Oct 2013 08:27:23	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	51,840	327,207	6.3119
12 Sep 2013 21:53:30	4681	16004908	hadcm3n_o11f_1980_40_008384490_2	25,920	276,306	10.6600