Task 16184773

Name hadcm3n_oc4w_1900_40_008470851_1
Workunit 8621690
Created 31 Dec 2013, 21:11:12 UTC
Sent 31 Dec 2013, 21:11:19 UTC
Report deadline 2 Apr 2014, 4:38:30 UTC
Received 8 Feb 2014, 22:09:40 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x00000016) Unknown error code
Computer ID 1297364
Run time 1 day 18 hours 39 min 5 sec
CPU time 1 day 17 hours 45 min 2 sec
Validate state Invalid
Credit 622.08
Device peak FLOPS 2.27 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 i686-pc-linux-gnu
Stderr
<core_client_version>7.1.0</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
17:15:18 (28275): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
04:37:05 (30101): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:47:37 (30461): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:58:21 (30781): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:11:52 (31332): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:34:06 (32099): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
14:52:26 (1783): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
16:50:32 (2263): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:50:54 (2263): No heartbeat from core client for 30 sec - exiting
16:50:55 (2263): No heartbeat from core client for 30 sec - exiting
16:50:56 (2263): No heartbeat from core client for 30 sec - exiting
17:54:13 (2849): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:54:33 (2849): No heartbeat from core client for 30 sec - exiting
17:54:34 (2849): No heartbeat from core client for 30 sec - exiting
17:54:35 (2849): No heartbeat from core client for 30 sec - exiting
17:54:36 (2849): No heartbeat from core client for 30 sec - exiting
17:54:37 (2849): No heartbeat from core client for 30 sec - exiting
17:54:38 (2849): No heartbeat from core client for 30 sec - exiting
17:54:39 (2849): No heartbeat from core client for 30 sec - exiting
17:54:40 (2849): No heartbeat from core client for 30 sec - exiting
17:55:57 (3041): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:00:42 (3063): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:01:44 (3063): No heartbeat from core client for 30 sec - exiting
20:02:54 (3311): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:05:03 (3658): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:06:30 (3683): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:10:37 (4048): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:14:11 (4073): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:14:03 (4293): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:19:31 (4826): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
04:33:19 (5013): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
04:34:18 (5013): No heartbeat from core client for 30 sec - exiting
Atmos Hold Restart file rename failed on atmos_restart.hold
05:45:28 (5784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:53:57 (6438): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:03:16 (6951): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
11:10:09 (7593): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:11:23 (7770): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
16:43:42 (8486): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:46:20 (8923): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
15:48:29 (21681): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 63 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 64 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 65 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 66 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 67 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 1
Error: Input file: dataout/oc4wko.pja2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wko.pja2c10
Error: Input file: dataout/oc4wko.pia2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wko.pia2c10
Error: Input file: dataout/oc4wko.pfa2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wko.pfa2c10
Error: Input file: dataout/oc4wka.pha2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wka.pha2c10
Error: Input file: dataout/oc4wka.pga2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wka.pga2c10
Error: Input file: dataout/oc4wka.pea2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wka.pea2c10
Error: Input file: dataout/oc4wka.pda2c10 is not a valid UM file.
Error converting file to netcdf: dataout/oc4wka.pda2c10
CPDN Monitor - Quit request from BOINC...

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 63 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 64 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 65 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 66 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 67 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 1

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 1
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:12:47 (1215): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:15:30 (1505): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
04:05:15 (29333): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:05:20 (29588): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:07:58 (29920): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:08:56 (29936): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf7732400]
[0xf7732430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf753d936]
/usr/lib/libc.so.6(abort+0x143)[0xf753f173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf7528963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf7787400]
[0xf7787430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf7592936]
/usr/lib/libc.so.6(abort+0x143)[0xf7594173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf757d963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf77ad400]
[0xf77ad430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf75b8936]
/usr/lib/libc.so.6(abort+0x143)[0xf75ba173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf75a3963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf77c3400]
[0xf77c3430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf75ce936]
/usr/lib/libc.so.6(abort+0x143)[0xf75d0173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf75b9963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf772c400]
[0xf772c430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf7537936]
/usr/lib/libc.so.6(abort+0x143)[0xf7539173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf7522963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
SIGABRT: abort called
Stack trace (9 frames):
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xf76fd400]
[0xf76fd430]
/usr/lib/libc.so.6(gsignal+0x46)[0xf7508936]
/usr/lib/libc.so.6(abort+0x143)[0xf750a173]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83400c3]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x838f395]
/usr/local/ondrejch/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0xf74f3963]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30275, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish

</stderr_txt>
]]>
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS)
08 Feb 2014 08:20:44 | 1297364 | 16184773 | hadcm3n_oc4w_1900_40_008470851_1 | 51,840 | 123,012 | 2.3729
02 Jan 2014 02:33:54 | 1297364 | 16184773 | hadcm3n_oc4w_1900_40_008470851_1 | 25,920 | 59,234 | 2.2853
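
The Average (sec/TS) column in the trickle rows above appears to be the CPU time divided by the timestep count, rounded to four decimal places. The following is a minimal sketch that recomputes it from the two rows. The function name `seconds_per_timestep` is hypothetical and is not part of BOINC or the CPDN software.

```python
# (timestep, cpu_seconds) pairs copied from the two trickle rows above.
trickles = [(51840, 123012), (25920, 59234)]


def seconds_per_timestep(timestep, cpu_seconds):
    """Return CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the page derives its Average column this way; the
    source does not state the formula explicitly.
    """
    return round(cpu_seconds / timestep, 4)


for timestep, cpu_seconds in trickles:
    print(timestep, cpu_seconds, seconds_per_timestep(timestep, cpu_seconds))
```

Both recomputed values agree with the Average column as reported, which supports the assumed formula for this task.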