Monday, November 12, 2012

Hadoop bug on SmartOS



Recently I had a chance to help track down a problem that occurred while running a Hadoop benchmark on SmartOS.  Some of the Java code written for Hadoop implicitly assumed that it was running on Linux.  When running the benchmark, the following error showed up:


12/10/01 20:58:49 INFO mapred.JobClient: Task Id : attempt_201209262235_0003_m_000003_0, Status : FAILED
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
at org.apache.hadoop.mapred.Child.main(Child.java:229)

NativeIO.open is a native wrapper around the open(2) system call.  Here, it is being called from
createForWrite() in SecureIOUtils.java at line 161.  Here is the code for createForWrite():

 /**
  * Open the specified File for write access, ensuring that it does not exist.
  * @param f the file that we want to create
  * @param permissions we want to have on the file (if security is enabled)
  *
  * @throws AlreadyExistsException if the file already exists
  * @throws IOException if any other error occurred
  */
 public static FileOutputStream createForWrite(File f, int permissions)
 throws IOException {
   if (skipSecurity) {
     return insecureCreateForWrite(f, permissions);
   } else {
     // Use the native wrapper around open(2)
     try {
        FileDescriptor fd = NativeIO.open(f.getAbsolutePath(),  // <-- line 161
         NativeIO.O_WRONLY | NativeIO.O_CREAT | NativeIO.O_EXCL,
         permissions);
       return new FileOutputStream(fd);
     } catch (NativeIOException nioe) {
       if (nioe.getErrno() == Errno.EEXIST) {
         throw new AlreadyExistsException(nioe);
       }
       throw nioe;
     }
   }
 }

So, open is called with the O_WRONLY, O_CREAT, and O_EXCL flags.  However, the truss(1) output
tells a different story.  We started the following truss on a slave machine, and ran the test again:

# truss -f -a -wall -topen,close,fork,write,stat,fstat -o ~/mapred.truss -p $(pgrep -f Djava.library.path)

And here is the relevant truss output:

51039/28: open("/opt/local/hadoop/bin/../logs/userlogs/job_201210171129_0008/attempt_201210171129_0008_m_000002_1/log.tmp", O_WRONLY|O_DSYNC|O_NONBLOCK) Err#2 ENOENT

The error message is emitted shortly after the above open(2) system call.  So, the code shows O_WRONLY, O_CREAT, and O_EXCL, which is what one
would expect for a routine that is called createForWrite().  However, the flags actually passed to open() are: O_WRONLY, O_DSYNC, and O_NONBLOCK.
Why the difference?

Grepping for O_CREAT in the Hadoop source finds it defined at:

./trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java:

/**
* JNI wrappers for various native IO-related calls not available in Java.
* These functions should generally be used alongside a fallback to another
* more portable mechanism.
*/
public class NativeIO {
 // Flags for open() call from bits/fcntl.h
 public static final int O_RDONLY   =    00;
 public static final int O_WRONLY   =    01;
 public static final int O_RDWR     =    02;
 public static final int O_CREAT    =  0100;
 public static final int O_EXCL     =  0200;
 public static final int O_NOCTTY   =  0400;
 public static final int O_TRUNC    = 01000;
 public static final int O_APPEND   = 02000;
 public static final int O_NONBLOCK = 04000;
 public static final int O_SYNC   =  010000;
 public static final int O_ASYNC  =  020000;
 public static final int O_FSYNC = O_SYNC;
 public static final int O_NDELAY = O_NONBLOCK;

The comment in the above code says that the flags for the open(2) call come from bits/fcntl.h, which is a Linux (glibc) header.
However, on SmartOS (as well as illumos and Solaris), the same flags in sys/fcntl.h show:

/*
* Flag values accessible to open(2) and fcntl(2)
* The first five can only be set (exclusively) by open(2).
*/
#define O_RDONLY        0
#define O_WRONLY        1
#define O_RDWR          2
#define O_SEARCH        0x200000
#define O_EXEC          0x400000
#if defined(__EXTENSIONS__) || !defined(_POSIX_C_SOURCE)
#define O_NDELAY        0x04    /* non-blocking I/O */
#endif /* defined(__EXTENSIONS__) || !defined(_POSIX_C_SOURCE) */
#define O_APPEND        0x08    /* append (writes guaranteed at the end) */
#if defined(__EXTENSIONS__) || !defined(_POSIX_C_SOURCE) || \
        (_POSIX_C_SOURCE > 2) || defined(_XOPEN_SOURCE)
#define O_SYNC          0x10    /* synchronized file update option */
#define O_DSYNC         0x40    /* synchronized data update option */
#define O_RSYNC         0x8000  /* synchronized file update option */
                                /* defines read/write file integrity */
#endif /* defined(__EXTENSIONS__) || !defined(_POSIX_C_SOURCE) ... */
#define O_NONBLOCK      0x80    /* non-blocking I/O (POSIX) */
#ifdef _LARGEFILE_SOURCE
#define O_LARGEFILE     0x2000
#endif

/*
* Flag values accessible only to open(2).
*/
#define O_CREAT         0x100   /* open with file create (uses third arg) */
#define O_TRUNC         0x200   /* open with truncation */
#define O_EXCL          0x400   /* exclusive open */
#define O_NOCTTY        0x800   /* don't allocate controlling tty (POSIX) */
#define O_XATTR         0x4000  /* extended attribute */
#define O_NOFOLLOW      0x20000 /* don't follow symlinks */
#define O_NOLINKS       0x40000 /* don't allow multiple hard links */

The O_CREAT flag is 0100 (octal) in the NativeIO.java file, which matches the Linux bits/fcntl.h, but it is 0x100 on SmartOS.  Octal 0100 is 0x40, which corresponds to O_DSYNC on SmartOS.  Similarly, the O_EXCL value of 0200 is 0x80, which is O_NONBLOCK on SmartOS.  Those are exactly the flags that truss reported, and because O_CREAT was never actually passed, the open(2) of a file that did not yet exist failed with ENOENT.  Whoever wrote this code assumed that it would run on a Linux system.  The flags are different yet again on FreeBSD and Mac OS (for instance, O_CREAT is 0x200 on these systems).  My colleague, Filip Hajny, changed the flags to match the SmartOS values and rebuilt everything, which fixed the problem.
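
The arithmetic can be checked mechanically.  The following is a minimal sketch, not part of Hadoop: the class OpenFlagDecoder and its method decodeOnSmartOS() are hypothetical names, and the lookup table holds only a subset of the SmartOS values from the sys/fcntl.h excerpt above.  It ORs together the Linux values hard-coded in NativeIO.java and decodes the result the way SmartOS would interpret it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OpenFlagDecoder {
    // Linux values, written in octal exactly as in NativeIO.java.
    static final int LINUX_O_WRONLY = 01;
    static final int LINUX_O_CREAT  = 0100;
    static final int LINUX_O_EXCL   = 0200;

    // Names for the two low-order access-mode bits (same on both systems).
    static final String[] ACCESS_MODES = {"O_RDONLY", "O_WRONLY", "O_RDWR", "?"};

    // Subset of the SmartOS sys/fcntl.h values, in ascending order.
    static final Map<String, Integer> SMARTOS_FLAGS = new LinkedHashMap<>();
    static {
        SMARTOS_FLAGS.put("O_NDELAY",   0x04);
        SMARTOS_FLAGS.put("O_APPEND",   0x08);
        SMARTOS_FLAGS.put("O_SYNC",     0x10);
        SMARTOS_FLAGS.put("O_DSYNC",    0x40);
        SMARTOS_FLAGS.put("O_NONBLOCK", 0x80);
        SMARTOS_FLAGS.put("O_CREAT",    0x100);
        SMARTOS_FLAGS.put("O_TRUNC",    0x200);
        SMARTOS_FLAGS.put("O_EXCL",     0x400);
        SMARTOS_FLAGS.put("O_NOCTTY",   0x800);
    }

    // Render a raw flag word using the SmartOS meaning of each bit.
    static String decodeOnSmartOS(int flags) {
        StringBuilder sb = new StringBuilder(ACCESS_MODES[flags & 0x3]);
        for (Map.Entry<String, Integer> e : SMARTOS_FLAGS.entrySet()) {
            if ((flags & e.getValue()) != 0) {
                sb.append('|').append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int flags = LINUX_O_WRONLY | LINUX_O_CREAT | LINUX_O_EXCL;
        // Prints: O_WRONLY|O_DSYNC|O_NONBLOCK
        System.out.println(decodeOnSmartOS(flags));
    }
}
```

The printed string matches the flags in the truss output, which confirms that the Linux constants were passed through to open(2) unchanged.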

This problem reminds me how many little things like this can go wrong when porting an application that was developed on one operating system to run on another.  For all but the simplest of applications, some changes are likely to be needed.  For the above problem, POSIX specifies the flags that open(2) can take (O_CREAT, O_RDWR, etc.), but does not specify the values of those flags.  If the code had taken the values from the platform's own header file (fcntl.h in both cases) instead of hard-coding them, the problem would not have occurred.  It is an important reminder that code should be reviewed and tested on as many different systems as possible.
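
As an illustration of the portability point, here is a minimal sketch of a create-exclusive open that never handles raw flag values in Java code.  It is not the fix that was applied above and not Hadoop's code: the class PortableCreate and the method createForWritePortable() are hypothetical names, and the sketch assumes Java 7 or later.  It uses the standard java.nio.file API, which leaves the mapping to the platform's open(2) flags to the JVM.  Unlike createForWrite(), it does not apply a permissions argument.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PortableCreate {
    // Create the file for writing, failing if it already exists.
    // CREATE_NEW plays the role of O_CREAT|O_EXCL; no numeric flags appear here.
    static OutputStream createForWritePortable(Path p) throws IOException {
        return Files.newOutputStream(p,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("createdemo");
        Path log = dir.resolve("log.tmp");

        // First call creates the file.
        try (OutputStream out = createForWritePortable(log)) {
            out.write('x');
        }

        // Second call must be refused because the file now exists.
        try {
            createForWritePortable(log).close();
            System.out.println("unexpected: file was opened twice");
        } catch (FileAlreadyExistsException e) {
            System.out.println("second create refused");
        }
    }
}
```

The design choice is the same one the NativeIO class comment hints at: keep a portable mechanism available, and confine platform-specific constants to code that is compiled against the platform's own headers.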
