Testing 1,2,3... - Why not use FFI? Here's why.
djberg96

Why not use FFI? Here's why. [Jun. 2nd, 2010|07:22 pm]
[mood | cold]

On the FFI wiki there's a nice list of reasons why you should use FFI. That's the foreign function interface for Ruby. I imagine other languages have some similar facility.

It's all Unicorns and Rainbows in theory. It'll be pure Ruby! It's cross platform! You can use it with JRuby and Rubinius!

Unfortunately, my experience with FFI has not been great. I'm going to lay out a few reasons why you might not want to switch to FFI unless you absolutely, positively have to get your extensions working with JRuby, Rubinius, or IronRuby.

First, the last time I checked (and someone correct me if this has changed), you can't build FFI with anything except the GNU tool chain. That means no support for a Ruby built with the Sun Studio compiler, the HP-UX compiler and, perhaps most importantly, the Microsoft tool chain. The good news for Microsofties is that you have the mingw one-click installer option. If you're using that you'll be ok. Otherwise, tough crap.

Second, the alternative implementations may not give you the low level access you need. For example, my file-temp library does not work with JRuby even though it uses FFI because JRuby cannot deal with low level file descriptors. That kills a lot of low level systems programming right out of the gate.

Third, the up front declarations, combined with cross platform support, are proving to be extremely burdensome. Consider a simple interface for the getpwent() function. You might naively start with something like this on Linux:
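require 'ffi'

# The module name is only illustrative; attach_function needs a module
# that extends FFI::Library and names the library to load.
module PasswdFFI
  extend FFI::Library
  ffi_lib FFI::Library::LIBC

  attach_function :getpwent, [], :pointer
  attach_function :setpwent, [], :void
  attach_function :endpwent, [], :void

  class PasswdStruct < FFI::Struct
    layout(
      :pw_name,   :string,
      :pw_passwd, :string,
      :pw_uid,    :uint,
      :pw_gid,    :uint,
      :pw_gecos,  :string,
      :pw_dir,    :string,
      :pw_shell,  :string
    )
  end
end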

This works fine. On Linux. But you immediately run into trouble the moment you try to run this on Solaris. Why? Because the passwd struct on Solaris not only contains additional members, it also holds some of the shared members at a different position.

For those of you who might not be C programmers, I'll elaborate a bit on why the order matters. You see, when you declare a variable in C, you're really reserving memory. With a struct you're essentially reserving a block of contiguous memory. That's the important bit.

With C this doesn't matter. You can just access the memory by name, e.g. pwd->pw_name, pwd->pw_uid, and so on. At worst you'll have to add an #ifdef check before trying to access it. I don't have to worry about the ordering, because it's already been ordered for me by a header file included on the operating system.

With FFI this becomes a major hassle. It's easier to show you why if I show you what the declaration would have to look like on Solaris:
class PasswdStruct < FFI::Struct
  layout(
    :pw_name,    :string,
    :pw_passwd,  :string,
    :pw_uid,     :uint,
    :pw_gid,     :uint,
    :pw_age,     :string,
    :pw_comment, :string,
    :pw_gecos,   :string,
    :pw_dir,     :string,
    :pw_shell,   :string
  )
end

There are two things to notice. First, it contains two additional members, pw_age and pw_comment. Second, it also has a pw_gecos field, but it's not in the same position. That's where that contiguous memory comes into play. I can't simply reference :pw_gecos by name on any old Unix platform and call it a day the way I can with C, because it lives in a different segment of memory. To be more specific, assuming 4-byte pointers and 4-byte ints (a 32-bit system), :pw_gecos on Linux is held in bytes 16-19, while on Solaris it's in bytes 24-27.
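To make that arithmetic concrete, here is a minimal pure-Ruby sketch that adds up member sizes in declaration order. It does not use FFI at all. The SIZES table, the constant names and the offsets helper are my own illustration, and the sizes assume a 32-bit system where a :string member (a char pointer) and a :uint are both 4 bytes, so no padding is involved.

```ruby
# Assumed sizes in bytes for a 32-bit system: a :string member is a
# char pointer (4 bytes) and a :uint is 4 bytes, so there is no padding.
SIZES = { string: 4, uint: 4 }.freeze

LINUX_MEMBERS = [
  [:pw_name, :string], [:pw_passwd, :string], [:pw_uid, :uint], [:pw_gid, :uint],
  [:pw_gecos, :string], [:pw_dir, :string], [:pw_shell, :string]
].freeze

# Solaris adds pw_age and pw_comment between pw_gid and pw_gecos.
SOLARIS_MEMBERS = (
  LINUX_MEMBERS.first(4) +
  [[:pw_age, :string], [:pw_comment, :string]] +
  LINUX_MEMBERS.last(3)
).freeze

# Walk the members in declaration order and record where each one starts.
def offsets(members, sizes = SIZES)
  position = 0
  members.each_with_object({}) do |(name, type), table|
    table[name] = position
    position += sizes.fetch(type)
  end
end

linux   = offsets(LINUX_MEMBERS)
solaris = offsets(SOLARIS_MEMBERS)

puts "Linux   pw_gecos starts at byte #{linux[:pw_gecos]}"
puts "Solaris pw_gecos starts at byte #{solaris[:pw_gecos]}"
```

On a 64-bit system the pointer size would be 8 instead of 4, which shifts every offset again. That is one more thing the C compiler normally works out for you.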

So, if you had thoughts of just declaring one massive struct that contains every struct member from every platform you can think of, you're out of luck. While you can reference that data, it's probably not going to return what you think it will, because it's the wrong chunk of memory.
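Here is a rough simulation of that "wrong chunk of memory" problem using nothing but Array#pack and String#unpack. The integer values are stand-ins I made up for the 4-byte fields (in the real struct most of them are pointers), and the constant names are hypothetical. The bytes are written in Solaris order and then read back as if they followed the Linux layout.

```ruby
SOLARIS_ORDER = %i[pw_name pw_passwd pw_uid pw_gid pw_age
                   pw_comment pw_gecos pw_dir pw_shell].freeze
LINUX_ORDER   = %i[pw_name pw_passwd pw_uid pw_gid
                   pw_gecos pw_dir pw_shell].freeze

# Made-up stand-in values, one per field: 100, 101, 102, ...
values = SOLARIS_ORDER.each_with_index.to_h { |name, i| [name, 100 + i] }

# Lay the fields out contiguously, 4 bytes each
# ("V" is a 32-bit little-endian unsigned integer).
memory = values.values_at(*SOLARIS_ORDER).pack('V*')

# Read the same bytes back as if they followed the Linux layout.
misread = LINUX_ORDER.zip(memory.unpack('V*')).to_h

puts "stored pw_gecos:      #{values[:pw_gecos]}"
puts "pw_gecos as read:     #{misread[:pw_gecos]}"
puts "what was really read: pw_age (#{values[:pw_age]})"
```

The first four fields line up on both layouts, which is exactly why this kind of bug can go unnoticed until somebody asks for pw_gecos, pw_dir or pw_shell.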

So now what do we do?

We could create an array of members first, and adjust it based on platform like this:
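# Danger...
require 'rbconfig'

class PasswdStruct < FFI::Struct
  members = [
    :pw_name,    :string,
    :pw_passwd,  :string,
    :pw_uid,     :uint,
    :pw_gid,     :uint,
    :pw_gecos,   :string,
    :pw_dir,     :string,
    :pw_shell,   :string
  ]

  # Solaris has pw_age and pw_comment between pw_gid and pw_gecos.
  if RbConfig::CONFIG['host_os'] =~ /solaris/
    members.insert(8, :pw_age, :string, :pw_comment, :string)
  end

  layout(*members)
end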

Unfortunately, there are a host of problems with this approach.

First, it means you now have to eyeball every struct definition on every platform to see what the declaration order is. That means sprinkling your code with a bunch of platform checks. Even then you might get it wrong, because the struct definitions may be different on earlier or later versions of the operating system. For 3rd party libraries, the definitions could change between releases, and you're again relegated to eyeballing the struct declarations.

Second, it wouldn't be so bad, except that 3rd party libraries (and some operating systems) have a habit of declaring their own variable types. Now you not only have to know the struct definition, you have to figure out what the hell a type "foo_int_t" is (or whatever) so that you're sure to reserve the right amount of memory for it.
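As a rough sketch of the bookkeeping this forces on you, here is a pure-Ruby lookup that chases a typedef chain down to a primitive type and its size. Everything in it is hypothetical: the type names (foo_int_t and foo_handle_t), the sizes, and the helper methods are made up for illustration, and in real life you would have to dig each entry out of the library's headers by hand, per platform.

```ruby
# Hypothetical sizes, in bytes, of the primitive types on one platform.
PRIMITIVE_SIZES = { char: 1, short: 2, int: 4, uint: 4, long: 8, pointer: 8 }.freeze

# Hypothetical typedefs, copied by hand out of a library's header files.
TYPEDEFS = { foo_int_t: :uint, foo_handle_t: :pointer }.freeze

# Follow typedefs until we reach a name that is not itself a typedef.
def resolve_type(name, typedefs = TYPEDEFS)
  seen = []
  while typedefs.key?(name)
    raise ArgumentError, "circular typedef: #{name}" if seen.include?(name)
    seen << name
    name = typedefs[name]
  end
  name
end

# Look up how many bytes a (possibly typedef'd) type occupies.
def size_of(name)
  primitive = resolve_type(name)
  PRIMITIVE_SIZES.fetch(primitive) { raise ArgumentError, "unknown type: #{name}" }
end

puts "foo_int_t is really #{resolve_type(:foo_int_t)}, #{size_of(:foo_int_t)} bytes"
```

Both tables would need a separate copy for every platform (and possibly every library version) where the answers differ, which is the whole problem.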

Third, some struct members are opaque, and you simply can't declare the variable type, because there's no way for you to figure it out. Now you're relegated to using FFI::Pointers and extra work.

Lastly, the Ruby community might be good at local testing, but it has proven to be exceedingly bad when it comes to cross-platform testing. In practice most testing only occurs on Linux and OS X (and in some cases only the latter), with either no thought whatsoever given to other platforms, or simply no ability to access those other platforms. Now you're relying much more heavily on 3rd party patches.

So, what do we do in practice then? Well, you could do what the JRuby guys did and just create separate source files for every single platform where you have this kind of issue. To wit:
$ find . -name "etc.rb"
./ruby/site_ruby/shared/ffi/platform/i386-openbsd/etc.rb
./ruby/site_ruby/shared/ffi/platform/powerpc-aix/etc.rb
./ruby/site_ruby/shared/ffi/platform/i386-linux/etc.rb
./ruby/site_ruby/shared/ffi/platform/sparc-solaris/etc.rb
./ruby/site_ruby/shared/ffi/platform/x86_64-darwin/etc.rb
./ruby/site_ruby/shared/ffi/platform/x86_64-solaris/etc.rb
./ruby/site_ruby/shared/ffi/platform/x86_64-linux/etc.rb
./ruby/site_ruby/shared/ffi/platform/powerpc-darwin/etc.rb
./ruby/site_ruby/shared/ffi/platform/i386-windows/etc.rb
./ruby/site_ruby/shared/ffi/platform/i386-solaris/etc.rb
./ruby/site_ruby/shared/ffi/platform/i386-darwin/etc.rb
./ruby/site_ruby/shared/ffi/platform/sparcv9-solaris/etc.rb

Wow, that looks like a real joy to maintain, doesn't it?
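For what it's worth, picking the right file at load time could look something like the sketch below. This is my own guess at the general idea, not how JRuby actually does it: the helper names, the base directory, and the matching rules are all assumptions, and only RbConfig is a real API here.

```ruby
require 'rbconfig'

# Reduce a raw host_os string to the short names used in the listing above.
def os_name(host_os)
  case host_os
  when /linux/i               then 'linux'
  when /darwin/i              then 'darwin'
  when /solaris|sunos/i       then 'solaris'
  when /openbsd/i             then 'openbsd'
  when /aix/i                 then 'aix'
  when /mswin|mingw|windows/i then 'windows'
  else host_os
  end
end

# Collapse i486/i586/i686 to i386, as in the directory names above.
def cpu_name(host_cpu)
  host_cpu =~ /\Ai\d86\z/ ? 'i386' : host_cpu
end

# Build the path of the platform-specific file (base directory is assumed).
def platform_file(host_cpu, host_os, base = 'ffi/platform')
  File.join(base, "#{cpu_name(host_cpu)}-#{os_name(host_os)}", 'etc.rb')
end

puts platform_file(RbConfig::CONFIG['host_cpu'], RbConfig::CONFIG['host_os'])
```

The lookup is the easy part. Somebody still has to write and maintain a file for every entry in that listing.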

The other solution is to sprinkle your code with a bunch of platform checks. I released mkmf-lite just this week to help with this problem, but it's like putting a band-aid on a fractured arm, really.

Anyway, the upshot of all this work is that, in my opinion, FFI is actually more difficult to use than a C extension in practice for all but the simplest libraries.

You've been warned.

Comments:
From: http://www.google.com/profiles/JACortinas
2010-06-03 03:26 am (UTC)

That was a great article

You have a lot of good points! I've had issues with plugins that use FFI on Solaris, and it is extremely frustrating having to deal with these cross-compatibility issues. Good job, once again!
From: https://www.google.com/accounts/o8/id?id=AItOawlijTeiZOON-IFeRFAUKnKOkH1uDuxeeac
2010-06-03 08:18 pm (UTC)

Good points, wrong conclusion

These aren't reasons not to use FFI...these are things that need to be improved in FFI.

* Cross-platform and structs: I definitely recognize that structs are a hard problem, but they're primarily a pain because you don't have the C compiler to help you with layout. If we can enlist the appropriate parts of the C toolchain into the FFI binding process, this would be a solved problem. There are a few attempts at that already like ffi_inliner and the struct generation libraries. The same logic applies to type widths and so on. FFI is a low-level API that needs a better high-level binding toolkit.
* GNU toolchain: Patches accepted!
* File descriptors not working on some impls: This is certainly something we could do better in JRuby, like having a NIO Channel that can wrap arbitrary file descriptors. But it's not a failing of FFI, it's a gap in the JVM that we need to solve. We'd appreciate help.

I'll say it again: FFI is a low-level binding API that needs better support for high-level layout and mapping. So your points are all valid, but the conclusion should be that raw FFI is hard to do right and we need better tools to make it easy.

Thank You!

I've run into pretty much all of these problems. It sucks. Problems like these make developing with FFI an incredible pain.

Even worse is I think you've missed a few problems:

1. Struct layouts differ not only per platform but *per version* of a particular library too! That means you have to do platform *and* version detection.

2. Since the extension isn't compiled, you can't do feature detection using the preprocessor. Many libraries will define preprocessor macros so that you can detect which functions it makes available. With FFI, you have to find some sort of test you can perform at runtime.

3. Even if you make your extension work on all those ruby platforms, it *still* won't work on Google App Engine or Android.

I definitely think FFI has its place. It is a good tool to have in the toolbox. But it is not the cross-implementation savior it's been billed as.
From: djberg96
2010-06-06 06:31 am (UTC)

Re: Thank You!

1. Actually I did mention the per version issue. But, yeah, that's going to be a pain. Not that it's not a pain with C extensions, but it's going to be worse with FFI.

2. That was actually the motivation for mkmf-lite. Mind you, it still requires a C compiler on your system.

3. I'm not familiar with either platform, but thanks for the info.

Agreed, having FFI is certainly a good thing. It's certainly better than nothing!
From: (Anonymous)
2010-06-04 03:17 am (UTC)

GNU Toolchain

One thing we could start working on is freeing libffi from autoconf. A CMake build system would be just as good, and would allow easier building on Windows with Visual Studio (http://stackoverflow.com/questions/395169/using-cmake-to-generate-visual-studio-c-project-files). Many projects ship with both configure.ac and CMakeLists.txt files (http://src.opensolaris.org/source/xref/webstack/webstack/trunk/src/mysql-5.0.45/CMakeLists.txt).

I just successfully compiled libffi using clang (from the LLVM project). So libffi isn't in that horrible of a state.

Dynamic struct layout detection would be *awesome*.
From: djberg96
2010-06-06 06:32 am (UTC)

Re: GNU Toolchain

Any reason we can't use rake-compiler? But, if that's not an option, CMake would be nice.