Of Kqueues and Max Open Files

February 04, 2007 at 2:27 AM


Kqueues are a great thing. They let you keep an eye on files, sockets — anything that has a file descriptor. It takes very little code, and you don’t have to worry about manually polling or the like. Awesome stuff.

There is one thing you do have to worry about, though: your app’s number of open files. I ran into this problem in a spectacularly bad way. When I first released my application Picture Switcher (it’s a status menu item, which you can use to switch your desktop picture(s)), I had it going through the file system and checking the modification dates of the folders containing the pictures Picture Switcher knows about whenever the user clicked on the status menu. This was a Really Bad Way to do things: it made the menu take a while to actually show (and if the user had a lot of folders, it could show the SPOD, the spinning wait cursor). So, when I was working on version 1.1, I decided to use kqueues to watch the folders instead of going through them when the user opened the menu. It was great; I didn’t need to worry about manually looking through all the folders anymore.

I got that all fixed up and released 1.1 into the wild. And the crash reports started pouring in. I was genuinely puzzled by the reports: they had Picture Switcher crashing while trying to load a system framework. After looking at the crash logs, I thought maybe I had some sort of incompatibility with some haxies. The users who were sending me the crash reports all had quite a few of them installed. So, I began installing haxies on my system. One after another after another. And I still couldn’t reproduce the crash. I finally decided to make up a version that logged quite a bit of debugging info to a file on the Desktop, and sent that to a few of the people who were trying to help me figure it all out. This was after a few weeks had gone by (yes, I probably should have tried that earlier). I began to see what was going on, though I still didn’t know why. I saw that it was failing to open some of the file descriptors for the files, and then crashing a bit later when it tried to load a system framework. Finally, I figured it out by meticulously reproducing a user’s folder hierarchy on my system: I was hitting the max number of file descriptors I could have open, and Picture Switcher was crashing after that because it couldn’t load anything more. I came to find out that system frameworks and plugins and such count as open files, and then on top of that I was keeping a file descriptor open on each of the user’s folders.

It didn’t take me long to search and find out how to raise the number of file descriptors I could have open. A simple call to setrlimit() would raise my max number from the default 256 to the system-defined max number of file descriptors an app can have open (by default, it’s at 10240 right now). And so finally, I implemented that and released version 1.1.1 (almost a month after the previous release). My users were happy, I was happy, and I learned it was a Really Good Idea to do a bit more testing before releasing, especially on something variable like this.
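For readers who have not used setrlimit() before, here is a minimal sketch of raising the soft limit on open files. It is not the code that shipped in Picture Switcher: the function name raise_fd_soft_limit and the choice to clamp at the hard limit are assumptions made for illustration, and only the POSIX getrlimit()/setrlimit() calls are used.

```c
#include <limits.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE toward `wanted`,
 * never past the hard limit and never lowering an existing limit.
 * Returns the resulting soft limit, or -1 on error. */
long raise_fd_soft_limit(rlim_t wanted)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    /* Only ever raise; leave an already larger soft limit alone. */
    if (wanted > lim.rlim_cur) {
        rlim_t target = wanted;

        /* An unprivileged process cannot exceed its hard limit. */
        if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max)
            target = lim.rlim_max;

        lim.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
            return -1;
    }

    /* Read the limit back so the caller sees what actually applies. */
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return LONG_MAX;
    return (long)lim.rlim_cur;
}
```

Calling raise_fd_soft_limit(1024) early in startup and checking the return value shows how many descriptors the process really has to work with, instead of assuming the request succeeded.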

Do any of y’all see the flaw there? If not, don’t feel bad — I didn’t either until just recently. Or rather had it come back at me.

A few weeks ago, a user emailed me to tell me that Picture Switcher was crashing on launch. I didn’t get a crash report with it, but immediately I thought “Hmm, this seems like the max open files crashing bug.” So I emailed him back, asking if he was using the current version and asking for the crash report. He emailed me back, letting me know that he was indeed using the latest version, and also attached the crash log. Looking at it, it seemed at first that it was a different bug — the code was crashing in a method where I get the display names. So I whipped up a test version to make sure I didn’t have a bug there, and asked him to run it and send me the console log. He did, and lo and behold, there in the console output, were a few messages letting me know that it was indeed the max open files bug. I had added a log message that output an error letting me know it couldn’t open any more file descriptors when I was debugging this and had forgotten to take it out — a good thing in this case.

My immediate response to that was “Ack, it’s the specter of the max open files rearing its ugly head at me again!” At first, I thought that maybe the setrlimit() call was failing on his machine for some reason, and sent him another test version that logged it. But no, it was working fine. This, of course, made me realize that I hadn’t totally thought things through the last time. 10240 file descriptors seems like a lot, but obviously it is a limit and I should have thought about that during the last release. All I can say to that is mea culpa: I had worked a month trying to figure it out and just wanted to get the fix out there. I’m sure some of y’all have been in that situation. But still: Bad programmer, bad!

I sat there for a while thinking through this. I realized this was a really bad thing from the get-go. While I could raise the limit of open files, I really hadn’t thought about what that might do to the system. The system has a max number of open files for the entire system, set at 12288. That’s only 2048 files more than what I set. So, actually thinking that through, I realized that if someone had enough folders for Picture Switcher to watch, I could potentially greatly limit the rest of the system’s resources here. Doing things like that is something akin to using a sledgehammer to put a tack in the wall: way overkill, and potentially damaging. Not a nice thing to do. Bad programmer, again.

This time, I set down some ground rules for myself to follow. 1) Don’t raise my number of open files to the max, 2) figure out some way to know when I’m about to hit that number, and 3) bring back manual polling if I do go over that number. Just a bit better this time.

After doing a lot of searching, I found out it’s not trivial to know the number of open files you have. I then decided to download the source to lsof from the Darwin source code to see how it’s done. I have to admit, it was a bit beyond me. So I needed to figure something else out. Finally, I came up with a solution. It’s not 100% foolproof, but I did do a bit of coding to fix things up if it doesn’t work. Here’s my solution.

I raise the max file descriptors to the max per process divided by 10, plus 300. That puts me at 1324 by default (10240 / 10 = 1024, plus 300). The added 300 is a bit of padding so that I have extra descriptors around when I need to open things, and for the system frameworks and plugins that are loaded. In my header file, I added an instance variable for maxFileDescriptor, then used this code in the init method of the class:

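    /* Headers needed at the top of the file */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* In the init method */
    struct rlimit myLimit;
    int mib[2], maxOpenPerProc = 0;
    size_t inputSize = 0;

    /* Ask the kernel for the per-process maximum number of open files. */
    mib[0] = CTL_KERN;
    mib[1] = KERN_MAXFILESPERPROC;
    inputSize = sizeof(maxOpenPerProc);
    sysctl(mib, 2, &maxOpenPerProc, &inputSize, NULL, 0);

    /* Set the limit to a tenth of that maximum, plus 300 of padding. */
    myLimit.rlim_cur = myLimit.rlim_max = (maxOpenPerProc / 10) + 300;
    setrlimit( RLIMIT_NOFILE, &myLimit );

    /* Remember the descriptor number at which the padding begins. */
    maxFileDescriptor = getdtablesize();
    maxFileDescriptor -= 300;
    if ( maxFileDescriptor < 0 )
        maxFileDescriptor = 0;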
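To make the arithmetic concrete, here is a small sketch that separates the two numbers involved: the limit that gets set, and the descriptor number at which the padding begins. The helper names fd_soft_limit_for and kqueue_fd_threshold are hypothetical, chosen only for this illustration. With the default per-process maximum of 10240 and 300 of padding, the limit works out to 1324 and the threshold to 1024.

```c
/* Hypothetical helper: the limit to request, computed as
 * (per-process maximum / 10) + padding. */
int fd_soft_limit_for(int max_per_proc, int padding)
{
    return (max_per_proc / 10) + padding;
}

/* Hypothetical helper: the descriptor number at which the padding
 * begins, clamped so it never goes negative. */
int kqueue_fd_threshold(int soft_limit, int padding)
{
    int threshold = soft_limit - padding;
    return threshold < 0 ? 0 : threshold;
}
```

Keeping the padding as a parameter makes it easy to experiment with a different reserve if 300 turns out to be too small or too generous for a given app.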

In the method I call to add items to the kqueue, I test to see if maxFileDescriptor is a valid file descriptor. If it is, the descriptor numbers have climbed up to my padding, which means I’m close to the limit and the item should go to manual polling instead. This is a sort of kludge, but it does work. It’s not 100% foolproof, as I mentioned before, because there’s a possibility that more than 300 system frameworks and plugins got loaded, which don’t get a file descriptor, and hence this test would report a false negative. This is a slight possibility, but I’m not taking chances this time.

    if ( isatty( maxFileDescriptor ) || errno == ENOTTY ) {
        /* Code to add it to a manual poll method */
    } else {
        /* Add it to the kqueue */
    }

What isatty() does is check to see if maxFileDescriptor is a TTY. This isn’t ever going to be the case with Picture Switcher, but isatty() will set the global errno to ENOTTY if the file descriptor is valid but not a TTY (otherwise, it sets errno to EBADF).
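The same question ("is this descriptor number currently open?") can also be asked more directly with fcntl(). The sketch below is an alternative to the isatty() trick, not the code Picture Switcher uses; the name fd_is_open is hypothetical, and the only assumption is the POSIX behavior that fcntl(fd, F_GETFD) fails with EBADF when fd is not an open descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: true if `fd` currently refers to an open
 * descriptor in this process. */
bool fd_is_open(int fd)
{
    if (fd < 0)
        return false;

    /* F_GETFD only reads the descriptor flags, so it has no side
     * effects. It fails with EBADF when fd is not open. */
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

With this in place, the check before adding a folder to the kqueue would read if ( fd_is_open( maxFileDescriptor ) ), with the manual poll branch and the kqueue branch staying as they are.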

If the check reports a negative, but I can’t open the file descriptor, I do a bit of cleaning — I add it to the manual poll method, then I remove a few of the items from the kqueue and add those to the manual poll method as well to free up a few file descriptors.

So for any of y’all out there using kqueues, do keep an eye on your open files count (you can do this with lsof in Terminal, or with Activity Monitor). If you’re even coming close to the max open files you have set (256 by default, if you haven’t changed it), you may well want to do something similar in your code.
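If you would rather check the count from inside your own app, one rough approach is to probe each descriptor number and count the ones that are open. This is only a sketch under stated assumptions: the name count_open_fds is hypothetical, the caller supplies the upper bound to probe (for example, the value returned by getdtablesize()), and it only sees real file descriptors, so anything that uses up resources without holding a descriptor will not show up.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: count the descriptors in [0, upper_bound) that
 * are currently open in this process. */
int count_open_fds(int upper_bound)
{
    int count = 0;

    for (int fd = 0; fd < upper_bound; fd++) {
        /* fcntl(F_GETFD) fails with EBADF for a closed descriptor. */
        if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
            count++;
    }
    return count;
}
```

This costs one system call per slot, so it is better suited to occasional logging or debugging than to being called every time a folder is added.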

2 Responses to “Of Kqueues and Max Open Files”

Uli Kusterer Comments:

I’m kind of confused how you might be hitting that limit … you’re aware you can just watch a folder and get write notifications when files in it change, right? This sounds suspiciously like you’re watching each desktop picture separately. Can you tell us more about what you’re doing?


Darkshadow Comments:

Good question, Uli. I’m not watching the individual picture files - that would be suicide. I’ve had users tell me the number of pictures they have. Some of them have well over 20,000 of ‘em. No, I strictly watch the folders, though you could possibly say I watch too many. I don’t just watch the immediate folder that the pictures are in, but their parent folders as well (starting with whatever is selected in the app to add) and then subfolders as well, going six deep (though the deeper ones are only watched if there are actually pictures in them). This means I am watching some empty folders, but I did this on purpose. If someone decided to add some pictures or a new folder into one of these, then I want to be able to pick that up as well. The reason this user hit the max was an Aperture library, on top of however many folders were already being watched. Picture Switcher currently happily goes into bundles to look for pictures as well. This is something I’m going to make a preference in my next release, with it off by default. But anyway, an Aperture library has a tortuous folder hierarchy, and it’s what put this user over the top.


