August 27th, 2011
Rewind to 2006: I was just winding down active development of Vision, my OpenGL Window Server/UI Framework. I had started work on Vision in college 3 years earlier and had been churning on it full-time for the previous 2 years. I decided that it was finally time to get a job, so I interviewed around and accepted a position at Apple. I had two weeks until my start date and I wanted to do some programming for fun that was completely different from what I had been doing.
It was during that two week period of not yet working for Apple that Wolf Rentzsch started the (now defunct) Iron Coder contest. The way it worked was that the organizer announced an API that each of the contestants had to use somehow in their entry, and then 24 hours later a theme was announced that entries must also somehow incorporate. I thought it was just what I needed: a fun, small-scoped project with a little bit of competition. So the day of the very first Iron Coder arrived and the contest API was announced: the Accessibility API. Accessibility API? What's that? Until that moment I had not been aware of it, but it was actually just what I had been looking for to solve a different problem I had. I started researching it and immediately found portions of it that were very interesting to me: specifically, the ability for programs to inspect and copy data (like displayed text) out of other running applications.
Oh yea, I get it, uh huh…
You see, I had a problem. Ever since I returned to the States after living in Japan I had wanted a way to keep up my reading skills on the computer. Putting my computer into Japanese language mode and surfing Japanese web sites seemed to be a great way to do this. There was just one little problem with that. That problem was 漢字 (kanji). I had minored in Japanese in college and at my peak I was fairly proficient in about 1000 of the approximately 2000 kanji considered to be an adult reading level. In most languages you can just read the written script, and even if you don't know a word you can still pronounce it and perhaps gather some meaning from the way it sounds, its word roots and the larger context of the sentence. But with Chinese and Japanese, if you come across a kanji character you don't know, you are pretty much dead in the water. There is a chance you could deconstruct it into its individual radicals and infer a vague meaning, but that is not likely unless you are a kanji scholar, and inferring pronunciation is even harder. So what you end up doing is:
- Find the predominant radical in the character
- Count the number of strokes in the predominant radical
- Break out your kanji dictionary
- Look up the radical by stroke number
- Look the actual character up by the total number of strokes in the character
- Hope that the specific combination of that character and the ones around it is listed in the dictionary (depending on what characters are proximate to the character in question, they can affect its pronunciation)
By then, you’ve lost any momentum and context you had in reading and you have to back up a little to get your flow back. I always thought it was very silly that despite all the character information being present in the computer already, there didn’t appear to be any way to get at it in general situations. I just wanted to read my Japanese web pages, hover my mouse above any character I didn’t know and see the furigana (small hiragana printed alongside the kanji) and maybe even a meaning lookup. I had searched for some tool that would provide this functionality on the Mac but my efforts had always come up empty. If only there were some way to grab the text from UI elements in the operating system! Fast forward back to Iron Coder in 2006: I had now become aware of the missing piece in my kanji reading puzzle.
Prototype Is Prototypical
I played around with some of Apple’s sample code for the Accessibility API and quickly found that I could grab the text from any application’s native AppKit UI elements! This was exactly what I wanted. In an afternoon I had rigged up an app that would grab the text from any Cocoa control I hovered my mouse over, send it to Jim Breen’s WWJDIC for kanji lookups and then present the results in a WebView. All in all I was pretty pleased with the result: it was simple, slick and very useful. I couldn’t wait for the Iron Coder theme to be announced the next day. When it was, however, I was a bit dismayed. Mardi Gras!? What was I supposed to do with that? I couldn’t think of any reasonable way to construe what I had done as being related to Mardi Gras (in retrospect I should have just entered anyway, oh well). I wasn’t too disappointed though. I now had the tool I had long been looking for.
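For anyone curious, the text-grabbing half of such a prototype can be sketched with the public AXUIElement API. This is a minimal, macOS-only sketch, assuming the process has been granted accessibility trust; it only reads the element's `AXValue`, where a real app would inspect more attributes:

```c
/* macOS only: cc grab.c -framework ApplicationServices */
#include <ApplicationServices/ApplicationServices.h>
#include <stdio.h>

int main(void)
{
    /* Current mouse location (CG and AX share a top-left origin). */
    CGEventRef event = CGEventCreate(NULL);
    CGPoint p = CGEventGetLocation(event);
    CFRelease(event);

    /* Ask the system-wide accessibility object which UI element
       sits under the cursor, in whatever application owns it. */
    AXUIElementRef systemWide = AXUIElementCreateSystemWide();
    AXUIElementRef element = NULL;
    if (AXUIElementCopyElementAtPosition(systemWide, p.x, p.y,
                                         &element) == kAXErrorSuccess) {
        /* Try the element's value, e.g. the text of a text field. */
        CFTypeRef value = NULL;
        if (AXUIElementCopyAttributeValue(element, kAXValueAttribute,
                                          &value) == kAXErrorSuccess &&
            CFGetTypeID(value) == CFStringGetTypeID()) {
            char buf[4096];
            if (CFStringGetCString((CFStringRef)value, buf, sizeof buf,
                                   kCFStringEncodingUTF8))
                printf("Text under cursor: %s\n", buf);
        }
        if (value) CFRelease(value);
        CFRelease(element);
    }
    CFRelease(systemWide);
    return 0;
}
```

From there, the prototype's remaining work was plumbing: feed that string to a lookup service and render the response.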
I now had a rough prototype of a tool that I found extremely useful, and I figured there were probably other people who shared my problem. I decided to broaden my experience base and productize my prototype while I could still grandfather in outside projects before starting my job. I had written a lot of code in my life up until that point, but nothing that had ever been released to the public for sale. There was a whole slew of important skill sets and experiences that I didn’t have, like:
- Payment Processing
- Customer Support
- Software Updates
- Copy Protection/DRM
So I decided to acquire them. I started developing Language Aid in my free time and, amazingly enough, I was able to release version 1.0 about 3 months after I had written the first line of code for the prototype. Looking back on it I am not sure how I did it: a complete product, brought end-to-end from concept to first dollar of income, in just 3 months. Although it never sold especially well (I didn’t really promote it) and it probably made me less than $2000 over its 5-year lifetime, the knowledge and experience of figuring out how to turn code into money was priceless. The most fame it ever attained was being published on the MacPeople DVD and in the August 2006 magazine issue. Now that over 5 years have passed since its initial release and sales have trickled down to almost nothing, I have decided that it is probably time to officially retire Language Aid and release the source code. I have been hesitant to do so because, honestly, the code is really hacky and ugly (it has experienced some code rot). I have learned so much since then, found much smoother ways to do things and simply become a better coder, but in the end I decided that I didn’t care if the code was ugly and released it anyway.
Some Interesting Development Points:
Versions 1.0 and 1.0.1 of Language Aid were simply quick front ends to Jim Breen’s WWJDIC, and as such the app was really only useful to other people in my position who were looking to keep up their Japanese. It was obvious that Language Aid’s text grabbing and lookup capabilities could be applied to other languages for things like translation, Wikipedia lookups and the like. In version 1.1 I introduced plugin modules: now you could have Language Aid funnel the text you grabbed into any plugin module of your choosing, for any sort of processing you wanted. I wrote lookup modules for Google Search, Google Translate, IMDB, MDBG, Wikipedia and of course the WWJDIC. I had a lot of experience with loading and unloading code at runtime from my work on Vision, so writing a plugin system went pretty quickly. I even released a developer SDK for anyone who wanted to write their own plugins (no one, to my knowledge, has done so yet). I also enforced a code signing policy that meant plugins had to be explicitly signed by Aoren Software in order to run without the user’s consent; this would prevent a malware author from writing and distributing a plugin that would run in Language Aid by default. By far the most useful plugin was the one that interfaced with Google Translate. Especially after Google added a “Detect Language” option for the incoming text, it made surfing foreign language web pages a breeze. See some text in a language you don’t know? One mouse hover and a keystroke, and now you know exactly what it says.
Reverse Engineering The Dictionary.app Lookup Port
When I started hacking around with the Accessibility API I found that I could basically grab two different kinds of text ranges out from under the mouse cursor: the full chunk of text in the element, or the currently selected text, which could potentially span multiple elements. However, there were some cases in which it would be useful to grab the exact, whitespace-delimited word that my cursor was hovering over. Unfortunately, there was no way to figure that out from the Accessibility API, or any other API for that matter. I was about to leave it at that, but then I noticed that the Mac OS X Dictionary.app lookup WAS in fact able to grab the individual word the cursor was hovering over. This irked me to no end. Disassembling the Dictionary.app binary showed that it was indeed using the Accessibility API to grab text, but there was something else afoot. I attached with `gdb` and watched very closely as things executed (PPC gets my vote as the most readable assembly language: I can simply watch values move and flow through routines, whereas with x86 I need a pad of paper so I can visualize the stack). After an intense reverse engineering session I had figured out that there was a Mach port for AppKit processes named `com.apple.DictionaryServiceComponent-(PID)` that Dictionary.app was communicating with. It appeared that this port was open for pretty much every process linked with AppKit, and I found that I could send interesting values to it and get stuff back: a `202` would return the attributed string currently being hovered over, a `201` would return the individual word boundaries, and finally a `200` would return the individual word. I finally had a way to find out the individual word the cursor was hovering over, and it worked wonderfully! …in Tiger. When Mac OS X Leopard was released the feature appeared to no longer work. After a little examination it appeared that processes now required some sort of handshake before they would respond with results, so for Leopard and onward I abandoned the functionality of grabbing the individual word.
From a technological standpoint this is probably what came out of Language Aid that I am most proud of. I have always had a very strong hacker streak in me and although I have cracked open some pretty interesting stuff (including one AAA title for the Mac released in 2005 which I never distributed the crack for) I am most definitely “white hat”. I really love the cat and mouse game between hackers and security researchers. Before releasing Language Aid 1.0 I took about a week and basically went through this thought process:
- “If I were trying to crack this program, what would I do?”
- “Ok, how would I defend against that?”
- “So how would I now get around that protection?”
- Go To Step 2
This was my idea of fun, and after a week or two of coding up protection after counter-measure after defense, I had things locked down far tighter than any $20 program would ever need. Perhaps in the future I will write a blog post going into detail about what I did, but for now those things must remain secret. Language Aid shipped with all of its DRM intertwined, but after a couple of update releases I decided to rip out all of the DRM code and turn it into its own product called Deadlock. Obviously it didn’t get hacked on too much because Language Aid didn’t get huge circulation, but I did search for Language Aid cracks occasionally and never found anything.
“This is UNIX, I know this!” – Jurassic Park
To respond to processed payments and to register Language Aid for use on the computers of new users, I decided to make a C-based UNIX daemon instead of going the more popular and more easily maintained route of a PHP page or the like. I wanted to know what kind of stuff it took to actually keep a server process up and running long term. I remembered the old pattern from school of `fork()` and writing a `SIGCHLD` handler. The program I wrote would first establish a connection to the payments database on launch and then block on `accept()`. When a connection was made I would `fork()` and service the connection, which was usually a new user asking my daemon to return a cryptographic signature of a hash of their machine identifier. After doing so, the child would close the connection and `exit()`. Using this standard UNIX daemon model made it so that I could leave the process up and running for years without the worry of memory leaks or the fear that a bug or malicious client might take the entire thing down.
Now that Language Aid has been open sourced it hopefully has a new chapter ahead of it. I can’t devote too much time to it because of other projects, but if someone is willing to resurrect it and give it new life, let me know and I will help make it happen.