Why Github Can Open-Source Their Libraries

November 01, 2009

One thing I love about Github is that they open-source a lot of their internal tools that power the site. What's interesting is that, unlike SourceForge, they open source little bits and pieces; tiny libraries that do one specific thing. These things are supremely useful (I use Grit and Jekyll quite often).

This is a huge benefit to them: their products become higher-quality through outside contribution, their talent pool grows, and they are positioned as a technical leader and social force in the development community. I've often wondered why more companies don't do this, and what's really involved.

There are three main hurdles to overcome in order to do this:

  • Usefulness - do you have code that someone will find useful?
  • Legalish - are you comfortable giving away some part of your company's intellectual property?
  • Technical - does your technical infrastructure support extraction, collaboration, and re-integration?

Usefulness

I think comparing Github to SourceForge makes this point very clear. While SourceForge does more than, say, BERT, it's one huge, all-or-nothing proposition; Github has extracted small parts of their infrastructure that are useful outside the realm of "software project hosting", resulting in many useful, fine-grained libraries.

Legalish

The issue here is essentially to determine whether the gains achieved by open-sourcing some of your code outweigh the competitive advantage lost by doing so. Again, the ability to extract small, focused, and useful pieces of functionality is key. Github isn't open-sourcing their entire infrastructure; just the parts that are broadly useful without being central to their competitive advantage (though one may argue that their IP has nothing to do with their success).

Technical

Here's where things get interesting. Extracting a small, useful piece of technology from your application, without revealing any trade secrets or other IP can be a challenging task. Add to that the infrastructure needed to manage and incorporate contributions, and your technical infrastructure could be the main barrier to interacting with the community (and not the lawyers!).

My company struggles with this daily, as our tools were just not designed to make extraction and management of dependencies easy. Our problem is almost entirely based on tooling and technology choice (which I'll get into in a further post).

Conclusion

The advantages to open-sourcing some of your code are obvious: you can improve the quality of your codebase while improving your standing in the community (which enhances your ability to attract top talent). You just need to make sure your technical infrastructure is built for it, and that the lines of communication between the development staff and the legal staff are clear and functional.

That Github routinely open-sources bits of their code speaks volumes to the choices they've made as an organization, as well as their technical prowess.

An anecdote about Joel Spolsky at DevDays DC

October 27, 2009

I had signed up to volunteer at DC's DevDays; however, my company decided to be a local sponsor. Part of that involved us having some time for a 15-minute presentation during the lunch break. We were told that other sponsors would be doing something similar. I worked up a brief presentation on how we improve our development process with each iteration. I figured it would be of interest and not be too (obviously) self-promoting.

We get there and find out that not only are none of the other sponsors doing this, but we can't go at lunch, because Joel very much wanted these things to be "optional" and, with The State Theater being near almost nothing in the way of lunch options, everyone would be stuck there and "forced" to watch our presentation. So, we'd be going after the entire conference was over.

I have no problem with this at all; it's his/their conference after all. But we probably wouldn't have bothered had we known this ahead of time. Nevertheless, I tweeted to the #DevDays hashtag (which showed up on the big screen between speakers) and looked forward to it anyway. As a member of a crappy local band, I've played some gigs to no one, so it's no big deal.

For most of the day the end of the schedule looked like:

  • 4:15 - jQuery
  • 5:30 - Goodbye!
  • (unspecified showcase for OPOWER (formerly Positive Energy))

Toward the end, Joel came by, introduced himself, and got my info so he could mention it in his closing remarks. We discussed some logistics and he was very polite, but obviously had a lot going on.

While Richard Worth was finishing up his jQuery talk, I headed to the back. The stagehand got me set up, and Joel was about to go deliver the "goodbye and thanks" speech. Here is my mostly accurate transcription of our brief discussion:

  • Joel: OK, I'm gonna go out, say goodbye, and then I'll play a song that goes for about 4 minutes. After that, you can head out and start
  • Me: No problem, can you or someone cue me when the song is almost over? [you couldn't hear much back there and I didn't know the song]
  • Joel: My experience running these things is that people will start leaving pretty soon, so once it looks like everyone's left, you can head out there
  • Me: Um, ok [mostly amused at his comment and a bit nervous]
  • Joel: [slightly chuckling] We try to have a strict separation of Editorial and Advertorial content at these
  • Me: [not quite parsing the word "Advertorial" and also thinking about the hour-long demo on FogBugz]

As I said, it was all good, and a few people DID stick around, but I found the whole thing rather amusing in retrospect. And I think he actually paid attention to about half of my talk!

Mocking Enumeration in Scala with EasyMock

October 21, 2009

I'm working on a Scala url shortener (to be hosted at my awesomely problem-causing ❺➠.ws domain). Since this is such a small application, I'm rolling it ground-up from the Servlet spec only, to get a feel for Scala without having to worry about a lot of dependencies. As such, one of the things I need to do is parse the requestor's "representation", i.e. determine if I'm serving JSON, XML, or HTML. Since this comes from the Accept: header, my tests will need to mock HttpServletRequest.

```scala
val enumeration = createMock(classOf[java.util.Enumeration[String]])
expect(enumeration.hasMoreElements).andReturn(true)
expect(enumeration.nextElement).andReturn("TEXT/HTML")
expect(enumeration.hasMoreElements).andReturn(false)

// the following call doesn't compile!!!
expect(request.getHeaders(parser.ACCEPT_HEADER)).andReturn(enumeration)
expect(request.getParameter(parser.TYPE_PARAM)).andReturn(null)

replay(request)
replay(enumeration)

val t = parser.determineRepresentation(request)
t should equal (Some("text/html"))
```

When I compile this test, I get the following baffling error message:

```
TestRepresentationParser.scala:21: type mismatch;
 found   : java.util.Enumeration[String]
 required: java.util.Enumeration[?0] where type ?0
    EasyMock.expect(request.getHeaders(parser.ACCEPT_HEADER)).andReturn(enumeration)
```

Um, OK? I tried zillions of ways to cast things, even creating my own implementation of Enumeration[String], to no avail. There seems to be some problem with the fact that HttpServletRequest returns a non-parameterized Enumeration in its interface, but Scala won't let me create such a thing. I had given up on testing this for a while, but eventually the simple solution prevailed:

```scala
EasyMock.expect(request.getHeaders(parser.ACCEPT_HEADER))
// this call is obviously not type-checked, so it works
expectLastCall.andReturn(enumeration)
```

Kinda cheesy, and I kinda feel stupid for not thinking of it sooner.
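As for the logic under test, the rule itself is simple enough to sketch. Here's the idea in Ruby (purely illustrative — the real project is Scala, and these names are made up): take the first media type in the Accept: header that we support, fall back to HTML for wildcards or a missing header, and ignore q-values entirely.

```ruby
# Illustrative sketch only: the real project is in Scala, and these names are
# invented. Picks the first supported media type from an Accept: header,
# case-insensitively, ignoring q-values.
SUPPORTED_TYPES = ["application/json", "text/xml", "text/html"]

def determine_representation(accept_header)
  return "text/html" if accept_header.nil? # no preference: default to HTML
  requested = accept_header.split(",").map { |t| t.split(";").first.strip.downcase }
  match = requested.find { |t| SUPPORTED_TYPES.include?(t) || t == "*/*" }
  match == "*/*" ? "text/html" : match
end
```

This is also why the mock returns "TEXT/HTML": clients can send the header in any case, so the comparison has to be case-insensitive.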

Moved my blog to Jekyll

October 17, 2009

Kinda wanted a change of pace, so I moved my website over to Jekyll, which has been fun to set up. Plus, I actually spent more than 10 seconds on a site design. I'm assuming my many hours of labour could've been done by a real designer.

At any rate, Jekyll seems reasonably easy to deal with. I got a very small taste of Capistrano as well; I push my site to a remote bare git repo on this server and then have the capfile update it. I suppose it could be more sophisticated, but this seems to work for now.

My own personal tour of Scala

September 10, 2009

So, the Main Scala Website has a "tour" of the features of Scala. I liked the idea, but found a lot of the examples and descriptions a bit terse and uninspiring (some of them are downright confusing). However, I really wanted to learn about these features. So, I set about understanding each feature, trying to answer the question "What problem does this solve?". The results are here at www.naildrivin5.com/scalatour. This site was constructed using my homebrew wiki software, Halen (which I created to test out my Gliffy Ruby client).

It was a fun experience. As someone coming from Java application development (and who knows enough Ruby to feel some pain in Java), I tried hard to map the features to real-world problems a "blue-collar developer" might be facing. A lot of Scala enthusiasts seem to be functional programming nerds, and I know FP can turn a lot of people off. I think Scala is a great way to learn and appreciate functional programming without having to swallow a huge amount of info at once. So, I figured approaching Scala's features from a different angle would be useful.

I did have to supplement my learning with info from Odersky's awesome Programming in Scala book, but most of what I learned, I learned by playing around with code; all the code on the site should compile and run (at least in Scala 2.7.x). The coolest thing was that by "touring" all of Scala's (often weird) features, I got a good feel for how they all fit together. The language feels like a very elaborate Rube Goldberg machine that ends up being rather elegant. I feel that instead of casting Scala as a functional language, or a hybrid language, I would say that Scala is "static typing done right (or as right as is possible)".

The way Scala allows for duck typing, for example, is really cool, and something I think Ruby code could benefit from. Stuff like type variance is pretty heavy stuff, but when you dig in and understand how functions are implemented, it ends up making some sense. I even found a "real world" use for explicitly-typed self-references, which I thought was actually a sick joke for a while :) Anyway, I hope that what I've learned doing this will be helpful to others.

Intro to Scala for Java Developers - slides

August 17, 2009

Thought I'd post the slides of a talk I gave at work on Scala. We're primarily a Java shop, and every week we do either a code review or a tech-related presentation.

Our domain at work is analyzing residential energy data, so the examples herein are tailored to that:

  • Read or Meter Read - Some amount of energy used over a period, e.g. "100kwh in the month of June"
  • Service Point - meta-data about an electric meter (the "point" at which "service" is available).

I also omitted a code demo where I refactored part of our codebase into Scala to show the difference (trust me, it was awesome!).

Simple Metrics for Team and Process Improvement

June 29, 2009

Recently, the development team where I work has started collecting bona-fide metrics, based on our ticketing system. So few development shops (especially small ones) collect real information on how they work that it's exciting that we're doing it.

Here's what we're doing:

  • Number of releases during QA (we do a daily release, so anything more frequent than that signals a problem)
  • Defects found, by severity and priority
  • Average time from accepting a ticket (starting work) to resolving it (sending it for testing)
  • Number of re-opens (i.e. a defect was sent to testing, but not fixed)
  • Average time from resolving to closing (i.e. testing the fix)
  • Defects due to coding errors vs. unclear requirements (this is really great to be able to collect; with our company so new and small, we can introduce this and use it without ruffling a lot of feathers)

The tricky thing about metrics is that they are not terribly meaningful by themselves; rather, they indicate areas for focused investigation. For example, if it takes an average of 1 day to resolve a ticket, but 3 days to test and close it, we don't just conclude that testing is inefficient; we have to investigate why. Perhaps we don't have enough testers. Perhaps our testing environment isn't stable enough. Perhaps there are too many show-stoppers that put the testers on the bench while developers are fixing them.
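As a sketch of what's behind those duration metrics (the ticket fields here are invented, not our actual ticketing system's schema), they boil down to simple averages over ticket timestamps:

```ruby
# Sketch only: field names are made up, not our real ticketing system.
# Timestamps are epoch seconds; results are in hours.
Ticket = Struct.new(:accepted_at, :resolved_at, :closed_at)

def average_hours(durations_in_seconds)
  durations_in_seconds.sum.to_f / durations_in_seconds.size / 3600
end

# accepting a ticket (starting work) to resolving it (sending it for testing)
def avg_time_to_resolve(tickets)
  average_hours(tickets.map { |t| t.resolved_at - t.accepted_at })
end

# resolving to closing (i.e. testing the fix)
def avg_time_to_close(tickets)
  average_hours(tickets.map { |t| t.closed_at - t.resolved_at })
end
```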

Another way to interpret these values is to watch them over time. If the number of critical defects is decreasing, it stands to reason we're doing a good job. If the number of re-opens is increasing, we are packing too much into one iteration and possibly not doing sufficient requirements analysis. We just started collecting these on the most recent iteration, so in the coming months, it will be pretty cool to see what happens.

These metrics are pretty basic, but it's great to be collecting them. The one thing that can make hard-core analysis of these numbers difficult (esp. over time as the team grows and new projects are created) is the lack of normalization. If we introduced twice as many critical bugs this iteration as last, are we necessarily "doing worse"? What if the requirements were more complex, or the code required was just...bigger?

Normalizing factors like cyclomatic complexity, lines of code, etc, can shed some more light on these questions. These normalizing factors aren't always popular, but interpreted the right way, could be very informative. We're the same team, using the same language, working on the same product. If iteration 14 adds 400 lines of code, with 3 critical bugs, but iteration 15 adds 800 lines of code with 4 critical bugs, I think we can draw some real conclusions (i.e. we're getting better).
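Put as code, that normalization is just a rate. Using the numbers above, iteration 14 comes out to 7.5 critical bugs per thousand lines added, and iteration 15 to 5.0 — a real improvement:

```ruby
# Defects per thousand lines of code added: a crude but comparable rate
# across iterations of different sizes.
def defects_per_kloc(defects, lines_added)
  defects * 1000.0 / lines_added
end
```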

Another interesting bit of data would be to incorporate our weekly code review. We typically review fresh-but-not-too-fresh code, mostly for knowledge sharing and general "architectural consistency". If we were to actively review code in development, before it is sent to testing, we could then have real data on the effectiveness of our code reviews. Are we finding lots of coding errors at testing time? Maybe more code reviews would help. Are we finding fewer critical bugs in iteration 25 than in iterations 24 and 23, where we weren't doing reviews? Then the reviews probably helped a lot.

These are actually really simple things to do (especially with a small, cohesive team), and can shed real light on the development process. What else can be done?

Stand While You Work!

June 20, 2009

After experiencing some back troubles recently, I was encouraged to work standing up. The pain relief was immediate, and for the past several months, it's been great. I work most of the time standing, sitting for a few minutes if I get a bit tired. Not only is this great for my back, but it ensures I don't work insane hours...I simply can't stand for more than 8 hours a day. When I first brought up the subject of standing with my company's office manager, she was open to whatever I wanted to do; I figured since it's my issue to solve (and since I wasn't yet sold on the idea), I'd make do with something and bring it in.

While Joel Spolsky outfits his offices with super fancy motorized desks that can go from standing to sitting with the flick of a switch, those desks were way out of my price range. Further, fixed-height desks were also quite expensive (much like the word "wedding", attaching the word "ergonomic" to something seems to double its price). Enter the Ikea Utby! The perfect size and perfect height, it looks great and was under $200!

Some might think it's a bit small, but I find the more space I have, the bigger mess I make. The Utby is, for me, the perfect amount of space. Though, it's so cheap, you could get two of them and make an awesome corner desk. I work from home on occasion and also work on side projects after work. Until recently I enjoyed the venerable (and, sadly, discontinued) Ikea Jerker. Last week, however, I was home recovering from back surgery, and was forbidden by the doctor from sitting down. I made my own makeshift standing desk out of a keyboard stand and an ironing board. Pretty ghetto.

So, the Jerker is now in pieces and has been replaced by a second Utby at home. The sitting problem, both at home and at work is simple: a bar chair. I've got some plush comfy ones at home and bought a (reasonably) cheap Henriksdal for work. So, for less than $300, I have a nice looking desk at which I can stand or sit, and should have continued good back health. Even if you don't have back problems, I highly recommend standing; it keeps me alert and focused and feels great. You just have to make sure you have comfortable shoes.

Lead or Bleed

May 25, 2009

After reading all of The Passionate Programmer over a week or so, I'm going back through and looking at some of the "Act On It!" sections, where Chad Fowler recommends specific actions to kickstart/sustain/boost your career. The very first one, titled "Lead or Bleed?" suggests making a map of technologies, with "on the way out" on the left side and "bleeding edge" on the right side, then highlighting how well you know each thing. Here's my stab at it:

Technologies: Lead or Bleed

Green are things I know really well; yellow are things I could do at a job but am by no means an expert in.

Obviously this is shaped by my own reality and what I perceive on the 'net, and I omitted things like "C", "UNIX" and "Windows", because those are not really "on the way out" in the same way that C++ is (or that COBOL was, etc.).

Learning Cucumber - With Dynamic types must come documentation

May 21, 2009

Finally pulled the trigger on Cucumber, which allows one to write human-readable "features" that are essentially acceptance test cases. You can then execute them by adding some glue code in a mish-mash of Ruby DSLs and verify functionality.

It took me quite a while to decide to get going on this because the examples and documentation that are available are extremely hypothetical and very hand-wavy. A lot of the information glosses over the fact that you are still writing code, and you still need to know what the keywords are and what data you will be given. Arcane non-human-readable symbols are almost preferable when getting started, because you don't get distracted by English. This is why AppleScript drives me insane.

At any rate, I found this page, which was pretty helpful. It shows testing a Rails app using, among other things, webrat (another tool severely lacking in documentation but that is, nonetheless, pretty awesome).

I'm writing a basic wiki (for science) and so I thought a good feature would be "home page shows all pages in sorted order", so I wrote it up:

```
Feature: pages are sorted
  As anyone
  The pages should be listed on the home page in sorted order

  Scenario: Visit Homepage
    As anyone
    When I visit the homepage
    Then The list of pages should be sorted
```

Now, the webrat/cucumber integration provided by Rails means that the "plain English" has to actually conform to a subset of phrasing and terminology, or you have to write the steps yourself (the steps are everything under "Scenario"). It's not hard to do that, and it's not hard to modify the default webrat steps, but it was a distraction initially.

Next up, you implement the steps and here is where the crazy intersection of Ruby DSLs really made things difficult. The first two steps were pretty easy ("anyone" didn't require any setup, and webrat successfully handled "When I visit the homepage"):

```ruby
Then /The list of pages should be sorted/ do
  response.should # I have no clue wtf to do here
end
```

A puts response.class and a puts response.methods gave me no useful information. I eventually deduced that since Cucumber is a successor/add-on/something to RSpec, perhaps should comes from RSpec. This takes a Matcher, and webrat provides many. Specifically, have_selector is available, which allows selecting HTML elements based on the DOM.
```ruby
Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages")
end
```

Success! (sort of). My feature execution is all green, meaning the home page contains <ul class="pages">. have_selector also takes a block (totally undocumented as to what it is or does in the webrat documentation):
```ruby
Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages") do |pages|
    # WTF is pages and what can I do with it?
  end
end
```

A puts pages.class later and I realize this is a Nokogiri NodeSet. Now I'm in business, though it would've been nice to be told WTF some of this stuff was and what I can/should do with it. At this point it was pretty trivial to select the page names from my HTML and check if they are sorted:
```ruby
response.should have_selector('ul.pages') do |pages|
  page_names = []
  pages.should have_selector('li') do |li|
    li.should have_selector('.pagename') do |name|
      name.each do |one_name|
        page_names << one_name.content
      end
    end
  end
  assert_equal page_names, page_names.sort
  true
end
```

(For some reason assert_equal doesn't evaluate to true when it succeeds, so the block evaluates to false, and then the Cucumber/RSpec/Webrat/Ruby Gods claim my page is missing the ul tag.) My initial implementation walked the DOM using Nokogiri's API directly, because I didn't realize that should had been mixed in (on?) to the objects I was being given. I'm still not sure if using it this way is the intention, but it seemed a bit cleaner to me.

So, this took me a couple of hours, mostly because of a combination of dynamic typing and lack of documentation. I'm all for dynamic typing, and I totally realize that these are free tools and all that. I think if the Ruby community (and the dynamic typing community in general) wants to succeed and make a case that dynamic typing, DSLs, meta-programming and all this (admittedly awesome and powerful) stuff enhance productivity, there has to be documentation as to the types of user-facing objects.

Now, given Github's general awesomeness, I'm totally willing to fork a repo, beef up the rdoc and request a pull, however I'm not even sure whose RDoc I could update to make this clear. Just figuring out that the have_selector in response.should have_selector is part of webrat was nontrivial (I had to just guess that should was part of RSpec and that the Webrat::Matchers module was mixed in). This is a problem and it's not clear to me how to solve it.

That being said, I was then able to create three more features using this system in about 10 minutes, so overall, I'm really happy with how things are working. Certainly if this were Java, I'd still be feeding XML to Maven or futzing with missing semicolons. So, it's a net win for me.