Deconstructing the Scala Map Literal

November 12, 2009

I find that Scala is one giant Rube Goldberg Machine that manages to do something not easily done otherwise. By this I mean that Scala has many features that, by themselves, seem very strange, but, in combination, enable some very cool functionality. This is why I initially started my personal tour of Scala. I read stuff like explicitly typed self-references and was left scratching my head.

I thought it might be fun to deconstruct the "map literal" in Scala and observe how the features interact to create a very handy piece of code that isn't baked into the language. This assumes an understanding of some Scala basics.

Although Java 7 is getting map literals, Scala already has them (or so it appears):

val band = Map("Dave" -> "Bass",
                "Tony" -> "Guitar",
                "Greg" -> "Drums")
This is not actually a literal, but enabled by Scala features to make it look like a literal. Which means that you can use these facilities to make your own literals. So, how does this work?

Most surprising to a Java programmer is the -> operator. This makes use of two Scala features:

  1. Method names made of symbols, called in operator (infix) notation
  2. Implicit conversions

It turns out that the -> operator is on the class Predef.ArrowAssoc. Predef is automatically imported in every Scala program, so you don't need to prefix anything with Predef. It returns a tuple of its caller and its argument, e.g.

val dave = new ArrowAssoc("Dave")
val entry = dave -> "Bass"
// entry is now ("Dave","Bass")
// which is a Tuple2[String,String]

Of course, we aren't creating ArrowAssoc instances anywhere, so how does this get called? This is where implicits come in. Suppose we change our simple example to:

val dave = "Dave"
val entry = dave -> "Bass"
// entry is still ("Dave","Bass")
// which is a Tuple2[String,String]
Here, Scala sees that the method -> needs to be called on an ArrowAssoc, but is being called on a String. Instead of giving up, Scala notices the method:
implicit def any2ArrowAssoc[A](x: A): 
  ArrowAssoc[A] = new ArrowAssoc(x)
This means that anything at all can be converted into an ArrowAssoc if there's some reason to. And we have a reason to here.

This means our code is now effectively:

val band = Map(("Dave" , "Bass"),
                ("Tony" , "Guitar"),
                ("Greg" , "Drums"))
It's not hard to imagine a Map constructor taking Tuple2 objects, using the first part as the key and the second part as the value. However, where is the constructor? Scala creates objects via the new keyword, just as Java does. So, what's going on here?

This uses two additional Scala features:

  1. apply() shortcutting
  2. Scala singleton objects
This is much simpler to decode than the -> method; there is simply an object in scope named Map, and it has an apply method that takes a variable list of Tuple2 objects. Scala interprets method-call syntax applied directly to an object, with no method name, as a call to the apply method of that object (if it exists). So, expanding this shortcut, we have:
val band = Map.apply(("Dave" , "Bass"),
                      ("Tony" , "Guitar"),
                      ("Greg" , "Drums"))
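
As a quick sanity check, the three spellings shown so far can be compared directly. This is only a sketch; the object name MapLiteralCheck and the value names inside it are made up for illustration and are not part of Scala.

```scala
object MapLiteralCheck {
  // The "literal" form, using -> to build each entry
  val sugared = Map("Dave" -> "Bass", "Tony" -> "Guitar", "Greg" -> "Drums")

  // The same entries written as plain tuples
  val tupled = Map(("Dave", "Bass"), ("Tony", "Guitar"), ("Greg", "Drums"))

  // The same call with the apply method spelled out
  val explicit = Map.apply(("Dave", "Bass"), ("Tony", "Guitar"), ("Greg", "Drums"))

  // All three build equal maps
  val allEqual: Boolean = sugared == tupled && tupled == explicit
}
```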

That's all there is to it! A few things to note about this:

  • Without the application of some Scala features, it's pretty ugly
  • The language itself didn't need to implement a special "map literal"; it simply combines smaller features in a way to make it appear as though it does. You can even create your own "literals" rather than waiting for the language to implement them
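
To make the "create your own literals" point concrete, here is a minimal sketch that combines the same three pieces: a symbolic method, an implicit conversion, and an apply method on a singleton object. Everything named here (BandLiteral, InstrumentAssoc, the ~> method, string2InstrumentAssoc, Band) is hypothetical and invented for this example. It also assumes a reasonably recent Scala, where the scala.language.implicitConversions import exists; that import was not available in the Scala 2.7 used elsewhere in this post.

```scala
import scala.language.implicitConversions

object BandLiteral {
  // Hypothetical wrapper playing the role that ArrowAssoc plays for ->
  class InstrumentAssoc(val name: String) {
    // Symbolic method name, so it can be written infix: "Dave" ~> "Bass"
    def ~>(instrument: String): (String, String) = (name, instrument)
  }

  // Implicit conversion: lets the compiler wrap a String when ~> is called on it
  implicit def string2InstrumentAssoc(name: String): InstrumentAssoc =
    new InstrumentAssoc(name)

  // Singleton object with apply(), so Band(...) reads like a constructor call
  object Band {
    def apply(members: (String, String)*): Map[String, String] = members.toMap
  }

  // Looks like a literal, but is just method calls
  val band: Map[String, String] =
    Band("Dave" ~> "Bass", "Tony" ~> "Guitar", "Greg" ~> "Drums")
}
```

Nothing here touches the compiler or the language; swapping ~> for another method name or returning a different type from apply gives a different "literal".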

Why Github Can Open-Source Their Libraries

November 01, 2009

One thing I love about Github is that they open-source a lot of their internal tools that power the site. What's interesting is that, unlike SourceForge, they open source little bits and pieces; tiny libraries that do one specific thing. These things are supremely useful (I use Grit and Jekyll quite often).

This is a huge benefit to them; their products become higher-quality through contribution, and their talent pool increases due to their contribution to the community; they are positioned as a technical leader and social force in the development community. I've often wondered why more companies don't do this, and what's really involved.

There are three main hurdles to overcome in order to do this:

  • Usefulness - do you have code that someone will find useful?
  • Legalish - are you comfortable giving away some part of your company's intellectual property?
  • Technical - does your technical infrastructure support extraction, collaboration, and re-integration?

Usefulness

I think comparing Github to Sourceforge makes this point very clear. While SourceForge does more than, say, BERT, it's just a big huge all-or-nothing proposition; Github has extracted small parts of their infrastructure useful outside the realm of "software project hosting", resulting in many useful, fine-grained libraries.

Legalish

The issue here is essentially to determine if the gains achieved by open-sourcing some of your code are greater than the competitive advantage lost by doing so. Again, the ability to extract small, focused, and useful pieces of functionality is key. Github isn't open-sourcing their entire infrastructure; just the parts that are really useful and not integral to their success (though one may argue that their IP has nothing to do with their success).

Technical

Here's where things get interesting. Extracting a small, useful piece of technology from your application, without revealing any trade secrets or other IP can be a challenging task. Add to that the infrastructure needed to manage and incorporate contributions, and your technical infrastructure could be the main barrier to interacting with the community (and not the lawyers!).

My company struggles with this daily, as our tools were just not designed to make extraction and management of dependencies easy. Our problem is almost entirely based on tooling and technology choice (which I'll get into in a future post).

Conclusion

The advantages to open-sourcing some of your code are obvious: you can improve the quality of your codebase while improving your standing in the community (which enhances your ability to attract top talent). You just need to make sure your technical infrastructure is built for it, and that the lines of communication between the development staff and the legal staff are clear and functional.

That Github routinely open-sources bits of their code speaks volumes to the choices they've made as an organization, as well as their technical prowess.

An anecdote about Joel Spolsky at DevDays DC

October 27, 2009

I had signed up to volunteer at DC's DevDays; however, My Company decided to be a local sponsor. Part of that involved us having some time for a 15-minute presentation during the lunch break. We were told that other sponsors would be doing something similar. I worked up a brief presentation on how we improve our development process with each iteration. I figured it would be of interest and not be too (obviously) self-promoting.

We get there and find out that not only are none of the other sponsors doing this, but we can't go at lunch, because Joel very much wanted these things to be "optional" and, The State Theater being near almost nowhere to go have lunch, everyone would be stuck there and "forced" to watch our presentation. So, we'd be going after the entire conference was over.

I have no problem with this at all; it's his/their conference after all. But, we probably wouldn't have bothered if we had known this ahead of time. Nevertheless, I tweeted to the #DevDays hashtag (showing up on the big screen between speakers) and looked forward to it anyway. As a member of a crappy local band, I've played some gigs to no-one, so it's no big deal.

For most of the day the end of the schedule looked like:

  • 4:15 - JQuery
  • 5:30 - Goodbye!
  • (unspecified showcase for OPOWER (formerly Positive Energy))

Toward the end, Joel came by, introduced himself, and got my info so he could mention it in his closing remarks. We discussed some logistics and he was very polite, but obviously had a lot going on.

While Richard Worth was finishing up his JQuery talk, I head to the back. The stagehand gets me set up and Joel is about to go deliver the "goodbye and thanks" speech. Here is my mostly accurate transcription of our brief discussion:

  • Joel: OK, I'm gonna go out, say goodbye, and then I'll play a song that goes for about 4 minutes. After that, you can head out and start
  • Me: No problem, can you or someone cue me when the song is almost over? [you couldn't hear much back there and I didn't know the song]
  • Joel: My experience running these things is that people will start leaving pretty soon, so once it looks like everyone's left, you can head out there
  • Me: Um, ok [mostly amused at his comment and a bit nervous]
  • Joel: [slightly chuckling] We try to have a strict separation of Editorial and Advertorial content at these
  • Me: [not quite parsing the word "Advertorial" and also thinking about the hour-long demo on FogBugz]
As I said, it was all good, and a few people DID stick around, but I found the whole thing rather amusing in retrospect. And I think he actually paid attention to about half of my talk!

Mocking Enumeration in Scala with EasyMock

October 21, 2009

I'm working on a Scala url shortener (to be hosted at my awesomely problem-causing ❺➠.ws domain). Since this is such a small application, I'm rolling it ground-up from the Servlet spec only, to get a feel for Scala without having to worry about a lot of dependencies. As such, one of the things I need to do is parse the requestor's "representation", i.e. determine if I'm serving JSON, XML, or HTML. Since this comes from the Accept: header, my tests will need to mock HttpServletRequest.

```scala
val enumeration = createMock(
  classOf[java.util.Enumeration[String]])
expect(enumeration.hasMoreElements).andReturn(true)
expect(enumeration.nextElement).andReturn("TEXT/HTML")
expect(enumeration.hasMoreElements).andReturn(false)
// the following call doesn't compile!!!
expect(request.getHeaders(parser.ACCEPT_HEADER))
  .andReturn(enumeration)
expect(request.getParameter(parser.TYPE_PARAM))
  .andReturn(null)
replay(request)
replay(enumeration)
val t = parser.determineRepresentation(request)
t should equal (Some("text/html"))
```

When I compile this test, I get the following baffling error message:

```
TestRepresentationParser.scala:21: type mismatch;
 found   : java.util.Enumeration[String]
 required: java.util.Enumeration[?0] where type ?0
    EasyMock.expect(request.getHeaders(parser.ACCEPT_HEADER))
      .andReturn(enumeration)
```

Um, OK? I tried zillions of ways to cast things, even creating my own implementation of Enumeration[String], to no avail. There seems to be some problem with the fact that HttpServletRequest returns a non-parameterized Enumeration in its interface, but Scala won't let me create such a thing. I had given up on testing this for a while, but eventually the simple solution prevailed:

```scala
EasyMock.expect(request.getHeaders(parser.ACCEPT_HEADER))
// this call is obviously not type-checked, so it works
expectLastCall.andReturn(enumeration)
```

Kinda cheesy, and I kinda feel stupid for not thinking of it sooner.

Moved my blog to Jekyll

October 17, 2009

Kinda wanted a change of pace, so I moved my website over to Jekyll, which has been fun to set up. Plus, I actually spent more than 10 seconds on a site design. I'm assuming my many hours of labour could've been done by a real designer.

At any rate, Jekyll seems reasonably easy to deal with. I got a very small taste of Capistrano as well; I push my site to a remote bare git repo on this server and then have the capfile update it. I suppose it could be more sophisticated, but this seems to work for now.

My own personal tour of Scala

September 10, 2009

So, the Main Scala Website has a "tour" of the features of Scala. I liked the idea, but found a lot of the examples and descriptions a bit terse and uninspiring (some of them are downright confusing). However, I really wanted to learn about these features. So, I set about understanding each feature, trying to answer the question "What problem does this solve?". The results are here at www.naildrivin5.com/scalatour. This site was constructed using my homebrew wiki software, Halen (which I created to test out my Gliffy Ruby client).

It was a fun experience. As someone coming from Java application development (and who knows enough Ruby to feel some pain in Java), I tried hard to map the features to real-world problems a "blue-collar developer" might be facing. A lot of Scala enthusiasts seem to be functional programming nerds, and I know FP can turn a lot of people off. I think Scala is a great way to learn and appreciate functional programming without having to swallow a huge amount of info at once. So, I figured approaching Scala's features from a different angle would be useful.

I did have to supplement my learning with info from Odersky's awesome Programming in Scala book, but most of what I learned, I learned by playing around with code; all the code on the site should compile and run (at least in Scala 2.7.x). The coolest thing was that by "touring" all of Scala's (often weird) features, I got a good feel for how they all fit together. The language feels like a very elaborate Rube-Goldberg Machine that ends up being rather elegant. I feel that instead of casting Scala as a functional language, or a hybrid language, I would say that Scala is "static typing done right (or as right as is possible)".

The way Scala allows for duck typing, for example, is really cool, and something I think Ruby code could benefit from. Stuff like type variance is pretty heavy stuff, but when you dig into it, and understand how functions are implemented, it ends up making some sense. I even found a "real world" use for explicitly-typed self-references, which I thought was actually a sick joke for a while :) Anyway, I hope that what I've learned doing this will be helpful to others.

Intro to Scala for Java Developers - slides

August 17, 2009

Thought I'd post the slides of a talk I gave at work on Scala. We're primarily a Java shop, and every week we do either a code review or a tech-related presentation.

Our domain at work is analyzing residential energy data, so the examples herein are tailored to that:

  • Read or Meter Read - Some amount of energy used over a period, e.g. "100kwh in the month of June"
  • Service Point - meta-data about an electric meter (the "point" at which "service" is available).

I also omitted a code demo where I refactored part of our codebase into Scala to show the difference (trust me, it was awesome!).

Simple Metrics for Team and Process Improvement

June 29, 2009

Recently, the development team where I work has started collecting bona-fide metrics, based on our ticketing system. So few development shops (especially small ones) collect real information on how they work that it's exciting that we're doing it.

Here's what we're doing:

  • Number of releases during QA (we do a daily release, so more than daily is an indicator)
  • Defects found, by severity and priority
  • Average time from accepting a ticket (starting work) to resolving it (sending it for testing)
  • Number of re-opens (i.e. a defect was sent to testing, but not fixed)
  • Average time from resolving to closing (i.e. testing the fix)
  • Defects due to coding errors vs. unclear requirements (this is really great to be able to collect; with our company so new and small, we can introduce this and use it without ruffling a lot of feathers)

The tricky thing about metrics is that they are not terribly meaningful by themselves; rather they indicate areas for focussed investigation. For example, if it takes an average of 1 day to resolve a ticket, but 3 days to test and close it, we don't just conclude that testing is inefficient; we have to investigate why. Perhaps we don't have enough testers. Perhaps our testing environment isn't stable enough. Perhaps there are too many show-stoppers that put the testers on the bench while developers are fixing them.

Another way to interpret these values is to watch them over time. If the number of critical defects is decreasing, it stands to reason we're doing a good job. If the number of re-opens is increasing, we are packing too much into one iteration and possibly not doing sufficient requirements analysis. We just started collecting these on the most recent iteration, so in the coming months, it will be pretty cool to see what happens.

These metrics are pretty basic, but it's great to be collecting them. The one thing that can make hard-core analysis of these numbers difficult (esp. over time as the team grows and new projects are created) is the lack of normalization. If we introduced twice as many critical bugs this iteration as last, are we necessarily "doing worse"? What if the requirements were more complex, or the code required was just...bigger?

Normalizing factors like cyclomatic complexity, lines of code, etc, can shed some more light on these questions. These normalizing factors aren't always popular, but interpreted the right way, could be very informative. We're the same team, using the same language, working on the same product. If iteration 14 adds 400 lines of code, with 3 critical bugs, but iteration 15 adds 800 lines of code with 4 critical bugs, I think we can draw some real conclusions (i.e. we're getting better).
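
To put numbers on that example, here is a small sketch that normalizes critical bugs by lines of code added. Defects per thousand lines is just one possible normalization, picked here for illustration; the names DefectDensity and perKloc are made up and are not something we actually have in our tooling.

```scala
object DefectDensity {
  // Critical defects per 1,000 lines of code added in an iteration
  def perKloc(criticalBugs: Int, linesAdded: Int): Double =
    criticalBugs * 1000.0 / linesAdded

  // Iteration 14: 3 critical bugs in 400 new lines
  val iteration14: Double = perKloc(3, 400) // 7.5

  // Iteration 15: 4 critical bugs in 800 new lines
  val iteration15: Double = perKloc(4, 800) // 5.0

  // More bugs in absolute terms, but fewer per line written
  val improving: Boolean = iteration15 < iteration14
}
```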

Another interesting bit of data would be to incorporate our weekly code review. We typically review fresh-but-not-too-fresh code, mostly for knowledge sharing and general "architectural consistency". If we were to actively review code in development, before it is sent to testing, we could then have real data on the effectiveness of our code reviews. Are we finding lots of coding errors at testing time? Maybe more code reviews would help. Are we finding fewer critical bugs in iteration 25 than in iterations 24 and 23, where we weren't doing reviews? Then reviews helped a lot.

These are actually really simple things to do (especially with a small, cohesive team), and can shed real light on the development process. What else can be done?

Stand While You Work!

June 20, 2009

After experiencing some back troubles recently, I was encouraged to work standing up. The pain relief was immediate, and for the past several months, it's been great. I work most of the time standing, sitting for a few minutes if I get a bit tired. Not only is this great for my back, but it ensures I don't work insane hours...I simply can't stand for more than 8 hours a day. When I first brought up the subject of standing with my company's office manager, she was open to whatever I wanted to do; I figured since it's my issue to solve (and since I wasn't yet sold on the idea), I'd make do with something and bring it in.

While Joel Spolsky outfits his offices with super fancy motorized desks that can go from standing to sitting with the flick of a switch, those desks were way out of my price range. Further, fixed height desks were also quite expensive (much like the word "wedding", attaching the word "ergonomic" to something seems to double its price). Enter the Ikea Utby! The perfect size and perfect height, it looks great and was under $200!

Some might think it's a bit small, but I find the more space I have, the bigger mess I make. The Utby is, for me, the perfect amount of space. Though, it's so cheap, you could get two of them and make an awesome corner desk. I work from home on occasion and also work on side projects after work. Until recently I enjoyed the venerable (and, sadly, discontinued) Ikea Jerker. Last week, however, I was home recovering from back surgery, and was forbidden by the doctor from sitting down. I had to make my own makeshift stand up desk out of a keyboard stand and ironing board. Pretty ghetto.

So, the Jerker is now in pieces and has been replaced by a second Utby at home. The sitting problem, both at home and at work is simple: a bar chair. I've got some plush comfy ones at home and bought a (reasonably) cheap Henriksdal for work. So, for less than $300, I have a nice looking desk at which I can stand or sit, and should have continued good back health. Even if you don't have back problems, I highly recommend standing; it keeps me alert and focused and feels great. You just have to make sure you have comfortable shoes.

Lead or Bleed

May 25, 2009

After reading all of The Passionate Programmer over a week or so, I'm going back through and looking at some of the "Act On It!" sections, where Chad Fowler recommends specific actions to kickstart/sustain/boost your career. The very first one, titled "Lead or Bleed?" suggests making a map of technologies, with "on the way out" on the left side and "bleeding edge" on the right side, then highlighting how well you know each thing. Here's my stab at it:

Technologies: Lead or Bleed

Green are things I know really well; yellow are things I could do at a job but am by no means an expert.

Obviously this is shaped by my own reality and what I perceive on the 'net, and I omitted things like "C", "UNIX" and "Windows", because those are not really "on the way out" in the same way that C++ is (or that COBOL was, etc.).