Learning Cucumber - With Dynamic types must come documentation

May 21, 2009

Finally pulled the trigger on Cucumber, which allows one to write human-readable "features" that are essentially acceptance test cases. You can then execute them by adding some glue code in a mish-mash of Ruby DSLs and verify functionality.

It took me quite a while to decide to get going on this because the available examples and documentation are extremely hypothetical and very hand-wavy. A lot of the information glosses over the fact that you are still writing code, and you still need to know what the keywords are and what data you will be given. Arcane non-human-readable symbols are almost preferable when getting started, because you don't get distracted by English. This is why AppleScript drives me insane.

At any rate, I found this page, which was pretty helpful. It shows testing a rails app using, among other things, webrat (another tool severely lacking in documentation but that is, nonetheless, pretty awesome).

I'm writing a basic wiki (for science) and so I thought a good feature would be "home page shows all pages in sorted order", so I wrote it up:

Feature: pages are sorted
 As anyone
 The pages should be listed on the home page in sorted order

 Scenario: Visit Homepage
   As anyone
   When I visit the homepage
   Then The list of pages should be sorted

Now, the webrat/cucumber integration provided by rails means that the "plain English" has to actually conform to a subset of phrasing and terminology, or you have to write the steps yourself (the steps are everything under "Scenario"). It's not hard to do that, and it's not hard to modify the default webrat steps, but it was a distraction initially.

Next up, you implement the steps and here is where the crazy intersection of Ruby DSLs really made things difficult. The first two steps were pretty easy ("anyone" didn't require any setup, and webrat successfully handled "When I visit the homepage"):

Then /The list of pages should be sorted/ do
  response.should # I have no clue wtf to do here
end
A puts response.class and a puts response.methods gave me no useful information. I eventually deduced that since Cucumber is a successor/add-on/something to RSpec, perhaps should comes from RSpec. It takes a Matcher, and webrat provides many; specifically, have_selector, which allows selecting HTML elements based on the DOM.
Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages")
end
Success! (sort of). My feature execution is all green, meaning the home page contains <ul class="pages">. have_selector also takes a block (what that block receives, or does, is totally undocumented in webrat):
Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages") do |pages|
    # WTF is pages and what can I do with it?

  end
end
A puts pages.class later and I realize this is a Nokogiri NodeSet. Now, I'm in business, though it would've been nice to be told WTF some of this stuff was and what I can/should do with it. At this point it was pretty trivial to select the page names from my HTML and check if they are sorted:
response.should have_selector('ul.pages') do |pages|
  page_names = []
  pages.should have_selector('li') do |li|
    li.should have_selector('.pagename') do |name|
      name.each do |one_name|
        page_names << one_name.content
      end
    end
  end
  assert_equal page_names, page_names.sort
  true
end
(For some reason, assert_equal doesn't evaluate to true when it succeeds, so the block evaluates to false, and then the Cucumber/RSpec/Webrat/Ruby gods claim my page is missing the ul tag; hence the trailing true.) My initial implementation walked the DOM using Nokogiri's API directly, because I didn't realize that should had been mixed in (on?) to the objects I was being given. I'm still not sure if using that is the intention, but it seemed a bit cleaner to me.
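For the record, here's a rough plain-Ruby sketch of that hand-rolled approach. The HTML and names below are made up for illustration (my real page comes from the wiki app), and I've substituted a naive regexp for the Nokogiri walk so the snippet is self-contained:

```ruby
# Illustrative only: a hand-rolled version of the sortedness check.
# The markup here is invented; a real test would use the app's response.
html = '<ul class="pages">' \
       '<li><span class="pagename">Apple</span></li>' \
       '<li><span class="pagename">Zebra</span></li>' \
       '</ul>'

# Naive regexp extraction stands in for walking the Nokogiri NodeSet
page_names = html.scan(%r{<span class="pagename">([^<]+)</span>}).flatten

raise "pages are not sorted" unless page_names == page_names.sort
```

The matcher-based version above is nicer because webrat handles the parsing, but this makes it obvious what the block is actually checking.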

So, this took me a couple of hours, mostly because of a combination of dynamic typing and lack of documentation. I'm all for dynamic typing, and I totally realize that these are free tools and all that. I think if the Ruby community (and the dynamic typing community in general) wants to succeed and make a case that dynamic typing, DSLs, meta-programming and all this (admittedly awesome and powerful) stuff enhance productivity, there has to be documentation as to the types of user-facing objects.

Now, given Github's general awesomeness, I'm totally willing to fork a repo, beef up the rdoc and request a pull, however I'm not even sure whose RDoc I could update to make this clear. Just figuring out that the have_selector in response.should have_selector is part of webrat was nontrivial (I had to just guess that should was part of RSpec and that the Webrat::Matchers module was mixed in). This is a problem and it's not clear to me how to solve it.

That being said, I was then able to create three more features using this system in about 10 minutes, so overall, I'm really happy with how things are working. Certainly if this were Java, I'd still be feeding XML to maven or futzing with missing semicolons. So, it's a net win for me.

Why maven drives me absolutely batty

May 13, 2009

Although my maven bitching has been mostly snarky, I have come to truly believe it is the wrong tool for a growing enterprise and, like centralized version control, will lead to a situation where tools dictate process (and design).

But, what is maven actually good at?

  • Maven is great for getting started -- you don't have to author an ant file (or copy one from an existing project)
  • Maven is great for enforcing a standard project structure -- if you always use maven, your projects always look the same
This is about where it ends for me; everything else maven does -- managing dependencies, automating processes, etc. -- is done much better and much more quickly by other technology. It's pretty amazing that someone can make a tool worse than ant, but maven is surely it.

Dependency management is not a build step

Maven is the equivalent of doing a sudo gem update every time you call rake, or doing a sudo yum update before running make. That's just insane. While automated dependency management is a key feature of a sophisticated development process, it is a separate process from developing my application.

Maven's configuration is incredibly verbose

It requires 36 lines of human-readable XML to have my webapp run during integration tests. Thirty-six! It requires six lines just to state a dependency. Examining a maven file and trying to figure out where you are in its insane hierarchy is quite difficult. It's been pretty well established outside the Java community that XML is a horrible configuration file format; formats like YAML have a higher signal-to-noise ratio, and using (gasp) actual scripting language code can be even more compact (and readable and maintainable).
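For reference, stating a single dependency looks something like this (the coordinates here are made up, but the shape is standard):

```xml
<!-- six lines for one library; compare to a one-line gem declaration -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>example-lib</artifactId>
  <version>1.0</version>
</dependency>
```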

The jars you use are at the mercy of Maven

If you want to use a third-party library, and maven doesn't provide it (or doesn't provide the version you need), you have to set up your own maven repo. You then have to include that repo in your pom file, or in every single developer's local maven settings. If you secure your repo? More XML configuration (and, until the most recent version, you had to have your password in cleartext...in a version 2 application). The fallout here is that you will tend to stick with the versions available publicly, and we see how well that worked out for Debian.

Modifying default behavior is very difficult

Since maven is essentially a very, very high-level abstraction, you are at the mercy of the plugin developers as to what you can do. For example, it is not possible to run your integration tests through Cobertura. The plugin developers didn't provide this, and there's no way to do it without some major hacking of your test code organization and pom file. This is bending your process to fit a tool's shortcoming. This is a limitation designed into maven. This is fundamentally different from "opinionated software" like Rails; Rails doesn't punish you so harshly for wanting to tweak things; maven makes it very difficult (or impossible). There was no thought given in Maven's design to using non-default behavior.

Extending Maven requires learning a plugin API

While you can throw random Ant code into maven, the only way to create re-usable functionality is to learn a complex plugin API. Granted, this isn't complex like J2EE is complex, but for scripting a build, it's rather ludicrous.

Maven is hard to understand

I would be willing to bet that every one of my gripes is addressed through some crazy incantation. But that's not good enough. The combined experience of the 7 developers at my company is about 70 years, and not one of us can explain maven's phases, identify the available targets, or successfully add new functionality to a pom without at least an hour on the net and maven's documentation.

A great example is the release plugin. All five developers here that have used it go through the same cycle of having no idea what it's doing, having it fail with a baffling error message, starting over and finally figuring out the one environment tweak that makes it work. At the end of this journey each one (myself included) has realized all this is a HUGE wrapper around scp and a few svn commands. Running two commands to do a source code tag and artifact copy shouldn't be this difficult.

Maven's command line output is a straight-up lie

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Compilation failure
"Compilation failure" is, by maven's own labeling, a failure and therefore an error (not an informational message). Further, most build failures do not exit with a nonzero status. This makes maven completely unscriptable.

Maven doesn't solve the problems of make

Ant's whole reason for being is "tabs are evil", and that tells you something. While maven's description of itself is a complete fabrication, it at least has its heart in the right place. However, it STILL fails to solve make's shortcomings with respect to Java:

  • Maven doesn't recompile the java classes that are truly out-of-date
  • Maven recompiles java classes that are not out-of-date
  • Maven doesn't allow for sophisticated behavior through scripting
  • Maven replaces arcane magic symbols with arcane magic nested XML (e.g. pom files aren't more readable than a Makefile)

Maven is slow

My test/debug cycle is around a minute. It should be 5 seconds (and it shouldn't require an IDE).

Conclusion

Apache's Ivy + Ant is probably a better environment than maven for getting things done; a bit of up-front work is required, but it's not an ongoing cost, and maintenance is much simpler and more straightforward. Tools like Buildr and Raven seem promising, but discussing them might be like discussing the best braking system for a horse-drawn carriage: utterly futile and irrelevant.

Git Workflow with SVN

April 28, 2009

The best way to get started with Git and have a better experience at work if you have to use SVN is to use git svn as a client to Subversion. You can take advantage of Git's awesomeness while not requiring your team or infrastructure to change immediately.

Setup

git svn clone -t tags -T trunk -b branches svn+ssh://your.svn.com/path/to/svn/root (This may take a while for a large or old svn repo)

Working on Trunk

The initial clone should leave you on git's master branch, which is connected to svn's trunk.
  1. git svn rebase # Optional: only if you want to get work from svn; you don't have to
  2. Hack some code
  3. git add any new files you created
  4. git commit -a
  5. Repeat from step 2 until done

Sharing Your Changes

You will rebase your changes against SVN's (this means git will pretend you made all your changes from SVN's current HEAD, not the HEAD you started with [you do this to avoid conflicts and merges, which SVN cannot handle]).
  1. git svn rebase
  2. git svn dcommit

If you got Conflicts

  1. Git will tell you about them, so go and resolve them
  2. For each file you had to resolve, git add the_filename
  3. git rebase --continue
  4. Repeat until done

Working with SVN's branches

Suppose you need to do some work on a branch called 1.3.x in your SVN repo:
  1. git svn fetch # This updates your local copy of remote branches
  2. git checkout 1.3.x # This checks out a remote branch, which you shouldn't work directly on
  3. git checkout -b 1.3.x-branch # This creates a local branch you can work on, based on the remote 1.3.x branch
  4. Hack some code
  5. git add and git commit -a as needed
  6. Follow same procedure as above for Sharing Your Changes. Git will send your changes to the 1.3.x branch in SVN and not the trunk

Merging the Changes You Made

Due to the way git interacts with SVN, you shouldn't automatically just merge your branch work onto the trunk. This may create strange histories in SVN.

So What?

So far, this isn't buying you much more than you get with SVN. Yes, when you git checkout 1.3.x-branch it's lightning fast, and you can work offline. But here are a few things that happen to me all the time that would be difficult or impossible to do without Git.

Gotta Fix a Bug Real Quicklike

You are in the middle of working on a new feature and you need to push out a bugfix in production code. Your in-development code can't be checked into trunk:
  1. git stash
  2. git checkout production-branch-name
  3. git checkout -b bugfix-xyz
  4. Fix bugs
  5. git commit -a
  6. git svn dcommit
  7. git checkout master
  8. git stash apply
You are now back where you started, without a fake revision just to hold your code, and you didn't have to check out the branch elsewhere.

Can't commit to SVN due to a release

Often, teams restrict commit access to SVN while a release is being prepared. If the team is releasing version 1.5 and I'm working on 1.6 features, there can be some period of time where I'm not supposed to commit, because the 1.5 release is being prepared and under feature freeze.
  1. git commit -a
  2. Continue working
When feature freeze is over, then I'll git svn dcommit to send my changes to the SVN server

Blocked on Feature X, Want to work on Feature Y

This happens to me quite frequently: I'm slated to work on a few features that aren't interdependent. I start hacking away on Feature X and hit a roadblock and can't continue working. I've got a half-implemented feature and I can't make any forward motion until a meeting next week. Feature Y, on the other hand, is ready to go. This requires some planning ahead:
  1. git checkout master
  2. git checkout -b feature-X
  3. Work on Feature X
  4. git commit -a etc. as I work
  5. Get blocked; meeting next week. D'oh!
  6. git checkout master
  7. git checkout -b feature-Y
  8. Work on Feature Y
At this point, X and Y are on two local branches and I can switch back and forth as needed. Don't underestimate how powerful this is, especially when you have certain features that are priorities, but can become blocked frequently. I can now easily put aside Feature Y once I have my meeting and start back up on Feature X. When I'm done, I git merge everything back to master and dcommit to SVN.

Type your log message, save it, realize you forgot to reference a bug ticket #

You have a bug tracker set up that links tickets and revisions; all you have to do is put the ticket # in your log message. It's a nice feature, but I forget to do it frequently. As long as you haven't done git svn dcommit, you can fix this:
  1. git commit --amend
Your editor will pop up and you can change the log message! Awesome.

Advanced Stuff

Once you get used to this, you will feel more comfortable doing some more advanced things.

Topic Branches

The most obvious benefit was touched on above, but it boils down to: make every new feature on its own branch. This means you never work on master and you never work on an SVN branch. Those are only for assembling what you will send to SVN. This gives you incredible flexibility to work on code when it's convenient and not worry about checking in bad things. Git calls these topic branches.

Save your Experiments

If you do everything on a branch, you don't have to delete your work, ever. You can go back and revisit experiments, or work on low-priority features over a long period of time with all the advantages of version control, but without the baggage of remote branches you have to share with the world.

Cherry Pick

With Git, you typically commit frequently and you restrict the scope of each revision. A commit in git is more like a checkpoint, and a push in Git is more like a commit in SVN. So, commit in git like crazy. What this lets you do is move diffs around. On several occasions, I've had some code on a branch that I needed to use, but didn't want to pull in the entire branch. git cherry-pick lets me do that.

Mindlessly Backup Your Repo

  1. ssh your_user@some.other.box.com
  2. mkdir awesome_project
  3. cd awesome_project
  4. git init
  5. exit
  6. git remote add other-box your_user@some.other.box.com:/home/chrisb/awesome_project
  7. git push --all other-box
  8. echo "git push --force --all other-box" > .git/hooks/post-commit && chmod +x .git/hooks/post-commit
You now will back up your repository on every commit to the other box. Or, use GitHub!

REST Compliance Officer

March 17, 2009

With regard to this blog on REST compliance

Me: The Gliffy API is RESTFul
REST Compliance Officer: Does a "PUT" update the data at the given URL?
Me: Yes.
RCO: Trick Question! It's "URI". Is the only way to create a new resource done with a "POST"?
Me: Yes.
RCO: Is there exactly one endpoint, from which any and all resource locators are discoverable?
Me: Um, no, that puts undue burden on the client libraries, and over-complicates what we were trying to accomp....
RCO: YOU ARE NOT RESTFUL! READ FIELDING'S DISSERTATION, THE HTTP SPEC AND IMPLEMENT AN RFC-COMPLIANT URI PARSER IN THREE DIFFERENT LANGUAGES. NEXT!

Thank GODS that REST doesn't have a spec. If it did, it would still be in development.


P.S. If you are going to coin a term and you want to bitch about it being misused, maybe calling it a "style" isn't the best idea.

Java Annotations - Java's love of configuration over convention

March 11, 2009

In the beginning, EJB was a bloated mess of XML configuration files that allowed some sort of ultimate flexibility that absolutely no one needed nor cared about. And it sucked. So developers started using conventions to keep track of the four classes required to make a remote method call, and XDoclet was created to automate the creation of the XML configuration files. And it sucked less. Following in EJB's footsteps, Hibernate did the same thing. And XDoclet followed. And it still sucked.

So, annotations were created to essentially formalize what XDoclet was doing, instead of considering how horribly broken the implementation of J2EE or Hibernate was. And now that we have annotations, the "implementation pattern" of "ultimate flexibility through annotations" has made its way into numerous Java frameworks, such as JAX-RS and JPA.

Regarding JPA:

@Id
@GeneratedValue
@Column(name="person_id")
public int getPersonId() { return personId; }
This is not a significant improvement over XDoclet; the only benefit is that if you mistype "GeneratedValue", the compiler will catch it. I shouldn't have to type "GeneratedValue" in the first place. Unless I'm doing something non-standard. Which I almost never do.

I have a Person class with a getPersonId method. Can't JPA just assume that they map to the PERSON table and the PERSON_ID column, respectively? Further, couldn't it figure out that it's the auto-generated primary key, since the schema says primary key auto increment? All the information is there and available to the framework to figure this out.
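To make the convention concrete, here's a sketch of the kind of name derivation I have in mind, in Ruby for brevity. The helper names are hypothetical; this is roughly the sort of thing ActiveRecord does for you:

```ruby
# Hypothetical helpers: derive DB names from Java-style class and getter
# names by convention alone, so nobody has to type them in an annotation.
def table_name(class_name)
  class_name.upcase                          # "Person"      -> "PERSON"
end

def column_name(getter_name)
  getter_name.sub(/\Aget/, '')               # "getPersonId" -> "PersonId"
             .gsub(/([a-z])([A-Z])/, '\1_\2') # "PersonId"   -> "Person_Id"
             .upcase                          #              -> "PERSON_ID"
end
```

Twenty lines of this, and the @Column annotation becomes something you only reach for when your schema actually deviates from the convention.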

The same goes for EJB. I have a class named FooStatelessBean. How about we assume it's a stateless session bean, and its interface is defined by its public methods? It can then provide FooRemote and FooLocal for me, and I don't need to configure anything or keep three classes in sync.

Just because Java doesn't have all the Ruby dynamic magic doesn't mean we can't make things easy. In reading Surya Suravarapu’s blog post about CRUD via JAX-RS I can't help wondering why it takes so much code to call a few methods on a class?

Did the designers of JAX-RS not even look at how Rails does things? I get a PUT to the url /customers/45. We should default to calling put(45) on the class CustomersResource. Only if I want to obfuscate what's going on (e.g. by having FooBar.frobnosticate() handle the request) should I be required to provide configuration.
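Sketching that default dispatch (again in Ruby for brevity; the names and the routing rule here are hypothetical, just to show how little configuration it would take):

```ruby
# Hypothetical convention-based dispatch: derive the resource class and
# method from the HTTP verb and the path, with zero configuration.
def route(http_method, path)
  _, resource, id = path.split('/')          # "/customers/45" -> ["", "customers", "45"]
  klass = resource.capitalize + "Resource"   # "customers" -> "CustomersResource"
  [klass, http_method.downcase, id && id.to_i]
end

# route("PUT", "/customers/45") => ["CustomersResource", "put", 45]
```

Everything after this -- finding the class, invoking the method -- is reflection, which Java is perfectly capable of.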

Even in Surya's example code, he's following some conventions: his resource class is suffixed with Resource and his POST method is prefixed with add. This should be baked into the spec. It's like EJB all over again, with common conventions that aren't supported by the framework because of too much useless flexibility.

Supporting convention over configuration is easy in Java. In just a few hours, I had a tiny web framework that proves it[1]. It wouldn't take much more effort to allow the default behavior to be overridden, but, unlike JAX-RS, EJB, or even the Servlet spec itself, it doesn't punish developers who follow conventions. It makes their lives easier and thus encourages good behavior.

So, the point of all this is that annotations encourage bad framework design; unnecessary configuration is a major part of many Java frameworks and specs. And I have no idea why.


[1] It unfortunately breaks down at the UI layer, due to a statically typed and compiled language not being a great choice for implementing web UIs, but that's another issue.

Git, GitHub, forking: the new hotness

February 05, 2009

While working on my Gliffy Ruby Client, I decided I wanted a better way to describe the command line interface. Finding nothing that was any good, I whipped up GLI and refactored my Gliffy command line client to use it. While doing that, I finally got annoyed at technoweenie's version of rest-client, and also noticed that the original author's version had totally changed its interface. So, I clicked the nice "Fork" button on GitHub to get my own copy and fixed the issues. But that's not the cool part. The cool part is that I can change my Gliffy gem to depend on my rest-client implementation and, voilà! No special instructions, no hacks, no nothing. This is a really cool thing that would be difficult with Subversion, impossible without RubyGems, and downright painful without GitHub.

Execute on your ideas now; forget secrecy, forget tweaking

January 22, 2009

A couple interesting things happened yesterday. I attended my company's annual meeting and watched the season premiere of Lost. At my company's annual meeting, we went over lots of exciting things, but there was some concern over our use of Google Apps for our email -- mainly, that they could glean our IP from reading our email and, should they choose to enter our market, gain an unfair advantage. Meanwhile, on Lost, the writers actually gave us some insight into the time-travel elements of the show, describing several aspects of time travel that are not typically used in your average time-travel story.

So, what have these two things to do with each other? I'd been noodling with a short story centered around time travel, and the type of time travel I was going to explore is very similar to what was described on Lost. Close enough that my story would come off as a bit less original than it would have 3 months ago. Even if my idea isn't that original (which ones really are?), it's a bit frustrating to see your idea developed (and deployed) by someone else independently.

Together, these demonstrate the reality of (and difference between) coming up with an idea and actually doing something with it. Essentially, an idea, in and of itself, is not particularly valuable. It's what you do with it that really counts. If Google were to steal my company's IP by sniffing our email, I doubt it would have much effect on our ultimate success. Outside of stealing our code or data outright, our idea isn't something that's hard to come up with. We just happened to come up with it and execute on it first. Anyone getting into the game now is necessarily behind us. Could someone lap us? Certainly. Is their ability to do so in any way dependent on knowing our secret ideas? I seriously doubt it.

So, sitting on ideas is a waste of time. Trying to hide an idea, either for security or for fear of "unleashing" it in an underdeveloped state, is counter-productive. Someone else has your idea. Guaranteed. And it's likely they are developing it. So, you should be developing it too, and hopefully releasing it to the world, rather than worrying about who's stealing it or who came up with it first. The first to market reaps the rewards.

Command line interface for Gliffy

January 14, 2009

My command line interface for Gliffy is relatively complete. It works pretty well, though the error handling isn't very clean. It's written in Ruby (RDoc here) and can be used as a Ruby client for Gliffy.

I decided on Ruby since that would be the most fun and didn't require learning a new programming language. I initially tried to make an ActiveRecord-style domain-based interface, but it was just too difficult and it was hard to see the real benefit. At the end of the day, I think integrating Gliffy into another application is a relatively simple thing, and a procedural interface would probably be the easiest way to do that. So, I modeled it after the PHP client library, more or less.

The command line interface uses the Ruby client library and provides just the basic functions I need:

> gliffy ls
321654 Some Diagram
987654 Some Other Diagram
> gliffy edit 321654
# Takes you to your browser to edit the diagram
I live on the command line, so this is much more expedient than logging into Gliffy and navigating the UI to edit a diagram.

I'm already feeling like providing access to the folders via the command line would be helpful (they are exposed in the Ruby client of course). Not sure how much the API will ultimately change (it's in private beta now), but hopefully not too much.

GitHub does it again; another killer feature

December 18, 2008

GitHub Pages (explained here) is yet another awesome feature of GitHub. You can publish, via git, arbitrary web content (even piping it through Jekyll for Textile markup and syntax highlighting). They have been keeping up tremendous momentum of late, introducing new features on a regular basis. I hope they keep it up. GitHub is, IMO, crushing SourceForge and Google Code in terms of simplicity, ease-of-use, and overall functionality.

Gliffy API private beta: what should I do?

December 12, 2008

Gliffy hooked me up with access to the private beta of their API (which I helped design and implement). I created a PHP client and an experimental MediaWiki plugin to validate the API while working for them, and now I want to get something else going in my spare time.

My first thought was to make a Ruby client, because I think it would be fun and relatively easy. But, I have to admit that a WordPress plugin would be more useful to me personally. That being said, a Trac extension would be useful at work, since we are using Trac (which is Python-based, and I can't say I'm too interested in Python at the moment). I think if GitHub allowed git access to project wikis, it would be cool to allow easier integration of Gliffy diagrams into GitHub project wikis.

At any rate, I don't have tons of time outside of work, so I want it to be something easily achievable, and also something Chris and Clint are not likely to work on themselves....