Shell history meme

April 17, 2008

In response to the rage-of-the-moment shell history meme, I present my list, which looks a bit different from a lot of people's:
  • 149 cd - Well, duh
  • 104 ./run.sh - Occasionally to run JBoss (I have to have two JBoss servers running locally, and when I do a complete recompile/redeploy (not often), I must restart, because ant fstats the ever-loving shit out of thousands of files (multiple times per file, 'natch, 'cause it's so superior to make) at the same time JBoss is hot-deploying (twice), and my machine just dies), but more often to run the application I'm developing.
  • 88 tail - To read log files (were I on OS X, Console.app would be the way to go)
  • 84 ls - Well, duh
  • 73 git - I commit a lot, I diff a lot, I stat a lot, and I blame a lot. You should, too
  • 57 sqlshell.pl - Awesomely awesome command-line SQL client written by my co-worker and enhanced on occasion by me. I'll probably steal it when I leave this project, and I wish he'd put it up on SourceForge
  • 51 jobs - To find out which JBosses to kill (frequently executed in the wrong shell window; see the sketch below)
  • 44 kill - To then go kill JBoss
  • 43 vi - To get some work done
Runners up: rm, grep, git-svn
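For the curious, the jobs/kill dance looks something like this (the output here is illustrative, not a real session):

$ jobs
[1]-  Running    ./run.sh &    # JBoss server one
[2]+  Running    ./run.sh &    # JBoss server two
$ kill %1 %2    # job specs only work in the shell that owns the jobs, hence the wrong-window problem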

Distributed version control with Git for code quality and team organization

April 15, 2008

In my previous post, I outlined a code review process I've been using with reasonable effectiveness. It's supported, in my case, by the Git source code management tool (best known for its use in managing the Linux kernel). Git, or more generally distributed development, can encourage some good quality control procedures in teams working on enterprise software. The lessons learned from the open source world (and the Linux kernel in particular) can be applied outside the world of OSS and to the consultant-heavy world of enterprise/in-house software development.

The project I've been working on for the past several months has undergone what I believe to be a common change on in-house/enterprise software: several new developers are being added to the project. Outside of the learning curve required with any new system, many of them are not seasoned Java developers, or are otherwise missing experience in some key technologies in use. While code reviews are a great way to ensure these developers are doing things the right way, there is still concern that their ability to commit to source control could be problematic for the entire team. Consider a developer breaking the build, or incorrectly refactoring a key piece of shared code. A review of their commit and some continuous integration can help identify these problems, but, once identified, they must be removed from the codebase. In the meantime, the development team could be stuck with an unusable build. This can lead to two bad practices:
  • Commit very rarely
  • Get new changes from the repository only when absolutely needed
These "anti-practices" result in unreadable commit logs, difficult (or skipped) code reviews, duplication of code, and a general discoherence of the system. This is primarily due to the way most common version control systems work. In reserved-checkout systems (e.g. PVCS, StarTeam) and concurrent systems (CVS, Subversion), there is the concept of the one true repository of code that is a bottleneck for all code on the project. The only way Aaron can use Bill's code is for Bill to commit it to the repository and for Aaron to check it out (along with anything else committed since the last time he did so). The only way Carl can effectively review Dan's code, or for the automated build to run his test cases, is to checkout code from the repository and examine/run it. This reality often leads to situations where each developer is operating on his own branch. The problem here is that CVS and Subversion suck at merging. This makes the branching solution effectively useless. Enter Git. With Git, there is no central repository. Each developer is on his own branch (or his own copy of someone's branch) and can commit to their heart's content, whenever they feel they have reached a commit point. Their changes will never be forced upon the rest of the team. So, how does the code get integrated? Developer's submit their code to the team lead/integrator (who is the ultimate authority on what code goes to QA/production/the customer), who then reviews it and either accepts or rejects it. If code is rejected, the team lead works with the developer to get it accepted (either via a simple email of the issues, or more in-depth mentoring as needed). Git makes this painless and fast, because it handles merging so well. Consider how effective this is, especially when managing a large (greater than, say, five) team of developers working concurrently. The only code that gets into the production build will have been vetted through the team lead; he is responsible for physically applying each developer's patches (an action that takes a few minutes or even seconds in Git). Further, developers get instant feedback on their code quality. In most cases, bad commits are the result of ignorance and lack of experience. A code review, with instant feedback, is a great way to address both of those issues, resulting in a better developer and a better team, based on open, honest, and immediate communication. Here's how to set this up:
  1. Assign a team lead to integrate the code - this is a senior developer who can assess code quality, provide mentoring and guidance, and can be trusted to put code into the repository destined for QA and production
  2. Each developer clones the team lead's repository - This is done to baseline the start of their work
  3. Developers commit, branch, merge, and pull as necessary - Since Git makes merging simple, developers can have full use of all features of version control and can do so in their own environment without the possibility of polluting the main line of development. They can also share code amongst themselves, as well as get updates from the team lead's repository of "blessed" code1
  4. Developers inform the lead of completion
  5. Lead pulls from the developer's repository - The lead reviews the developer's changes and applies the patch to his own repository. He can then exercise whatever quality control mechanisms he wishes, including automated tests, manual tests, reviews, etc.2
  6. Lead rejects patches he doesn't agree with - If the patch is wrong, buggy, or just not appropriate in some way, the lead rejects the patch and provides the developer with information on the correct approach
  7. Lead accepts patches he does agree with - If the lead agrees with the patch, he applies it to his repository, where it is now cleared for QA
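In Git terms, the day-to-day mechanics might look something like this (the hostnames, paths, and branch names are made up for illustration; the commands themselves are standard Git):

# Step 2: each developer baselines from the lead's repository
git clone ssh://lead.example.com/repos/project.git
cd project

# Step 3: work on a private branch, committing early and often
git checkout -b report-rework
git commit -a -m "Extract PDF generation into ReportWriter"
git pull origin master          # pick up newly blessed changes as needed

# Steps 5-7: the lead fetches a developer's work and reviews before merging
git fetch ssh://aaron-box.example.com/home/aaron/project.git report-rework
git diff master FETCH_HEAD      # review the changes
git merge FETCH_HEAD            # accept; or reject and email feedback instead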
This may seem convoluted, but it actually carries little overhead compared to a junior developer performing a "nuclear bomb" commit that must then be rolled back. For much larger teams, the approach can be layered, with the primary team lead accepting patches only from lieutenants, who accept patches from the primary developers. Unlike a lot of hand-wavy processes and practices, this model has been demonstrated effective on virtually every open source project. Even though the Linux kernel is one of the few to use technology to support this process (Git), every other large OSS project has the concept of "committers": the people allowed to actually commit. Anyone else wishing to contribute must submit patches to a committer, who then reviews and approves their patch (or not). I believe this would be highly effective in a professional environment developing in-house or enterprise software (especially given the typical love of process in those environments; this process might actually help!). I have been on at least three such projects where it would've been an enormous boon to quality (not to mention that the natural mentoring and feedback built into the process would've been hugely helpful for the more junior developers).
1 Git even allows a developer to merge certain commits from one branch to another. Suppose Frank is working on a large feature, and happens to notice a bug in common code. He can address that bug and commit it. Gary can then merge only that commit into his codebase to get the bugfix, without having to also take all of Frank's in-progress work on the large feature. Good luck doing that with StarTeam.
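In Git this is a cherry-pick; a minimal sketch (the SHA is hypothetical):

# Gary, on his own branch, takes just Frank's bugfix commit:
git cherry-pick abc1234    # applies only that commit, leaving Frank's in-progress feature work behind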
2 A CI system could be set up in a variety of ways: it could run only against the lead's "blessed" repository, or it could run against an intermediate repository created by the lead (who then blesses patches that pass), or it could be totally on its own and allow developers to submit against it prior to submitting to the lead.

Quick and Dirty Code Reviews: Check commit logs

April 03, 2008

             Large maintenance 
+          aggressive schedule 
+       lots of new developers 
+ minimal system documentation
______________________________
 Need for highly efficient and 
       effective QA procedures
Where I've been working for the past few months, we've been under the gun to meet an aggressive deadline. As management is wont to do, they've added several new developers to the project. One of the many reasons why adding developers is ultimately a bad thing is that, in addition to the complexity in communication, there is a risk of innocent, well-developed code being added to the codebase that is Just Plain Wrong. Our system has been in development for many years and contains a wide variety of coding styles, implementation patterns, and idioms. Some of them should Never Be Followed Again, and some are the Correct Way of Doing Things. There's really no easy way for a new developer to know which is which.

Now, outside of going back in time, creating pedantic test cases, gathering requirements, and incessantly refactoring, we need an option to make sure bad code doesn't get into the codebase. By "bad" I don't mean poorly written or buggy code; I mean code that does not fit into the system as a whole. For example, a developer was writing some code to generate printable reports. His implementation resulted in a very nice report popping up in a Swing window in our application (our application is a Swing front-end to a JEE back-end). It was very well-implemented and looked great. However, everywhere else in the application, reports are generated by writing a PDF to disk and asking Windows (via JDIC) to "view" the file. This is the kind of code I'm talking about.

Finding bad code

At the start of the project, we went through the arduous process of identifying each bit of work in an MS-Project file and assigning developers to it. New developers were given tasks that didn't touch core components, while experienced developers got tasks involving major refactorings or database changes. Our project lead suggested that each module undergo a code review. It sounds reasonable; however, all of us let out a collective groan at the thought of wrangling together 3-4 developers once a week for an hour or two to go over printouts or screenfuls of code, much of which was simply being modified. One of the senior developers proposed the solution we ultimately went with: senior developers get emailed the diffs of all commits and make sure to spend some time reading and reviewing those commits. Coupled with our policy of "commit early, commit often", this has worked out great.

Diff-based code review

Here's what you need:
  • A concurrent version control system developers trust. I recommend Git, or Subversion if you must.
  • A simple script to email diffs on every commit. One is usually included as an example hook with most version control systems (a minimal sketch follows this list).
  • IM clients (Google Talk within Gmail works in even the most oppressive environment)
  • A sane version control policy: committed code must:
    • Compile
    • Not prevent application deployment/startup
    • Not horribly break someone else's code (optional)
    Developers should commit as frequently as they want (and preferably frequently). I typically commit code I feel is "done" but that might not add up to an actual feature. This requires accepting that head is not a real version. Most real version control systems have the ability to tag, branch, etc. Those features are for "real working versions". The head of the trunk is not.
  • A sane coding style policy: if you must re-indent or change bracing style, do it in its own commit, outside of actual code changes. Better yet, don't do it at all. Formatting changes can obscure the history of a piece of code and should be made minimally, if at all.
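As a concrete example, a bare-bones Subversion post-commit hook for mailing diffs might look like the following (the recipient address is made up; svnlook and mail do the real work):

#!/bin/sh
# post-commit: Subversion passes the repository path and the new revision number
REPOS="$1"
REV="$2"

{
  svnlook info "$REPOS" -r "$REV"   # author, date, log message
  echo
  svnlook diff "$REPOS" -r "$REV"   # the actual diff to review
} | mail -s "Commit r$REV to $(basename "$REPOS")" senior-devs@example.com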
The "process" (if you want to even call it that) is:
  1. Diffs get emailed to the senior developers as soon as they happen
  2. Senior Developers read the diffs, using IM to discuss any issues
  3. If code does have issues, the diff is forwarded to the developer who committed, with comments on what to change and why (senior developers decide amongst themselves who will send the feedback, or a "lead developer" can, if one is identified)
Part of this requires some level of diplomacy; however, a plain, to-the-point email on what the issues are with a piece of code, why the changes should be made, and a suggestion on how to make them should be digestible by anyone. I've had great success with this, having caught a wide variety of problems (even in my code, by others) without having to hold one meeting or print out one sheet of code. The fact is, on a maintenance project, you aren't reviewing the codebase, but changes to that codebase. Diffs are the way to understand what changes are being made.

Imports considered annoying and pointless

December 01, 2007

What is really the point of import statements in Java? Am I meant to believe that while perl, vim, find, eclipse, emacs, or any other tool written in the last decade can locate my class files, javac cannot? Couldn't javac, when faced with a usage of the class ArrayList, figure out that since the only fucking class named ArrayList available to it is in java.util, that might possibly be the class I mean? Other than resolving ambiguities, imports are a manual way to accomplish what the compiler could do much more easily. Plus, removing them would reduce pointless coupling, improve maintenance, and result in a class header that provided actual value, and not a shitload of lines to fold and skip over.
  • But imports help the compiler locate classes! - Why should I help the compiler locate classes? Why put a fully automatable task in the hands of a developer? Are you telling me the compiler can't index the classpath to more efficiently search at compile-time?
  • But imports let you know class dependencies - No, they don't. Only if you don't use star imports, and only if you import exactly what you need, would this be the case. Even then, not really, because your class could Class.forName. And, honestly, how much time do you spend looking at the import statements to perform this analysis? An automated tool could provide this information much more correctly and completely.
  • But how would I know what classes are in a jar and what are in the codebase? - You'd know the same way the compiler knows. And, to be honest, the code should be written for the maintainers, not for the new kids. Anyone new to a codebase can, relatively quickly, figure out what is in the codebase and what isn't. This, along with proper tools for locating classes integrated into your IDE, would be much better than looking at import statements and grep'ing the output of jar tvf.
I think an approach that addresses the above concerns without adding a lot of cruft is to redefine what we mean by "package". In Java parlance, a "package" is really just a directory used to organize classes and ensure unique naming. Conceptually, however, a "package" is a singular unit. For example, Apache's commons-lang contains nine Java packages, but it's really only, conceptually, one package. I think some changes to the language to help us all out would improve things. Wouldn't this be much more readable source code:
package myapp;
// No point in putting the dir structure as dots; the compiler
// can figure it out.  Instead we indicate that this class, wherever
// it is, is part of the conceptual package "myapp"

import commons-lang[2.1,];     // check for version 2.1 or greater
import commons-logging[1.0.*]; // check for version 1.0.* only
import j2ee[5.0,5.3];          // check for any version from 5.0 to 5.3

clarify java.util.Date;

public class Whatever
{
    public static void main(String args[])
    {
        Date date = new Date();
        // whatever else
    }
}
This syntax that I just made up is explicit and much more powerful than import statements. You declare your version requirements and dependencies in a different way than you clear up ambiguities. The compiler could even issue warnings when you import things you don't use. It would not be terribly difficult for the compiler to provide this service, and it would keep it in the language and not in the hands of some unwieldy external tool or IDE. I don't know, this just seems fairly obvious to me, and I'm surprised that Java continues the "not much better than #include" method of linking things together.

Why is J2EE/JBoss configuration such a nightmare?

November 26, 2007

I'm glad EJB3 has come along, because it has vastly simplified what you must do to get a J2EE application up and running. It's not 100% smooth, but it's a step in the right direction. That being said, anything beyond simple EJBs and Persistence objects is just such a clusterfuck of configuration, dependencies, undocumented magic strings, masked error messages, and XML abuse. Why was XML chosen as a configuration format for what is basically a properties file? What is the advantage of this:

<mbean name="big.fucking.long.whatever">
  <attribute name="SomeProperty">some value</attribute>
  <attribute name="SomeOtherProperty">another value</attribute>
  <attribute name="TimeWastedTypingAngleBrackets">10 hours</attribute>
  <attribute name="MoneyWastedPayingForXMLSpy">$10000</attribute>
</mbean>

over this:

big.fucking.long.whatever
SomeProperty=some value
SomeOtherProperty=another value
TimeWastedTypingAngleBrackets=0 seconds
MoneyWastedPayingForXMLSpy=$0

It seems to me that if all we are doing is configuring a set of properties and values, a format similar to the Windows .ini format would be much preferred. And, honestly, if we can't do better than Windows, what the fuck. I guess one thing all three formats have in common is that you have no fucking idea what the attributes mean, which are required, or what will happen at runtime. If you are lucky, you have the mbean source or javadoc (don't forget to look for is to precede boolean properties and get to precede all others!). Also, fucking this up generated an Oracle-quality error message from JBoss: "Attribute SomeProperty not found". So, are you looking for SomeProperty and didn't find it, or did you get it and not want it?

Of course, we could actually leverage the power of XML and tools like DTDDoc and XSD Doc and do something like this:

<mbean name="big.fucking.long.whatever">
  <SomeProperty>some value</SomeProperty>
  <SomeOtherProperty>another value</SomeOtherProperty>
  <TimeWastedTypingAngleBrackets>10 hours</TimeWastedTypingAngleBrackets>
  <MoneyWastedPayingForXMLSpy>$10000</MoneyWastedPayingForXMLSpy>
</mbean>

This, if backed by a schema, would actually be a nice way to document (and enforce) configuration. Bonus points to Hibernate for allowing properties or XML or MBean configuration, and for having the property names different in each fucking format. It seems like a lot of extra work to make them all different. I'm not saying I want a Microsoft Enterprise Application Wizard, but a little common sense could go a long way.

Google Maps Pedometer

June 29, 2006

I love this thing: it's a Google Maps hack that allows you to plot a course, with waypoints, and shows you the distance as well as mile markers. I've been using it to plot different paths for running specific distances. You can even save the path for later viewing. Here's a course I ran the other day for a three mile run.

The Power of Digital Audio

June 28, 2006

So, my band has been recording an E.P., and we did all the tracking ourselves. Before taking things into a studio for mixing, I went through and did all the editing. I guess in the old days of actual tape, things would be done differently while tracking, because editing tape involves razors, scotch tape, and rulers. With something like Pro Tools, a lot of things can just be handled after the fact. I dunno, maybe that makes us crappy musicians, but in my mind, it's just something that enables us to get our musical ideas recorded with a minimum of hassle.

I had figured that editing the vocals together would be the biggest task: basically taking the best phrase from multiple vocal takes and creating a final "comped" take for the vocals. It turns out that a much trickier part involved the drums. We didn't use a click track or metronome, as Michelle keeps reasonably steady time and we just didn't have time to rehearse to the click. The tricky part about this is that anywhere in a song where she's not playing, she has to click her sticks so we have a time reference. See, Tony, Devon, and I would be recording our tracks later when she wasn't there, so we need a beat at all times during the song. We've got a couple songs with some multiple-measure stops in them, where Michelle hits a big cymbal crash, the band plays, and then she comes back in.

The first thing I noticed was that her stick clicks are right in the middle of the decay of the cymbal crash. Now, I could just fade out right before the first click, but that would sound highly awkward. I could simply remove the click, but then there would be a noticeable gap in the cymbal decay. So, Time Expansion to the rescue! First step: cut out the portion with the click sound. Next step: select an area of the cymbal decay adjacent and previous to the area I just removed, and use the time expansion/compression plug-in to stretch it to fill the remaining space, without modifying the pitch. Note that I had to calculate the amount of space to fill via samples, and ensure that the "Sound vs. Rhythm" slider was all the way on Sound, or you get a noticeable flanging effect. Once that's done, it works OK, but there are noticeable clicks when we pass from the edited audio to the unedited audio. A quick crossfade of both sections smooths those out, and then we do it about 11 more times. The result is a smooth cymbal decay without any sound of stick clicks!

I guess if we'd been doing this with tape, we would either have had to use a click track or have someone else click the missing rhythm in Michelle's headphones. Either way, I didn't even think about this problem at the time, and thank God I was able to fix it. Go Pro Tools!

Wikipedia and the speed of eBusiness

June 20, 2006

So, I've authored a few Wikipedia entries and have done large edits to some others, so usually a few times a week, I'll check the watchlist and keep an eye on things. I'll also periodically fix typos or reword things in articles I'm reading. Usually I'll only bother for articles about old school video games, wrestling, or music.

I was reading the entry on amateur wrestler-turned-sports-entertainer Brock Lesnar and noticed some pretty poorly worded passages. A big problem with pro-wrestling entries is that they don't clearly distinguish wrestling storylines from real-world happenings, and they come off a bit markish. So, I edited a big part of the entry about his time in WWE. I try to preview before saving, but whenever I edit a lot of text, I end up making several follow-up edits. After a couple of these, I noticed a missing comma, so I clicked "Edit" and got a blank page. Figuring Wikipedia had just barfed or something, I tried it again. Nothing. I had been editing just the one section, so I went back and tried to edit the entire article. It had been replaced with this (possibly NSFW) entry. Right under my nose!

So, I reverted the edit and went to the user's talk page. I then had to go searching through the Wikipedia help section to find out how to flag this guy as a vandal. Meanwhile, he reverted my reversion to his vandalized page again! Another Wikipedia user (possibly someone who was watching his talk page) undid his edits. I finally figured out a) how to put a vandal tag on his talk page (he's got a ton there already) and then b) how to inform the Wikipedia admins that he was a repeat offender and needed banning. Within a few minutes of that, I got a message that he was banned for one week and that this was the third time he'd been banned. The entire thing from start to finish was about 5 minutes. Kudos to the openness of Wikipedia! Now the world can be more accurately informed about how Brock Lesnar almost broke his neck at Wrestlemania only to get buried by Stone Cold Steve Austin on his way out the door!

Acquisition of a USB cable

June 14, 2006

So, in an effort to bring my parents into the 21st century, I ordered my Mom a new computer. They, unfortunately, use Windows, so a PC it was. I had previously suggested using Dell or Gateway, but it seems that both companies insist on installing massive amounts of bloatware, and neither provides the actual install disks for re-installing your machine. So, I went to a budget computer builder, had them build it, install Windows XP and ship it to me. I then completed the setup, installed patches and all that. Having never used XP, I was pleasantly surprised that it just doesn't suck as much as Windows 2000, but still utterly baffled at the amount of reading and bizarre decision making required of the average user. Plus, I have no idea why the fuck I have to have a puppy dog talking to me when I do a search of the hard drive.

At any rate, I got the computer installed on their "network" without any problems. Their network consists of a dying Windows 2000 box, with the main printer connected to it, a wireless router, and the new computer. It was rather inconvenient to move the main printer to connect directly to the XP box, so I used the magical power of networking to connect the printer. In Windows 2000, it's amazingly simple (though not remotely as simple as on OS X). The previous computer pretty much stayed connected to it all the time, even during the myriad of reboots required to keep the computers up and running. I figured with teh awesomz0rz pwnage of XP, it would be even simpler. Right off the bat, you cannot browse to the computer. It just churns and churns and churns, presumably checking every single port of every single possible IP address and asking "Hey! You there! Yes, you, Mr. Port! Do you have any Microsoft products connected to you?".

So, I just go to the computer directly, via the good ole \\COMPUTER_NAME notation. It asks me to log in, which I do (why the fuck do I have to log in?!?!!?), indicating I would like the credentials to be remembered, and then I see a list of shared "stuff". The printer is top of the list: right-click, "Connect", and voilà, I've now accomplished (hopefully), in six steps, what requires one on OS X. But, whatever.

Of course, a few days later, my Mom calls and can't print. Windows has just dropped the connection to the printer. I instruct her to do as above, and, of course, she used forward slashes instead of backslashes (thus searching Google for the other computer's name) and, once I'd corrected her, Windows did not remember any part of the login credentials, despite being told to do so. Of course, the printer connected fine and worked. Every 4-5 days, XP just drops the printer, and my Mom has to call me up and we go through this rigamarole again.

Now, eventually, the Windows 2000 box the printer is connected to will be put out to pasture, and the printer would have to be plugged into the XP box. So, I figure I can save myself some tech support calls if I just connect the printer to the new box and screw the networking crap that Microsoft still can't seem to get right. The printer is about 12 feet away from the computer, so I figure a 15-foot USB cable should be plenty for connecting without having to rearrange anything in my mom's office.

Best Buy's price for a 12 foot cable (the longest they had): $39.99
CompUSA's price for same (again, the longest): $32.99. For a fucking USB cable.

So, to the Internets. I'd heard good things about NewEgg, so I figured this would be a good way to try them out. A 15 foot USB cable retails for a measly $3.99. A brief perusal of other online dealers yielded similar prices. What the fuck are the BigBoxes thinking? I guess that every consumer is a completely uninformed idiot who likes being cheated. Seriously, I can see paying twice as much for the "instant gratification" thing (even though THAT is ridiculous in and of itself), but 10 times as much?!?!?! Wow.

So, the shipping on a $4 USB cable is about $5, so I figure while I'm buying, I'll throw in a few other things I need: a 7-port USB hub for myself, a spindle of CD-Rs for my Mom (who never buys them and therefore never backs up anything, ever), some DVD-R DLs for me (having recently exhausted the majority of my supply backing up the demo for my band), and some CD-Rs for myself as well. All said and done, I've got a good $150 in my shopping cart.

Now, this is ultimately stuff for my Mom, and since I live in the city (and therefore both UPS and FedEx actively hate me), I figured I'll just have it shipped to my Mom's house in Manassas. I create my NewEgg account, enter the shipping address and then am told that if my shipping address and billing address are different:

Contact your card issuer and have the alternate Ship-to Address added as an authorized shipping location in your account records or in the memo field. If you choose to ship to an address other than your billing address that has not been specified as an alternate Ship-to Address with your card-issuer, your order may be delayed by up to several days as we complete verification.
What. The. Fuck.

It seems to me that as of 19 fucking 99, the collective online portion of the human race had solved the difficult problem of shipping to an address other than your billing address. I can't even recall the last time I was on a website where this feature was not seamless. Yet these assclowns want me to call my credit card company?!?!?!?! From their own FAQ:

Why must Newegg.com verify my shipping address?
For fraud prevention purposes, if your billing and shipping addresses are different, we must verify your shipping address. Please contact the bank that issued your credit card and have your shipping address listed as an alternate address in that bank's memo field. Please make sure your credit card issuer bank's phone number is correctly listed in your Newegg.com account information.
What kind of fraud might happen if the legal credit card holder charges something on his card and has it shipped to whatever fucking address he wants? I guess NewEgg thinks they are smarter than the entire rest of the Internet, because they are the only place I've been to that has this idiotic requirement. What, do they hate me or something? They sure seem to be treating me as such.

A quick trip to Directron and the same items are now en route to my Mom's house (I actually saved $10 on the USB hub, to boot). I usually use them for my online computer junk needs, but this time figured I'd try someone new. I won't be making that mistake again. Right after, I emailed NewEgg informing them that their stupid policy cost them today's sale, and any future sales. I guess if it prevents $150 worth of fraud, it was worth it to them.

Update: Received the following message from them regarding my email that they lost a sale:

Thank you for contacting Newegg.

We humbly apologize if this safety precaution has inconvenienced you in anyway but please understand that our intent is only to ensure your satisfaction. <? xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
The weird XML crap was left as-is, with incorrect spacing. WTF is that, anyway? So, basically, what they are saying is that something I said specifically dissatisfied me is there only to ensure my satisfaction. Go figure.