Using Java Persistence with Tomcat and no EJBs
May 08, 2008
- Create a persistence.xml file as per the standard documentation, leaving out the jta-data-source stanza (I could not figure out how to get Hibernate/JPA to find my configured data source)
- Create your hibernate.cfg.xml, being sure to include JDBC connection info. This will result in Hibernate managing connections for you, which is fine
- Create a persistence jar containing:
- Hibernate config at root
- persistence.xml in META-INF
- All classes with JPA annotations in root (obviously in their java package/directory structure)
- This goes into WEB-INF/lib of the war file (being careful to omit the JPA-annotated classes from WEB-INF/classes)
// Look up the EntityManagerFactory by the unit name from persistence.xml
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("name used in persistence.xml");
EntityManager em = emf.createEntityManager();

// Read-only access needs no transaction
Query query = em.createQuery("from Account where name = :name");
query.setParameter("name", itsAccountName);
List results = query.getResultList();
// do stuff with your results
em.close();
emf.close();
// For anything that writes, wrap the work in a resource-local transaction
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("name used in persistence.xml");
EntityManager em = emf.createEntityManager();
EntityTransaction tx = em.getTransaction();
tx.begin();
Query query = em.createQuery("from Account where name = :name");
query.setParameter("name", itsAccountName);
List results = query.getResultList();
// modify your results somehow via persist()
// or merge()
tx.commit();
em.close();
emf.close();
Git and SVN: connecting git branches to svn branches
April 28, 2008
git checkout -b local-trunk trunk
git branch local-foo FOO
The first command creates a new branch called "local-trunk", started at "trunk" (the remote branch mapping to the Subversion main trunk). The second creates a new branch called "local-foo", rooted at the remote branch "FOO". I have no clue why I couldn't do the same thing twice, as both commands seem to do the same thing (the first also switches to "local-trunk" after creating it). But this is what worked for me.
Now, to develop, I git checkout local-foo and commit all day long. A git-svn dcommit will send my changes to Subversion on the FOO branch. I can update the trunk via git checkout local-trunk and git-svn rebase. My hope is that I can merge from the trunk to my branch periodically and then, when my code is merged to the trunk, things will be pretty much done and ready to go. We'll see.
On a side note, the git repository, which contains every revision of every file in the Subversion repository, is 586,696 bytes. The Subversion checkout of just the FOO branch is 1,242,636 bytes: over double the size, and there's still not enough info in that checkout to do a log or diff between versions.
REST Security: Signing requests with secret key, but does it work?
April 21, 2008
Both Amazon Web Services and the Flickr Services provide REST APIs to their services. I’m currently working on developing such a service, and noticed that both use signatures based on a shared secret to provide security (basically using a Hash Message Authentication Code).
It works as follows:
- Applications receive a shared secret known only to them and the service provider.
- A request is constructed (either a URL or a query string)
- A digest/hash is created using the shared secret, based on the request (for Flickr, the parameter keys and values are assembled in a certain way, so that Flickr can easily generate the same string)
- The digest is included in the request
- The service provider, using the shared secret, creates a digest/hash on the request it receives
- If the service provider’s signature matches the one included in the request, the request is serviced
It's actually quite simple and, for one-time requests, effective. The problem, however, is that anyone intercepting the request can simply make it again themselves; preventing that requires some additional state shared between the client and the service provider. Consider a request for an image. The unsigned request might look like:
/api/images?image_id=45&type=jpg
The signed request would look like this:
/api/images?image_id=45&type=jpg&signature=34729347298473
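For illustration, here is a minimal sketch of how such a signature might be computed with the JDK's built-in HMAC support. The canonical request string, the use of HmacSHA1, and the hex encoding are my assumptions for the example; Flickr and Amazon each define their own canonicalization and digest rules.
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class RequestSigner {
    // Hypothetical signer: HMAC the canonical request string with the shared secret
    public static String sign(String canonicalRequest, String sharedSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret.getBytes("UTF-8"), "HmacSHA1"));
        byte[] digest = mac.doFinal(canonicalRequest.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // The canonical form of the unsigned request above; the exact form is service-specific
        String request = "image_id=45&type=jpg";
        System.out.println("/api/images?" + request + "&signature=" + sign(request, "my-shared-secret"));
    }
}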
So, anyone can then take that URL and request the resource. They don’t need to know the shared secret, or the signature algorithm. This is a bit of a problem. One of the advantages of REST is that URLs that request resources are static and can be cached (much as WWW resources are). So, if I wish to protect the given URL, how can I do so?
HTTP Authentication
The usual answer is HTTP Authentication; the service provider protects the resource, and the client must first log in. Login can be done programmatically, and this basically accomplishes sending a second shared secret with the request that cannot be easily intercepted. HTTP Auth has its issues, however, and might not be feasible in every context.
Another way to address this is to provide an additional piece of data that makes each request unique and usable only once. To do so requires state to be saved on the client and the server.
Negotiated One-time Token
Authentication can be avoided by using the shared secret to establish a token, usable for one request of the given resource. It would work like this:
- Client requests a token for a given resource
- Service Provider creates a token (via some uuid algorithm ensuring no repeats) and associates it with the resource
- Client creates a second request, as above, for the resource, including the token in the request
- Service Provider checks not just for a valid signature, but also that the provided token is associated with the given resource
- If so, the token is retired, and the resource data is returned
Here, the URL constructed in step 3 can be used only once. Anyone intercepting the request can’t make it again, without constructing a new one, which they would be unable to do without the shared secret. Further, this doesn’t preclude caching. The main issue here is that since two requests are required, simultaneous access to one resource could result in false errors: if Client A acquires a token, and Client B requests one before Client A uses the token, Client A’s token could be squashed, resulting in an error when he makes his request. The service provider can alleviate this by allowing the issuance of multiple active tokens per resource.
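Here's a minimal sketch of what the service-provider side might look like, assuming an in-memory store; the class and method names are hypothetical, and a real implementation would also want token expiry.
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class OneTimeTokenStore {
    // token -> resource it was issued for; several tokens per resource may be active at once
    private final ConcurrentHashMap<String, String> activeTokens =
        new ConcurrentHashMap<String, String>();

    // Steps 1-2: issue a fresh token tied to a resource
    public String issueToken(String resourceId) {
        String token = UUID.randomUUID().toString();
        activeTokens.put(token, resourceId);
        return token;
    }

    // Steps 4-5: valid only if the token was issued for this resource and not yet used;
    // the atomic remove retires it, so a replayed request fails
    public boolean redeemToken(String resourceId, String token) {
        return activeTokens.remove(token, resourceId);
    }
}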
Timestamp
A disadvantage to the One-Time Token method is that it requires two requests of the service provider for every actual request (one to get the token and one to request the resource). A way around that is to include a timestamp in the request. This would work as follows:
- Client creates request, including the current time. This request is signed as per above procedure
- Service provider validates the request and compares its own time with the given timestamp.
- If the difference in the service provider’s time and the client’s provided time is within some tolerance, the request is serviced
This obviously requires the two clocks to be vaguely in sync. It also allows the resource to be requested by anyone within the tolerance window. But it does save the client from making a second request.
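As a sketch, the tolerance check on the provider side could be as simple as the following; the millisecond timestamp format and the five-minute window are my assumptions.
public class TimestampValidator {
    private static final long TOLERANCE_MILLIS = 5 * 60 * 1000; // five minutes, arbitrary

    // Assumes the client put its clock time (millis since the epoch) in the signed request
    public static boolean isFresh(long clientTimestampMillis) {
        long skew = Math.abs(System.currentTimeMillis() - clientTimestampMillis);
        return skew <= TOLERANCE_MILLIS;
    }
}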
Self-created One-time Token
This is an amalgam of the Timestamp solution and the Negotiated One-time Token solution. Here, the client creates its own token, as a simple integer of increasing value. The server maintains the last requested value and accepts only requests with a higher number:
- Client creates request, using a global long-lived number
- Client signs requests and sends it to the service provider
- Service provider validates the signature and compares the provided numeric token with the one last used (the tokens can be globally scoped, or scoped for a given resource)
- If the provided numeric token is greater than the previous, the request is serviced
- The Client increments his numeric token for next time
As with the Timestamp solution, only one request is required. As with the negotiated one-time token solution, the URL can never be used twice. The main issue here is if the client forgets its numeric token. This could be addressed with an additional call to re-establish the token, made only when the Client has determined it no longer knows the last used value.
Unfortunately, this is much more susceptible to race conditions than the Negotiated one-time token. Since the service provider doesn’t know what tokens to expect (only that they should be greater than the last requested one), the client has to ensure that the “create request, submit request, receive response, update local numeric token” cycle is atomic. That is not straightforward.
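A sketch of the server-side check, assuming the numeric tokens are scoped per client (the names are hypothetical):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MonotonicTokenValidator {
    // Last numeric token accepted for each client
    private final ConcurrentHashMap<String, AtomicLong> lastTokenByClient =
        new ConcurrentHashMap<String, AtomicLong>();

    // Accept only tokens strictly greater than the last one used
    public boolean accept(String clientId, long token) {
        lastTokenByClient.putIfAbsent(clientId, new AtomicLong(0));
        AtomicLong last = lastTokenByClient.get(clientId);
        long previous = last.get();
        // If two requests race, the compareAndSet fails for one of them and that
        // request is rejected -- exactly the race condition described above
        return token > previous && last.compareAndSet(previous, token);
    }
}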
Update: got another idea from a co-worker.
Session Token
When a user accesses the system that uses the REST API, they get issued a token (via the REST API). This token works just like a session token, with an inactivity timeout and so forth. The token can be manually invalidated via the API, so that when a user logs out or completes some logical task, the token stops being usable.
This suffers none of the problems of the other solutions, though it isn't the most secure: a captured URL can be replayed until the session times out. That problem is fairly minor, though, and the tradeoff of one request per actual request and no race conditions makes this probably the best way to go.
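A minimal sketch of such a token cache, assuming an in-memory map and a 30-minute inactivity timeout (both my assumptions):
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionTokenCache {
    private static final long INACTIVITY_TIMEOUT_MILLIS = 30 * 60 * 1000; // arbitrary

    // token -> time it was last used
    private final ConcurrentHashMap<String, Long> lastSeen =
        new ConcurrentHashMap<String, Long>();

    public String issue() {
        String token = UUID.randomUUID().toString();
        lastSeen.put(token, System.currentTimeMillis());
        return token;
    }

    // Valid if the token exists and was used recently; each use keeps it alive
    public boolean validate(String token) {
        Long seen = lastSeen.get(token);
        if (seen == null || System.currentTimeMillis() - seen > INACTIVITY_TIMEOUT_MILLIS) {
            lastSeen.remove(token);
            return false;
        }
        lastSeen.put(token, System.currentTimeMillis());
        return true;
    }

    // Manual invalidation, e.g. on logout or task completion
    public void invalidate(String token) {
        lastSeen.remove(token);
    }
}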
Shell history meme
April 17, 2008
- 149 cd - Well, duh
- 104 ./run.sh - Occasionally to run JBoss (I have to have two JBoss servers running locally, and when I do a complete recompile/redeploy (not often), I must restart, as ant fstats the ever-loving shit out of thousands of files (multiple times, 'natch, 'cause it's so superior to make) at the same time JBoss is hot-deploying (twice), and my machine just dies), but more often to run the application I'm developing.
- 88 tail - To read log files (were I on OS X, Console.app would be the way to go)
- 84 ls - Well, duh
- 73 git - I commit a lot, I diff a lot, I stat a lot, and I blame a lot. You should, too
- 57 sqlshell.pl - Awesomely awesome command-line SQL client written by my co-worker and enhanced on occasion by me. I'll probably steal it when I leave this project, and I wish he'd put it up on sourceforge
- 51 jobs - To find out which JBosses to kill (frequently executed in the wrong shell window)
- 44 kill - To then go kill JBoss
- 43 vi - To get some work done
Also: rm, grep, git-svn
Distributed version control with Git for code quality and team organization
April 15, 2008
- Commit very rarely
- Get new changes from the repository only when absolutely needed
- Assign a team lead to integrate the code - this is a senior developer who can assess code quality, provide mentoring and guidance, and can be trusted to put code into the repository destined for QA and production
- Each developer clones the team lead's repository - This is done to baseline the start of their work
- Developers commit, branch, merge, and pull as necessary - Since Git makes merging simple, developers can have full use of all features of version control and can do so in their environment without the possibility of polluting the main line of development. They can also share code amongst themselves, as well as get updates from the team lead's repository of "blessed" code1
- Developers inform the lead of completion
- Lead pulls from their repository - The lead reviews the developer's changes and applies the patch to his repository. He can then exercise whatever quality control mechanisms he wishes, including automated tests, manual tests, reviews, etc.2
- Lead rejects patches he doesn't agree with - If the patch is wrong, buggy, or just not appropriate in some way, the lead rejects the patch and provides the developer with information on the correct approach
- Lead accepts patches he does agree with - If the lead agrees with the patch, he applies it to his repository, where it is now cleared for QA
1 Git even allows a developer to merge certain commits from one branch to another. Suppose Frank is working on a large feature, and happens to notice a bug in common code. He can address that bug and commit it. Gary can then merge only that commit into his codebase to get the bugfix, without having to also take all of Frank's in-progress work on the large feature. Good luck doing that with StarTeam.
2 A CI system could be set up in a variety of ways: it could run only against the lead's "blessed" repository, or it could run against an intermediate repository created by the lead (who then blesses patches that pass), or it could be totally on its own and allow developers to submit against it prior to submitting to the lead.
Quick and Dirty Code Reviews: Check commit logs
April 03, 2008
Where I've been working for the past few months, we've been under the gun to meet an aggressive deadline. As management is wont to do, they've added several new developers to the project. One of the many reasons why adding developers is ultimately a bad thing is that, in addition to the complexity in communication, there is a risk of innocent, well-developed code being added to the codebase that is Just Plain Wrong. Our system has been in development for many years and contains a wide variety of coding styles, implementation patterns and idioms. Some of them should Never Be Followed Again, and some are the Correct Way of Doing Things. There's really no easy way for a new developer to know what is what.
Now, outside of going back in time, creating pedantic test cases, gathering requirements and incessantly refactoring, we need an option to make sure bad code doesn't get into the codebase. By "bad" I don't mean poorly written or buggy code; I mean code that does not fit into the system as a whole. For example, a developer was writing some code to generate printable reports. His implementation resulted in a very nice report popping up in a Swing window in our application (our application is a Swing front-end to a JEE back-end). It was very well implemented and looked great. However, everywhere else in the application, reports are generated by writing a PDF to disk and asking Windows (via JDIC) to "view" the file. This is the kind of code I'm talking about.
Large maintenance + aggressive schedule + lots of new developers + minimal system documentation = a need for highly efficient and effective QA procedures.
Finding bad code
At the start of the project, we went through the arduous process of identifying each bit of work in an MS-Project file and assigning developers to it. New developers were given tasks that didn't touch core components, while experienced developers got tasks involving major refactorings or database changes. Our project lead suggested that each module undergo a code review. It sounds reasonable; however, all of us let out a collective groan at the thought of wrangling together 3-4 developers once a week for an hour or two to go over printouts or screenfuls of code, much of which was simply being modified. One of the senior developers proposed the solution we ultimately went with: senior developers get emailed the diffs of all commits and make sure to spend some time reading and reviewing those commits. Coupled with our policy of "commit early, commit often", this has worked out great.
Diff-based code review
Here's what you need:
- A concurrent version control system developers trust. I recommend Git, or Subversion if you must.
- A simple script to email diffs on every commit. One is usually included as an example hook with most version control systems.
- IM clients (Google talk within GMail works in even the most oppressive environment)
- A sane version control policy: committed code must:
- Compile
- Not prevent application deployment/startup
- Not horribly break someone else's code (optional)
- A sane coding style policy: if you must re-indent or change bracing style, do it in its own commit, outside of actual code changes. Better yet, don't do it at all. Formatting changes can obscure the history of a piece of code and should be made minimally, if at all.
- Diffs get emailed to the senior developers as soon as they happen
- Senior Developers read the diffs, using IM to discuss any issues
- If code does have issues, the diff is forwarded to the developer who committed, with comments on what to change and why (senior developers decide amongst themselves who will send the feedback, or a "lead developer" can, if one is identified)
Imports considered annoying and pointless
December 01, 2007
- But imports help the compiler locate classes! - Why should I help the compiler locate classes? Why put a fully automatable task in the hands of a developer? Are you telling me the compiler can't index the classpath to more efficiently search at compile-time?
- But imports let you know class dependencies - No, they don't. Only if you don't use star imports and only if you import only what you need would this be the case. However, not really, because your class could Class.forName. And, honestly, how much time do you spend looking at the import statements to perform this analysis? An automated tool could provide this information much more correctly and completely
- But how would I know what classes are in a jar and what are in the codebase? - You'd know the same way the compiler knows. And, to be honest, the code should be written for the maintainers, not for the new kids. Anyone new to a codebase can, relatively quickly, figure out what is in the code base and what isn't. This, along with proper tools for locating classes integrated into your IDE would be much better than looking at import statements and grep'ing the output of jar tvf.
package myapp;
// no point in putting the dir-structure as dots, the compiler
// can figure it out. Instead we indicate that this class, wherever
// it is, is part of the conceptual package "myapp"
import commons-lang[2.1,]; // check for version 2.1 or greater
import commons-logging[1.0.*]; // check for version 1.0.* only
import j2ee[5.0,5.3]; // check for any version from 5.0 to 5.3
clarify java.util.Date;
public class Whatever
{
public static void main(String args[])
{
Date date = new Date();
// whatever else
}
}
Why is J2EE/JBoss configuration such a nightmare?
November 26, 2007
<mbean name="big.fucking.long.whatever">
<attribute name="SomeProperty">some value</attribute>
<attribute name="SomeOtherProperty">another value</attribute>
<attribute name="TimeWastedTypingAngleBrackets">10 hours</attributes>
<attribute name="MoneyWastedPayingForXMLSpy">$10000</attribute>
</mbean>
over this:
big.fucking.long.whatever
SomeProperty=some value
SomeOtherProperty=another value
TimeWastedTypingAngleBrackets=0 seconds
MoneyWastedPayingForXMLSpy=$0
It seems to me that if all we are doing is configuring a set of properties and values, a format similar to the Windows .ini format would be much preferred. And, honestly, if we can't do better than Windows, what the fuck. I guess one thing all three formats have in common is that you have no fucking idea what the attributes mean, which are required, or what will happen at runtime.
If you are lucky, you have the mbean source or javadoc (don't forget that is precedes boolean properties and get precedes all the others!). Also, fucking this up generated an Oracle-quality error message from JBoss: "Attribute SomeProperty not found". So, are you looking for SomeProperty and didn't find it, or did you get it and not want it?
Of course, we could, actually, leverage the power of XML and tools like DTDDoc and XSD Doc and do something like this:
<mbean name="big.fucking.long.whatever">
<SomeProperty>some value</SomeProperty>
<SomeOtherProperty>another value</SomeOtherProperty>
<TimeWastedTypingAngleBrackets>10 hours</TimeWastedTypingAngleBrackets>
<MoneyWastedPayingForXMLSpy>$10000</MoneyWastedPayingForXMLSpy>
</mbean>
This, if backed by a schema, would actually be a nice way to document (and enforce) configuration.
Bonus points to Hibernate for allowing properties or XML or MBean configuration, and for having the property names different in each fucking format. It seems like a lot of extra work to make them all different.
I'm not saying I want a Microsoft Enterprise Application Wizard, but a little common sense could go a long way.