Create public-facing unique keys alongside your primary keys
August 26, 2024 📬 Get My Weekly Newsletter ☞
Peter Shilling wrote a blog post about generating what he calls “cool ids” for models in Rails. Instead of numbers, models have an id prefixed with its type, like cus_abcdefg1234
for a customer. I like the idea, but don’t think these ids should be used as primary keys. External IDs are incredibly useful for a lot of things, but they aren’t great as primary keys. I’ll explain what public-facing external IDs are, how to make them, and why they shouldn’t be your primary keys.
Stripe’s admin UI surfaces identifiers for various entities, but these identifiers aren’t numbers, nor are the UUIDs. They are values
like cus_734t8wri4thugiuh
or pi_4t98yerihrsdf
. The bit before the underscore tells you what sort of thing it is (a customer, and payment intent, respectively), and the remainder is a unique value. I’m going to call these external IDs since, from Stripe’s perspective, they are being shared externally.
I’ve come to realize that you should almost always create external IDs for your database tables, and that they should be prefixed with a type identifier, but not be used as the primary key for your tables.
What’s So Great about External IDs?
When data is in a database, there is often need to refer to it outside whatever systems are built to manage it. For example, a customer service agent may need to ask an engineer about a particular customer, and they need to do that unambiguously. Sharing personal details like email or name is not a great practice and it’s often imprecise.
In a typical Rails app, you’d see the database primary keys in URLs, like admin.example.com/customers/1234
. The internal team will
start using these ids, e.g. “Hey Pat, can you figure out why customer 1234’s password resets aren’t working?”. There are many
downsides to this, but a big one is that if you see the value 1234
out of context, it’s hard to know what it is.
Even if everyone is dilligent about providing context, mistakes can get made. Imagine a URL like
admin.example.com/customers/1234/orders/4567
. If someone isn’t careful, they may copy the 4567
and refer to that as a customer
ID.
External IDs that are prefixed with a type eliminate this problem. Instead of a customer ID being a number, it starts with a prefix
telling you what it is, e.g. cus_1234
. That way, if the order ID is copied mistakenly, it will be obvious that it isn’t a customer
ID, since it would look like ord_4567
.
While you could certainly create these ids as derived fields, as below, you don’t want to.
class Customer
# Don't do this - see below
def external_id = "cus_#{self.id}"
end
Even though this value is not the primary key, it’s derived from it and thus, you are sharing the primary key externally, which should be avoided.
Primary Keys Should Not Be Externalized
Exposing primary keys has a lot of downsides and pretty much no upside (especially when see how straightforward it is to have an external ID). Most systems use numeric, monotonically increasing values for primary keys. This opens up any URL containing them to an enumeration attack, where a threat actor will just increase the number they see hoping to find a value they can access but shouldn’t.
Numeric keys can also expose business metrics. If you get a receipt with your customer ID on it, and that ID is, say, 123987, you can likely surmise the company has at least 100,000 customers.
You can certainly mitigate these problems by using UUIDs for your primary keys, however UUIDs still won’t address the most pernicious of issues, which is shadow processes using the IDs.
A good rule of thumb is that any data you externalize—even to internal users—is likely to be input into another system or process that you may not control…or even know about.
This means that the values you’ve set up to ensure referential integrity (i.e. keeping your database consistent and correct) are now coupled to systems and processes you may not be able to change. I had a job once where primary keys were used in other systems and when our database’s key ended up rolling over to a sixth digit, all hell broke loose in these heretofore unknown other systems.
It’s this last issue which is why you should create clear separate between columns in your database.
External IDs are Keys, Just Not Primary Ones
The word “primary” in “primary key” is there for a reason. It implies the existence of many keys, with one being used most of the time. But you can have many other keys as needed by the engineering team, database administrator, or business owners.
For example, many systems require unique email addresses. This is a key, even if not the primary one (and, if the business no longer requires unique emails, this key is no longer a key—good thing it wasn’t your primary key :).
An external ID is just another key you can use to uniquely identify a row. But, critically, it doesn’t serve any other purpose inside the database. It’s not used as a foreign key, and it doesn’t encode any domain information. This means you can be relative safe exposing it as a unique identifier for any reason.
This allows you to change it if you needed to, or move what table it’s a part of. If your primary keys were being shared or exposed, both of those operations would be difficult or impossible.
External IDs in Practice
There are many ways to generate external IDs. In Rails, you can use before_save
to ensure the value is there:
class Customer
before_save do |record|
if record.external_id.blank?
hex_id = SecureRandom.hex
record.external_id = "cus_#{hex_id}"
end
end
end
Note that there gems like the aforementioned cool_id or external_id that you should consider or at least copy to make this mechanism easier to manage and use.
If you are using Postgres, you can use database triggers instead. First, define a function that generates the ID. It accepts a single argument, which is the prefix (yes, Postgres’ programming language is super weird):
CREATE FUNCTION generate_external_id()
RETURNS trigger AS $$
BEGIN
NEW.external_id := TG_ARGV[0] || '_' || md5(random()::text);
RETURN NEW;
END;
$$ LANGUAGE plpgsql
;
Then, for any table you create, you’ll use this to create a trigger:
CREATE TRIGGER
customers_external_id
BEFORE INSERT ON
customers
FOR EACH ROW
EXECUTE PROCEDURE generate_external_id('cus');
I don’t think Rails has a way to do this directly, but you can wrap any SQL in a call execute(...)
inside a migration. You can also
wrap this into a method you can call inside a migration to avoid copying and pasting this boilerplate.
Once the external IDs are being inserted into the database, you’ll want to use them to e.g. build URLs. In Rails, URLs are built from Active Records and Active Models via the to_param
method:
class Customer
def to_param = self.external_id
end
external_id
will show up by default when you call to_json
, but you may want to omit id
from that payload to avoid leaking the ID
you are working to protect. You can do that by overriding serializable_hash
:
class Customer
def serializable_hash(options=nil)
default_options = { except: :id }
options = if options.nil?
default_options
else
# Allow params to override our defaults
# to preseve the method's expected behavior
default_options.merge(options)
end
super(options)
end
end
Lastly, you’ll want a way to locate records by this id. You can certainly do Customer.find_by!(external_id: params[:id])
, however
you could also override find
:
class Customer
def find(*arguments)
if arguments.size == 1
customer_by_external_id = find_by(
external_id: arguments.first
)
if customer_by_external_id.present?
return customer_by_external_id
end
end
super(*arguments)
end
end
This would preserve find
’s behavior on the primary key, but also allow looking up by the external id, too. I’ve done this before, but now feel it’s better to use find_by!(external_id:)
to be more explicit about what’s going on.
You can extract all of this behavior to a module or put it into ApplicationRecord
.