In this article we'll discuss digital identities, what they are, and why they're important. We'll look at TrustRoot, its architecture, and how we can build on top of existing and future identity systems. TrustRoot extends identity systems to easily and securely share verifiable documents.
It's October 29th, 1969, and Charley Kline, a UCLA student, is attempting to send the first message over what we now call the internet.
He is connecting to Stanford, and his goal is to type the word "login". The connection is made and he types "l"… "o"… success. Before he can enter the next letter, the connection fails.
This marks the first message ever sent over the internet.
We've come so far, but some questions linger from that day. Most notably the question of "who": digital identity.
In Charley's case identity wasn't a big issue. Since there was practically nobody on the internet, it was easy to figure out who people were (by sheer limitation of who could be using the technology). It was obvious the message came from Charley and no one else.
Nowadays, as you may have heard, a lot more people are online. This makes it much more difficult to identify a person.
How do you know who you are talking to? How can you really trust that they are who they say they are?
Many great and fun solutions have popped up in the last decade. We've created social media, open-sourced strong encryption, and even placed deep trust in 3rd-party identity companies like Google and Facebook. Unfortunately these systems generally rely on trusting a megacorp and centralized coordination.
However, more recently, a resurgence of digital identity (and more generally, distributed digital trust) is underway. It's hard to pinpoint exactly when or why this happened, but I'll speculate here:
- Introduction of cryptocurrencies to the mainstream. This makes concepts of cryptographically backed trust less foreign to the general public.
- Bitcoin reaching a market cap of more than $100 billion, and even more than $200 billion in 2017. While belief in cryptocurrencies does not imply belief in cryptography more broadly, it is another strong argument for this kind of trust that non-technical people can easily understand.
- Growing impact of "fake news". The 2016 election brought to light the serious issue of deliberately fake articles. Better identification methods would make malicious acts like this harder and less damaging.
- International pressure/competition to better analyze citizens and their activities. In many advanced countries we see unprecedented surveillance and partnerships between private tech companies and nation states. The United States is attempting to catch up to some of these standards. Better surveillance can save millions of lives, especially in times of disaster.
- Natural progression. Things mature, and gloves fit better after years of wear. It’s expected that the web will develop better systems to model the things important to humans (like identifying others correctly!).
Digital Identity Today
If you don't believe me that this is a growing trend, check out the following Google Trends chart (as of Wednesday, April 15th, 2020).
It's easy to spot the huge spike this year. I'd attribute this to the growing government interest in digital identity.
Over the last couple of years the Department of Homeland Security has issued grants with titles like "Decentralized Digital Identity for Online and Offline Verification".
News Release: DHS Awards $181K to Verify Digital Credentials
DHS S&T has awarded $181,392 to SICPA Product Security, LLC based in Springfield, VA to develop a solution for…
This year the enterprise efforts are ramping up and consortiums and alliances are forming to tackle this problem head on.
ID2020 | Digital Identity Alliance
The ID2020 Alliance is setting the course of digital ID through a multi-stakeholder partnership, ensuring digital ID is…
Overall we are seeing an increase in interest around trusted digital identities. In the developer world we are seeing many new crypto projects introduce novel and useful digital trust patterns.
Before we discuss TrustRoot and our contribution, let's look at a specific piece of prior work: PGP. This is helpful in orienting our thinking; we can take away some lessons and stand on the shoulders of giants.
In 1991 Phil Zimmermann created PGP (Pretty Good Privacy), and it started as a human rights project.
Zimmermann was a peace activist in the 1980s who realized that fellow activists and members of grassroots organizations needed a way to protect their communications online, and that strong encryption could provide it.
The ingenious idea that Zimmerman also brought to the table is the concept of a “web of trust”.
This is the simple concept of extending trust across a network of people — the graphic below depicts this.
This idea proved too difficult for most people, and amazingly, even the inventor of PGP admitted in 2015 that he did not have the software installed on his own computer.
TrustRoot — Building on trust
TrustRoot is not another digital identity project. There is no DID or global datastore. TrustRoot sits one level up from digital identity: it is a spec that extends one's trust with some useful features.
TrustRoot is a keyless way to share arbitrary verifiable data such that the prover can selectively disclose parts of the data.
A verifier only needs a 44-character base64 string, plus short, efficient proofs (logarithmic in the length of the data), to verify anything. This can be done offline and with no specialized software or hardware.
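To make that "44-character base64 string" concrete: a SHA-256 digest is 32 bytes, and 32 bytes always encode to exactly 44 base64 characters. A quick Python sketch:

```python
import base64
import hashlib

# Hash any document down to a 32-byte SHA-256 digest; a Merkle root
# built from SHA-256 is the same size no matter how large the tree is.
root = hashlib.sha256(b"any document at all").digest()
encoded = base64.b64encode(root).decode()

assert len(root) == 32
assert len(encoded) == 44  # 32 bytes -> ceil(32/3) * 4 = 44 base64 chars
```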
TrustRoot aims to enable any organization or individual to create verifiable documents in a maintainable and shareable way.
The core concept behind TrustRoot has two parts. The first is deterministic serialization, and the second is Merkle inclusion proofs. These concepts have serious names but are super simple in practice.
Deterministic serialization is a predictable subset of general serialization.
Serialization is the process of converting an in-memory object into a format that can be stored and shared. A deterministic process always returns the same output for the same input. Hash functions are deterministic, since they always produce the same output for a given input.
However, in our case a (traditional) hash function is too sensitive to input changes. If you change a single byte, or rearrange the sequence, the hash will be completely different.
We want to be able to serialize relatively complex, human-generated data in a deterministic way that is resilient to extra spacing, reordering, and even comments. Our goal is to create a serialization format that preserves content but not metadata. More on this later.
Merkle inclusion proofs
Merkle inclusion proofs are a cryptographic process that computes and compares the path from the Merkle root all the way down to a target leaf.
When validating a proof, one checks that their own computation of the root and the path to the leaf exactly matches the one provided by the prover. This can be done cheaply and efficiently, with only a small amount of data.
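As a minimal sketch of how a verifier might recompute that path (the function names and the proof encoding here are my own assumptions, not part of the TrustRoot spec):

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path from leaf to root.
    `proof` is a bottom-up list of (sibling_hash, sibling_is_left) pairs."""
    node = sha(leaf)
    for sibling, sibling_is_left in proof:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node == root

# A hand-built two-leaf tree: root = H(H(left) + H(right))
left, right = b"age = 26", b'name = "David Holtz"'
root = sha(sha(left) + sha(right))

assert verify_inclusion(left, [(sha(right), False)], root)
assert verify_inclusion(right, [(sha(left), True)], root)
assert not verify_inclusion(b"age = 27", [(sha(right), False)], root)  # tampered leaf fails
```

The verifier only ever needs the leaf bytes, one sibling hash per tree level, and the root.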
Thankfully, all widely used computing platforms (mobile, web, Apple, Linux and Windows) ship with optimized cryptographic libraries. This means that provers and verifiers do not need any specialized code; with the correct commands, any computer can easily verify a SHA-256 Merkle tree.
Ubiquity and ease are only part of why we use Merkle trees. The more powerful reasons are the implications of an inclusion proof and its omission tolerance.
If a proof returns true, the prover and verifier can agree that it is computationally infeasible to change the data. Since the data cannot be altered without invalidating the tree, this method preserves data integrity.
Omission tolerance refers to the ability to verify content without disclosing information about the other nodes (aside from their hashes, which don't leak any information). This means we can independently verify any leaf in the tree without leaking anything else.
A cool feature of the verification process is that the verifier actually rebuilds parts of the tree when computing the path. Since the construction of a tree is predictable, the verifier can easily deduce whether the whole tree is represented, and thus whether the prover is sharing part of the document or the full thing.
Serialization format and ordering
Choosing a serialization format is important for both technical and adoption reasons. There are many standard formats like JSON, YAML, XML and Protocol Buffers.
However, we are going to opt for a slightly newer and simpler format that meets both human and computer needs: TOML.
TOML was made to unambiguously map to a dictionary/hash map. It is also designed to be extremely readable and human friendly. Anything that can be expressed in the formats above can be expressed in TOML, with the added benefit of supporting comments.
Deserializing a TOML file into its in-memory representation is not enough, since we care about keys and values, not the order in which they are stored.
We need a deterministic method to translate between stored and computed values. We want to produce the same internal representation every time, no matter the order of the input key-value pairs.
An example explains this best:
name = "David Holtz"
age = 26
Should be equivalent to
age = 26
name = "David Holtz"
So in order to make these equivalent, we need to add new rules to the de/serialization process. These rules can be thought of as part of the greater specification.
All values must be sorted. Not just the top level, but a full deep sort. Sorting ensures that we consistently represent the data after reading it into memory; a deep sort ensures that values are represented deterministically even when they are arrays or objects.
Lastly, we output our deterministic representation (sorted, consistently spaced, comment-free TOML) as a byte string to our hasher. This system produces a deterministic hash of the file. It is not sensitive to changes in spacing, comments, or alternate ordering of key-value pairs. Woop 🙌
Now that we have a way to generate consistent representations and hashes, we can structure them into a larger document.
The document structure follows a simple header/body schema, and for simplicity we will also represent it as a TOML file.
The graphic shows the header in red and the body in blue. In the actual file, the header is titled metadata and the body is a series of incremented verifiables.
[metadata]
this is the header area
[verifiable.1]
this is part of the body
[verifiable.2]
this is another part of the body
Only the header needs to be specified using the metadata tag, and the rest of the file is the body. The body consists of verifiables — which are a list of any assertion represented in TOML.
Generating provable documents
Using the document structure we've built up, we'll want to generate provable versions that we can share with end users/customers/friends!
In order to do this, we'll create a document. We'll add issuing data: from, to, when and other important stuff. Next we'll add some verifiables. We'll read each part of the document and deterministically serialize each piece. Using the serialized values we'll create a Merkle tree and calculate the root. Lastly we'll calculate proofs for each leaf and store all of the values, proofs and hashes in a single file.
Yay! We've created a TrustRoot document proof. This can be shared with someone, and the prover can selectively disclose parts of the document.
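The generation steps above can be sketched in Python. The helper names, SHA-256 pairing order, and odd-node promotion are my own assumptions for illustration, not the spec's exact rules:

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_levels(leaves):
    """Hash the leaves, then fold pairs upward; an unpaired last
    node is promoted unchanged to the next level."""
    level = [sha(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = [sha(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

# A toy document: one header plus two already-serialized verifiables.
parts = [b"[metadata] ...", b"[verifiable.1] ...", b"[verifiable.2] ..."]
levels = build_levels(parts)
root = levels[-1][0]
document_proof = {i: proof_for(levels, i) for i in range(len(parts))}

# Re-walking any stored proof reproduces the root.
node = sha(parts[1])
for sib, sib_is_left in document_proof[1]:
    node = sha(sib + node) if sib_is_left else sha(node + sib)
assert node == root
```

The resulting file would bundle the raw values, their proofs, and the hashes together, which is everything a prover later needs.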
Verifying concerns and anchor trust
All a verifier needs is the root of the document and the piece (leaf, proof, data) they want to verify.
The prover will provide the document, since they are attempting to prove something. As the verifier, you only want to believe claims if they are verified (not tampered with) and vouched for by a trustable body.
The issuer of a document is responsible for providing public access to the roots of the documents they've issued.
This should be done via a sufficiently secure method, like writing to a public blockchain or hosting on a trusted HTTPS domain.
A verification process might look like this:
- Receive document proof with header piece and prover selected verifiables
- Verifier uses unverified data in the header to determine where to look for roots (some URL)
- Verifier visits the URL and finds the root for the person/document in question.
- Assuming they trust the HTTPS domain/root provider they can use this to verify the data
- Verify that the header is true
- Verify all verifiables using the root and provided proofs
- Attempt to reconstruct full tree — to check if this is the full document
- Return results of verification and if the data represents the full document.
The above represents a successful process, although there are multiple points where verification can fail if any values were tampered with after the original proof was constructed.
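The full-document check in the last two steps can be sketched like this (again with my own helper names and pairing rules, which the real spec may define differently): if hashing every disclosed piece and rebuilding the tree reproduces the published root, nothing was withheld.

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def root_of(leaf_hashes):
    """Fold a level of hashes up to a single root (odd node promoted)."""
    level = list(leaf_hashes)
    while len(level) > 1:
        nxt = [sha(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

all_parts = [sha(b"header"), sha(b"claim-1"), sha(b"claim-2")]
published_root = root_of(all_parts)

# Everything disclosed: the rebuilt root matches, so this is the full document.
assert root_of(all_parts) == published_root
# claim-2 withheld: rebuilding from only what was disclosed cannot match.
assert root_of(all_parts[:2]) != published_root
```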
Linking back to Identity
Don’t worry we haven’t forgotten about Digital Identities. They are what make all of this work.
TrustRoot's trust all hinges on sharing "roots". Roots are used to verify data in the tree and should be hosted in a trusted way. Above we mentioned using an HTTPS domain to provide trusted access; however, we can do much better with a solid DID framework.
Instead of deriving trust via certificate authorities (how HTTPS works, and why the little lock is in your browser), we can derive trust from a public/private key pair owned by a distributed identity.
So if you've gotten this far and grok the importance and power of an open-source verifiable data structure, you might start to ask:
What's special about TrustRoot? Can't we use traditional PKI to sign credentials and then host our public key? Isn't this the same concept?
There are fundamental differences between TrustRoot and PKI. The following sections cover those differences: why TrustRoot's use of Merkle trees gives us stronger guarantees, and how we can avoid keys altogether.
Overlapping trees and shared leaves
Overlapping trees and shared leaves are a special case with unique implications.
If two trees share one or more leaves, the trust in the shared leaves is essentially doubled.
While the burden of trust rests on the verifier's personal trust in the root publisher, shared leaves give us an opportunity to increase trust through an unlimited number of claim supporters.
Think about this in the case of a school issuing a trusted document to a student. Let's say the document is a transcript, and it's represented as a collection of W3C-standard verifiable credentials.
The issuer of these credentials will create a TrustRoot document and proof. They will share the document and publish the root.
Now a student can share their credentials as they please, and the verifier only needs to refer to a short root to verify the claims.
This immediately looks like it depends on a central body, since you need to contact the issuer for the root. But in reality there is no limitation on root publishers.
Let's say one of the teachers wants to support a claim. All they need to do is publish a root that contains the leaf they want to support. They can do this any way they want: hosting a complex root server themselves, or just posting the root in a tweet.
However they share the root publicly, it has the same implications: a verifier can verify the integrity of a claim in a distributed way. What if all teachers supported the classes on transcripts? What if students supported each other's claims?
In the end, the "truthfulness" of a claim is based on whether the verifier trusts the person making the assertion. If the school is not available, can you trust a handful of teachers instead?
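A sketch of the scenario above (hypothetical data, two-leaf trees for brevity): the school and a teacher each publish a root over a tree that contains the very same claim bytes, so an inclusion proof against either root vouches for the identical claim.

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify(leaf: bytes, proof, root: bytes) -> bool:
    node = sha(leaf)
    for sibling, sibling_is_left in proof:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node == root

shared_claim = b'course = "CS101"\ngrade = "A"'

# The school's tree pairs the claim with its own metadata leaf...
school_root = sha(sha(b"issuer = school") + sha(shared_claim))
# ...and a teacher's tree pairs the same claim with a personal note.
teacher_root = sha(sha(b"supporter = teacher") + sha(shared_claim))

# One claim, two independent roots a verifier can check it against.
assert verify(shared_claim, [(sha(b"issuer = school"), True)], school_root)
assert verify(shared_claim, [(sha(b"supporter = teacher"), True)], teacher_root)
```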
TrustRoot's data integrity is not based on any keys or "hidden" data. It is transparent: the data is verified by its actual content, not a signature around it.
Trust is based on roots, and if you trust the publisher of those roots.
Nonambiguous support — Content identifiable
Collision-resistant hash functions like SHA-256 effectively guarantee a unique hash for anything we throw at them.
This means when showing support for a root, we can ensure that we are only supporting specific (immutable) claims.
Since roots depend on the contents of the data, any change to the underlying claims will produce a different root.
This means when we support by publishing a root, we are supporting an unchangeable claim. The issuer cannot change the data without invalidating our support.
It can be hard to see how useful this is, so let's look at traditional PKI for comparison. In a traditional setup an issuer has a public/private key pair. They issue claims and sign them with their private key. They also host their public key so verifiers can validate the claims.
A verifier needs the public key to check any claim signed by that issuer; this mirrors the situation of issuers publishing roots, as mentioned above.
However, for roots we proposed a distributed solution, allowing arbitrary supporters to host roots containing shared leaves.
Let's apply this to PKI: we distribute the issuer's public key to the teachers who want to support students' claims. This poses an issue. Teachers are now vouching for anything the issuer signs with their key. What value does that add? It also does not create any direct connection between the teacher and the claim they are validating. The teacher has just ambiguously supported the issuer's claims.
TrustRoot uses content as entropy for verification. PKI uses “private” keys.
No keys — Plug and play
Barrier to entry is too often a killer of new cryptographic projects. There is often a need for specialized software or hardware executing a complex set of commands to reach "trust".
Most often the need for these technologies comes from securing key pairs. Wallets are glorified key stores, and your "identity" is often you proving that you know a private key.
While we love keys, TrustRoot is not based on key pairs. As noted above, verification is based on content. This is a huge advantage because it removes practically the whole barrier to entry.
An issuer just needs to share a short string in any trusted way. This can easily be their existing HTTPS website, or it can be engraved in a rock in front of their building. Roots can be printed out, stored on paper, or sent through the mail. Whatever trust means to you, you can send it that way.
It also means that supporters can publish roots however they want. One of our most powerful digital identities is a person's track record on social media.
Years of social media activity are harder to fake than most other forms of ID. So you can use just that: a supporter can post roots on their social media, on their blog, or outsource it to a 3rd party.
Removing keys from the mix removes a lot of complexity and the need for super-secure devices (like hardware Bitcoin wallets).
While keys are great and play well with TrustRoot, we do not rely on them, and TrustRoot is essentially plug and play.
A working example
Now we've introduced TrustRoot: the start of an open specification for packaging and verifying trusted documents. These documents are keyless, verifiable, can store arbitrary data, and lend themselves to distributed trust. There is no dependency on key pairs or blockchains, and TrustRoot is practically ready to plug and play today.
Let's make it real!
The first thing we want to showcase is deterministic serialization. Remember, this is the concept of a consistent outcome from "consistent" input.
In our case, consistent refers to consistent key-value pairs. We can include lists, dictionaries and anything else supported by the TOML spec. Before we hash the bytes, we do a deep sort to order all of the TOML values. This ensures that the representation of the values is kept, but nothing else.
In the following GIF, the top box represents the users input, and the bottom box is the pre hashed output. On the bottom, you’ll find the final hashed value.
We’ll make some changes (add comments, change ordering) and see what impact this makes on the hash.
It's apparent that the serialization is deterministic and the hash is only impacted by content changes.
Next we'll want to consolidate many of these values into a full tree. Each hash represents the label of a leaf node; we just need a way to construct the tree from them.
Building TrustRoot Documents and generating proofs
Just like individual leaves, the ability to reduce an item into a deterministic hash is based on the input. In order to create consistent hashes, we need to specify a simple format. We have a header (the metadata block) and then we have N number of verifiables. As long as we have a header, and prefix the verifiables TrustRoot will be able to deterministically serialize each part individually and then the whole document.
Following the outlined format of a TrustRoot document, we can enter our "document" in the panel on the left. We can see the header, which carries important data, generally related to who, what and when.
The following blocks are just like the verifiable we created above. It is any data that we want, encoded in TOML.
On the right-hand side we have the auto-generated proof. By default this contains a proof for each leaf of the tree, but again, the prover is able to disclose any number of these (and they can be verified).
Since we can serialize anything, we are not constrained to a single data format. We can mix and match the data formats we want to verify.
In the case of the example, we have one verifiable that is a bespoke credential, while the second verifiable follows the W3C Verifiable Credentials standard!
Since TrustRoot is flexible, we can build in existing standards like VCs to best integrate with future tools.
Now that we have a way to easily create documents and proofs, we’ll need just as easy of a way to verify them.
This last page allows us to input a document, either as a file upload, copy-and-paste, or a mock API call.
On the right-hand side we see a card for each section of the document and its proof. Each card contains the original claim, its deterministic hash, and the proof from root to leaf.
On the left hand side the user can see in a step by step manner if their root, path, leaf combo is verified. They also can see if the provided information rebuilds the same tree — if so the verifier can be confident they are receiving the whole document.
In the GIF you’ll see me copy and paste the words in the dark popup into the proof input. This short string is all you need to verify a single claim!
Overall this app allows us to explore all the parts of TrustRoot. We learn about deterministic serialization, how to build provable documents, and lastly how to verify them with no specialized software/app.
The demo doesn't render well in this article — open the sandbox to check it out (in alpha, expect bugs)
TrustRoot and its components are new and growing concepts. As I explore more data models and the ways they can be used for digital trust or digital identity, I'll continue to update TrustRoot and its spec.
All comments, and questions are welcome! Help me make TrustRoot a secure open source tool.
😀 Thanks for reading!