A brief and inaccurate history of application security, for learning purposes.

Facebook and Google's Gmail are used as example client and server respectively, but many companies both serve and consume OAuth.

In the beginning, there were usernames. You connected to a computer (either with a keyboard or a network) and this created a session, where the computer (server) sets aside some working memory for your use. Unfortunately, once someone found out your username, they could do things you didn't want them to do. So a public identifier needs an associated secret (a password).
Unfortunately, password databases were leaked, and the passwords stolen. So hashing was introduced, where even if the database leaked, the passwords could not be known. (A one-way hash function takes a password and reliably produces the same "hash"; there's no easy way to go the other way, from a hash to a password. You can hash the user's input, and if it is the same as the stored hash, it must have been the right password).
Unfortunately, rainbow tables were invented: huge precomputed lists of likely passwords and their hashes. So salts were used, where extra random data (different for each user) was combined with passwords before hashing, so that a precomputed table no longer matched the stored hashes.
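As a rough illustration of salted hashing, here is a minimal sketch using Python's standard library. The function names, the 16-byte salt, and the iteration count are assumptions made for this example, not recommendations from the original text.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest), generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # a different salt per user defeats precomputed tables
    # PBKDF2 is deliberately slow; the iteration count is an illustrative assumption.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, attempt = hash_password(password, salt)
    return hmac.compare_digest(attempt, stored_digest)
```

The server stores the salt next to the digest; only the password itself is never stored.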
The internet was invented. Users logged in with passwords to create sessions. Unfortunately, Internet Service Providers and other intermediate networks proved untrustworthy, so HTTPS was introduced. Servers were required to identify themselves with a certificate signed by a trusted company such as Thawte or Verisign, and the connection was encrypted. Browsers are distributed with the certificates of all the trusted companies, known as Certification Authorities (CAs); a signature made with the CA's private key (known only to a very carefully guarded server) can be verified using the CA's public key.
Unfortunately, users didn't like typing their password into every page. Later, dodgy JavaScript would also fake login pages. A session identifier needed to be stored in the browser so that the user didn't have to type their password into every page. Cookies were invented to store the session identifier.
JavaScript was invented, and was able to read the cookies. Unfortunately, third-party JavaScript on web pages started stealing cookies.
Secure cookies were invented that were not readable by JavaScript and were only allowed to be sent to the server that created them. This was good.
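In practice this is done with cookie attributes. The sketch below assumes the standard HttpOnly attribute (hidden from JavaScript) and Secure attribute (only sent over HTTPS); the text above doesn't name them. The cookie name "session" and the function name are made up for this example.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build the value of a Set-Cookie header for a session identifier."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # not readable from document.cookie
    cookie["session"]["secure"] = True    # only sent over HTTPS connections
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

Because no Domain attribute is set, the browser only sends the cookie back to the host that set it.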
Unfortunately, sites such as Facebook wanted to read data from sites such as Gmail. They asked for users' passwords, and unfortunately some sites stole the users' accounts. This created a lot of work for the Gmail account recovery team (and other problems). So the idea of access tokens was invented, where the user could give Facebook a Gmail access token instead of their Google password. An example access token might be FACEMAILACCESS303.
Unfortunately, some sites did dodgy things with their access tokens. So the idea of scopes was invented, where access tokens have restricted permissions. So, for example, the user could give Facebook an access token that can read Gmail contacts but not delete (Myspace) emails.
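A scope check can be as simple as the following sketch. The scope names ("contacts.read", "mail.delete") and the AccessToken class are hypothetical, invented here to match the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    value: str
    scopes: frozenset  # the permissions the user agreed to, nothing more


def is_allowed(token: AccessToken, required_scope: str) -> bool:
    """The server checks the token's scopes before performing any action."""
    return required_scope in token.scopes


# The user granted Facebook read access to contacts only.
token = AccessToken("FACEMAILACCESS303", frozenset({"contacts.read"}))
```

With this token, is_allowed(token, "contacts.read") passes, while is_allowed(token, "mail.delete") is refused.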
Unfortunately, users didn't like manually issuing access tokens. So the idea of browser redirects with codes was invented: Facebook would send the user to Google, Google would ask the user whether they wanted to allow access, and then Google would redirect back to Facebook with the access token in the URL.
Unfortunately, sites such as Fakebook would pretend to be Facebook and would ask to be redirected back to them instead. So OAuth was invented, where sites would register for OAuth and get a client ID, such as FACE01, and that client ID would only work for certain redirect URLs (www.facebook.com).
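The redirect check on Google's side might look like the sketch below. The registry contents and the callback path are assumptions for illustration; only the client ID FACE01 and the host www.facebook.com come from the story.

```python
# Hypothetical registry: client ID -> redirect URLs registered in advance.
REGISTERED_REDIRECTS = {
    "FACE01": {"https://www.facebook.com/oauth/callback"},
}


def redirect_allowed(client_id: str, redirect_uri: str) -> bool:
    """Only redirect to a URL that exactly matches one the client registered."""
    return redirect_uri in REGISTERED_REDIRECTS.get(client_id, set())
```

Exact matching is used here on purpose: prefix or substring matching would let a lookalike URL slip through.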
Unfortunately, Fakebook could use Facebook's client ID. So the idea of a client secret was invented, which meant that access tokens would only work if the client secret was also provided, and Facebook would only use the secret to prove its identity to Google (or similar). The client secret is Facebook's Google password, for example, FACESECRET.
Unfortunately, it was not safe to expose the client secret to the browser. So the idea of an authorization code was invented, which could only be used a single time to create an access token. The way this works is that Facebook asks the user to go to Google to create an authorization code (such as FACEAUTH303) for the Facebook client ID (FACE01), then Facebook is given that code and uses it with its secret to create a Google access token: sending FACEAUTH303 (FACESECRET) to Google might yield the access token FACEMAILACCESS303.
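Here is a minimal in-memory sketch of the exchange from Google's point of view. The function names and the dictionaries are assumptions for illustration; a real server would also expire unused codes and check the redirect URL.

```python
import hmac
import secrets
from typing import Optional

CLIENT_SECRETS = {"FACE01": "FACESECRET"}  # known only to Google and Facebook
_pending_codes = {}                        # authorization code -> client ID


def issue_code(client_id: str) -> str:
    """Called after the user approves; the code travels back via the browser."""
    code = secrets.token_urlsafe(16)
    _pending_codes[code] = client_id
    return code


def redeem_code(code: str, client_id: str, client_secret: str) -> Optional[str]:
    """Called server-to-server by Facebook; returns an access token or None."""
    # pop() removes the code, so any attempt (even a failed one) uses it up.
    owner = _pending_codes.pop(code, None)
    if owner is None or owner != client_id:
        return None
    expected = CLIENT_SECRETS.get(client_id)
    if expected is None or not hmac.compare_digest(
        expected.encode("utf-8"), client_secret.encode("utf-8")
    ):
        return None
    return secrets.token_urlsafe(32)
```

The browser only ever sees the code; the secret and the access token stay between the two servers.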
Unfortunately, lots of sites leaked their access tokens. So the idea of a refresh token was introduced, so that losing an access token was less of a problem. The refresh token can only be used to create access tokens, and is not used to make normal requests, so it is less exposed. The access tokens that leak can then expire after a few minutes or hours, and the refresh token is used to create more access tokens without having to prompt the user again.
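A sketch of short-lived access tokens backed by a refresh token follows. The five-minute lifetime, the in-memory stores, and the function names are assumptions for illustration.

```python
import secrets
import time
from typing import Optional

ACCESS_TOKEN_LIFETIME = 300  # seconds; an illustrative choice

_refresh_tokens = set()
_access_tokens = {}  # access token -> expiry timestamp


def issue_refresh_token() -> str:
    token = secrets.token_urlsafe(32)
    _refresh_tokens.add(token)
    return token


def mint_access_token(refresh_token: str, now: Optional[float] = None) -> Optional[str]:
    """Exchange a refresh token for a fresh access token without asking the user."""
    if refresh_token not in _refresh_tokens:
        return None
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _access_tokens[token] = now + ACCESS_TOKEN_LIFETIME
    return token


def access_token_valid(token: str, now: Optional[float] = None) -> bool:
    """A leaked access token stops working once its lifetime has passed."""
    now = time.time() if now is None else now
    expiry = _access_tokens.get(token)
    return expiry is not None and now < expiry
```

The optional now parameter exists only so the expiry behaviour can be exercised without waiting.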
Then, mobile apps were invented. They received their authorization codes via redirects between apps, which were unfortunately controlled by the apps themselves (not the server providing the redirect URL). So the client secret couldn't help, because it would leak the first time it was used. Instead, the idea of PKCE ("Pixie") was introduced, where a new secret would be generated for every authorization code: Facebook would generate it and send only its hash when requesting the code, then, when Facebook wanted to redeem its authorization code, it would provide the unhashed PKCE secret. This makes intercepting authorization codes on their way back to the requester pointless, because the interceptor doesn't have the unhashed secret.
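The PKCE round trip can be sketched as follows, using the S256 method from the PKCE specification (RFC 7636): the challenge is the base64url-encoded SHA-256 hash of the verifier, without padding. The function names are made up for this example.

```python
import base64
import hashlib
import hmac
import secrets


def make_verifier() -> str:
    """Client side: a fresh random secret for each authorization request."""
    return secrets.token_urlsafe(32)  # 43 URL-safe characters


def challenge_for(verifier: str) -> str:
    """Client side: the hash that is sent along when requesting the code."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verifier_matches(verifier: str, stored_challenge: str) -> bool:
    """Server side: when the code is redeemed, re-hash the verifier and compare."""
    return hmac.compare_digest(challenge_for(verifier), stored_challenge)
```

An app that intercepts the authorization code sees at most the challenge, and cannot work backwards from it to the verifier.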
Unfortunately, our story stops there because we've run out of history. Welcome to the present!

Takeaways

Use secrets to prove identity so that you don't have to worry about fraudulent connections.
Use HTTPS everywhere so you don't have to worry about insecure networks.
Only send client secrets to the site that issued them so that you don't have to worry about them leaking.
Only store hashes of salted passwords so that read access to the password database doesn't have to be protected.
Protect secrets travelling through insecure clients (browsers or apps) so that you don't have to worry about them leaking.
Keep tokens away from JavaScript so you don't have to worry about XSS attacks.
There seems to be some overlap between authorization codes and refresh tokens. Both are used to protect access tokens. Both are (or can be) single-use. Both work (or can be made to work) only when presented with the client secret. Confusingly, authorization codes protect refresh tokens, which protect access tokens. Authorization codes seem to be more about prompting the user and validating client secrets, whereas refresh tokens are just about keeping access tokens short-lived. But some flows will be able to combine these into a single token (e.g. refresh tokens requiring client secrets to work).
Use refresh tokens so you can worry less about untrustworthy browsers and apps leaking client secrets and access tokens.
Use authorization codes so you know the user has been asked whether they want to grant a permission.
Use PKCE so that you don't have to worry about the redirect process being insecure.
OAuth allows websites to exchange fine-grained user permissions. For example, it can allow Facebook to read your Gmail contact list but not read your emails.
Prevent refresh tokens from getting stolen and reused by making them single-use. If a refresh token does get stolen and reused, make the user authorize again and stop the stolen one (and possibly its access tokens) from working. For example, if Facebook asked the user to create a second Google refresh token (via an authorization code), the first one would stop working. (The user is prompted to authorize again because the thief has made the user's copy of the refresh token invalid -- when the thief creates an access token, the server also issues an updated refresh token.)
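The last takeaway, single-use refresh tokens with reuse detection, can be sketched like this. The "grant" bookkeeping and the function names are assumptions for illustration; real servers differ in how much they revoke.

```python
import secrets
from typing import Optional, Tuple

_active = {}   # current refresh token -> grant ID
_retired = {}  # already-used refresh token -> grant ID


def start_grant() -> str:
    """Called after the user authorizes; returns the first refresh token."""
    token = secrets.token_urlsafe(32)
    _active[token] = secrets.token_hex(8)
    return token


def rotate(refresh_token: str) -> Optional[Tuple[str, str]]:
    """Return (access_token, new_refresh_token), or None if refused."""
    if refresh_token in _retired:
        # A used token came back: someone holds a stale copy. Revoke the whole
        # grant so neither party can continue without the user authorizing again.
        grant = _retired[refresh_token]
        for tok in [t for t, g in _active.items() if g == grant]:
            del _active[tok]
        return None
    grant = _active.pop(refresh_token, None)
    if grant is None:
        return None
    _retired[refresh_token] = grant
    new_refresh = secrets.token_urlsafe(32)
    _active[new_refresh] = grant
    return secrets.token_urlsafe(32), new_refresh
```

If a thief rotates a stolen token first, the user's next attempt presents a retired token, which revokes the thief's replacement token too.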

Quite.

Monday, July 20, 2020

web auth by mistake
