URL Encode & Decode
Encode special characters for safe URL usage or decode percent-encoded URLs. Results appear instantly as you type.
Master URL Encoding & Decoding
Learn how URL encoding works and when to use it for safe, standards-compliant web development.
What Is URL Encoding?
URL encoding (percent-encoding) converts special characters into a %XX format that can be safely transmitted in URLs. Space becomes %20, # becomes %23. Defined by RFC 3986, it ensures browsers and servers interpret URLs accurately.
Characters That Need Encoding
- Reserved chars: ! # $ & ' ( ) * + , / : ; = ? @ [ ]
- Non-ASCII chars: Japanese, Chinese, Arabic, and all Unicode chars
- Spaces: %20 (or + in form data)
- Control chars: Tabs, newlines, and other control characters
encodeURI vs encodeURIComponent
In JavaScript, encodeURI() encodes an entire URL (preserving /, ?, and #), while encodeURIComponent() encodes individual parameter values (converting all special chars). Always use encodeURIComponent() for query parameter values.
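The difference is easiest to see by running both functions on the same input. The sketch below is illustrative only: the sample value, the example.com host, and the q parameter name are placeholders rather than anything this page prescribes.

```javascript
// A value that contains characters with structural meaning in a URL.
const value = "blue & green/teal?";

// encodeURI assumes a whole URL, so it keeps &, / and ? as they are.
const asWholeUrl = encodeURI(value);

// encodeURIComponent treats the input as opaque data and escapes them.
const asComponent = encodeURIComponent(value);

// example.com and the q parameter are placeholders for illustration.
const searchUrl = "https://example.com/search?q=" + asComponent;

console.log(asWholeUrl);  // blue%20&%20green/teal?
console.log(asComponent); // blue%20%26%20green%2Fteal%3F
console.log(new URL(searchUrl).searchParams.get("q")); // blue & green/teal?
```

Only the encodeURIComponent result survives a round trip as a single query value.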
URL Encoding Examples
Original: Hello World
Encoded: Hello%20World
Original: café & tea
Encoded: caf%C3%A9%20%26%20tea
Original: search?q=ai
Encoded: search%3Fq%3Dai
Original: price=100$
Encoded: price%3D100%24
Original: 東京
Encoded: %E6%9D%B1%E4%BA%AC
Real-World Use Cases
URL encoding comes up throughout web development: building API query parameters, submitting form data, embedding redirect URLs, creating email links. Encoding mistakes are a common source of hard-to-debug bugs.
Security & URL Encoding
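URL encoding helps prevent injection attacks, including some forms of XSS (Cross-Site Scripting), by keeping user input from being interpreted as URL syntax. Always encode user input before embedding it in URLs, and still escape output for the HTML or JavaScript context it ends up in. Be aware of double-encoding attacks (%2520) and ensure consistent encoding/decoding throughout your stack.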
Internationalized Domain Names
Japanese and Chinese domain names (e.g., 例え.jp) are converted to Punycode (xn--r8jz45g.jp) for DNS processing. IDNs look friendly but may display as Punycode in some tools. Always test IDN URLs across browsers.
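One way to see the Punycode form is to let a WHATWG-compliant URL parser do the conversion. This is a small sketch assuming a browser or a Node.js build with its standard URL implementation. Note that Punycode applies only to the host, while the path still uses percent-encoding.

```javascript
// The host is converted to its ASCII (Punycode) form during parsing.
const url = new URL("https://例え.jp/東京");

console.log(url.hostname); // xn--r8jz45g.jp
console.log(url.pathname); // /%E6%9D%B1%E4%BA%AC
```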
API Development
Encode query params to safely build API requests
Form Submission
application/x-www-form-urlencoded encoding for HTML forms
Redirect URLs
Safely embed OAuth/SSO callback URLs in redirect parameters
Email Links
Encode special characters in email links to prevent breakage
Everything About URL Encoding
From RFC 3986 fundamentals to practical implementation across modern web stacks.
What is URL Encoding and Why Does It Exist?
URL encoding, formally called percent-encoding, is a mechanism for transmitting data inside a Uniform Resource Identifier (URI) when that data contains characters that would otherwise be misinterpreted by the URI grammar. The standard governing how URLs are constructed, RFC 3986, defines a limited set of characters that never need escaping: the letters A through Z and a through z, the digits 0 through 9, and four punctuation marks (hyphen, underscore, period, and tilde). Anything outside that alphabet that is not acting as a URL delimiter, including spaces, accented letters, ideographic characters from Asian scripts, emoji, and most punctuation, must be converted into its percent-encoded form before it can travel through HTTP, be stored in a browser address bar, or be logged by a server.
The mechanic is deliberately simple. Each character that needs encoding is first represented as its byte sequence under UTF-8, and then each byte is rewritten as a percent sign followed by two hexadecimal digits. A literal space becomes %20 because the ASCII value of a space is decimal 32, hex 20. The Japanese character for east, 東, encodes to %E6%9D%B1 because UTF-8 represents it across three bytes. Because percent-encoding operates byte by byte rather than character by character, any binary data, in any encoding, can be safely shipped through a URL as long as the producer and consumer agree on the underlying character set.
Percent-encoding is as old as URLs themselves. It was specified in RFC 1738 in 1994, which covered schemes such as FTP, Gopher, and mail alongside HTTP, then revised in RFC 2396 in 1998 and consolidated in RFC 3986 in 2005. Today it is the bedrock of nearly every web interaction. Every link you click, every form you submit, every API request your phone fires off in the background passes data through this encoding layer. Understanding it well is one of the highest-leverage pieces of knowledge a web developer can have because encoding errors silently corrupt data, break analytics, expose security holes, and produce 404 errors that look identical to genuinely missing resources.
How Percent-Encoding Works Step by Step
Classify Each Character
The encoder walks the input string one character at a time and asks whether each character belongs to the unreserved set (letters, digits, hyphen, underscore, period, tilde). Unreserved characters pass through unchanged because they cannot be confused with URL delimiters. Reserved characters and any non-ASCII characters are flagged for transformation.
Convert to UTF-8 Bytes
For characters outside the ASCII range, the encoder first serializes them into UTF-8. The word café contains an é, which is two bytes in UTF-8 (0xC3 0xA9), while the rocket emoji becomes a four-byte sequence. Sticking with UTF-8 is critical because legacy encodings like Latin-1 produce different bytes for the same character, and a server that assumes UTF-8 cannot decode them reliably.
Hex Encode Each Byte
Every byte that needs escaping is rewritten as a percent sign followed by exactly two uppercase hexadecimal digits. The byte 0x20 (space) becomes %20, byte 0xC3 becomes %C3, and byte 0xA9 becomes %A9. Uppercase digits are the convention recommended by RFC 3986 although decoders must accept both upper and lower case to remain interoperable.
Concatenate the Result
The encoder stitches the percent-encoded fragments back together with the unreserved characters in their original positions, producing a final string that contains only ASCII characters from a tiny safe alphabet. The output is now suitable for embedding in any URL component, transmitting over HTTP without content-type negotiation, and storing in plain-text logs.
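The four steps so far can be sketched as a small hand-written encoder. This is an illustration of the mechanism, not a replacement for encodeURIComponent; the function name percentEncode is made up for this example, and it escapes everything outside the unreserved set.

```javascript
// Unreserved characters per RFC 3986: letters, digits, hyphen, period, underscore, tilde.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncode(input) {
  const utf8 = new TextEncoder(); // always encodes to UTF-8
  let out = "";
  for (const ch of input) {            // step 1: walk the input by code point
    if (UNRESERVED.test(ch)) {
      out += ch;                       // unreserved characters pass through unchanged
      continue;
    }
    for (const byte of utf8.encode(ch)) {   // step 2: UTF-8 bytes
      // step 3: two uppercase hex digits per byte
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;                          // step 4: concatenated result
}

console.log(percentEncode("café & tea")); // caf%C3%A9%20%26%20tea
console.log(percentEncode("東京"));        // %E6%9D%B1%E4%BA%AC
```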
Decoding Reverses the Steps
A decoder scans for percent signs, reads the next two characters as hex digits, and reassembles the original byte sequence. Those bytes are then reinterpreted as UTF-8 to recover the original characters. The operation is exactly lossless when both sides agree on UTF-8; misaligned encodings produce mojibake or replacement characters where data once was.
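A matching decoder can be sketched the same way. The function name percentDecode is hypothetical, and this version chooses to leave a malformed percent sequence untouched, whereas the built-in decodeURIComponent throws a URIError in that situation. The last call shows what happens when the bytes were produced under Latin-1 instead of UTF-8.

```javascript
function percentDecode(input) {
  const utf8 = new TextEncoder();
  const chars = Array.from(input); // split by code point, not by UTF-16 unit
  const bytes = [];
  for (let i = 0; i < chars.length; i++) {
    const hex = chars.slice(i + 1, i + 3).join("");
    if (chars[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // %XX becomes one raw byte
      i += 2;
    } else {
      bytes.push(...utf8.encode(chars[i])); // literal characters keep their own bytes
    }
  }
  // Reinterpret the byte sequence as UTF-8; invalid bytes become U+FFFD.
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

console.log(percentDecode("caf%C3%A9%20%26%20tea")); // café & tea
console.log(percentDecode("100%"));                  // 100% (malformed sequence left as-is)
console.log(percentDecode("caf%E9"));                // caf followed by a replacement character
```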
Component Boundaries Matter
A URL is parsed into scheme, host, path, query, and fragment before encoding rules apply to each component independently. A slash separates path segments, so it must be encoded when it is data inside a single segment; RFC 3986 allows a literal slash inside the query, although component encoders such as encodeURIComponent escape it there as well. A question mark separates path from query at the top level but must be encoded if it appears inside a path segment. Respecting these boundaries is what separates correct from broken encoding.
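As a rough illustration of per-component encoding, the sketch below places the same kind of awkward data into a path segment and into a query value. The host, the /files/ route, and the note parameter are invented for the example.

```javascript
// One file name that happens to contain a slash and a question mark.
const fileName = "reports/2024?.pdf";

const url = new URL("https://example.com");

// Encode the data before placing it between the structural slashes of the path.
url.pathname = "/files/" + encodeURIComponent(fileName);

// searchParams applies the query-string rules for us.
url.searchParams.set("note", "a/b?c");

console.log(url.pathname); // /files/reports%2F2024%3F.pdf
console.log(url.search);   // ?note=a%2Fb%3Fc
console.log(url.href);     // https://example.com/files/reports%2F2024%3F.pdf?note=a%2Fb%3Fc
```

The path still has exactly two segments, and the query still has exactly one parameter.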
Where You Will Use URL Encoding
Constructing API Query Strings
When your client code assembles a request URL programmatically, every parameter value must be percent-encoded before it is appended after the question mark. A search term like "blue & green" would break the query string if appended verbatim because the ampersand starts a new parameter. Encoding it to blue%20%26%20green preserves the literal value. Modern HTTP libraries handle this automatically when you pass parameters as a dictionary or object, but when you manually concatenate strings, this tool helps verify the result matches what the server expects.
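The ampersand problem described here can be reproduced in a few lines. The api.example.com endpoint and the q and page parameters are placeholders; the point is only to compare raw concatenation with the two safer approaches.

```javascript
const term = "blue & green";
const endpoint = "https://api.example.com/search"; // placeholder endpoint

// Raw concatenation: the ampersand starts a second, unintended parameter.
const broken = new URL(endpoint + "?q=" + term);

// Manual construction with encodeURIComponent keeps the value intact.
const manual = endpoint + "?q=" + encodeURIComponent(term);

// URLSearchParams encodes every value automatically (form style, so spaces become +).
const viaParams = endpoint + "?" + new URLSearchParams({ q: term, page: "2" });

console.log(broken.searchParams.get("q")); // "blue " (the rest of the value is lost)
console.log(manual);                       // https://api.example.com/search?q=blue%20%26%20green
console.log(viaParams);                    // https://api.example.com/search?q=blue+%26+green&page=2
```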
Submitting HTML Form Data
Forms with method="GET" serialize their fields into the URL after the action attribute, and forms with method="POST" using the default application/x-www-form-urlencoded content type encode their bodies the same way. Spaces are conventionally replaced with the plus sign rather than %20 in this context, a convention inherited from the earliest HTML form specifications. When debugging unexpected server-side values, paste the raw form body into the decoder above to see exactly what the server received before parsing.
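For a feel of what a form body looks like on the wire, URLSearchParams produces the same application/x-www-form-urlencoded serialization. The field names and values below are invented for illustration.

```javascript
// Build a body the way a browser would serialize two form fields.
const fields = new URLSearchParams();
fields.append("name", "Ada Lovelace");
fields.append("note", "1+1=2");

const body = fields.toString();
console.log(body); // name=Ada+Lovelace&note=1%2B1%3D2

// Parsing with the same rules restores the original values.
const parsed = new URLSearchParams(body);
console.log(parsed.get("name")); // Ada Lovelace
console.log(parsed.get("note")); // 1+1=2
```

Note that the space becomes a plus sign while the literal plus is escaped as %2B, which is what keeps the two apart.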
Building Redirect and Callback URLs
OAuth, SAML, and single-sign-on flows pass return URLs inside query parameters of the identity provider URL. Because the return URL itself contains schemes, slashes, and its own query parameters, it must be encoded as a single opaque value to avoid being interpreted as part of the outer URL. Failing to encode the redirect parameter is one of the most common causes of "invalid redirect_uri" errors during OAuth integration testing.
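A short sketch of the nesting problem, using made-up hosts, a made-up client_id, and a hypothetical authorize endpoint: when the return URL is not encoded, its own ampersand leaks into the outer query string.

```javascript
// The inner URL carries its own query string.
const returnTo = "https://app.example.com/callback?next=/dashboard&tab=billing";
const authorizeBase = "https://idp.example.com/authorize?client_id=demo-client&redirect_uri=";

const unencoded = new URL(authorizeBase + returnTo);
const encoded = new URL(authorizeBase + encodeURIComponent(returnTo));

// Without encoding, tab=billing is parsed as a parameter of the outer URL.
console.log(unencoded.searchParams.get("redirect_uri")); // https://app.example.com/callback?next=/dashboard
console.log(unencoded.searchParams.get("tab"));          // billing

// With encoding, the whole return URL arrives as one opaque value.
console.log(encoded.searchParams.get("redirect_uri") === returnTo); // true
console.log(encoded.searchParams.get("tab"));                       // null
```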
Generating Shareable Deep Links
Apps that generate share URLs containing user-supplied titles, captions, or pre-filled message bodies must encode those values before injecting them into the link template. Twitter, Facebook, WhatsApp, and email share buttons all rely on this pattern. Without encoding, a hashtag inside a tweet body silently truncates the URL at the first # character, and an ampersand in a subject line spills over into adjacent parameters.
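The truncation at the first # is easy to demonstrate. In this sketch the share.example.com host and the text parameter are placeholders, not the real endpoint of any of the services named above.

```javascript
const title = "C# tips & tricks";
const base = "https://share.example.com/post?text="; // placeholder share endpoint

const raw = new URL(base + title);
const encoded = new URL(base + encodeURIComponent(title));

// Everything after the # was parsed as the fragment, not as part of the value.
console.log(raw.searchParams.get("text"));     // C
console.log(raw.hash);                         // the remainder of the title ends up here

// Encoding the value first keeps it in one piece.
console.log(encoded.search);                   // ?text=C%23%20tips%20%26%20tricks
console.log(encoded.searchParams.get("text")); // C# tips & tricks
```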
Storing Search Filters in URLs
Single-page applications often serialize filter state into the URL so that users can bookmark or share specific views. Multi-select filters, sort orders, page numbers, and date ranges all need to round-trip cleanly through the address bar. Percent-encoding the serialized state guarantees that special characters in user-typed search terms cannot collide with the query-string syntax and corrupt the application state on reload.
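One possible shape for such a round trip is sketched below. The state object, its field names (q, tags, page), and the serialize and parse helpers are all hypothetical; a real application would adapt them to its own filter model.

```javascript
// Hypothetical filter state with characters that collide with query syntax.
const filters = { q: "50% off & more", tags: ["new", "on sale"], page: 2 };

function serialize(state) {
  const params = new URLSearchParams();
  params.set("q", state.q);
  for (const tag of state.tags) params.append("tag", tag); // repeated key for multi-select
  params.set("page", String(state.page));
  return params.toString();
}

function parse(query) {
  const params = new URLSearchParams(query);
  return {
    q: params.get("q") ?? "",
    tags: params.getAll("tag"),
    page: Number(params.get("page") ?? 1),
  };
}

const query = serialize(filters);
console.log(query); // q=50%25+off+%26+more&tag=new&tag=on+sale&page=2
console.log(JSON.stringify(parse(query)) === JSON.stringify(filters)); // true
```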
Logging and Analytics Tracking
UTM parameters appended to marketing links carry campaign names, source identifiers, and creative tags. Marketing teams frequently use spaces, parentheses, and accented characters in their campaign labels without realizing those values must be encoded. The result is analytics dashboards that under-count traffic because identical campaigns appear under multiple encoded variants. Standardizing on this tool during campaign setup prevents the inconsistency.
Reserved Characters, Unreserved Characters, and the Gray Zone
RFC 3986 splits the printable ASCII character set into three buckets that determine encoding behavior. The unreserved set contains the 66 characters that may appear in any URL component without ever being escaped: the 26 uppercase letters, the 26 lowercase letters, the 10 digits, and the four marks hyphen, underscore, period, and tilde. These are always safe, and RFC 3986 says producers should not escape them. A URL containing %41 is equivalent to one containing A, and normalizers are expected to decode such sequences, but a naive string comparison treats the two as different, which can split cache keys and analytics entries. Producing %41 instead of A is technically valid but should be avoided.
The reserved set contains characters that carry structural meaning at certain positions in a URL. Generic delimiters like colon, slash, question mark, hash, square brackets, and at-sign separate top-level components and must be encoded when they appear inside a component as data rather than as a separator. Sub-delimiters such as exclamation point, dollar sign, ampersand, apostrophe, parentheses, asterisk, plus, comma, semicolon, and equals sign have application-specific meaning, most notably in query strings where ampersand separates pairs and equals separates keys from values. A reserved character may be left unescaped if it serves its structural purpose at that position, or it may be encoded to disambiguate.
Everything else, sometimes called the unsafe set, must always be percent-encoded. This bucket includes the space, the angle brackets less than and greater than, the double quote, the curly braces, the vertical bar, the backslash, the caret, the backtick, and all control characters (those below ASCII 0x20, plus 0x7F). It also includes every byte above 0x7F, which is where non-Latin scripts and emoji live. The space is particularly tricky because two conventions exist: %20 in paths, fragments, and general query strings, and plus in form-encoded data. Encoders aimed at general URLs should output %20 because it is universally accepted, while form-specific encoders use plus.
The tilde character has an unusual history. RFC 1738 listed it as unsafe, but RFC 2396 and its successor RFC 3986 placed it in the unreserved set because real-world systems treated it as safe anyway. As a result, older code may still encode tilde to %7E unnecessarily. Both forms are valid and decode identically, so do not be surprised when comparing the output of two encoders that one is more conservative than the other. The same is true for exclamation point, asterisk, parentheses, and apostrophe, which JavaScript's encodeURIComponent leaves alone but which RFC 3986 classifies as reserved sub-delimiters, so encoding them is permitted and is the safer choice in contexts like mailto links where they carry additional meaning.
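When the more conservative output is wanted, a common approach is to post-process the result of encodeURIComponent. The helper name encodeRfc3986 below is made up for this sketch; it escapes the five characters that encodeURIComponent leaves alone and still leaves tilde untouched.

```javascript
// Escape ! ' ( ) * in addition to everything encodeURIComponent already escapes.
function encodeRfc3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (fine)*!")); // it's%20(fine)*!
console.log(encodeRfc3986("it's (fine)*!"));      // it%27s%20%28fine%29%2A%21
console.log(encodeRfc3986("~user"));              // ~user
```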
Pro Tips for Encoding Without Bugs
- Always encode at the latest possible moment, when the value is about to be injected into a URL template, never when it is being stored or passed between functions. Storing already-encoded values invites double-encoding bugs because the next function in the pipeline cannot tell the difference between a literal percent sign in the user input and an encoded sequence.
- Pick the right primitive for each layer. JavaScript exposes both encodeURI and encodeURIComponent and they are not interchangeable. Use encodeURI when you have a whole URL string that should be made transport-safe without breaking its structural slashes and question marks, and use encodeURIComponent when you have a single value going into a single position inside a larger URL you are building piece by piece.
- Decode exactly once. When a URL travels through multiple proxies, frameworks, or middleware, each layer may helpfully decode incoming values. If your application code decodes again, any value that legitimately contains a percent sign is corrupted: text the user typed as %41 turns into A, and a stray percent such as the one in 100% can make the decoder throw. Read the documentation of every layer that touches the URL and decode only at the boundary that has not yet done it.
- When testing internationalization, paste real Unicode strings from your target locales into both the encoder and the decoder. Round-trip them through your application stack and confirm the bytes match before and after. A surprising number of legacy systems silently strip or substitute non-ASCII characters in URLs, which produces broken links for international users that domestic QA teams never catch.
- Treat URL encoding as a security control, not just a serialization detail. Encoding user input before it enters a URL stops that input from being parsed as URL syntax, which closes off parameter injection and some cross-site scripting vectors; it does not replace HTML or JavaScript output escaping. Pair encoding with strict allow-lists of expected parameter shapes on the server, and reject any request whose decoded values fall outside the expected character set.
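The last two tips (decode once, then validate against an allow-list) might look like the sketch below. The slug parameter, the SLUG pattern, and the readSlug helper are assumptions chosen for illustration; the right pattern depends on what each parameter is supposed to contain.

```javascript
// Hypothetical allow-list: lowercase letters, digits, and hyphens, 1 to 64 characters.
const SLUG = /^[a-z0-9-]{1,64}$/;

// Decode exactly once (URLSearchParams does it), then validate the decoded value.
function readSlug(rawQuery) {
  const value = new URLSearchParams(rawQuery).get("slug");
  if (value === null || !SLUG.test(value)) return null; // reject anything unexpected
  return value;
}

console.log(readSlug("slug=hello-world"));     // hello-world
console.log(readSlug("slug=%3Cscript%3E"));    // null (decodes to <script>)
console.log(readSlug("slug=a%2520b"));         // null (double-encoded input still contains %)
```

Because the value is decoded only once, a double-encoded payload never gets a second chance to turn into something dangerous.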
Common URL Encoding Mistakes
Double Encoding the Same Value
The most common mistake is calling the encoder on a value that is already encoded. A space that should be %20 becomes %2520 because the percent sign of %20 gets re-encoded to %25. The result still passes through the network but the server-side decoder returns the literal string "%20" instead of a space. Debug this by inspecting the URL before and after each transformation and confirming each component is encoded exactly once. When a value has to traverse multiple frameworks, document explicitly which layer owns the encoding step.
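The %2520 symptom can be reproduced directly, which is a quick way to recognize it in logs:

```javascript
const original = "hello world";

const once = encodeURIComponent(original); // correct: encoded exactly once
const twice = encodeURIComponent(once);    // bug: the % of %20 is escaped again as %25

console.log(once);                      // hello%20world
console.log(twice);                     // hello%2520world
console.log(decodeURIComponent(once));  // hello world
console.log(decodeURIComponent(twice)); // hello%20world (a single decode no longer recovers the space)
```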
Using encodeURI Where encodeURIComponent Belongs
JavaScript developers regularly reach for encodeURI when they need encodeURIComponent. The difference matters: encodeURI leaves ampersand, equals, question mark, slash, and hash unescaped because it assumes the input is a complete URL. If you pass a single parameter value through encodeURI, any of those structural characters inside the value will be misinterpreted by the server. Always pick encodeURIComponent for individual values and reserve encodeURI for the rare case where you are sanitizing an entire URL string.
Mixing Plus and %20 for Spaces
Form submissions encode spaces as plus signs while general URLs use %20. If your application reads query parameters with one convention and writes them with another, spaces silently turn into plus signs or vice versa, leading to mismatched cache keys, broken search analytics, and link rot. Standardize on one convention per layer of your stack and never use raw string concatenation to build query strings, since manual concatenation makes it almost impossible to apply the right rule consistently.
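The mismatch shows up when a form-style string is read with a decoder that follows the general URL convention. A small comparison, using invented query strings:

```javascript
// decodeURIComponent follows the general URL convention: plus is just a plus.
console.log(decodeURIComponent("red+wine"));               // red+wine
console.log(decodeURIComponent("red%20wine"));             // red wine

// URLSearchParams follows the form convention: plus means space.
console.log(new URLSearchParams("q=red+wine").get("q"));   // red wine
console.log(new URLSearchParams("q=red%20wine").get("q")); // red wine

// A literal plus therefore has to travel as %2B in form-style data.
console.log(new URLSearchParams("q=C%2B%2B").get("q"));    // C++
console.log(new URLSearchParams("q=C++").get("q"));        // "C  " (both plus signs became spaces)
```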
Encoding Characters That Should Stay Reserved
Over-zealous encoders that escape every non-letter break URL syntax by encoding the slashes inside paths and the equals signs inside query strings. The receiving server sees a path with no separators and treats the entire URL as one opaque segment that does not match any route. Trust the primitive: encodeURIComponent escapes everything except the unreserved characters and a handful of marks (exclamation point, asterisk, apostrophe, and parentheses), which is correct for individual values, but the result must be inserted between the structural characters of the URL, never used as a replacement for the entire URL.
Forgetting About the Fragment Identifier
The hash fragment after the pound sign is a separate component with its own encoding rules. Browsers never send the fragment to the server, but client-side routers and single-page applications rely on it for navigation state. Many developers forget to encode user-supplied content that ends up in the fragment, causing routes to break when the content includes spaces, slashes, or hash signs. Apply the same encodeURIComponent treatment to fragment values that you apply to query parameters.
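A minimal sketch of encoding a fragment value, assuming a client-side router that reads the text after the #. The example.com URL and the section title are placeholders.

```javascript
// User-supplied text that contains a space, an ampersand, a hash, and a slash.
const section = "Q&A #2 / notes";

const url = new URL("https://example.com/docs");
url.hash = encodeURIComponent(section); // the setter adds the leading # itself

console.log(url.href); // https://example.com/docs#Q%26A%20%232%20%2F%20notes

// Reading it back on the client: drop the leading # and decode once.
console.log(decodeURIComponent(url.hash.slice(1))); // Q&A #2 / notes
```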
URL Encoding Across Languages and Frameworks
Every popular programming language ships with built-in URL encoding utilities, but the defaults and edge cases vary in ways that matter. In JavaScript, encodeURIComponent is the workhorse for parameter values and leaves the marks hyphen, underscore, period, tilde, exclamation point, asterisk, apostrophe, and parentheses unescaped. In Python, urllib.parse.quote with safe="" produces nearly identical output, but the default safe parameter leaves slash unescaped, which is rarely what you want for individual values. PHP exposes both urlencode and rawurlencode, where urlencode follows the form-data convention of plus signs for spaces and rawurlencode follows RFC 3986. Java's URLEncoder is the same form-data variant and should be given an explicit UTF-8 charset argument to avoid platform-default encoding pitfalls. Ruby offers CGI.escape and ERB::Util.url_encode for two slightly different conventions: the former writes spaces as plus signs and the latter as %20.
Go's net/url package provides both QueryEscape for query parameters and PathEscape for path segments, and the distinction is genuinely useful because path segments allow more characters unescaped than query values do. C# offers HttpUtility.UrlEncode for legacy ASP.NET compatibility and Uri.EscapeDataString for RFC 3986 conformance, with the latter being the correct choice for new code. Rust developers typically reach for the url crate or the percent-encoding crate, both of which expose granular control over which character sets are considered safe in which contexts. The variation across languages means that round-tripping a value from one stack to another can produce subtly different bytes for the same input, which matters when computing cache keys, signing requests cryptographically, or comparing URLs for equality.
A particularly thorny case is signed URL construction for cloud services like AWS S3, Google Cloud Storage, and Azure Blob Storage. These services compute a cryptographic signature over the canonical request, which includes the URL-encoded query string. If the client encodes the parameters with one convention and the server canonicalizes them with another, the signatures will not match and the request is rejected as unauthorized. AWS Signature Version 4 specifies its own canonical encoding rules, which follow the strict RFC 3986 unreserved set and therefore differ from many library defaults in subtle ways: spaces become %20 rather than plus, plus becomes %2B, asterisk becomes %2A, and only the unreserved characters are left alone. Every official SDK implements these rules correctly, but hand-rolled signers frequently get them wrong. The tool above accepts arbitrary input and shows you exactly what bytes come out, which is invaluable when debugging signature mismatches by comparing your client output against the AWS canonical request string.
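To make the canonicalization step concrete, here is a rough sketch of building a canonical query string in the style Signature Version 4 describes: encode each name and value strictly, sort by encoded name, then join. The helper names and the sample parameters are invented, this covers only the query-string step and not signing, and the official AWS documentation or an SDK remains the authority on the exact rules.

```javascript
// Strict encoding: only A-Z a-z 0-9 - _ . ~ stay unescaped.
function strictEncode(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Illustrative only: encode, sort by encoded name, join with = and &.
function canonicalQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => [strictEncode(name), strictEncode(value)])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, value]) => name + "=" + value)
    .join("&");
}

const canonical = canonicalQuery({
  prefix: "my folder/*",
  "max-keys": "10",
  marker: "a+b",
});
console.log(canonical); // marker=a%2Bb&max-keys=10&prefix=my%20folder%2F%2A
```

Comparing this string against what a default form encoder produces (plus for space, a bare asterisk) is often enough to spot why a signature does not match.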
Modern URL parsing has converged around the WHATWG URL Standard, which the URL constructor in browsers and the URL class in Node.js both implement. This standard is intended to replace RFC 3986 for web use, with a more permissive and browser-aligned set of rules, including automatic normalization of hosts to lowercase, removal of default ports, and resolution of dot segments in paths. The encoding behavior is mostly compatible but differs in edge cases involving backslashes, leading and trailing whitespace, and ASCII control characters. When writing new code, prefer the URL constructor and the searchParams interface for query manipulation because they implement the standard correctly and handle the round-tripping details automatically. Use this tool when you need to inspect or override what the standard library does behind the scenes.
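The normalizations mentioned here can be observed with a single parse. The input URL below is contrived to trigger each of them.

```javascript
// Mixed-case scheme and host, explicit default port, and dot segments in the path.
const url = new URL("HTTPS://Example.COM:443/a/./b/../c?x=1");

console.log(url.host);     // example.com (lowercased, default port dropped)
console.log(url.pathname); // /a/c (dot segments resolved)

// searchParams takes care of encoding when the query is modified.
url.searchParams.append("q", "blue & green");
console.log(url.href);     // https://example.com/a/c?x=1&q=blue+%26+green
```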
Key Takeaways
- Percent-encoding is a byte-level transformation defined by RFC 3986 that converts unsafe characters into a %XX hexadecimal form, ensuring URLs remain unambiguous as they flow through browsers, proxies, servers, and logs across any character set.
- Encode each component independently and at the latest possible moment. Use encodeURIComponent for individual values and reserve encodeURI for entire pre-built URLs. Decode exactly once, at the layer that owns decoding; decoding repeatedly corrupts values that contain literal percent signs, and encoding repeatedly produces %25 artifacts that silently corrupt data.
- Watch for the plus-versus-%20 split between form data and general URLs, the gray-zone characters whose encoding is optional, and language-specific defaults that may differ from RFC 3986. Always test your full encode-transport-decode pipeline with real international input.
- Treat URL encoding as a security boundary by validating decoded input against a strict allow-list of expected shapes on the server. Combine encoding with parameterized templates, never with raw string concatenation, to eliminate an entire class of injection and parameter-pollution attacks.
Frequently Asked Questions
What is URL encoding (percent-encoding)?
URL encoding converts characters that are not safe in a URL into a percent sign followed by hexadecimal digits (for example, a space becomes %20). Defined by RFC 3986, it ensures browsers and servers interpret URLs correctly.
What is the difference between encode and decode?
Encoding turns a normal string into the safe %XX form, while decoding does the reverse, converting %XX sequences back into the original readable text. This tool supports both directions.
Why do query parameters need URL encoding?
When a parameter value contains characters like spaces, &, =, or non-ASCII text, they can be misread as URL delimiters. Encoding them ensures the parameter is transmitted correctly and prevents broken links and bugs.
Is URL encoding reversible?
Yes. URL encoding is a fully reversible transformation. A correctly encoded string can be decoded back to its exact original value without any loss.
Is my data uploaded to a server?
No. All conversion happens entirely in your browser (client-side). The text you enter is never sent to or stored on any server.