
//
// Copyright (c) 2006, Brian Frank and Andy Frank
// Licensed under the Academic Free License version 3.0
//
// History:
//   28 Jun 06  Brian Frank  Creation
//

**
** Uri is used to immutably represent a Uniform Resource Identifier
** according to [RFC 3986]`http://tools.ietf.org/html/rfc3986`.
** The generic format for a URI is:
**
**   <uri>        := [<scheme> ":"] <body>
**   <body>       := ["//" <auth>] ["/" <path>] ["?" <query>] ["#" <frag>]
**   <auth>       := [<userInfo> "@"] <host> [":" <port>]
**   <path>       := <name> ("/" <name>)*
**   <name>       := <basename> ["." <ext>]
**   <query>      := <queryPair> (<querySep> <queryPair>)*
**   <querySep>   := "&" | ";"
**   <queryPair>  := <queryKey> ["=" <queryVal>]
**
** Uris are expressed in either encoded or decoded form.  In encoded
** form RFC 3986 defines a strict set of rules for the characters
** allowed in each section of the URI (scheme, userInfo, host, path,
** query, and fragment).  Any character outside of the allowed set is
** UTF-8 encoded into octets and '%HH' percent encoded.
**
** In decoded form the full range of Unicode characters is allowed in all
** sections except the general delimiters which separate sections.  For
** example '?' is barred in any section before the query, but is permissible
** in the query string itself or the fragment identifier.  The scheme must
** be strictly defined in terms of ASCII alphanumeric, ".", "+", or "-".
**
** The Uri API is designed to work with the decoded format of the Uri.
** Access methods like `host`, `pathStr`, or `queryStr` all return the
** decoded format of the URI.  To summarize different ways of working
** with Uri:
**   - `Uri.fromStr`:  parses a string from its decoded format
**   - `Uri.decode`:   parses a string from percent encoded format
**   - `Uri.encode`:   translates into percent encoded format
**
** Uri can be used to model either absolute URIs or relative references.
** The `plus` and `minus` shortcut operators can be used to resolve and
** relativize relative references against a base URI.
**
const final class Uri
{

//////////////////////////////////////////////////////////////////////////
// Constructor
//////////////////////////////////////////////////////////////////////////

  **
  ** Parse the specified string into a Uri.  Throw ParseErr
  ** if the string is a malformed URI.  This method parses
  ** a decoded Unicode string into its generic parts.
  ** It does not unescape '%' or '+' and handles normal Unicode
  ** characters in the string.
  **
  ** All Uris are automatically normalized as follows:
  **   - Replacing "." and ".." segments in the middle of a path
  **   - Scheme always normalizes to lowercase
  **   - If http then port 80 normalizes to null
  **   - If http then a null path normalizes to /
  **
  static Uri fromStr(Str s)

  **
  ** Parse an ASCII percent encoded string into a Uri according to
  ** RFC 3986.  All '%HH' escape sequences are translated into octets,
  ** and then the octet sequence is UTF-8 decoded into a Str.  The '+'
  ** character in the query section is unescaped into a space.
  ** Throw ParseErr if the string is a malformed URI or if it is not
  ** encoded correctly.  Refer to `fromStr` for normalization rules.
  **
  static Uri decode(Str s)

//////////////////////////////////////////////////////////////////////////
// Utils
//////////////////////////////////////////////////////////////////////////

  **
  ** Decode a map of query parameters which are URL encoded according
  ** to the "application/x-www-form-urlencoded" MIME type.  This method
  ** will unescape '%' percent encoding and '+' into space.  The parameters
  ** are parsed into a map using the same semantics as `Uri.query`.  Throw
  ** ArgErr if the string is malformed.  See `encodeQuery`.
  **
  static Str:Str decodeQuery(Str s)

  **
  ** Encode a map of query parameters into URL percent encoding
  ** according to the "application/x-www-form-urlencoded" MIME type.
  ** See `decodeQuery`.
  **
  static Str encodeQuery(Str:Str q)

  **
  ** Return if the specified string is a valid name segment to
  ** use in an unencoded URI.  The name must be at least one char
  ** long and can never be "." or "..".  The legal characters are
  ** defined as follows by RFC 3986:
  **
  **   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  **   ALPHA       = %x41-5A / %x61-7A   ; A-Z / a-z
  **   DIGIT       = %x30-39             ; 0-9
  **
  ** Although RFC 3986 does allow path segments to contain other
  ** special characters such as 'sub-delims', Fan takes a strict
  ** approach to names to be used in URIs.
  **
  static Bool isName(Str name)

  **
  ** If the specified string is not a valid name according
  ** to the `isName` method, then throw `NameErr`.
  **
  static Void checkName(Str name)

//////////////////////////////////////////////////////////////////////////
// Identity
//////////////////////////////////////////////////////////////////////////

  **
  ** Two Uris are equal if they have the same normalized string
  ** representation.
  **
  override Bool equals(Obj that)

  **
  ** Return a hash code based on the normalized string representation.
  **
  override Int hash()

  **
  ** Return normalized string representation.
  **
  override Str toStr()

  **
  ** Return the percent encoded string for this Uri according to
  ** RFC 3986.  Each section of the Uri is UTF-8 encoded into octets
  ** and then percent encoded according to its valid character set.
  ** Spaces in the query section are encoded as '+'.
  **
  Str encode()

//////////////////////////////////////////////////////////////////////////
// Components
//////////////////////////////////////////////////////////////////////////

  **
  ** Return if this is an absolute Uri, which means it has a
  ** non-null scheme.
  **
  Bool isAbs()

  **
  ** Return if this is a relative Uri, which means it has a null scheme.
  **
  Bool isRel()

  **
  ** A Uri represents a directory if it has a non-null path which
  ** ends with a "/" slash.  Directories are joined with other Uris
  ** relative to themselves versus non-directories which are joined
  ** relative to their parent.
  **
  ** Examples:
  **   `/a/b`.isDir   ->  false
  **   `/a/b/`.isDir  ->  true
  **
  Bool isDir()

  **
  ** Return the scheme component or null if not absolute.  The
  ** scheme is always normalized into lowercase.
  **
  ** Examples:
  **   `http://foo/a/b/c`.scheme      ->  "http"
  **   `HTTP://foo/a/b/c`.scheme      ->  "http"
  **   `mailto:who@there.com`.scheme  ->  "mailto"
  **
  Str scheme()

  **
  ** The authority represents a network endpoint in the format:
  **   [<userInfo> "@"] host [":" <port>]
  **
  ** Examples:
  **   `http://user@host:99/`.auth  ->  "user@host:99"
  **   `http://host/`.auth          ->  "host"
  **   `/dir/file.txt`.auth         ->  null
  **
  Str auth()

  **
  ** Return the host address of the URI or null if not available.  The
  ** host is in the format of a DNS name, IPv4 address, or IPv6 address
  ** surrounded by square brackets.  Return null if the uri is not
  ** absolute.
  **
  ** Examples:
  **   `ftp://there:78/file`.host            ->  "there"
  **   `http://www.cool.com/`.host           ->  "www.cool.com"
  **   `http://user@10.162.255.4/index`.host ->  "10.162.255.4"
  **   `http://[::192.9.5.5]/`.host          ->  "[::192.9.5.5]"
  **   `//foo/bar`.host                      ->  "foo"
  **   `/bar`.host                           ->  null
  **
  Str host()

  **
  ** User info is string information embedded in the authority using
  ** the "@" character.  Its use is discouraged for security reasons.
  **
  ** Examples:
  **   `http://brian:pass@host/`.userInfo  ->  "brian:pass"
  **   `http://www.cool.com/`.userInfo     ->  null
  **
  Str userInfo()

  **
  ** Return the IP port of the host for the network endpoint.  It is
  ** optionally embedded in the authority using the ":" character.  If
  ** unspecified then return null.
  **
  ** Examples:
  **   `http://foo:81/`.port        ->  81
  **   `http://www.cool.com/`.port  ->  null
  **
  Int port()

  **
  ** Return the path parsed into a list of simple names or
  ** an empty list if the pathStr is "" or "/".
  **
  ** Examples:
  **   `mailto:me@there.com`.path  ->  null
  **   `http://host`.path          ->  Str[,]
  **   `http://foo/`.path          ->  Str[,]
  **   `/`.path                    ->  Str[,]
  **   `/a`.path                   ->  ["a"]
  **   `/a/b`.path                 ->  ["a", "b"]
  **   `../a/b`.path               ->  ["..", "a", "b"]
  **
  Str[] path()

  **
  ** Return the path component of the Uri.
  **
  ** Examples:
  **   `mailto:me@there.com`.pathStr  ->  "me@there.com"
  **   `http://host`.pathStr          ->  ""
  **   `http://foo/`.pathStr          ->  "/"
  **   `/a`.pathStr                   ->  "/a"
  **   `/a/b`.pathStr                 ->  "/a/b"
  **   `../a/b`.pathStr               ->  "../a/b"
  **
  Str pathStr()

  **
  ** Return if the path starts with a leading slash.  If
  ** pathStr is null, then return false.
  **
  ** Examples:
  **   `http://foo/`.isPathAbs    ->  true
  **   `/dir/f.txt`.isPathAbs     ->  true
  **   `dir/f.txt`.isPathAbs      ->  false
  **   `../index.html`.isPathAbs  ->  false
  **
  Bool isPathAbs()

  **
  ** Return simple file name which is path.last or null
  ** if the path is empty.
  **
  ** Examples:
  **   `/`.name            ->  null
  **   `/a/file.txt`.name  ->  "file.txt"
  **   `/a/file`.name      ->  "file"
  **
  Str name()

  **
  ** Return file name without the extension (everything up
  ** to the last dot) or null if name is null.
  **
  ** Examples:
  **   `/`.basename            ->  null
  **   `/a/file.txt`.basename  ->  "file"
  **   `/a/file`.basename      ->  "file"
  **   `/a/file.`.basename     ->  "file"
  **   `..`.basename           ->  ".."
  **
  Str basename()

  **
  ** Return file name extension (everything after the last dot)
  ** or null if name is null or name has no dot.
  **
  ** Examples:
  **   `/`.ext            ->  null
  **   `/a/file.txt`.ext  ->  "txt"
  **   `/Foo.Bar`.ext     ->  "Bar"
  **   `/a/file`.ext      ->  null
  **   `/a/file.`.ext     ->  ""
  **   `..`.ext           ->  null
  **
  Str ext()

  **
  ** Return the query parsed as a map of key/value pairs.  If no query
  ** string was specified return an empty map (this method will never
  ** return null).  The query is parsed such that pairs are separated by
  ** the "&" or ";" characters.  If a pair contains the "=" character,
  ** then it is split into a key and value, otherwise the value defaults
  ** to "true".
  **
  ** Examples:
  **   `http://host/path?query`.query  ->  ["query":"true"]
  **   `http://host/path`.query        ->  [:]
  **   `?a=b;c=d`.query                ->  ["a":"b", "c":"d"]
  **   `?a=b&c=d`.query                ->  ["a":"b", "c":"d"]
  **   `?a=b;;c=d;`.query              ->  ["a":"b", "c":"d"]
  **   `?a=b;;c`.query                 ->  ["a":"b", "c":"true"]
  **
  Str:Str query()

  **
  ** Return the query component of the Uri which is everything
  ** after the "?" but before the "#" fragment.  Return null if
  ** no query string specified.
  **
  ** Examples:
  **   `http://host/path?query#frag`.queryStr  ->  "query"
  **   `http://host/path?query`.queryStr       ->  "query"
  **   `http://host/path`.queryStr             ->  null
  **   `../foo?a=b&c=d`.queryStr               ->  "a=b&c=d"
  **   `?a=b;c;`.queryStr                      ->  "a=b;c;"
  **
  Str queryStr()

  **
  ** Return the fragment component of the Uri which is everything
  ** after the "#".  Return null if no fragment specified.
  **
  ** Examples:
  **   `http://host/path?query#frag`.frag  ->  "frag"
  **   `http://host/path`.frag             ->  null
  **   `#h1`.frag                          ->  "h1"
  **
  Str frag()

//////////////////////////////////////////////////////////////////////////
// Normalization
//////////////////////////////////////////////////////////////////////////

  **
  ** Return the parent directory of this Uri or null if a parent
  ** path cannot be computed from this Uri.
  **
  ** Examples:
  **   `http://foo/a/b/c?q#f`.parent  ->  `http://foo/a/b/`
  **   `/a/b/c/`.parent               ->  `/a/b/`
  **   `a/b/c`.parent                 ->  `a/b/`
  **   `/a`.parent                    ->  `/`
  **   `/`.parent                     ->  null
  **   `a.txt`.parent                 ->  null
  **
  Uri parent()

  **
  ** Return a new Uri with the specified Uri appended to this Uri.
  **
  ** Examples:
  **   `http://foo/path` + `http://bar/`   ->  `http://bar/`
  **   `http://foo/path?q#f` + `newpath`   ->  `http://foo/newpath`
  **   `http://foo/path/?q#f` + `newpath`  ->  `http://foo/path/newpath`
  **   `a/b/c`  + `d`                      ->  `a/b/d`
  **   `a/b/c/` + `d`                      ->  `a/b/c/d`
  **   `a/b/c`  + `../../d`                ->  `d`
  **   `a/b/c/` + `../../d`                ->  `a/d`
  **   `a/b/c`  + `../../../d`             ->  `../d`
  **   `a/b/c/` + `../../../d`             ->  `d`
  **
  Uri plus(Uri toAppend)

  **
  ** Relativize this uri against the specified base.
  **
  ** Examples:
  **   `http://foo/a/b/c` - `http://foo/a/b/c`  ->  ``
  **   `http://foo/a/b/c` - `http://foo/a/b`    ->  `c`
  **   `//foo/a/b/c` - `http://foo/`            ->  `a/b/c`
  **   `/a/b/c` - `/a`                          ->  `b/c`
  **
  Uri minus(Uri toRelativize)

  **
  ** Return this Uri relativized against the specified number of parent
  ** levels of the path.  If path is null or path.size < levels then
  ** throw ArgErr.
  **
  ** Examples:
  **   `http://host/a/b/c?query#frag`.tail  ->  `b/c`
  **   `a/b/c/d`.tail                       ->  `b/c/d`
  **   `a/b/c`.tail(2)                      ->  `c`
  **   `a`.tail                             ->  ``
  **
  Uri tail(Int levels := 1)

//////////////////////////////////////////////////////////////////////////
// Resolution
//////////////////////////////////////////////////////////////////////////

  **
  ** Convenience for File.make(this) - no guarantee is made
  ** that the file exists.
  **
  File toFile()

  **
  ** Resolve this Uri to its target object.  If this Uri is absolute
  ** then base may be null (it is ignored), otherwise the Uri is resolved
  ** against the base specified.  Throw ArgErr if base is null and this
  ** Uri is relative.  Return null if the Uri cannot be resolved.
  ** Steps for resolution:
  **   1) if absolute, map scheme to factory class to get base (TODO)
  **   2) route to base.trapUri
  **
  ** TODO: how should this method be with Resources and resolve???
  **
  Obj get(Obj base := null)

  **
  ** Resolve this Uri to a resource within the local virtual machine's
  ** namespace.  If this Uri is path absolute, then it is resolved
  ** against `Sys.namespace`.  If base is null, then the Uri must be
  ** path absolute.  If the resource cannot be resolved and checked is
  ** false return null, otherwise throw UnresolvedErr.
  **
  ** TODO: doc things like how scheme is handled???
  **
  Resource resolve(Resource base := null, Bool checked := true)

}