const class

sys::Uri

sys::Obj
  sys::Uri
//
// Copyright (c) 2006, Brian Frank and Andy Frank
// Licensed under the Academic Free License version 3.0
//
// History:
//   28 Jun 06  Brian Frank  Creation
//

**
** Uri is used to immutably represent a Uniform Resource Identifier
** according to [RFC 3986]`http://tools.ietf.org/html/rfc3986`.
** The generic format for a URI is:
**
**   <uri>       := [<scheme> ":"] <body>
**   <body>      := ["//" <auth>] ["/" <path>] ["?" <query>] ["#" <frag>]
**   <auth>      := [<userInfo> "@"] <host> [":" <port>]
**   <path>      := <name> ("/" <name>)*
**   <name>      := <basename> ["." <ext>]
**   <query>     := <queryPair> (<querySep> <queryPair>)*
**   <querySep>  := "&" | ";"
**   <queryPair> := <queryKey> ["=" <queryVal>]
**
** Uris are expressed in either encoded or decoded form.  In encoded
** form RFC 3986 defines a strict set of rules for the characters
** allowed in each section of the URI (scheme, userInfo, host, path,
** query, and fragment).  Any character outside of the allowed set is
** UTF-8 encoded into octets and '%HH' percent encoded.
**
** In decoded form the full range of Unicode characters is allowed in all
** sections except the general delimiters which separate sections.  For
** example, '?' is barred in any section before the query, but is
** permissible in the query string itself or in the fragment identifier.
** The scheme must consist strictly of ASCII alphanumerics, ".", "+", or "-".
**
** The Uri API is designed to work with the decoded format of the Uri.
** Access methods like `host`, `pathStr`, or `queryStr` all return the
** decoded format of the URI.  To summarize the different ways of working
** with Uri:
**   - `Uri.fromStr`: parses a string from its decoded format
**   - `Uri.decode`:  parses a string from percent encoded format
**   - `Uri.encode`:  translates into percent encoded format
**
** Uri can be used to model either absolute URIs or relative references.
** The `plus` and `minus` shortcut operators can be used to resolve and
** relativize relative references against a base URI.
**
const final class Uri
{

//////////////////////////////////////////////////////////////////////////
// Constructor
//////////////////////////////////////////////////////////////////////////

  **
  ** Parse the specified string into a Uri.  Throw ParseErr
  ** if the string is a malformed URI.  This method parses
  ** a decoded Unicode string into its generic parts.
  ** It does not unescape '%' or '+' and handles normal Unicode
  ** characters in the string.
  **
  ** All Uris are automatically normalized as follows:
  **   - "." and ".." segments in the middle of a path are resolved
  **   - Scheme always normalizes to lowercase
  **   - If http, then port 80 normalizes to null
  **   - If http, then a null path normalizes to /
  **
  static Uri fromStr(Str s)

  **
  ** Parse an ASCII percent encoded string into a Uri according to
  ** RFC 3986.  All '%HH' escape sequences are translated into octets,
  ** and then the octet sequence is UTF-8 decoded into a Str.  The '+'
  ** character in the query section is unescaped into a space.
  ** Throw ParseErr if the string is a malformed URI or is not encoded
  ** correctly.  Refer to `fromStr` for normalization rules.
  **
  static Uri decode(Str s)

//////////////////////////////////////////////////////////////////////////
// Utils
//////////////////////////////////////////////////////////////////////////

  **
  ** Decode a query string which is URL encoded according to the
  ** "application/x-www-form-urlencoded" MIME type into a map of query
  ** parameters.  This method will unescape '%' percent encoding and
  ** '+' into space.  The parameters are parsed into a map using the
  ** same semantics as `Uri.query`.  Throw ArgErr if the string is
  ** malformed.  See `encodeQuery`.
  **
  static Str:Str decodeQuery(Str s)

  **
  ** Encode a map of query parameters into URL percent encoding
  ** according to the "application/x-www-form-urlencoded" MIME type.
  ** See `decodeQuery`.
  **
  static Str encodeQuery(Str:Str q)

  **
  ** Return if the specified string is a valid name segment to
  ** use in an unencoded URI.  The name must be at least one char
  ** long and can never be "." or "..".  The legal characters are
  ** defined as follows from RFC 3986:
  **
  **   unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"
  **   ALPHA       =  %x41-5A / %x61-7A   ; A-Z / a-z
  **   DIGIT       =  %x30-39 ; 0-9
  **
  ** Although RFC 3986 does allow path segments to contain other
  ** special characters such as 'sub-delims', Fan takes a strict
  ** approach to names to be used in URIs.
  **
  static Bool isName(Str name)

  **
  ** If the specified string is not a valid name according
  ** to the `isName` method, then throw `NameErr`.
  **
  static Void checkName(Str name)

//////////////////////////////////////////////////////////////////////////
// Identity
//////////////////////////////////////////////////////////////////////////

  **
  ** Two Uris are equal if they have the same normalized string representation.
  **
  override Bool equals(Obj that)

  **
  ** Return a hash code based on the normalized string representation.
  **
  override Int hash()

  **
  ** Return the normalized string representation.
  **
  override Str toStr()

  **
  ** Return the percent encoded string for this Uri according to
  ** RFC 3986.  Each section of the Uri is UTF-8 encoded into octets
  ** and then percent encoded according to its valid character set.
  ** Spaces in the query section are encoded as '+'.
  **
  Str encode()

//////////////////////////////////////////////////////////////////////////
// Components
//////////////////////////////////////////////////////////////////////////

  **
  ** Return if this is an absolute Uri, which means it has a non-null scheme.
  **
  Bool isAbs()

  **
  ** Return if this is a relative Uri, which means it has a null scheme.
  **
  Bool isRel()

  **
  ** A Uri represents a directory if it has a non-null path which
  ** ends with a "/" slash.  When another Uri is appended, it is
  ** resolved relative to a directory itself, but relative to the
  ** parent of a non-directory.
  **
  ** Examples:
  **   `/a/b`.isDir -> false
  **   `/a/b/`.isDir -> true
  **
  Bool isDir()

  **
  ** Return the scheme component or null if not absolute.  The
  ** scheme is always normalized into lowercase.
  **
  ** Examples:
  **   `http://foo/a/b/c`.scheme -> "http"
  **   `HTTP://foo/a/b/c`.scheme -> "http"
  **   `mailto:who@there.com`.scheme -> "mailto"
  **
  Str scheme()

  **
  ** The authority represents a network endpoint in the format:
  **   [<userInfo> "@"] <host> [":" <port>]
  ** Return null if the Uri has no authority.
  **
  ** Examples:
  **   `http://user@host:99/`.auth -> "user@host:99"
  **   `http://host/`.auth -> "host"
  **   `/dir/file.txt`.auth -> null
  **
  Str auth()

  **
  ** Return the host address of the URI or null if the Uri has no
  ** authority.  The host is in the format of a DNS name, IPv4 address,
  ** or IPv6 address surrounded by square brackets.
  **
  ** Examples:
  **   `ftp://there:78/file`.host -> "there"
  **   `http://www.cool.com/`.host -> "www.cool.com"
  **   `http://user@10.162.255.4/index`.host -> "10.162.255.4"
  **   `http://[::192.9.5.5]/`.host -> "[::192.9.5.5]"
  **   `//foo/bar`.host -> "foo"
  **   `/bar`.host -> null
  **
  Str host()

  **
  ** User info is string information embedded in the authority using
  ** the "@" character.  Its use is discouraged for security reasons.
  **
  ** Examples:
  **   `http://brian:pass@host/`.userInfo -> "brian:pass"
  **   `http://www.cool.com/`.userInfo -> null
  **
  Str userInfo()

  **
  ** Return the IP port of the host for the network endpoint.  It is optionally
  ** embedded in the authority using the ":" character.  If unspecified then
  ** return null.
  **
  ** Examples:
  **   `http://foo:81/`.port -> 81
  **   `http://www.cool.com/`.port -> null
  **
  Int port()

  **
  ** Return the path parsed into a list of simple names or
  ** an empty list if the pathStr is "" or "/".
  **
  ** Examples:
  **   `mailto:me@there.com`.path -> null
  **   `http://host`.path -> Str[,]
  **   `http://foo/`.path -> Str[,]
  **   `/`.path -> Str[,]
  **   `/a`.path -> ["a"]
  **   `/a/b`.path -> ["a", "b"]
  **   `../a/b`.path -> ["..", "a", "b"]
  **
  Str[] path()

  **
  ** Return the path component of the Uri.
  **
  ** Examples:
  **   `mailto:me@there.com`.pathStr -> "me@there.com"
  **   `http://host`.pathStr -> ""
  **   `http://foo/`.pathStr -> "/"
  **   `/a`.pathStr -> "/a"
  **   `/a/b`.pathStr -> "/a/b"
  **   `../a/b`.pathStr -> "../a/b"
  **
  Str pathStr()

  **
  ** Return if the path starts with a leading slash.  If
  ** pathStr is null, then return false.
  **
  ** Examples:
  **   `http://foo/`.isPathAbs   -> true
  **   `/dir/f.txt`.isPathAbs    -> true
  **   `dir/f.txt`.isPathAbs     -> false
  **   `../index.html`.isPathAbs -> false
  **
  Bool isPathAbs()

  **
  ** Return the simple file name, which is path.last, or null
  ** if the path is empty.
  **
  ** Examples:
  **   `/`.name -> null
  **   `/a/file.txt`.name -> "file.txt"
  **   `/a/file`.name -> "file"
  **
  Str name()

  **
  ** Return the file name without the extension (everything up
  ** to the last dot) or null if name is null.
  **
  ** Examples:
  **   `/`.basename -> null
  **   `/a/file.txt`.basename -> "file"
  **   `/a/file`.basename -> "file"
  **   `/a/file.`.basename -> "file"
  **   `..`.basename -> ".."
  **
  Str basename()

  **
  ** Return the file name extension (everything after the last dot)
  ** or null if name is null or name has no dot.
  **
  ** Examples:
  **   `/`.ext -> null
  **   `/a/file.txt`.ext -> "txt"
  **   `/Foo.Bar`.ext -> "Bar"
  **   `/a/file`.ext -> null
  **   `/a/file.`.ext -> ""
  **   `..`.ext -> null
  **
  Str ext()

  **
  ** Return the query parsed as a map of key/value pairs.  If no query
  ** string was specified return an empty map (this method will never
  ** return null).  The query is parsed such that pairs are separated by
  ** the "&" or ";" characters.  If a pair contains "=", then it is
  ** split into a key and value, otherwise the value defaults to "true".
  **
  ** Examples:
  **   `http://host/path?query`.query -> ["query":"true"]
  **   `http://host/path`.query -> [:]
  **   `?a=b;c=d`.query -> ["a":"b", "c":"d"]
  **   `?a=b&c=d`.query -> ["a":"b", "c":"d"]
  **   `?a=b;;c=d;`.query -> ["a":"b", "c":"d"]
  **   `?a=b;;c`.query -> ["a":"b", "c":"true"]
  **
  Str:Str query()

  **
  ** Return the query component of the Uri, which is everything
  ** after the "?" but before the "#" fragment.  Return null if
  ** no query string is specified.
  **
  ** Examples:
  **   `http://host/path?query#frag`.queryStr -> "query"
  **   `http://host/path?query`.queryStr -> "query"
  **   `http://host/path`.queryStr -> null
  **   `../foo?a=b&c=d`.queryStr -> "a=b&c=d"
  **   `?a=b;c;`.queryStr -> "a=b;c;"
  **
  Str queryStr()

  **
  ** Return the fragment component of the Uri, which is everything
  ** after the "#".  Return null if no fragment is specified.
  **
  ** Examples:
  **   `http://host/path?query#frag`.frag -> "frag"
  **   `http://host/path`.frag -> null
  **   `#h1`.frag -> "h1"
  **
  Str frag()

//////////////////////////////////////////////////////////////////////////
// Normalization
//////////////////////////////////////////////////////////////////////////

  **
  ** Return the parent directory of this Uri or null if a parent
  ** path cannot be computed from this Uri.
  **
  ** Examples:
  **   `http://foo/a/b/c?q#f`.parent -> `http://foo/a/b/`
  **   `/a/b/c/`.parent -> `/a/b/`
  **   `a/b/c`.parent   -> `a/b/`
  **   `/a`.parent      ->  `/`
  **   `/`.parent       ->  null
  **   `a.txt`.parent   ->  null
  **
  Uri parent()

  **
  ** Return a new Uri formed by resolving the specified Uri against
  ** this Uri.  An absolute Uri replaces this Uri entirely; otherwise
  ** the query and fragment of this Uri are dropped and the path is
  ** resolved relative to this Uri's directory (see `isDir`).
  **
  ** Examples:
  **   `http://foo/path` + `http://bar/` -> `http://bar/`
  **   `http://foo/path?q#f` + `newpath` -> `http://foo/newpath`
  **   `http://foo/path/?q#f` + `newpath` -> `http://foo/path/newpath`
  **   `a/b/c`  + `d` -> `a/b/d`
  **   `a/b/c/` + `d` -> `a/b/c/d`
  **   `a/b/c`  + `../../d` -> `d`
  **   `a/b/c/` + `../../d` -> `a/d`
  **   `a/b/c`  + `../../../d` -> `../d`
  **   `a/b/c/` + `../../../d` -> `d`
  **
  Uri plus(Uri toAppend)

  **
  ** Relativize this Uri against the specified base Uri.
  **
  ** Examples:
  **   `http://foo/a/b/c` - `http://foo/a/b/c`  -> ``
  **   `http://foo/a/b/c` - `http://foo/a/b`  -> `c`
  **   `//foo/a/b/c` - `http://foo/` -> `a/b/c`
  **   `/a/b/c` - `/a` -> `b/c`
  **
  Uri minus(Uri toRelativize)

  **
  ** Return this Uri relativized against the specified number of leading
  ** levels of the path; the scheme, authority, query, and fragment are
  ** dropped.  If path is null or path.size < levels then throw ArgErr.
  **
  ** Examples:
  **   `http://host/a/b/c?query#frag`.tail -> `b/c`
  **   `a/b/c/d`.tail -> `b/c/d`
  **   `a/b/c`.tail(2) -> `c`
  **   `a`.tail -> ``
  **
  Uri tail(Int levels := 1)

//////////////////////////////////////////////////////////////////////////
// Resolution
//////////////////////////////////////////////////////////////////////////

  **
  ** Convenience for File.make(this); no guarantee is made
  ** that the file exists.
  **
  File toFile()

  **
  ** Resolve this Uri to its target object.  If this Uri is absolute
  ** then base may be null (it is ignored), otherwise the Uri is resolved
  ** against the base specified.  Throw ArgErr if base is null and this
  ** Uri is relative.  Return null if the Uri cannot be resolved.
  ** Steps for resolution:
  **   1) if absolute, map scheme to factory class to get base (TODO)
  **   2) route to base.trapUri
  **
  ** TODO: how should this method interact with Resources and resolve?
  **
  Obj get(Obj base := null)

  **
  ** Resolve this Uri to a resource within the local virtual machine's
  ** namespace.  If this Uri is path absolute, then it is resolved
  ** against `Sys.namespace`.  If base is null, then the Uri must be
  ** path absolute.  If the resource cannot be resolved, return null
  ** when checked is false, otherwise throw UnresolvedErr.
  **
  ** TODO: document how the scheme is handled
  **
  Resource resolve(Resource base := null, Bool checked := true)

}
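The pair-splitting rules documented for `query` and `decodeQuery` above can be restated as a short Python sketch (Python is used only because it is easy to run; this is not Fan's implementation). The names `parse_query` and `decode_query` are hypothetical, splitting each pair at its first "=" is an assumption, and unlike `decodeQuery` this sketch does not throw on malformed input. Percent and '+' decoding use Python's standard `urllib.parse.unquote_plus`.

```python
from urllib.parse import unquote_plus


def parse_query(query_str):
    """Parse a decoded query string using the rules documented for Uri.query.

    Hypothetical helper: a sketch of the documented behavior, not Fan code.
    """
    result = {}
    if query_str is None:
        return result  # no query string -> empty map, never None
    # Pairs may be separated by either "&" or ";", so normalize to one.
    for pair in query_str.replace(";", "&").split("&"):
        if not pair:
            continue  # skip empty pairs, e.g. in "a=b;;c=d;"
        # Assumption: split at the first "=" only.
        key, sep, val = pair.partition("=")
        # A pair without "=" gets the default value "true".
        result[key] = val if sep else "true"
    return result


def decode_query(s):
    """Like Uri.decodeQuery: split first, then undo '%HH' and '+' escapes.

    Splitting before decoding keeps an escaped "%26" inside a value
    from being mistaken for a pair separator.
    """
    return {unquote_plus(k): unquote_plus(v) for k, v in parse_query(s).items()}


print(parse_query("a=b;;c"))                   # {'a': 'b', 'c': 'true'}
print(decode_query("name=Brian+Frank&x=%26"))  # {'name': 'Brian Frank', 'x': '&'}
```

Decoding after splitting mirrors the order implied by the docs: `query` works on the already decoded Uri, while `decodeQuery` must unescape each key and value individually.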
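The name rules described for `isName`, `basename`, and `ext` can likewise be sketched in Python. This restates the documented behavior rather than Fan's code: `is_name` and `split_name` are hypothetical names, and special-casing only ".." (as the examples do) while treating every other dot as an extension separator is an assumption.

```python
import string

# The RFC 3986 "unreserved" set quoted in the isName doc:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def is_name(name):
    """Sketch of Uri.isName: non-empty, not "." or "..", unreserved chars only."""
    if name in ("", ".", ".."):
        return False
    return all(ch in UNRESERVED for ch in name)


def split_name(name):
    """Return (basename, ext) following the Uri.basename / Uri.ext examples."""
    if name is None:
        return None, None  # no name -> both are null
    if name == "..":
        return "..", None  # the examples special-case ".."
    dot = name.rfind(".")
    if dot < 0:
        return name, None  # no dot -> no extension
    # Everything before the last dot, and everything after it (may be "").
    return name[:dot], name[dot + 1:]


print(is_name("file.txt"), is_name("a b"))  # True False
print(split_name("file."))                  # ('file', '')
```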
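The path arithmetic behind `plus` can be sketched in Python as follows. It covers only the path portion (scheme, authority, query, and fragment are ignored), `resolve_path` is a hypothetical name, and the treatment of ".." is inferred from the documented examples: a ".." that climbs above a relative base is kept, while one that climbs above the root of an absolute path is dropped. RFC 3986 only defines resolution against an absolute base, so the relative-base cases follow the Fan examples rather than the RFC.

```python
def resolve_path(base, rel):
    """Resolve path `rel` against path `base`, following the Uri.plus examples.

    Hypothetical helper: paths only, no scheme/authority/query/fragment.
    """
    if not rel:
        return base  # empty reference: keep the base path
    if rel.startswith("/"):
        return rel  # an absolute path replaces the base path entirely
    is_abs = base.startswith("/")
    segs = [s for s in base.split("/") if s]
    # A non-directory base is resolved relative to its parent directory.
    if segs and not base.endswith("/"):
        segs.pop()
    rel_segs = rel.split("/")
    for seg in rel_segs:
        if seg in ("", "."):
            continue
        if seg == "..":
            if segs and segs[-1] != "..":
                segs.pop()          # climb one level
            elif not is_abs:
                segs.append("..")   # relative base: keep the extra ".."
            # absolute base: ".." above the root is dropped
        else:
            segs.append(seg)
    path = ("/" if is_abs else "") + "/".join(segs)
    # Keep the result a directory if the reference denotes one.
    if segs and (rel.endswith("/") or rel_segs[-1] in (".", "..")):
        path += "/"
    return path


print(resolve_path("a/b/c", "../../../d"))  # ../d
print(resolve_path("/path/", "newpath"))    # /path/newpath
```

The only difference between `a/b/c` and `a/b/c/` as a base is whether the last name is popped before the reference is applied, which is exactly the directory rule described under `isDir`.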
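The `parent`, `tail`, and `minus` operations reduce to list operations on the path names. The Python sketch below reproduces the path-only examples from those docs; `parent_path`, `tail_path`, and `relativize_path` are hypothetical names, raising ValueError stands in for Fan's ArgErr, and two behaviors are assumptions: `tail_path` keeps a trailing slash, and `relativize_path` rejects a base that is not a prefix of the path, since the docs do not say what `minus` returns in that case.

```python
def _segments(path):
    """Split a path into its non-empty names."""
    return [s for s in path.split("/") if s]


def parent_path(path):
    """Sketch of Uri.parent for the path component."""
    is_abs = path.startswith("/")
    segs = _segments(path)
    if not segs or (len(segs) == 1 and not is_abs):
        return None  # "/" and "a.txt" have no computable parent
    # The parent is always a directory, so every kept name ends with "/".
    return ("/" if is_abs else "") + "".join(s + "/" for s in segs[:-1])


def tail_path(path, levels=1):
    """Sketch of Uri.tail: drop `levels` leading names; result is relative."""
    segs = _segments(path)
    if len(segs) < levels:
        raise ValueError("path.size < levels")  # Fan throws ArgErr here
    rest = "/".join(segs[levels:])
    # Assumption: a directory stays a directory.
    if rest and path.endswith("/"):
        rest += "/"
    return rest


def relativize_path(path, base):
    """Sketch of Uri.minus: strip the base's names from the front of `path`."""
    segs, base_segs = _segments(path), _segments(base)
    if segs[:len(base_segs)] != base_segs:
        # Assumption: the docs do not cover a base that is not a prefix.
        raise ValueError("base is not a prefix of path")
    return "/".join(segs[len(base_segs):])


print(parent_path("/a"))                   # /
print(tail_path("a/b/c", 2))               # c
print(relativize_path("/a/b/c", "/a"))     # b/c
```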