A header-only URI library in modern C++. Just plug and play!
The driving reason for this library is the general lack of URI-parsing libraries
that are generic to all forms of URIs, and the amount of bad URI parsing in the
world. To address both of those cases, this library provides access to the
content component (called hier-part
in the spec, even for non-hierarchical
URIs) for non-hierarchical URIs, and provides access to the parsed components of
the content component (username, password, host, port, path) for hierarchical
URIs. Note that accessing hierarchical URI components for a non-hierarchical URI
is invalid and will throw a std::domain_error
, while accessing the content
component of a hierarchical URI is invalid and will likewise throw a
std::domain_error
. For additional notes, see the following API documentation.
uri(char const *uri_text, scheme_category category = scheme_category::Hierarchical, query_argument_separator separator = query_argument_separator::ampersand)
anduri(std::string const &uri_text, scheme_category category = scheme_category::Hierarchical, query_argument_separator separator = query_argument_separator::ampersand)
: constructs auri
object and throws an exception for any invalid component.uri(uri const &other)
anduri &operator=(uri const &other)
: copy constructor and copy assignment operator. Creates a duplicate of the supplied uri.uri(uri const &other, std::map<component, std::string> const &replacements)
: Constructs a new URI with the supplied URI, with components replaced as per the replacements dictionary. This constructor cannot change if the path is rooted or not, and it cannot change from a hierarchical URI to a non-hierarchical URI.uri(std::map<component, std::string> const &components, scheme_category category, bool rooted_path, query_argument_separator separator = query_argument_separator::ampersand)
: Constructs a new URI from the components given. Note that currently it is possible to build very invalid URIs with this setup, as no validation is performed (as of right now.) This constructor is a wee bit experimental.
std::string const &get_scheme() const
: get the scheme component.scheme_category get_scheme_category() const
: get the scheme category, either Hierarchical or NonHierarchical.std::string const &get_content() const
: get the content component of a non-hierarchical URI. Throws when called on a hierarchical URI.std::string const &get_username() const
: get the username component of the URI (or return an empty string if none was present in the source string.) This method will most likely be marked deprecated shortly (as username/password handling in URIs is deprecated.) Throws when called on a non-hierarchical URI.std::string const &get_password() const
: get the password component of the URI (or return an empty string if none was present in the source string.) This method will most likely be marked deprecated shortly (as username/password handling in URIs is deprecated.) Throws when called on a non-hierarchical URI.std::string const &get_host() const
: get the host component of the URI. Returns an empty string if the host component was empty or not supplied. Throws when called on a non-hierarchical URI.unsigned long get_port() const
: get the port component (parsed into anunsigned long
) of the URI. If no port was supplied, returns 0. Throws when called on a non-hierarchical URI.std::string const &get_path() const
: get the path component of the URI. Returns an empty string if the path component was empty. Throws when called on a non-hierarchical URI.std::string const &get_query() const
: get the query component of the URI, as a string. Returns an empty string if no query was supplied.std::map<std::string, std::string> const &get_query_dictionary() const
: get the parsed contents of the query component, as a key-value dictionary. This operation uses the separator declared at the creation of the URI, which as a default uses an ampersand. If your URI uses semicolons to separate arguments, set the optional arguments in the constructor.std::string const &get_fragment() const
: get the fragment component of the URI. Returns an empty string if no fragment was supplied.std::string to_string() const
: get the normalized form of the URI; any empty components included in the initial URI string will be stripped from this form. Currently does not normalize on capitalization, but do not rely on the case of the returned URI matching the supplied case in the future. If no authority component was present in a hierarchical URI, this method will preserve the rootless state of the path component, i.e. for the URNurn:ietf:rfc:2141
, the path isietf:rfc:2141
with no root (initial/
character) and as such, the normalized string form also will not have a root character.
This library comes with a basic set of tests, which (as of this writing) mostly confirm that a few example URIs work well, and should confirm the operation of the library for various cases.
With GCC or Clang++, the instructions are fairly straightforward; run the
following in this directory, substituting clang++
for g++
(assuming your
installation has C++11 support):
g++ -std=c++11 test.cc -o uri_test
./uri_test
With MSVC, note that I have only tested with 2015, and I expect later versions will be similar. Using the developer command prompt, navigate to this directory and run the following:
cl.exe test.cc
.\test.exe
The map-based instantiation is very weak currently, as it does absolutely no
validation. Similarly, the IPv6 parsing is only on structure, it doesn't
validate that the address is a valid format. That's up for consideration for
future work. Even though the documentation currently states that this library
normalizes URIs, it does not presently normalize the case of a URI, and since
the path handling is generic, rationalization of a path involving relative
sections (.
or ..
) has to be application specific.
One thing I'd like to do with this library is add specializations for specific
types of non-hierarchical URIs; data:
, mailto:
, and geo:
are all strong
candidates for subclasses to support their formats, since they're very
well-defined. Since the current class structure supports hierarchical URIs very
well as-is, no further extensions seem necessary in that direction.
Once C++14 support is more widespread (or I can find an appropriate value to
check for, get_username()
and get_password()
will be marked as
[[deprecated]]
, in order to appropriately warn on their use. Please heed this
warning, and don't use get_username()
or get_password()
, they're horribly
unsafe and only included for completeness. At some point in the future, calling
them may result in an exception being thrown at runtime.
As an optional future task, I might consider supporting sniffing query argument separators from context; I'm not too interested in that right now, however, since guessing as a parser is generally a good excuse to create a bug farm.