Introduction
My web site happens to be hosted on the Zeus server.
The Zeus server does support .htaccess files, and through these offers some of the same functionality and configuration directives as Apache, including the Allow and Deny directives, and many others.
URL rewriting, however, is a different matter. Here Zeus has its own, more verbose, rewrite script.
Set out below are some simple instructions for going about URL rewriting with Zeus. As with my Apache notes, these are not intended to be complete.
See the Zeus User Guide for the complete works.
Key Points
- Fragments
Remember that fragments (e.g. #bookmark) are not covered by this processing and are just attached afterwards.
- Where does the rewrite script go?
The re-writing script must be saved in a file called rewrite.script and deposited in your web root directory.
- Processing Order
If you apply a RESPONSE = 301, the script is terminated, and restarted with the new address.
If the script reaches the end, the final contents of the URL variable are used as the new request. This means that once a rule has done its work, you need to look at using goto to jump to the end of the file, so that later rules do not alter the URL again.
- Regular Expressions
The regular expressions used by Zeus are a subset of full Perl regular expressions, so some expressions may not be available, such as \d (as a shortcut for [0-9]).
This is what Zeus supports:-
Pattern      Meaning
ABC          Literal "ABC"
(ABC|DEF)    Literal "ABC" or literal "DEF"
[13-68]      A single character that is 1 or 3 or 4 or 5 or 6 or 8
[a-z]        A single character that is any lower case letter
[XYZ]        A single character that is X or Y or Z
[^XYZ]       A single character that is not X and not Y and not Z
. (dot)      Any character
x*           Zero or more x's
x+           One or more x's
x?           Zero or one x (an optional x)
^            Anchor to start of line
$            Anchor to end of line
\.           Escaped to give a "." character
\*           Escaped to give a "*" character
\+           Escaped to give a "+" character
\^           Escaped to give a "^" character
\$           Escaped to give a "$" character
\\           Escaped to give a "\" character
If you are matching a URL with a query string, e.g. entry.php?id=1 remember to escape the . with \. and the ? with \?.
If you want to match multiple different start points, use either ^(one|two|three) or ^one|^two|^three.
- Security
It is best if you make your matches as precise as possible, with a leading ^ and a trailing $, to minimise the chance of malicious visitors supplying malformed URLs which are accepted.
- Strings
As with Apache, strings do not use quotes.
If you want to refer to an environment variable in a string you use the form %{<variable name>} such as %{SCRATCH:key} or %{URL}. Referring to an environment variable elsewhere is just direct: SCRATCH:key or URL.
- Comments
# starts a comment line, same as Apache.
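To illustrate the points on escaping and precision, here is a small sketch using only the constructs described above. The file names entry.php and blog-entry.php follow the examples used elsewhere in these notes; treat the rules as an illustration rather than a tested script.

```
# Loose: (.*) accepts anything at all between /blog/ and the final /
match URL into $ with ^/blog/(.*)/$

# Tighter: only digits are accepted, and both ends are anchored
match URL into $ with ^/blog/([0-9]+)/$

# Query string: the . and the ? are escaped so they are taken literally
match URL into $ with ^/entry\.php\?id=([0-9]+)$
```

The tighter the pattern, the fewer unexpected URLs will reach your dynamic pages.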
Simple Example
Here is a simple example to start off with:-
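RULE_0_START:
  # Capture whatever sits between /blog/ and the trailing /
  match URL into $ with ^/blog/(.*)/$
  if matched then
    # Internal rewrite only: the browser address bar is unchanged
    set URL=/blog-entry.php?id=$1
  endif
RULE_0_END: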
- This will convert http://www.baconbutty.com/blog/12/ internally into http://www.baconbutty.com/blog-entry.php?id=12
- The whole rule can be labelled (i.e. RULE_0_START:). This can be used if you have more than one rule, and you want to goto another part of the script. The end label is just for information really, but could be used to goto the end of the current rule.
- The match tests the regular expression ^/blog/(.*)/$ against the value of URL, and if it matches, stores any captured sub-matches such as the (.*) into the $ variable, which can then be accessed with $1...N later. You can also use the % variable.
- URL is an environment variable which is the equivalent of Apache's REQUEST_URI + "?" + QUERY_STRING
- The browser address bar will not change (this only carries out an internal redirect). This is the effect of set URL=.
Environment Variables
The following are variables used in your scripts. The examples assume you have typed http://www.baconbutty.com/blog/12/?val=a#bookmark1
URL
The bit after the host, but not including fragment.
I.e. /blog/12/?val=a
REMOTE_HOST
Hostname of the remote client, or the IP address if DNS failed.
REMOTE_ADDR
Holds the IP address of the remote client.
REQUEST_METHOD
Holds the request method, e.g. GET or POST.
ENV
An array of strings, containing the operating system environment variables. Modifiable.
You use it thus:-
ENV : <Name>
to read
set ENV : <Name> =
to set.
IN
An array of strings, containing the HTTP headers that were received with the request.
You use it thus:-
IN : <Name>
to read
set IN : <Name> =
to set.
OUT
An array of strings, containing the HTTP headers that will be returned to the user's browser when the request is completed. Starts off empty.
You use it thus:-
OUT : <Name>
to read
set OUT : <Name> =
to set.
SCRATCH
An array that can be used to hold temporary values.
You use it thus:-
SCRATCH : <Name>
to read
set SCRATCH : <Name> =
to set.
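As a small sketch of how the arrays are written and read back, the fragment below stores a value in SCRATCH and then uses it inside a string. The key name SECTION and the target /blog/index.php are made up purely for illustration.

```
# Store a value under a (hypothetical) key called SECTION
set SCRATCH:SECTION = blog

# Read it back inside a string using the %{...} form
set URL = /%{SCRATCH:SECTION}/index.php

# Outside a string, the variable is referred to directly
match SCRATCH:SECTION into $ with ^blog$
```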
BODY
Specify some content that is to be returned to the Browser.
E.g. set BODY =
If you are doing a re-direct with RESPONSE, then the BODY content is largely irrelevant, as it does not get displayed by the Zeus server. You just set it in order that Zeus takes notice of the RESPONSE.
RESPONSE
Specify the HTTP response code that is to be returned, if BODY is set (see above).
E.g. you could use 301 if you wanted to update the browser's address bar.
Codes are:-
200 OK
201 Created
204 No content
206 Partial content
207 Multi-Status
300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
410 Gone
411 Length Required
413 Request Entity Too Large
414 Request-URI Too Long
416 Request Range Not Satisfiable
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable / Server Too Busy
504 Gateway Time-out
505 HTTP Version Not Supported
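Putting IN, OUT, RESPONSE and BODY together, an external redirect might be sketched as follows. It assumes the Host header arrives as the bare domain without a port, and it follows the same pattern as the 301 redirects in the Complex Example further down; I have not tested this particular rule.

```
# Send visitors who arrive without the www prefix to the www address
match IN:Host into $ with ^baconbutty\.com$
if matched then
  set OUT:Location = http://www.baconbutty.com%{URL}
  set RESPONSE = 301
  # BODY must be set so that Zeus takes notice of the RESPONSE
  set BODY = Moved\n
endif
```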
Commands
match
Grammar:-
[insensitive] match <variable> into <array> with <regex>
- <variable> is an environment variable.
- <array> is one of $ and %, which stores the array of captured parentheses for the <regex>, addressed as $1..N or %1..N.
- <regex> is a Perl-like regular expression (cut down) - see above.
A flag called matched is set based on the results of the last match performed, and is tested with if.
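A short sketch of the optional insensitive keyword, following the grammar above. It reuses the blog-entry.php target from the simple example.

```
# Accepts /blog/12/, /Blog/12/, /BLOG/12/ and so on
insensitive match URL into $ with ^/blog/([0-9]+)/$
if matched then
  set URL = /blog-entry.php?id=$1
endif
```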
if
Grammar:-
if [not] [matched|exists] then
... some statements
else
... some statements
endif
Follows a match statement or a look statement.
matched is the flag for a match statement; exists is the flag for a look statement.
Note that whenever you use a match or look statement, the previous matched and exists flags will be reset. This means you cannot have a match immediately followed by a look, and then expect to test both matched and exists.
You can nest commands etc.
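One way around the flag reset is to park the result of the match in SCRATCH before doing the look. This is only a sketch: the key IS_BLOG and the page /blog-not-found.php are invented for the example.

```
# Remember the outcome of the match, as the look below resets the flag
set SCRATCH:IS_BLOG = no
match URL into $ with ^/blog/
if matched then
  set SCRATCH:IS_BLOG = yes
endif

map path into SCRATCH:path from %{URL}
look for file at %{SCRATCH:path}
if not exists then
  # Re-test the saved value; this nested match sets matched afresh
  match SCRATCH:IS_BLOG into $ with ^yes$
  if matched then
    set URL = /blog-not-found.php
  endif
endif
```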
set
Grammar:-
set <variable> <operator> <string>
- Sets a particular environment variable with a string.
- <variable> is an environment variable.
- <operator> is = (replace content) or . (concatenate)
- <string> is a string! It does not need to be in quotes.
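A sketch of the two operators, written directly from the grammar above. The key TARGET is made up, and I am assuming that the space around the operator is not carried into the value.

```
# = replaces the content
set SCRATCH:TARGET = /blog-entry.php

# . appends to the existing content
set SCRATCH:TARGET . ?id=12

# SCRATCH:TARGET should now hold /blog-entry.php?id=12
set URL = %{SCRATCH:TARGET}
```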
goto
goto <label>
If you have a label MY_LABEL: then this jumps to it. You do not include the : when referencing the label's name.
map
Grammar:-
map path into <var> from <path>
This is used to create a filesystem path (useful for testing the existence of files).
<path> is a file path relative to the DOCUMENT_ROOT, starting with /.
The classic is:-
map path into SCRATCH:DOCROOT from /
This will obtain the equivalent of the DOCUMENT_ROOT filesystem path, which is useful as a base to append other paths to. An empty string is returned if it fails.
look
Grammar:-
look for [dir|file] at <path>
Tests whether the specified file or directory exists.
Follow it up with if exists then ...
Here is an example that first checks whether the file for your URL exists, so you can customise the response:-
map path into SCRATCH:path from %{URL}
look for file at %{SCRATCH:path}
if not exists then
  look for file at %{SCRATCH:path}/index.html
endif
if exists then
  ... great
else
  ... custom not found message
endif
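The map and look commands can also be combined by appending to the document root, as mentioned above. In this sketch maintenance.html is a hypothetical file name, and I am assuming the mapped root can simply have a path tacked on the end (a doubled / should be harmless if the root already ends with one).

```
map path into SCRATCH:DOCROOT from /
look for file at %{SCRATCH:DOCROOT}/maintenance.html
if exists then
  # Serve the holding page for every request while the file is present
  set URL = /maintenance.html
  goto END
endif
```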
substr
Grammar:-
substr <dest> is <src> from <startpos> for <num>
Store in the <dest> variable a substring of <src>, being the <num> characters from <startpos> (positions begin at 1).
E.g.
substr SCRATCH:SUB is URL from 1 for 3
This takes the first 3 characters of URL and stores them under the SUB key in SCRATCH.
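As a worked illustration, assume URL is /blog/12/?val=a and that positions start at 1 as described. The key PREFIX is made up for the example.

```
substr SCRATCH:PREFIX is URL from 1 for 6
# SCRATCH:PREFIX should now hold /blog/ (the first six characters)

match SCRATCH:PREFIX into $ with ^/blog/$
if matched then
  ... the request is somewhere under /blog/
endif
```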
Labels
A label appears like this on its own line:-
MY_LABEL:
.... some commands
A label is not required, but can be used with goto to jump around your script.
There are two hidden labels, START: and END:, e.g. goto END will terminate the script.
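A sketch of using goto END so that a later, more general rule does not undo an earlier one. The rule names and the .php targets are hypothetical.

```
ABOUT_RULE:
  match URL into $ with ^/about/$
  if matched then
    set URL = /about.php
    # Stop here, otherwise the next rule would be tested too
    goto END
  endif
GENERAL_RULE:
  match URL into $ with ^/([a-z]+)/$
  if matched then
    set URL = /page.php?name=$1
  endif
```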
Don't Forget
- Relative Links
If you are going to use static URLs, such as http://www.baconbutty.com/blog/12/ then you need to remember two things:-
- Firstly, all links to other pages will need to be switched to static.
- Secondly, this will potentially screw up your links to other static resources / bookmarks in the document itself.
In relation to the second point, let's consider this further.
There are two aspects to consider:-
- Bookmark links to the same page, such as href="#toc804580". These should continue to work as is; http://www.baconbutty.com/blog/12/#toc804580 should be just fine (as noted, the fragment is dealt with separately).
- Relative links to other static resources will fail. E.g. if I import a style with @import url(common.css); then common.css is relative and will be seen as http://www.baconbutty.com/blog/12/common.css which does not exist.
So how to solve the second bullet point?
- Well, the easiest is to convert all relative links to full URLs. E.g. @import url(http://www.baconbutty.com/common.css); or the shorter version of prepending a /, e.g. @import url(/common.css);
- There is a shortcut, which is to use the base element. E.g. <base href="/"> or <base href="http://www.baconbutty.com/">. BUT WATCH OUT - this will also affect your in-page bookmarks. So it is not really a solution.
- Do some URL re-writing - see my solution in the Complex Example below.
Accordingly, if you are going to move to static URLs, you need to ensure that you adequately deal with relative links and bookmarks separately.
Complex Example
Here is a complex example you can test with my site.
This script allows you to type http://www.baconbutty.com/blog/<entry-number>/
and preserves all relative links.
It breaks down as follows:-
Variable Initialisation
################################################################
VARIABLES:
  map path into SCRATCH:DOCUMENT_ROOT from /
  set SCRATCH:ORIGINAL_URL = %{URL}
  set SCRATCH:REQUEST_URI = %{URL}
  # First try: the URL has a non-empty query string
  match URL into $ with ^([^\?]+)\?(.+)$
  if matched then
    set SCRATCH:REQUEST_URI = $1
    set SCRATCH:QUERY_STRING = $2
    set SCRATCH:QUERY_STRING_QUESTION = ?$2
    set SCRATCH:QUERY_STRING_AMP = &$2
  else
    # Second try: no query string, so store empty string versions
    match URL into $ with ^([^\?]+)(.*)$
    if matched then
      set SCRATCH:REQUEST_URI = $1
      set SCRATCH:QUERY_STRING = $2
      set SCRATCH:QUERY_STRING_QUESTION = $2
      set SCRATCH:QUERY_STRING_AMP = $2
    endif
  endif
VARIABLES_END:
This stores some basic variable information, and is borrowed from work I have seen on the net.
I store both QUERY_STRING and QUERY_STRING_QUESTION, and also store empty string versions (the second match), which is useful for re-attaching the query string later without having to re-calculate whether it is empty etc.
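To make the effect concrete, here is what I would expect the SCRATCH values to be for two sample requests, worked through by hand against the two regular expressions above (written as comments, so it is a trace rather than something to paste in).

```
# Request: /blog/12/?val=a   (the first match succeeds)
#   SCRATCH:REQUEST_URI            /blog/12/
#   SCRATCH:QUERY_STRING           val=a
#   SCRATCH:QUERY_STRING_QUESTION  ?val=a
#   SCRATCH:QUERY_STRING_AMP       &val=a
#
# Request: /blog/12/         (the first match fails, the second succeeds)
#   SCRATCH:REQUEST_URI            /blog/12/
#   SCRATCH:QUERY_STRING           (empty)
#   SCRATCH:QUERY_STRING_QUESTION  (empty)
#   SCRATCH:QUERY_STRING_AMP       (empty)
```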
Adjust the relative links
################################################################
RELATIVE_LINKS:
  match URL into $ with /blog/[0-9]+/([^\?]+(\?.+)?)$
  if not matched then goto RELATIVE_LINKS_END
  if matched then
    set URL = /$1
    goto END
  endif
RELATIVE_LINKS_END:
If you type in http://www.baconbutty.com/blog/12/ then all relative links, e.g. to my CSS files, will be resolved thus: http://www.baconbutty.com/blog/12/somerelativelink.css
The above rewrite strips out the /blog/12/ part - simple!
Add a terminating /
################################################################
CANONICAL:
  match SCRATCH:REQUEST_URI into $ with ^(/blog/[0-9]+)$
  if not matched then goto CANONICAL_END
  if matched then
    set URL = $1/%{SCRATCH:QUERY_STRING_QUESTION}
    set RESPONSE = 301
    set OUT:Location = %{URL}
    set BODY = Please try <a href="%{URL}">here</a> instead\n
  endif
CANONICAL_END:
If you type /blog/12 this will add a terminating /.
It does a 301 external redirect, equivalent to Apache [R=301]. Note that you need to set the BODY if you want this to work.
It adds back in the query string with %{SCRATCH:QUERY_STRING_QUESTION}. You could also use this type of technique to emulate Apache's [QSA] flag.
The external re-direct terminates the script here automatically; an implicit [L].
Correct previous dynamic links
################################################################
PREVIOUS:
  match SCRATCH:ORIGINAL_URL into $ with ^/entry\.php\?id=([0-9]+)(&(.+))?$
  if not matched then goto PREVIOUS_END
  if matched then
    set SCRATCH:TMP = $2
    # Capture the rest of the query string without its leading &,
    # so the new URL does not end up containing ?&
    match SCRATCH:TMP into % with ^&(.+)$
    if matched then set URL = /blog/$1/?%1
    if not matched then set URL = /blog/$1/
    set RESPONSE = 301
    set OUT:Location = %{URL}
    set BODY = Please try <a href="%{URL}">here</a> instead\n
  endif
PREVIOUS_END:
This is just for illustration. If you type http://www.baconbutty.com/entry.php?id=12 it will be re-written to the static URL form.
It also preserves any other part of the query string.
Internal redirect to blog entry php
################################################################
BLOG_ENTRY:
  match SCRATCH:REQUEST_URI into $ with ^/blog/([0-9]+)/$
  if matched then
    set URL = /blog-entry.php?id=$1%{SCRATCH:QUERY_STRING_AMP}
  endif
BLOG_ENTRY_END:
This is the real work-horse, the final internal redirect (no address bar update) to the dynamic file that delivers the blog entry.
Again I have %{SCRATCH:QUERY_STRING_AMP} which is effectively an Apache [QSA].