ISAPI_Rewrite Manual and Multi-Site Configuration

September 3, 2009

On the NT, 2000, XP and 2003 platforms the filter runs inside the INETINFO process under the SYSTEM account when IIS 5 isolation mode is used. The SYSTEM account should therefore be granted at least read access to all ISAPI_Rewrite DLLs and to every httpd.ini file. We also recommend granting it write access to every folder containing an httpd.ini file; this allows httpd.parse.errors files, which record configuration syntax errors, to be generated. The proxy module needs additional permissions because it runs in pooled or high-isolation application mode: the IIS accounts used by the shared pool and by high-isolation applications should be given read access to rwhelper.dll. By default the IWAM_<ComputerName> account is used for all pools; the actual pool accounts can be found in the corresponding COM+ application settings using the COM+ Administration MMC snap-in.

Configuration file format

There are two kinds of configuration files: the global (server-level) file and individual (site-level) files. The global configuration file must be named httpd.ini and reside in the ISAPI_Rewrite installation directory; a shortcut to it is available from the Start menu. Individual configuration files must also be named httpd.ini and may be placed in the physical root directory of a virtual site. Both types share the same format and are standard Windows .ini files. All directives must be placed in the [ISAPI_Rewrite] section, one directive per line; any text outside this section is ignored.

Sample httpd.ini file
 

Code:
[ISAPI_Rewrite]

# This is a comment

# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20

# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors) / [F,I,O]

# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]

# Some custom rules
RewriteCond Host: (.+)

RewriteCond directive

Syntax: RewriteCond TestVerb CondPattern [Flags]

This directive defines a condition for a rule. One or more RewriteCond directives can precede a RewriteRule, RewriteHeader or RewriteProxy directive; the rule that follows is applied only if its pattern matches the current state of the URI and all of the preceding conditions are satisfied. (A short example follows the parameter descriptions below.)

TestVerb

Specifies the verb that will be matched against the regular expression.

TestVerb = (URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:

URL – returns the Request-URI of the client request, as described in RFC 2068 (HTTP/1.1);
METHOD – returns the HTTP method of the client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
VERSION – returns the HTTP version;
HTTPHeaderName – returns the value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names should include the trailing colon ":". If the specified header does not exist in the client's request, TestVerb is treated as an empty string.

Code:
HTTPHeaderName =
Accept:
Accept-Charset:
Accept-Encoding:
Accept-Language:
Authorization:
Cookie:
From:
Host:
If-Modified-Since:
If-Match:
If-None-Match:
If-Range:
If-Unmodified-Since:
Max-Forwards:
Proxy-Authorization:
Range:
Referer:
User-Agent:
Any-Custom-Header

See RFC 2068 for more information about HTTP headers and their values.

ServerVariable – returns the value of the specified server variable, for example the server port. The full list of server variables can be found in the IIS documentation. Variable names must be prefixed with the % character.

CondPattern

The regular expression to match TestVerb against.
 

[Flags]

Flags is a comma-separated list of the following flags:

O (nOrmalize)

Normalizes the string before processing. Normalization includes removal of URL-encoding, illegal characters, and so on. This flag is useful with URLs and URL-encoded headers.
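For example, a minimal sketch (example.com and the /example-site folder are placeholders, not taken from the manual) that applies a rule only to requests whose Host header matches a given domain:

Code:
[ISAPI_Rewrite]
# Apply the rule only when the Host header is example.com or www.example.com
RewriteCond Host: (?:www\.)?example\.com
RewriteRule (.*) /example-site$1 [I,L]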

RewriteRule directive

Syntax: RewriteRule Pattern FormatString [Flags]
This directive can appear more than once; each occurrence defines a single rewriting rule. The order in which the rules are defined matters, because it is the order in which they are applied at run time. The flags described below can be combined; a short example follows the list.

I (ignore case)

Forces case-insensitive pattern matching. This flag affects the RewriteRule directive and all corresponding RewriteCond directives.

F (Forbidden)

Sends a 403 Forbidden response to the client and stops the rewriting process. Note that in this case FormatString is not used and can be set to any non-empty string.

L (last rule)

Stops the rewriting process here and applies no further rules. Use this flag to prevent the currently rewritten URI from being rewritten again by rules that follow.

N (Next iteration)

Forces the rewriting engine to alter the rule target and restart rule checking from the beginning (all modifications are preserved). The number of restarts is limited by the value set with the RepeatLimit directive; if that number is exceeded, the N flag is ignored.

NS (Next iteration of the same rule)

Works like the N flag but restarts processing from the same rule (i.e., forces repeated application of that rule). The maximum number of times a single rule may be repeated is set with the RepeatLimit directive.

P (force proxy)

Forces the target URI to be treated internally as a proxy request and immediately passes it to the ISAPI extension that handles proxy requests. Make sure the resulting string is a valid URI including protocol, host and so on, otherwise the proxy will return an error.

R (explicit redirect)

Forces the server to respond immediately with a redirect instruction to the client, giving the new address of the target URI. A redirect rule is always the last rule.

RP (permanent redirect)

Almost the same as the [R] flag, but issues a 301 (Moved Permanently) HTTP status code instead of 302.

U (Unmangle Log)

Logs the URI as originally requested rather than as rewritten.

O (nOrmalize)

Normalizes the string before processing. Normalization includes removal of URL-encoding, illegal characters, and so on. This flag is useful with URLs and URL-encoded headers.

CL (Case Lower)

Converts the result to lowercase.

CU (Case Upper)

Converts the result to uppercase.
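For illustration, here is a short sketch combining several of these flags (the paths are placeholders, not taken from the manual):

Code:
[ISAPI_Rewrite]
# Permanently redirect an old page, matching case-insensitively
RewriteRule /old-page\.htm /new-page.htm [I,RP]
# Forbid external access to a private folder (the format string is unused with F)
RewriteRule /private/.* / [F,I,O]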

RewriteHeader directive
 

Code:
Syntax: RewriteHeader HeaderName Pattern FormatString [Flags]

This directive is a more generalized variant of RewriteRule. It rewrites not only the URL part of a client request but any HTTP header. Use it to rewrite, create or delete any HTTP header, or even to change the method of the client request.

HeaderName

Specifies the client header to be rewritten. The possible values are the same as for the TestVerb parameter of the RewriteCond directive.

Pattern
The regular expression to match the header value against.
FormatString
The format string that generates the new value.
[Flags]
A comma-separated list of the following flags:

I (ignore case)
Forces case-insensitive pattern matching. This flag affects the directive and all corresponding RewriteCond directives.
F (Forbidden)
Sends a 403 Forbidden response to the client and stops the rewriting process. Note that in this case FormatString is not used and can be set to any non-empty string.
L (last rule)
Stops the rewriting process here and applies no further rules. Use this flag to prevent the currently rewritten URI from being rewritten again by rules that follow.
N (Next iteration)
Forces the rewriting engine to alter the rule target and restart rule checking from the beginning (all modifications are preserved). The number of restarts is limited by the value set with the RepeatLimit directive; if that number is exceeded, the N flag is ignored.

NS (Next iteration of the same rule)
Works like the N flag but restarts processing from the same rule (i.e., forces repeated application of that rule). The maximum number of times a single rule may be repeated is set with the RepeatLimit directive.

R (explicit redirect)
Forces the server to respond immediately with a redirect instruction to the client, giving the new address of the target URI. A redirect rule is always the last rule.
RP (permanent redirect)
Almost the same as the [R] flag, but issues a 301 (Moved Permanently) HTTP status code instead of 302.
U (Unmangle Log)
Logs the URI as originally requested rather than as rewritten.
O (nOrmalize)
Normalizes the string before processing. Normalization includes removal of URL-encoding, illegal characters, and so on. This flag is useful with URLs and URL-encoded headers.
CL (Case Lower)
Converts the result to lowercase.
CU (Case Upper)
Converts the result to uppercase.

To remove a header, the format string should produce an empty string. For example, this rule removes the user agent information from the client request:
RewriteHeader User-Agent: .* $0
And this rule adds an Old-URL header containing the original URL to the request:
RewriteCond URL (.*)
RewriteHeader Old-URL: ^$ $1
The last example redirects all WebDAV requests to /webdav.asp by changing the request method:
 

Code:
RewriteCond METHOD OPTIONS
RewriteRule (.*) /webdav.asp?$1
RewriteHeader METHOD OPTIONS GET

RewriteProxy directive

Syntax: RewriteProxy Pattern FormatString [Flags]

Forces the target URI to be treated internally as a proxy request and immediately passes it to the ISAPI extension that handles proxy requests. This allows IIS to act as a proxy server and to reroute requests to other sites and servers.
Pattern
The regular expression to match the Request-URI against.
FormatString
The format string that generates the new URI.
[Flags]
A comma-separated list of the following flags (a short example follows the list):
D (Delegate security)
The proxy module will try to log in to the remote server with the credentials of the currently impersonated user.
C (use Credentials)
The proxy module will try to log in to the remote server with the credentials specified in the URL or in the Basic authorization header. With this flag you can use the http://user:password@host.com/path/ syntax in the URL.
F (Follow redirects)
By default ISAPI_Rewrite tries to map redirect instructions returned by the remote server into the local server's namespace: if the remote server returns a redirect pointing to some other location on that server, ISAPI_Rewrite modifies the redirect to point to the local server name, so users never see the real (internal) server name.
Use the F flag to force the proxy module to follow redirects returned by the remote server internally. Use this flag if you do not want clients to receive redirect instructions from the remote server at all. There is a redirect limit in the WinHTTP settings to avoid remote redirect loops.

I (ignore case)
Forces case-insensitive pattern matching.
U (Unmangle Log)
Logs the URI as originally requested rather than as rewritten.
O (nOrmalize)
Normalizes the string before processing. Normalization includes removal of URL-encoding, illegal characters, and so on. This flag is useful with URLs and URL-encoded headers.
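A minimal sketch of the C flag (the /intranet prefix, host name and credentials are placeholders, not taken from the manual):

Code:
[ISAPI_Rewrite]
# Proxy /intranet requests to an internal server, logging in with credentials embedded in the URL
RewriteProxy /intranet(.+) http\://user:password@internal.example.com$1 [C,I]
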
CacheClockRate directive
Syntax: CacheClockRate Interval
This directive may appear only in the global configuration; if it appears in a site-level configuration it is ignored and an error message is written to the httpd.parse.errors file.
ISAPI_Rewrite caches each configuration the first time it is loaded. With this directive you can set the period of inactivity after which a particular site's configuration is flushed from the cache. Setting the parameter to a sufficiently large value effectively forces ISAPI_Rewrite never to flush the cache. Note that any change to a configuration file is picked up on the next request regardless of this interval.
Interval
Specifies the period of inactivity (in seconds) after which a particular configuration is flushed from the cache. The default value is 3600 (1 hour).
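For example, a global httpd.ini entry such as the following (a sketch; 7200 is an arbitrary value) keeps an inactive site's configuration cached for two hours:

Code:
[ISAPI_Rewrite]
CacheClockRate 7200
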
EnableConfig and DisableConfig directives
Syntax:
EnableConfig [SiteID|"Site name"]
DisableConfig [SiteID|"Site name"]
Enables or disables site-level configurations for the selected site, or changes the default behavior. By default, site-level configurations are disabled. These directives may appear only in the global configuration.
SiteID
Numeric metabase identifier of a site

Site name
Name of the site as it appears in the IIS console
Used without parameters, these directives enable or disable site-level configuration processing by default.

Examples

The following example enables configurations only for the site with ID 1 (typically the Default Web Site) and for the site named "My site":
 

Code:
DisableConfig
EnableConfig 1
EnableConfig"My site"

The following example enables the configuration only for the site named "Some site", because its individual setting overrides the default:
 

Code:
EnableConfig"Some site"
DisableConfig
EnableRewrite and DisableRewrite directives
Syntax:
EnableRewrite [SiteID|"Site name"]
DisableRewrite [SiteID|"Site name"]

Enables or disables rewriting for the selected site, or changes the default behavior. By default, rewriting is enabled. These directives may appear only in the global configuration.
 

SiteID
Numeric metabase identifier of a site.

Site name
Name of the site as it appears in the IIS console.

Used without parameters, these directives enable or disable rewriting globally.
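For example, mirroring the EnableConfig example above (the site name is a placeholder), the following global configuration disables rewriting by default and re-enables it for a single site:

Code:
DisableRewrite
EnableRewrite "My site"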

RepeatLimit directive

Syntax: RepeatLimit Limit
This directive may appear in both the global and site-level configuration files. In the global configuration it changes the global limit for all sites; in a site-level configuration it changes the limit only for that site, and this limit cannot exceed the global one.
ISAPI_Rewrite allows looping when processing rules. This directive limits the maximum possible number of loop iterations; set it to 0 or 1 to disallow loops entirely.
Limit
Specifies the maximum number of loop iterations. The default value is 32.

RFStyle directive

Syntax: RFStyle Old | New

Configuration Utility
ISAPI_Rewrite Full includes a configuration utility (it can be started from the ISAPI_Rewrite program group). It allows you to view the trial status and enter a registration code (if the product was not registered during installation), and to adjust several product features related to the operation of the proxy module. The utility is a property sheet consisting of three pages.

The Trial page allows you to view the trial status and enter a registration code (if the product was not registered during installation).

Settings page

This page contains edit boxes for the following parameters:

Helper URL

This parameter affects the way the filter and the proxy module communicate. It can be either a file extension prefixed with a dot (such as .isrwhlp) or an absolute path.

In the first case the extension is appended to the initial request URI and the proxy module is invoked through a script map. The default extension .isrwhlp is added to the global script map during installation. If you change this extension, or if your application does not inherit the global script map settings, you must manually add the required entry to the script map. The entry should have the following parameters:
 

Code:
Executable: An absolute path to the rwhelper.dll in the short form
Extension: Desired extension (.isrwhlp is default)
Verbs radio button: All Verbs
Script engine checkbox: Checked
Check that file exists checkbox: Unchecked

We have created a WSH script, proxycfg.vbs, which makes it easy to register the extension in script maps. It is located in the installation folder and can be run from the command line as follows:

cscript proxycfg.vbs [-r] [MetabasePath]
The optional -r switch forces registration of the extension.
The optional MetabasePath parameter allows specification of the first metabase key to process. By default it is "/localhost/W3SVC".
To register the extension in all existing script maps, invoke the script with the following command line:
cscript proxycfg.vbs -r
In the second case you should provide a URI as the value of 'Helper URL', and you should also map the ISAPI_Rewrite installation folder as a virtual folder of every site.
Note: according to customer feedback, IIS 5 (and possibly IIS 4) has problems with long directory names, so we strongly recommend using short names.
Worker threads limit
This parameter limits the number of worker threads in the proxy extension's thread pool. The default value 0 means the limit equals the number of processors multiplied by 2.
Active threads limit
This parameter limits the number of concurrently running threads. It cannot be greater than the "Worker threads limit". The default value 0 means it equals the number of processors.
Queue size
This parameter defines the maximum number of queued requests. If you ever see a "Queue timeout expired" message in the Application event log, you may increase this parameter.
Queue timeout
This parameter defines the maximum time a new request may wait in the internal request queue. If you ever see a "Queue timeout expired" message in the Application event log, you may increase this parameter.
Connect timeout
Sets the proxy module's connect timeout, in milliseconds.
Send timeout
Sets the proxy module's send timeout, in milliseconds.
Receive timeout
Sets the proxy module's receive timeout, in milliseconds.
About page.
It contains copyright information and a link to the ISAPI_Rewrite’s web site.

Regular expression syntax

This section covers the regular expression syntax used by ISAPI_Rewrite.

Literals

All characters are literals except ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" and "$". These characters become literals when preceded by "\". A literal is a character that matches itself.

Wildcard
The dot character "." matches any single character except null character and newline character

Repeats

A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.

Examples:
 

Code:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Non-greedy repeats
Non-greedy repeats are possible by appending a ‘?’ after the repeat; a non-greedy repeat is one which will match the shortest possible string.

For example to match html tag pairs one could use something like:
 

Code:
"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"

In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string.

Parenthesis
Parentheses serve two purposes, to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". All sub matches marked by parenthesis can be back referenced using \N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1, sub-expression 0 is the whole expression.

Non-Marking Parenthesis
Sometimes you need to group sub-expressions with parenthesis, but don’t want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:

"(?:abc)*"

Alternatives
Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.

Examples:

"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the compliment of the elements that follow.

Examples:

Character literals:

"[abc]" will match either of "a", "b", or "c".
"[^abc] will match any character other than "a", "b", or "c".
Character ranges:

"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:

alnum Any alpha numeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character – all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255, this applies to the wide character traits classes only.

There are some shortcuts that can be used in place of the character classes:

\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.

Equivalence classes
Equivalence classes take the general form [=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accent, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].

To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.

Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.

Back references
A back reference is a reference to a previous sub-expression that has already been matched, the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match, matches the null string. In ISAPI_Rewrite all back references are global for entire RewriteRule and corresponding RewriteCond directives. Sub matches are numbered up to down and left to right beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.
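The hot-link blocking rule from the Examples section below illustrates this numbering: the host captured by the first RewriteCond is referenced as \1 inside the second RewriteCond:

Code:
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif [I,O]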

Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:

"(?=abc)" matches zero characters only if they are followed by the expression "abc".
"(?!abc)" matches zero characters only if they are not followed by the expression "abc".

Word operators
The following operators are provided for compatibility with the GNU regular expression library.

"\w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
"\W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
Escape operator
The escape character "\" has several meanings.

The escape operator may introduce an operator for example: back references, or a word operator.
The escape operator may make the following character normal, for example "\*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:

Escape sequence Character code Meaning
\a 0x07 Bell character.
\t 0x09 Tab character.
\v 0x0B Vertical tab.
\e 0x1B ASCII Escape character.
\0dd 0dd An octal character code, where dd is one or more octal digits.
\xXX 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ z-@ An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for ‘@’.

Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of \l \L \u and \U:

Escape sequence Meaning
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to ‘.’.
\X Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
\Q The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
\E The end quote operator, terminates a sequence begun with \Q.
What gets matched?
The regular expression will match the first possible matching string; if more than one string starting at a given location can match, then it matches the longest possible string. In cases where there are multiple possible matches all starting at the same location, and all of the same length, then the match chosen is the one with the longest first sub-expression; if that is the same for two or more matches, then the second sub-expression will be examined, and so on. Note that ISAPI_Rewrite uses a MATCH algorithm: the result is matched only if the expression matches the whole input sequence. For example:

RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expression engine, Regex++, written by Dr. John Maddock. But like any real thing it is not ideal: there exist some "pathological" expressions which may require exponential time for matching; these all involve nested repetition operators. For example, attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2^N. These expressions can (almost) always be rewritten in such a way as to avoid the problem; for example "(a*a)*b" could be rewritten as "a*b", which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N^2; however, if the clauses are mutually exclusive then they can be matched in linear time. This is the case with "a*b": for each character the matcher will either match an "a" or a "b" or fail, whereas with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.

Boost 1.29.0 Regex++ can detect "pathological" regular expressions and terminate their matching. When a rule fails, ISAPI_Rewrite sends a "500 Internal Server error – Rule Failed" status to the client to indicate a configuration error. The failed rule is also disabled to prevent performance losses.
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".

To use any of these as literals you must prefix them with the escape character
The following special sequences are recognized:

Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use \( and \) to represent literal ‘(‘ and ‘)’.

Sub-expression expansions:
The following perl like expressions expand to a particular matched sub-expression:

$` Expands to all the text from the end of the previous match to the start of the current match, if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
$’ Expands to all the text from the end of the match to the end of the input string.
$& Expands to all of the current match.
$0 Expands to all of the current match.
$N Expands to the text that matched sub-expression N.

Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:

?Ntrue_expression:false_expression

Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.

Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.

Escape sequences:
The following escape sequences are also allowed:

\a The bell character.
\f The form feed character.
\n The newline character.
\r The carriage return character.
\t The tab character.
\v A vertical tab character.
\x A hexadecimal character – for example \x0D.
\x{} A possible unicode hexadecimal character – for example \x{1A0}
\cx The ASCII escape character x, for example \c@ is equivalent to escape-@.
\e The ASCII escape character.
\dd An octal character constant, for example \10

Examples

Emulating host-header-based virtual sites on a single site
Suppose you have registered two domain names, www.site1.com and www.site2.com. You can now serve two different sites from a single physical site. Add the following rules to your httpd.ini file:
 

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

#Emulate site1
RewriteCond Host: (?:www\.)?site1\.com
RewriteRule (.*) /site1$1 [I,L]

#Emulate site2
RewriteCond Host: (?:www\.)?site2\.com
RewriteRule (.*) /site2$1 [I,L]

Now you can place the content of the sites in the /site1 and /site2 directories.

Or you can use more generic rules:
 

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

RewriteCond Host: (www\.)?(.+)
RewriteRule (.*) /$2$3

Site directories should then be named /somesite1.com, /somesite2.info, and so on.
Using loops (Next flag) to convert request parameters
Suppose you have a physical URL such as http://www.myhost.com/foo.asp?a=A&b=B&c=C and you want it to be requested as http://www.myhost.com/foo.asp/a/A/b/B/c/C, where the number of parameters may vary from request to request.

There are at least two solutions. You can simply add a separate rule for each possible number of parameters, or you can use the technique shown in the following example:
 

Code:
[ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^/]*)?/([^/]*)/([^/]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]

This rule extracts one parameter from the requested URL, appends it to the end of the query string, and restarts rule processing from the beginning. It therefore loops until all parameters have been moved into place, or until RepeatLimit is exceeded.
Many variations of this rule are possible with different separator characters. For example, for URLs like http://www.myhost.com/foo.asp~a~A~b~B~c~C you could use the following rule:
 

Code:
[ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^~]*)?~([^~]*)~([^~]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
Running servers behind IIS

Suppose we have an intranet server running IIS and several corporate servers running other platforms. These servers cannot be reached directly from the Internet, only from our corporate network. Here is a simple example that uses proxying to map another server into the IIS namespace:

 

Code:
[ISAPI_Rewrite]
RewriteProxy /mappoint(.+) http\://sitedomain$1 [I,U]
Moving sites from UNIX to IIS

These rules help you change URLs from /~username to /username and from /file.html to /file.htm. They are useful if you have just moved your site from UNIX to IIS and want to keep links to the old pages from search engines and other external pages working.

 

Code:
[ISAPI_Rewrite]

#redirecting to update old links
RewriteRule (.*)\.html $1.htm
RewriteRule /~(.*) http\://myserver/$1 [R]

Moving site location

Many webmasters ask how to redirect all requests to a new web server. The question typically comes up when a new site is built to replace an old one. The solution is to use ISAPI_Rewrite on the old server:
 

Code:
[ISAPI_Rewrite]

#redirecting to update old links
RewriteRule (.+) http\://newwebserver$1 [R]

Browser-dependent content
Dynamically generated robots.txt

robots.txt is the file search engines use to discover which URLs may be indexed. Creating this file for a large site with a lot of dynamic content is a complex task, so suppose we write a robots.asp script to generate it.

Now robots.txt can be generated with a single rule:
 

Code:
[ISAPI_Rewrite]

RewriteRule /robots\.txt /robots.asp
Making search engines index dynamic pages

Suppose the site's content is stored in XML files. A /XMLProcess.asp script on the server processes the XML files and returns HTML to the end user. URLs for the documents have the form
http://www.mysite.com/XMLProcess.asp?xml=/somdir/somedoc.xml
but many public search engines will not index such documents because the URLs contain a question mark (the documents are generated dynamically).
ISAPI_Rewrite can completely eliminate this problem:
 

Code:
[ISAPI_Rewrite]

RewriteRule /doc(.*)\.htm /XMLProcess.asp\?xml=$1.xml

Documents can now be accessed with URLs like http://www.mysite.com/doc/somedir/somedoc.htm, and search engines will never know that there is no somedoc.htm file and that the content is generated dynamically.

Negative expressions (NOT)

Sometimes you need to apply a rule when a pattern does not match. In this case you can use what regular expressions call Forward Lookahead Asserts.

For example, to send all users who are not using Internet Explorer to another location:
 

Code:
[ISAPI_Rewrite]
# Redirect all non Internet Explorer users
# to another location
RewriteCond User-Agent: (?!.*MSIE).*
RewriteRule (.*) /nonie$1
Dynamic authentication

Suppose we have a members area on the site. We need to password-protect the files in this area, but we don't want to use the built-in server security. In this case we can create an ASP script (call it proxy.asp) that proxies all requests to the members area and checks whether each request is allowed; it is a simple template into which you can put your own authorization code.

Now we configure ISAPI_Rewrite to proxy requests through this page:
 

Code:
[ISAPI_Rewrite]
# Proxy all requests through proxy.asp
RewriteRule /members(.+) /proxy.asp\?http\://mysite.com/members$1
Blocking inline images (stop hot linking)

Suppose we have some pages under http://www.yundong78.com/ with inline GIF images that are nice enough for other people to link to directly from their own pages. We don't like this because it adds traffic to our server.
While we cannot protect the images 100%, we can at least restrict hot linking in the cases where the browser sends an HTTP Referer header:
 

Code:
[ISAPI_Rewrite]
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif [I,O]

Multi-site configuration

Simply place an httpd.ini file in the root directory of the corresponding site.
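A minimal site-level httpd.ini might look like this (the rule itself is only a placeholder); remember that site-level configurations must first be enabled with EnableConfig in the global file:

Code:
[ISAPI_Rewrite]
# Rules here apply only to this site
RewriteRule /old-page\.htm /new-page.htm [I,RP]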


URL Rewriting Using ISAPI_Rewrite

September 3, 2009

by Cristian Darie and Jaimie Sirovich

"Click me!" If the ideal URL could speak, its speech would resemble the communication of an experienced salesman. It would grab your attention with relevant keywords and a call to action; and it would persuasively argue that one should choose it instead of the other one. Other URLs on the page would pale in comparison.

URLs are more visible than many realize, and a contributing factor in CTR. They are often cited directly in copy, and they occupy approximately 20% of the real estate in a given search engine result page. Apart from "looking enticing" to humans, URLs must be friendly to search engines. URLs function as the "addresses" of all content in a web site. If confused by them, a search engine spider may not reach some of your content in the first place. This would clearly reduce search engine friendliness.

So let’s enumerate all of the benefits of placing keywords in URLs:

1. Doing so has a small beneficial effect on search engine ranking in and of itself.

2. The URL is roughly 20% of the real estate you get in a SERP result. It functions as a call to action and increases perceived relevance.

3. The URL appears in the status bar of a browser when the mouse hovers over anchor text that references it. Again, it functions as a call to action and increases perceived relevance.

4. Keyword-based URLs tend to be easier to remember than ?ProductID=5&CategoryID=2.

5. Query keywords, including those in the URL, are highlighted in search result pages.

6. Often, the URL is cited as the actual anchor text, that is:

<a href="http://www.example.com/foo.html">http://www.example.com/foo.html</a>

Obviously, a user is more likely to click a link to a URL that contains relevant keywords, than a link that does not. Also, because keywords in anchor text are a decisive ranking factor, having keywords in the URL (which here doubles as the anchor text) will help you rank better for "foos."

To sum up these benefits in one phrase:

Keyword-rich URLs are more aesthetically pleasing and more visible, and are likely to enhance your CTR and search engine rankings.

Implementing URL Rewriting

The hurdle we must overcome to support keyword-rich URLs like those shown earlier is that they don't actually exist anywhere in your web site. Your site still contains a script (named, say, Product.aspx) which expects to receive parameters through the query string and generate content depending on those parameters. This script would be ready to handle a request such as this:

http://www.example.com/Product.aspx?ProductID=123

but your web server would normally generate a 404 error if you tried any of the following:

http://www.example.com/Products/123.html

http://www.example.com/my-super-product.html

URL rewriting allows you to transform the URL of such an incoming request (which we’ll call the original URL) to a different, existing URL (which we’ll call the rewritten URL), according to a defined set of rules. You could use URL rewriting to transform the previous nonexistent URLs to Product.aspx?ProductID=123, which does exist.
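With the ISAPI_Rewrite filter introduced below, such a mapping is a one-line rule; this sketch simply anticipates the rule built in the first exercise:

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123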

If you happen to have some experience with the Apache web server, you probably know that it ships by default with the mod_rewrite module, which is the standard way to implement URL rewriting in the LAMP (Linux/Apache/MySQL/PHP) world. That is covered in the PHP edition of this book.

Unfortunately, IIS doesn’t ship by default with such a module. IIS 7 contains a number of new features that make URL rewriting easier, but it will take a while until all existing IIS 5 and 6 web servers will be upgraded. Third-party URL-rewriting modules for IIS 5 and 6 do exist, and also several URL-rewriting libraries, hacks, and techniques, and each of them can (or cannot) be used depending on your version and configuration of IIS, and the version of ASP.NET. In this chapter we try to cover the most relevant scenarios by providing practical solutions.

To understand why an apparently easy problem (that of implementing URL rewriting) can become so problematic, you first need to understand how the process really works. To implement URL rewriting, there are three steps:

1. Intercept the incoming request. When implementing URL rewriting, it’s obvious that you need to intercept the incoming request, which usually points to a resource that doesn’t exist on your server physically. This task is not trivial when your web site is hosted on IIS 6 and older. There are different ways to implement URL rewriting depending on the version of IIS you use (IIS 7 brings some additional features over IIS 5/6), and depending on whether you implement rewriting using an IIS extension, or from within your ASP.NET application (using C# or VB.NET code). In this latter case, usually IIS still needs to be configured to pass the requests we need to rewrite to the ASP.NET engine, which doesn’t usually happen by default.

2. Associate the incoming URL with an existing URL on your server. There are various techniques you can use to calculate what URL should be loaded, depending on the incoming URL. The "real" URL usually is a dynamic URL.

3. Rewrite the original URL to the rewritten URL. Depending on the technique used to capture the original URL and the form of the original URL, you have various options to specify the real URL your application should execute.

The result of this process is that the user requests a URL, but a different URL actually serves the request. The rest of the article covers how to implement these steps using ISAPI_Rewrite by Helicontech. For background information on how IIS processes incoming requests, we recommend Scott Mitchell’s article "How ASP.NET Web Pages are Processed on the Web Server," located at http://aspnet.4guysfromrolla.com/articles/011404-1.aspx.

URL Rewriting with ISAPI_Rewrite v2

Using a URL rewriting engine such as Helicon's ISAPI_Rewrite has the following advantages over writing your own rewriting code:

* Simple implementation. Rewriting rules are written in configuration files; you don’t need to write any supporting code.

* Task separation. The ASP.NET application works just as if it was working with dynamic URLs. Apart from the link building functionality, the ASP.NET application doesn’t need to be aware of the URL rewriting layer of your application.

* You can easily rewrite requests for resources that are not processed by ASP.NET by default, such as those for image files, for example.

To process incoming requests, IIS works with ISAPI extensions, which are code libraries that process the incoming requests. IIS chooses the appropriate ISAPI extension to process a certain request depending on the extension of the requested file. For example, an ASP.NET-enabled IIS machine will redirect ASP.NET-specific requests (which are those for .aspx files, .ashx files, and so on), to the ASP.NET ISAPI extension, which is a file named aspnet_isapi.dll.

Figure 3-3 describes how an ISAPI_Rewrite fits into the picture. Its role is to rewrite the URL of the incoming requests, but doesn’t affect the output of the ASP.NET script in any way.

At first sight, the rewriting rules can be added easily to an existing web site, but in practice there are other issues to take into consideration. For example, you’d also need to modify the existing links within the web site content. This is covered in Chapter 4 of Professional Search Engine Optimization with ASP.NET: A Developer’s Guide to SEO.

f0303

Figure 3-3

ISAPI_Rewrite allows the programmer to easily declare a set of rules that are applied by IIS on-the-fly to map incoming URLs requested by the visitor to dynamic query strings sent to various ASP.NET pages. As far as a search engine spider is concerned, the URLs are static.

The following few pages demonstrate URL rewriting functionality by using Helicon's ISAPI_Rewrite filter. You can find its official documentation at http://www.isapirewrite.com/docs/. Ionic's ISAPI rewriting module has similar functionality.

In the first exercise we’ll create a simple rewrite rule that translates my-super-product.html to Product.aspx?ProductID=123. This is the exact scenario that was presented in Figure 3-3.

The Product.aspx Web Form is designed to simulate a real product page. The script receives a query string parameter named ProductID, and generates a very simple output message based on the value of this parameter. Figure 3-4 shows the sample output that you’ll get by loading http://seoasp/Product.aspx?ProductID=3.

131473 fg0304

Figure 3-4

In order to improve search engine friendliness, we want to be able to access the same page through a static URL: http://seoasp/my-super-product.html. To implement this feature, we'll use (you guessed it!) URL rewriting, using Helicon's ISAPI_Rewrite.

As you know, what ISAPI_Rewrite basically does is to translate an input string (the URL typed by your visitor) to another string (a URL that can be processed by your ASP.NET code). In this exercise we’ll make it rewrite my-super-product.html to Product.aspx?ProductID=123.

This article covers ISAPI_Rewrite version 2. At the moment of writing, ISAPI_Rewrite 3.0 is in beta testing. The new version comes with an updated syntax for the configuration files and rewriting rules, which is compatible to that of the Apache mod_rewrite module, which is the standard rewriting engine in the Apache world. Please visit Cristian’s web page dedicated to this book, http://www.cristiandarie.ro/seo-asp/, for updates and additional information regarding the following exercises.

Exercise: Using Helicon's ISAPI_Rewrite

1. The first step is to install ISAPI_Rewrite. Navigate to http://www.helicontech.com/download.htm and download ISAPI_Rewrite Lite (freeware). The file name should be something like isapi_rwl_x86.msi. At the time of writing, the full (not freeware) version of the product comes in a different package if you're using Windows Vista and IIS 7, but the freeware edition is the same for all platforms.

2. Execute the MSI file you just downloaded, and install the application using the default options all the way through.

If you run into trouble, you should visit the Installation section of the product's manual, at http://www.isapirewrite.com/docs/#install. If you run Windows Vista, you need certain IIS modules to be installed in order for ISAPI_Rewrite to function. If you configured IIS as shown in Chapter 1 of the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, you already have everything you need, and the installation of ISAPI_Rewrite should run smoothly.

3. Make sure your IIS web server is running and open the http://seoasp/ web site using Visual Web Developer. (Code samples for this demo site are available from Wrox at http://www.wrox.com/WileyCDA/WroxTitle/productCd-0470131470,descCd-download_code.html.)

4. Create a new Web Form named Product.aspx in your project, with no code-behind file or Master Page. Then modify the generated code as shown in the following code snippet. (Remember that you can have Visual Web Developer generate the Page_Load signature for you by switching to Design view, and double-clicking an empty area of the page or using the Properties window.)

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">

protected void Page_Load(object sender, EventArgs e)

{

// retrieve the product ID from the query string

string productId = Request.QueryString["ProductID"];

// use productId to customize page contents

if (productId != null)

{

// set the page title

this.Title += ": Product " + productId;

// display product details

message.Text =

String.Format("You selected product #{0}. Good choice!", productId);

}

else

{

// display product details

message.Text = "Please select a product from our catalog.";

}

}

</script>

<html xmlns="http://www.w3.org/1999/xhtml" >

<head runat="server">

<title>ASP.NET SEO Shop</title>

</head>

<body>

<form id="form1" runat="server">

<asp:Literal runat="server" ID="message" />

</form>

</body>

</html>

5. Test your Web Form by loading http://seoasp/Product.aspx?ProductID=3. The result should resemble Figure 3-4.

6. Let’s now write the rewriting rule. Open the Program Files/Helicon/ISAPI_Rewrite/httpd.ini file (you can find a shortcut to this file in Programs), and add the following highlighted lines to the file. Note the file is read-only by default. If you use Notepad to edit it, you’ll need to make it writable first.

[ISAPI_Rewrite]

# Translate /my-super.product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

7. Switch back to your browser again, and this time load http://seoasp/my-super-product.html. If everything works as it should, you should get the output that’s shown in Figure 3-5.

131473 fg0305

Figure 3-5

Congratulations! You've just written your first rewrite rule using Helicon's ISAPI_Rewrite. The free edition of this product only allows server-wide rewriting rules, whereas the commercial edition would allow you to use an application-specific httpd.ini configuration file, located in the root of your web site. However, this limitation shouldn't affect your learning process.

The exercise you've just finished features a very simplistic scenario, without much practical value, at least compared with what you'll learn next! Its purpose was to install ISAPI_Rewrite, and to ensure your working environment is correctly configured.

You started by creating a very simple ASP.NET Web Form that takes a numeric parameter from the query string. You could imagine this is a more involved page that displays lots of details about the product with the ID mentioned by the ProductID query string parameter, but in our case we’re simply displaying a text message that confirms the ID has been correctly read from the query string.

Product.aspx is indeed very simple! It starts by reading the product ID value:

protected void Page_Load(object sender, EventArgs e)
{
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

Next, we verify if the value we just read is null. If that is the case, then ProductID doesn’t exist as a query string parameter. Otherwise, we display a simple text message, and update the page title, to confirm that ProductID was correctly read:

    // use productId to customize page contents
    if (productId != null)
    {
        // set the page title
        this.Title += ": Product " + productId;

        // display product details
        message.Text =
            String.Format("You selected product #{0}. Good choice!", productId);
    }
    else
    {
        // display product details
        message.Text = "Please select a product from our catalog.";
    }

URL Rewriting and ISAPI_Rewrite

As Figure 3-3 describes, the Product.aspx page is accessed after the original URL has been rewritten. This explains why Request.QueryString["ProductID"] reads the value of ProductID from the rewritten version of the URL. This is helpful, because the script works fine no matter if you accessed Product.aspx directly, or if the initial request was for another URL that was rewritten to Product.aspx.

The Request.QueryString collection, as well as the other values you can read through the Request object, work with the rewritten URL. For example, when requesting my-super-product.html in the context of our exercise, Request.RawUrl will return /Product.aspx?ProductID=123.

The rewriting engine allows you to retrieve the originally requested URL by saving its value to a server variable named HTTP_X_REWRITE_URL. You can read this value through Request.ServerVariables["HTTP_X_REWRITE_URL"]. This is helpful whenever you need to know the original request initiated by the client.

The Request class offers complete details about the current request. The following table describes the most commonly used Request members. You should visit the documentation for the complete list, or use IntelliSense in Visual Web Developer to quickly access the class members.

Server Variable

Description

Request.RawURL

Returns a string representing the URL of the request excluding the domain name, such as /Product.aspx?ID=123. When URL rewriting is involved, RawURL returns the rewritten URL.

Request.Url

Similar to Request.RawURL, except the return value is a Uri object, which also contains data about the request domain.

Request.PhysicalPath

Returns a string representing the physical path of the requested file, such as C:\seoasp\Product.aspx.

Request.QueryString

Returns a NameValueCollection object that contains the query string parameters of the request. You can use this object's indexer to access its values by name or by index, such as in Request.QueryString[0] or Request.QueryString["ProductID"].

Request.Cookies

Returns a NameValueCollection object containing the client’s cookies.

Request.Headers

Returns a NameValueCollection object containing the request headers.

Request.ServerVariables

Returns a NameValueCollection object containing IIS variables.

Request.ServerVariables["HTTP_X_REWRITE_URL"]

Returns a string representing the originally requested URL, when the URL is rewritten by Helicon's ISAPI_Rewrite or IIRF (Ionic ISAPI Rewrite).





 

After testing that Product.aspx works when accessed using its physical name (http://seoasp/Product.aspx?ProductID=123), we moved on to access this same script, but through a URL that doesn't physically exist on your server. We implemented this feature using Helicon's ISAPI_Rewrite.

As previously stated, the free version of Helicon's ISAPI_Rewrite only supports server-wide rewriting rules, which are stored in a file named httpd.ini in the product's installation folder (\Program Files\Helicon\ISAPI_Rewrite). This file has a section named [ISAPI_Rewrite], usually at the beginning of the file, which can contain URL rewriting rules.

We added a single rule to the file, which translates requests to /my-super-product.html to /Product.aspx?ProductID=123. The line that precedes the RewriteRule line is a comment; comments are marked using the # character at the beginning of the line, and are ignored by the parser:

# Translate my-super.product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

In its basic form, RewriteRule takes two parameters. The first parameter describes the original URL that needs to be rewritten, and the second specifies what it should be rewritten to. The pattern that describes the form of the original URL is delimited by ^ and $, which mark the beginning and the end of the matched URL. The pattern is written using regular expressions, which you learn about in the next exercise.

In case you were wondering why the .html extension in the rewrite rule has been written as \.html, we will explain it now. In regular expressions (the programming language used to describe the original URL that needs to be rewritten), the dot is a character that has a special significance. If you want that dot to be read as a literal dot, you need to escape it using the backslash character. As you'll learn, this is a general rule with regular expressions: when special characters need to be read literally, they need to be escaped with the backslash character (which is a special character in turn, so if you wanted to use a backslash, it would be denoted as \\).

At the end of a rewrite rule you can also add one or more flag arguments, which affect the rewriting behavior. For example, the [L] flag, demonstrated in the following example, specifies that when a match is found the rewrite should be performed immediately, without processing any further RewriteRule entries:

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123 [L]

These arguments are specific to the RewriteRule command, and not to regular expressions in general. Table 3-1 lists the possible RewriteRule arguments. The rewrite flags must always be placed in square brackets at the end of an individual rule.

Table 3-1

RewriteRule Option

Significance

Description

I

Ignore case

The regular expression of the RewriteRule and any corresponding RewriteCond directives is performed using case-insensitive matching.

F

Forbidden

In case the RewriteRule regular expression matches, the web server returns a 403 Forbidden response, regardless of the format string (second parameter of RewriteRule) specified. Read Chapter 4 for more details about the HTTP status codes.

L

Last rule

If a match is found, stop processing further rules.

N

Next iteration

Restarts processing the set of rules from the beginning, but using the current rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive.

NS

Next iteration of the same rule

Restarts processing the rule, using the rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive, and is calculated independently of the number of restarts counted for the N directive.

P

Proxy

Immediately passes the rewritten URL to the ISAPI extension that handles proxy requests. The new URL must be a complete URL that includes the protocol, domain name, and so on.

R

Redirect

Sends a 302 redirect status code to the client pointing to the new URL, instead of rewriting the URL. This is always the last rule, even if the L flag is not specified.

RP

Permanent redirect

The same as R, except the 301 status code is used instead.

U

Unmangle log

Logs the new URL as if it were the originally requested URL.

O

Normalize

Normalizes the URL before processing by removing illegal characters and so on, and also deletes the query string.

CL

Lowercase

Changes the rewritten URL to lowercase.

CU

Uppercase

Changes the rewritten URL to uppercase.
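As a quick illustration (a sketch only; the path pattern is hypothetical), the I and L flags from the table combine like this:

RewriteRule ^/products/p([0-9]+)\.html$ /Product.aspx?ProductID=$1 [I,L]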

 

Also, you should know that although RewriteRule is arguably the most important directive that you can use for URL rewriting with Helicon's ISAPI_Rewrite, it is not the only one. Table 3-2 quickly describes a few other directives. Please visit the product's documentation for a complete reference.

Table 3-2

Directive

Description

RewriteRule

This is the directive that allows for URL rewriting.

RewriteHeader

A generic version of RewriteRule that can rewrite any HTTP headers of the request. RewriteHeader URL is the same as RewriteRule.

RewriteProxy

Similar to RewriteRule, except it forces the result URL to be passed to the ISAPI extension that handles proxy requests.

RewriteCond

Allows defining one or more conditions (when more RewriteCond entries are used) that must be met before the following RewriteRule, RewriteHeader, or RewriteProxy directive is processed.
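For example, here is a sketch (example.com is a placeholder host) that uses RewriteCond to apply the exercise's rule only when a particular Host header is present:

RewriteCond Host: (?:www\.)?example\.com
RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123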

Introducing Regular Expressions

Before you can implement any really useful rewrite rules, it's important to learn about regular expressions. We'll teach them now, while discussing ISAPI_Rewrite, but regular expressions will also be needed when implementing other URL-related tasks, or when performing other kinds of string matching and parsing, so pay attention to this material.

Many love regular expressions, whereas others hate them. Many think they’re very hard to work with, whereas many (or maybe not so many) think they’re a piece of cake. Either way, they’re one of those topics you can’t avoid when URL rewriting is involved. We’ll try to serve a gentle introduction to the subject, although entire books have been written on the subject. The Wikipedia page on regular expressions is great for background information (http://en.wikipedia.org/wiki/Regular_expression).

Appendix A of this book is a generic introduction to regular expressions. You should read it if you find that the theory in the following few pages-which is a fast-track introduction to regular expressions in the context of URL rewriting-is too sparse. For comprehensive coverage of regular expressions we recommend Andrew Watt’s Beginning Regular Expressions (Wrox, 2005).

A regular expression (sometimes referred to as a regex) is a special string that describes a text pattern. With regular expressions you can define rules that match groups of strings, extract data from strings, and transform strings, which enable very flexible and complex text manipulation using concise rules. Regular expressions aren’t specific to ISAPI_Rewrite, or even to URL rewriting in general. On the contrary, they’ve been around for a while, and they’re implemented in many tools and programming languages, including the .NET Framework-and implicitly ASP.NET.

To demonstrate their usefulness with a simple example, we’ll assume your web site needs to rewrite links as shown in Table 3-3.

Table 3-3

Original URL

Rewritten URL

Products/P1.html

Product.aspx?ProductID=1

Products/P2.html

Product.aspx?ProductID=2

Products/P3.html

Product.aspx?ProductID=3

Products/P4.html

Product.aspx?ProductID=4

If you have 100,000 products, without regular expressions you'd be in a bit of trouble, because you'd need to write just as many rules, no more, no less. You don't want to manage a configuration file with 100,000 rewrite rules! That would be unwieldy.

However, if you look at the Original URL column of the table, you'll see that all entries follow the same pattern. And as suggested earlier, regular expressions can come to the rescue! Patterns are useful because with a single pattern you can match a theoretically infinite number of possible input URLs, so you just need to write a rewriting rule for every type of URL you have in your web site.

In the exercise that follows, we’ll use a regular expression that matches Products/Pn.html, and we’ll use ISAPI_Rewrite to translate URLs that match that pattern to Product.aspx?ProductID=n. This will implement exactly the rules described in Table 3-3.

Exercise: Working with Regular Expressions

1. Open the httpd.ini configuration file and add the following rewriting rule to it.

[ISAPI_Rewrite]

# Defend your computer from some worm attacks

RewriteRule .*(?:global.asa|default\.ida|root\.exe|\.\.).* . [F,I,O]

# Translate my-super.product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

# Rewrite numeric URLs

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

2. Switch back to your browser, and load http://seoasp/Products/P1.html. If everything works as planned, you will get the output that’s shown in Figure 3-7.

131473 fg0307

Figure 3-7

3. You can check that the rule really works, even for IDs formed of more digits. Loading http://seoasp/Products/P123456.html would give you the output shown in Figure 3-8.

131473 fg0308

Figure 3-8

Note that by default, regular expression matching is case sensitive. So the regular expression in your RewriteRule directive will match /Products/P123.html, but will not match /products/p123.html, for example. Keep this in mind when performing your tests. To make the matching case insensitive, you need to use the [I] RewriteRule flag, as you'll soon learn.

Congratulations! The exercise was quite short, but you’ve written your first "real" regular expression! Let’s take a closer look at your new rewrite rule:

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

If this is your first exposure to regular expressions, it must look scary! Just take a deep breath and read on: we promise, it’s not as complicated as it looks.

As you learned in the previous exercise, a basic RewriteRule takes two arguments. In our example it also received a special flag-[L]-as a third argument. We’ll discuss the meaning of these arguments next.

The first argument of RewriteRule is a regular expression that describes the matching URLs we want to rewrite. The second argument specifies the destination (rewritten) URL; this is not a regular expression. So, in geek-speak, the RewriteRule line from the exercise basically says: "rewrite any URL that matches the ^/Products/P([0-9]+)\.html$ pattern to /Product.aspx?ProductID=$1." In English, the same line can be roughly read as: "delegate any request to a URL that looks like /Products/Pn.html to /Product.aspx?ProductID=n."

In regular expressions, most characters, including alphanumeric characters, are read literally and simply match themselves. Remember the first RewriteRule you’ve written in this chapter to match my-super-product.html, which was mostly created of such "normal" characters. However, what makes regular expressions so powerful (and sometimes complicated), are the special characters (or metacharacters), such as ^, ., or *, which have special meanings. Table 3-4 describes the most frequently used metacharacters.

Table 3-4

Metacharacter

Description

^

Matches the beginning of the line. In our case, it will always match the beginning of the URL. The domain name isn't considered part of the URL, as far as RewriteRule is concerned. It is useful to think of ^ as "anchoring" the characters that follow to the beginning of the string, that is, asserting that they be the first part.

.

Matches any single character.

*

Specifies that the preceding character or expression can be repeated zero or more times, from not at all up to an infinite number of times.

+

Specifies that the preceding character or expression can be repeated one or more times. In other words, the preceding character or expression must match at least once.

?

Specifies that the preceding character or expression can be repeated zero or one time. In other words, the preceding character or expression is optional.

{m,n}

Specifies that the preceding character or expression can be repeated between m and n times; m and n are integers, and m needs to be lower than n.

( )

The parentheses are used to define a captured expression. The string matching the expression between parentheses can be then read as a variable. The parentheses can also be used to group the contents therein, as in mathematics, and operators such as *, +, or ? can then be applied to the resulting expression.

[ ]

Used to define a character class. For example, [abc] will match any of the characters a, b, c. The - character can be used to define a range of characters. For example, [a-z] matches any lowercase letter. If - is meant to be interpreted literally, it should be the last character before ]. Many metacharacters lose their special function when enclosed between [ and ], and are interpreted literally.

[^ ]

Similar to [ ], except it matches everything except the mentioned character class. For example, [^a-c] matches all characters except a, b, and c.

$

Matches the end of the line. In our case, it will always match the end of the URL. It is useful to think of it as "anchoring" the previous characters to the end of the string, that is, asserting that they be the last part.

\

The backslash is used to escape the character that follows. It is used to escape metacharacters when we need them to be taken for their literal value, rather than their special meaning. For example, \. will match a dot, rather than "any character" (the typical meaning of the dot in a regular expression). The backslash can also escape itself-so if you want to match C:\Windows, you’ll need to refer to it as C:\\Windows.

Using Table 3-4 as reference, let’s analyze the expression ^/Products/P([0-9]+)\.html$. The expression starts with the ^ character, which matches the beginning of the requested URL (remember, this doesn’t include the domain name). The literal characters /Products/P that follow must then appear, exactly as written, at the beginning of the URL string.

Let’s recap: the expression ^/Products/P will match any URL that starts with /Products/P.

The next characters, ([0-9]+), are the crux of this process. The [0-9] bit matches any character between 0 and 9 (that is, any digit), and the + that follows indicates that the pattern can repeat one or more times, so we can have an entire number rather than just a digit. The enclosing round parentheses around [0-9]+ indicate that the regular expression engine should store the matching string (which will be a digit or number) inside a variable called $1. (We’ll need this variable to compose the rewritten URL.)

Finally, we have \.html$, which means that the string should end in .html. The \ is the escaping character that indicates that the . should be taken as a literal dot, not as "any character" (which is the significance of the . metacharacter). The $ matches the end of the string.
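If you’d like to experiment with this pattern before wiring it into ISAPI_Rewrite, the same expression can be exercised with the .NET Regex class. The small console program below is only an illustration we provide for experimentation; it is not part of the exercise files, and its class name is our own invention:

using System;
using System.Text.RegularExpressions;

class PatternTest
{
    static void Main()
    {
        // The same pattern used in the RewriteRule, written as a C# string literal
        Regex productUrl = new Regex(@"^/Products/P([0-9]+)\.html$");

        string[] urls = { "/Products/P123456.html",   // should match
                          "/Products/P123.htm",       // missing final "l" - no match
                          "/products/p123.html" };    // different case - no match

        foreach (string url in urls)
        {
            Match match = productUrl.Match(url);
            if (match.Success)
                // Groups[1] holds the ([0-9]+) capture - the $1 of the rewrite rule
                Console.WriteLine("{0} -> product ID {1}", url, match.Groups[1].Value);
            else
                Console.WriteLine("{0} -> no match", url);
        }
    }
}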

The second argument of RewriteRule, /Product.aspx?ProductID=$1, plugs the digit or number extracted by the matching regular expression into the $1 variable. If the regular expression matched more than one string, the subsequent matches could be referenced as $2, $3, and so on. You’ll meet several such examples later in this book.

The second argument of RewriteRule isn’t written using the regular expression language. Indeed, it doesn’t need to be, because it’s not meant to match anything. Instead, it simply supplies the form of the rewritten URL. The only parts with special significance here are the variables ($1, $2, and so on), whose values are extracted from the expressions written between parentheses in the first argument of RewriteRule.
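The way the captured value flows into the second argument can be mimicked with Regex.Replace, whose replacement strings use the same $1, $2 notation. The little experiment below is ours, not part of the exercise, and is only conceptually what ISAPI_Rewrite does for you at the web server level:

using System;
using System.Text.RegularExpressions;

class ReplaceDemo
{
    static void Main()
    {
        // The value captured by ([0-9]+) is referenced as $1 in the replacement string
        string rewritten = Regex.Replace(
            "/Products/P123456.html",
            @"^/Products/P([0-9]+)\.html$",
            "/Product.aspx?ProductID=$1");

        Console.WriteLine(rewritten);   // prints /Product.aspx?ProductID=123456
    }
}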

As you can see, this rule does indeed rewrite any request for a URL that looks like /Products/Pn.html to Product.aspx?ProductID=n, which can be executed by our Product.aspx page. The [L] makes sure that if a match is found, the rewriting rules that follow won’t be processed.

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

This is particularly useful if you have a long list of RewriteRule commands, because using [L] improves performance and prevents ISAPI_Rewrite from processing all the RewriteRule commands that follow once a match is found. This is usually what we want regardless.

Helicon’s ISAPI_Rewrite ships with a regular expression tester application, which allows you to verify if a certain rewriting rule matches a test string. The application is named RXTest.exe, and is located in the product’s installation folder (by default Program Files\Helicon\ISAPI_Rewrite\).

Rewriting Numeric URLs with Two Parameters

What you accomplished in the previous exercise was rewriting numeric URLs with one parameter. We’ll now expand that little example to also rewrite URLs with two parameters. The URLs with one parameter that we support look like http://seoasp/Products/Pn.html. Now we’ll assume that our application also needs to support links that include a category ID in addition to the product ID. The new URLs will look like:

http://seoasp/Products/C2/P1.html

The existing Product.aspx script will be modified to handle links such as:

http://seoasp/Product.aspx?CategoryID=2&ProductID=1

As a quick reminder, here’s the rewriting rule you used for numeric URLs with one parameter:

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

For rewriting two parameters, the rule would be a bit longer, but not much more complex:

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Let’s put this to work in a quick exercise.

Exercise: Rewriting Numeric URLs

1. Modify your Product.aspx page that you created in the previous exercise, like this:

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">
protected void Page_Load(object sender, EventArgs e)
{
  // retrieve the product ID and category ID from the query string
  string productId = Request.QueryString["ProductID"];
  string categoryId = Request.QueryString["CategoryID"];

  // use productId to customize page contents
  if (productId != null && categoryId == null)
  {
    // set the page title
    this.Title += ": Product " + productId;
    // display product details
    message.Text =
      String.Format("You selected product #{0}. Good choice!", productId);
  }
  // use productId and categoryId to customize page contents
  else if (productId != null && categoryId != null)
  {
    // set the page title
    this.Title +=
      String.Format(": Product {0}: Category {1}", productId, categoryId);
    // display product details
    message.Text =
      String.Format("You selected product #{0} in category #{1}. Good choice!",
                    productId, categoryId);
  }
  else
  {
    // no product ID was supplied
    message.Text = "Please select a product from our catalog.";
  }
}
</script>

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
  <title>ASP.NET SEO Shop</title>
</head>
<body>
  <form id="form1" runat="server">
    <asp:Literal runat="server" ID="message" />
  </form>
</body>
</html>

2. Test your script with a URL that contains just a product ID, such as http://seoasp/Products/P123456.html, to ensure that the old functionality still works. The result should resemble Figure 3-8.

3. Now test your script by loading http://seoasp/Product.aspx?CategoryID=5&ProductID=99. You should get the output shown in Figure 3-9.


Figure 3-9

4. Add a new rewriting rule to the httpd.ini file as shown here:

[ISAPI_Rewrite]

# Defend your computer from some worm attacks
RewriteRule .*(?:global\.asa|default\.ida|root\.exe|\.\.).* . [F,I,O]

# Translate my-super-product.html to /Product.aspx?ProductID=123
RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

# Rewrite numeric URLs that contain a product ID
RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

# Rewrite numeric URLs that contain a product ID and a category ID
RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Note that the entire RewriteRule command and its parameters must be written on a single line in your httpd.ini file. If you split it across two lines, it will not work.

5. Load http://seoasp/Products/C5/P99.html, and expect to get the same output as with the previous request, as shown in Figure 3-10.


Figure 3-10

In this example you started by modifying Product.aspx to accept both a product ID and a category ID in its query string. Then you added URL rewriting support for URLs with two numeric parameters, by adding to your httpd.ini file a rewriting rule that handles them:

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

The rule looks a bit complicated, but if you look carefully, you’ll see that it’s not so different from the rule handling URLs with a single parameter. The rewriting rule now extracts two parameters: $1 is the number that comes after /Products/C, as defined by the first ([0-9]+), and the second parameter, $2, is the number that comes after /P.

The result is that we now delegate any URL that looks like /Products/Cm/Pn.html to /Product.aspx?CategoryID=m&ProductID=n.

Rewriting Keyword-Rich URLs

Here’s where the real fun begins! This kind of URL rewriting is a bit more complex, and there are several strategies you could take. When working with rewritten numeric URLs, it was relatively easy to extract the product and category IDs from a URL such as /Products/C5/P9.html, and rewrite the URL to Product.aspx?CategoryID=5&ProductID=9.

A keyword-rich URL doesn’t necessarily have to include any IDs. Take a look at this one:

http://www.example.com/Products/Tools/Super-Drill.html

(You met a similar example in the first exercise of this chapter, where you handled the rewriting of http://seoasp/my-super-product.html.)

This URL refers to a product named "Super Drill" located in a category named "Tools." Obviously, if you want to support this kind of URL, you need some kind of mechanism to find the IDs of the category and product the URL refers to.

One solution that comes to mind is to add a column to the product information table that associates such beautified URLs with the "real" URLs your application can handle. When such a request arrives, you could look up the information in the Category and Product tables, get the IDs, and use them. We demonstrate this technique in an exercise later in this chapter.
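To make the lookup idea a little more concrete, here is a rough sketch of such a query. It assumes a hypothetical SeoUrl column on the Products table and a connection string named SeoAspShop; every name in this snippet is invented for illustration and does not appear in the exercise code:

using System.Configuration;
using System.Data.SqlClient;

public static class ProductLookup
{
    // Given a beautified path such as "/Products/Tools/Super-Drill.html",
    // return the matching product ID, or null if no product uses that URL.
    // Table, column, and connection-string names are purely hypothetical.
    public static int? GetProductIdByUrl(string seoUrl)
    {
        string connectionString =
            ConfigurationManager.ConnectionStrings["SeoAspShop"].ConnectionString;

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "SELECT ProductID FROM Products WHERE SeoUrl = @SeoUrl", connection))
        {
            command.Parameters.AddWithValue("@SeoUrl", seoUrl);
            connection.Open();

            object result = command.ExecuteScalar();   // null when no row matches
            return (result == null) ? (int?)null : (int)result;
        }
    }
}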

We also have an option for those who prefer an automated approach that doesn’t involve a lookup database. This approach still brings the benefits of a keyword-rich URL, while being easier to implement. Look at the following URLs:

http://www.example.com/Products/Super-Drill-P9.html

http://www.example.com/Products/Tools-C5/Super-Drill-P9.html

These URLs include keywords. However, we’ve sneaked IDs into these URLs, in a way that isn’t unpleasant to the human eye, and doesn’t distract attention from the keywords that matter, either. In the case of the first URL, the rewriting rule can simply extract the number appended to the end of the product name (-P9), and ignore the rest of the URL. For the second URL, the rewriting rule can extract the category ID (-C5) and the product ID (-P9), and then use these numbers to build a URL such as Product.aspx?CategoryID=5&ProductID=9.

This book generally uses such keyword-rich URLs, which also contain item IDs. Later in this chapter, however, you’ll be taught how to implement ID-free keyword-rich URLs as well.

The rewrite rule for keyword-rich URLs with a single parameter looks like this:

RewriteRule ^/Products/.*-P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

The rewrite rule for keyword-rich URLs with two parameters looks like this:

RewriteRule ^/Products/.*-C([0-9]+)/.*-P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Let’s see these rules at work in an exercise.

Exercise: Rewriting Keyword-Rich URLs

1. Modify the httpd.ini configuration file like this:

[ISAPI_Rewrite]

# Rewrite numeric URLs that contain a product ID
RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

# Rewrite numeric URLs that contain a product ID and a category ID
RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

# Rewrite keyword-rich URLs with a product ID and a category ID
RewriteRule ^/Products/.*-C([0-9]+)/.*-P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

# Rewrite keyword-rich URLs with a product ID
RewriteRule ^/Products/.*-P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

2. Load http://seoasp/Products/Tools-C5/Super-Drill-P9.html, and voila, you should get the result that’s shown in Figure 3-12.


Figure 3-12

3. To test the rewrite rule that matches keyword-rich product URLs that don’t include a category, try loading http://seoasp/Products/Super-Drill-P9.html. You should once again see the details page for product #9, this time without any category information.

There’s one interesting gotcha to keep in mind when developing web applications, especially when they use URL rewriting. Your web browser sometimes caches the results returned by your URLs, which can lead to painful debugging experiences, so we recommend that you disable your browser’s cache during development.

You now have two new rules in your httpd.ini file, and they are working beautifully! The first rule handles keyword-rich URLs that include a product ID and a category ID, and the second rule handles keyword-rich URLs that include only a product ID. Note that the order of these rules is important, because the second rule also matches the URLs that are meant to be captured by the first rule. Also remember that because we didn’t use the [I] flag, the matching is case sensitive.

The first new rule matches URLs that start with the string /Products/, then contain any sequence of zero or more characters (.*), followed by -C. This is expressed by ^/Products/.*-C. The next characters must be one or more digits, which as a whole are saved to the $1 variable, because the expression is written between parentheses: ([0-9]+). This first value extracted from the URL, $1, is the category ID.

After the category ID, the URL must contain a slash, then zero or more characters (.*), then -P, as expressed by /.*-P. Afterwards, another captured group follows to extract the ID of the product, ([0-9]+), which becomes the $2 variable. The final bit of the regular expression, \.html$, specifies that the URL must end in .html.

The two extracted values, $1 and $2, are used to create the new URL, /Product.aspx?CategoryID=$1&ProductID=$2.

The second rewrite rule you implemented is a simpler version of this one.
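To convince yourself that the rule order really matters, note that the simpler product-only pattern also matches a URL that contains a category, because its .* happily swallows the Tools-C5/Super-Drill part. The quick console check below is ours and purely illustrative:

using System;
using System.Text.RegularExpressions;

class RuleOrderDemo
{
    static void Main()
    {
        string url = "/Products/Tools-C5/Super-Drill-P9.html";

        // The product-only pattern matches this URL too, because .* also
        // consumes "Tools-C5/Super-Drill"; only the product ID is captured.
        Match m = Regex.Match(url, @"^/Products/.*-P([0-9]+)\.html$");
        Console.WriteLine(m.Success);          // True
        Console.WriteLine(m.Groups[1].Value);  // 9

        // If this rule appeared first, the category ID would be lost,
        // so the more specific category-and-product rule must come first.
    }
}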

Technical Considerations

Apart from basic URL rewriting, no matter how you implement it, you need to be aware of additional technical issues you may encounter when using such techniques in your web sites:

* If your web site contains ASP.NET controls or pages that generate postback events that you handle on the server side, you need to make additional changes to your site so that it handles the postbacks correctly.

* You need to make sure the relative links in your pages point to the correct absolute locations after URL rewriting.

Let’s deal with these issues one at a time.

Handling Postbacks Correctly

Although they appear to be working correctly, the URL-rewritten pages you’ve loaded in all the exercises so far have a major flaw: they can’t handle postbacks correctly. A postback is the mechanism that fires server-side handlers in response to client events by submitting the ASP.NET form. In other words, a postback occurs every time a control in your page that has the runat="server" attribute fires an event that is handled on the server side with C# or VB.NET code.

To understand the flaw in our solution, add the following button into the form in Product.aspx:

<body>
  <form id="form1" runat="server">
    <asp:Literal runat="server" ID="message" />
    <asp:Button ID="myButton" runat="server" Text="Click me!" />
  </form>
</body>

Switch the form to Design view, and double-click the button in the designer to have Visual Web Developer generate its Click event handler for you. Then complete its code by adding the following line:

protected void myButton_Click(object sender, EventArgs e)
{
  message.Text += "<br />You clicked the button!";
}

All right, you now have a button that displays a little message when clicked. To test this button, load http://seoasp/Product.aspx, and click the button to ensure it works as expected. The result should resemble that in Figure 3-16. (Note that clicking it multiple times doesn’t display additional text, because the contents of the Literal control used for displaying the message are refreshed on every page load.)


Figure 3-16

Now, load the same Product.aspx form, but this time using a rewritten URL. I’ll choose http://seoasp/Products/Super-AJAX-PHP-Book-P35.html, which should be properly handled by your existing code and rewritten to http://seoasp/Product.aspx?ProductID=35. Then click the button. Oops! You’ll get an error, as shown in Figure 3-17.


Figure 3-17

If you look at the new URL in the address bar of your web browser, you can intuit what happens: the page is unaware that it was loaded using a rewritten URL, and it submits the form to the wrong URL-in this example, http://seoasp/Products/Product.aspx?ProductID=35. The presence of the Products folder in the initial URL broke the path to which the form is submitted.

The new URL doesn’t exist physically in our web site, and it’s also not handled by any of our rewrite rules. This happens because the action attribute of the form points back to the name of the physical page it’s located on, which in this case is Product.aspx (this behavior isn’t configurable via properties). This can be verified simply by looking at the HTML source of the form, before clicking the button:

<form name="form1" method="post" action="Product.aspx?ProductID=35" id="form1">

When this form is located on a page that was requested through a URL containing folders, the browser resolves the relative action path against those folders. When URL rewriting is involved, it’s easy to see that this behavior isn’t what we want. Additionally, even if the original path doesn’t contain folders, the form still submits to a dynamic URL, rendering our URL rewriting efforts useless.

To overcome this problem, there are three potential solutions. The first works with any version of ASP.NET, and involves creating a new HtmlForm class that removes the action attribute, like this:

namespace ActionlessForm
{
  // A drop-in replacement for HtmlForm that renders every usual form
  // attribute except action, so the browser submits the form back to
  // the URL shown in its address bar.
  public class Form : System.Web.UI.HtmlControls.HtmlForm
  {
    protected override void RenderAttributes(System.Web.UI.HtmlTextWriter writer)
    {
      // write out the standard form attributes ...
      Attributes.Add("enctype", Enctype);
      Attributes.Add("id", ClientID);
      Attributes.Add("method", Method);
      Attributes.Add("name", Name);
      Attributes.Add("target", Target);
      // ... but not action, which the base class would normally emit
      Attributes.Render(writer);
    }
  }
}

If you save this file as ActionlessForm.cs, you can compile it into a library file using the C# compiler, like this:

csc.exe /target:library ActionlessForm.cs

The default location of the .NET 2.0 C# compiler is \windows\microsoft.net\framework\v2.0.50727\csc.exe. Note that you may need to download and install the Microsoft .NET Software Development Kit to have access to the C# compiler. To create libraries you can also use Visual C# 2005 Express Edition, in which case you don’t need to compile the C# file yourself. Copying the resulting file, ActionlessForm.dll, to the Bin folder of your application makes it accessible to the rest of the application. Then you need to replace all the <form> elements in your Web Forms and Master Pages with the new form, like this:

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<%@ Register TagPrefix="af" Namespace="ActionlessForm" Assembly="ActionlessForm" %>

<html xmlns="http://www.w3.org/1999/xhtml" >
<head id="Head1" runat="server">
  <title>ASP.NET SEO Shop</title>
</head>
<body>
  <af:form id="form1" runat="server">
    <asp:Literal runat="server" ID="message" />
    <asp:Button ID="myButton" runat="server" Text="Click me!" OnClick="myButton_Click" />
  </af:form>
</body>
</html>

Needless to say, updating all your Web Forms and Master Pages like this isn’t the most elegant solution in the world, but it’s the best option you have with ASP.NET 1.x. Fortunately, ASP.NET 2.0 offers a cleaner solution, which doesn’t require you to alter your existing pages, and it consists of using the ASP.NET 2.0 Control Adapter extensibility architecture. This method is covered by Scott Guthrie in his article at http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx.
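To give you a flavor of that approach without reproducing the article, the condensed sketch below shows the general shape of such an adapter: it wraps the writer used to render the form and substitutes the action attribute. It assumes the originally requested URL is available through Request.RawUrl, which holds true when rewriting is performed inside ASP.NET itself; with a filter-level rewriter such as ISAPI_Rewrite the original URL may have to be obtained differently, and the adapter must also be registered for HtmlForm in an App_Browsers .browser file, so treat this as an outline and refer to the article for the complete version:

using System.Web;
using System.Web.UI;
using System.Web.UI.Adapters;

// Adapter mapped to System.Web.UI.HtmlControls.HtmlForm via a .browser file.
public class FormRewriterControlAdapter : ControlAdapter
{
    protected override void Render(HtmlTextWriter writer)
    {
        // Wrap the real writer so we can intercept the form's attributes.
        base.Render(new RewriteFormHtmlTextWriter(writer));
    }
}

public class RewriteFormHtmlTextWriter : HtmlTextWriter
{
    public RewriteFormHtmlTextWriter(HtmlTextWriter writer) : base(writer)
    {
        this.InnerWriter = writer.InnerWriter;
    }

    public override void WriteAttribute(string name, string value, bool fEncode)
    {
        // Replace the action attribute with the URL the browser requested,
        // so the form posts back to the rewritten URL (see lead-in caveat).
        if (name == "action")
        {
            value = HttpContext.Current.Request.RawUrl;
        }
        base.WriteAttribute(name, value, fEncode);
    }
}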

The last solution involves using Context.RewritePath to rewrite the current path to /?, effectively stripping the action attribute of the form. This technique is demonstrated in the case study in Chapter 14 of Professional Search Engine Optimization with ASP.NET: A Developer’s Guide to SEO, but as you’ll see, it’s not recommended in more complex applications because of the restrictions it imposes on your code, and its potential side effects.

Absolute Paths and ~/

Another potential problem when using URL rewriting is that relative links will stop working when folders are involved. For example, a link to image.jpg in Product.aspx would be translated to http://seoasp/image.jpg if the page is read as Product.aspx?ProductID=10, or to http://seoasp/Products/image.jpg if it is read through a rewritten URL such as http://seoasp/Products/P-10.html. To avoid such problems, you should use at least one of the following two techniques:

* Always use absolute paths. Creating a URL factory library, as shown later in this chapter, can help with this task.

* Use the ~ syntax supported by ASP.NET controls. The ~ symbol always references the root location of your application, and it is replaced by its absolute value when the controls are rendered by the server (see the short example after this list).
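Here is a brief illustration of the second technique; the image and link targets below are made-up paths, and the point is only that ~/ resolves to the application root no matter which rewritten URL the page was reached through:

<!-- Both URLs below are resolved against the application root, so they render
     correctly whether the page was requested as /Product.aspx?ProductID=10
     or as /Products/P10.html -->
<asp:Image ID="productImage" runat="server" ImageUrl="~/Images/logo.jpg" />
<asp:HyperLink ID="homeLink" runat="server" NavigateUrl="~/Default.aspx">Home</asp:HyperLink>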

Problems Rewriting Doesn’t Solve

URL rewriting is not a panacea for all dynamic site problems. In particular, URL rewriting in and of itself does not solve any duplicate content problems. If a given site has duplicate content problems with a dynamic approach to its URLs, the problem will likely also be manifest in the resulting rewritten static URLs. In essence, URL rewriting only obscures the parameters, however many there are, from the search engine spider’s view. This is useful for URLs that have many parameters, as we mentioned. Needless to say, however, if the varying permutations of obscured parameters do not dictate significant changes to the content, the same duplicate content problems remain.

A simple example would be the case of rewriting the URLs of a product that exists in multiple categories. Obviously, the two resulting pages would probably show duplicate (or very similar) content even if accessed through static-looking links, such as:

http://www.example.com/College-Books-C1/Some-Book-Title-P2.html

http://www.example.com/Out-of-Print-Books-C2/Some-Book-Title-P2.html

Additionally, in the case that you have duplicate content, using static-looking URLs may actually exacerbate the problem. This is because whereas dynamic URLs make the parameter names and values obvious, rewritten static URLs obscure them. Search engines are known, for example, to attempt to drop a parameter they heuristically guess is a session ID and thereby eliminate duplicate content. If the session parameter were rewritten, a search engine would not be able to do this at all.

There are solutions to this problem. They typically involve removing any parameters that can be avoided, as well as excluding any remaining duplicate content. These solutions are explored in depth in the chapter on duplicate content.

A Last Word of Caution

URLs are much more difficult to revise than titles and descriptions once a site is launched and indexed. Thus, when designing a new site, special care should be devoted to them. Changing URLs later requires one to redirect all of the old URLs to the new ones, which can be extremely tedious, and has the potential to influence rankings for the worse if done improperly and link equity is lost. Even the most trivial changes to URL structure should be accompanied by some redirects, and such changes should only be made when it is absolutely necessary.

This is a relatively simple process. In short, you use the URL factory described in this chapter to create the new URLs based on the parameters in the old dynamic URLs. Then you employ what is called a "301 redirect" to send the old URLs to the new ones. The various types of redirects are discussed in the following chapter.
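As a small sketch of what issuing such a redirect can look like in ASP.NET 2.0 (which has no built-in permanent-redirect call; Response.Redirect sends a 302), the helper below could live in the page, or a base page class, that still answers the old dynamic URLs. The destination URL passed to it would come from your URL factory; the example value in the comment is invented:

// Issue a permanent (301) redirect from an old dynamic URL to its
// new keyword-rich equivalent, then stop processing the request.
private void RedirectPermanently(string newUrl)
{
    Response.StatusCode = 301;                  // Moved Permanently
    Response.StatusDescription = "Moved Permanently";
    Response.AddHeader("Location", newUrl);
    Response.End();
}

// Example call from Page_Load, once you've determined that the request
// came in through an old dynamic URL:
// RedirectPermanently("/Products/Super-Drill-P9.html");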

So, if you are retrofitting a web application that powers a web site already indexed by search engines, you must redirect the old dynamic URLs to the new rewritten ones. This is especially important because, without doing so, every page would have a duplicate, resulting in a large quantity of duplicate content. You can safely ignore this discussion, however, if you are designing a new web site.

Summary

We covered a lot of material here! We detailed, step by step, how to employ static-looking URLs in a dynamic web site. Such URLs are both search engine friendly and more enticing to the user. This can be accomplished through several techniques, and you’ve tested the most popular of them in this chapter. A "URL factory" can be used to enforce consistency in URLs. It is important to realize, however, that URL rewriting is not a panacea for all dynamic site problems, in particular duplicate content problems.

IT, Internet Technology