
实现自己的http server

2014年2月15日


Write your own http server

author : Kevin Lynx

Why write your own?

    会问这个问题,说明你已经知道什么是http server。世界上已经有各种规模的http server,为什么还要自己实现一个?其实没什么
特别的理由。我问过自己,感觉就是自娱自乐,或者只是练习一下网络编程;又或者是因为某天我看到某个库宣称自带一个小
型的http server时,不清楚那到底是什么东西,于是就想自己去实现一个。

What’s httpd ?

    httpd就是http daemon,这个是类unix系统上的名称,也就是http server。httpd遵循HTTP协议,响应HTTP客户端的request,
然后返回response。
    那么,什么是HTTP协议?最简单的例子,就是你的浏览器与网页服务器之间使用的应用层协议。虽然官方文档说HTTP协议可以
建立在任何可靠传输的协议之上,但是就我们所见到的,HTTP还是建立在TCP之上的。
    httpd最简单的response是返回静态的HTML页面。在这里我们的目标也只是一个响应静态网页的httpd而已(也许你愿意加入CGI
特性)。

More details about HTTP protocol

    在这里有必要讲解HTTP协议的更多细节,因为我们的httpd就是要去解析这个协议。
    关于HTTP协议的详细文档,可以参看rfc2616。但事实上对于实现一个简单的响应静态网页的httpd来说,完全没必要读这么一
份冗长的文档。在这里我推荐<HTTP Made Really Easy>,以下内容基本取自该文档。

– HTTP协议结构
  HTTP协议无论是请求报文(request message)还是回应报文(response message)都分为四部分:
  * 报文头 (initial line )
  * 0个或多个header line
  * 空行(作为header lines的结束)
  * 可选body
  HTTP协议是基于行的协议,每一行以\r\n作为分隔符。报文头通常表明报文的类型(例如请求类型),报文头只占一行;header line
  附带一些特殊信息,每一个header line占一行,其格式为name:value,即以冒号作为分隔;空行也就是一个\r\n;可选body通常
  包含数据,例如服务器返回的某个静态HTML文件的内容。举个例子,以下是一个很常见的请求报文,你可以截获浏览器发送的数据
  包而获得:

    1  GET /index.html HTTP/1.1
    2  Accept-Language: zh-cn
    3  Accept-Encoding: gzip, deflate
    4  User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; MAXTHON 2.0)
    5  Host: localhost
    6  Connection: Keep-Alive
    7
  我为每一行都添加了行号,第1行就是initial line,2-6行是header lines,第7行是header lines结束处的空行,在这里显示不出来。
  以下是一个回应报文:
    1  HTTP/1.1 200 OK
    2  Server: klhttpd/0.1.0
    3  Content-Type: text/html
    4  Content-Length: 67
    5
    6  <html><head><title>index.html</title></head><body>index.html</body>
  第6行就是可选的body,这里是index.html这个文件的内容。

– HTTP request method
  因为我们做的是服务器端,所以我们重点对请求报文做说明。首先看initial line,该行包含几个字段,每个字段用空格分开,例
  如以上的GET /index.html HTTP/1.1就可以分为三部分:GET、/index.html、HTTP/1.1。其中第一个字段GET就是所谓的request
  method。它表明请求类型,HTTP有很多method,例如:GET、POST、HEAD等。

  就我们的目标而言,我们只需要实现对GET和HEAD做响应即可。

  GET是最普遍的method,表示请求一个资源。什么是资源?诸如HTML网页、图片、声音文件等都是资源。顺便提一句,HTTP协议
  中为每一个资源设置一个唯一的标识符,就是所谓的URI(URL是其中最常见的一种形式)。
  HEAD与GET一样,不过它不请求资源内容,而是请求资源信息,例如文件长度等信息。

– More detail
  继续说说initial line后面的内容:
  对应于GET和HEAD两个method,紧接着的字段就是资源名,其实从这里可以看出,也就是文件名(相对于你服务器的资源目录),例
  如这里的/index.html;最后一个字段表明HTTP协议版本号。目前我们只需要支持HTTP1.1和1.0,没有多大的技术差别。

  然后是header line。我们并不需要关注每一个header line。我只罗列有用的header line :
  – Host : 对于HTTP1.1而言,请求报文中必须包含此header,如果没有包含,服务器需要返回bad request错误信息。
  – Date : 用于回应报文,供客户端缓存数据用。
  – Content-Type : 用于回应报文,表示回应资源的文件类型,以MIME类型给出,例如:
    text/html, image/jpeg, image/gif等。
  – Content-Length : 用于回应报文,表示回应资源的文件长度。

body域很简单,你只需要将一个文件全部读入内存,然后附加到回应报文段后发送即可,即使是二进制数据。

– 回应报文
  之前提到的一个回应报文例子很典型,我们以其为例讲解。首先是initial line,第一个字段表明HTTP协议版本,可以直接以请求
  报文为准(即请求报文版本是多少这里就是多少);第二个字段是一个status code,也就是回应状态,相当于请求结果,请求结果
  被HTTP官方事先定义,例如200表示成功、404表示资源不存在等;最后一个字段为status code的可读字符串,你随便给吧。

  回应报文中最好跟上Content-Type、Content-Length等header。

具体实现
    正式写代码之前我希望你能明白HTTP协议的这种请求/回应模式,即客户端发出一个请求,然后服务器端回应该请求。然后继续
这个过程(HTTP1.1是长连接模式,而HTTP1.0是短连接,当服务器端返回第一个请求时,连接就断开了)。
    这里,我们无论客户端,例如浏览器,发出什么样的请求,请求什么资源,我们都回应相同的数据:

/* 阻塞地接受一个客户端连接 */
SOCKET con = accept( s, 0, 0 );
/* 接收请求数据 */
char request[1024] = { 0 };
ret = recv( con, request, sizeof( request ) - 1, 0 );
printf( "%s", request );  /* 不要写成printf(request),避免格式化串问题 */
/* 无论收到什么请求,都返回同样的200回应 */
{
    char content[] = "<html><head><title>index.html</title></head><body>index.html</body>";
    char response[512];
    sprintf( response, "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: %d\r\n\r\n%s",
             (int)strlen( content ), content );
    ret = send( con, response, (int)strlen( response ), 0 );
}
closesocket( con );

    程序以最简单的阻塞模式运行,我们可以将重点放在协议的分析上。假设程序在8080端口监听,运行程序后在浏览器里输入
http://localhost:8080/index.html,就可以看到浏览器正常显示content中描述的HTML文件。
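
    上面的片段省略了创建监听socket的部分。下面是一个示意性的补充(基于Winsock,仅为示例骨架,8080端口与上文假设一致,
并非相关下载中klhttpd的实际代码),展示accept之前所需的初始化步骤;编译时需要链接ws2_32.lib:

/* 示意:创建监听socket的骨架(Winsock) */
#include <winsock2.h>
#include <string.h>

int main( void )
{
    WSADATA wsa;
    SOCKET s;
    struct sockaddr_in addr;

    WSAStartup( MAKEWORD( 2, 2 ), &wsa );       /* 初始化Winsock */
    s = socket( AF_INET, SOCK_STREAM, 0 );      /* 创建TCP socket */
    memset( &addr, 0, sizeof( addr ) );
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl( INADDR_ANY );
    addr.sin_port = htons( 8080 );              /* 假设监听8080端口 */
    bind( s, (struct sockaddr*)&addr, sizeof( addr ) );
    listen( s, 5 );
    /* 此处进入accept循环,即正文中给出的片段 */
    closesocket( s );
    WSACleanup();
    return 0;
}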

   现在你基本上明白了整个工作过程,我们可以把代码写得更全面一点,例如根据GET的URI来载入对应的文件然后回应给客户端。
其实这个很简单,只需要从initial line里解析出(很一般的字符串解析)URI字段,然后载入对应的文件即可。例如以下函数:

void http_response( SOCKET con, const char *request )
{
    /* 从initial line中解析出method和URI(先拷贝一份,strtok会修改字符串) */
    char line[256];
    char file[300];
    char *method, *uri;
    FILE *fp;

    strncpy( line, request, sizeof( line ) - 1 );
    line[sizeof( line ) - 1] = 0;
    method = strtok( line, " " );
    uri = strtok( 0, " " );
    if( method == 0 || uri == 0 )
    {
        return;
    }
    /* 将URI映射为服务器资源目录下的文件名 */
    sprintf( file, ".%s", uri );

    /* 载入对应的文件 */
    fp = fopen( file, "rb" );
    if( fp == 0 )
    {
        /* 资源不存在,返回404 */
        char response[] = "HTTP/1.1 404 NOT FOUND\r\n\r\n";
        send( con, response, (int)strlen( response ), 0 );
    }
    else
    {
        /* 回应该资源:先把整个文件读入内存,再依次发送header和body(body可以是二进制数据) */
        long file_size;
        char *content;
        char header[256];
        fseek( fp, 0, SEEK_END );
        file_size = ftell( fp );
        fseek( fp, 0, SEEK_SET );
        content = (char*) malloc( file_size );
        fread( content, 1, file_size, fp );
        fclose( fp );

        sprintf( header, "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: %ld\r\n\r\n", file_size );
        send( con, header, (int)strlen( header ), 0 );
        send( con, content, (int)file_size, 0 );
        free( content );
    }
}


其他

    要将这个简易的httpd做完善,我们还需要注意很多细节。包括:对不支持的method返回501错误;对于HTTP1.1要求有Host这个
header;为了支持客户端cache,需要添加Date header;支持HEAD请求等。
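
    下面给出一个假想的辅助函数作为示意(函数名choose_status及其接口均为示例,并非相关下载中klhttpd的实际代码),演示
如何对不支持的method返回501、对缺少Host头的HTTP/1.1请求返回400(bad request),以及让HEAD请求只回应header:

/* 示意:根据method与Host头决定回应方式(接口为假设) */
#include <string.h>
#include <stdio.h>

/* 返回需要直接发送的status行;返回0表示按200正常流程处理。body_needed指示是否需要发送body */
static const char *choose_status( const char *method, const char *version,
                                  int has_host, int *body_needed )
{
    *body_needed = 1;
    if( strcmp( method, "GET" ) != 0 && strcmp( method, "HEAD" ) != 0 )
    {
        *body_needed = 0;
        return "HTTP/1.1 501 Not Implemented\r\n\r\n";  /* 不支持的method */
    }
    if( strcmp( version, "HTTP/1.1" ) == 0 && !has_host )
    {
        *body_needed = 0;
        return "HTTP/1.1 400 Bad Request\r\n\r\n";      /* HTTP/1.1必须带Host头 */
    }
    if( strcmp( method, "HEAD" ) == 0 )
        *body_needed = 0;                               /* HEAD只回header,不回body */
    return 0;
}

int main( void )
{
    int need_body;
    const char *status = choose_status( "POST", "HTTP/1.1", 1, &need_body );
    printf( "%s", status ? status : "200 OK path\n" );
    return 0;
}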

    相关下载中我提供了一个完整的httpd library,纯C的代码,在其上加上一层资源载入即可实现一个简单的httpd。在这里我将
对代码做简要的说明:
    evbuffer.h/buffer.c : 取自libevent的buffer,用于缓存数据;
    klhttp-internal.h/klhttp-internal.c :主要用于处理/解析HTTP请求,以及创建回应报文;
    klhttp-netbase.h/klhttp-netbase.c :对socket api的一个简要封装,使用select模型;
    klhttp.h/klhttp.c :库的最上层,应用层主要与该层交互,这一层主要整合internal和netbase。
    test_klhttp.c :一个测试例子。

相关下载:
klhttpd
文中相关代码

参考资料:

http://www.w3.org/Protocols/rfc2616/rfc2616.html
http://jmarshall.com/easy/http/
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html


互联网的历史(1969-2009)

2010年1月21日

 

原文作者:Cameron Chapman
原文链接:The History of the Internet in a Nutshell
译者:jcky
译文链接:http://www.yeeyan.com/articles/view/jcky/70880

如果你正在读这篇文章,那么很可能你花费了很多时间在网上。然而,考虑到互联网在我们日常生活中的影响,又有多少人了解互联网是如何起家的呢?

下面是互联网的一个简史,包括重要的日期、人物、项目、网站以及其它可以让你至少明白我们现在称作互联网的到底是什么或者是从那里来的。

由于互联网完整的历史需要几本书来写,这篇文章只是让你熟悉在1969-2009年期间互联网的里程碑和给互联网带来变革和进步的事件。


1969:阿帕网(Arpanet )

阿帕网是第一个使用包交换技术(当时来说,这是一项新技术)的真实网络。1969年10月29日,斯坦福大学和加州大学洛杉矶分校的计算机首次连接了起来。实际上,它们就是未来互联网最早的两台主机。

在网络上发送的第一条消息应该是"Login",但据报道,在发送字母"g"的时候,连接断了。


1969: Unix

60年代的另一个重要的里程碑是开始使用Unix:一个在设计上对Linux和FreeBSD(当今,在网络服务器和网络主机提供商中最有名的操作系统)产生重大影响的操作系统。


1970:阿帕网络(Arpanet network)

1970年,哈佛大学、麻省理工学院和BBN公司(该公司制造了用于连接网络的"接口信息处理器")之间建立起了阿帕网网络。


1971: 电子邮件

电子邮件于1971年首次被 Ray Tomlinson开发出来,他也是那个决定使用"@"符号将用户名和电脑名字(后来变成了域名)分开的人。


1971: 古登堡计划和电子图书

1971年,最引人注目的开发是古登堡计划的开始。对于那些不熟悉这个网站的人来说,古登堡计划是一个全球性的努力,它的目的是将那些公共领域的书籍做成各种格式的电子书并供免费使用。


事情起因于Michael Hart获得了大量的计算机使用时间,他意识到未来计算机的价值不仅在于计算本身,还包括信息的存储、检索和查找,而这些在当时只有图书馆才能提供。他手工键入(当时没有光学字符识别设备)了《独立宣言》,并由此启动了古登堡计划,以便让书籍中的信息也能以电子形式广泛传播。事实上,这就是电子书的诞生之日。


1972: CYCLADES(法国的网络)

法国于1972年开始建立自己的类似于阿帕网的项目,叫做CYCLADES。虽然CYCLADES最终关闭了,但是它的一个想法很关键:主机只负责数据的传输而不是网络本身。


1973: 第一个跨大西洋的连接和电子邮件的普及

1973年,阿帕网第一次跨过了大西洋,和英国伦敦的一所大学连了起来。同一年,电子邮件占所有网络活动的75%。


1974: TCP/IP协议的诞生

1974年是具有突破性的一年。当年提出了一项提议,要将各个类似阿帕网的网络连接成一个所谓的"互联网络(inter-network)",它没有中央控制,并在传输控制协议(最终演变成了TCP/IP协议)的规范下工作。


1975: 电子邮件客户端

随着电子邮件的流行,1975年,南加州大学的程序员John Vittal开发了第一个现代电子邮件程序。这个程序(叫MSG)在技术上的最大进步是增加了 "回复"和"转发"功能。


1977: 电脑上的调制解调器

1977年是我们今天所知道的互联网发展的重要一年。这一年,Dennis Hayes和Dale Heatherington开发了PC调制解调器,并将其推介和销售给计算机爱好者。


1978: 电子公告栏系统 (BBS)

在1978年的一次暴风雪期间,第一个公告栏系统(BBS)诞生了。


1978:垃圾邮件的诞生

1978年,不请自来的商业电子邮件(后来称为垃圾)第一次诞生,Gary Thuerk给加利福尼亚的600个用户发了垃圾邮件。

 

1979: MUD – 最早的多角色游戏

魔兽世界和第二人生的前身早在1979年就出现了,它被称作MUD(多用户地牢,MultiUser Dungeon的简称)。MUD是完全基于文本的虚拟世界,将角色扮演游戏、互动小说和在线聊天结合在了一起。


1979: 新闻组(Usenet)

新闻组(Usenet)也于1979年由两名研究生创建。 新闻组是一个基于互联网的讨论系统,来自世界各地的人们可以在相关的新闻组中发布、公开信息,并就某一主题进行讨论。


1980: 查询软件

欧洲核研究组织(也就是更广为人知的CERN)开发了ENQUIRE(由 Tim Berners-Lee编写),这是一个用超文本写的程序,世界各地物理实验室的科学家可以利用超文本(超连接)跟踪人、软件和项目。


1982: 第一个表情

虽然很多人认为表情符号是Kevin MacKenzie于1979年发明的,但真正在1982年提出在笑话后面使用:-)的是Scott Fahlman,而不是MacKenzie最初提议的-)。现代的表情符号从此诞生了。


1983: 阿帕网上的计算机通过TCP/IP交换数据

1983年1月1日,阿帕网上的计算机切换到Vinton Cerf等人开发的TCP/IP协议来交换数据。数百台计算机受到了这次切换的影响。域名服务器(name server)也是在1983年开发出来的。


1984: 域名系统 (DNS)

域名系统(DNS)于1984年和第一批域名服务器一起创建。域名系统很重要,因为与以前的数字地址相比,它使互联网上的地址更加人性化。DNS服务器使互联网用户可以输入一个容易记住的域名,然后由它自动转换成IP地址。


1985: 虚拟社区(Virtual communities)

1985年,WELL(全球电子链接的简称)出现了,它是至今仍在运行的最古老的虚拟社区之一,由Stewart Brand和Larry Brilliant于1985年2月创建。它最初是为了让《全球概览》的读者和作者进行交流,后来成为一个开放的、但"有文化底蕴的、高智商的"人群的聚会点。连线杂志曾一度将WELL评为"最有影响的国际在线社区"。


1986: 协议战争(Protocol wars)

所谓的协议战争开始于1986年。 当时欧洲推行开放互联系统(OSI),而美国正在利用因特网/阿帕网协议(最终取得了胜利)。


1987: 互联网在发展

到了1987年,互联网上有近三万台主机。早期的阿帕网协议将主机数量限制在1000台以内,而采用TCP/IP标准后,更大的主机规模成为了现实。


1988: IRC – 互联网中继聊天

此外,在1988年,互联网中继聊天(IRC)首次被部署,从而为今天使用的实时聊天和即时消息程序开了先河。


1988: 第一次重大的、恶意的基于互联网的攻击

第一个主要的互联网蠕虫出现于1988年。它被称为"莫里斯蠕虫",作者是Robert Tappan Morris,它造成了互联网的大面积中断。

 

1989:美国在线(AOL)诞生了

1989年,当苹果公司退出AppleLink项目后,该项目被重新命名,美国在线(AOL)就此诞生。美国在线至今仍然存在,它后来让互联网在普通用户中流行了起来。


1989: 万维网(WWW)的推出


1989年,还诞生了由Tim Berners-Lee撰写的万维网提案。它最初发表在MacWorld的3月刊上,并于1990年5月重新发布。提案的目的是说服欧洲粒子物理研究所(CERN),一个全球性的超文本系统符合CERN的最佳利益。它最初被称为"Mesh";直到Berners-Lee在1990年编写实现代码时,"万维网"这个词才诞生。

1990: 第一个商业性的拨号上网ISP

第一个商业性质的互联网拨号服务供应商The World也于1990年诞生。同年,阿帕网停止使用了。


1990: 万维网协议完成了

万维网协议的代码由 Tim Berners-Lee编写,基于他一年前提出的建议和HTML、HTTP、URL标准。


1991: 第一个网页诞生了

1991年互联网世界有很多重大创新。第一个网页被创建了,就像第一封电子邮件是用来解释什么是电子邮件一样,它的目的是解释什么是万维网。


1991: 第一个基于内容的搜索协议

同一年,第一个查找文件内容而不仅仅是查找文件名称的搜索协议诞生了,叫做Gopher


1991: MP3成为标准

也是在同一年,MP3文件格式正式被接纳为标准。 被高度压缩后的 MP3文件,后来成为通过互联网分享歌曲和整个专辑的流行格式。


1991: 第一个摄像头

这个时代的有趣的发明之一就是第一个摄像头。它 部署在剑桥大学的计算机实验室,其唯一目的是监视一个咖啡壶,使实验室用户可避免将时间浪费在一个空的咖啡壶上。


1993: Mosaic —第一个供大众使用的图形化浏览器

第一个被广泛下载的互联网浏览器是1993年开发的Mosaic。虽然Mosaic不是第一个Web浏览器,但它被认为是第一个让非技术人员也能上网的浏览器。


1993: 政府加入的乐趣

1993年,白宫和联合国网站均上线了,标志着.gov和.org域名的开始使用。


1994: 网景浏览器(Netscape Navigator)

Mosaic的第一个大的竞争对手–Netscape Navigator–在一年之后(1994)发布了。


1995: 互联网的商业化

1995年通常被认为是网络商业化的第一年。 虽然在95年之前,有一些已经上线的商业企业,但是在那一年有一些关键的事态进展。首先,SSL (Secure Sockets Layer)由网景公司开发出来了, 使在线进行金融交易(如信用卡付款)更加安全。

此外,两个主要的网上企业在同一年开始运营。"Echo Bay"上的第一笔交易在这一年完成,Echo Bay后来变成了eBay。Amazon.com也在1995年开始运营,不过它直到2001年才实现盈利。


1995: Geocities和 Vatican 的上线,还有JavaScript

这一年的其他重大进展还有新推出的Geocities(于2009年10月26日终止)。


Vatican也第一次上线。

Java和JavaScript(刚开始被其创始人 Brendan Eich称为 LiveScript,并将其作为Netscape Navigator浏览器的一部分进行了部署)在1995年首次被介绍给了大众。第二年,微软发布了ActiveX


1996: 第一个基于网络的服务(webmail)

1996年,第一个基于网络的邮件服务HoTMaiL(名字中的大写字母恰好拼出HTML)启动了。


1997: 术语"博客"出现了

虽然第一个博客有这样或那样的形式,但是"博客"这个词在1997年被第一次使用。


1998:第一个不是靠传统媒体报道的新闻

1998年,第一个以非传统媒体方式爆出的重大新闻是克林顿/莱温斯基性丑闻(也有"Monicagate"之类的绰号)。在《新闻周刊》压下这篇报道之后,The Drudge Report网站抢先发布了这条新闻。


1998: Google!

Google在1998年上线,给人们在网上搜索信息的方式带来了革命性的变革。


1998: 基于互联网的文件共享开始生根发芽

同样是在1998年,Napster公司在互联网上为音频文件的共享打开了大门。


1999: SETI@home项目

1999年,另一个更加有趣的项目上线了:SETI@home项目。该项目是一个通过互联网利用世界范围内300多万台计算机进行计算的分布式计算项目:一旦计算机进入屏幕保护状态,就意味着它处于空闲状态,项目就可以利用这些计算机的处理能力。该项目的目的是通过分析天文数据来寻找外星智能的迹象。


2000: 网络泡沫破裂

2000年是网络泡沫破裂的一年,给大批投资者造成了巨大损失。数百家公司被迫关闭,有一些还没有为他们的投资者盈利。 纳斯达克,列出了受泡沫影响的许多高科技公司,最高时达到了5,000点,然后在一天之内失去了10%的价值,并最终在2002年10月降到了谷底。


2001: 维基百科发布

在网络泡沫依然强劲的时候,维基百科于2001年启动,为聚合式的网站内容/社会化媒体铺平了道路。


2003: 网络电话(VoIP)成为主流

2003年:Skype面向大众发布,给用户提供了一个界面友好的IP语音电话。


2003: MySpace 变成了最流行的社交网络

同样是在2003年,MySpace发布。 它后来发展成为一个时期内(现在已经被Facebook取代)最流行的社交网络。


2003: CAN-SPAM Act 将垃圾邮件拒之门外

2003年的另一个重大进展是在控制未经请求的色情和营销信息方面的成果,即众所周知的CAN-SPAM Act


2004: Web 2.0

虽然早在1999年Darcy DiNucci就创造了"Web 2.0"这个词,它指的是高度互动并由用户驱动的网站和富互联网应用(RIA),但直到2004年这个词才得到广泛使用。在第一次Web 2.0大会上,John Battelle和Tim O'Reilly提出了"作为平台的网络"这一概念:应用软件构建在互联网之上,而逐渐远离桌面(桌面软件有依赖操作系统、缺乏互操作性的缺点)。


2004:社会化媒体和Digg

术语"社会化媒体"被认为首先由Chris Sharpley提出,并在同一年,"Web2.0"成为了主流概念。社会化媒体网站和网络应用允许用户创建和分享内容,并且在这个平台上可以相互交流。

Digg,一个全新的社会化新闻网站,于2004年11月推出,为诸如Reddit、Mixx和Yahoo! Buzz之类的网站开了先河。Digg对传统的发现和产生网络内容的方式产生了革命性的影响,新闻和网站链接全都由社区投票民主决定。


2004: Facebook向大学学生开放

Facebook于2004年推出,当时只是对大学生开放并叫做"The Facebook",后来,"The"被从名字中去掉了,虽然 http://www.thefacebook.com仍然存在。


2005: YouTube –大众可以分享的流视频

YouTube于2005年推出,提供免费网络在线视频存储,并给大众分享。


2006: Twitter开始推了

Twitter于2006年推出,它最初的名字是twittr(受Flickr的启发);Twitter上的第一条信息是:"just setting up my twttr"。


2007: 网络电视


Hulu在2007年首次推出,与美国广播公司、全国广播公司和Fox合资,目的是使流行的电视节目可以在网上观看。


2007: iPhone和移动网络

2007年最大的创新肯定非iPhone莫属,人们对移动网络应用和设计重新燃起的兴趣,几乎完全要归功于它。


2008: "网络选举"

"网络选举"第一次发生在2008年的美国总统选举期间。这是第一次总统候选人利用了互联网上所有可以利用的资源。希拉里.克林顿的视频在YouTube早早的就出现了。几乎每一个候选人都有Facebook页面或Twitter帐户,或者两者都有。

Ron Paul通过网络筹款,单日筹到了430万美元,创下了历史纪录;几个星期之后,他又以单日筹到440万美元的成绩打破了自己创下的纪录。

2008年的选举将政治和竞争移到了网上,这一趋势在将来没有任何改变的迹象。


2009:ICANN的政策变化

2009年互联网最大的变化之一是美国放松了长期以来的对ICANN的控制,ICANN是互联网的官方命名机构(它是域名注册的主管部门)。


创新工场董事长兼首席执行官李开复的发言稿

2009年9月7日

发 言 稿
创新工场董事长兼首席执行官 李开复
(2009年9月7日)

非常感谢各位出席今天的发布会。

大家上周肯定已经听到了我离职的消息。网上的说法很多,今天我想还是跟大家交流一下。另外,更重要的是,向大家详细说明我的下一步方向。

2009年,是我职业经理人生涯的第20个年头。从苹果、SGI到微软和谷歌,我踏踏实实地走过了20年职业经理人生涯。但过去半年来,我的心中总有一种急迫感,心中常有个声音告诉自己:是开始职业人生新篇章的时候了。经过反复思考,我决定在北京创立创新工场。

大家都在追问我,为什么?其实,我过去的20年职业生涯中,有过三次职业变动,但不变的是心中的一个追求、一个职业坐标,那就是:尽我所能,改变可以改变的一切。而过去10年来,我接近中国、接近中国青年的梦想越来越清晰。同时,我也希望用最可以掌控的方式,推动科技创新。可以说,"创新"、"中国"、"青年"是深深打动我的主旋律。所以,今天的变动看起来是一个转折,实际上却是一种自然的延续,一种跳跃中的传承。

今年4月,我与谷歌签订的四年合约期满。4月20日,公司首席执行官Eric Schmidt和我的直接上司Alan Eustace约我谈续约的事。总部对过去四年,谷歌中国取得的成绩深表满意。为了事业的稳步发展,公司自然希望中国领导层高度稳定。但对于我,公司正健康、稳步发展,功成身退,这是一个最佳时机。于是我非常坦诚地告诉他们,我可能不会续约。我的回答自然出乎他们的预料。但是,他们希望我能回去仔细考虑一下。

我原计划6月底到总部正式辞职,并且当面感谢Eric和Alan对我多年来的支持和关照。但6月18日,一件事情打乱了我的计划:谷歌中国碰到一个严峻的挑战,也就是大家都已经了解的"涉黄"案。我虽然有个人的时间表,但我也明理:绝不能在公司面临这么大危机时离开。我也坚信,我可以带领公司应对挑战。

于是,我们以最快速度过滤搜索结果里的相关链接。同时,最大程度地与政府相关部门沟通。7月中旬在和部委领导见面时,领导们一致肯定:谷歌中国在打击低俗内容方面态度认真,而且非常有效,通过了政府的测试。7月中旬开始,谷歌中国重新回到平稳发展的轨道上。

得到这个好消息后,我马不停蹄地带着团队做了一个新的品牌推广计划给总部,内容大致为:公司产品已经领先业界,万事俱备,必须在推广上下大力气、花大工夫。这份关键的报告得到批准并且开始运行之后,我才决定重提个人计划。8月初,我再次到总部和Eric、Alan和其他高管约谈我的离职计划。之前,他们已经给出了破例的条件挽留我,同时,他们看到我在六月危机时的超强度工作,以及在新推广计划设计上的呕心沥血,都以为我会接受续约的邀请。

但是,我对他们说:我非常感谢公司对我的安排。但我想我已经下定了决心,谷歌中国现在已经发展到一个平稳的阶段,而我的人生还有一个缺憾没有实现,我想去弥补它。我可能去创办一个帮助中国青年创业的创新工场,和中国青年人一起打造新奇的技术奇迹,我想用自己的主动性做一个掌控全局的工作,我想,我已经到了这个人生阶段,再不去做,我真的很怕来不及了。这也是我从心的选择。

我得到了他们真诚的祝福。今天,我仍然感谢谷歌让我学到了很多,给了我这么好的机会,在我热爱的土壤上,创办了这么一个有朝气、有创意的公司。我也非常舍不得这些员工。我对过去四年,对于公司,对于我的领导们充满感激、感恩。但是新使命的召唤,让我作出了新的选择,我必须继续向前。

我的创业计划得到好友,美国中经合集团创始人兼董事长刘宇环先生(Peter Liu)的大力支持。他执掌的中经合集团是中国风险投资的鼻祖。虽然目前整个经济依然没有走出阴影,但他爽快地投入了五百万美金,作为创新工场的启动资金。过去几周,除了刘宇环先生之外,我们还获得了柳传志、郭台铭、俞敏洪、陈士骏等业界精英的投资,确保了我们在未来5年中用8亿人民币打造创新工场的计划。同时,我也非常荣幸,获得北京市政府和中关村管委会的支持。今天,苟仲文副市长能亲临现场,是对我的肯定和鼓励。

下面,我想跟各位分享一下我的新事业 — 创新工场这个新模式。

目前,在中国,创业者通常要面临很多挑战,比如,缺少创业和管理经验、欠缺初期启动资金、难以吸引卓越技术人才等,诸多因素使得创业实际成功率并不高。而美国硅谷这个创业者的摇篮,已经形成了从天使投资到中、后期风险投资的完整的、流水线式的体系。一个好的想法哪怕是在概念验证期都可以获得10~50万美元天使资金的支持。而天使投资人又往往是前成功创业者或公司高管,有着深厚的业界经验和背景,除了资金之外能够给创新公司带来他们的视野和业界关系,导入专业人员和管理经验,之后风险投资会对天使项目后期跟进。2008年,美国的天使投资占风险投资总体盘子的40%至50%,共有26万多个活跃的天使投资人和组织。然而在中国,虽然2000年以来风险投资快速发展,但是天使投资群体还未成熟,天使投资领域仍是一大块空白。风险投资苦于找不到足够多的、后期的好项目,创业者苦于找不到足够的早期支持。

像美国这样一个大规模的天使投资群体在中国不是能够一蹴而就的,创新工场的诞生就是要填补这个空白,带来规模化、产业化的天使投资,用一套完整、成熟的体系,甄选出最优秀的创意、创业者、工程师,把每一个创业环节和资源进行最佳整合,帮助创业者,确保其初期的良性发展。我们希望开创出最有市场价值和商业潜力的项目,加以研发和市场运营。当项目成熟到一定程度后,自然剥离母公司成为独立子公司,直至最后上市或被收购。我希望能够建立"天使投资 + 创新产品构建"这样一个全新的创业投资模式。

创新工场将立足信息产业最热门领域:即互联网、移动互联网和云计算,选择相关技术作为创业起点。

创新工场的主要预期分为三部分:从人才培养的角度,我们希望看到越来越多的青年在我们的帮助下实现梦想、创造奇迹;从公司商业运作的角度,希望每年能够孵化出3-5个成熟的公司,看到他们一天天的成长;从投资者的角度,我们希望在几年之后就能够有对投资者有优厚的回报。

我相信,我的技术背景、我过去20年职业经理人生涯积累的工作经验和产业关系、我在业内人才中的影响力、国际化的视野和沟通能力能够很好的帮助创业者建立团队、建立团队文化、提升领导力、提高创业成功率。这也是我们创建创新工场的初衷。

提出离职后,我给潜在的投资人做过几场有关创新工场商业模式的报告。我觉得我是幸运的,仅仅一个月就完成了这么大型的融资,获得了业界这么多知名人士的鼎力支持。当然,这一方面表明投资者对我的信任,但更重要的,反映的是他们对创新工场这个商业模式是看好的,是充满信心的。这样的认可,对于我和创新工场是莫大的鼓舞。

说到创新工场,我觉得,我有太多太多想跟大家分享。今天时间有限,我先介绍到这里。

今天,我想利用这个机会,特别感谢各位过去多年来对我持续的关注、支持和帮助。在我面临重大危机的时候,经常得到你们真诚的支持和鼓励。有时,面临互联网上的谣言,因为有你们这些专业的、正直的媒体的存在,让我感到温暖,感觉未来是有希望的。我希望以后你们会一如既往地关注我和我的工作。我们的友情是永远的。

诸位肯定已经看见了,资料袋里有我特别赠送的珍藏版新书《世界因我不同 – 李开复自传》。书中真实记录了我20年来从业的心路历程、我的信念、我对未来的期许。这本书会在9月中旬在中国大陆公开发行。今天提前送给大家先睹为快,也表示我对大家的一个特别感谢!

谢谢大家!
 


ISAPI_rewrite中文手册及多站点配置方法

2009年9月3日

在NT、2000、XP和2003平台上,过滤器会在系统帐户下随INETINFO进程以IIS5兼容模式运行。因此系统帐户应至少拥有对所有ISAPI_Rewrite DLL和所有httpd.ini文件的读权限;我们也建议给予系统帐户对所有包含httpd.ini文件的文件夹的写权限,这样才能生成httpd.parse.errors文件,该文件会记录配置文件中的语法错误。代理(proxy)模块还需要额外的权限,因为它运行在连接池或高隔离(high-isolated)应用模式下:IIS共享池帐户和高隔离池帐户应被给予对rwhelper.dll的读权限。缺省情况下,IWAM_<计算机名>帐户被用于所有的池;池帐户应借助COM+ Administration MMC管理单元在相应的COM+应用设置中建立。

配置文件格式化

配置文件有两种:全局(server-level)配置文件和独立(site-level)配置文件。全局配置文件应命名为httpd.ini并放在ISAPI_Rewrite的安装目录中(开始菜单里提供了该文件的快捷方式);独立配置文件也应命名为httpd.ini,放在虚拟站点的物理根目录中。两种文件的格式相同,都是标准的Windows .ini文件:所有指令都应放在[ISAPI_Rewrite]一节中,并且每条指令独占一行,这一节以外的任何文本都会被忽略。

HTTPD.INI文件示例
 

Code:
[ISAPI_Rewrite]

# This is a comment

# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20

# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors) / [F,I,O]

# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]

# Some custom rules
RewriteCond Host: (.+)

RewriteCond 指令

Syntax:(句法) RewriteCond TestVerb CondPattern [Flags]

这一指令定义一个条件规则。把一个或多个RewriteCond指令放在RewriteRule、RewriteHeader或RewriteProxy指令之前,后面的那条规则只有在其模式匹配URI的当前状态、并且这些附加条件也都满足时才会被应用(配合RewriteRule的用法见本节末尾的示例)。

TestVerb

Specifies verb that will be matched against regular expression.
指定将与正则表达式进行匹配的动词(verb)。

TestVerb=(URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:

URL – returns Request-URI of client request as described in RFC 2068 (HTTP 1.1);
返回客户端在RFC2068中描述的需求的Request-URI
METHOD – returns HTTP method of client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
返回客户端需求(OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE)的HTTP方法
VERSION – returns HTTP version;
返回HTTP版本
HTTPHeaderName – returns value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names should include the trailing colon ":". If specified header does not exists in a client’s request TestVerb is treated as empty string.
返回指定HTTP头的值。HTTPHeaderName可以是任何有效的HTTP头名称,头名称须包含结尾的冒号":";如果客户端请求中不存在指定的头,TestVerb将被视为空字符串。

Code:
HTTPHeaderName =
Accept:
Accept-Charset:
Accept-Encoding:
Accept-Language:
Authorization:
Cookie:
From:
Host:
If-Modified-Since:
If-Match:
If-None-Match:
If-Range:
If-Unmodified-Since:
Max-Forwards:
Proxy-Authorization:
Range:
Referer:
User-Agent:
Any-Custom-Header

关于HTTP头及其取值的更多信息,请参考RFC2068。

ServerVariable – 返回指定服务器变量的值,例如服务器端口。完整的服务器变量列表可在IIS文档中找到,变量名前须加%符号;
CondPattern
The regular expression to match TestVerb
用于匹配TestVerb的正则表达式
 

Code:
[Flags]
Flags is a comma-separated list of the following flags:

O (nOrmalize)

Normalizes string before processing. Normalization includes removing of an URL-encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers
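
下面是一个把RewriteCond和RewriteRule配合使用的小例子(域名example.com和路径均为假设值,仅作示意):只有当请求的Host头匹配时,后面的那条规则才会被应用。

Code:
[ISAPI_Rewrite]

# 仅当Host头是(www.)example.com时,才把/news重写到/news.asp
RewriteCond Host: (?:www\.)?example\.com
RewriteRule /news /news.asp [I,L]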

RewriteRule 指令

Syntax: RewriteRule Pattern FormatString [Flags]
这个指令可以出现多次,每条指令定义一条单独的重写规则。这些规则的定义顺序很重要,因为运行时就是按这个顺序应用规则的(常用标记的组合示例见本节末尾)。

I (ignore case)

不管大小写强行指定字符匹配,这个FLAG影响RewriteRule指令和相应的RewriteCond 指令

F (Forbidden)

对客户端做反应,停止REWRITING进程并且发送403错误,注意在这种情况下FORMATSTRING 是无用的并可以设置为任何非空字符串。

L (last rule)

在此处停止重写过程,不再应用后续的重写规则。使用这个标记可以防止当前已被重写的URI再被后面的规则重写。

N (Next iteration)

强制重写引擎修改规则目标并从头重新开始规则检查(所有修改都会保留)。重启次数受RepeatLimit指令指定的值限制,超过该值后N标记将被忽略。

NS (Next iteration of the same rule)

与N标记类似,但从同一条规则开始重启规则处理(即强制重复应用同一条规则)。同一条规则重复应用的最大次数由RepeatLimit指令指定。

P (force proxy)

强制将目标URI在内部作为代理请求处理,并立即通过ISAPI扩展来处理该代理请求。必须确保结果字符串是一个有效的URI(包括协议、主机等),否则代理将返回错误。

R (explicit redirect)

强制服务器对客户端发出重定向指示即时应答,提供目的URI的新地址,重定向规则经常是最后规则

RP (permanent redirect)

几乎和[R]标记相同但是发布301HTTP状态而不是302HTTP状态代码

U (Unmangle Log)

在日志中记录原始请求的URI,而不是重写后的URI。

O (nOrmalize)

在处理之前先对字符串进行标准化。标准化包括去除URL编码、移除非法字符等,这个标记对URL和URL编码的头很有用。

CL (Case Lower)

小写

CU (Case Upper)

大写
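
作为示意,下面的两条规则展示了几个常用标记的组合用法(路径均为假设):第一条用[I,R]做不区分大小写的显式重定向,第二条用[I,L]在重写后停止应用后续规则。

Code:
[ISAPI_Rewrite]

# I+R:不区分大小写,把旧路径显式重定向到新路径
RewriteRule /oldpage\.htm /newpage.htm [I,R]

# I+L:重写到实际脚本后停止应用后续规则
RewriteRule /download/(.+) /getfile.asp\?name=$1 [I,L]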

RewriteHeader directive
 

Code:
Syntax: RewriteHeader HeaderName Pattern FormatString [Flags]

这个指令是RewriteRule的更一般化变体:它不仅可以重写客户端请求中的URL部分,还可以重写任意HTTP头。这个指令可用于重写、生成、删除任何HTTP头,甚至改变客户端请求的方法。

HeaderName

指定将被重写的客户头,可取的值与 RewriteCond 指令中TestVerb参数相同

Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表

I (ignore case)
强制字符匹配不区分大小写。这个标记同时影响本指令和相应的RewriteCond指令。
F (Forbidden)
向客户端返回403错误并停止重写过程。注意这种情况下FormatString没有意义,可设置为任意非空字符串。
L (last rule)
在此处停止重写过程,不再应用后续的重写规则。使用这个标记可以防止当前已被重写的URI再被后面的规则重写。
N (Next iteration)
强制重写引擎修改规则目标并从头重新开始规则检查(所有修改都会保留)。重启次数受RepeatLimit指令指定的值限制,超过该值后N标记将被忽略。

NS (Next iteration of the same rule)
与N标记类似,但从同一条规则开始重启规则处理(即强制重复应用同一条规则)。同一条规则重复应用的最大次数由RepeatLimit指令指定。

R (explicit redirect)
强制服务器立即向客户端返回重定向回应,给出目标URI的新地址。重定向规则通常应作为最后一条规则。
RP (permanent redirect)
与[R]标记几乎相同,但返回301而不是302 HTTP状态码。
U (Unmangle Log)
在日志中记录原始请求的URI,而不是重写后的URI。
O (nOrmalize)
在处理之前先对字符串进行标准化。标准化包括去除URL编码、移除非法字符等,这个标记对URL和URL编码的头很有用。
CL (Case Lower)
转换为小写。
CU (Case Upper)
转换为大写。

要移除某个头,FormatString模式应生成一个空字符串。例如,下面这条规则将从客户端请求中移除User-Agent头:
RewriteHeader User-Agent: .* $0
而下面这两条指令将把Old-URL头加入请求中:
RewriteCond URL (.*)
RewriteHeader Old-URL: ^$ $1
最后一个例子通过改变请求方法,把所有WebDAV请求定向到/webdav.asp:
 

Code:
RewriteCond METHOD OPTIONS
RewriteRule (.*) /webdav.asp?$1
RewriteHeader METHOD OPTIONS GET
RewriteProxy directive
Syntax: RewriteProxy Pattern FormatString [Flags]

强制将目标URI在内部作为代理请求处理,并立即通过ISAPI扩展来处理该代理请求。这使得IIS可以充当代理服务器,把请求重新路由到其他站点和服务器。
Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表
D (Delegate security)
代理模块将尝试以当前被模拟(impersonated)的用户身份登录远程服务器。
C (use Credentials)
代理模块将尝试以URL或Basic授权头中指定的凭据登录远程服务器。使用这个标记时,可以在URL中采用http://user:password@host.com/path/ 的语法。
F (Follow redirects)
缺省情况下ISAPI_Rewrite 将试图将MAP远程服务器返回的重定向指令到本地服务器命名空间,如果远程服务器返回重定向点到那台服务器其他的某个位置,ISAPI_Rewrite 将修改这一重定向指令指向本服务器名,这将避免用户看到真实(内部)服务器名称
使用F标记强制代理模式内部跟踪远程服务器返回的重定向指令,使用这个标记如果你根本不需要接受远程服务器的重定向指令,在WINHTTP设置中有重定向限制以避免远程重定向循环

I (ignore case)
不管大小写强行指定字符匹配
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CacheClockRate directive
Syntax: CacheClockRate Interval
这个指令只能出现在GLOBAL配置文件中,如果出现在SITE-LEVEL配置中将被忽略,并把错误信息写入httpd.parse.errors文件。
ISAPI_Rewrite会在配置第一次被加载时将其缓存。使用这个指令可以限定一个站点的配置在多久不活动之后被清理出缓存;把这个参数设置得足够大,可以让ISAPI_Rewrite实际上永不清理缓存。注意任何配置文件的改变都会在下次请求时立即生效,而不受这个周期影响。
Interval
限定特定配置被清理出缓存前的不活动时间(以秒计),缺省值3600(1小时)。
EnableConfig and DisableConfig directives
Syntax:
EnableConfig [SiteID|"Site name"]
DisableConfig [SiteID|"Site name"]
对所选站点激活或不激活SITE-LEVEL配置或者改变缺省配置,缺省SITE-LEVEL配置不激活,这个指令只出现在GLOBAL配置内容中
SiteID
Numeric metabase identifier of a site

Site name
Name of the site as it appears in the IIS console
不用参数使用这个命令将改变缺省配置到ENABLE/DISABLE配置进程

例子

下面的例子将使配置处理仅对ID为1的站点(通常是缺省站点)和名为"My site"的站点生效:
 

Code:
DisableConfig
EnableConfig 1
EnableConfig"My site"

下面的例子将只为名为"Some site"的站点启用配置处理,因为针对单个站点的设置会覆盖缺省设置:
 

Code:
EnableConfig"Some site"
DisableConfig
EnableRewrite and DisableRewrite directives
Syntax:
EnableRewrite [SiteID|"Site name"]
DisableRewrite [SiteID|"Site name"]

对所选站点激活或不激活重写或者改变缺省配置,缺省重写配置激活,这个指令只出现在GLOBAL配置内容中
 

Code:
SiteID
Numeric metabase identifier of a site

Site name
Name of the site as it appears in the IIS console.

不使用参数这个命令将全部激活或者不激活

RepeatLimit directive

Syntax: RepeatLimit Limit
这个指令可以出现在GLOBAL和SITE-LEVEL配置文件中。如果出现在GLOBAL配置文件中,将改变对所有站点的全局限制;出现在SITE-LEVEL配置中,则只改变对该站点的限制,并且这个限制不能超过GLOBAL限制。
ISAPI_Rewrite在执行规则时允许循环,这个指令用来限制最大循环次数;可以设置为0或1来禁止循环。
LIMIT
限制最大循环次数,缺省32。

RFStyle directive

Syntax: RFStyle Old | New

Configuration Utility
ISAPI_Rewrite Full版包含一个配置工具(可以在ISAPI_Rewrite程序组中启动),它允许你查看试用状态并输入注册码(如果安装过程中没有注册),并调整与代理模块操作相关的部分产品功能。该工具是由三个页面组成的属性表。

Trial page允许你查看试用(trial)状态并输入注册码(如果在安装过程中没有注册)。

Settings page

这页包含对下列参数的编辑框

Helper URL

这个参数影响过滤器和代理模块之间的通信方式,它既可以是以点作前缀的文件扩展名(如 .isrwhlp),也可以是绝对路径。

第一种情况下,该扩展名会被追加在初始请求URI之后,代理模块将通过script map被激活。缺省扩展名.isrwhlp在安装过程中已加入global script map;如果你改变了这个扩展名,或者你的应用不继承global script map设置,你应该手动向script map添加所需的条目,该条目应使用如下参数:
 

Code:
Executable: An absolute path to the rwhelper.dll in the short form
Extension: Desired extension (.isrwhlp is default)
Verbs radio button: All Verbs
Script engine checkbox: Checked
Check that file exists checkbox: Unchecked

我们提供了一个WSH脚本proxycfg.vbs,用来简化在script map中注册的过程。它位于安装文件夹中,可以按如下方式在命令行运行:

cscript proxycfg.vbs [-r] [MetabasePath]
Optional -r 强制注册扩展名
Optional MetabasePath parameter allows specification of the first metabase key to process. By default it is "/localhost/W3SVC".
要在所有现存的 script maps 中注册你可以以如下命令行激活 script
cscript proxycfg.vbs -r
第二种情况下,你应该提供一个URI作为'Helper URL'的值,并把ISAPI_Rewrite的安装文件夹映射为每一个站点的虚拟目录。
注意:根据用户反馈,IIS5(也许包括IIS4)对长目录名有问题,所以我们强烈建议使用短目录名。
Worker threads limit
这个参数限制在代理扩展线程池中工作线程数,缺省为0意味着这个限制等于处理器数量乘以2
Active threads limit
这个参数限制当前运行线程数,这个数量不可大于"Worker threads limit". 缺省0意思是等于处理器数量
Queue size 这个参数定义最大请求数量,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Queue timeout
这个参数定义你在内部请求队列中防止新请求的最大等待时间,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Connect timeout
以毫秒设定代理模块连接超时
Send timeout
以毫秒设定代理模块发送超时
Receive timeout
以毫秒设定代理模块接收超时
About page.
It contains copyright information and a link to the ISAPI_Rewrite’s web site.

Regular expression syntax

这一部分介绍ISAPI_Rewrite所使用的正则表达式语法。

Literals

除 ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" 和 "$" 之外,所有字符都按字面含义(literal)处理;上述字符用"\"转义后同样按字面处理。字面字符指匹配其自身的字符。

Wildcard
The dot character "." matches any single character except null character and newline character
以下为句法

Repeats

A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.

Examples:
 

Code:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Non-greedy repeats
Non-greedy repeats are possible by appending a ‘?’ after the repeat; a non-greedy repeat is one which will match the shortest possible string.

For example to match html tag pairs one could use something like:
 

Code:
"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"

In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string.

Parenthesis
Parentheses serve two purposes, to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". All sub matches marked by parenthesis can be back referenced using \N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1, sub-expression 0 is the whole expression.

Non-Marking Parenthesis
Sometimes you need to group sub-expressions with parenthesis, but don’t want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:

"(?:abc)*"

Alternatives
Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.

Examples:

"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the compliment of the elements that follow.

Examples:

Character literals:

"[abc]" will match either of "a", "b", or "c".
"[^abc] will match any character other than "a", "b", or "c".
Character ranges:

"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:

alnum Any alpha numeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character – all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255, this applies to the wide character traits classes only.

There are some shortcuts that can be used in place of the character classes:

\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.

Equivalence classes
Equivalenceclassestakethegeneralform[=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].

To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.

Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.

Back references
A back reference is a reference to a previous sub-expression that has already been matched, the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match, matches the null string. In ISAPI_Rewrite all back references are global for entire RewriteRule and corresponding RewriteCond directives. Sub matches are numbered up to down and left to right beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.

Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:

"(?=abc)" matches zero characters only if they are followed by the expression "abc".
"(?!abc)" matches zero characters only if they are not followed by the expression "abc".

Word operators
The following operators are provided for compatibility with the GNU regular expression library.

"\w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
"\W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
Escape operator
The escape character "\" has several meanings.

The escape operator may introduce an operator for example: back references, or a word operator.
The escape operator may make the following character normal, for example "\*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:

Escape sequence Character code Meaning
\a 0x07 Bell character.
\t 0x09 Tab character.
\v 0x0B Vertical tab.
\e 0x1B ASCII Escape character.
\0dd 0dd An octal character code, where dd is one or more octal digits.
\xXX 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ z-@ An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for ‘@’.

Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of \l \L \u and \U:

Escape sequence Meaning
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to ‘.’.
\X Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
\Q The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
\E The end quote operator, terminates a sequence begun with \Q.
What gets matched?
The regular expression will match the first possible matching string, if more than one string starting at a given location can match then it matches the longest possible string. In cases where their are multiple possible matches all starting at the same location, and all of the same length, then the match chosen is the one with the longest first sub-expression, if that is the same for two or more matches, then the second sub-expression will be examined and so on. Note that ISAPI_Rewrite uses MATCH algorithm. The result is matched only if the expression matches the whole input sequence. For example:

RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expressions engine Regex++ written by Dr. John Maddock. But as any real thing it’s not ideal: There exists some "pathological" expressions which may require exponential time for matching; these all involve nested repetition operators, for example attempting to match the expression "(a*a)*b" against N letter a’s requires time proportional to 2N. These expressions can (almost) always be rewritten in such a way as to avoid the problem, for example "(a*a)*b" could be rewritten as "a*b" which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N2, however if the clauses are mutually exclusive then they can be matched in linear time – this is the case with "a*b", for each character the matcher will either match an "a" or a "b" or fail, where as with "a*a" the matcher can’t tell which branch to take (the first "a" or the second) and so has to try both.

Boost 1.29.0 Regex++ could detect "pathological" regular expressions and terminate theirs matching. When a rule fails ISAPI_Rewrite sends "500 Internal Server error – Rule Failed" status to a client to indicate configuration error. Also the failed rule is disabled to prevent performance losses
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".

To use any of these as literals you must prefix them with the escape character
The following special sequences are recognized:

Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use \( and \) to represent literal ‘(‘ and ‘)’.

Sub-expression expansions:
The following perl like expressions expand to a particular matched sub-expression:

$` Expands to all the text from the end of the previous match to the start of the current match, if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
$’ Expands to all the text from the end of the match to the end of the input string.
$& Expands to all of the current match.
$0 Expands to all of the current match.
$N Expands to the text that matched sub-expression N.

Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:

?Ntrue_expression:false_expression

Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.

Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.

Escape sequences:
The following escape sequences are also allowed:

\a The bell character.
\f The form feed character.
\n The newline character.
\r The carriage return character.
\t The tab character.
\v A vertical tab character.
\x A hexadecimal character – for example \x0D.
\x{} A possible unicode hexadecimal character – for example \x{1A0}
\cx The ASCII escape character x, for example \c@ is equivalent to escape-@.
\e The ASCII escape character.
\dd An octal character constant, for example \10

Examples例子

Emulating host-header-based virtual sites on a single site
例如你注册了两个域名www.site1.com和www.site2.com,现在你希望用单个物理站点承载两个不同的站点。把以下规则加入到你的httpd.ini文件:
 

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

#Emulate site1
RewriteCond Host: (?:www\.)?site1\.com
RewriteRule (.*) /site1$1 [I,L]

#Emulate site2
RewriteCond Host: (?:www\.)?site2\.com
RewriteRule (.*) /site2$1 [I,L]

现在你可以把你的站点放在/site1 和 /site2 目录中.

或者你可以应用更多的类规则:
 

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

RewriteCond Host: (www\.)?(.+)
RewriteRule (.*) /$2$3

此时各站点对应的目录应命名为 /somesite1.com、/somesite2.info 等。
Using loops (Next flag) to convert request parameters
假如你希望物理URL(如 http://www.myhost.com/foo.asp?a=A&b=B&c=C)能够通过形如 http://www.myhost.com/foo.asp/a/A/b/B/c/C 的请求来访问,并且参数的数量在不同请求之间可能变化。

至少有两个解决办法。你可以简单的为每一可能的参数数量添加一个分隔规则或者你可以使用一个技术说明如下面的例子
 

Code:
[ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^/]*)?/([^/]*)/([^/]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]

这个规则将从请求的URL中抽取一个参数,追加到查询字符串的末尾,并从头重启规则处理。因此它会循环执行,直到所有参数都被移动到适当的位置,或者超过RepeatLimit为止。
这个规则还有许多变种,只是使用不同的分隔字符。例如,对于形如http://www.myhost.com/foo.asp~a~A~b~B~c~C 的URL,可以应用下面的规则:
 

Code:
[ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^~]*)?~([^~]*)~([^~]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
Running servers behind IIS

假如我们有一个内网服务器运行IIS而几个公司服务器运行其他平台,这些服务器不能从INTERNET直接进入,而只能从我们公司的网络进入,有一个简单的例子可以使用代理标记映射其他服务器到IIS命名空间:

 

Code:
[ISAPI_Rewrite]
RewriteProxy /mappoint(.+) http\://sitedomain$1 [I,U]
Moving sites from UNIX to IIS

这个规则可以帮助你把URL从 /~username 改变到 /username 和从 /file.html 改变到 /file.htm. 这个在你仅仅把你的站从UNIX移动到IIS并且保持搜索引擎和其他外部页面对老页面的连接时是有用的

 

Code:
[ISAPI_Rewrite]

#redirecting to update old links
RewriteRule (.*)\.html $1.htm
RewriteRule /~(.*) http\://myserver/$1 [R]

Moving site location

许多网管都问过这样的问题:如何把所有请求重定向到一个新的Web服务器。当你建立了一个新站点来取代老站点时经常会遇到这种需求,解决方案是在老服务器上使用ISAPI_Rewrite:
 

Code:
[ISAPI_Rewrite]

#redirecting to update old links
RewriteRule (.+) http\://newwebserver$1 [R]

Browser-dependent content
Dynamically generated robots.txt

robots.txt是搜索引擎用来了解哪些内容可以被索引的文件,但是为一个有大量动态内容的大型站点维护这个文件是件很麻烦的事。我们可以写一个robots.asp脚本来动态生成它的内容。

现在使用单一规则生成 robots.txt
 

Code:
[ISAPI_Rewrite]

RewriteRule /robots\.txt /robots.asp
Making search engines to index dynamic pages

站点的内容存储在XML文件中,在服务器上有一个/XMLProcess.asp 文件处理XML文件并返回HTML到最终用户,URLS到文档有如下形式
http://www.mysite.com/XMLProcess.asp?xml=/somdir/somedoc.xml
但是许多公共引擎不能索引此类文档,因为URLS包含问号(文档动态生成),
ISAPI_Rewrite可以完全消除这个问题
 

Code:
[ISAPI_Rewrite]

RewriteRule /doc(.*)\.htm /XMLProcess.asp\?xml=$1.xml

现在可以使用形如http://www.mysite.com/doc/somedir/somedoc.htm的URL来访问文档,搜索引擎不会知道somedoc.htm这个文件实际上并不存在、内容是动态生成的。

Negative expressions (NOT)

有时你需要在模式不匹配时才应用规则,这种情况下可以使用正则表达式中所谓的前瞻断言(Forward Lookahead Asserts)。

例如你需要把所有不使用IE的用户重定向到别的地点:
 

Code:
[ISAPI_Rewrite]
# Redirect all non Internet Explorer users
# to another location
RewriteCond User-Agent: (?!.*MSIE).*
RewriteRule (.*) /nonie$1
Dynamic authentification

例如我们在站点上有一些成员域,我们在这个域上需要密码保护文件而我们不喜欢用BUILT-IN服务器安全,这个情况下可以建立一个ASP脚本(称为proxy.asp),这个脚本将代理所有请求到成员域并且检查请求允许,这里有一个简单的模板你可以放进你自己的授权代码

现在我们要通过配置 ISAPI_Rewrite 通过这个页面代理请求:
 

Code:
[ISAPI_Rewrite]
# Proxy all requests through proxy.asp
RewriteRule /members(.+) /proxy.asp\?http\://mysite.com/members$1
Blocking inline-images (stop hot linking)

假设我们在http://www.yundong78.com/下有些页面内嵌了一些GIF图片,其他站点未经协商就把这些图片盗链到了他们自己的页面上。我们不喜欢这样,因为这会加大我们服务器的流量。
虽然我们无法100%地保护这些图片,但至少可以在浏览器会发送HTTP Referer头的情况下限制这种行为:
 

Code:
[ISAPI_Rewrite]
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif [I,O]

多站点配置

只需要将httpd.ini文件放置到相应站点的根目录下即可.
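
例如(站点与路径均为假设),放在某个站点根目录下的站点级httpd.ini可以只包含针对该站点的规则:

Code:
[ISAPI_Rewrite]

# 该站点自己的规则:把/article/<id>.html重写到article.asp
RewriteRule /article/(\d+)\.html /article.asp\?id=$1 [I,L]

(按照上文EnableConfig一节,站点级配置缺省是关闭的,需要先在全局配置中启用。)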


URL Rewriting Using ISAPI_Rewrite

2009年9月3日


by Cristian Darie and Jaimie Sirovich

"Click me!" If the ideal URL could speak, its speech would resemble the communication of an experienced salesman. It would grab your attention with relevant keywords and a call to action; and it would persuasively argue that one should choose it instead of the other one. Other URLs on the page would pale in comparison.

URLs are more visible than many realize, and a contributing factor in CTR. They are often cited directly in copy, and they occupy approximately 20% of the real estate in a given search engine result page. Apart from "looking enticing" to humans, URLs must be friendly to search engines. URLs function as the "addresses" of all content in a web site. If confused by them, a search engine spider may not reach some of your content in the first place. This would clearly reduce search engine friendliness.

So let’s enumerate all of the benefits of placing keywords in URLs:

1. Doing so has a small beneficial effect on search engine ranking in and of itself.

2. The URL is roughly 20% of the real estate you get in a SERP result. It functions as a call to action and increases perceived relevance.

3. The URL appears in the status bar of a browser when the mouse hovers over anchor text that references it. Again-it functions as a call to action and increases perceived relevance.

4. Keyword-based URLs tend to be easier to remember than ?ProductID=5&CategoryID=2.

5. Query keywords, including those in the URL, are highlighted in search result pages.

6. Often, the URL is cited as the actual anchor text, that is:

<a href="http://www.example.com/foo.html">http://www.example.com/foo.html</a>

Obviously, a user is more likely to click a link to a URL that contains relevant keywords, than a link that does not. Also, because keywords in anchor text are a decisive ranking factor, having keywords in the URL-anchor-text will help you rank better for "foos."

To sum up these benefits in one phrase:

Keyword-rich URLs are more aesthetically pleasing and more visible, and are likely to enhance your CTR and search engine rankings.

Implementing URL Rewriting

The hurdle we must overcome to support keyword-rich URLs like those shown earlier is that they don’t actually exist anywhere in your web site. Your site still contains a script-named, say, Product.aspx-which expects to receive parameters through the query string and generate content depending on those parameters. This script would be ready to handle a request such as this:

http://www.example.com/Product.aspx?ProductID=123

but your web server would normally generate a 404 error if you tried any of the following:

http://www.example.com/Products/123.html

http://www.example.com/my-super-product.html

URL rewriting allows you to transform the URL of such an incoming request (which we’ll call the original URL) to a different, existing URL (which we’ll call the rewritten URL), according to a defined set of rules. You could use URL rewriting to transform the previous nonexistent URLs to Product.aspx?ProductID=123, which does exist.

If you happen to have some experience with the Apache web server, you probably know that it ships by default with the mod_rewrite module, which is the standard way to implement URL rewriting in the LAMP (Linux/Apache/MySQL/PHP) world. That is covered in the PHP edition of this book.

Unfortunately, IIS doesn’t ship by default with such a module. IIS 7 contains a number of new features that make URL rewriting easier, but it will take a while until all existing IIS 5 and 6 web servers will be upgraded. Third-party URL-rewriting modules for IIS 5 and 6 do exist, and also several URL-rewriting libraries, hacks, and techniques, and each of them can (or cannot) be used depending on your version and configuration of IIS, and the version of ASP.NET. In this chapter we try to cover the most relevant scenarios by providing practical solutions.

To understand why an apparently easy problem-that of implementing URL rewriting-can become so problematic, you first need to understand how the process really works. To implement URL rewriting, there are three steps:

1. Intercept the incoming request. When implementing URL rewriting, it’s obvious that you need to intercept the incoming request, which usually points to a resource that doesn’t exist on your server physically. This task is not trivial when your web site is hosted on IIS 6 and older. There are different ways to implement URL rewriting depending on the version of IIS you use (IIS 7 brings some additional features over IIS 5/6), and depending on whether you implement rewriting using an IIS extension, or from within your ASP.NET application (using C# or VB.NET code). In this latter case, usually IIS still needs to be configured to pass the requests we need to rewrite to the ASP.NET engine, which doesn’t usually happen by default.

2. Associate the incoming URL with an existing URL on your server. There are various techniques you can use to calculate what URL should be loaded, depending on the incoming URL. The "real" URL usually is a dynamic URL.

3. Rewrite the original URL to the rewritten URL. Depending on the technique used to capture the original URL and the form of the original URL, you have various options to specify the real URL your application should execute.

The result of this process is that the user requests a URL, but a different URL actually serves the request. The rest of the article covers how to implement these steps using ISAPI_Rewrite by Helicontech. For background information on how IIS processes incoming requests, we recommend Scott Mitchell’s article "How ASP.NET Web Pages are Processed on the Web Server," located at http://aspnet.4guysfromrolla.com/articles/011404-1.aspx.

URL Rewriting with ISAPI_Rewrite v2

Using a URL rewriting engine such as Helicon's ISAPI_Rewrite has the following advantages over writing your own rewriting code:

* Simple implementation. Rewriting rules are written in configuration files; you don’t need to write any supporting code.

* Task separation. The ASP.NET application works just as if it was working with dynamic URLs. Apart from the link building functionality, the ASP.NET application doesn’t need to be aware of the URL rewriting layer of your application.

* You can easily rewrite requests for resources that are not processed by ASP.NET by default, such as those for image files, for example.

To process incoming requests, IIS works with ISAPI extensions, which are code libraries that process the incoming requests. IIS chooses the appropriate ISAPI extension to process a certain request depending on the extension of the requested file. For example, an ASP.NET-enabled IIS machine will redirect ASP.NET-specific requests (which are those for .aspx files, .ashx files, and so on), to the ASP.NET ISAPI extension, which is a file named aspnet_isapi.dll.

Figure 3-3 describes how an ISAPI_Rewrite fits into the picture. Its role is to rewrite the URL of the incoming requests, but doesn’t affect the output of the ASP.NET script in any way.

At first sight, the rewriting rules can be added easily to an existing web site, but in practice there are other issues to take into consideration. For example, you’d also need to modify the existing links within the web site content. This is covered in Chapter 4 of Professional Search Engine Optimization with ASP.NET: A Developer’s Guide to SEO.


Figure 3-3

ISAPI_Rewrite allows the programmer to easily declare a set of rules that are applied by IIS on-the-fly to map incoming URLs requested by the visitor to dynamic query strings sent to various ASP.NET pages. As far as a search engine spider is concerned, the URLs are static.

The following few pages demonstrate URL rewriting functionality by using Helicon's ISAPI_Rewrite filter. You can find its official documentation at http://www.isapirewrite.com/docs/. Ionic's ISAPI rewriting module has similar functionality.

In the first exercise we’ll create a simple rewrite rule that translates my-super-product.html to Product.aspx?ProductID=123. This is the exact scenario that was presented in Figure 3-3.

The Product.aspx Web Form is designed to simulate a real product page. The script receives a query string parameter named ProductID, and generates a very simple output message based on the value of this parameter. Figure 3-4 shows the sample output that you’ll get by loading http://seoasp/Product.aspx?ProductID=3.


Figure 3-4

In order to improve search engine friendliness, we want to be able to access the same page through a static URL: http://seoasp/my-super-product.html. To implement this feature, we'll use-you guessed it!-URL rewriting, using Helicon's ISAPI_Rewrite.

As you know, what ISAPI_Rewrite basically does is to translate an input string (the URL typed by your visitor) to another string (a URL that can be processed by your ASP.NET code). In this exercise we’ll make it rewrite my-super-product.html to Product.aspx?ProductID=123.

This article covers ISAPI_Rewrite version 2. At the moment of writing, ISAPI_Rewrite 3.0 is in beta testing. The new version comes with an updated syntax for the configuration files and rewriting rules, which is compatible to that of the Apache mod_rewrite module, which is the standard rewriting engine in the Apache world. Please visit Cristian’s web page dedicated to this book, http://www.cristiandarie.ro/seo-asp/, for updates and additional information regarding the following exercises.

Exercise: Using Helicon's ISAPI_Rewrite

1. The first step is to install ISAPI_Rewrite. Navigate to http://www.helicontech.com/download.htm and download ISAPI_Rewrite Lite (freeware). The file name should be something like isapi_rwl_x86.msi. At the time of writing, the full (not freeware) version of the product comes in a different package if you're using Windows Vista and IIS 7, but the freeware edition is the same for all platforms.

2. Execute the MSI file you just downloaded, and install the application using the default options all the way through.

If you run into trouble, you should visit the Installation section of the product's manual, at http://www.isapirewrite.com/docs/#install. If you run Windows Vista, you need certain IIS modules to be installed in order for ISAPI_Rewrite to function. If you configured IIS as shown in Chapter 1 of the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, you already have everything you need, and the installation of ISAPI_Rewrite should run smoothly.

3. Make sure your IIS web server is running and open the http://seoasp/ web site using Visual Web Developer. (Code samples for this demo site are available from Wrox at http://www.wrox.com/WileyCDA/WroxTitle/productCd-0470131470,descCd-download_code.html.)

4. Create a new Web Form named Product.aspx in your project, with no code-behind file or Master Page. Then modify the generated code as shown in the following code snippet. (Remember that you can have Visual Web Developer generate the Page_Load signature for you by switching to Design view, and double-clicking an empty area of the page or using the Properties window.)

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">
protected void Page_Load(object sender, EventArgs e)
{
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

    // use productId to customize page contents
    if (productId != null)
    {
        // set the page title
        this.Title += ": Product " + productId;
        // display product details
        message.Text =
            String.Format("You selected product #{0}. Good choice!", productId);
    }
    else
    {
        // display product details
        message.Text = "Please select a product from our catalog.";
    }
}
</script>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>ASP.NET SEO Shop</title>
</head>
<body>
    <form id="form1" runat="server">
        <asp:Literal runat="server" ID="message" />
    </form>
</body>
</html>

5. Test your Web Form by loading http://seoasp/Product.aspx?ProductID=3. The result should resemble Figure 3-4.

6. Let’s now write the rewriting rule. Open the Program Files/Helicon/ISAPI_Rewrite/httpd.ini file (you can find a shortcut to this file in Programs), and add the following highlighted lines to the file. Note the file is read-only by default. If you use Notepad to edit it, you’ll need to make it writable first.

[ISAPI_Rewrite]

# Translate /my-super-product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

7. Switch back to your browser again, and this time load http://seoasp/my-super-product.html. If everything works as it should, you should get the output that’s shown in Figure 3-5.


Figure 3-5

Congratulations! You've just written your first rewrite rule using Helicon's ISAPI_Rewrite. The free edition of this product only allows server-wide rewriting rules, whereas the commercial edition would allow you to use an application-specific httpd.ini configuration file, located in the root of your web site. However, this limitation shouldn't affect your learning process.

The exercise you’ve just finished features a very simplistic scenario, without much practical value-at least compared with what you’ll learn next! Its purpose was to install ISAPI_Rewrite, and to ensure your working environment is correctly configured.

You started by creating a very simple ASP.NET Web Form that takes a numeric parameter from the query string. You could imagine this is a more involved page that displays lots of details about the product with the ID mentioned by the ProductID query string parameter, but in our case we’re simply displaying a text message that confirms the ID has been correctly read from the query string.

Product.aspx is indeed very simple! It starts by reading the product ID value:

protected void Page_Load(object sender, EventArgs e)
{
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

Next, we verify if the value we just read is null. If that is the case, then ProductID doesn’t exist as a query string parameter. Otherwise, we display a simple text message, and update the page title, to confirm that ProductID was correctly read:

// use productId to customize page contents

if (productId != null)

{

// set the page title

this.Title += ": Product " + productId;

// display product details

message.Text =

String.Format("You selected product #{0}. Good choice!", productId);

}

else

{

// display product details

message.Text = "Please select a product from our catalog.";

}


As Figure 3-3 describes, the Product.aspx page is accessed after the original URL has been rewritten. This explains why Request.QueryString["ProductID"] reads the value of ProductID from the rewritten version of the URL. This is helpful, because the script works fine no matter if you accessed Product.aspx directly, or if the initial request was for another URL that was rewritten to Product.aspx.

The Request.QueryString collection, as well as the other values you can read through the Request object, work with the rewritten URL. For example, when requesting my-super-product.html in the context of our exercise, Request.RawUrl will return /Product.aspx?ProductID=123.

The rewriting engine allows you to retrieve the originally requested URL by saving its value to a server variable named HTTP_X_REWRITE_URL. You can read this value through Request.ServerVariables["HTTP_X_REWRITE_URL"]. This is helpful whenever you need to know the original request initiated by the client.
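For example, a quick way to see the two values side by side is to temporarily drop lines like the following into the Page_Load of Product.aspx (this is only an illustrative sketch of ours, not part of the exercise; HTTP_X_REWRITE_URL is present only when ISAPI_Rewrite actually rewrote the request):

// the rewritten URL, as ASP.NET sees it
string rewrittenUrl = Request.RawUrl;

// the URL originally requested by the client; null when the request
// was not rewritten (for example, when Product.aspx is loaded directly)
string originalUrl = Request.ServerVariables["HTTP_X_REWRITE_URL"];

message.Text = String.Format("Rewritten URL: {0}<br />Original URL: {1}",
    rewrittenUrl, originalUrl ?? "(not rewritten)");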

The Request class offers complete details about the current request. The following table describes the most commonly used Request members. You should visit the documentation for the complete list, or use IntelliSense in Visual Web Developer to quickly access the class members.

Request member: Description

Request.RawUrl: Returns a string representing the URL of the request excluding the domain name, such as /Product.aspx?ID=123. When URL rewriting is involved, RawUrl returns the rewritten URL.

Request.Url: Similar to Request.RawUrl, except the return value is a Uri object, which also contains data about the request domain.

Request.PhysicalPath: Returns a string representing the physical path of the requested file, such as C:\seoasp\Product.aspx.

Request.QueryString: Returns a NameValueCollection object that contains the query string parameters of the request. You can use this object’s indexer to access its values by name or by index, such as in Request.QueryString[0] or Request.QueryString["ProductID"].

Request.Cookies: Returns an HttpCookieCollection object containing the client’s cookies.

Request.Headers: Returns a NameValueCollection object containing the request headers.

Request.ServerVariables: Returns a NameValueCollection object containing IIS server variables.

Request.ServerVariables["HTTP_X_REWRITE_URL"]: Returns a string representing the originally requested URL, when the URL is rewritten by Helicon’s ISAPI_Rewrite or IIRF (Ionic’s ISAPI Rewrite Filter).





 

After testing that Product.aspx works when accessed using its physical name (http://seoasp/Product.aspx?ProductID=123), we moved on to access this same script, but through a URL that doesn’t physically exist on your server. We implemented this feature using Helicon’s ISAPI_Rewrite.

As previously stated, the free version of Helicon’s ISAPI_Rewrite only supports server-wide rewriting rules, which are stored in a file named httpd.ini in the product’s installation folder (\Program Files\Helicon\ISAPI_Rewrite). This file has a section named [ISAPI_Rewrite], usually at the beginning of the file, which can contain URL rewriting rules.

We added a single rule to the file, which translates requests to /my-super-product.html to /Product.aspx?ProductID=123. The line that precedes the RewriteRule line is a comment; comments are marked using the # character at the beginning of the line, and are ignored by the parser:

# Translate my-super-product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

In its basic form, RewriteRule takes two parameters. The first parameter describes the original URL that needs to be rewritten, and the second specifies what it should be rewritten to. The pattern that describes the form of the original URL is delimited by ^ and $, which mark the beginning and the end of the matched URL. The pattern is written using regular expressions, which you learn about in the next exercise.

In case you were wondering why the .html extension in the rewrite rule has been written as \.html, we will explain it now. In regular expressions (the pattern language used to describe the original URL that needs to be rewritten), the dot is a character with special significance. If you want that dot to be read as a literal dot, you need to escape it using the backslash character. As you’ll learn, this is a general rule with regular expressions: when special characters need to be read literally, they need to be escaped with the backslash character (which is a special character in turn, so if you wanted to match a literal backslash, you would write it as \\).

At the end of a rewrite rule you can also add one or more flag arguments, which affect the rewriting behavior. For example, the [L] flag, demonstrated in the following example, specifies that when a match is found the rewrite should be performed immediately, without processing any further RewriteRule entries:

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123 [L]

These arguments are specific to the RewriteRule command, and not to regular expressions in general. Table 3-1 lists the possible RewriteRule arguments. The rewrite flags must always be placed in square brackets at the end of an individual rule.

Table 3-1

I (Ignore case): The regular expressions of the RewriteRule and any corresponding RewriteCond directives are matched case-insensitively.

F (Forbidden): If the RewriteRule regular expression matches, the web server returns a 404 Not Found response, regardless of the format string (second parameter of RewriteRule) specified. Read Chapter 4 for more details about the HTTP status codes.

L (Last rule): If a match is found, stop processing further rules.

N (Next iteration): Restarts processing the set of rules from the beginning, but using the current rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive.

NS (Next iteration of the same rule): Restarts processing of the current rule, using the rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive, and is counted independently of the restarts counted for the N flag.

P (Proxy): Immediately passes the rewritten URL to the ISAPI extension that handles proxy requests. The new URL must be a complete URL that includes the protocol, domain name, and so on.

R (Redirect): Sends a 302 redirect status code to the client pointing to the new URL, instead of rewriting the URL. This is always the last rule, even if the L flag is not specified.

RP (Permanent redirect): The same as R, except the 301 status code is used instead.

U (Unmangle log): Logs the new URL as if it were the originally requested URL.

O (Normalize): Normalizes the URL before processing by removing illegal characters, and so on; it also deletes the query string.

CL (Lowercase): Changes the rewritten URL to lowercase.

CU (Uppercase): Changes the rewritten URL to uppercase.

 

Also, you should know that although RewriteRule is arguably the most important directive that you can use for URL rewriting with Helicon’s ISAPI_Rewrite, it is not the only one. Table 3-2 quickly describes a few other directives. Please visit the product’s documentation for a complete reference.

Table 3-2

RewriteRule: The main directive that performs URL rewriting.

RewriteHeader: A generic version of RewriteRule that can rewrite any HTTP header of the request; RewriteHeader URL is equivalent to RewriteRule.

RewriteProxy: Similar to RewriteRule, except it forces the resulting URL to be passed to the ISAPI extension that handles proxy requests.

RewriteCond: Allows defining one or more conditions (when several RewriteCond entries are used) that must be met before the following RewriteRule, RewriteHeader, or RewriteProxy directive is processed.

Introducing Regular Expressions

Before you can implement any really useful rewrite rules, it’s important to learn about regular expressions. We’ll teach them now, while discussing ISAPI_Rewrite, but regular expressions will also be needed when implementing other URL-related tasks, or when performing other kinds of string matching and parsing, so pay attention to this material.

Many love regular expressions, whereas others hate them. Many think they’re very hard to work with, whereas others (or maybe not so many) think they’re a piece of cake. Either way, they’re one of those topics you can’t avoid when URL rewriting is involved. We’ll try to offer a gentle introduction here, although entire books have been written on the subject. The Wikipedia page on regular expressions is great for background information (http://en.wikipedia.org/wiki/Regular_expression).

Appendix A of this book is a generic introduction to regular expressions. You should read it if you find that the theory in the following few pages (a fast-track introduction to regular expressions in the context of URL rewriting) is too sparse. For comprehensive coverage of regular expressions we recommend Andrew Watt’s Beginning Regular Expressions (Wrox, 2005).

A regular expression (sometimes referred to as a regex) is a special string that describes a text pattern. With regular expressions you can define rules that match groups of strings, extract data from strings, and transform strings, which enables very flexible and complex text manipulation using concise rules. Regular expressions aren’t specific to ISAPI_Rewrite, or even to URL rewriting in general. On the contrary, they’ve been around for a while, and they’re implemented in many tools and programming languages, including the .NET Framework, and implicitly ASP.NET.
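As a small taste of how the same pattern language surfaces in .NET, here is a short, self-contained C# console sketch of ours (not ISAPI_Rewrite code) that applies the product-URL pattern you will build in the next exercise using System.Text.RegularExpressions:

using System;
using System.Text.RegularExpressions;

class RegexDemo
{
    static void Main()
    {
        // the same kind of pattern ISAPI_Rewrite will use: /Products/P<number>.html
        Regex productUrl = new Regex(@"^/Products/P([0-9]+)\.html$");

        Match match = productUrl.Match("/Products/P123456.html");
        if (match.Success)
        {
            // group 1 corresponds to the $1 back-reference in a RewriteRule
            Console.WriteLine("Product ID: " + match.Groups[1].Value);   // prints 123456
        }
    }
}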

To demonstrate their usefulness with a simple example, we’ll assume your web site needs to rewrite links as shown in Table 3-3.

Table 3-3

Original URL -> Rewritten URL

Products/P1.html -> Product.aspx?ProductID=1

Products/P2.html -> Product.aspx?ProductID=2

Products/P3.html -> Product.aspx?ProductID=3

Products/P4.html -> Product.aspx?ProductID=4

If you have 100,000 products, without regular expressions you’d be in a bit of trouble, because you’d need to write just as many rules, no more, no less. You don’t want to manage a configuration file with 100,000 rewrite rules! That would be unwieldy.

However, if you look at the Original URL column of the table, you’ll see that all entries follow the same pattern. And as suggested earlier, regular expressions can come to the rescue! Patterns are useful because with a single pattern you can match a theoretically infinite number of possible input URLs, so you just need to write one rewriting rule for every type of URL in your web site.

In the exercise that follows, we’ll use a regular expression that matches Products/Pn.html, and we’ll use ISAPI_Rewrite to translate URLs that match that pattern to Product.aspx?ProductID=n. This will implement exactly the rules described in Table 3-3.

Exercise: Working with Regular Expressions

1. Open the httpd.ini configuration file and add the following rewriting rule to it.

[ISAPI_Rewrite]

# Defend your computer from some worm attacks

RewriteRule .*(?:global.asa|default\.ida|root\.exe|\.\.).* . [F,I,O]

# Translate my-super-product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

# Rewrite numeric URLs

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

2. Switch back to your browser, and load http://seoasp/Products/P1.html. If everything works as planned, you will get the output that’s shown in Figure 3-7.


Figure 3-7

3. You can check that the rule really works, even for IDs formed of more digits. Loading http://seoasp/Products/P123456.html would give you the output shown in Figure 3-8.


Figure 3-8

Note that by default, regular expression matching is case sensitive. So the regular expression in your RewriteRule directive will match /Products/P123.html, but will not match /products/p123.html, for example. Keep this in mind when performing your tests. To make the matching case insensitive, you need to use the [I] RewriteRule flag, as you’ll soon learn.
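For instance, combining the [I] and [L] flags would make this exercise’s rule match regardless of case (a variation we did not use in the exercise, shown here only as an illustration):

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [I,L]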

Congratulations! The exercise was quite short, but you’ve written your first "real" regular expression! Let’s take a closer look at your new rewrite rule:

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

If this is your first exposure to regular expressions, it must look scary! Just take a deep breath and read on: we promise, it’s not as complicated as it looks.

As you learned in the previous exercise, a basic RewriteRule takes two arguments. In our example it also received a special flag-[L]-as a third argument. We’ll discuss the meaning of these arguments next.

The first argument of RewriteRule is a regular expression that describes the matching URLs we want to rewrite. The second argument specifies the destination (rewritten) URL, and is not a regular expression. So, in geek-speak, the RewriteRule line from the exercise basically says: "rewrite any URL that matches the ^/Products/P([0-9]+)\.html$ pattern to /Product.aspx?ProductID=$1." In English, the same line can be roughly read as: "delegate any request for a URL that looks like /Products/Pn.html to /Product.aspx?ProductID=n."

In regular expressions, most characters, including alphanumeric characters, are read literally and simply match themselves. Remember the first RewriteRule you wrote in this chapter to match my-super-product.html, which was mostly made up of such "normal" characters. However, what makes regular expressions so powerful (and sometimes complicated) are the special characters (or metacharacters), such as ^, ., or *, which have special meanings. Table 3-4 describes the most frequently used metacharacters.

Table 3-4

^: Matches the beginning of the line. In our case, it will always match the beginning of the URL. The domain name isn’t considered part of the URL, as far as RewriteRule is concerned. It is useful to think of ^ as "anchoring" the characters that follow to the beginning of the string, that is, asserting that they be the first part.

.: Matches any single character.

*: Specifies that the preceding character or expression can be repeated zero or more times, that is, anywhere from not at all to an infinite number of times.

+: Specifies that the preceding character or expression can be repeated one or more times. In other words, the preceding character or expression must match at least once.

?: Specifies that the preceding character or expression can be repeated zero or one time. In other words, the preceding character or expression is optional.

{m,n}: Specifies that the preceding character or expression can be repeated between m and n times; m and n are integers, and m needs to be less than or equal to n.

( ): The parentheses are used to define a captured expression. The string matching the expression between parentheses can then be read as a variable. The parentheses can also be used to group the contents therein, as in mathematics, and operators such as *, +, or ? can then be applied to the resulting expression.

[ ]: Used to define a character class. For example, [abc] will match any of the characters a, b, or c. The - character can be used to define a range of characters; for example, [a-z] matches any lowercase letter. If - is meant to be interpreted literally, it should be the last character before ]. Many metacharacters lose their special function when enclosed between [ and ], and are interpreted literally.

[^ ]: Similar to [ ], except it matches everything except the mentioned character class. For example, [^a-c] matches all characters except a, b, and c.

$: Matches the end of the line. In our case, it will always match the end of the URL. It is useful to think of it as "anchoring" the previous characters to the end of the string, that is, asserting that they be the last part.

\: The backslash is used to escape the character that follows. It is used to escape metacharacters when we need them to be taken for their literal value, rather than their special meaning. For example, \. will match a dot, rather than "any character" (the typical meaning of the dot in a regular expression). The backslash can also escape itself, so if you want to match C:\Windows, you would refer to it as C:\\Windows.

Using Table 3-4 as reference, let’s analyze the expression ^/Products/P([0-9]+)\.html$. The expression starts with the ^ character, matching the beginning of the requested URL (remember, this doesn’t include the domain name). The characters /Products/P assert that the next characters in the URL string match those characters.

Let’s recap: the expression ^/Products/P will match any URL that starts with /Products/P.

The next characters, ([0-9]+), are the crux of this process. The [0-9] bit matches any character between 0 and 9 (that is, any digit), and the + that follows indicates that the pattern can repeat one or more times, so we can have an entire number rather than just a digit. The enclosing round parentheses around [0-9]+ indicate that the regular expression engine should store the matching string (which will be a digit or number) inside a variable called $1. (We’ll need this variable to compose the rewritten URL.)

Finally, we have \.html$, which means that string should end in .html. The \ is the escaping character that indicates that the . should be taken as a literal dot, not as "any character" (which is the significance of the . metacharacter). The $ matches the end of the string.

The second argument of RewriteRule, /Product.aspx?ProductID=$1, plugs the digit or number extracted by the regular expression into the $1 variable. If the regular expression contained more than one captured group, the subsequent matches could be referenced as $2, $3, and so on. You’ll meet several such examples later in this book.

The second argument of RewriteRule isn’t written using the regular expression language. Indeed, it doesn’t need to be, because it’s not meant to match anything. Instead, it simply supplies the form of the rewritten URL. The only parts with special significance here are the variables ($1, $2, and so on), whose values are extracted from the expressions written between parentheses in the first argument of RewriteRule.

As you can see, this rule does indeed rewrite any request for a URL that looks like /Products/Pn.html to Product.aspx?ProductID=n, which can be executed by our Product.aspx page. The [L] makes sure that if a match is found, the rewriting rules that follow won’t be processed.

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

This is particularly useful if you have a long list of RewriteRule commands, because using [L] improves performance and prevents ISAPI_Rewrite from processing all the RewriteRule commands that follow once a match is found. This is usually what we want regardless.

Helicon’s ISAPI_Rewrite ships with a regular expression tester application, which allows you to verify if a certain rewriting rule matches a test string. The application is named RXTest.exe, and is located in the product’s installation folder (by default Program Files\Helicon\ISAPI_Rewrite\).

Rewriting Numeric URLs with Two Parameters

What you’ve accomplished in the previous exercise is rewriting numeric URLs with one parameter. We’ll now expand that little example to also rewrite URLs with two parameters. The URLs with one parameter that we support look like http://seoasp/Products/Pn.html. Now we’ll assume that we also need to support URLs that include a category ID in addition to the product ID. The new URLs will look like:

http://seoasp/Products/C2/P1.html

The existing Product.aspx script will be modified to handle links such as:

http://seoasp/Product.aspx?CategoryID=2&ProductID=1

As a quick reminder, here’s the rewriting rule you used for numeric URLs with one parameter:

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

For rewriting two parameters, the rule would be a bit longer, but not much more complex:

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Let’s put this to work in a quick exercise.

Exercise: Rewriting Numeric URLs

1. Modify your Product.aspx page that you created in the previous exercise, like this:

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">

protected void Page_Load(object sender, EventArgs e)

{

// retrieve the product ID and category ID from the query string

string productId = Request.QueryString["ProductID"];

string categoryId = Request.QueryString["CategoryID"];

// use productId to customize page contents

if (productId != null && categoryId == null)

{

// set the page title

this.Title += ": Product " + productId;

// display product details

message.Text =

String.Format("You selected product #{0}. Good choice!", productId);

}

// use productId and categoryId to customize page contents

else if (productId != null && categoryId != null)

{

// set the page title

this.Title +=

String.Format(": Product {0}: Category {1}", productId, categoryId);

// display product details

message.Text =

String.Format("You selected product #{0} in category #{1}. Good choice!",

productId, categoryId);

}

else

{

// display product details

message.Text = "Please select a product from our catalog.";

}

}

</script>

<html xmlns="http://www.w3.org/1999/xhtml" >

<head runat="server">

<title>ASP.NET SEO Shop</title>

</head>

<body>

<form id="form1" runat="server">

<asp:Literal runat="server" ID="message" />

</form>

</body>

</html>

2. Test your script with a URL that contains just a product ID, such as http://seoasp/Products/P123456.html, to ensure that the old functionality still works. The result should resemble Figure 3-8.

3. Now test your script by loading http://seoasp/Product.aspx?CategoryID=5&ProductID=99. You should get the output shown in Figure 3-9.


Figure 3-9

4. Add a new rewriting rule to the httpd.ini file as shown here:

[ISAPI_Rewrite]

# Defend your computer from some worm attacks

RewriteRule .*(?:global.asa|default\.ida|root\.exe|\.\.).* . [F,I,O]

# Translate my-super-product.html to /Product.aspx?ProductID=123

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

# Rewrite numeric URLs that contain a product ID

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

# Rewrite numeric URLs that contain a product ID and a category ID

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Note that the entire RewriteRule command and its parameters must be written on a single line in your httpd.ini file. If you split it across two lines, it will not work.

5. Load http://seoasp/Products/C5/P99.html, and expect to get the same output as with the previous request, as shown in Figure 3-10.


Figure 3-10

In this example you started by modifying Product.aspx to accept URLs that take both a product ID and a category ID. Then you added URL rewriting support for URLs with two numeric parameters, by adding a rewriting rule to your httpd.ini file that handles such URLs:

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

The rule looks a bit complicated, but if you look carefully, you’ll see that it’s not so different from the rule handling URLs with a single parameter. The rewriting rule now has two parameters: $1 is the number that comes after /Products/C, and is defined by ([0-9]+), and the second parameter, $2, is the number that comes after /P.

The result is that we now delegate any URL that looks like /Products/Cm/Pn.html to /Product.aspx?CategoryID=m&ProductID=n.

Rewriting Keyword-Rich URLs

Here’s where the real fun begins! This kind of URL rewriting is a bit more complex, and there are several strategies you could take. When working with rewritten numeric URLs, it was relatively easy to extract the product and category IDs from a URL such as /Products/C5/P9.html, and rewrite the URL to Product.aspx?CategoryID=5&ProductID=9.

A keyword-rich URL doesn’t necessarily have to include any IDs. Take a look at this one:

http://www.example.com/Products/Tools/Super-Drill.html

(You met a similar example in the first exercise of this chapter, where you handled the rewriting of http://seoasp/my-super-product.html.)

This URL refers to a product named "Super Drill" located in a category named "Tools." Obviously, if you want to support this kind of URL, you need some kind of mechanism to find the IDs of the category and product the URL refers to.

One solution that comes to mind is to add a column to the product information table that associates such beautified URLs with the "real" URLs your application can handle. When such a request arrives, you could look up the information in the Category and Product tables, get the IDs, and use them. We demonstrate this technique in an exercise later in this chapter.

We also have an option for those who prefer an automated solution that doesn’t involve a lookup database. This solution still brings the benefits of a keyword-rich URL, while being easier to implement. Look at the following URLs:

http://www.example.com/Products/Super-Drill-P9.html

http://www.example.com/Products/Tools-C5/Super-Drill-P9.html

These URLs include keywords. However, we’ve sneaked IDs into these URLs, in a way that isn’t unpleasant to the human eye, and doesn’t distract attention from the keywords that matter, either. In the case of the first URL, the rewriting rule can simply extract the number appended to the end of the product name (-P9), and ignore the rest of the URL. For the second URL, the rewriting rule can extract the category ID (-C5) and product ID (-P9), and then use these numbers to build a URL such as Product.aspx?CategoryID=5&ProductID=9.

This book generally uses such keyword-rich URLs, which also contain item IDs. Later in this chapter, however, you’ll be taught how to implement ID-free keyword-rich URLs as well.

The rewrite rule for keyword-rich URLs with a single parameter looks like this:

RewriteRule ^/Products/.*-P([0-9]+)\.html?$ /Product.aspx?ProductID=$1 [L]

The rewrite rule for keyword-rich URLs with two parameters looks like this:

RewriteRule ^/Products/.*-C([0-9]+)/.*-P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

Let’s see these rules at work in an exercise.

Exercise: Rewriting Keyword-Rich URLs

1. Modify the httpd.ini configuration file like this:

[ISAPI_Rewrite]

# Rewrite numeric URLs that contain a product ID

RewriteRule ^/Products/P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

# Rewrite numeric URLs that contain a product ID and a category ID

RewriteRule ^/Products/C([0-9]+)/P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

# Rewrite keyword-rich URLs with a product ID and a category ID

RewriteRule ^/Products/.*-C([0-9]+)/.*-P([0-9]+)\.html$ /Product.aspx?CategoryID=$1&ProductID=$2 [L]

# Rewrite keyword-rich URLs with a product ID

RewriteRule ^/Products/.*-P([0-9]+)\.html$ /Product.aspx?ProductID=$1 [L]

2. Load http://seoasp/Products/Tools-C5/Super-Drill-P9.html, and voila, you should get the result that’s shown in Figure 3-12.


Figure 3-12

3. To test the rewrite rule that matches product keyword-rich URLs that don’t include a category, try loading http://seoasp/Products/Super-Drill-P9.html. The result should be the expected one.

There’s one interesting gotcha to keep in mind when developing web applications, especially when they use URL rewriting. Your web browser sometimes caches the results returned by your URLs, which can lead to painful debugging experiences, so we recommend that you disable your browser’s cache during development.

You now have two new rules in your httpd.ini file, and they are working beautifully! The first rule handles keyword-rich URLs that include a product ID and a category ID, and the second rule handles keyword-rich URLs that include only a product ID. Note that the order of these rules is important, because the second rule would also match the URLs that are meant to be captured by the first rule. Also remember that because we didn’t use the [I] flag, the matching is case sensitive.

The first new rule matches URLs that start with the string /Products/, then contain zero or more characters (.*), followed by -C. This is expressed by ^/Products/.*-C. The next characters must be one or more digits, which as a whole are saved to the $1 variable, because the expression is written between parentheses: ([0-9]+). This first variable extracted from the URL, $1, is the category ID.

After the category ID, the URL must contain a slash, then zero or more characters (.*), then -P, as expressed by /.*-P. Afterwards, another captured group follows, to extract the ID of the product, ([0-9]+), which becomes the $2 variable. The final bit of the regular expression, \.html$, specifies the URL needs to end in .html.

The two extracted values, $1 and $2, are used to create the new URL, /Product.aspx?CategoryID=$1&ProductID=$2.

The second rewrite rule you implemented is a simpler version of this one.

Technical Considerations

Apart from basic URL rewriting, no matter how you implement it, you need to be aware of additional technical issues you may encounter when using such techniques in your web sites:

* If your web site contains ASP.NET controls or pages that generate postback events that you handle on the server side, you need to perform additional changes to your site so that it handles the postbacks correctly.

* You need to make sure the relative links in your pages point to the correct absolute locations after URL rewriting.

Let’s deal with these issues one at a time.

Handling Postbacks Correctly

Although they appear to be working correctly, the URL-rewritten pages you’ve loaded in all the exercises so far have a major flaw: they can’t handle postbacks correctly. Postback is the mechanism that fires server-side handlers in response to client events by submitting the ASP.NET form. In other words, a postback occurs every time a control in your page that has the runat="server" attribute fires an event that is handled on the server side with C# or VB.NET code.

To understand the flaw in our solution, add the following button into the form in Product.aspx:

<body>

<form id="form1" runat="server">

<asp:Literal runat="server" ID="message" />

<asp:Button ID="myButton" runat="server" Text="Click me!" />

</form>

</body>

Switch the form to Design view, and double-click the button in the designer to have Visual Web Developer generate its Click event handler for you. Then complete its code by adding the following line:

protected void myButton_Click(object sender, EventArgs e)

{

message.Text += "<br />You clicked the button!";

}

Alright, you have one button that displays a little message when clicked. To test this button, load http://seoasp/Product.aspx, and click the button to ensure it works as expected. The result should resemble that in Figure 3-16. (Note that clicking it multiple times doesn’t display additional text, because the content of the Literal control used for displaying the message is reset on every page load.)


Figure 3-16

Now, load the same Product.aspx form, but this time using a rewritten URL. I’ll choose http://seoasp/Products/Super-AJAX-PHP-Book-P35.html, which should be properly handled by your existing code and rewritten to http://seoasp/Product.aspx?ProductID=35. Then click the button. Oops! You’ll get an error, as shown in Figure 3-17.


Figure 3-17

If you look at the new URL in the address bar of your web browser, you can intuit what happens: the page is unaware that it was loaded using a rewritten URL, and it submits the form to the wrong URL-in this example, http://seoasp/Products/Product.aspx?ProductID=35. The presence of the Products folder in the initial URL broke the path to which the form is submitted.

The new URL doesn’t exist physically in our web site, and it’s also not handled by any rewrite rules. This happens because the action attribute of the form points back to the name of the physical page it’s located on, which in this case is Product.aspx (this behavior isn’t configurable via properties). This can be verified simply by looking at the HTML source of the form, before clicking the button:

<form name="form1" method="post" action="Product.aspx?ProductID=35" id="form1">

When this form is located on a page that was requested through a URL containing folders, the browser resolves the relative action path against that folder path. When URL rewriting is involved, it’s easy to see that this behavior isn’t what we want. Additionally, even if the original path doesn’t contain folders, the form still submits to a dynamic URL, rendering our URL rewriting efforts useless.

To overcome this problem, there are three potential solutions. The first works with any version of ASP.NET, and involves creating a new HtmlForm class that removes the action attribute, like this:

namespace ActionlessForm

{

public class Form : System.Web.UI.HtmlControls.HtmlForm

{

// Render the standard form attributes, but leave out action, so the browser
// posts back to whatever URL is currently shown in the address bar
protected override void RenderAttributes(System.Web.UI.HtmlTextWriter writer)

{

Attributes.Add("enctype", Enctype);

Attributes.Add("id", ClientID);

Attributes.Add("method", Method);

Attributes.Add("name", Name);

Attributes.Add("target", Target);

Attributes.Render(writer);

}

}

}

If you save this file as ActionlessForm.cs, you can compile it into a library file using the C# compiler, like this:

csc.exe /target:library ActionlessForm.cs

The default location of the .NET 2.0 C# compiler is \windows\microsoft.net\framework\v2.0.50727\csc.exe. Note that you may need to download and install the Microsoft .NET Software Development Kit to have access to the C# compiler. To create libraries you can also use Visual C# 2005 Express Edition, in which case you don’t need to compile the C# file yourself. Copying the resulting file, ActionlessForm.dll, to the Bin folder of your application makes it accessible to the rest of the application. Then you’d need to replace all the <form> elements in your Web Forms and Master Pages with the new form, like this:

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<%@ Register TagPrefix="af" Namespace="ActionlessForm" Assembly="ActionlessForm" %>

<html xmlns="http://www.w3.org/1999/xhtml" >

<head id="Head1" runat="server">

<title>ASP.NET SEO Shop</title>

</head>

<body>

<af:form id="form1" runat="server">

<asp:Literal runat="server" ID="message" />

<asp:Button ID="myButton" runat="server" Text="Click me!" OnClick="myButton_Click" />

</af:form>

</body>

</html>

Needless to say, updating all your Web Forms and Master Pages like this isn’t the most elegant solution in the world, but it’s the best option you have with ASP.NET 1.x. Fortunately, ASP.NET 2.0 offers a cleaner solution, which doesn’t require you to alter your existing pages, and it consists of using the ASP.NET 2.0 Control Adapter extensibility architecture. This method is covered by Scott Guthrie in his article at http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx.
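For reference, the heart of that adapter-based approach looks roughly like the following sketch. This is our own adaptation of the idea described in that article, not code from this book; the class names and the HTTP_X_REWRITE_URL fallback are our choices, so treat it as a starting point rather than a finished implementation.

using System.Web;
using System.Web.UI;
using System.Web.UI.Adapters;

// e.g. App_Code/FormRewriterControlAdapter.cs (hypothetical file name)
public class FormRewriterControlAdapter : ControlAdapter
{
    protected override void Render(HtmlTextWriter writer)
    {
        // wrap the writer so we can intercept the form's action attribute
        base.Render(new RewriteFormHtmlTextWriter(writer));
    }
}

public class RewriteFormHtmlTextWriter : HtmlTextWriter
{
    public RewriteFormHtmlTextWriter(HtmlTextWriter writer)
        : base(writer)
    {
        InnerWriter = writer.InnerWriter;
    }

    public override void WriteAttribute(string name, string value, bool fEncode)
    {
        if (name == "action")
        {
            HttpRequest request = HttpContext.Current.Request;
            // with ISAPI_Rewrite the originally requested URL lives in
            // HTTP_X_REWRITE_URL; fall back to RawUrl for non-rewritten requests
            value = request.ServerVariables["HTTP_X_REWRITE_URL"] ?? request.RawUrl;
        }
        base.WriteAttribute(name, value, fEncode);
    }
}

The adapter is then attached to HtmlForm through a .browser file placed in the site’s App_Browsers folder, as explained in the article, so no existing page needs to be touched.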

The last solution involves using Context.RewritePath to rewrite the current path to /?, effectively stripping the action attribute of the form. This technique is demonstrated in the case study in Chapter 14 of Professional Search Engine Optimization with ASP.NET: A Developer’s Guide to SEO, but as you’ll see, it isn’t recommended for more complex applications because of the restrictions it places on your code, and its potential side effects.

Absolute Paths and ~/

Another potential problem when using URL rewriting is that relative links stop working when folders are involved. For example, a relative link to image.jpg in Product.aspx would be translated to http://seoasp/image.jpg if the page is read as Product.aspx?ProductID=10, or to http://seoasp/Products/image.jpg if it is read through a rewritten URL such as http://seoasp/Products/P-10.html. To avoid such problems, you should use at least one of the following two techniques:

* Always use absolute paths. Creating a URL factory library, as shown later in this chapter, can help with this task.

* Use the ~ syntax supported by ASP.NET controls. The ~ symbol always references the root location of your application, and it is replaced by its absolute value when the controls are rendered by the server.
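For example, an image reference written as below resolves to the same absolute location no matter which rewritten URL the page was reached through (a small illustrative snippet; the ~/images/logo.png path is hypothetical, not part of the exercise):

<asp:Image runat="server" ID="logo" ImageUrl="~/images/logo.png" />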

Problems Rewriting Doesn’t Solve

URL rewriting is not a panacea for all dynamic site problems. In particular, URL rewriting in and of itself does not solve any duplicate content problems. If a given site has duplicate content problems with a dynamic approach to its URLs, the problem would likely also be manifest in the resulting rewritten static URLs as well. In essence, URL rewriting only obscures the parameters (however many there are) from the search engine spider’s view. This is useful for URLs that have many parameters, as we mentioned. Needless to say, however, if the varying permutations of obscured parameters do not dictate significant changes to the content, the same duplicate content problems remain.

A simple example would be the case of rewriting the page of a product that can exist in multiple categories. Obviously, these two pages would probably show duplicate (or very similar) content even if accessed through static-looking links, such as:

http://www.example.com/College-Books-C1/Some-Book-Title-P2.html

http://www.example.com/Out-of-Print-Books-C2/Some-Book-Title-P2.html

Additionally, in the case that you have duplicate content, using static-looking URLs may actually exacerbate the problem. This is because whereas dynamic URLs make the parameter names and values obvious, rewritten static URLs obscure them. Search engines are known, for example, to attempt to drop a parameter they heuristically guess is a session ID in order to eliminate duplicate content. If the session parameter were rewritten, a search engine would not be able to do this at all.

There are solutions to this problem. They typically involve removing any parameters that can be avoided, as well as excluding any remaining duplicate content. These solutions are explored in depth in the chapter on duplicate content.

A Last Word of Caution

URLs are much more difficult to revise than titles and descriptions once a site is launched and indexed. Thus, when designing a new site, special care should be devoted to them. Changing URLs later requires one to redirect all of the old URLs to the new ones, which can be extremely tedious, and has the potential to influence rankings for the worse if done improperly and link equity is lost. Even the most trivial changes to URL structure should be accompanied by some redirects, and such changes should only be made when it is absolutely necessary.

This is a relatively simple process. In short, you use a URL factory, as described in this chapter, to create the new URLs based on the parameters in the old dynamic URLs. Then you employ what is called a "301 redirect" to point the old URLs to the new ones. The various types of redirects are discussed in the following chapter.

So, if you are retrofitting a web application that is powering a web site that is already indexed by search engines, you must redirect the old dynamic URLs to the new rewritten ones. This is especially important, because without doing this every page would have a duplicate and result in a large quantity of duplicate content. You can safely ignore this discussion, however, if you are designing a new web site.

Summary

We covered a lot of material here! We detailed, step by step, how to employ static-looking URLs in a dynamic web site. Such URLs are both search engine friendly and more enticing to the user. This can be accomplished through several techniques, and you’ve tested the most popular of them in this chapter. A "URL factory" can be used to enforce consistency in URLs. It is important to realize, however, that URL rewriting is not a panacea for all dynamic site problems, in particular duplicate content problems.


How to fix w3wp.exe using up to 100% CPU

August 24, 2009
Comments are closed for "How to fix w3wp.exe using up to 100% CPU"

On a healthy server, CPU usage should normally stay below 60%. Sometimes the CPU load swings up and down wildly, or the server suddenly becomes very slow or laggy. Looking at Task Manager, you will find many w3wp.exe processes consuming CPU; end one and a new w3wp.exe immediately appears and takes over the CPU. In this situation the administrator usually has no choice but to restart IIS. Strangely, everything is fine after the restart, but after a while the problem comes back again.

The direct cause of high w3wp.exe CPU usage:
One or more Access databases have become corrupted after repeated reads and writes. When Microsoft's MDAC tries to write to such a corrupted Access file, the ASP thread ends up in a BLOCKED state, the other threads can only wait, IIS deadlocks, and all the CPU time is burned inside w3wp.exe.

Fix 1 for high w3wp.exe CPU usage:
In IIS Manager, create several application pools and assign the virtual host sites to them. With multiple application pools, each pool gets its own w3wp.exe process, and Task Manager shows the CPU usage of every w3wp.exe. In IIS Manager you can then stop the application pools one by one while watching Task Manager: when stopping a particular pool makes a w3wp.exe's CPU usage drop back to normal, you have found the faulty pool. You can then create more application pools, one per site, and stop them one at a time to determine exactly which site is causing the problem. The culprit is very often a hit-counter style Access file, such as "**COUNT.MDB" or "*COUNT.ASP"; once you find it, delete it, or download it and repair it with Access 2000, and the problem is solved.

Fix 2 for high w3wp.exe CPU usage:
How to find which site makes w3wp.exe use too much CPU. This is a major headache for many web hosting providers: w3wp.exe keeps showing up in Task Manager with excessive CPU usage and drags down the whole server. Proceed as follows:
1. Open Task Manager and end the w3wp.exe process that is using too much CPU.
2. Under My Computer - Manage - Event Viewer - System you will find an error report about w3wp.exe; write down the application pool name and the exact time (for example 12:59:56).
3. In the Security log, find the user that logged on at that time (for example 12:59:56).
4. In IIS, take the application pool found in step 2 and the user found in step 3; their intersection tells you exactly which web site is eating the CPU.

 

Under Windows 2003 + IIS 6, w3wp.exe often fails to release memory in time, which makes the server respond very slowly.
After looking into it today, the following configuration helps:
1. In IIS, give every web site its own application pool, so they do not affect each other.
2. Set the application pool's recycling interval (the default is 1740 minutes; adjust it as needed), set the number of concurrent w3wp processes to 1, and configure the pool to recycle automatically when memory or CPU usage exceeds a threshold.

Usually this solves the problem, but an individual site may still fail to release memory properly because of bugs in its code.
So how do you find out which site it is?

1. In Task Manager, add the PID column; you can then see the PID of the process using the most memory or CPU.
2. At a command prompt, run iisapp -a. Note: the first time you run it, it may complain about missing JS support; click OK and run it again. It will then show which application pool each PID belongs to.
3. In IIS, look up the web site served by that application pool, and you are done.

Q: My situation is this:
Server: Xeon 2.8 GHz, 512 MB RAM, two SCSI disks (software mirroring)
OS: Windows 2003
It hosts a single ASP.NET site with modest traffic, but the problem is that after the server has been running for two or three days, the site becomes extremely slow; after rebooting the server everything is normal again.
Looking at per-process memory usage, w3wp.exe and sqlservr.exe each grow to more than 170 MB and physical memory is almost completely used up (right after a reboot they only use 40-odd MB each).
The site used to run on a virtual host with the database hosted separately and never had this problem; after moving it unchanged to the new server, the problem appeared.
One more issue: in SQL Enterprise Manager I can see many SQL processes caused by .NET that are sleeping but still hold memory and never release it.

I have been stuck on this for a long time.
Please help, any advice would be greatly appreciated.

A: In IIS Manager, go to Application Pools, add your application and set a maximum memory limit; when the process reaches that limit it will restart automatically.

My problem is the same as yours, except my machine has 2 GB of RAM and fairly high traffic. After roughly 24 hours of uptime I have to restart: memory is not exhausted, but the w3wp process sits at 180-190 MB and SQL at 200-plus MB, and if I don't restart, the whole site just hangs. I've been fighting this for almost half a month and still can't fix it; it's painful.

w3wp.exe is the host process of your ASP.NET application. If you use a lot of Session, Cache and similar resources, and the session timeout is long, memory usage will be high. Application pools are a feature added for performance, but they also consume a lot of memory. You can also shut down most of the services in Windows Server 2003 (anything you don't need can be turned off) to save some memory.

 

1. Suspect the Cache used in the application.
2. The Cache holds a large amount of data.
3. The Cache is refreshed too frequently.
4. The caching strategy is poorly designed.

I ran into the same problem before. I was using Session; after I switched everything to cookies it got a lot better. It is probably your Session or your Cache (I don't know Cache well, but it likely plays a part).

Trace the SQL calls, and log a timestamp every time a large amount of data is written into the Cache or Session, to see whether it happens too frequently.

1. On Windows 2003, the ASP.NET worker process is w3wp.exe.

2. 512 MB of RAM is enough for personal use, but a bit short for a server, especially for Windows 2003 + ASP.NET + SQL Server. SQL Server in particular is very memory-hungry; if you don't cap it, it will grab almost all the physical memory (leaving only a few tens of MB up to 100 MB), and Windows 2003 itself needs about 150 MB, so nothing is left.

3. Optimize the ASP.NET application, as suggested above: use things like Session, Cache and Application sparingly or not at all. Also check any paging code; paging done badly can use a lot of memory too.

4. Limit SQL Server's memory: in Enterprise Manager, open the server's Properties (usually local) and go to the "Memory" tab.
There you can see the memory settings; set the maximum to, say, 100 MB.

Item 4 is the quickest fix; give it a try.

An OA system I developed myself has the same problem.
Summing up the above, the cause is probably unreasonable use of Session and Cache.
My application does indeed use a lot of Session and Cache.
I found the article "Dynamic memory allocation" on MSDN and will try it today to see whether it helps.
I hope people with more experience will share some information, so we can all summarize the causes of this kind of error. Thanks!

I don't know what kind of site you are running, but normally it shouldn't use that much memory. As noted above, you may have stored oversized content in the Cache. Session alone is unlikely to account for that much; more likely you used something like Application, or

objects with a very long or permanent lifetime, to hold large amounts of data (for example keeping data in singletons); all of these can lead to heavy memory usage.

 

For a Windows 2003 system, I recommend installing at least 1 GB of RAM.

w3wp.exe is an IIS process on Windows 2003. As for SQL Server using too much memory, that is probably because you have not set an upper memory limit for it.

Under IIS 6, the memory and CPU used by w3wp.exe are often not released in time, which makes the server respond very slowly.

To deal with excessive memory usage, configure the following:
1. In IIS, give every web site its own application pool, so they do not affect each other.
2. Set the application pool's recycling interval (the default is 1740 minutes; adjust as needed), and configure it to recycle automatically when memory usage exceeds a limit (for example 500 MB).

To deal with excessive CPU usage:
1. Again, give every web site its own application pool in IIS.
2. Enable CPU monitoring for the pool, for example at most 25% (on a 4-CPU server), refreshed every minute, and shut the pool down when the limit is exceeded.

To find out which application pool a given w3wp.exe belongs to:
1. In Task Manager, add the PID column; you can then see the PID of the process using the most memory or CPU.
2. At a command prompt, run iisapp -a. The first time you run it, it may complain about missing JS support; click OK and run it again. It will then show which application pool each PID belongs to. (iisapp is actually a VBS script, iisapp.vbs, stored in C:\windows\system32. If, like me, you have removed the default file association for .vbs, open it manually from that folder and choose "Microsoft (r) Windows Based Script Host" to run it; you will then get the mapping between PIDs and application pools.)
3. In IIS, look up the web site served by that application pool, apply the memory or CPU limits described above, and check the code for problems such as infinite loops.

 


Popular wiki engines: source code downloads

August 22, 2009
Comments are closed for "Popular wiki engines: source code downloads"

Popular wiki engines: source code downloads

 

A wiki is a collaborative authoring platform, also called an open editing system. Collaborative means that anyone browsing a page can revise it … How does a wiki manage that? A wiki uses a simplified syntax in place of complex HTML, plus a web-based editing interface, which lowers the barrier to maintaining content; ……
I'm sure many webmasters need a wiki: you can use one to build a help system, a knowledge base, a loosely organized discussion platform, even a bookmark collection……
Here I recommend several commonly used wiki engines.

Part 1: ASP wiki engines. Overall, the ASP wiki engines feel fairly weak; perhaps open wikis simply prefer an open Linux system.
1. Operator Wiki 0.3
Environment: ASP + Access
Home page: http://cosoft.org.cn/projects/operatorwiki/
Demo: http://my.yeew.net/maxzone/operatorwiki/wiki.asp
Download: http://down2.codepub.com/codepubcom/2006/4/8/operatorwik03.rar
About: a free, open source Chinese wiki engine written in ASP + JavaScript, with multi-language support, ACLs, and features combined from various wikis.
Operator Wiki 0.3 changelog:
* Complete user permissions
* ACL-based access control
* Fixed a markup conflict
* Fixed a login problem
* The source code now uses tabs instead of spaces, shrinking the main program to only 34.4 KB
* Better multi-language support
* Fixed dozens of issues with tables and lists

2. OpenWiki Chinese edition, Build 20060328
Environment: ASP + Access/SQL Server
Home page: http://www.openwiki.com/
Demo: http://www.3d-gis.com/yow/
Download: http://down2.codepub.com/codepubcom/2006/4/8/openviki_yow.rar
About: a foreign ASP wiki engine, localized into Chinese by 3d-gis.

3. JsWiki – an open source ASP wiki
Environment: ASP
Home page: http://sourceforge.net/projects/jswiki/
Demo: http://www.jswiki.com/
Download: http://down2.codepub.com/codepubcom/2006/2/10/jswiki.rar
About: installation requires only a single file, jswiki.asp.
Written in JavaScript, it runs on any Windows host that supports ASP.
Supports a rich and convenient text syntax (a mix of the common wiki/textile/markdown conventions).
Supports page history and version diffs.
Supports page locking and private pages.
Supports RSS output of recently updated content.
Provides InterWiki? links.
Uses macros to provide extra features and extensions.

Part 2: CGI wiki engines
4. TWiki Release 4.0.2
Environment: Perl
Home page: http://twiki.org/
Demo: http://www.stlchina.org/twiki/bin/view.pl/TWiki/TWikiQickStart
Download: http://down2.codepub.com/codepubcom/2006/4/8/TWiki-4.0.2.tgz
About: TWiki is an open source (GPL) wiki engine, positioned as a "flexible, powerful, easy-to-use enterprise collaboration platform" and running on Perl.
TWiki has been under development since 2001, with roughly one major release per year. The latest stable release is the 2004-09-04 version, and the latest beta is the 2006-01-31 version.
TWiki is used by many large commercial companies, such as Yahoo, SAP, Motorola and Wind River.
The content of the official TWiki site is copyrighted; the TWiki name is a registered trademark owned by Peter Thoeny, and contributed content is jointly owned by Peter Thoeny and the contributors.
TWiki's characteristics:
TWiki is a full-featured wiki system.
It focuses on giving a site structure: all pages are automatically organized into TWiki Webs, which makes it easy to create collaboration groups. People with programming skills can use variables to build dynamic pages, such as tables of contents or pages with embedded search results.
Easy to customize and extend.
Page editing: the Dakar release already supports WYSIWYG editing.
Access control: a fine-grained permission mechanism lets administrators restrict read and write access for different departments.
TWiki requires no database whatsoever; it is a formatting engine based entirely on files and directories.

5. UseModWiki Version 1.0
Environment: Perl
Home page: http://www.usemod.com/
Demo: http://www.usemod.com/
Download: http://down2.codepub.com/codepubcom/2006/4/8/usemod10.tar.gz
About: September 12, 2003: Version 1.0; the official project appears to have stopped updating.
UseModWiki (from the Usenet Moderation Project, Usemod) is a wiki engine developed by Clifford Adams in Perl. Its defining characteristic is that it uses no database management system to store page content: every new page is stored directly in the file system. Wikipedia once used UseModWiki as the engine for all its language editions, before developing MediaWiki as its current platform.
Features:
Runs from a single file
Needs no extension programs
All settings are written directly in the code
Pages are stored directly in the file system
Uses the CamelCase link style
The display language can be changed through a translation table

Part 3: PHP wiki engines
6. MediaWiki 1.6.2 – the most widely used wiki engine
Environment: PHP + MySQL
Home page: http://www.mediawiki.org/
Demo: http://www.mediawiki.org/
Download: http://down2.codepub.com/codepub … iawiki-1.6.2.tar.gz
About: MediaWiki is the world's best-known open source wiki engine, running on PHP + MySQL. It has been the software behind Wikipedia since February 25, 2002, and has a large number of other deployments. MediaWiki's development is currently supported by the Wikimedia Foundation.
Key wiki features:
Records every revision, so the update history can easily be reviewed, which is what makes open editing possible
Links are generated automatically: text in double square brackets (such as "[[Some article]]") automatically becomes a link
Templates make it easy to reuse and update identical content
Categories are supported, and articles in the same category are automatically cross-linked
Every user can choose their own skin
Good Chinese support

7. Tikiwiki v1.9.2 multi-language edition – a wiki and a CMS in one
Environment: PHP + MySQL
Home page: http://tikiwiki.org/
Demo: http://tikiwiki.org/
Download: http://down2.codepub.com/codepubcom/2005/11/13/tikiwiki192.rar
About: an excellent content management system built on PHP + ADOdb + Smarty, with a very complete feature set. Main points:
1. Articles, forums, directories, blogs, image galleries, downloads, polls, wiki and more.
2. Excellent user permission management: you can choose which features to enable and which users may use which features.
3. The admin back end and the user interface are combined, with permissions controlling what each user sees.
4. The interface is divided into top/middle/bottom and left/center/right regions, very structured.
5. Many practical modules (menus, login, search, polls, latest articles and so on) that can be placed flexibly in the left and right regions.
6. A clean interface, with many templates available for re-skinning.

8. CooCooWakka v0.09 rc3 – a PHP wiki engine developed in China
Environment: PHP + MySQL
Home page: http://coo.hsfz.net/wiki/
Demo: [url]http://coo.hsfz.net/wiki/[/[/url]
Download: http://down2.codepub.com/codepubcom/2006/3/6/CooCooWakka.tar.gz
About: a Chinese wiki engine adopted by many sites in 2004. Updates essentially stopped after September 2004, until v0.0.9rc1 appeared in July 2005 and 0.0.9rc3 in February 2006, which supports PHP 5.
CooCooWakka was created by Coo by modifying and strengthening WakkaWiki 0.1.2.
CooCooWakka has been developed since 2003 as a small hobby project of CooYip; it is now also an open source project on cosoft.org.cn and sourceforge.net, and by April 2005 it had seen eight major releases.
CooCooWakka is a collaboration-oriented hypertext editing environment (a wiki engine based on PHP + MySQL). Simply put, on a site built with CooCooWakka anyone (including you!) can edit almost any page online (depending on the administrator's wishes). It can be used for shared writing, reading groups, documentation, book writing, translation, organizing material (lecture notes, software documentation and the like), FAQ maintenance, and so on. Because it is so quick to use and extend, CooCooWakka can even serve as a small CMS.
CooCooWakka is not merely a Chinese translation of Wakka: 60%-70% of the Wakka code has been modified or rewritten (compare them if you are interested; if you want something close to WakkaWiki?, download 0.0.2 or 0.0.3, when the changes were still restrained). Many features and policies have changed as well, so a CooCooWakka site cannot be rolled back to WakkaWiki?. In addition, CooCooWakka has been internationalized, so it supports multiple languages, currently Chinese (Simplified GB and UTF-8, Traditional Big5 and UTF-8) and English, and it automatically optimizes some functionality for Chinese sites.
Coo first became interested in wikis while contributing to Wikipedia. Not knowing which wiki engine was good, he downloaded Wakka 0.1.2, which, unlike MoinMoin or Tavi, happened not to support Chinese. Starting from the charset, he kept turning wakka 0.1.2 into the wiki engine he found comfortable. CooCooWakka was originally meant only for his own use, but many people asked to try the code and offered good suggestions, and it has grown into what it is today. (More history: History)

9. PhpWiki 1.3.12p2 released
Environment: PHP
Home page: http://phpwiki.sourceforge.net/
Demo: http://phpwiki.sourceforge.net/p … 83fc0492e961639b13f
Download: http://down2.codepub.com/codepub … iki-1.3.12p2.tar.gz
About: a small wiki engine that needs no database. Easy to set up, with decent access control and plugin support.

10. PmWiki 2.1.5
Environment: PHP
Home page: http://www.pmwiki.org/
Demo: http://www.emacs.cn/
Download: http://down2.codepub.com/codepubcom/2006/4/8/pmwiki-latest.tgz
About: PmWiki is a wiki written in PHP that needs no database support, especially suitable for personal sites.
It has quite a few deployments in China; the demo site is one of the sites under linuxsir.org.

Part 4: JSP wiki engines
11. JSPWiki stable release v2.2.33
Environment: JDK + Tomcat
Home page: http://www.jspwiki.org/
Demo: http://www.jspwiki.org/wiki/%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3
Download: http://down2.codepub.com/codepub … wiki-2.2.33-src.rar
About: JSPWiki is a nice wiki engine written purely in JSP/servlets. It does not use an off-the-shelf database; all content is stored as plain text files, and a CVS-like mechanism guarantees the integrity of file versions. It supports Chinese, version comparison, permission management and more.

Part 5: Other wiki engines
12. MoinMoin 1.5.3 RC1
Environment: Python
Home page: http://www.wikiwikiweb.de/
Demo: http://www.wikiwikiweb.de/
Download: http://down2.codepub.com/codepub … in-1.5.3-rc1.tar.gz
About: MoinMoin is a wiki engine running on Python, with support for many languages, including Chinese.
MoinMoin is an open source project under the GNU GPL, started on July 20, 2000, and originally written by Jürgen Hermann. The most recent release is 1.5.2, published on February 5, 2006; the highest stable version is 1.3.5, and the project is still updated regularly.
MoinMoin runs on Windows, Linux/BSD/UNIX, OS X and other platforms, and can currently handle about 20 languages, including English, German, Simplified and Traditional Chinese, Japanese and Russian.
MoinMoin's characteristics:
Stores all content in files; no database is used
Implements the full wiki conventions; Unicode support covers multiple languages
A complete, practical wiki text syntax whose editing rules are light and easy to learn
Multiple extension mechanisms: macros, plugins, preprocessors……
Its numerous plugins include TeX scientific text input, FreeMind mind maps, GraphViz diagrams, gnuplot charts and more
Several very practical page styles
Truly cross-platform

13. TiddlyWiki 2.0.7
Environment: CSS + HTML + JavaScript
Home page: http://www.osmosoft.com/
Demo: http://www.tiddlywiki.com/
Download: http://down2.codepub.com/codepubcom/2006/4/8/ptw-2.0.7.rar
About: TiddlyWiki is a very small and slick wiki engine; the entire program is a single HTML page of a bit more than a hundred kilobytes. It is written in CSS + HTML + JavaScript and works in many browsers.
Every element of a TiddlyWiki page can be customized, and the page structure and CSS presentation are easy to change. It needs no server-side scripting at all: as long as your computer has a browser it will run, which makes it ideal as a personal notebook carried around on a USB stick.
TiddlyWiki cannot truly store data on a server, so it cannot be used for group collaboration.
One user described TiddlyWiki like this:
TiddlyWiki is a fun wiki notebook; it supports tags, a rich wiki syntax and search, and works well as a notebook.
It is simple: simple interface, simple operation, and everything lives in a single HTML file, so it is perfect to carry around; it reminds me of the WordPress I used to carry on a USB stick, and with TiddlyWiki added the set is complete.
TiddlyWiki is a wiki driven purely by JavaScript; every action is AJAX, which feels really cool, so people studying AJAX have one more thing to look at.

 


Fixing WordPress Chinese tag display errors

August 15, 2009
Comments are closed for "Fixing WordPress Chinese tag display errors"

 

Fixing WordPress Chinese tag display errors

After finally getting my permalinks customized, I discovered while testing that tags were broken: tags made of Latin letters worked fine. A quick Google search showed that WordPress's Chinese support is problematic, especially when permalinks are used. I would love to write something original, but for now I can only leave a note here for later reference.
This article analyzes the cause, reviews the various solutions circulating on the web, and gives a concrete conclusion.
The problem shows up as follows: by default, WordPress cannot handle a link of this form (link 1):

www.example.com/tag/中文

It produces a 404 or 500 error, or some other error.
Whereas a link like this (link 2):

www.example.com/?tag=中文

is parsed by WordPress correctly.
Cause: this is a URL-encoding problem. Link 1 above is PATH_INFO, link 2 is a query string. Experience shows that for UTF-8 pages both IE and Firefox send PATH_INFO and the query string correctly (contrary to some articles claiming they react differently under different settings), but on the server side IIS converts PATH_INFO to GBK encoding, which causes the error, so on Windows you only need to convert it back. Under Linux, however, Apache does not support Chinese PATH_INFO; you either patch Apache, or, like me, accept that Chinese permalinks cannot be used on a Linux host. So we have to look for a workaround.

Analysis of the solutions:
1. Convert the encoding
The idea: IIS converts the UTF-8 in PATH_INFO to GBK, but leaves the query string alone. So, to keep using permalinks, do the following:
open the wp-includes/classes.php file,

if ( isset($_SERVER['PATH_INFO']) )
  $pathinfo = $_SERVER['PATH_INFO'];
else
  $pathinfo = '';
$pathinfo_array = explode('?', $pathinfo);
$pathinfo = str_replace("%", "%25", $pathinfo_array[0]);
$req_uri = $_SERVER['REQUEST_URI'];

and change it to

if ( isset($_SERVER['PATH_INFO']) )
  $pathinfo = mb_convert_encoding($_SERVER['PATH_INFO'], "UTF-8", "GBK");
else
  $pathinfo = '';
$pathinfo_array = explode('?', $pathinfo);
$pathinfo = str_replace("%", "%25", $pathinfo_array[0]);
$req_uri = mb_convert_encoding($_SERVER['REQUEST_URI'], "UTF-8", "GBK");

Limitation: this only works on Windows hosts, and only when they run IIS.

Using mb_convert_encoding() requires the php_mbstring.dll extension to be enabled; for details, see the article on PHP's encoding conversion function mb_convert_encoding().
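On a typical Windows PHP installation, that usually means making sure a line like the following is present and uncommented in php.ini (the exact extension directory depends on your setup):

extension=php_mbstring.dll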

 

2. Modify wp-includes/rewrite.php
This is the most common method found online. The idea is to let WordPress keep using permalinks for everything else, but not for tags, which continue to send Chinese in the query-string form of link 2:

 function get_tag_permastruct() {
if (isset($this->tag_structure)) {
return $this->tag_structure;
}
if (empty($this->permalink_structure)) { //-----this line need change------
$this->tag_structure = '';
return false;
}

Change the fifth line (the one marked with the comment) to

if (!empty($this->permalink_structure)) {

Limitation: tags lose the "pretty" permalink form, and if you cannot modify WordPress's files yourself, you are out of luck.

3. Change the tag base
Same idea as above: once permalinks are switched on, keep WordPress from touching tags. Can we trick WordPress into displaying tag permalinks in the link-2 format? Yes, because WordPress lets you customize the permalink form. In WordPress, under Settings – Permalinks – Tag base, enter
/?tag=
Note that the "" must not be omitted; the way it is written in the article I'm quoting is wrong. Also note that every time you enter "", WP escapes it again as "\", so every click on the submit button doubles the ""; click twice and you get "\\". Submit only once and it is correct.
The result of this method is that links end up looking like this:

www.example.com/?tag=/中文/

The extra slashes make no difference to the server: the value is still treated as a query string, with the same effect as above.
The downside is that the links become even uglier and, more fatally, in the Sitemap generated by plugins the tag links come out in a broken form. If you care about your Sitemap, do not use this method unless you really cannot modify your own rewrite.php file.

However, when you use WP-SuperCache or a similar caching plugin, it adds its own rewrite rules: every request is examined by the plugin first, and only requests that are not cached or do not match the cache rules are handed over to WordPress. The problem is that it cannot parse Chinese URLs at all, not even as a query string, so we have to bypass it.
These are the rewrite rules WP-SuperCache adds to the .htaccess file:

RewriteEngine On
RewriteBase /
 
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*p=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]
 
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*p=.*
RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]

What we need to do is stop it from examining Chinese tag links: after each of the two RewriteCond %{REQUEST_METHOD} !=POST lines, add the following line:

RewriteCond %{QUERY_STRING} !.*tag=.*

It means: if the query string contains tag, do not handle the request here (pass it on to the next rule, which is normally WordPress's index.php).

Conclusion:
On a Windows + IIS host, solution 1 solves the Chinese tag problem completely.
On a Linux + Apache host, Chinese permalinks cannot be used unless you patch Apache; otherwise solutions 2 and 3 are the only workarounds.
Solution 2 is the recommended one, but when it is combined with WP-SuperCache you need to add a rule to .htaccess yourself so that tag links are not processed by the cache.


WordPress URL rules for IIS with ISAPI_Rewrite

August 9, 2009
Comments are closed for "WordPress URL rules for IIS with ISAPI_Rewrite"

WordPress URL rules for IIS with ISAPI_Rewrite

To make the site easier for search engines to crawl, we can rewrite WordPress URLs into static-looking ones. Below are the URL rewrite rules.

Here are the rewrite rules. (Save the content below into httpd.ini, then pick one of the permalink structures shown in the comments (they were marked in red in the original post) and enter it as the custom permalink structure under the Permalinks settings in the WordPress admin.)

[ISAPI_Rewrite]

# 3600 = 1 hour
CacheClockRate 3600

RepeatLimit 32

# Protect httpd.ini and httpd.parse.errors files
# from accessing through HTTP
# # /%year%%monthnum%%day%/%postname%/

RewriteRule /tag/(.*) /index\.php\?tag=$1

RewriteRule /(about|copyright|leebolin)/ /index\.php\?pagename=$1

RewriteRule /category/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed\.php\?category_name=$1&feed=$2

RewriteRule /category/?(.*)/ /index\.php\?category_name=$1

RewriteRule /author/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed\.php\?author_name=$1&feed=$2

RewriteRule /author/?(.*) /index\.php\?author_name=$1

RewriteRule /feed/?$ /wp-feed\.php/\?feed=rss2

RewriteRule /comments/feed/?$ /wp-feed\.php/\?feed=comments-rss2

RewriteRule /page/(.*)/ /index\.php\?paged=$1

RewriteRule /([0-9]{4})([0-9]{1,2})([0-9]{1,2})/([^/]+)/?([0-9]+)?/?$ /index\.php\?year=$1&monthnum=$2&day=$3&name=$4&page=$5

RewriteRule /([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /index\.php\?year=$1&monthnum=$2&day=$3&page=$4

RewriteRule /([0-9]{4})/([0-9]{1,2})/?$ /index\.php\?year=$1&monthnum=$2&page=$3

RewriteRule /([0-9]{4})([0-9]{1,2})([0-9]{1,2})/([^/]+)/(feed|rdf|rss|rss2|atom)/?$ /index\.php\?year=$1&monthnum=$2&day=$3&name=$4&feed=$5

RewriteRule /([0-9]{4})([0-9]{1,2})([0-9]{1,2})/([^/]+)/trackback/?$ /wp-trackback\.php\?year=$1&monthnum=$2&day=$3&name=$4&tb=1

# # WordPress rewrite rules#
# /Html/%post_id%.html

RewriteRule /tag/(.*) /index\.php\?tag=$1

RewriteRule /(about|copyright|leebolin|favor|archives)/ /index\.php\?pagename=$1

RewriteRule /Html/category/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed\.php\?category_name=$1&feed=$2

RewriteRule /Html/category/?(.*)/ /index\.php\?category_name=$1

RewriteRule /author/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed\.php\?author_name=$1&feed=$2

RewriteRule /author/?(.*) /index\.php\?author_name=$1

RewriteRule /rss.xml /wp-feed\.php/\?feed=rss2

RewriteRule /feed/?$ /wp-feed\.php/\?feed=rss2

RewriteRule /comments/feed/?$ /wp-feed\.php/\?feed=comments-rss2

RewriteRule /([0-9]+)/?([0-9]+)?/?$ /index\.php\?p=$1&page=$2

# RewriteRule /Html/([0-9]+)/?([0-9]+)?/?$ /index\.php\?p=$1&page=$2

RewriteRule /Html/([0-9]+).html /index\.php\?p=$1

RewriteRule /page/(.*)/ /index\.php\?paged=$1

RewriteRule /Html/date/([0-9]{4})([0-9]{1,2})([0-9]{1,2})/([^/]+)/?([0-9]+)?/?$ /index\.php\?year=$1&monthnum=$2&day=$3&name=$4&page=$5

RewriteRule /Html/date/([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /index\.php\?year=$1&monthnum=$2&day=$3&page=$4

RewriteRule /Html/date/([0-9]{4})/([0-9]{1,2})/?$ /index\.php\?year=$1&monthnum=$2&page=$3

RewriteRule /Html/([0-9]+).html/(feed|rdf|rss|rss2|atom)/ /index\.php\?feed=rss2&p=$1

RewriteRule /Html/([0-9]+).html/trackback/ /wp-trackback\.php\?p=$1

RewriteRule /photo/?([^/]*)?/?([^/]*)?/?([^/]*)?/?([^/]*)?/?$ /wp-content/plugins/fgallery/fim_photos\.php\?$1=$2&$3=$4 [QSA,L,I]

RewriteRule /photo/?(.*) /wp-content/plugins/fgallery/fim_photos\.php\?$1=$2&$3=$4 [QSA,L,I]


RewriteRule ^(.*)/archiver/([a-z0-9\-]+\.html)$ $1/archiver/index\.php\?$2
RewriteRule ^(.*)/forum-([0-9]+)-([0-9]+)\.html$ $1/forumdisplay\.php\?fid=$2&page=$3
RewriteRule ^(.*)/thread-([0-9]+)-([0-9]+)-([0-9]+)\.html$ $1/viewthread\.php\?tid=$2&extra=page\%3D$4&page=$3
RewriteRule ^(.*)/profile-(username|uid)-(.+)\.html$ $1/viewpro\.php\?$2=$3
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif

IT Technology, Internet Technology

The best permalink structure for WordPress

August 9, 2009
Comments Off on The best permalink structure for WordPress

The best permalink structure for WordPress

The default permalink structure in WordPress looks like this: http://blogname.com/?p=123, where 123 is the post's internal ID. This structure is very unfriendly for SEO: it does nothing for your site's PageRank and hurts your ranking in search engines. At the very least your permalinks should contain the post's keywords. I really do not understand why WordPress ships with this as the default.

A nicer link looks like this: http://blogname.com/2008/11/05/some-post-name. The question is which permalink structure is best. One thing is certain: the post name must be part of the structure. Here are the forms commonly used in WordPress:

Date and name: the structure contains the year, month, day, and post name. It helps readers see exactly when a post was published and makes it easy for the author to find posts.

Month and name: similar to the first form, just without the specific day.

Category and name: like http://blogname.com/some-category/some-post-name. Many blogs use this structure; having the category name next to the post name is said to help SEO.

Post name only: for example blogname.com/some-post-name. Quite a few blogs use this form as well.

WordPress also provides tags that let you define a custom permalink structure. The tags you can use are:

%year% – the year, four digits, e.g. 2008

%monthnum% – the month, e.g. 05

%day% – the day of the month, e.g. 28

%hour% – the hour, e.g. 15

%minute% – the minute, e.g. 43

%second% – the second, e.g. 33

%postname% – the post name; for example "This Is A Great Post!" becomes this-is-a-great-post, and Chinese titles turn into strange-looking encoded characters…

%post_id% – the post's unique ID, e.g. 213

%category% – the category name

%author% – the author name

You can join these tags with "-" or "/", for example: /%category%/%postname%-%post_id%/
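For example (hypothetical values), with the structure /%category%/%postname%-%post_id%/, a post with ID 213 named "some-post-name" in the category "news" would get a permalink like:

http://blogname.com/news/some-post-name-213/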

Personally, I lean toward a structure with just the post name and post ID. Why? Because the post name contains your keywords, so if the URL is short the density of those keywords is very high (a notion called relative keyword weight). I do not recommend using the post name alone (even though that is what I am doing right now), because if you later redefine the rules some pages may become unreachable, and two posts can even end up with the same URL without you realizing it.

Also, if you want to redefine your permalink structure, the permalinks migration plugin can conveniently redirect the old permalinks for you.

IT Technology, Internet Technology

Learning the incredibly powerful query_posts()

July 31, 2009
Comments Off on Learning the incredibly powerful query_posts()

The power of WordPress's query_posts() is astonishing. Time to learn it!

Place query_posts() before the Loop to restrict the query to the conditions you need. wp_query then generates a new SQL query from your parameters and ignores any other parameters received through the URL; if you do not want them ignored, include $query_string in the call.

query_posts($query_string . "&order=ASC")

This sets the order in which posts are displayed without disturbing the rest of the query string; note the "&" that must precede the parameter.

And there are plenty of other uses:

  • Exclude a certain category's posts from the home page
<?php
   if (is_home()) {
      query_posts("cat=-3");
   }
?>
  • Get a specific post
<?php
query_posts('p=5');
?>
  • Get a specific page
<?php
query_posts('page_id=7');      // returns only page 7
?>
<?php
query_posts('pagename=about'); // returns only the About page
?>
<?php
query_posts('pagename=parent/child'); // returns a child page of a parent page
?>
  • Build a list of all posts, with paging
<?php
query_posts($query_string . '&posts_per_page=-1');
while (have_posts()) { the_post();
    // put your loop here
}
?>

==========================================

Category parameters

Show posts belonging to certain categories.

  • cat
  • category_name

One category by ID

Display posts from a single category ID only:

query_posts('cat=4');

One category by name

Display posts belonging to a single category name only:

query_posts('category_name=Staff Home');

Several categories by ID

Display posts belonging to several category IDs:

query_posts('cat=2,6,17,38');

Exclude posts from a category

Display all posts except those in the category whose ID is prefixed with a '-' (minus sign):

query_posts('cat=-3');

This excludes all posts belonging to category 3. One caveat: it only excludes posts that belong solely to category 3; a post that also belongs to another category will still be shown.

Tag parameters

Show posts associated with certain tags.

  • tag

Fetch posts for one tag:

query_posts('tag=cooking');

Fetch posts that have any of these tags:

query_posts('tag=bread,baking');

Fetch posts that have all of these tags:

query_posts('tag=bread+baking+recipe');

Author parameters

You can also restrict the posts by author:

  • author_name=Harriet
  • author=3

author_name operates on the user_nicename field, while author operates on the author ID.
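A minimal illustration (the nicename and the ID here are made-up values):

query_posts('author_name=harriet'); // match by the author's user_nicename
query_posts('author=3');            // match by the author's ID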

Post & page parameters

Return a single post or a single page.

  • p=1 – use the post ID to display the first post
  • name=first-post – use the post slug to display the first post
  • page_id=7
  • pagename=about
  • showposts=1 (you can use showposts=3 or any other number to display that many posts)

Because of the template hierarchy, home.php is executed first. This means you can write a home.php that calls query_posts() to fetch a particular page and make that page your front page. Without any plugins or hacks, you get a mechanism for running, displaying, and maintaining a non-blog front page.

An even more useful approach may be to take advantage of WordPress pages and use one for your front page. You could make your About page the entry point of the site, or go further and set up a custom page that shows recent comments, posts, categories, and archives. See the sketch below.
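A minimal home.php sketch along these lines, assuming a standard theme with header/sidebar/footer templates (the page_id value is hypothetical):

<?php
// home.php – runs first for the front page because of the template hierarchy
get_header();
query_posts('page_id=7'); // show a specific page instead of the latest posts
while (have_posts()) { the_post();
    the_title('<h2>', '</h2>');
    the_content();
}
get_sidebar();
get_footer();
?>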

Time parameters

Fetch posts published during a particular time period (see the example after the list).

  • hour=
  • minute=
  • second=
  • day= – day of the month; e.g. show all posts published on the 15th
  • monthnum=
  • year=
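For example (the values are illustrative), to fetch the posts published in July 2009:

query_posts('year=2009&monthnum=7');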

Pagination parameters

  • paged=2 – show the posts that would normally appear on page 2 when using the "older entries" link
  • posts_per_page=10 – the number of posts to show per page; the value -1 shows all posts
  • order=ASC – show posts in chronological order; DESC (the default) shows them in reverse
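A combined illustration (values are illustrative):

query_posts('paged=2&posts_per_page=10&order=ASC');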

Offset parameter

With the offset parameter you can skip over one or more of the initial posts that your query would normally return.

The following displays 5 posts, skipping the most recent (1) post:

query_posts('showposts=5&offset=1');

Sorting parameters

Sort the retrieved posts by one of these fields:

  • orderby=author
  • orderby=date
  • orderby=category
  • orderby=title
  • orderby=modified
  • orderby=menu_order
  • orderby=parent
  • orderby=ID
  • orderby=rand

Combine these with the "ASC" or "DESC" order parameter.
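For instance (illustrative), to sort by title in ascending order:

query_posts('orderby=title&order=ASC');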

Combining parameters

As you may have noticed in the examples above, parameters are combined with an & (ampersand), like this:

query_posts('cat=3&year=2004');

Posts in category 13 for the current month, shown on the home page:

if (is_home()) {
    query_posts($query_string . '&cat=13&monthnum=' . date('n', current_time('timestamp')));
}

In version 2.3, the following combination returns posts that belong to both category 1 and category 3, showing just two (2) posts, ordered by title in descending order:

query_posts(array('category__and'=>array(1,3), 'showposts'=>2, 'orderby'=>'title', 'order'=>'DESC'));

In versions 2.3 and 2.5 you would expect the following to return all posts that belong to category 1 and are tagged "apples":

query_posts('cat=1&tag=apples');

A bug prevents this from working; see Ticket #5433. A workaround is to query several tags with +:

query_posts('cat=1&tag=apples+apples');

For the previous query this produces the expected result. Note that 'cat=1&tag=apples+oranges' can also produce the expected result.

Source: 站长百科, Template Tags/query posts

IT Technology, php, Internet Technology

Learning to make WordPress themes

July 31, 2009
Comments Off on Learning to make WordPress themes

If you are getting ready to learn how to make WordPress themes, the following will be very useful.

Basic WordPress template files

style.css – the stylesheet
index.php – the main template
single.php – the single-post template
page.php – the page template
archive.php – the category and date archive template
searchform.php – the search form
search.php – the search results template
comments.php – the comments area (the comment list and the comment form)
404.php – the 404 error page
header.php – the page header
sidebar.php – the sidebar
footer.php – the page footer

WordPress header PHP code

Note: this is the PHP code that sits between <head> and </head>.

<?php bloginfo('name'); ?> – the site title
<?php wp_title(); ?> – the title of the post or page
<?php bloginfo('stylesheet_url'); ?> – the address of the theme's style.css stylesheet
<?php bloginfo('pingback_url'); ?> – the blog's pingback URL
<?php bloginfo('template_url'); ?> – the address of the theme's files
<?php bloginfo('version'); ?> – the blog's WordPress version
<?php bloginfo('atom_url'); ?> – the blog's Atom feed address
<?php bloginfo('rss2_url'); ?> – the blog's RSS2 feed address
<?php bloginfo('url'); ?> – the blog's absolute address
<?php bloginfo('name'); ?> – the blog's name
<?php bloginfo('html_type'); ?> – the site's HTML type
<?php bloginfo('charset'); ?> – the site's character encoding

WordPress main template PHP code

<?php the_content(); ?> – the post content
<?php if(have_posts()) : ?> – checks whether there are any posts
<?php while(have_posts()) : the_post(); ?> – if there are, loops over all of them
<?php endwhile; ?> – ends the PHP "while" loop
<?php endif; ?> – ends the PHP "if"
<?php get_header(); ?> – the contents of header.php
<?php get_sidebar(); ?> – the contents of sidebar.php
<?php get_footer(); ?> – the contents of footer.php
<?php the_time('m-d-y') ?> – the date, in a format like "02-19-08"
<?php comments_popup_link(); ?> – the comments link for a post
<?php the_title(); ?> – the title of a post or page
<?php the_permalink() ?> – the permalink/URL of a post or page
<?php the_category(', ') ?> – the categories of a post or page
<?php the_author(); ?> – the author of a post or page
<?php the_ID(); ?> – the ID of a post or page
<?php edit_post_link(); ?> – an edit link for a post or page
<?php get_links_list(); ?> – the links from the Blogroll
<?php comments_template(); ?> – the contents of comments.php
<?php wp_list_pages(); ?> – a list of the blog's pages
<?php wp_list_cats(); ?> – a list of the blog's categories
<?php next_post_link(' %link ') ?> – the URL of the next post
<?php previous_post_link('%link') ?> – the URL of the previous post
<?php get_calendar(); ?> – the calendar
<?php wp_get_archives() ?> – a list of the blog's date archives
<?php posts_nav_link(); ?> – links to newer posts (previous page) and older posts (next page)
<?php bloginfo('description'); ?> – the blog's description

Some other WordPress template code

/%postname%/ – the blog's custom permalink
<?php the_search_query(); ?> – the value of the search query
<?php _e('Message'); ?> – prints the message
<?php wp_register(); ?> – the register link
<?php wp_loginout(); ?> – the login/logout link
<!--nextpage--> – insert a page break into a post or page
<!--more--> – truncate the post (the "more" tag)
<?php wp_meta(); ?> – administrator-related meta information
<?php timer_stop(1); ?> – the time taken to load the page
<?php echo get_num_queries(); ?> – the number of queries used to load the page

After these basic functions, here is some more advanced usage.

Include the contents of example.php only on the home page

<?php if ( is_home() ) { include ('example.php'); } ?>

Use a different stylesheet for a specific category

<?php if ( is_category('15') ) { ?>
<link rel="stylesheet" href="<?php bloginfo('template_url'); ?>/cat-15.css" type="text/css" media="screen" />
<?php } else { ?>
<link rel="stylesheet" href="<?php bloginfo('stylesheet_url'); ?>" type="text/css" media="screen" />
<?php } ?>

Use different images for different categories

<?php if ( is_category('7') ) : ?>
<img src="<?php bloginfo('template_url'); ?>/images/cat7.jpg" alt="" />
<?php elseif ( is_category('8') ) : ?>
<img src="<?php bloginfo('template_url'); ?>/images/cat8.jpg" alt="" />
<?php endif; ?>

Style an individual post

<div id="post-<?php the_ID(); ?>">

This snippet assigns the post ID to the DIV. For example, if the post's ID is 8, that line will echo as <div id="post-8"></div>, and you can then style that individual post in your CSS as #post-8. Place this code within the loop.

Previous-page and next-page links

<?php next_posts_link('Next Entries »') ?><?php previous_posts_link('« Older Entries'); ?>

Dynamic page links

<ul>
<li<?php if (is_home()) { ?> class="current_page_item"<?php } ?>><a href="<?php bloginfo('home'); ?>">home</a></li>
<?php wp_list_pages('sort_column=menu_order&depth=1&title_li='); ?>
</ul>

This snippet will first echo the text "home" with a link to the home page. Next, it will echo the WordPress pages links in a list, in order defined by your settings, excluding the child pages, and excluding a title header for the list. If one of the pages in the list is active, the link for that page will be assigned the class "current_page_item", which can now be styled in your CSS. Place this code in the template files.

Dynamic page title

<?php
if (is_home()) { echo bloginfo('name'); }
elseif (is_404()) { echo 'WPCandy » 404'; }
elseif (is_search()) { echo 'WPCandy » Search Results'; }
else { echo 'WPCandy » '; wp_title(''); }
?>

Posts from one category

<?php query_posts('cat=2&showposts=5'); ?>

CSS stylesheet header declaration

/*
Theme Name: WPCandy
Description: Description goes here
Theme URI: http://wpcandy.com/
Version: 2.0
Author: Michael Castilla
Author URI: http://wpcandy.com/
Template: Define a parent template (optional)
*/

The post loop

<?php if (have_posts()) : ?>
<?php while (have_posts()) : the_post(); ?>

<!-- this is the inside of the loop -->

<?php endwhile; ?>

<?php else : ?>

<?php endif; ?>

Tag cloud

<?php wp_tag_cloud('smallest=1&largest=9&'); ?>

Page template header declaration

<?php
/*
Template Name: Gallery
*/
?>

A different template for each category

<?php
$post = $wp_query->post;
if ( in_category('3') ) {
    include(TEMPLATEPATH . '/cat3.php');
} elseif ( in_category('4') ) {
    include(TEMPLATEPATH . '/cat4.php');
} else {
    include(TEMPLATEPATH . '/cat.php');
}
?>

This comes from an e-book by WPCandy, translated into Chinese by 帕兰; you can download it here.
Reposted from the Chinese edition of the Advanced WordPress Template Code Help Manual | 帕兰映像

IT Technology, php, Internet Technology

An encoding problem when using Google Maps

July 26, 2009
Comments Off on An encoding problem when using Google Maps

Over the weekend I was adding maps for the site's sports venues, using Google Maps. While debugging I ran into a strange problem: the map displayed fine in Firefox but would not display in Maxthon or IE. After searching Google and Baidu for a long time without finding a fix, I added alert() calls to the JavaScript to debug and finally traced the problem to the GBrowserIsCompatible() function, but still could not solve it. I then created a separate HTML file to test my JS, and that file worked fine. Comparing the two, I narrowed the problem down to the header's <meta http-equiv="content-type" content="text/html; charset=gbk" />: my page was encoded in GBK, while Google Maps defaults to UTF-8. The best fix is to tell the browser to read the JS file as UTF-8, i.e. add charset="utf-8" when including it, like this: <script src="http://maps.google.com/maps?file=api&xxxxxxxx" type="text/javascript" charset="utf-8"></script>.

Problem solved. Great joy!

Internet Technology