Implementing a Web Spider (WebSpider) in C# 2.0

2009-05-29 08:31:25　Source: WEB开发网

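Summary: In the getA method, every state transition except the one to state 0 appends the characters read so far to the String variable a. If the string in a turns out not to be an <a> tag, a is cleared and the method switches back to state 0. A valid href attribute inside an <a> tag takes one of three forms, which differ mainly in the characters surrounding the URL.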

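In the getA method, every state transition except the one to state 0 appends the character just read to the String variable a. If it later turns out that the string in a cannot be an <a> tag, a is cleared, the method switches back to state 0, and reading resumes from the next character.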
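The behavior described above can be sketched as a small state machine. This is only an illustration and not the article's own getA: the class name AnchorScanner, the helper Restart, the TextReader parameter, and the meaning of states 1 to 3 are all assumptions made for the example. Only state 0 and the variable a come from the text.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnchorScanner
{
    // Hypothetical states (assumed for this sketch):
    //   0 = scanning for '<'
    //   1 = '<' has been read
    //   2 = "<a" has been read
    //   3 = inside "<a ...", collecting characters until '>'
    public static string GetA(TextReader reader)
    {
        StringBuilder a = new StringBuilder();
        int state = 0;
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            switch (state)
            {
                case 0:
                    // Leave state 0 only when a tag might be starting.
                    if (ch == '<') { a.Append(ch); state = 1; }
                    break;
                case 1:
                    if (ch == 'a' || ch == 'A') { a.Append(ch); state = 2; }
                    else { state = Restart(a, ch); }
                    break;
                case 2:
                    // "<a" must be followed by whitespace; otherwise it is
                    // some other tag such as <abbr>.
                    if (char.IsWhiteSpace(ch)) { a.Append(ch); state = 3; }
                    else { state = Restart(a, ch); }
                    break;
                case 3:
                    a.Append(ch);
                    if (ch == '>') return a.ToString();
                    break;
            }
        }
        return null; // input ended without a complete <a ...> tag
    }

    // Clears a and goes back to state 0. As a small addition of this sketch,
    // a '<' that caused the failure is treated as the start of a new tag.
    private static int Restart(StringBuilder a, char ch)
    {
        a.Length = 0;
        if (ch == '<') { a.Append(ch); return 1; }
        return 0;
    }
}
```

Calling AnchorScanner.GetA(new StringReader("<p>text <a href='x.html'>link</a>")) skips the <p> tag (a is cleared when 'p' follows '<') and returns the complete opening <a ...> tag.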

The getA method relies on an important helper, getHref, to extract the href part from the <a> tag. getHref is implemented as follows:

Implementation of the getHref method

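    // Requires: using System; using System.Text.RegularExpressions;

    // Extract the href attribute from an <a> tag
    private String getHref(string a)
    {
        // Regular expression for href. The value may be single-quoted,
        // double-quoted, or unquoted and followed by whitespace.
        string p = @"href\s*=\s*('[^']*'|""[^""]*""|\S+\s+)";
        Match match = Regex.Match(a, p,
            RegexOptions.IgnoreCase |
            RegexOptions.ExplicitCapture);
        // Return the first href found (including the "href=" prefix),
        // or null if there is none. The original try/catch only rethrew
        // the exception, so it is omitted here.
        return match.Success ? match.Value : null;
    }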
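To see how the pattern behaves on the different ways of writing href, a small test program may help. The following is a minimal sketch: the class name HrefDemo, the public static GetHref wrapper, and the sample tags are made up for illustration, while the regular expression is the one used in getHref.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Same pattern as in getHref: single-quoted, double-quoted,
    // or unquoted value followed by whitespace.
    private const string Pattern =
        @"href\s*=\s*('[^']*'|""[^""]*""|\S+\s+)";

    // Returns the first href attribute found in the tag text, or null.
    public static string GetHref(string a)
    {
        Match m = Regex.Match(a, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample tags, one per href style plus two misses.
        string[] samples = {
            "<a href='page1.html'>",
            "<a HREF = \"page2.html\" target=\"_blank\">",
            "<a href=page3.html >",
            "<a href=page4.html>",
            "<a name=top>"
        };
        foreach (string s in samples)
        {
            string href = GetHref(s);
            Console.WriteLine("{0} -> {1}", s, href ?? "(no href)");
        }
    }
}
```

Two details are worth noticing. The returned string is the whole attribute, not just the URL, and in the unquoted case it also carries the trailing whitespace. Because the unquoted branch is \S+\s+, a tag such as <a href=page4.html> with no whitespace before the closing > is not matched by this pattern at all, so GetHref returns null for it.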