Fearless Java browser: HtmlUnit 2.1 released  2008-09-15 20:41


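A new pure-Java browser has been released that lets you work with web pages at a much higher level. When you fill in a form, click a hyperlink, or read the attribute or value of a particular element on a page, you no longer have to build low-level TCP/IP or HTTP requests yourself. You just call getPage(url), and HtmlUnit processes all of the page's HTML, JavaScript and Ajax automatically. HtmlUnit's main strength is automated testing of web pages, and it even works with fairly complex JavaScript libraries (Google Web Toolkit 1.4.60, for example, has been verified against it). In some situations it can also be used for web scraping [Note 1] or for downloading a site's content. HtmlUnit 2.0 added many new features: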

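· A W3C DOM implementation
· Java 5 support
· Better XPath support
· Better handling of invalid HTML (especially useful when scraping data; personally I think this one matters most)
· Improved JavaScript support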
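The W3C DOM and XPath items above are standard technologies, so the idea can be shown without HtmlUnit at all. The sketch below uses only the JDK's javax.xml APIs to run XPath queries against a W3C DOM. It does not use HtmlUnit's own XPath methods, which are not shown here. The class name XPathOverDom and the sample markup are hypothetical. The JDK parser only accepts well-formed markup, which is why better handling of invalid HTML matters when scraping real pages.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Plain JDK (javax.xml) sketch, not HtmlUnit's API: XPath queries over a W3C DOM.
class XPathOverDom {

    /** Parses well-formed (X)HTML into a W3C DOM Document; malformed markup throws. */
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    /** Collects the href attribute of every <a> element, in document order. */
    static List<String> linkTargets(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList anchors = (NodeList) xpath.evaluate("//a", doc, XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<String>();
        for (int i = 0; i < anchors.getLength(); i++) {
            hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return hrefs;
    }

    /** Returns the text of /html/head/title, or "" if the document has no title. */
    static String titleText(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("string(/html/head/title)", doc);
    }
}
```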

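The latest release, HtmlUnit 2.1, mainly fixes performance problems that users reported. See the official HtmlUnit website for more details; the developers welcome your feedback. Here are a few introductory examples: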

1. Basic usage:

    import junit.framework.TestCase;

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    // JUnit 3 style test; the class name is only for illustration
    public class HomePageTest extends TestCase {

        public void testHomePage() throws Exception {
            final WebClient webClient = new WebClient();
            final HtmlPage page = (HtmlPage) webClient.getPage("http://htmlunit.sourceforge.net");
            // Title as of 2008; the live site's title may have changed since
            assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
        }
    }

2. Emulating Firefox 2:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    // Test method inside a junit.framework.TestCase subclass
    public void testHomePage() throws Exception {
        // Emulate Firefox 2 instead of the default browser version
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2);
        final HtmlPage page = (HtmlPage) webClient.getPage("http://htmlunit.sourceforge.net");
        assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }

3. Using a proxy:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    // Test method inside a junit.framework.TestCase subclass
    public void testHomePage() throws Exception {
        // Route all requests through an HTTP proxy: pass the proxy's host name
        // (without an "http://" scheme) and its port
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2, "myproxyserver", 8000);
        final HtmlPage page = (HtmlPage) webClient.getPage("http://htmlunit.sourceforge.net");
        assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
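For comparison, this is roughly how the plain JDK routes a single connection through the same hypothetical proxy (myproxyserver:8000), using java.net.Proxy, which is available since Java 5. It is not HtmlUnit's API. It is only a sketch that also shows why the proxy is given as a bare host name plus a port rather than as a URL. The class name ProxyConfig is made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

// Plain JDK sketch (java.net.Proxy), not HtmlUnit's API.
class ProxyConfig {

    /** Builds an HTTP proxy from a bare host name and port (no "http://" scheme). */
    static Proxy httpProxy(String host, int port) {
        // createUnresolved skips the DNS lookup here; the host is resolved when a connection is made
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Creates a connection routed through the proxy; nothing is sent until connect() or a read. */
    static URLConnection open(String url, Proxy proxy) throws IOException {
        return URI.create(url).toURL().openConnection(proxy);
    }
}
```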

4. Submitting a form:

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
    import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

    // Test method inside a junit.framework.TestCase subclass
    public void testHomePage() throws Exception {
        final WebClient webClient = new WebClient();

        // Get the first page
        final HtmlPage page1 = (HtmlPage) webClient.getPage("http://some_url");

        // Get the form that we are dealing with and within that form,
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");
        final HtmlSubmitInput button = (HtmlSubmitInput) form.getInputByName("submitbutton");
        final HtmlTextInput textField = (HtmlTextInput) form.getInputByName("userid");

        // Change the value of the text field
        textField.setValueAttribute("root");

        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = (HtmlPage) button.click();
    }
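HtmlUnit hides what the button click actually sends. Under the HTML 4 form rules, a form's successful controls are sent as application/x-www-form-urlencoded name=value pairs. Here those controls are the userid field and, because it was the button clicked, the named submit button. The sketch below assumes that encoding and builds such a body with java.net.URLEncoder. The class name FormEncoding and the button value "Log in" are hypothetical, and the code is not taken from HtmlUnit's source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Plain JDK sketch of an application/x-www-form-urlencoded body; not HtmlUnit's code.
class FormEncoding {

    /** Joins name=value pairs with '&', URL-encoding each part, in the map's iteration order. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    /** URL-encodes as UTF-8; spaces become '+', reserved characters become %XX. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Pass a LinkedHashMap so the fields keep their form order. Encoding userid=root and submitbutton=Log in this way yields userid=root&submitbutton=Log+in.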

The code above is short and clear. If you need a tool like this, go ahead and give it a try.

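[Note 1] A web scraper is similar to a spider, but it raises more legal questions. A scraper is a kind of spider whose goal is to retrieve specific content from the web, such as the price of a product or service. One use of a scraper is competitive pricing: finding out what a given product sells for, so that you can set a reasonable price for your own product or advertise accordingly. A scraper can also collect large amounts of data from many websites and present that information to users. BTW: I once did this kind of thing with Java's URL class, but HtmlUnit makes it much easier.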
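The note mentions scraping with Java's URL class before HtmlUnit. A minimal reconstruction of that approach might download the raw source with URL.openStream() and pull out the title with a regular expression. This is a sketch, not the author's actual code. It assumes the page is UTF-8 encoded, executes no JavaScript, and breaks easily on unusual markup; these are the chores HtmlUnit takes over. The class name UrlScraper is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Low-level scraping with the JDK URL class; a hypothetical sketch, not HtmlUnit.
class UrlScraper {

    // Case-insensitive, and DOTALL so a title spanning several lines still matches
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Downloads the raw page source (assumed UTF-8); no JavaScript is executed. */
    static String fetch(String url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    /** Extracts the trimmed <title> text with a regex, or returns null if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

For example, UrlScraper.extractTitle(UrlScraper.fetch("http://htmlunit.sourceforge.net")) returns whatever title the live page currently has.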

Copyright © 1997-2009 NetEase. All rights reserved.