Preface
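I recently spent some time learning how to write scrapers with WSH (Windows Script Host). It can only handle simple scraping, but it is still quite convenient. I am not yet comfortable with VBS syntax, so the VBS version below fetches only a single page, while the JS version fetches the content of a multi-page document.
Run the scripts from the console (with cscript.exe). Otherwise every WSH.Echo call opens a dialog box, and you will end up with a great many pop-ups.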
The result of running the JS version:
js
```js
// Run with cscript.exe so that WSH.Echo prints to the console.
var html = new ActiveXObject("htmlfile");
var http = new ActiveXObject("Msxml2.ServerXMLHTTP");

var PageNum = 0;
while (PageNum < 42) {
    PageNum++;
    html.designMode = "on";
    var url = "http://www.doczj.com/doc/0f47f800a6c30c2259019e5e-" + PageNum + ".html";

    // Synchronous GET request for the current page
    http.open("GET", url, false);
    http.send();
    var strHtml = http.responseText;

    // Load the response into the htmlfile document and extract the text
    html.write(strHtml);
    var text = html.getElementById("contents");
    if (text) {
        WSH.Echo(text.innerText);
    }
    WSH.Echo("--------------------------------------------------------------------------------------------");
    html.designMode = "off";
}
```
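The script above builds each page address by appending the page number and ".html" to a fixed prefix. As a minimal sketch, that step could be pulled out into a small helper. The names `buildPageUrls`, `baseUrl` and `pageCount` are hypothetical and not part of the original script, and the code sticks to older syntax (`var`, no arrow functions) on the assumption that it should also run under WSH.

```javascript
// Build the list of page URLs for a paginated document.
// buildPageUrls, baseUrl and pageCount are illustrative names (assumptions),
// not something defined in the original script.
function buildPageUrls(baseUrl, pageCount) {
    var urls = [];
    for (var page = 1; page <= pageCount; page++) {
        // Pages are numbered from 1, e.g. "<prefix>1.html", "<prefix>2.html", ...
        urls.push(baseUrl + page + ".html");
    }
    return urls;
}

// The prefix and page count are the ones used in the script above.
var urls = buildPageUrls("http://www.doczj.com/doc/0f47f800a6c30c2259019e5e-", 42);
```

With the list built up front, the download loop only has to iterate over `urls`, and the prefix or page count can be changed in one place.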
vbs
```vbs
' Create the HTML parser document and the HTTP client
Set html = CreateObject("htmlfile")
Set http = CreateObject("Msxml2.ServerXMLHTTP")

html.designMode = "on"

' Synchronous GET request for the first page
http.open "GET", "http://www.doczj.com/doc/0f47f800a6c30c2259019e5e-1.html", False
http.send
strHtml = http.responseText

' Load the response into the htmlfile document and print the text
html.write strHtml
Set bln = html.getElementById("contents")
WSH.Echo bln.innerText
```
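Both scripts read `responseText` without checking whether the request actually succeeded. The following is a rough sketch of one way to guard against that, not the original author's method. `fetchText` is a hypothetical helper name, and the `http` argument is assumed to be any object exposing `open`, `send`, `status` and `responseText`, as an Msxml2.ServerXMLHTTP instance does.

```javascript
// Fetch a page and return its text, or null when the server does not answer 200.
// fetchText is a hypothetical helper; `http` is assumed to behave like the
// Msxml2.ServerXMLHTTP object created in the scripts above.
function fetchText(http, url) {
    http.open("GET", url, false); // false = synchronous request
    http.send();
    if (http.status !== 200) {
        return null; // the caller decides whether to skip or retry the page
    }
    return http.responseText;
}
```

In the multi-page loop, a `null` result could simply be skipped with `continue`, so that one missing page does not feed an error page into the HTML parser.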
References
Method of parsing HTML document by VBS (htmlfile)