Web scraping: the MSXML2.XMLHTTP library

peter199083 · posted 2023-4-19 20:14

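I am learning web scraping. I first read the contents of a web page successfully with urlopen from Python's urllib.request module. The Python code is: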

from urllib.request import urlopen
html = urlopen('http://pythonscraping.com/pages/page1.html')
print(html.read())


The printed result is:

b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
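
A side note on the b'...' prefix in the output above: read() returns a bytes object rather than text. Below is a minimal sketch of turning it into an ordinary string. The helper names decode_body and fetch_text are made up for illustration, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
from urllib.request import urlopen


def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Turn the raw bytes returned by read() into a text string."""
    # errors="replace" keeps going if a byte sequence is invalid for the charset.
    return raw.decode(charset, errors="replace")


def fetch_text(url: str) -> str:
    """Download a page and return its body as text (needs network access)."""
    with urlopen(url) as response:
        # Prefer the charset from the Content-Type header, if one was sent.
        # The UTF-8 fallback is an assumption, not something the page guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_body(response.read(), charset)


# Offline demonstration on a short bytes literal shaped like the output above.
sample = b"<h1>An Interesting Title</h1>\n"
print(decode_body(sample))
```

With network access, print(fetch_text('http://pythonscraping.com/pages/page1.html')) should show the page as readable text with real line breaks instead of \n escapes.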

I now want to get the same result with VBA, and wrote the following code in the VBE:

Public Sub parsehtml()
    Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
    Dim i As Integer
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://pythonscraping.com/pages/page1.html", False
    http.send
    html.body.innerHTML = http.responseText
    Debug.Print html
End Sub


However, the Immediate window only shows "[object HTMLDocument]". Which property of HTMLDocument do I need to read to get the same result as above?

smsn · posted 2023-4-20 10:53

Isn't http.responseText already the page source? Debug.Print http.responseText

perfect131 · posted 2023-4-20 11:10

It is about getting the dynamic source.

peter199083 · posted 2023-4-20 14:32

perfect131 posted 2023-4-20 11:10
It is about getting the dynamic source.

Thanks for the replies. I tested both of the solutions you suggested, and both work:
1. Debug.Print html.all(0).outerHTML
2. Debug.Print http.responseText
