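I am trying to do some web scraping with Puppeteer, and I need to get the scraped values into the website I am building.
I tried loading the Puppeteer script from an HTML file as if it were an ordinary JavaScript file, but I keep getting an error. If I run the same script with Node from a cmd window, it works fine.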
Scraper.js:
const puppeteer = require('puppeteer');

async function getPrice() {
  let browser;
  try {
    // Launch a headless Chrome instance (this only works under Node.js).
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setViewport({ width: 1920, height: 938 });
    await page.goto('http://example.com');

    // Open the search dialog.
    const openButton = '.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button';
    await page.waitForSelector(openButton);
    await page.click(openButton);
    await page.waitForSelector('.modal-content');

    // Start waiting for the navigation before clicking, so it is not missed.
    await Promise.all([
      page.waitForNavigation(),
      page.click('.tile-hsearch-hws > .m-search-tabs > #edit-search-panel > .l-em-reset > .m-field-wrap > .l-xs-col-4 > .analytics-click'),
    ]);

    // Read the text of the target element on the results page.
    const resultSelector = '.tile-search-filter > .l-display-none';
    await page.waitForSelector(resultSelector);
    const innerText = await page.evaluate(
      (selector) => document.querySelector(selector).innerText,
      resultSelector
    );
    console.log(innerText);
  } catch (error) {
    console.log(error);
  } finally {
    // Always close the browser so no Chrome process is left behind.
    if (browser) await browser.close();
  }
}

getPrice();
index.html:
<html>
<head></head>
<body>
<script src="../js/scraper.js" type="text/javascript"></script>
</body>
</html>
The expected result is this output in the Chrome console:
But I get this error instead:
Any ideas?
Thanks in advance!
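Solution:
Puppeteer can be used from the browser. There is a build called puppeteer-web that exists specifically for this case.
The catch is that a Chrome instance must already be running on some server. The page cannot launch a browser itself; it can only connect to one that is already running.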
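As a rough illustration of the server side, here is a minimal sketch that launches Chrome under Node.js and reports the WebSocket endpoint a page could later connect to. The function name `launchSharedBrowser` is made up for this example, and the Puppeteer module is passed in as an argument so the sketch does not assume how it is installed.

```javascript
// Hypothetical helper for the server side (runs under Node.js, not in a page).
// `puppeteer` is the module returned by require('puppeteer').
async function launchSharedBrowser(puppeteer) {
  // Start a Chrome instance that stays open for remote clients.
  const browser = await puppeteer.launch();
  // browser.wsEndpoint() returns the WebSocket URL of this instance.
  return { browser, wsEndpoint: browser.wsEndpoint() };
}
```

On the server, `launchSharedBrowser(require('puppeteer'))` would resolve to an object whose `wsEndpoint` value is what the page passes as `browserWSEndpoint` when it connects. How that value reaches the page (config file, HTTP endpoint, and so on) is left open here.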
To bundle Puppeteer with Browserify:
Clone the Puppeteer repository:
git clone https://github.com/GoogleChrome/puppeteer && cd puppeteer
npm install
npm run bundle
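This creates the file ./utils/browser/puppeteer-web.js, which contains the Puppeteer bundle.
You can then use it in a web page to drive another browser instance through its WS endpoint: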
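<script src='./puppeteer-web.js'></script>
<script>
  // The bundle makes require('puppeteer') available inside the page.
  const puppeteer = require('puppeteer');
  // await is only valid inside an async function in a classic script.
  (async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: '<another-browser-ws-endpoint>'
    });
    // ... drive automation ...
  })();
</script>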
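The `// ... drive automation ...` step could, for instance, read an element's text the way the script in the question does. The sketch below is only one way to write it: `readInnerText` is a hypothetical helper name, and it assumes `browser` is the object returned by `puppeteer.connect()`.

```javascript
// Hypothetical helper: open a page in an already connected browser,
// wait for an element to appear, and return its text.
async function readInnerText(browser, url, selector) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    await page.waitForSelector(selector);
    // $eval runs the callback inside the remote page and returns its result.
    return await page.$eval(selector, (element) => element.innerText);
  } finally {
    // Close the tab even if a step above fails.
    await page.close();
  }
}
```

With the selectors from the question, a call such as `readInnerText(browser, 'http://example.com', '.tile-search-filter > .l-display-none')` would return the text, which the page can then insert into its own DOM instead of logging it.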
I have also experimented with Puppeteer and webpack:
> playground-react-puppeteer
> playground-electron-react-puppeteer-example
For details on setting up the server, see the following answers:
> Official link to puppeteer-web
> Puppeteer with docker
> Puppeteer with chrome extension
> Puppeteer with local wsEndpoint