javascript – How to use XPath in headless Chrome Puppeteer evaluate()?

How can I use $x() inside page.evaluate() to evaluate an XPath expression?

Since the page is not in the same context, I tried calling $x() directly (as I would in Chrome DevTools), but no luck.

The script runs into a timeout.

Solution:

$x() is not a standard JavaScript method for selecting elements by XPath. It is only a helper provided by Chrome DevTools. The documentation states:

Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page.

And page.evaluate() is treated as "scripts on the page" here.

You have two options:

1. Use document.evaluate()

Here is an example of selecting an element (the featured article) inside page.evaluate():

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

    const text = await page.evaluate(() => {
        // $x() is not standard JavaScript;
        // it is only a convenience helper in Chrome DevTools,
        // so use document.evaluate() instead
        const featureArticle = document
            .evaluate(
                '//*[@id="mp-tfa"]',
                document,
                null,
                XPathResult.FIRST_ORDERED_NODE_TYPE,
                null
            )
            .singleNodeValue;

        // guard against the XPath matching nothing
        return featureArticle ? featureArticle.textContent : null;
    });

    console.log(text);
    await browser.close();
})();
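
The example above returns only the first match. If every match is needed, a snapshot result type can be used instead. The following is a minimal sketch, not part of the original answer: `xpathTexts` is a hypothetical helper name, and the function is meant to be passed to page.evaluate() so that it runs inside the page, where `document` and `XPathResult` exist.

```javascript
// Hypothetical helper: runs inside the page and returns the text of every
// node matching the given XPath expression.
// Assumed usage with a Puppeteer `page` object:
//   const texts = await page.evaluate(xpathTexts, '//h2');
function xpathTexts(expression) {
    // A snapshot result is a static list that can be indexed safely
    const snapshot = document.evaluate(
        expression,
        document,
        null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
        null
    );

    const texts = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
        texts.push(snapshot.snapshotItem(i).textContent);
    }
    return texts;
}
```

Only serializable values (here, an array of strings) can be returned from page.evaluate(), which is why the helper returns text rather than the DOM nodes themselves.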

2. Select the element with Puppeteer's page.$x() and pass it to page.evaluate()

This example achieves the same result as the first example:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

    // await page.$x() returns array of ElementHandle
    // we are only interested in the first element
    const featureArticle = (await page.$x('//*[@id="mp-tfa"]'))[0];
    // the same as:
    // const featureArticle = await page.$('#mp-tfa');

    const text = await page.evaluate(el => {
        // do what you want with featureArticle in page.evaluate
        return el.textContent;
    }, featureArticle);

    console.log(text);
    await browser.close();
})();
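
Whether page.$x() is available depends on the Puppeteer version: as far as I know it was deprecated and later removed (around v22) in favor of prefixed selectors such as `xpath/` passed to page.$$(). The following is a rough sketch of a version-tolerant wrapper under that assumption; `xpathSelector` and `queryXPath` are hypothetical names, not Puppeteer APIs.

```javascript
// Hypothetical helper: builds a prefixed selector for newer Puppeteer versions
// (assumes the 'xpath/' selector prefix is supported by the installed version)
function xpathSelector(expression) {
    return `xpath/${expression}`;
}

// Hypothetical wrapper: uses page.$x() when it exists, otherwise falls back
// to page.$$() with the 'xpath/' prefix. Resolves to an array of ElementHandle.
async function queryXPath(page, expression) {
    if (typeof page.$x === 'function') {
        return page.$x(expression);
    }
    return page.$$(xpathSelector(expression));
}

// Assumed usage with a Puppeteer `page` object:
//   const [featureArticle] = await queryXPath(page, '//*[@id="mp-tfa"]');
//   const text = await page.evaluate(el => el.textContent, featureArticle);
```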

Here is a related question on how to inject a $x() helper function into the script.
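
One possible way to do such an injection (a sketch only, and not necessarily the approach taken in the linked question) is to define a $x()-like function on `window` before the page's own scripts run. `installXPathHelper` is a hypothetical name; page.evaluateOnNewDocument() is the Puppeteer method assumed for installing it.

```javascript
// Hypothetical installer: runs inside the page and defines a $x()-like helper
// that returns an array of nodes matching an XPath expression.
function installXPathHelper() {
    window.$x = (expression, contextNode = document) => {
        const snapshot = document.evaluate(
            expression,
            contextNode,
            null,
            XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
            null
        );
        // Copy the snapshot into a plain array, like the DevTools helper does
        return Array.from(
            { length: snapshot.snapshotLength },
            (_, i) => snapshot.snapshotItem(i)
        );
    };
}

// Assumed usage with a Puppeteer `page` object (call before page.goto()):
//   await page.evaluateOnNewDocument(installXPathHelper);
//   await page.goto('https://en.wikipedia.org');
//   const text = await page.evaluate(
//       () => $x('//*[@id="mp-tfa"]')[0].textContent
//   );
```

Because the helper lives on `window`, it is visible to the page's own scripts as well, so a less generic name may be preferable if collisions are a concern.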
