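I want to feed in a very long list of URLs, search each page's source for a particular string, and output the URLs whose source contains that string. Sounds simple, right? I came up with the code below; the input comes from an HTML form. You can try it at pelican-cement.com/findfrog.
It seems to work about half the time, but it gets thrown off by multiple URLs, or by the same URLs in a different order. Searching for "adsense" with the following input, it correctly identifies politics1.com: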
cnn.com
politics1.com
But if I reverse the order, the output is blank. How can I get reliable, consistent results? Ideally, something I can feed thousands of URLs into?
<html>
<body>
<?
set_time_limit (0);
$urls=explode("\n", $_POST['url']);
$allurls=count($urls);
for ( $counter = 0; $counter <= $allurls; $counter++) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt ($ch, CURLOPT_HEADER, 1);
curl_exec ($ch);
$curl_scraped_page=curl_exec($ch);
$haystack=strtolower($curl_scraped_page);
$needle=$_POST['proxy'];
if (strlen(strstr($haystack,$needle))>0) {
echo $urls[$counter];
echo "<br/>";
curl_close($ch);
}
}
//$FileNameSQL = "/googleresearch" . abs(rand(0,1000000000000000)) . ".csv";
//$query = "SELECT * FROM happyturtle INTO OUTFILE '$FileNameSQL' FIELDS TERMINATED BY ','";
//$result = mysql_query($query) or die(mysql_error());
//exit;
echo '$FileNameSQL';
?>
</body>
</html>
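Answer:
I reorganized the code a bit. The culprit is whitespace: you need to trim each URL string before using it (i.e. trim($url);). Browsers submit the line breaks in a textarea as \r\n, so explode("\n", ...) leaves a trailing \r on every URL except the last one. That is why politics1.com was found when it was on the last line and missed when it was on the first.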
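To make the whitespace problem concrete, here is a minimal sketch of splitting the textarea value safely. It assumes the raw value comes from `$_POST['url']` as in the question; `parse_url_list()` is a hypothetical helper name, not a PHP built-in. It splits on `\R` (any line break) and trims each line, so no stray `\r` survives, and drops blank lines.

```php
<?php
// Minimal sketch: turn the raw textarea value into a clean list of URLs.
// parse_url_list() is a hypothetical helper, not part of PHP.
function parse_url_list(string $raw): array
{
    // \R matches \r\n, \n or \r, so the split itself leaves no stray \r.
    $lines = preg_split('/\R/', $raw);
    $urls = array();
    foreach ($lines as $line) {
        $line = trim($line);   // strip spaces, tabs and any leftover \r
        if ($line !== '') {    // ignore blank lines
            $urls[] = $line;
        }
    }
    return $urls;
}

// A CRLF submission with the reversed input from the question:
$raw = "politics1.com\r\ncnn.com";
var_dump(explode("\n", $raw));  // first element still ends with "\r"
var_dump(parse_url_list($raw)); // "politics1.com" and "cnn.com", no "\r"
```

Calling `trim()` on each element of the `explode()` result, as the answer does, fixes the same bug; the `\R` split additionally copes with old-style `\r`-only line endings.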
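Other changes:
>Set the search term outside the loop, since it never changes.
>Create the curl handle outside the loop and reuse it, changing only the URL on each iteration.
>Use curl_setopt_array() to set several curl options in a single statement.
>Use a foreach loop, since you iterate over the whole array anyway and the code is cleaner. It also avoids the off-by-one error in the original for loop ($counter <= $allurls reads one element past the end of the array).
>Use stripos() instead of strstr(): it returns only the match position instead of building a substring, and it is case-insensitive. The original code lowercased the page but not the search term, so a term such as "AdSense" could never match.
>Use the !== operator to prevent implicit type conversion: stripos() returns 0 for a match at the very start of the string, and FALSE !== 0, but FALSE == 0.
>Call curl_exec() once per URL (the original called it twice, fetching every page two times) and check the returned $html string, since curl_exec() returns FALSE on failure.
>Close the curl handle at the end (i.e. outside the if statement), so it is closed whether or not the term was found.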
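The stripos() and !== points above can be checked in isolation. This is a minimal sketch; `page_contains()` is a hypothetical helper name, and the sample page text is made up for illustration. It shows that a match at offset 0 is falsy under a loose check but is handled correctly by a strict comparison with false.

```php
<?php
// Minimal sketch of the match test used inside the loop.
// page_contains() is a hypothetical helper, not part of PHP.
function page_contains(string $html, string $term): bool
{
    // stripos() returns the 0-based offset of the first case-insensitive
    // match, or false. Offset 0 is falsy, so compare strictly with false.
    return stripos($html, $term) !== false;
}

$html = 'adsense script here';
var_dump(stripos($html, 'AdSense'));         // int(0): found at the very start
var_dump((bool) stripos($html, 'AdSense'));  // bool(false): a loose check misses it
var_dump(page_contains($html, 'AdSense'));   // bool(true)
var_dump(page_contains($html, 'analytics')); // bool(false)
```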
The code below works on my quick mockup.
<html>
<body>
<form action="search.php" method="post">
URLs: <br/>
<textarea rows="20" cols="50" name="url"></textarea><br/>
Search Term: <br/>
<input type="text" name="proxy" /><br/>
<input type="submit" />
</form>
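<?php
if (isset($_POST['url'], $_POST['proxy']) && trim($_POST['proxy']) !== '') {
    set_time_limit(0);
    $urls = explode("\n", $_POST['url']);
    $term = trim($_POST['proxy']); // set once, outside the loop
    $options = array(
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_CUSTOMREQUEST  => 'GET',
        CURLOPT_HEADER         => 1,
    );
    $ch = curl_init(); // one handle, reused for every URL
    curl_setopt_array($ch, $options);
    foreach ($urls as $url) {
        $url = trim($url); // removes the trailing \r left by explode("\n", ...)
        if ($url === '') {
            continue; // skip blank lines
        }
        curl_setopt($ch, CURLOPT_URL, $url);
        $html = curl_exec($ch);
        if ($html !== FALSE && stripos($html, $term) !== FALSE) { // Found!
            echo htmlspecialchars($url) . "<br/>";
        }
    }
    curl_close($ch);
}
?>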
</body>
</html>
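The question also asks about feeding in thousands of URLs. The loop above fetches pages strictly one after another, so a slow or unreachable host stalls everything behind it. One possible extension, not part of the answer above, is to fetch URLs in parallel batches with PHP's curl_multi functions and give each request a timeout. This is a sketch under assumptions: `find_in_urls()` is a hypothetical name, and the batch size of 20 and the 10 s connect / 30 s total timeouts are arbitrary values you would tune.

```php
<?php
// Sketch: search many URLs for $term, fetching $batchSize pages at a time.
// find_in_urls() is hypothetical; batch size and timeouts are arbitrary.
function find_in_urls(array $urls, string $term, int $batchSize = 20): array
{
    $found = array();
    if ($term === '') {
        return $found; // an empty needle would "match" every page
    }
    foreach (array_chunk($urls, max(1, $batchSize)) as $batch) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_CONNECTTIMEOUT => 10, // seconds to establish a connection
                CURLOPT_TIMEOUT        => 30, // seconds for the whole transfer
            ));
            curl_multi_add_handle($mh, $ch);
            $handles[] = array($url, $ch);
        }
        // Drive every transfer in this batch until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh, 1.0); // wait for activity, not a busy loop
            }
        } while ($running && $status === CURLM_OK);

        foreach ($handles as [$url, $ch]) {
            $html = curl_multi_getcontent($ch); // failed transfers give no useful body
            if (is_string($html) && stripos($html, $term) !== false) {
                $found[] = $url;
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $found;
}

// Example usage (performs real HTTP requests):
// print_r(find_in_urls(array('http://cnn.com', 'http://politics1.com'), 'adsense'));
```

Pass it an already-trimmed list of URLs. Batching keeps only a bounded number of connections open at once, which matters more than raw speed when the list runs into the thousands.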