htmlspecialchars() (PHP manual): Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.
HTML entities: &lt; &gt; &amp; …
HTML characters: < > &
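Running both functions on the same strings shows the difference the manual describes. This is a minimal sketch assuming PHP 5.4 or later (where the default charset is UTF-8); the sample strings are made up for illustration.

```php
<?php
// htmlspecialchars() only converts the HTML-special characters (& < > and quotes)
$special = htmlspecialchars("<b>Tom & Jerry</b>");
echo $special, "\n"; // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

// htmlentities() converts every character that has a named entity, such as é
$entities = htmlentities("café");
echo $entities, "\n"; // caf&eacute;

// htmlspecialchars() leaves é alone, since it has no special meaning in HTML
echo htmlspecialchars("café"), "\n"; // café

// htmlspecialchars_decode() turns the entities back into characters
echo htmlspecialchars_decode($special), "\n"; // <b>Tom & Jerry</b>
```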
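After fetching a web page with file_get_contents, echoing it directly does not show the source: the browser parses the markup, so the output is still rendered as a web page.
Instead, convert the fetched content with htmlspecialchars, then collect all the links and extract their text.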
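To see the page source instead of the rendered page, escape it before echoing. A minimal sketch; the `$html` string is a made-up stand-in for what `file_get_contents($url)` would return, so it runs without network access.

```php
<?php
// Stand-in for the result of file_get_contents($url); any HTML string works here
$html = '<a href="http://example.com/">Example</a>';

// echo $html;  // the browser would render this as a clickable link

// Escaping first makes the browser display the markup itself as text;
// <pre> keeps the original line breaks and indentation
$shown = '<pre>' . htmlspecialchars($html) . '</pre>';
echo $shown, "\n";
```

Another option is to call `header('Content-Type: text/plain; charset=utf-8')` before any output, so the browser treats the whole response as plain text and does not parse it.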
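The extraction step is where problems appear.
It works on the content after htmlspecialchars conversion, so it has to search for the entities &gt; and &lt; rather than the raw > and < characters. The extraction is done like this: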
$word = substr($str, strpos($str, '&gt;', 5) + 4, strpos($str, '&lt;', 10) - strpos($str, '&gt;', 5) - 4);
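To make the offsets concrete, here is the same arithmetic applied to one hypothetical link (the sample string is invented for illustration). The 4 is the length of `&gt;`. It also shows why a raw `<` cannot be searched for: after htmlspecialchars() it no longer exists in the string.

```php
<?php
// A hypothetical link, as preg_match_all() would capture it
$link = '<a href="/news">Latest news</a>';
$str  = htmlspecialchars($link);
// $str is now: &lt;a href=&quot;/news&quot;&gt;Latest news&lt;/a&gt;

// The raw '<' no longer exists, so searching for it fails
var_dump(strpos($str, '<')); // bool(false)

$start = strpos($str, '&gt;', 5) + 4; // first char after the opening tag's &gt;
$end   = strpos($str, '&lt;', 10);    // where the closing &lt;/a&gt; begins
$word  = substr($str, $start, $end - $start);
echo $word, "\n"; // Latest news
```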
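function captureKeyArray($url)
{
    // Fetch the raw HTML of the page
    $content = file_get_contents($url);
    // Match every <a href=...>...</a> element
    // (i = case-insensitive, s = dot matches newlines, U = ungreedy)
    $pattern = "/<a\s+href=.*<\/a>/imsU";
    $match = array();
    preg_match_all($pattern, $content, $match);
    $matchFilter = array();
    foreach ($match[0] as $val)
    {
        // Escape the markup so it can be handled (and echoed) as plain text
        $str = htmlspecialchars($val);
        // Skip image links; strpos() can return 0, so compare with !== false
        if (strpos($str, "img") !== false)
        {
            continue;
        }
        // Search for '&gt;' / '&lt;' instead of '>' / '<': htmlspecialchars()
        // has already turned every literal angle bracket into its entity.
        // The link text lies between the first '&gt;' (end of the opening tag)
        // and the next '&lt;' (start of the closing tag); strlen('&gt;') == 4.
        $word = substr($str, strpos($str, '&gt;', 5) + 4,
                       strpos($str, '&lt;', 10) - strpos($str, '&gt;', 5) - 4);
        if ($word != "")
        {
            array_push($matchFilter, $word);
        }
    }
    return $matchFilter;
}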
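The substr approach relies on fixed offsets (5 and 10) and returns only the text before the first nested tag, so a link like `<a href="/c">Tom <b>Jerry</b></a>` yields just "Tom ". As an alternative that the original post does not use, here is a minimal sketch based on PHP's DOM extension (DOMDocument, assumed to be enabled, as it is in most builds). The function name `captureLinkTexts` is hypothetical, and it takes an HTML string rather than a URL so it can be tried offline; pass it `file_get_contents($url)` to use it like `captureKeyArray()`.

```php
<?php
// Hypothetical helper: collect the visible text of every <a href> link
// in an HTML string, using the DOM parser instead of regex + substr.
function captureLinkTexts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings silently
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Keep only links with an href, and skip links that wrap an <img>
        if (!$a->hasAttribute('href') || $a->getElementsByTagName('img')->length > 0) {
            continue;
        }
        // textContent flattens nested tags and decodes entities such as &amp;
        $text = trim($a->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

$sample = '<p><a href="/a">First</a> <a href="/b"><img src="x.png"></a>'
        . ' <a href="/c">Tom &amp; <b>Jerry</b></a></p>';
print_r(captureLinkTexts($sample));
```

For the sample, the image link is skipped and the third link comes back as the plain string "Tom & Jerry", with the nested `<b>` flattened and `&amp;` decoded.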