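I've been planning to build a template crawler, so I've been crawling away.
The code the crawler fetches does not reference images, CSS, and JS by absolute path, so once the page is saved locally the relative references break and the layout falls apart.
To solve this, I asked an expert for advice and optimized the code, which is shown below.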
<?php
// Source HTML: one path without a leading slash, one with.
$rpp = '<sdf src="bbbs/sdd" <link rel="stylesheet" type="text/css" href="/public/ui/v2/static/css/basic.css?1594346753">';
$furl = "http://www.baidu.com"; // placeholder target URL

function relative_to_absolute($content, $feed_url)
{
    // Capture the scheme, e.g. "http://".
    preg_match('/(http|https|ftp):\/\//', $feed_url, $protocol);
    // Strip the scheme, then everything after the host name.
    $server_url = preg_replace("/(http|https|ftp|news):\/\//", "", $feed_url);
    $server_url = preg_replace("/\/.*/", "", $server_url);
    if ($server_url == '') {
        return $content;
    }
    if (isset($protocol[0])) {
        $base = $protocol[0] . $server_url . '/';
        // Prefix href/src values with the site root. The lookahead skips values
        // that already have a scheme, start with "//", or are "#" anchors; the
        // optional "\/?" swallows a leading slash so it is not doubled.
        $new_content = preg_replace(
            '/(href|src)="(?!(?:[a-z][a-z0-9+.-]*:|\/\/|#))\/?/i',
            '$1="' . $base,
            $content
        );
    } else {
        $new_content = $content;
    }
    return $new_content;
}

print_r(relative_to_absolute($rpp, $furl));
?>
The output is:
<sdf src="http://www.baidu.com/bbbs/sdd" <link rel="stylesheet" type="text/css" href="http://www.baidu.com/public/ui/v2/static/css/basic.css?1594346753">
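The function above resolves every path against the site root only. If the crawled page lives in a subdirectory, or its links use "../", a resolver that looks at the page's own path may be needed. Below is a minimal sketch of that idea, assuming PHP 7 or later. The names resolve_url and absolutize_html are hypothetical helpers, not part of the original code, and the path handling is a simplification of RFC 3986 reference resolution (for example, it collapses repeated slashes), not a full implementation.

```php
<?php
// Hypothetical helper: resolve one href/src value against the page URL.
function resolve_url(string $rel, string $base): string
{
    // Empty values, "#" anchors, and values with a scheme (http:, data:, ...) stay as-is.
    if ($rel === '' || $rel[0] === '#' || preg_match('/^[a-z][a-z0-9+.-]*:/i', $rel)) {
        return $rel;
    }
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $rel; // unusable base URL, leave the value untouched
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (strpos($rel, '//') === 0) {   // protocol-relative: //cdn.example.com/a.js
        return $parts['scheme'] . ':' . $rel;
    }
    if ($rel[0] === '/') {            // root-relative: /css/site.css
        return $origin . $rel;
    }

    // Path-relative: split off ?query / #fragment, then join with the base directory.
    $basePath = $parts['path'] ?? '/';
    $cut = strcspn($rel, '?#');
    $path = substr($rel, 0, $cut);
    $suffix = substr($rel, $cut);
    if ($path === '') {               // "?x=1" refers to the page itself
        return $origin . $basePath . $suffix;
    }
    $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
    $joined = $dir . $path;

    // Drop "." segments and let ".." remove the previous segment.
    $segments = [];
    foreach (explode('/', $joined) as $seg) {
        if ($seg === '..') {
            array_pop($segments);
        } elseif ($seg !== '.' && $seg !== '') {
            $segments[] = $seg;
        }
    }
    $resolved = '/' . implode('/', $segments);
    // Keep a trailing slash when the joined path named a directory.
    if ($segments && preg_match('#/\.{0,2}$#', $joined)) {
        $resolved .= '/';
    }
    return $origin . $resolved . $suffix;
}

// Hypothetical helper: rewrite every double-quoted href/src attribute in the HTML.
function absolutize_html(string $html, string $base): string
{
    $out = preg_replace_callback(
        '/\b(href|src)="([^"]*)"/i',
        function (array $m) use ($base): string {
            return $m[1] . '="' . resolve_url($m[2], $base) . '"';
        },
        $html
    );
    return $out ?? $html; // fall back to the input if the regex engine fails
}

$html = '<img src="../img/logo.png"><link href="/css/site.css">';
echo absolutize_html($html, 'http://www.baidu.com/a/b/page.html'), "\n";
```

With the sample page URL in the last line, the "../img/logo.png" reference should come out as http://www.baidu.com/a/img/logo.png and "/css/site.css" as http://www.baidu.com/css/site.css. Like the original function, this only handles double-quoted attributes; for real-world pages a DOM parser would be a sturdier choice than regular expressions.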
Hope this solves your problem.