What is the simplest and fastest way to check whether a string is a single URL or TEXT (which may itself contain URLs)?
Possible cases:
// successful scenario
$example[] = 'http://sub-domain.my-domain.com/folder/file.php?some=param';
// successful scenario
$example[] = '/assets/scripts/jquery.min.js?v=1.4';
// successful scenario
$example[] = 'jquery.min.js';
// this scenario should fail validation
$example[] = "http://www.domain.com welcome text\n and some other http://www.domain.com";
// this scenario should fail validation
$example[] = "scriptVar=50;";
I tried native PHP functions such as parse_url and filter_var, but neither of them works as I expected.
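To make the mismatch concrete, here is a minimal sketch that runs both functions over some of the inputs above. The `$samples` array is only an illustration built from the question's examples. As far as I can tell, filter_var with FILTER_VALIDATE_URL only accepts absolute URLs, so the relative paths are rejected, while parse_url does not validate at all and reports plain script content as a `path`.

```php
<?php
// Illustrative inputs copied from the question's examples.
$samples = array(
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    '/assets/scripts/jquery.min.js?v=1.4',
    'jquery.min.js',
    'scriptVar=50;',
);

foreach ($samples as $sample) {
    // filter_var() returns the filtered value on success and false on failure.
    $isAbsoluteUrl = filter_var($sample, FILTER_VALIDATE_URL) !== false;

    // parse_url() splits the string into components; it does not validate it.
    $parts = parse_url($sample);

    echo $sample, "\n";
    echo '  filter_var: ', $isAbsoluteUrl ? 'valid' : 'invalid', "\n";
    echo '  parse_url keys: ',
        is_array($parts) ? implode(', ', array_keys($parts)) : 'false', "\n";
}
```

So neither function on its own separates "relative URI" from "script content", which is the distinction needed here.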
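Update 1
To make this clearer: I am trying to separate possible URIs from script content that will be inserted as DOM elements. Every URI goes into the SRC attribute, and everything else becomes the element's content, for example: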
<script type="text/javascript" src="{$string}"></script>
<script type="text/javascript">{$string}</script>
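Update 2
After analysing the possible content, I concluded that a string containing a whitespace character or a semicolon cannot be a URI. I think this pattern solves my problem: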
preg_match('/[\s]|[;]/', $string);
Will it cover all possible JavaScript / CSS code?
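For reference, the heuristic from Update 2 could be wrapped up roughly like this. It is only a sketch: `looksLikeUri` and `renderScriptTag` are hypothetical helper names, and rejecting the empty string is an extra assumption that the question does not state.

```php
<?php
// Hypothetical helper: a string counts as a URI when it contains neither
// whitespace nor a semicolon (the heuristic from Update 2).
// Assumption: an empty string is not treated as a URI.
function looksLikeUri($string)
{
    return $string !== '' && preg_match('/[\s;]/', $string) === 0;
}

// Hypothetical helper: emits the <script> variant that fits the string.
function renderScriptTag($string)
{
    if (looksLikeUri($string)) {
        // Escape the URI because it ends up inside an HTML attribute.
        return '<script type="text/javascript" src="'
            . htmlspecialchars($string, ENT_QUOTES) . '"></script>';
    }
    return '<script type="text/javascript">' . $string . '</script>';
}

// The five scenarios from the question.
$example = array(
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    '/assets/scripts/jquery.min.js?v=1.4',
    'jquery.min.js',
    "http://www.domain.com welcome text\n and some other http://www.domain.com",
    "scriptVar=50;",
);

foreach ($example as $string) {
    echo renderScriptTag($string), "\n";
}
```

The first three strings end up in `src` and the last two become inline content. Note that a short snippet such as `alert(1)` contains neither whitespace nor a semicolon, so it would be classified as a URI; the pattern does not cover every possible script.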
Solution:
$exampleData = Array(
'http://sub-domain.my-domain.com/folder/file.php?some=param',
'/assets/scripts/jquery.min.js?v=1.4',
'<a href="/assets/scripts/jquery.min.js?v=1.4">',
'<a href="assets/scripts/jquery.min.js?v=1.4">',
'http://www.domain.com welcome text\n and some other http://www.domain.com',
);
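foreach ($exampleData as $example) {
    echo "Trying \"" . $example . "\" -> ";
    // Matches an absolute URL (http://, https:// or www.) anywhere in the string,
    // or an <a> tag whose href does not start with "#".
    // Group 4 captures the opening quote and \4 requires the same closing quote.
    // A backreference has no effect inside a character class, so it is not used there.
    echo preg_match('%((http(s)?://|www\.)[^ \r\n]+|<a.+?href=(\'|")(http(s)?://|www\.|[^#])[^\r\n]*?\4.*?>)%i', $example)
        ? "Match" : "No match";
    echo "\r\n";
}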
This produces:
Trying "http://sub-domain.my-domain.com/folder/file.php?some=param" -> Match
Trying "/assets/scripts/jquery.min.js?v=1.4" -> No match
Trying "<a href="/assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "<a href="assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "http://www.domain.com welcome text\n and some other http://www.domain.com" -> Match
Update:
I have read your latest update. If you want to parse HTML, use a DOM parser such as:
http://simplehtmldom.sourceforge.net/
Example:
include_once('simple_html_dom.php');
$dom = file_get_html('http://www.*.com/');
foreach($dom->find('script') as $scriptElement)
{
if(strlen(trim($scriptElement->src)) > 0)
{
// Script with URI set
echo "<strong>Found script with URI</strong>";
echo "<p>" . $scriptElement->src . "</p>";
}
else
{
// Script with content
echo "<strong>Found script with content</strong>";
echo("<p>" . nl2br(htmlspecialchars($scriptElement->innertext)) . "</p>");
}
}
This outputs something like the following (HTML stripped):
Found script with URI
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
Found script with URI
http://sstatic.net/js/master.min.js?v=afc76d4deac3
Found script with content
var imagePath='http://sstatic.net/*/img/';
var inboxUnviewedCount = -1;
...etc
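If installing an extra library is not an option, roughly the same split can be sketched with PHP's bundled DOMDocument class instead of simple_html_dom. This swaps in a different parser than the one used above, and the `$html` string is made-up sample markup so the snippet runs without network access.

```php
<?php
// Made-up sample markup for illustration (no network access needed).
$html = '<html><head>'
    . '<script type="text/javascript" src="/assets/scripts/jquery.min.js?v=1.4"></script>'
    . '<script type="text/javascript">scriptVar=50;</script>'
    . '</head><body></body></html>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$found = array();
foreach ($doc->getElementsByTagName('script') as $script) {
    // getAttribute() returns an empty string when the attribute is missing.
    $src = trim($script->getAttribute('src'));
    if ($src !== '') {
        // Script with URI set
        $found[] = array('type' => 'uri', 'value' => $src);
    } else {
        // Script with content
        $found[] = array('type' => 'content', 'value' => $script->textContent);
    }
}

foreach ($found as $item) {
    echo 'Found script with ', $item['type'], ': ', $item['value'], "\n";
}
```

As with the simple_html_dom version, the decision is based on whether the `src` attribute is set, not on guessing from the string itself.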