Regular Expressions

1. At least one digit: ([0-9]+)

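2. At least one character other than "/": ([^/]+). This accepts Chinese characters, but it also accepts anything else except a slash. To match Chinese characters only, a range such as ([\u4e00-\u9fa5]+) is the usual choice.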
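The first two patterns can be tried out with a small helper class. This is only a sketch: the class name `PatternDemo` and its method names are made up for illustration, and the `[\u4e00-\u9fa5]` range is the common approximation for Chinese characters, not a complete definition.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PatternDemo
{
    // First run of one or more ASCII digits, or "" when there is none.
    public static string FirstDigits(string input) =>
        Regex.Match(input, @"([0-9]+)").Groups[1].Value;

    // First run of characters that are not '/', i.e. one path segment.
    public static string FirstSegment(string input) =>
        Regex.Match(input, @"([^/]+)").Groups[1].Value;

    // First run of characters in the range U+4E00..U+9FA5.
    public static string FirstHanzi(string input) =>
        Regex.Match(input, @"([\u4e00-\u9fa5]+)").Groups[1].Value;
}
```

For example, `PatternDemo.FirstSegment("/用户/42")` returns `用户`, while `PatternDemo.FirstHanzi("abc中文123")` returns `中文`.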

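3. Extract strings (the links in all a tags in the content). Because the pattern contains capture groups, Regex.Split also places the captured values (including href and text) in the returned array, between the pieces of surrounding text:
string[] link = Regex.Split(sArray[i], @"<a[^>]*href=(""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>]*))[^>]*>(?<text>.*?)</a>", RegexOptions.IgnoreCase | RegexOptions.Singleline);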
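When only the links themselves are needed, reading the named groups from `Regex.Matches` may be easier than picking them out of the `Regex.Split` result. The sketch below reuses the same pattern; the class name `LinkExtractor` and the method `Extract` are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LinkExtractor
{
    // Same pattern as above: href may be double-quoted, single-quoted, or unquoted.
    private static readonly Regex Anchor = new Regex(
        @"<a[^>]*href=(""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>]*))[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (href, text) pair per a tag found in the input.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in Anchor.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["href"].Value, m.Groups["text"].Value));
        }
        return links;
    }
}
```

Each pair holds the link target in `Key` and the link text in `Value`. A regex of this kind is fine for quick scraping of known markup; for arbitrary HTML a real parser is usually the safer choice.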

4. // Remove redundant whitespace
htmlBody = Regex.Replace(htmlBody, @"\s+", " ", RegexOptions.Multiline | RegexOptions.IgnoreCase);
htmlBody = Regex.Replace(htmlBody, @"\s*=\s*", "=", RegexOptions.Multiline | RegexOptions.IgnoreCase);
htmlBody = Regex.Replace(htmlBody, @"\s*<\s*", "<", RegexOptions.Multiline | RegexOptions.IgnoreCase);
htmlBody = Regex.Replace(htmlBody, @"\s*>\s*", ">", RegexOptions.Multiline | RegexOptions.IgnoreCase);
htmlBody = Regex.Replace(htmlBody, @"\s*/\s*", "/", RegexOptions.Multiline | RegexOptions.IgnoreCase);
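The five replacements can be wrapped in one method to see their combined effect. This is a sketch with a hypothetical class name (`WhitespaceCleaner`); it leaves out the `RegexOptions` flags because these patterns contain neither `^`/`$` anchors nor letters, so `Multiline` and `IgnoreCase` do not change the result.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WhitespaceCleaner
{
    public static string Normalize(string html)
    {
        html = Regex.Replace(html, @"\s+", " ");      // collapse runs of whitespace
        html = Regex.Replace(html, @"\s*=\s*", "=");  // no spaces around '='
        html = Regex.Replace(html, @"\s*<\s*", "<");  // no spaces around '<'
        html = Regex.Replace(html, @"\s*>\s*", ">");  // no spaces around '>'
        html = Regex.Replace(html, @"\s*/\s*", "/");  // no spaces around '/'
        return html;
    }
}
```

Calling `WhitespaceCleaner.Normalize("<div  class = \"a\" >\n  text  </ div >")` returns `<div class="a">text</div>`. Note that the replacements are not limited to markup: visible text such as `a / b = c` becomes `a/b=c` as well.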

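5. // Remove background and color declarations from style attributes
// "background: ..." written with a colon
htmlBody = Regex.Replace(htmlBody, @"(style\s*?=[\s\S]*?)(background\s*[:]\s*[\s\S]*?)([;""]+?)", "$1", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// any remaining declaration starting with "background"
htmlBody = Regex.Replace(htmlBody, @"(style\s*?=[\s\S]*?)(background[\s\S]*?)([;""]+?)", "$1", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// "background-color ..."
htmlBody = Regex.Replace(htmlBody, "(style\\s*?=[\\s\\S]*?)(background-color[\\s\\S]*?)([;\"]+?)", "$1", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// "color ..."; the terminator (group 3) is dropped too, so a declaration that ends at the closing quote takes that quote with it
htmlBody = Regex.Replace(htmlBody, "(style\\s*?=[\\s\\S]*?)(color[\\s\\S]*?)([;\"]+?)", "$1", RegexOptions.Multiline | RegexOptions.IgnoreCase);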
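As an alternative to the patterns above (not the same technique), the work can be split into two steps: first isolate the value of each style attribute, then delete individual declarations inside it. The sketch below assumes double-quoted style attributes only, and the names `StyleCleaner`, `StyleAttr`, and `ColorDecl` are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class StyleCleaner
{
    // Assumption: the style attribute value is wrapped in double quotes.
    private static readonly Regex StyleAttr = new Regex(
        @"style\s*=\s*""(?<css>[^""]*)""", RegexOptions.IgnoreCase);

    // One declaration whose property is background, background-*, or color.
    // The lookbehind keeps properties such as border-color untouched.
    private static readonly Regex ColorDecl = new Regex(
        @"(?<![\w-])(background(-[a-z]+)*|color)\s*:[^;]*;?\s*",
        RegexOptions.IgnoreCase);

    public static string Clean(string html) =>
        StyleAttr.Replace(html, m =>
            "style=\"" + ColorDecl.Replace(m.Groups["css"].Value, "").Trim() + "\"");
}
```

With this version the closing quote of the attribute is always preserved, because the declaration pattern never sees it. Whether properties like `border-color` should also be removed is a design choice; here they are kept.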

6. Extract strings (the links in all img tags in the content). The URL of each image is captured in the imgUrl group:
MatchCollection mc2 = Regex.Matches(htmlBody, @"<img\b[^<>]*?\bsrc[\s\t\r\n]*=[\s\t\r\n]*[""']?[\s\t\r\n]*(?<imgUrl>[^\s\t\r\n""'<>]*)[^<>]*?/?[\s\t\r\n]*>", RegexOptions.Multiline | RegexOptions.IgnoreCase);
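Reading the URLs back out of the match collection could look like the following sketch. The pattern is the one from item 6; the class name `ImageExtractor` and the method `ExtractSources` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ImageExtractor
{
    private static readonly Regex Img = new Regex(
        @"<img\b[^<>]*?\bsrc[\s\t\r\n]*=[\s\t\r\n]*[""']?[\s\t\r\n]*(?<imgUrl>[^\s\t\r\n""'<>]*)[^<>]*?/?[\s\t\r\n]*>",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Returns the src value of every img tag found in the input.
    public static List<string> ExtractSources(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Img.Matches(html))
        {
            urls.Add(m.Groups["imgUrl"].Value);
        }
        return urls;
    }
}
```

One caveat: `\bsrc` also matches the tail of an attribute such as `data-src`, because `-` is not a word character. If a tag carries `data-src` before `src`, the captured value is the one from `data-src`.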
