Truncating a mixed Chinese/English string to a specified length in C#

2022-09-23 19:38:47

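A long time ago I wrote an article (Truncating a mixed Chinese/English string to a specified length in C#), but I never tested its performance. Someone said the method I wrote had performance problems. Thinking it over, there really could be some perverse requirement to pass in a huge string (tens of thousands of K, or even several MB), and that would slow down the regex Match; an article system is quite likely to run into this. I had some time today, so I improved it. The code is as follows: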

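// Requires: using System.Text.RegularExpressions;
public static string getStr(string s, int l, string endStr)
{
    // Work only on the first l characters so a huge input never reaches the regex.
    string temp = s.Substring(0, (s.Length < l) ? s.Length : l);
    // Each Chinese character is replaced by "zz", so it counts as length 2.
    if (Regex.Replace(temp, "[\u4e00-\u9fa5]", "zz", RegexOptions.IgnoreCase).Length <= l)
    {
        return temp;
    }
    // Drop characters from the end until the text plus endStr fits within l.
    for (int i = temp.Length; i >= 0; i--)
    {
        temp = temp.Substring(0, i);
        if (Regex.Replace(temp, "[\u4e00-\u9fa5]", "zz", RegexOptions.IgnoreCase).Length <= l - endStr.Length)
        {
            return temp + endStr;
        }
    }
    return endStr;
}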

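This revised method takes an extra parameter, "string endStr", which controls what is appended when "string s" exceeds the specified length "int l", for example an ellipsis "..." or some other text. Length here is measured with each Chinese character (U+4E00 to U+9FA5) counted as 2 and every other character as 1.
Also, once the ellipsis is appended, its length counts toward the length of the result.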
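
The length rule can be sketched without a regex. This is only an illustration of the counting; the class WidthDemo and the method DisplayWidth are hypothetical names, not part of the original code:

```csharp
public static class WidthDemo
{
    // Hypothetical helper (not from the original article):
    // a character in U+4E00..U+9FA5 counts as 2, anything else as 1,
    // which mirrors replacing each Chinese character with "zz".
    public static int DisplayWidth(string s)
    {
        int width = 0;
        foreach (char c in s)
        {
            width += (c >= '\u4e00' && c <= '\u9fa5') ? 2 : 1;
        }
        return width;
    }
}
```

For "中国1中国中国中1111中国" this gives 9 * 2 + 5 = 23. When truncation is needed with endStr "...", the kept text has to fit within 23 - 3 = 20.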

Usage:

getStr("中国1中国中国中1111中国", 23,"")
//output:中国1中国中国中1111中国

getStr("中国1中国中国中1111中国", 23,"...")
//output:中国1中国中国中1111中国

getStr("中国1中国中国中1111中国中国", 23,"")
//output:中国1中国中国中1111中国

getStr("中国1中国中国中1111中国中国", 23,"...")
//output:中国1中国中国中1111...

----------------------------------------------------------------------

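Addendum: "kpz" replied that the method above can truncate incorrectly, and since I cannot test exhaustively, I switched to a different approach. For the sake of performance the logic ended up a little "dizzying". I tested it repeatedly. The code is as follows: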

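public static string getStr2(string s, int l, string endStr)
{
    // Take one character more than l so an overflow can be detected.
    string temp = s.Substring(0, (s.Length < l + 1) ? s.Length : l + 1);
    // ASCII encoding turns every non-ASCII character into '?' (byte 63).
    // Note that a literal '?' in the input is therefore counted as wide too.
    byte[] encodedBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(temp);
    string outputStr = "";
    int count = 0;
    for (int i = 0; i < temp.Length; i++)
    {
        if ((int)encodedBytes[i] == 63)
            count += 2;
        else
            count += 1;
        // Keep characters only while there is still room left for endStr.
        if (count <= l - endStr.Length)
            outputStr += temp.Substring(i, 1);
        else if (count > l)
            break;
    }
    // The whole string fits: return it unchanged, without endStr.
    if (count <= l)
    {
        outputStr = temp;
        endStr = "";
    }
    outputStr += endStr;
    return outputStr;
}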

Usage and parameter meanings are the same as before. Note that the ellipsis also takes up space and counts toward the length.
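
The same idea can also be written as a single pass that records how many leading characters fit, with no byte array and no string concatenation inside the loop. This is a rough sketch rather than a drop-in replacement: TruncateSketch, TruncateByWidth, and CharWidth are hypothetical names, and it assumes the U+4E00..U+9FA5 range used by the first method. It therefore counts '?' and other non-ASCII characters outside that range as 1, unlike getStr2.

```csharp
public static class TruncateSketch
{
    // Assumed width rule: U+4E00..U+9FA5 counts as 2, anything else as 1.
    private static int CharWidth(char c)
    {
        return (c >= '\u4e00' && c <= '\u9fa5') ? 2 : 1;
    }

    // Hypothetical single-pass variant of getStr2.
    public static string TruncateByWidth(string s, int limit, string endStr)
    {
        int budget = limit - endStr.Length; // room left once endStr is appended
        int total = 0;                      // width of everything scanned so far
        int keep = 0;                       // leading characters that fit in budget
        for (int i = 0; i < s.Length; i++)
        {
            total += CharWidth(s[i]);
            if (total <= budget)
                keep = i + 1;
            if (total > limit)
                // Overflow: stop scanning and return the part that fits plus endStr.
                return s.Substring(0, keep) + endStr;
        }
        return s; // the whole string fits, so endStr is not appended
    }
}
```

The loop stops as soon as the limit is exceeded, so even a very long input is only scanned for roughly the first limit characters.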
