WebRequest returns garbled page source

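Problem: when fetching a page's source with WebRequest, the returned text is garbled (mojibake).

Cause 1: the encoding is wrong

Solution: decode the response with the page's actual encoding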

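WebRequest request = WebRequest.Create(Url);
using (WebResponse response = await request.GetResponseAsync())
using (Stream stream = response.GetResponseStream())
// coding is the page's encoding name, e.g. "gb2312" or "utf-8"; in IE you can check it by right-clicking the page and choosing Encoding.
using (StreamReader reader = new StreamReader(stream, Encoding.GetEncoding(coding)))
{
    Result = reader.ReadToEnd();
} // leaving the using blocks disposes the reader, the stream and the response exactly once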
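Instead of hard-coding coding, the charset can often be read from the Content-Type response header (for example `((HttpWebResponse)response).ContentType`). The sketch below assumes the server sends a header such as `text/html; charset=gb2312`; `CharsetHelper`, `GetCharset` and `ResolveEncoding` are hypothetical names, and pages that declare their charset only in an HTML `<meta>` tag are not covered.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: extracts the charset parameter from a Content-Type value,
    // e.g. "text/html; charset=gb2312" -> "gb2312". Returns null if there is none.
    public static string GetCharset(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return null;
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip optional quotes: charset="UTF-8" -> UTF-8
                return p.Substring("charset=".Length).Trim().Trim('"');
            }
        }
        return null;
    }

    // Picks the Encoding to pass to StreamReader: the header's charset if it is known,
    // otherwise the supplied fallback (e.g. the encoding you checked in the browser).
    public static Encoding ResolveEncoding(string contentType, Encoding fallback)
    {
        string charset = GetCharset(contentType);
        if (charset == null) return fallback;
        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported name. On .NET Core / .NET 5+, legacy code pages such as
            // gb2312 also require Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            return fallback;
        }
    }
}
```

With this, the reader above could be built as `new StreamReader(stream, CharsetHelper.ResolveEncoding(response.ContentType, Encoding.UTF8))`, keeping a manual fallback for servers that omit the charset.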

Cause 2: the page is compressed

Check the HTTP response headers: if Content-Encoding is gzip, the body must be decompressed before it is decoded. (The code below targets WinRT.)

WebRequest request = WebRequest.Create(Url);
using (WebResponse response = await request.GetResponseAsync())
{
    Debug.WriteLine(((HttpWebResponse)response).StatusDescription);
    Stream body = response.GetResponseStream();
    // If the server used GZip, wrap the stream in a GZipStream so it is decompressed first
    string contentEncoding = response.Headers["Content-Encoding"];
    if (contentEncoding != null && contentEncoding.Equals("gzip", StringComparison.OrdinalIgnoreCase))
    {
        body = new System.IO.Compression.GZipStream(body, System.IO.Compression.CompressionMode.Decompress);
    }
    // Uncompressed responses are read directly; disposing the reader also disposes the stream
    using (StreamReader sr = new StreamReader(body, Encoding.GetEncoding(coding)))
    {
        Result = sr.ReadToEnd();
    }
}
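To see why a compressed body looks like garbage and why the order "decompress, then decode" fixes it, here is a small offline sketch that needs no network: it gzips a string the way a server sending `Content-Encoding: gzip` would, then reads it back through GZipStream and StreamReader. `GzipDemo`, `Compress` and `Decompress` are hypothetical names, and UTF-8 is assumed for the sample text.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Compresses text the way a server sending "Content-Encoding: gzip" would.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps output usable after the GZipStream is disposed
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                byte[] raw = encoding.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the gzip trailer into output
            return output.ToArray();
        }
    }

    // Mirrors the fix above: decompress first, then decode with the page's encoding.
    public static string Decompress(byte[] body, Encoding encoding)
    {
        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Decoding the compressed bytes directly, e.g. with `Encoding.UTF8.GetString(body)`, produces the same kind of garbled text described above; only `Decompress` recovers the original string.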
