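The problem is actually simple: my product page lives at http://www.gdtsearch.com/products.spiderstudio.docapi.htm and is a static page, while all of my technical blog posts are on cnblogs (博客园). To let visitors see my latest posts directly on the product page, I had embedded the blog page into the product page with an IFrame:
The result looked, frankly, shabby. Who could put up with that? So a rebuild began, and the basic idea was simple and clear: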
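1. Write a scraping script in SS (Spider Studio) and compile it into a DLL
2. Write a WebApi in NodeJS that calls that DLL to fetch the data and returns the result set as JSONP
3. Load the data asynchronously on the product page with jquery.Ajax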
Let's get started:
First, open SS and write the scraping script: (http://www.gdtsearch.com/products.spiderstudio.docapi.htm)
public void Run()
{
    Logger.ClearAll();
    var docs = GetAllDoc();
    foreach (var d in docs)
    {
        Logger.Log(d.Title);
        Logger.Log(d.Url);
        Logger.Log(d.PubDate);
        Logger.Log(d.ReadCount.ToString());
        Logger.Log(d.Summary);
    }
}

public List<Doc> GetAllDoc()
{
    Default.Navigate("about:blank");
    Default.Ready();
    Default.Navigate("http://www.cnblogs.com/iamzyf/category/498344.html");
    Default.Ready();
    Logger.Log(string.Format("Processing {0} ...", Default.Url.ToString()));

    var rows = Default.SelectNodes("div.entrylistItem");
    List<Doc> result = new List<Doc>();
    foreach (var r in rows)
    {
        var doc = new Doc();
        doc.Title = r.SelectSingleNode("div.entrylistPosttitle>a").Text();
        doc.Url = r.SelectSingleNode("div.entrylistPosttitle>a").Attr("href");
        // "阅读全文" and "摘要: " are literal strings on the scraped page, so they stay in Chinese.
        doc.Summary = r.SelectSingleNode("div.entrylistPostSummary").Text().Replace("阅读全文", "").Replace("摘要: ", "").Trim();
        doc.PubDate = r.SelectSingleNode("div.entrylistItemPostDesc>a:eq(0)").Text();
        // The read count appears on the page as "阅读(N)".
        doc.ReadCount = Regex.Match(r.SelectSingleNode("div.entrylistItemPostDesc").Text(), @"阅读\((?<count>\d+)\)").Groups["count"].Value;
        result.Add(doc);
    }

    // Log the count after the loop; before the loop the list is still empty.
    Logger.Log(string.Format("Loaded {0} articles.", result.Count));
    Logger.Log("Done.");
    return result;
}

public class Doc
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string Summary { get; set; }
    public string PubDate { get; set; }
    public string ReadCount { get; set; }
}
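The read count in the script above is pulled out of the post description with a regular expression. As a rough illustration of what that pattern matches, here is the same idea as a small JavaScript sketch. The sample description text is made up for the example and the helper name `extractReadCount` is hypothetical; the real wording on the blog page may differ.

```javascript
// Hypothetical sample of the text inside div.entrylistItemPostDesc.
var sampleDesc = 'posted @ 2014-01-01 12:00 iamzyf 阅读(123) 评论(4) 编辑';

// Return the digits inside "阅读(...)" as a string, or '' when the
// pattern is absent (the C# version also yields an empty string then).
function extractReadCount(descText) {
  var m = /阅读\((\d+)\)/.exec(descText);
  return m ? m[1] : '';
}

console.log(extractReadCount(sampleDesc));
```

Note that the result is a string, just like the ReadCount property of Doc, so it would need converting before any numeric sorting.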
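Then build the DLL:
Next, write the NodeJS script that provides the WebApi:
Calling a .NET DLL from NodeJS requires Edge.js (http://tjanczuk.github.io/edge/#/)
I also used Express.js to keep the script short (http://expressjs.com/)
First, write a proxy script that wraps the DLL's functionality, proxy.js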
var edge = require('edge');

exports.GetAllDoc = edge.func({
    source: function() {/*
        using System.Threading;
        using System.Threading.Tasks;
        using iamzyf.cnblogs.com;

        public class Startup
        {
            public async Task<object> Invoke(object input)
            {
                object result = null;
                // Run the scraper on its own STA background thread.
                Thread t = new Thread(new ParameterizedThreadStart((p) =>
                {
                    using (var c = new SpiderStudioAPI())
                    {
                        result = c.GetAllDoc();
                    }
                }));
                t.SetApartmentState(ApartmentState.STA);
                t.IsBackground = true;
                t.Start(input);
                // Wait for the thread to finish. Unlike polling until result is
                // non-null, this cannot loop forever if GetAllDoc returns null.
                t.Join();
                return result;
            }
        }
    */},
    references: [ 'iamzyf.cnblogs.com.SpiderStudioAPI.dll' ]
});
Then write the WebApi:
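var express = require('express');
var app = express();
var port = process.env.port || 1337;
var proxy = require('./proxy.js');

app.get('/', function(req, res) {
    proxy.GetAllDoc(null, function(error, result) {
        if (error) {
            // Answer with 500 instead of throwing, which would take the whole server down.
            res.statusCode = 500;
            res.end();
            return;
        }
        // A JSONP response is a script, so serve it as JavaScript rather than JSON.
        res.setHeader("Content-Type", "application/javascript");
        var str = req.query.callback + '(' + JSON.stringify(result) + ')';
        res.end(str);
    });
});

app.listen(port);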
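As written, every request to the WebApi calls proxy.GetAllDoc and therefore re-scrapes the blog page. One possible refinement, not part of the original setup, is to keep the last result for a while. The sketch below assumes a hypothetical helper `createCachedFetcher` and uses a stand-in function in place of proxy.GetAllDoc so that it runs without the DLL; the 60 second lifetime is an arbitrary choice.

```javascript
// Wrap a Node-style function fetchFn(input, callback) so that a successful
// result is reused for ttlMs milliseconds. `now` is injectable so the
// behaviour can be checked without actually waiting.
function createCachedFetcher(fetchFn, ttlMs, now) {
  now = now || Date.now;
  var cached = null;     // last successful result
  var fetchedAt = 0;     // time at which it was fetched
  var hasValue = false;

  return function (input, callback) {
    if (hasValue && now() - fetchedAt < ttlMs) {
      return callback(null, cached);
    }
    fetchFn(input, function (error, result) {
      if (error) return callback(error);   // errors are never cached
      cached = result;
      fetchedAt = now();
      hasValue = true;
      callback(null, result);
    });
  };
}

// Stand-in for proxy.GetAllDoc that counts how often it is invoked.
var calls = 0;
function fakeGetAllDoc(input, callback) {
  calls += 1;
  callback(null, [{ Title: 'Doc ' + calls }]);
}

var clock = 0;
var getDocs = createCachedFetcher(fakeGetAllDoc, 60000, function () { return clock; });
getDocs(null, function () {});   // first call reaches fakeGetAllDoc
getDocs(null, function () {});   // served from the cache
clock = 60001;
getDocs(null, function () {});   // cache expired, fetches again
console.log(calls);              // prints 2
```

In the WebApi, the wrapped function would simply take the place of proxy.GetAllDoc inside the route handler. The trade-off is that a new blog post may take up to the cache lifetime to show up.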
A quick test, and everything works!
Deploy the Node script to the server and start it: http://wsoa-mini.cloudapp.net:31337/?callback=test
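The URL above passes callback=test, so the response body is the JSON result wrapped in a call to test(...). Because the WebApi concatenates the callback parameter into the response as-is, it may be worth checking the name first. The following is a minimal sketch of that idea, assuming a hypothetical helper `toJsonp`; the pattern used here is one reasonable choice, not a standard.

```javascript
// Accept only identifier-like callback names (dots allowed for namespaces).
var CALLBACK_PATTERN = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

// Build the JSONP body, or return null when the callback name looks unsafe
// so the caller can answer with an HTTP 400 instead.
function toJsonp(callbackName, data) {
  if (typeof callbackName !== 'string' || !CALLBACK_PATTERN.test(callbackName)) {
    return null;
  }
  return callbackName + '(' + JSON.stringify(data) + ');';
}

console.log(toJsonp('test', [{ Title: 'Hello' }]));
// prints: test([{"Title":"Hello"}]);
console.log(toJsonp('alert(1);//', []));
// prints: null
```

Express also offers res.jsonp(), which does the wrapping for you and could replace the manual string building altogether.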
Finally, fetch the data on the front end with JQuery.Ajax and render it:
HTML
<h2>Article list</h2>
<select id="selDocListSortType">
    <option value="Title:asc" selected>Sort by title</option>
    <option value="PubDate:asc">Oldest first</option>
    <option value="PubDate:desc">Newest first</option>
</select>
Filter by title: <input id="txtFilterKeyword" type="text">
<div id="lstDoc" class="doclist">Loading...</div>
CSS
div.doclist {
    width: 100%;
    margin: 10px
}
div.doclist a {
    font-size: 15px;
    padding-left: 60px;
}
div.doclist ol {
    color: #ccc;
    list-style-type: none;
}
div.doclist ol li {
    position: relative;
    font: bold italic 45px/1.5 Helvetica, Verdana, sans-serif;
    margin-bottom: 20px;
}
div.doclist li p {
    font: 15px/1.5 Helvetica, sans-serif;
    color: #666;
    padding-left: 60px;
}
div.doclist span {
    position: absolute;
}
JavaScript. Besides rendering, it also provides sorting and searching :)
<script>
var doclist = null;

$(document).ready(function() {
    // Re-render the list with the current sort and filter settings.
    function refresh() {
        var skey = $("#selDocListSortType").val();
        var fkey = $("#txtFilterKeyword").val();
        bindDocList(skey, fkey);
    }

    $("#selDocListSortType").change(refresh);
    // keyup rather than keypress: keypress fires before the input value is updated.
    $("#txtFilterKeyword").keyup(refresh);

    $.ajax({
        type: "get",
        dataType: "jsonp",
        url: "http://wsoa-mini.cloudapp.net:31337",
        success: function(data) {
            doclist = data;
            bindDocList('Title:asc');
        },
        error: function(a, b, c) {
            //alert("error: " + a + " b:" + b + " c:" + c);
        }
    });
});

function bindDocList(sortKey, filterKey) {
    if (!doclist) return; // data has not arrived yet
    var data = doclist;
    if (sortKey) {
        data = sortByKey(data, sortKey);
    }
    if (filterKey) {
        data = data.filter(function(item) {
            return item.Title.toUpperCase().indexOf(filterKey.toUpperCase()) != -1;
        });
    }
    $("#lnkDocApi").text("Document library (" + data.length + " articles)");
    var html = "<ol>";
    // Plain indexed loop: no implicit global, and numbering starts at 1.
    for (var i = 0; i < data.length; i++) {
        html += "<li><span>" + (i + 1) + ".</span><a href='" + data[i].Url + "' target=_blank>" + data[i].Title + "</a><p>" + data[i].Summary + "</p></li>";
    }
    html += "</ol>";
    $("#lstDoc").html(html);
}

// sortKey has the form "Field:asc" or "Field:desc".
function sortByKey(array, key) {
    var temp = key.split(":");
    var field = temp[0];
    var mode = temp[1];
    return array.sort(function(a, b) {
        var x = a[field];
        var y = b[field];
        if (mode == 'asc') return ((x < y) ? -1 : ((x > y) ? 1 : 0));
        else return ((x > y) ? -1 : ((x < y) ? 1 : 0));
    });
}
</script>
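The sort and filter logic in the script is independent of jQuery, so it can be tried out on its own. The sketch below is a possible variant, not the code used on the page: a hypothetical function `selectDocs` that filters first and then sorts the filtered copy, which leaves the original array untouched (Array.prototype.sort otherwise reorders doclist in place). The sample data is invented for the example.

```javascript
// Return the docs whose Title contains filterKey (case-insensitive),
// ordered by sortKey, which has the form "Field:asc" or "Field:desc".
function selectDocs(docs, sortKey, filterKey) {
  var parts = (sortKey || 'Title:asc').split(':');
  var field = parts[0];
  var direction = parts[1] === 'desc' ? -1 : 1;
  var needle = (filterKey || '').toUpperCase();

  return docs
    .filter(function (doc) {
      return doc.Title.toUpperCase().indexOf(needle) !== -1;
    })  // filter returns a new array, so the sort below does not touch docs
    .sort(function (a, b) {
      if (a[field] < b[field]) return -1 * direction;
      if (a[field] > b[field]) return 1 * direction;
      return 0;
    });
}

// Invented sample data with the same property names as the Doc class.
var sample = [
  { Title: 'B post', PubDate: '2014-01-02' },
  { Title: 'A post', PubDate: '2014-01-03' },
  { Title: 'C note', PubDate: '2014-01-01' }
];

var titles = selectDocs(sample, 'PubDate:desc', 'post').map(function (d) {
  return d.Title;
});
console.log(titles);   // prints [ 'A post', 'B post' ]
```

Comparing PubDate as a string only gives chronological order when the dates are written year first with zero padding, as in the sample; whether the scraped dates have that shape is an assumption here.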
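Okay, all done!
Open the page again: http://www.gdtsearch.com/products.spiderstudio.docapi.htm
A bit of a wow moment, isn't it? Heh, at any rate I think it finally looks somewhat classy :)
With that, my goal is achieved: from now on, whenever I publish a post on cnblogs, it is automatically synced to the product page. Sorted!
Tools used in this example: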
SS - http://www.gdtsearch.com/products.spiderstudio.docapi.htm
NodeJS - http://nodejs.org/
Edge.js - http://tjanczuk.github.io/edge/#/
Express.js - http://expressjs.com/
You also need a server to host the WebApi.
Not bad, right? Give it a try yourself!