Yuanrenxue: JS Obfuscation and Garbled Source

Problem

https://match.yuanrenxue.com/match/1

Scrape the ticket prices from all 5 pages, compute the average of all prices, and submit it as the answer.

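Capturing traffic

Opening the DevTools console to capture traffic triggers an infinite debugger loop. It can be bypassed either with Never pause here or with Fiddler.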

Option 1: right-click line 2 in the Sources panel and choose Never pause here.

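Option 2: the script that triggers the debugger is named uzt.js. Create a local uzt.js and edit the JS in it, then in Fiddler's AutoResponder tab check Enable automatic responses and Unmatched requests passthrough so the local copy is served instead.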

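Paging through the results issues requests to https://match.yuanrenxue.com/api/match/1 with the parameters below. The number after 丨 in m (1607657864 in this capture) looks like a Unix timestamp in seconds, comparable to int(time.time()).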

page: 2
m: 3ddf4f4e72bd84562a0e0104d425a791丨1607657864
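
To make the shape of m concrete, here is a minimal Python sketch that splits the captured value on 丨 (U+4E28) and reads the second half as a Unix timestamp in seconds. The helper name split_m is hypothetical and not something the site defines.

```python
from datetime import datetime, timezone

BAR = "\u4e28"  # the CJK vertical bar that separates the two halves of m

def split_m(m: str):
    """Split an m value into (hex digest, UTC datetime)."""
    digest, seconds = m.split(BAR)
    return digest, datetime.fromtimestamp(int(seconds), tz=timezone.utc)

sample = "3ddf4f4e72bd84562a0e0104d425a791" + BAR + "1607657864"
digest, when = split_m(sample)
print(len(digest), when.isoformat())
```

The first half is 32 hex characters, which is what suggests an MD5-style digest before any JS has been read.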

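Analysis

Searching the sources for 丨 (the CJK vertical bar) finds nothing, so the JS is presumably obfuscated. Adding an XHR breakpoint on api/match/1 and walking the call stack up to request confirms that the source is obfuscated.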

Copy the page's JS into demo.js and deobfuscate it:

git clone https://gitee.com/virjar/jsrepair.git
cd jsrepair
npm i
node cli.js demo.js  # run the deobfuscator on demo.js

The deobfuscated result is shown below; _0x2268f9 is a millisecond timestamp.

window['url'] = '/api/match/1';
request = function () {
    var _0x2268f9 = Date['parse'](new Date()) + (16798545 + -72936737 + 156138192), _0x57feae = oo0O0(_0x2268f9['toString']()) + window['f'];
    const _0x5d83a3 = {};
    _0x5d83a3['page'] = window['page'];
    _0x5d83a3['m'] = _0x57feae + '丨' + _0x2268f9 / (-1 * 3483 + -9059 + 13542);
    var _0xb89747 = _0x5d83a3;
    $['ajax']({
        ...
    });
};
request();
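
The two arithmetic expressions in the deobfuscated code are disguised constants. The following Python sketch folds them and reproduces the numeric half of m. The name build_suffix is hypothetical, and the hash half is left out because it depends on the page's own hex_md5.

```python
import time

OFFSET_MS = 16798545 + -72936737 + 156138192   # added to the current time (ms)
DIVISOR = -1 * 3483 + -9059 + 13542            # turns ms back into seconds

def build_suffix(now_s=None):
    """Return (ms string passed to oo0O0, seconds value placed after the bar)."""
    # Date.parse(new Date()) only has second precision, so the ms value ends in 000.
    now_s = int(time.time()) if now_s is None else int(now_s)
    ts_ms = now_s * 1000 + OFFSET_MS
    return str(ts_ms), ts_ms // DIVISOR

print(OFFSET_MS, DIVISOR)
print(build_suffix(1607557864))
```

The constants fold to 100000000 and 1000, so the timestamp sent to the server is the current time shifted about 27.8 hours into the future and then expressed in seconds.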

Search for oo0O0 and extract the script that defines it:

function oo0O0(mw) {
    window.b = '';
    for (var i = 0, len = window.a.length; i < len; i++) {
        console.log(window.a[i]);
        window.b += String[document.e + document.g](window.a[i][document.f + document.h]() - i - window.c)
    }
    var U = ['W5r5W6VdIHZcT8kU', 'WQ8CWRaxWQirAW=='];
    var J = function(o, E) {
        o = o - 0x0;
        var N = U[o];
        if (J['bSSGte'] === undefined) {
            var Y = function(w) {
                var m = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/=',
                    T = String(w)['replace'](/=+$/, '');
                var A = '';
                for (var C = 0x0, b, W, l = 0x0; W = T['charAt'](l++); ~W && (b = C % 0x4 ? b * 0x40 + W : W, C++ % 0x4) ? A += String['fromCharCode'](0xff & b >> (-0x2 * C & 0x6)) : 0x0) {
                    W = m['indexOf'](W)
                }
                return A
            };
            var t = function(w, m) {
                var T = [],
                    A = 0x0,
                    C, b = '',
                    W = '';
                w = Y(w);
                for (var R = 0x0, v = w['length']; R < v; R++) {
                    W += '%' + ('00' + w['charCodeAt'](R)['toString'](0x10))['slice'](-0x2)
                }
                w = decodeURIComponent(W);
                var l;
                for (l = 0x0; l < 0x100; l++) {
                    T[l] = l
                }
                for (l = 0x0; l < 0x100; l++) {
                    A = (A + T[l] + m['charCodeAt'](l % m['length'])) % 0x100, C = T[l], T[l] = T[A], T[A] = C
                }
                l = 0x0, A = 0x0;
                for (var L = 0x0; L < w['length']; L++) {
                    l = (l + 0x1) % 0x100, A = (A + T[l]) % 0x100, C = T[l], T[l] = T[A], T[A] = C, b += String['fromCharCode'](w['charCodeAt'](L) ^ T[(T[l] + T[A]) % 0x100])
                }
                return b
            };
            J['luAabU'] = t, J['qlVPZg'] = {}, J['bSSGte'] = !![]
        }
        var H = J['qlVPZg'][o];
        return H === undefined ? (J['TUDBIJ'] === undefined && (J['TUDBIJ'] = !![]), N = J['luAabU'](N, E), J['qlVPZg'][o] = N) : N = H, N
    };
    eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));
    return ''
}

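Because oo0O0 itself returns '', the only part that matters is what eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27')); does. atob decodes a Base64-encoded string, so evaluate atob(window['b']) in the console to see the decoded script.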

Load the decoded script into WT-JS and evaluate window.f=hex_md5(mwqqppz).

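This fails because mwqqppz is undefined, so it is probably substituted or generated somewhere else. In eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));, the function J is defined inside oo0O0 and reads from the array U. Define U and J in the console, then call J: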

var U = ['W5r5W6VdIHZcT8kU', 'WQ8CWRaxWQirAW=='];
var J = function (o, E) {
    ...
}
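
Reading the body of J, it appears to be a Base64 decode with a lowercase-first alphabet followed by RC4 keyed with the second argument. The Python port below is a sketch under that reading, with hypothetical names rc4 and decode_entry. It has not been checked against the browser; the article reports replace and mwqqppz as the results of the two calls.

```python
import base64

def rc4(data: str, key: str) -> str:
    """Plain RC4 over character codes, mirroring the inner function t."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key scheduling
        j = (j + s[i] + ord(key[i % len(key)])) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = []
    for ch in data:                            # XOR with the keystream
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(chr(ord(ch) ^ s[(s[i] + s[j]) % 256]))
    return "".join(out)

def decode_entry(encoded: str, key: str) -> str:
    """Port of J: custom-alphabet Base64, then UTF-8 decode, then RC4."""
    # The JS alphabet lists lowercase letters first, so swapping the case
    # maps the input onto the standard Base64 alphabet.
    std = encoded.rstrip("=").swapcase()
    std += "=" * (-len(std) % 4)
    text = base64.b64decode(std).decode("utf-8")
    return rc4(text, key)

U = ["W5r5W6VdIHZcT8kU", "WQ8CWRaxWQirAW=="]
print(decode_entry(U[0], "]dQW"), decode_entry(U[1], "GTu!"))
```

Running J directly in the console remains the authoritative way to get the strings; the port is only useful for decoding such tables offline.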

Substituting the decoded strings gives eval(atob(window['b'])['replace']('mwqqppz', '\x27' + mw + '\x27'));. Searching for \bmw\b shows that mw is the argument of oo0O0, i.e. the timestamp string.
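
The effect of that replace call can be sketched in Python. The one-line decoded_stub below is only a stand-in for the real output of atob(window['b']), and inject_timestamp is a hypothetical name.

```python
def inject_timestamp(script: str, mw: str) -> str:
    """Swap the placeholder identifier for the quoted timestamp string."""
    # '\x27' in the JS is a single quote, so mwqqppz turns into a string literal.
    # JS String.replace with a string pattern only touches the first match,
    # hence count=1 here.
    return script.replace("mwqqppz", "'" + mw + "'", 1)

decoded_stub = "window.f = hex_md5(mwqqppz)"  # stand-in for the decoded script
print(inject_timestamp(decoded_stub, "1607657864000"))
```

After the substitution the script is valid JS, which is why evaluating it with the bare identifier raised an undefined error.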

Append a get_m_value function to the script returned by atob(window['b']). It computes the same value as window.f and builds m from it:

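function get_m_value() {
    // Current time (second precision, in ms) shifted by 100000000 ms, as on the page.
    var timestamp = Date.parse(new Date()) + 100000000;
    // timestamp = '1607657864000'  // fixed value for reproducing the captured sample
    // hex_md5 is expected to come from the decoded script this function is appended to.
    var f = hex_md5(timestamp + '');
    return f + '丨' + timestamp / 1000;
}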
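
Whether the page's hex_md5 is the standard MD5 is not stated here. A small check, assuming you have one captured m value, compares its digest against hashlib. If it returns False, the digest is non-standard and the site's JS has to be executed as-is. The name is_standard_md5 is hypothetical.

```python
import hashlib

BAR = "\u4e28"  # separator between digest and seconds

def is_standard_md5(m: str) -> bool:
    """True if the digest half equals MD5 of the millisecond timestamp string."""
    digest, seconds = m.split(BAR)
    # The hashed input is the ms timestamp, i.e. the seconds value followed by 000.
    expected = hashlib.md5((seconds + "000").encode("ascii")).hexdigest()
    return digest == expected

captured = "3ddf4f4e72bd84562a0e0104d425a791" + BAR + "1607657864"
print(is_standard_md5(captured))
```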

Crawler

import time

import execjs
import requests


def get_page(page_num, parameters):
    """Fetch one page; `parameters` is the already URL-encoded m value."""
    url = 'http://match.yuanrenxue.com/api/match/1?page={}&m={}'.format(page_num, parameters)
    headers = {
        'Host': 'match.yuanrenxue.com',
        'Referer': 'http://match.yuanrenxue.com/match/1',
        'User-Agent': 'yuanrenxue.project',
        'X-Requested-With': 'XMLHttpRequest',
        'Cookie': 'qpfccr=true; Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1607556997,1607557857; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1607557857; no-alert=true'
    }
    response = requests.get(url=url, headers=headers)
    return response.json()


def calculate_m_value():
    """Run get_m_value from 1.js and percent-encode the separator."""
    # 1.js is assumed to hold the decoded script with get_m_value appended.
    with open(r'1.js', encoding='utf-8', mode='r') as f:
        js_data = f.read()
    psd = execjs.compile(js_data).call('get_m_value')
    psd = psd.replace('丨', '%E4%B8%A8')  # UTF-8 percent-encoding of 丨
    print('this request parameters is :', psd)
    return psd


if __name__ == '__main__':
    sum_num = 0
    index_num = 0
    for page_num in range(1, 6):
        # A fresh m is generated for every page because it embeds the time.
        res = get_page(page_num, calculate_m_value())
        data = [item['value'] for item in res['data']]
        print(data)
        sum_num += sum(data)
        index_num += len(data)
        time.sleep(1)

    average = sum_num / index_num
    print('the answer is :', average)
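
Replacing 丨 with %E4%B8%A8 by hand is ordinary UTF-8 percent-encoding. As a sketch, the standard library produces the same result; build_url is a hypothetical helper, and passing m through the params argument of requests.get would encode it the same way.

```python
from urllib.parse import quote, urlencode

BAR = "\u4e28"  # the separator used inside m

def build_url(page: int, m: str) -> str:
    """Build the API URL with both query parameters percent-encoded."""
    base = "https://match.yuanrenxue.com/api/match/1"
    return base + "?" + urlencode({"page": page, "m": m})

print(quote(BAR))
print(build_url(2, "3ddf4f4e72bd84562a0e0104d425a791" + BAR + "1607657864"))
```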
