IEEE Fraud Detection Competition: Exploring Approaches

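  • The training set and the test set are each split across two tables (transaction and identity). Counting the matches shows that only a small share of the TransactionIDs in train_transaction have a corresponding row in train_identity.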
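# Count the transactions that have a matching identity row (tables are pandas DataFrames).
for name, trans, ident in [('train', train_transaction, train_identity),
                           ('test', test_transaction, test_identity)]:
    matched = trans['TransactionID'].isin(ident['TransactionID']).sum()
    print(f'{matched / len(trans):.1%} of TransactionIDs in {name} '
          f'({matched} / {len(trans)}) have an associated {name}_identity.')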
Output:
24.4% of TransactionIDs in train (144233 / 590540) have an associated train_identity.
28.0% of TransactionIDs in test (141907 / 506691) have an associated test_identity.
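To make the coverage check concrete, here is a minimal sketch on toy data. The two small frames and the helper name identity_coverage are hypothetical stand-ins, not the competition tables or the original author's code. It also shows one common way to combine the two tables, a left join, which keeps every transaction and leaves the identity columns empty where no match exists.

```python
import pandas as pd

# Toy stand-ins for a transaction table and an identity table (values are made up).
transaction = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4, 5],
    "TransactionAmt": [10.0, 25.5, 7.0, 99.9, 42.0],
})
identity = pd.DataFrame({
    "TransactionID": [2, 5],
    "DeviceType": ["mobile", "desktop"],
})

def identity_coverage(transaction, identity):
    """Return (matched, total, share) of transactions that have an identity row."""
    matched = int(transaction["TransactionID"].isin(identity["TransactionID"]).sum())
    total = len(transaction)
    return matched, total, matched / total

matched, total, share = identity_coverage(transaction, identity)
print(f"{share:.1%} of TransactionIDs ({matched} / {total}) have an identity row.")

# Left join: every transaction is kept; identity columns are NaN where no match exists.
# indicator=True adds a `_merge` column marking each row as "both" or "left_only".
merged = transaction.merge(identity, on="TransactionID", how="left", indicator=True)
print(merged["_merge"].value_counts())
```

With a left join the merged frame has exactly as many rows as the transaction table, so the target column and row order are preserved for modeling.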
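  • The TransactionDT column is a time-related feature. The TransactionDT ranges of train_transaction and test_transaction do not overlap, which suggests that the train/test split was made along the time axis.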
import matplotlib.pyplot as plt
# Overlay the TransactionDT histograms of train and test on the same axes.
train_transaction['TransactionDT'].plot(kind='hist',
                                        figsize=(15, 5),
                                        label='train',
                                        bins=50,
                                        title='Train vs Test TransactionDT distribution')
test_transaction['TransactionDT'].plot(kind='hist',
                                       label='test',
                                       bins=50)
plt.legend()
plt.show()
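The histogram shows the separation visually. The same claim can also be checked numerically by comparing the minimum and maximum of each column. The sketch below uses made-up TransactionDT values and a hypothetical helper, ranges_overlap, purely to illustrate the check.

```python
import pandas as pd

# Made-up TransactionDT values standing in for the real train and test columns.
train_dt = pd.Series([86400, 90000, 150000, 200000], name="TransactionDT")
test_dt = pd.Series([250000, 300000, 400000], name="TransactionDT")

def ranges_overlap(a, b):
    """True if the closed intervals [a.min(), a.max()] and [b.min(), b.max()] intersect."""
    return bool(a.min() <= b.max() and b.min() <= a.max())

# A positive gap means the test period starts after the train period ends.
gap = int(test_dt.min() - train_dt.max())
print("overlap:", ranges_overlap(train_dt, test_dt))
print("gap between end of train and start of test:", gap)
```

If the ranges really are disjoint, a time-ordered validation split (train on earlier rows, validate on later ones) is usually a closer proxy for the test set than a random split.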

  • Categorical Features - Transaction

ProductCD
emaildomain
card1 - card6
addr1, addr2
P_emaildomain
R_emaildomain
M1 - M9

  • Categorical Features - Identity

DeviceType
DeviceInfo
id_12 - id_38
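The ranges in the two lists above can be expanded into explicit column names and then integer-encoded. The following is a sketch under stated assumptions, not the author's method: the helper encode_categoricals and the toy frames are hypothetical, and plain integer codes are just one of several possible encodings. Names that do not occur in a frame are skipped.

```python
import pandas as pd

# Expand the ranges from the lists above into explicit column names.
TRANSACTION_CATS = (["ProductCD", "emaildomain", "addr1", "addr2",
                     "P_emaildomain", "R_emaildomain"]
                    + [f"card{i}" for i in range(1, 7)]
                    + [f"M{i}" for i in range(1, 10)])
IDENTITY_CATS = ["DeviceType", "DeviceInfo"] + [f"id_{i}" for i in range(12, 39)]

def encode_categoricals(train, test, columns):
    """Integer-encode the given columns with one shared vocabulary per column."""
    train, test = train.copy(), test.copy()
    for col in columns:
        if col not in train.columns or col not in test.columns:
            continue  # skip names that are absent from these frames
        # Build the vocabulary from both frames so a code means the same in each.
        values = pd.concat([train[col], test[col]]).dropna().unique()
        dtype = pd.CategoricalDtype(categories=sorted(values))
        train[col] = train[col].astype(dtype).cat.codes  # missing values become -1
        test[col] = test[col].astype(dtype).cat.codes
    return train, test

# Hypothetical toy frames; the values are made up for illustration.
toy_train = pd.DataFrame({"ProductCD": ["W", "C", "W"],
                          "DeviceType": ["mobile", None, "desktop"]})
toy_test = pd.DataFrame({"ProductCD": ["H", "W"],
                         "DeviceType": ["desktop", "mobile"]})

enc_train, enc_test = encode_categoricals(toy_train, toy_test,
                                          TRANSACTION_CATS + IDENTITY_CATS)
print(enc_train)
print(enc_test)
```

Sharing the vocabulary between train and test keeps the codes consistent, so a value that appears only in the test set still receives its own code rather than being treated as missing.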
