Paper Data Statistics: Task 1
Dataset
Link: Dataset
Runtime environment: AI Studio
Implementation
Import the required packages
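# Import the required packages
import seaborn as sns  # plotting
from bs4 import BeautifulSoup  # parsing the HTML scraped from arXiv
import re  # regular expressions for matching string patterns
import requests  # sending HTTP requests to fetch web pages
import json  # parsing the data, which is stored as JSON
import pandas as pd  # data processing and analysis
import matplotlib.pyplot as plt  # plotting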
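Read the data and check its size
# Read the data
data = []  # initialize an empty list of records
# Using `with` closes the file handle automatically, even if an error occurs while reading
with open("/home/aistudio/data/data67990/arxiv-metadata-oai-2019.json", 'r') as f:
    for line in f:  # the file is JSON Lines: one JSON object per line
        data.append(json.loads(line))
data = pd.DataFrame(data)  # convert the list of dicts to a DataFrame for analysis with pandas
data.shape  # show the size of the data as (rows, columns)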
(170618, 14)
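Because the file is in JSON Lines format (one object per line), pandas can also read it directly with pd.read_json(..., lines=True) instead of building a list of dicts first. The sketch below runs on two in-memory records copied from the first rows shown in this section (only three fields kept). Passing dtype=False is a deliberate choice here: it turns off pandas' type inference so that numeric-looking ids such as 0704.0297 stay strings rather than being coerced to numbers.

```python
import io

import pandas as pd

# Two JSON Lines records standing in for the real file (fields copied from the rows below).
sample = io.StringIO(
    '{"id": "0704.0297", "categories": "astro-ph", "update_date": "2019-08-19"}\n'
    '{"id": "0704.0342", "categories": "math.AT", "update_date": "2019-08-19"}\n'
)

# lines=True: each line is a separate JSON object.
# dtype=False: keep values as read, so ids are not converted to floats.
df = pd.read_json(sample, lines=True, dtype=False)
print(df.shape)
print(df["id"].tolist())
```

On the real data the equivalent call would be pd.read_json with the same file path, lines=True and dtype=False; whether it is faster than the explicit loop on AI Studio has not been measured here.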
Show the first five rows
data.head()  # show the first five rows
| | abstract | authors | authors_parsed | categories | comments | doi | id | journal-ref | license | report-no | submitter | title | update_date | versions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | We systematically explore the evolution of t... | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | astro-ph | 15 pages, 15 figures, 3 tables, submitted to M... | 10.1111/j.1365-2966.2007.12161.x | 0704.0297 | None | None | None | Sung-Chul Yoon | Remnant evolution after a carbon-oxygen white ... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
1 | Cofibrations are defined in the category of ... | B. Dugmore and PP. Ntumba | [[Dugmore, B., ], [Ntumba, PP., ]] | math.AT | 27 pages | None | 0704.0342 | None | None | None | Patrice Ntumba Pungu | Cofibrations in the Category of Frolicher Spac... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
2 | We explore the effect of an inhomogeneous ma... | T.V. Zaqarashvili and K Murawski | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | astro-ph | 6 pages, 3 figures, accepted in A&A | 10.1051/0004-6361:20077246 | 0704.0360 | None | None | None | Zaqarashvili | Torsional oscillations of longitudinally inhom... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
3 | This paper has been removed by arXiv adminis... | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | gr-qc | This submission has been withdrawn by arXiv ad... | 10.1088/0256-307X/24/2/015 | 0704.0525 | Chin.Phys.Lett.24:355-358,2007 | None | None | Sezgin Ayg\"un | On the Energy-Momentum Problem in Static Einst... | 2019-10-21 | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... |
4 | The most massive elliptical galaxies show a ... | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | astro-ph | 32 pages (referee format), 9 figures, ApJ acce... | 10.1086/519546 | 0704.0535 | Astrophys.J.665:295-305,2007 | None | None | Antonio Pipino | The Formation of Globular Cluster Systems in M... | 2019-08-19 | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... |
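Data preprocessing
A rough count of paper categories
'''
count: number of non-null values in the column;
unique: number of distinct values in the column;
top: the most frequent value in the column;
freq: how many times the most frequent value occurs.
'''
data["categories"].describe()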
count 170618
unique 15592
top cs.CV
freq 5559
Name: categories, dtype: object
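The output shows 170,618 records and 15,592 distinct values in the categories column. This is only a rough count: a paper can belong to several categories, and every combination counts as a separate value, so a paper labeled "cs.AI cs.MM" and one labeled "cs.AI cs.OS" are two different values. The most frequent value is "cs.CV" on its own, which occurs 5,559 times.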
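To count individual categories rather than combinations, one option is to split each string and explode it so that every code gets its own row. The sketch below uses a small hypothetical Series in the same space-separated format as data["categories"]; it shows that describe() sees four distinct combinations while only five distinct single categories exist.

```python
import pandas as pd

# Hypothetical category strings in the same space-separated format as data["categories"].
categories = pd.Series([
    "cs.CV",
    "cs.CV cs.LG",
    "cs.AI cs.MM",
    "cs.AI cs.OS",
])

# describe() treats every full string as one value, so each combination is distinct.
print(categories.describe())

# Split each string on whitespace and give each individual code its own row.
individual = categories.str.split().explode()

# Count how often each single category appears.
counts = individual.value_counts()
print(counts)
```

On the real data, data["categories"].str.split().explode().value_counts() would give per-category counts; their total exceeds 170,618 because a multi-category paper is counted once for each of its categories.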
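List all paper categories
# All individual categories (deduplicated)
# Split each categories string on spaces, flatten the resulting lists, and keep the distinct codes
unique_categories = set([i for l in [x.split(' ') for x in data["categories"]] for i in l])
print(len(unique_categories))
unique_categories
The output shows that the dataset contains papers from 172 distinct categories: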
172
{'acc-phys',
'adap-org',
'alg-geom',
'astro-ph',
'astro-ph.CO',
'astro-ph.EP',
'astro-ph.GA',
'astro-ph.HE',
'astro-ph.IM',
'astro-ph.SR',
'chao-dyn',
'chem-ph',
'cmp-lg',
'comp-gas',
'cond-mat',
'cond-mat.dis-nn',
'cond-mat.mes-hall',
'cond-mat.mtrl-sci',
'cond-mat.other',
'cond-mat.quant-gas',
'cond-mat.soft',
'cond-mat.stat-mech',
'cond-mat.str-el',
'cond-mat.supr-con',
'cs.AI',
'cs.AR',
'cs.CC',
'cs.CE',
'cs.CG',
'cs.CL',
'cs.CR',
'cs.CV',
'cs.CY',
'cs.DB',
'cs.DC',
'cs.DL',
'cs.DM',
'cs.DS',
'cs.ET',
'cs.FL',
'cs.GL',
'cs.GR',
'cs.GT',
'cs.HC',
'cs.IR',
'cs.IT',
'cs.LG',
'cs.LO',
'cs.MA',
'cs.MM',
'cs.MS',
'cs.NA',
'cs.NE',
'cs.NI',
'cs.OH',
'cs.OS',
'cs.PF',
'cs.PL',
'cs.RO',
'cs.SC',
'cs.SD',
'cs.SE',
'cs.SI',
'cs.SY',
'dg-ga',
'econ.EM',
'econ.GN',
'econ.TH',
'eess.AS',
'eess.IV',
'eess.SP',
'eess.SY',
'funct-an',
'gr-qc',
'hep-ex',
'hep-lat',
'hep-ph',
'hep-th',
'math-ph',
'math.AC',
'math.AG',
'math.AP',
'math.AT',
'math.CA',
'math.CO',
'math.CT',
'math.CV',
'math.DG',
'math.DS',
'math.FA',
'math.GM',
'math.GN',
'math.GR',
'math.GT',
'math.HO',
'math.IT',
'math.KT',
'math.LO',
'math.MG',
'math.MP',
'math.NA',
'math.NT',
'math.OA',
'math.OC',
'math.PR',
'math.QA',
'math.RA',
'math.RT',
'math.SG',
'math.SP',
'math.ST',
'mtrl-th',
'nlin.AO',
'nlin.CD',
'nlin.CG',
'nlin.PS',
'nlin.SI',
'nucl-ex',
'nucl-th',
'patt-sol',
'physics.acc-ph',
'physics.ao-ph',
'physics.app-ph',
'physics.atm-clus',
'physics.atom-ph',
'physics.bio-ph',
'physics.chem-ph',
'physics.class-ph',
'physics.comp-ph',
'physics.data-an',
'physics.ed-ph',
'physics.flu-dyn',
'physics.gen-ph',
'physics.geo-ph',
'physics.hist-ph',
'physics.ins-det',
'physics.med-ph',
'physics.optics',
'physics.plasm-ph',
'physics.pop-ph',
'physics.soc-ph',
'physics.space-ph',
'q-alg',
'q-bio',
'q-bio.BM',
'q-bio.CB',
'q-bio.GN',
'q-bio.MN',
'q-bio.NC',
'q-bio.OT',
'q-bio.PE',
'q-bio.QM',
'q-bio.SC',
'q-bio.TO',
'q-fin.CP',
'q-fin.EC',
'q-fin.GN',
'q-fin.MF',
'q-fin.PM',
'q-fin.PR',
'q-fin.RM',
'q-fin.ST',
'q-fin.TR',
'quant-ph',
'solv-int',
'stat.AP',
'stat.CO',
'stat.ME',
'stat.ML',
'stat.OT',
'stat.TH',
'supr-con'}
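The nested list comprehension above splits every categories string and flattens the lists before deduplicating. An equivalent form that some readers find easier to follow uses itertools.chain.from_iterable; the sketch below applies both forms to a few category strings taken from the rows shown in this section and checks that they agree.

```python
from itertools import chain

# Category strings copied from rows of the DataFrame shown in this section.
categories = [
    "astro-ph",
    "cond-mat.str-el cond-mat.mes-hall",
    "math.DG math.AG",
    "quant-ph cond-mat.mes-hall",
]

# Split every string, chain the pieces into one stream, then deduplicate with a set.
unique_categories = set(chain.from_iterable(c.split(' ') for c in categories))

# The nested comprehension from the original code, for comparison.
nested = set([i for l in [x.split(' ') for x in categories] for i in l])

print(len(unique_categories), unique_categories == nested)
```

On the full DataFrame, replacing categories with data["categories"] should produce the same 172 categories as the original code.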
Feature processing
The task asks us to analyze papers from 2019 onward, so we first process the time feature to obtain papers of all categories from 2019 onward:
data["year"] = pd.to_datetime(data["update_date"]).dt.year  # parse update_date strings such as 2019-02-20 as datetime and extract the year
del data["update_date"]  # drop update_date; it is no longer needed
data = data[data["year"] >= 2019]  # keep only rows from 2019 onward
# data.groupby(['categories', 'year'])  # would group rows by categories and year (not used here)
data.reset_index(drop=True, inplace=True)  # renumber the index from 0
data  # show the result
| | abstract | authors | authors_parsed | categories | comments | doi | id | journal-ref | license | report-no | submitter | title | versions | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | We systematically explore the evolution of t... | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | astro-ph | 15 pages, 15 figures, 3 tables, submitted to M... | 10.1111/j.1365-2966.2007.12161.x | 0704.0297 | None | None | None | Sung-Chul Yoon | Remnant evolution after a carbon-oxygen white ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
1 | Cofibrations are defined in the category of ... | B. Dugmore and PP. Ntumba | [[Dugmore, B., ], [Ntumba, PP., ]] | math.AT | 27 pages | None | 0704.0342 | None | None | None | Patrice Ntumba Pungu | Cofibrations in the Category of Frolicher Spac... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
2 | We explore the effect of an inhomogeneous ma... | T.V. Zaqarashvili and K Murawski | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | astro-ph | 6 pages, 3 figures, accepted in A&A | 10.1051/0004-6361:20077246 | 0704.0360 | None | None | None | Zaqarashvili | Torsional oscillations of longitudinally inhom... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
3 | This paper has been removed by arXiv adminis... | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | gr-qc | This submission has been withdrawn by arXiv ad... | 10.1088/0256-307X/24/2/015 | 0704.0525 | Chin.Phys.Lett.24:355-358,2007 | None | None | Sezgin Ayg\"un | On the Energy-Momentum Problem in Static Einst... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019 |
4 | The most massive elliptical galaxies show a ... | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | astro-ph | 32 pages (referee format), 9 figures, ApJ acce... | 10.1086/519546 | 0704.0535 | Astrophys.J.665:295-305,2007 | None | None | Antonio Pipino | The Formation of Globular Cluster Systems in M... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019 |
5 | Differential and total cross-sections for ph... | J. Junkersfeld (for the CB-ELSA collaboration) | [[Junkersfeld, J., , for the CB-ELSA collabora... | nucl-ex | 8 pages, 13 figures | 10.1140/epja/i2006-10302-7 | 0704.0710 | Eur.Phys.J.A31:365-372,2007 | None | None | Joerg Junkersfeld | Photoproduction of pi0 omega off protons for E... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | 2019 |
6 | In a ring of s-wave superconducting material... | Walter A. Simmons and Sandip S. Pakvasa | [[Simmons, Walter A., ], [Pakvasa, Sandip S., ]] | quant-ph | 5 pages, pdf format | None | 0704.0803 | None | None | None | Josephine Nanao | Geometric Phase and Superconducting Flux Quant... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | 2019 |
7 | We study the Dirichlet problem associated to... | Xuan Hien Nguyen | [[Nguyen, Xuan Hien, ]] | math.DG | 30 pages | None | 0704.0981 | Adv. Differential Equations 15 (2010), no. 5-6... | None | None | Xuan Hien Nguyen | Construction of Complete Embedded Self-Similar... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | 2019 |
8 | We report a measurement of D0-D0bar mixing i... | L.M. Zhang, et al (for the Belle Collaboration) | [[Zhang, L. M., ]] | hep-ex | 6 pages, 4 figures, Submitted to Physical Revi... | 10.1103/PhysRevLett.99.131803 | 0704.1000 | Phys.Rev.Lett.99:131803,2007 | None | BELLE-CONF-0702 | Liming Zhang | Measurement of D0-D0bar mixing in D0->Ks pi+ p... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | 2019 |
9 | We present single pointing observations of S... | P.D. Klaassen and C.D. Wilson | [[Klaassen, P. D., ], [Wilson, C. D., ]] | astro-ph | 34 pages, 9 figures, accepted for publication ... | 10.1086/518760 | 0704.1245 | Astrophys.J.663:1092-1102,2007 | None | None | Pamela Klaassen | Outflow and Infall in a Sample of Massive Star... | [{'version': 'v1', 'created': 'Tue, 10 Apr 200... | 2019 |
10 | The proton spin structure is not understood ... | K. Aoki (for the PHENIX Collaboration) | [[Aoki, K., , for the PHENIX Collaboration]] | hep-ex | 4 pages, 3 figures, to be published in the Pro... | 10.1063/1.2750791 | 0704.1369 | AIPConf.Proc.915:339-342,2007 | None | None | Kazuya Aoki | Double Helicity Asymmetry of Inclusive pi0 Pro... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
11 | Abridged... Blue stragglers (BSS) are though... | Y. Momany, E.V. Held, I. Saviane, S. Zaggia, L... | [[Momany, Y., ], [Held, E. V., ], [Saviane, I.... | astro-ph | Accepted for publication in Astronomy & Astrop... | 10.1051/0004-6361:20067024 | 0704.1430 | None | None | None | Simone Zaggia R. | The blue plume population in dwarf spheroidal ... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
12 | The spatial Fourier spectrum of the electron... | Yasha Gindikin and Vladimir A. Sablikov | [[Gindikin, Yasha, ], [Sablikov, Vladimir A., ]] | cond-mat.str-el cond-mat.mes-hall | 10 pages, 11 figures. Misprints fixed | 10.1103/PhysRevB.76.045122 | 0704.1445 | Phys. Rev. B 76, 045122 (2007) | http://arxiv.org/licenses/nonexclusive-distrib... | None | Yasha Gindikin | Deformed Wigner crystal in a one-dimensional q... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
13 | The Gemini Planet (GPI) imager is an "extrem... | James R. Graham (1), Bruce Macintosh (2), Rene... | [[Graham, James R., ], [Macintosh, Bruce, ], [... | astro-ph | White paper submitted to the NSF-NASA-DOE Astr... | None | 0704.1454 | None | None | None | James R. Graham | Ground-Based Direct Detection of Exoplanets wi... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
14 | We present ACS/HST coronagraphic observation... | D.R. Ardila, D.A. Golimowski, J.E. Krist, M. C... | [[Ardila, D. R., ], [Golimowski, D. A., ], [Kr... | astro-ph | Accepted to ApJ | None | 0704.1507 | None | None | None | David Ardila | HST/ACS Coronagraphic Observations of the Dust... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
15 | We have selected a sample of 88 nearby (z<0.... | J. A. L. Aguerri, R. Sanchez-Janssen and C. Mu... | [[Aguerri, J. A. L., ], [Sanchez-Janssen, R., ... | astro-ph | 19 pages, 11 figures, accepted for publication... | 10.1051/0004-6361:20066478 | 0704.1579 | None | None | None | Jose Alfonso Lopez Aguerri | A Study of Catalogued Nearby Galaxy Clusters i... | [{'version': 'v1', 'created': 'Thu, 12 Apr 200... | 2019 |
16 | Photoproduction of pi0 mesons was studied wi... | H. van Pee, O. Bartholomy, V. Crede (for the C... | [[van Pee, H., , for the CB-ELSA Collaboration... | nucl-ex | 17 pages, 17 figures | 10.1140/epja/i2006-10160-3 | 0704.1776 | Eur.Phys.J.A31:61-77,2007 | None | None | Joerg Junkersfeld | Photoproduction of pi0-mesons off protons from... | [{'version': 'v1', 'created': 'Fri, 13 Apr 200... | 2019 |
17 | We investigate the dissipation of magnetic f... | Hideki Maki and Hajime Susa | [[Maki, Hideki, ], [Susa, Hajime, ]] | astro-ph | 12 pages, 7 figures, PASJ accepted | 10.1093/pasj/59.4.787 | 0704.1853 | None | None | None | Hajime Susa | Dissipation of Magnetic Flux in Primordial Sta... | [{'version': 'v1', 'created': 'Sat, 14 Apr 200... | 2019 |
18 | A long duration photon beam can induce macro... | G. Barbiellini (1,2), A. Galli (2,3), L. Amati... | [[Barbiellini, G., ], [Galli, A., ], [Amati, L... | astro-ph | 3 pages, no figure, to be published in "The Pr... | 10.1063/1.2757318 | 0704.2135 | AIPConf.Proc.921:265-267,2007 | None | None | Alessandra Galli | Relativistic interaction of a high intensity p... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
19 | The environment of high-redshift galaxies is... | A.P.M. Fangano, A. Ferrara and P. Richter | [[Fangano, A. P. M., ], [Ferrara, A., ], [Rich... | astro-ph | 27 pages, 27 figures. Submitted to MNRAS. Full... | 10.1111/j.1365-2966.2007.12220.x | 0704.2143 | None | None | None | Alessio Fangano | Absorption features of high redshift galactic ... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
20 | We describe the methodology and compute the ... | Keigo Fukumura and Demosthenes Kazanas | [[Fukumura, Keigo, ], [Kazanas, Demosthenes, ]] | astro-ph | 26 pages, 21 b/w figures, accepted for publica... | 10.1086/518883 | 0704.2159 | Astrophys.J.664:14-25,2007 | None | None | Keigo Fukumura | Accretion Disk Illumination in Schwarzschild a... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
21 | Feedback from black hole activity is widely ... | J.-M. Wang, Y.-M. Chen, C.-S. Yan, C. Hu and W... | [[Wang, J. -M., ], [Chen, Y. -M., ], [Yan, C. ... | astro-ph | 1 color figure and 1 table. ApJ Letters in press | 10.1086/518807 | 0704.2288 | None | None | None | Jian-Min Wang | Suppressed star formation in circumnuclear reg... | [{'version': 'v1', 'created': 'Wed, 18 Apr 200... | 2019 |
22 | Let K be a compact subset of ${\mathbb R}^n$... | Athanasios Batakis (MAPMO), Pierre Levitz (PMC... | [[Batakis, Athanasios, , MAPMO], [Levitz, Pier... | math.CA | None | None | 0704.2362 | Pure & Applied Mathematics Quarterly (2011) Vo... | None | None | Athanasios Batakis | On Brownian flights | [{'version': 'v1', 'created': 'Wed, 18 Apr 200... | 2019 |
23 | We calculate relations on characteristic cla... | Benjamin McKay (University College Cork) | [[McKay, Benjamin, , University College Cork]] | math.DG math.AG | 29 pages (on A4 paper). I split off the result... | None | 0704.2555 | Adv. Geom. 11 (2011), no. 1, 139-168 | http://arxiv.org/licenses/nonexclusive-distrib... | None | Benjamin McKay | Characteristic forms of complex Cartan geometries | [{'version': 'v1', 'created': 'Thu, 19 Apr 200... | 2019 |
24 | A complete model of helium-like line and con... | R. L. Porter and G. J. Ferland | [[Porter, R. L., ], [Ferland, G. J., ]] | astro-ph | 28 pages, 7 figures, accepted to ApJ | 10.1086/518882 | 0704.2642 | Astrophys.J.664:586-595,2007 | None | None | Ryan Porter | Revisiting He-like X-ray Emission Line Plasma ... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
25 | We report the first observation of the decay... | T. Medvedeva, R. Chistov, et al (for the Belle... | [[Medvedeva, T., ], [Chistov, R., ]] | hep-ex | 5 pages, 2 PostScript figures, 1 table | 10.1103/PhysRevD.76.051102 | 0704.2652 | Phys.Rev.D76:051102,2007 | None | None | Tatiana Medvedeva | Observation of the Decay \bar{B0}-> Ds+ Lambda... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
26 | We study the charmless baryonic three-body d... | M.-Z. Wang, Y.-J. Lee, et al (for the Belle Co... | [[Wang, M. -Z., ], [Lee, Y. -J., ]] | hep-ex | 12 pages, 5 figures (11 figure files), PRD pub... | 10.1103/PhysRevD.76.052004 | 0704.2672 | Phys.Rev.D76:052004,2007 | None | Belle Preprint 2007-19, KEK Preprint 2007-6 | Minzu Wang | Study of B+ to p Lambdabar gamma, p Lambdabar ... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
27 | Massive stars, supernovae (SNe), and long-du... | Jorick S. Vink and Rubina Kotak | [[Vink, Jorick S., ], [Kotak, Rubina, ]] | astro-ph | 6 pages, 5 figs, To appear in: "Circumstellar ... | None | 0704.2689 | None | None | None | Jorick S. Vink | Mass loss from Luminous Blue Variables and Qua... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
28 | Using data collected with the CLEO III detec... | R.A. Briere, et al. (CLEO Collaboration) | [[Briere, R. A., ]] | hep-ex | 21 pages postscript,also available through\n ... | 10.1103/PhysRevD.76.012005 | 0704.2766 | Phys.Rev.D76:012005,2007 | None | CLNS 06/1984, CLEO 06-24 | Pamela Morehouse | Comparison of Particle Production in Quark and... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
29 | Motivated by a proposal to create an optical... | Pavel Exner and Martin Fraas | [[Exner, Pavel, ], [Fraas, Martin, ]] | quant-ph cond-mat.mes-hall math-ph math.MP | LaTeX, 12 pages | 10.1016/j.physleta.2007.05.013 | 0704.2770 | Phys. Lett. A369 (2007), 393-399 | None | None | Pavel Exner | A remark on helical waveguides | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
170588 | We present a scheme for generating polarizat... | Zachary D. Walton, Alexander V. Sergienko, Bah... | [[Walton, Zachary D., ], [Sergienko, Alexander... | quant-ph | 6 pages, 3 figures | 10.1103/PhysRevA.70.052317 | quant-ph/0405021 | Phys. Rev. A 70, 052317 (2004) | None | None | Zac Walton | Generating Polarization-Entangled Photon Pairs... | [{'version': 'v1', 'created': 'Tue, 4 May 2004... | 2019 |
170589 | In quant-ph/0406139, we have introduced in a... | Elena R. Loubenets | [[Loubenets, Elena R., ]] | quant-ph math-ph math.MP | 6 pages | None | quant-ph/0407097 | None | None | None | Elena R. Loubenets | On validity of the original Bell inequality fo... | [{'version': 'v1', 'created': 'Wed, 14 Jul 200... | 2019 |
170590 | Both the set of quantum states and the set o... | O.V.Man'ko, and V.I.Man'ko | [[Man'ko, O. V., ], [Man'ko, V. I., ]] | quant-ph | 14 pages, to appear in Journal of Russian Lase... | 10.1023/B:JORR.0000043735.34372.8f | quant-ph/0407183 | Journal of Russian Laser Research (2004) 25: 477 | None | None | Olga Manko Vladimirovna | Classical mechanics is not h=0 limit of quantu... | [{'version': 'v1', 'created': 'Fri, 23 Jul 200... | 2019 |
170591 | Classical Floyd-Warshall algorithm is used t... | A. S. Gupta, A. Pathak | [[Gupta, A. S., ], [Pathak, A., ]] | quant-ph | There was a logical flaw in the reported algor... | None | quant-ph/0502144 | None | None | None | Anirban Pathak | Quantum Floyd-Warshall Alorithm | [{'version': 'v1', 'created': 'Wed, 23 Feb 200... | 2019 |
170592 | An experiment performed in 2002 by Sciarrino... | Sofia Wechsler | [[Wechsler, Sofia, ]] | quant-ph | The author of this article re-considered Sciar... | None | quant-ph/0503232 | None | None | None | Sofia Wechsler | Nonlocality of single fermions - branches that... | [{'version': 'v1', 'created': 'Wed, 30 Mar 200... | 2019 |
170593 | For emitters embedded in media of various re... | Chang-Kui Duan, Michael F. Reid | [[Duan, Chang-Kui, ], [Reid, Michael F., ]] | quant-ph | 9pages, 1 figures, presented on AMN-2 and to a... | 10.1016/j.cap.2005.11.016 | quant-ph/0505182 | Current Applied Physics 6, 348-350 (2006) | None | None | Chang-Kui Duan | Local field effects on the radiative lifetimes... | [{'version': 'v1', 'created': 'Tue, 24 May 200... | 2019 |
170594 | The dynamics of a two-mode Bose-Einstein con... | B. R. da Cunha and M. C. de Oliveira | [[da Cunha, B. R., ], [de Oliveira, M. C., ]] | quant-ph cond-mat.other physics.atom-ph | 9 pages, 5 figures | 10.1103/PhysRevA.75.063615 | quant-ph/0602054 | None | None | None | Marcos C. de Oliveira | Optimal Conditions for Atomic Homodyne Detecti... | [{'version': 'v1', 'created': 'Sat, 4 Feb 2006... | 2019 |
170595 | Within a well-known decay model describing a... | Pavel Exner and Martin Fraas | [[Exner, Pavel, ], [Fraas, Martin, ]] | quant-ph | 4 pages, 3 eps figures | 10.1088/1751-8113/40/6/010 | quant-ph/0603067 | J. Phys. A: Math. Theor. 40 (2007), 1333-1340 | None | None | Pavel Exner | The decay law can have an irregular character | [{'version': 'v1', 'created': 'Wed, 8 Mar 2006... | 2019 |
170596 | We study the thermal entanglement in a two-s... | S.Y. Mirafzali, M. Sarbishaei | [[Mirafzali, S. Y., ], [Sarbishaei, M., ]] | quant-ph | 5 pages, 3 figures | None | quant-ph/0608169 | None | None | None | Seyyad Yahya Mirafzali | The effect of anisotropy and external magnetic... | [{'version': 'v1', 'created': 'Tue, 22 Aug 200... | 2019 |
170597 | A simple quantum mechanical model consisting... | E. Kogan | [[Kogan, E., ]] | quant-ph cond-mat.mes-hall | 6 pages, 6 eps figures, revtex | None | quant-ph/0609011 | None | http://arxiv.org/licenses/nonexclusive-distrib... | None | Eugene Kogan | Decay of discrete state resonantly coupled to ... | [{'version': 'v1', 'created': 'Sun, 3 Sep 2006... | 2019 |
170598 | The local hidden variable assumption was rep... | Sofia Wechsler | [[Wechsler, Sofia, ]] | quant-ph | This article is based on very old information.... | None | quant-ph/0610159 | None | None | None | Sofia Wechsler | Are superluminal "signals" an acceptable hypot... | [{'version': 'v1', 'created': 'Thu, 19 Oct 200... | 2019 |
170599 | We study analytic structure of the Green's f... | E. Kogan | [[Kogan, E., ]] | quant-ph cond-mat.mes-hall | 4 pages, 6 eps figures, latex. arXiv admin not... | None | quant-ph/0611043 | None | None | None | Eugene Kogan | On the analytic structure of Green's function ... | [{'version': 'v1', 'created': 'Fri, 3 Nov 2006... | 2019 |
170600 | We introduce Bell-type inequalities allowing... | Perola Milman (PPM, CERMICS), Arne Keller (PPM... | [[Milman, Perola, , PPM, CERMICS], [Keller, Ar... | quant-ph | 4 pages | 10.1103/PhysRevLett.99.130405 | quant-ph/0612044 | Phys. Rev. Lett. 99, 130405 (2007) | None | None | Arne Keller | Bell-type inequalities for cold heteronuclear ... | [{'version': 'v1', 'created': 'Wed, 6 Dec 2006... | 2019 |
170601 | We provide a computational definition of the... | Pablo Arrighi, Gilles Dowek | [[Arrighi, Pablo, ], [Dowek, Gilles, ]] | quant-ph cs.LO cs.PL | The complementary note "On the critical pairs ... | 10.23638/LMCS-13(1:8)2017 | quant-ph/0612199 | Logical Methods in Computer Science, Volume 13... | http://arxiv.org/licenses/nonexclusive-distrib... | None | J\"urgen Koslowski | Lineal: A linear-algebraic Lambda-calculus | [{'version': 'v1', 'created': 'Fri, 22 Dec 200... | 2019 |
170602 | Recently, Farhi, Goldstone, and Gutmann gave... | Andrew M. Childs, Richard Cleve, Stephen P. Jo... | [[Childs, Andrew M., ], [Cleve, Richard, ], [J... | quant-ph | 2 pages. v2: updated name of one author | 10.4086/toc.2009.v005a005 | quant-ph/0702160 | Theory of Computing, Vol. 5 (2009) 119-123 | http://arxiv.org/licenses/nonexclusive-distrib... | None | Andrew M. Childs | Discrete-query quantum algorithm for NAND trees | [{'version': 'v1', 'created': 'Fri, 16 Feb 200... | 2019 |
170603 | The neutral B-meson pair produced at the Ups... | A. Go, A. Bay, et al. (for the Belle Collabora... | [[Go, A., ], [Bay, A., ]] | quant-ph hep-ex | 8 pages, 2 figures, submitted to Phys. Rev. Lett | 10.1103/PhysRevLett.99.131802 | quant-ph/0702267 | Phys.Rev.Lett.99:131802,2007 | None | Belle Preprint 2006-40, KEK Preprint 2006-61 | Apollo Go | Measurement of EPR-type flavour entanglement i... | [{'version': 'v1', 'created': 'Wed, 28 Feb 200... | 2019 |
170604 | .We expound an alternative to the Copenhagen... | Arthur Jabs | [[Jabs, Arthur, ]] | quant-ph | Latex, 88 pages, 6 figures. The present versio... | None | quant-ph/9606017 | None | http://arxiv.org/licenses/nonexclusive-distrib... | None | Arthur Jabs | Quantum Mechanics in Terms of Realism | [{'version': 'v1', 'created': 'Mon, 17 Jun 199... | 2019 |
170605 | It is shown, that for quantum systems the ve... | V. I. Man'ko, G. Marmo, E. C. G. Sudarshan, an... | [[Man'ko, V. I., ], [Marmo, G., ], [Sudarshan,... | quant-ph | Latex,14 pages,accepted by Int. Jour.Mod.Phys | 10.1142/S0217979297000666 | quant-ph/9612007 | Int.J.Mod.Phys. B11 (1997) 1281-1296 | None | None | None | Wigner's Problem and Alternative Commutation R... | [{'version': 'v1', 'created': 'Sat, 30 Nov 199... | 2019 |
170606 | The q-deformation of harmonic oscillators is... | V.I. Man'ko, G.Marmo, F.Zaccaria | [[Man'ko, V. I., ], [Marmo, G., ], [Zaccaria, ... | quant-ph | 23 pages,LATEX, to be published in Rend.Sem.Ma... | None | quant-ph/9703020 | Rend.Sem.Mat.Univ.Politec.Torino 54 (1996) 337... | None | None | None | Deformations and Nonlinear Systems | [{'version': 'v1', 'created': 'Wed, 12 Mar 199... | 2019 |
170607 | The microscopic approach quantum dissipation... | C.P.Sun H.B.Gao, H.F.Dong, S.R.Zhao | [[Gao, C. P. Sun H. B., ], [Dong, H. F., ], [Z... | quant-ph | 9 pages,Latex, E-mail address available after ... | 10.1103/PhysRevE.57.3900 | quant-ph/9706047 | Phys.Rev. E57 (1998) 3900-3904 | None | ITP.AC.97-6-19 | Chang-Pu Sun | Partial Factorization of Wave Function for A Q... | [{'version': 'v1', 'created': 'Fri, 20 Jun 199... | 2019 |
170608 | We consider the possibility of encoding m cl... | Andris Ambainis, Ashwin Nayak, Amnon Ta-Shma, ... | [[Ambainis, Andris, ], [Nayak, Ashwin, ], [Ta-... | quant-ph cs.CC | 12 pages, 3 figures. Defines random access cod... | None | quant-ph/9804043 | None | None | None | Ashwin Nayak | Dense Quantum Coding and a Lower Bound for 1-w... | [{'version': 'v1', 'created': 'Sat, 18 Apr 199... | 2019 |
170609 | This paper has been superseded by quant-ph/9... | Yu Shi | [[Shi, Yu, ]] | quant-ph | This paper has been withdrawn | None | quant-ph/9805083 | None | None | None | Yu Shi | Remarks on Universal Quantum Computer | [{'version': 'v1', 'created': 'Thu, 28 May 199... | 2019 |
170610 | The properties of the time-of-arrival operat... | J. G. Muga, C. R. Leavens and J. P. Palao | [[Muga, J. G., ], [Leavens, C. R., ], [Palao, ... | quant-ph | REVTEX, 12 pages, 4 postscript figures | 10.1103/PhysRevA.58.4336 | quant-ph/9807066 | Phys.Rev. A58 (1998) 1 | None | ULL-FIS-980701 | None | Space-time properties of free motion time-of-a... | [{'version': 'v1', 'created': 'Thu, 23 Jul 199... | 2019 |
170611 | Without imposing the locality condition,it i... | H.Razmi, M.Golshani | [[Razmi, H., ], [Golshani, M., ]] | quant-ph | 5 pages LaTeX | None | quant-ph/9812029 | None | None | TMU-98-03 | None | Locality Is An Unnecessary Assumption of Bell'... | [{'version': 'v1', 'created': 'Mon, 14 Dec 199... | 2019 |
170612 | A quantum computer is proposed in which info... | Mark S. Sherwin, Atac Imamoglu, Thomas Montroy... | [[Sherwin, Mark S., , University of\n Califor... | quant-ph | Revtex 6 pages, 3 postscript figures, minor ty... | 10.1103/PhysRevA.60.3508 | quant-ph/9903065 | None | None | None | Tom Montroy | Quantum Computation with Quantum Dots and Tera... | [{'version': 'v1', 'created': 'Thu, 18 Mar 199... | 2019 |
170613 | We utilize the generation of large atomic co... | V. A. Sautenkov, M. D. Lukin, C. J. Bednar, G.... | [[Sautenkov, V. A., ], [Lukin, M. D., ], [Bedn... | quant-ph | None | 10.1103/PhysRevA.62.023810 | quant-ph/9904032 | None | None | None | Mikhail Lukin | Enhancement of Magneto-Optic Effects via Large... | [{'version': 'v1', 'created': 'Thu, 8 Apr 1999... | 2019 |
170614 | Some explicit traveling wave solutions to a ... | Wen-Xiu Ma, Benno Fuchssteiner | [[Ma, Wen-Xiu, ], [Fuchssteiner, Benno, ]] | solv-int nlin.SI | 14pages, Latex, to appear in Intern. J. Nonlin... | 10.1016/0020-7462(95)00064-X | solv-int/9511005 | None | None | None | Wen-Xiu Ma | Explicit and Exact Solutions to a Kolmogorov-P... | [{'version': 'v1', 'created': 'Tue, 14 Nov 199... | 2019 |
170615 | We consider a hierarchy of many-particle sys... | J C Eilbeck, V Z Enol'skii, V B Kuznetsov, D V... | [[Eilbeck, J C, ], [Enol'skii, V Z, ], [Kuznet... | solv-int nlin.SI | plain LaTeX, 28 pages | None | solv-int/9809008 | None | None | None | Victor Enolskii | Linear r-Matrix Algebra for a Hierarchy of One... | [{'version': 'v1', 'created': 'Wed, 2 Sep 1998... | 2019 |
170616 | Consider the evolution $$ \frac{\pl m_\iy}{\... | M. Adler, T. Shiota and P. van Moerbeke | [[Adler, M., ], [Shiota, T., ], [van Moerbeke,... | solv-int adap-org hep-th nlin.AO nlin.SI | 42 pages | None | solv-int/9909010 | None | None | None | Pierre van Moerbeke | Pfaff tau-functions | [{'version': 'v1', 'created': 'Wed, 15 Sep 199... | 2019 |
170617 | A general solution to the Complex Monge-Amp\... | D.B. Fairlie and A.N. Leznov | [[Fairlie, D. B., ], [Leznov, A. N., ]] | solv-int nlin.SI | 13 pages, latex, no figures | 10.1088/0305-4470/33/25/307 | solv-int/9909014 | None | None | None | David Fairlie | The General Solution of the Complex Monge-Amp\... | [{'version': 'v1', 'created': 'Thu, 16 Sep 199... | 2019 |
170618 rows × 14 columns
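Since data["year"] is derived from update_date, it reflects when a record was last updated rather than when the paper was first submitted, which is why rows whose first version dates from 2007 still show 2019. If the submission year is what matters, one possible approach is to parse the created field of the first entry in versions. The sketch below uses a single hand-built record; the time of day in created is an assumption (the table truncates it), and the RFC 2822-style format is assumed from the visible prefix "Tue, 3 Apr 2007".

```python
from email.utils import parsedate_to_datetime

# One hand-built record shaped like a row of `data`.
# The time-of-day part of 'created' is assumed; the table above truncates it.
record = {
    "update_date": "2019-08-19",
    "versions": [{"version": "v1", "created": "Tue, 3 Apr 2007 20:00:00 GMT"}],
}

# Year of the last update, which is what data["year"] is based on.
update_year = int(record["update_date"][:4])

# Year of the first version, parsed from the RFC 2822-style date string.
first_created = parsedate_to_datetime(record["versions"][0]["created"])
submission_year = first_created.year

print(update_year, submission_year)
```

On the DataFrame this could be applied with something like data["versions"].map(lambda v: parsedate_to_datetime(v[0]["created"]).year), assuming every versions list is non-empty and uses the same date format.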
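Filtering the data
At this point we have all papers from 2019 onward (the row count stays at 170,618, so every record in this file already had an update_date in 2019 or later). Next, we select all papers in the computer science field: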
# Scrape the full list of arXiv categories
website_html = requests.get('https://arxiv.org/category_taxonomy').text  # fetch the page as HTML text
soup = BeautifulSoup(website_html, 'html.parser')  # parse with Python's built-in html.parser ('lxml' is faster if installed)
root = soup.find('div', {'id': 'category_taxonomy_list'})  # the div that holds the taxonomy
tags = root.find_all(["h2", "h3", "h4", "p"], recursive=True)  # all headings and paragraphs, in document order
# Initialize the current names/codes and the output lists
level_1_name = ""
level_2_name = ""
level_2_code = ""
level_3_name = ""
level_3_code = ""
level_1_names = []
level_2_codes = []
level_2_names = []
level_3_codes = []
level_3_names = []
level_3_notes = []
# Walk the tags in order: h2 = group, h3 = archive, h4 = category, p = category description
for t in tags:
    if t.name == "h2":
        # New group; if it has no h3 archives (e.g. Computer Science), the group also serves as the archive
        level_1_name = t.text
        level_2_code = t.text
        level_2_name = t.text
    elif t.name == "h3":
        raw = t.text
        # Archive heading "Name (code)": re.sub(pattern, repl, string) keeps only the chosen group
        level_2_code = re.sub(r"(.*)\((.*)\)", r"\2", raw)
        level_2_name = re.sub(r"(.*)\((.*)\)", r"\1", raw).strip()  # strip the space before "("
    elif t.name == "h4":
        raw = t.text
        # Category heading "code (Name)"
        level_3_code = re.sub(r"(.*) \((.*)\)", r"\1", raw)
        level_3_name = re.sub(r"(.*) \((.*)\)", r"\2", raw)
    elif t.name == "p":
        # A description paragraph completes one category: record a row
        notes = t.text
        level_1_names.append(level_1_name)
        level_2_names.append(level_2_name)
        level_2_codes.append(level_2_code)
        level_3_names.append(level_3_name)
        level_3_codes.append(level_3_code)
        level_3_notes.append(notes)
# Build a DataFrame from the collected lists
df_taxonomy = pd.DataFrame({
    'group_name': level_1_names,
    'archive_name': level_2_names,
    'archive_id': level_2_codes,
    'category_name': level_3_names,
    'categories': level_3_codes,
    'category_description': level_3_notes
})
df_taxonomy
The first rows of the resulting table are shown below:
| | group_name | archive_name | archive_id | category_name | categories | category_description |
---|---|---|---|---|---|---|
0 | Computer Science | Computer Science | Computer Science | Artificial Intelligence | cs.AI | Covers all areas of AI except Vision, Robotics... |
1 | Computer Science | Computer Science | Computer Science | Hardware Architecture | cs.AR | Covers systems organization and hardware archi... |
2 | Computer Science | Computer Science | Computer Science | Computational Complexity | cs.CC | Covers models of computation, complexity class... |
3 | Computer Science | Computer Science | Computer Science | Computational Engineering, Finance, and Science | cs.CE | Covers applications of computer science to the... |
4 | Computer Science | Computer Science | Computer Science | Computational Geometry | cs.CG | Roughly includes material in ACM Subject Class... |
5 | Computer Science | Computer Science | Computer Science | Computation and Language | cs.CL | Covers natural language processing. Roughly in... |
6 | Computer Science | Computer Science | Computer Science | Cryptography and Security | cs.CR | Covers all areas of cryptography and security ... |
7 | Computer Science | Computer Science | Computer Science | Computer Vision and Pattern Recognition | cs.CV | Covers image processing, computer vision, patt... |
8 | Computer Science | Computer Science | Computer Science | Computers and Society | cs.CY | Covers impact of computers on society, compute... |
9 | Computer Science | Computer Science | Computer Science | Databases | cs.DB | Covers database management, datamining, and da... |
10 | Computer Science | Computer Science | Computer Science | Distributed, Parallel, and Cluster Computing | cs.DC | Covers fault-tolerance, distributed algorithms... |
11 | Computer Science | Computer Science | Computer Science | Digital Libraries | cs.DL | Covers all aspects of the digital library desi... |
12 | Computer Science | Computer Science | Computer Science | Discrete Mathematics | cs.DM | Covers combinatorics, graph theory, applicatio... |
13 | Computer Science | Computer Science | Computer Science | Data Structures and Algorithms | cs.DS | Covers data structures and analysis of algorit... |
14 | Computer Science | Computer Science | Computer Science | Emerging Technologies | cs.ET | Covers approaches to information processing (c... |
15 | Computer Science | Computer Science | Computer Science | Formal Languages and Automata Theory | cs.FL | Covers automata theory, formal language theory... |
16 | Computer Science | Computer Science | Computer Science | General Literature | cs.GL | Covers introductory material, survey material,... |
17 | Computer Science | Computer Science | Computer Science | Graphics | cs.GR | Covers all aspects of computer graphics. Rough... |
18 | Computer Science | Computer Science | Computer Science | Computer Science and Game Theory | cs.GT | Covers all theoretical and applied aspects at ... |
19 | Computer Science | Computer Science | Computer Science | Human-Computer Interaction | cs.HC | Covers human factors, user interfaces, and col... |
20 | Computer Science | Computer Science | Computer Science | Information Retrieval | cs.IR | Covers indexing, dictionaries, retrieval, cont... |
21 | Computer Science | Computer Science | Computer Science | Information Theory | cs.IT | Covers theoretical and experimental aspects of... |
22 | Computer Science | Computer Science | Computer Science | Machine Learning | cs.LG | Papers on all aspects of machine learning rese... |
23 | Computer Science | Computer Science | Computer Science | Logic in Computer Science | cs.LO | Covers all aspects of logic in computer scienc... |
24 | Computer Science | Computer Science | Computer Science | Multiagent Systems | cs.MA | Covers multiagent systems, distributed artific... |
25 | Computer Science | Computer Science | Computer Science | Multimedia | cs.MM | Roughly includes material in ACM Subject Class... |
26 | Computer Science | Computer Science | Computer Science | Mathematical Software | cs.MS | Roughly includes material in ACM Subject Class... |
27 | Computer Science | Computer Science | Computer Science | Numerical Analysis | cs.NA | cs.NA is an alias for math.NA. Roughly include... |
28 | Computer Science | Computer Science | Computer Science | Neural and Evolutionary Computing | cs.NE | Covers neural networks, connectionism, genetic... |
29 | Computer Science | Computer Science | Computer Science | Networking and Internet Architecture | cs.NI | Covers all aspects of computer communication n... |
... | ... | ... | ... | ... | ... | ... |
125 | Physics | Physics | physics | Plasma Physics | physics.plasm-ph | Description coming soon |
126 | Physics | Physics | physics | Popular Physics | physics.pop-ph | Description coming soon |
127 | Physics | Physics | physics | Physics and Society | physics.soc-ph | Description coming soon |
128 | Physics | Physics | physics | Space Physics | physics.space-ph | Description coming soon |
129 | Physics | Quantum Physics | quant-ph | Quantum Physics | quant-ph | Description coming soon |
130 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Biomolecules | q-bio.BM | DNA, RNA, proteins, lipids, etc.; molecular st... |
131 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Cell Behavior | q-bio.CB | Cell-cell signaling and interaction; morphogen... |
132 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Genomics | q-bio.GN | DNA sequencing and assembly; gene and motif fi... |
133 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Molecular Networks | q-bio.MN | Gene regulation, signal transduction, proteomi... |
134 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Neurons and Cognition | q-bio.NC | Synapse, cortex, neuronal dynamics, neural net... |
135 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Other Quantitative Biology | q-bio.OT | Work in quantitative biology that does not fit... |
136 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Populations and Evolution | q-bio.PE | Population dynamics, spatio-temporal and epide... |
137 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Quantitative Methods | q-bio.QM | All experimental, numerical, statistical and m... |
138 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Subcellular Processes | q-bio.SC | Assembly and control of subcellular structures... |
139 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Tissues and Organs | q-bio.TO | Blood flow in vessels, biomechanics of bones, ... |
140 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Computational Finance | q-fin.CP | Computational methods, including Monte Carlo, ... |
141 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Economics | q-fin.EC | q-fin.EC is an alias for econ.GN. Economics, i... |
142 | Quantitative Finance | Quantitative Finance | Quantitative Finance | General Finance | q-fin.GN | Development of general quantitative methodolog... |
143 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Mathematical Finance | q-fin.MF | Mathematical and analytical methods of finance... |
144 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Portfolio Management | q-fin.PM | Security selection and optimization, capital a... |
145 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Pricing of Securities | q-fin.PR | Valuation and hedging of financial securities,... |
146 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Risk Management | q-fin.RM | Measurement and management of financial risks ... |
147 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Statistical Finance | q-fin.ST | Statistical, econometric and econophysics anal... |
148 | Quantitative Finance | Quantitative Finance | Quantitative Finance | Trading and Market Microstructure | q-fin.TR | Market microstructure, liquidity, exchange and... |
149 | Statistics | Statistics | Statistics | Applications | stat.AP | Biology, Education, Epidemiology, Engineering,... |
150 | Statistics | Statistics | Statistics | Computation | stat.CO | Algorithms, Simulation, Visualization |
151 | Statistics | Statistics | Statistics | Methodology | stat.ME | Design, Surveys, Model Selection, Multiple Tes... |
152 | Statistics | Statistics | Statistics | Machine Learning | stat.ML | Covers machine learning papers (supervised, un... |
153 | Statistics | Statistics | Statistics | Other Statistics | stat.OT | Work in statistics that does not fit into the ... |
154 | Statistics | Statistics | Statistics | Statistics Theory | stat.TH | stat.TH is an alias for math.ST. Asymptotics, ... |
155 rows × 6 columns
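The tag loop above is a small state machine: h2, h3 and h4 headings update the "current" group, archive and category, and every p tag emits one row using whatever headings were seen last. The sketch below replays that logic on a hypothetical, hand-written list of (tag name, text) pairs, so it runs without network access or BeautifulSoup. The function name parse_taxonomy and the sample texts are illustrative assumptions, modeled on rows of the table above.

```python
import re

# Hypothetical (tag name, tag text) pairs standing in for
# root.find_all(["h2", "h3", "h4", "p"]); the texts are illustrative only.
TAGS = [
    ("h2", "Computer Science"),
    ("h4", "cs.AI (Artificial Intelligence)"),
    ("p", "(description of cs.AI)"),
    ("h2", "Physics"),
    ("h3", "Quantum Physics (quant-ph)"),
    ("h4", "quant-ph (Quantum Physics)"),
    ("p", "(description of quant-ph)"),
]


def parse_taxonomy(tags):
    """Return one (group, archive_name, archive_id, category_name, code, note) row per <p>."""
    group = archive_name = archive_id = ""
    cat_code = cat_name = ""
    rows = []
    for name, text in tags:
        if name == "h2":
            # new group; until an h3 appears, the group doubles as the archive
            group = archive_name = archive_id = text
        elif name == "h3":
            # "Quantum Physics (quant-ph)" -> name before "(", id inside
            archive_name = re.sub(r"(.*)\((.*)\)", r"\1", text).strip()
            archive_id = re.sub(r"(.*)\((.*)\)", r"\2", text)
        elif name == "h4":
            # "cs.AI (Artificial Intelligence)" -> code before " (", name inside
            cat_code = re.sub(r"(.*) \((.*)\)", r"\1", text)
            cat_name = re.sub(r"(.*) \((.*)\)", r"\2", text)
        elif name == "p":
            # a description closes the current category: emit one row
            rows.append((group, archive_name, archive_id, cat_name, cat_code, text))
    return rows


if __name__ == "__main__":
    for row in parse_taxonomy(TAGS):
        print(row)
```

Each emitted tuple lines up with one row of df_taxonomy; for example, the quant-ph row has the same group, archive and category values as row 129 of the table above.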
Data analysis and visualization
First, let's look at how the papers are distributed across the top-level groups:
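_df = (data.merge(df_taxonomy, on="categories", how="left") #attach group_name etc. to each paper via its categories string
           .drop_duplicates(["id","group_name"]) #keep each (paper, group) pair once
           .groupby("group_name") #one group per top-level field; rows without a match (NaN) are dropped
           .agg({"id":"count"}) #number of papers per group, stored in the "id" column
           .sort_values(by="id",ascending=False)
           .reset_index())
_df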
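We use merge to join the two DataFrames on their shared column "categories", group by "group_name", count the papers of each group into the "id" column, and sort the counts in descending order. Note that merge compares the whole categories string, so only papers whose categories field is exactly one taxonomy code (such as "math.AT") are matched; a cross-listed paper with a string such as "cs.LG stat.ML" gets no group_name and is left out of these counts.
The result: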
| | group_name | id |
---|---|---|
0 | Physics | 38379 |
1 | Mathematics | 24495 |
2 | Computer Science | 18087 |
3 | Statistics | 1802 |
4 | Electrical Engineering and Systems Science | 1371 |
5 | Quantitative Biology | 886 |
6 | Quantitative Finance | 352 |
7 | Economics | 173 |
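Because merge matches the full categories string, cross-listed papers drop out of the counts above. One way to include them, sketched here on hypothetical toy frames that only share the column names id, categories and group_name with data and df_taxonomy, is to split the space-separated codes and explode them into one row per (paper, code) before merging. The helper name papers_per_group is illustrative, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical toy stand-ins for `data` and `df_taxonomy`.
papers = pd.DataFrame({
    "id": ["p1", "p2", "p3"],
    "categories": ["cs.AI", "cs.LG stat.ML", "astro-ph.CO gr-qc"],
})
taxonomy = pd.DataFrame({
    "categories": ["cs.AI", "cs.LG", "stat.ML", "astro-ph.CO", "gr-qc"],
    "group_name": ["Computer Science", "Computer Science", "Statistics",
                   "Physics", "Physics"],
})


def papers_per_group(papers, taxonomy, split_multi=False):
    """Count distinct papers per group_name, optionally splitting cross-listed codes."""
    df = papers
    if split_multi:
        # "cs.LG stat.ML" -> ["cs.LG", "stat.ML"] -> one row per code
        df = df.assign(categories=df["categories"].str.split()).explode("categories")
    merged = df.merge(taxonomy, on="categories", how="left")
    return (
        merged.drop_duplicates(["id", "group_name"])  # one row per (paper, group)
        .groupby("group_name")["id"]                   # unmatched (NaN) groups are dropped
        .count()
        .sort_values(ascending=False)
    )


if __name__ == "__main__":
    print(papers_per_group(papers, taxonomy))                    # exact-match counting
    print(papers_per_group(papers, taxonomy, split_multi=True))  # cross-listed papers included
```

On the real data this would be called as papers_per_group(data, df_taxonomy, split_multi=True). A paper listed under two groups is then counted once in each, so the group totals can add up to more than the number of papers; that is the trade-off for not discarding cross-listed work.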
Next, we plot the group counts in _df as a pie chart:
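fig = plt.figure(figsize=(15,12))
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1) #how far each wedge is pulled out; one entry per group in the row order of _df, so the small groups stay readable
plt.pie(_df["id"], labels=_df["group_name"], autopct='%1.2f%%', startangle=160, explode=explode) #autopct prints each share with two decimals
plt.tight_layout()
plt.show()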
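The explode tuple above is hard-coded for exactly eight groups, while matplotlib expects one offset per wedge, so it has to be edited whenever the number of groups changes. A hedged alternative is to derive the offsets from the data, for example by pulling out every wedge whose share falls below a threshold. The function name explode_offsets, the 5% threshold and the 0.2 offset are arbitrary assumptions, not values from the tutorial.

```python
def explode_offsets(counts, threshold=0.05, offset=0.2):
    """One pie 'explode' value per wedge: pull out wedges smaller than `threshold`."""
    counts = list(counts)
    total = sum(counts)
    return tuple(offset if c / total < threshold else 0 for c in counts)


if __name__ == "__main__":
    # paper counts per group from the table above, in the same order
    group_counts = [38379, 24495, 18087, 1802, 1371, 886, 352, 173]
    print(explode_offsets(group_counts))  # -> (0, 0, 0, 0.2, 0.2, 0.2, 0.2, 0.2)
```

With the tutorial's DataFrame this would be passed as explode=explode_offsets(_df["id"]), so the tuple length always matches the number of groups.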
Next, we count the papers in each Computer Science subfield from 2019 onward:
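group_name="Computer Science"
cats = data.merge(df_taxonomy, on="categories").query("group_name == @group_name") #inner merge, then keep the CS rows; @group_name refers to the Python variable above
cats.groupby(["year","category_name"]).count().reset_index().pivot(index="category_name", columns="year",values="id") #count papers per (year, subfield), then put subfields in rows and years in columns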
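Again we merge the two DataFrames on their shared column categories and query the Computer Science rows. We then count the papers per year and subfield and pivot the counts so that subfields form the rows and years the columns, which gives: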
category_name | 2019 |
---|---|
Artificial Intelligence | 558 |
Computation and Language | 2153 |
Computational Complexity | 131 |
Computational Engineering, Finance, and Science | 108 |
Computational Geometry | 199 |
Computer Science and Game Theory | 281 |
Computer Vision and Pattern Recognition | 5559 |
Computers and Society | 346 |
Cryptography and Security | 1067 |
Data Structures and Algorithms | 711 |
Databases | 282 |
Digital Libraries | 125 |
Discrete Mathematics | 84 |
Distributed, Parallel, and Cluster Computing | 715 |
Emerging Technologies | 101 |
Formal Languages and Automata Theory | 152 |
General Literature | 5 |
Graphics | 116 |
Hardware Architecture | 95 |
Human-Computer Interaction | 420 |
Information Retrieval | 245 |
Logic in Computer Science | 470 |
Machine Learning | 177 |
Mathematical Software | 27 |
Multiagent Systems | 85 |
Multimedia | 76 |
Networking and Internet Architecture | 864 |
Neural and Evolutionary Computing | 235 |
Numerical Analysis | 40 |
Operating Systems | 36 |
Other Computer Science | 67 |
Performance | 45 |
Programming Languages | 268 |
Robotics | 917 |
Social and Information Networks | 202 |
Software Engineering | 659 |
Sound | 7 |
Symbolic Computation | 44 |
Systems and Control | 415 |
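The results show that Computer Vision and Pattern Recognition is by far the largest CS subfield, with 5559 papers. Computation and Language (2153) and Cryptography and Security (1067) also exceed 1000 papers in 2019, while Robotics (917) and Networking and Internet Architecture (864) come close, which matches our intuition.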
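The count-then-pivot step can also be written with pd.crosstab, which tabulates how often each (category_name, year) pair occurs; as long as every row has a non-null id, it should give the same counts as groupby(...).count() followed by pivot. Sorting one year's column then ranks the subfields, which is a quick way to check a ranking like the one above. The sketch uses a hypothetical toy frame with the same column names as cats; the query line also shows how @group_name pulls a Python variable into the query string.

```python
import pandas as pd

# Hypothetical toy rows; in the tutorial these columns come from merging
# `data` with `df_taxonomy`.
merged = pd.DataFrame({
    "group_name": ["Computer Science"] * 4 + ["Physics"],
    "category_name": ["Computer Vision and Pattern Recognition"] * 2
                     + ["Robotics", "Sound", "Quantum Physics"],
    "year": [2019] * 5,
})

group_name = "Computer Science"
# `@group_name` makes query() read the local Python variable
cats = merged.query("group_name == @group_name")

# rows = category_name, columns = year, cells = number of papers
table = pd.crosstab(cats["category_name"], cats["year"])

# rank the subfields by their 2019 count, largest first
ranking = table[2019].sort_values(ascending=False)

if __name__ == "__main__":
    print(table)
    print(ranking)
```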
Takeaways
In this task I learned the basics of pandas, data preprocessing, data filtering and data visualization. I got a lot out of it and will keep working hard!