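I'd like to know how to index/access some data programmatically in Python.
I have columnar data: depth, temperature, gradient and gamma for a set of boreholes. There are n boreholes. I have a header row that lists each borehole's name and numeric ID. Example: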
Bore_name,Bore_ID,,,Bore_name,Bore_ID,,,, ...
<a row of headers>
depth,temp,gradient,gamma,depth,temp,gradient,gamma ...
I don't know how to index the data other than by crude iteration:
import numpy
import matplotlib.pyplot as pl

# read the two header rows (the 'U' mode flag is no longer accepted by Python 3.11+)
with open(filename, 'r') as f:
    bores = f.readline().rstrip().split(',')
    headers = f.readline().rstrip().split(',')

# load from CSV file, missing values are empty 'cells'
tdata = numpy.genfromtxt(filename, skip_header=2, delimiter=',', missing_values='', filling_values=numpy.nan)

# each bore occupies 4 columns: depth, temp, gradient, gamma
for column in range(0, numpy.shape(tdata)[1], 4):
    # plots temperature on x, depth on y
    pl.plot(tdata[:,column+1], tdata[:,column], label=bores[column])
    # get index at max depth (nanargmin only finds it if depths are stored as negative numbers)
    depth = numpy.nanargmin(tdata[:,column])
    # plot text label at max depth (y) and temp at that depth (x)
    pl.text(tdata[depth,column+1], tdata[depth,column], bores[column])
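This approach seems easy enough, but I've been using R recently and have gotten used to its way of referencing data objects through classes and subclasses interpreted from the headers.
Solution:
Well, if you like R's data.table, there have been (at least) a few attempts to recreate that functionality in NumPy, both through additional classes in NumPy core and through external Python libraries. The effort I find most promising is Fernando Perez's datarray library. Here's how it works.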
>>> # create a NumPy array for use as our data set
>>> import numpy as NP
>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)
>>> # create some generic row and column names to pass to the constructor
>>> row_ids = [ "row{0}".format(c) for c in range(D.shape[0]) ]
>>> rows = 'rows', row_ids
>>> variables = [ "col{0}".format(c) for c in range(D.shape[1]) ]
>>> cols = 'cols', variables
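Create a DataArray instance by calling the constructor and passing in an ordinary NumPy array and a list of tuples, one tuple per axis. Since ndim = 2, there are two tuples in the list, and each tuple consists of the axis name (str) and the sequence of labels for that axis (list).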
>>> from datarray.datarray import DataArray as DA
>>> D1 = DA(D, [rows, cols])
>>> D1.axes
(Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3',
'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1,
labels=['col0', 'col1', 'col2', 'col3', 'col4']))
>>> # now you can use R-like syntax to reference a NumPy data array by column:
>>> D1[:,'col1']
DataArray([8, 5, 0, 7, 8, 9, 9, 4])
('rows',)
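If installing datarray isn't an option, the same idea of looking data up by name can be sketched with plain NumPy and a dict. This is only a rough sketch, not how datarray works internally: the helper `load_bores`, the sample text, and the bore names `BoreA`/`BoreB` are all made up for illustration, and it assumes every bore occupies exactly four columns, as in the layout from the question.

```python
import io
import numpy as np

# Hypothetical sample in the question's layout: a row of bore names/IDs,
# a row of column headers, then the data (empty cells are missing values).
sample = """\
BoreA,1,,,BoreB,2,,
depth,temp,gradient,gamma,depth,temp,gradient,gamma
0,15.0,0.0,40,0,14.0,0.0,55
-10,15.5,0.05,42,-10,14.8,0.08,50
-20,16.1,0.06,45,,,,
"""

def load_bores(text, fields_per_bore=4):
    """Return {bore_name: {field_name: 1-D array}} from the wide CSV layout.

    Assumes each bore occupies `fields_per_bore` adjacent columns and that
    the bore name sits in the first column of its group.
    """
    lines = text.splitlines()
    bores = lines[0].split(',')
    headers = lines[1].split(',')
    # Empty cells become NaN
    data = np.genfromtxt(io.StringIO(text), skip_header=2, delimiter=',',
                         filling_values=np.nan)
    data = np.atleast_2d(data)
    result = {}
    for start in range(0, data.shape[1], fields_per_bore):
        fields = headers[start:start + fields_per_bore]
        block = data[:, start:start + fields_per_bore]
        # Map each field name to its column within this bore's block
        result[bores[start]] = {name: block[:, i] for i, name in enumerate(fields)}
    return result

bores = load_bores(sample)

# Look data up by bore name and field name instead of by column offset
temp_b = bores['BoreB']['temp']
deepest = np.nanargmin(bores['BoreA']['depth'])  # sample depths are negative
print(temp_b, bores['BoreA']['temp'][deepest])
```

With that in place, the plotting loop from the question could iterate over `bores.items()` and refer to `fields['temp']` and `fields['depth']` rather than `column+1` and `column`.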