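REDD stands for The Reference Energy Disaggregation Data Set. The current version is Version 1.0.
The current version can be downloaded from: http://redd.csail.mit.edu
If you use this data set in your research or in a published paper, you can cite the following reference, which is the original paper by the REDD authors: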
[1] J. Zico Kolter and Matthew J. Johnson. REDD: A public data set for energy disaggregation research. In Proceedings of the SustKDD Workshop on Data Mining Applications in Sustainability, 2011.
An overview of how REDD is organized and formatted:
REDD contains two types of data, high-frequency and low-frequency, described as follows:
(1) High-frequency current/voltage waveform data of the two power mains (15 kHz).
(2) Lower-frequency power data including the mains and individual, labeled circuits (up to 24 devices, 1 Hz; 161 MB zipped, 2.48 GB unzipped).
The data files are organized as follows:
low_freq/ -- ~1Hz power readings, whole home and circuits
high_freq/ -- aligned and grouped current/voltage waveforms
high_freq_raw/ -- raw current/voltage waveforms
The file format of the low frequency data (1 Hz) is as follows:
1306541834 102.964    (columns: UTC timestamp, apparent power)
1306541835 103.125
1306541836 104.001
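The two-column layout above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_low_freq_line` is hypothetical, and it assumes each line holds a whitespace-separated integer UTC timestamp followed by a power reading, as in the sample lines.

```python
from datetime import datetime, timezone


def parse_low_freq_line(line):
    """Parse one 'timestamp power' line into (UTC datetime, float).

    Only the first two whitespace-separated fields are used, so any
    trailing annotation on the line is ignored.
    """
    ts_text, power_text = line.split()[:2]
    ts = datetime.fromtimestamp(int(ts_text), tz=timezone.utc)
    return ts, float(power_text)


# The three sample lines shown above.
sample = """1306541834 102.964
1306541835 103.125
1306541836 104.001"""

readings = [parse_low_freq_line(l) for l in sample.splitlines() if l.strip()]
for ts, power in readings:
    print(ts.isoformat(), power)
```

Converting the timestamp to a timezone-aware `datetime` makes it easier to check sampling gaps, since the low frequency readings are only approximately 1 Hz.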
The file format of the high frequency data (15 kHz) is as follows:
The high_freq/ directory contains AC waveform data for the power mains and a single phase of the voltage for the home.
current_1.dat -- current waveforms for first power mains
current_2.dat -- current waveforms for second power mains
voltage.dat -- voltage waveforms
Sample lines from the high frequency data:
1297340206.597013 135.000000 0.000000 3.623859 7.254136 10.949398 ...
1297340208.844086 722.000000 0.000000 3.638527 7.249567 10.929027 ...
Each line consists of the following fields:
1) A decimal UTC timestamp, in the same format as the timestamps for the low frequency data, but allowing for fractional parts.
2) A cycle count. Although this is represented in the file as a double, it is in fact an integer that indicates for how many AC cycles this particular waveform remains (that is, stays unchanged).
3) 275 decimal values, indicating the value of the waveform (in amps or volts), at equally-spaced portions of the cycle.
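The three fields described above could be split apart as in the following sketch. The function name `parse_waveform_line` is hypothetical, and because the sample lines above are truncated, the example builds a synthetic line (a sine wave with an arbitrary amplitude of 10) rather than using real REDD values.

```python
import math


def parse_waveform_line(line, n_samples=275):
    """Split one high frequency line into (timestamp, cycle_count, samples).

    timestamp   -- decimal UTC timestamp, kept as a float
    cycle_count -- stored in the file as a double, returned as an int
    samples     -- n_samples waveform values (amps or volts)
    """
    fields = [float(x) for x in line.split()]
    timestamp, cycle_count, samples = fields[0], int(fields[1]), fields[2:]
    if len(samples) != n_samples:
        raise ValueError(
            f"expected {n_samples} waveform values, got {len(samples)}"
        )
    return timestamp, cycle_count, samples


# Synthetic line for illustration only (not real REDD data).
values = [10.0 * math.sin(2 * math.pi * i / 275) for i in range(275)]
line = "1297340206.597013 135.000000 " + " ".join(f"{v:.6f}" for v in values)

ts, cycles, wave = parse_waveform_line(line)
print(ts, cycles, len(wave))
```

Checking the number of waveform values on every line is a cheap way to catch truncated or corrupted records before doing any further analysis.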
REDD is one of the few publicly available data sets for energy disaggregation research, which makes it highly valuable.