Time Series Data Processing and the Development of Its Application System

2018-11-22 23:05:36

data processing, time series, data



Data mining is a discipline that developed rapidly in the 1990s, both in China and internationally. It draws on many fields, including artificial intelligence, statistics, machine learning, and databases.
Data preprocessing is an important step in data mining (knowledge discovery). By cleaning the noisy, incomplete, and even inconsistent data held in the database systems of industrial enterprises, it improves the quality of the data to be mined and, ultimately, the quality of the patterns and knowledge that mining produces.
This paper studies the preprocessing of time series data, based on industrial time series data produced by the Tianjin ethylene cracking furnace. We first discuss the characteristics of process-industry data and the main research topics in time series data. We then fill missing values, remove noise, and compress the raw industrial data using the following methods: filling missing values with the maximum, the mean, or an interpolated value; removing noise by binning; data compression; and piecewise linear processing. Using the concept of a buffer, we focus on and implement online, real-time processing of industrial time series data. Next, we run clustering and other data analyses on the processed data to validate the effectiveness of this work. Finally, we develop an industrial time series data application system based on J2EE.
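
The abstract names its preprocessing steps without giving details, so the following is only a minimal Python sketch of how gap filling, bin-based smoothing, and buffered online processing might look. It is not the thesis implementation (the application system itself is J2EE-based). The function names `fill_gaps`, `smooth_by_bin_means`, and `process_stream`, the use of `None` to mark a missing reading, the choice of consecutive (time-ordered) bins, and the fixed-size buffer are all assumptions made for illustration.

```python
from statistics import mean
from typing import Iterable, Iterator, List, Optional


def fill_gaps(values: List[Optional[float]], method: str = "interpolate") -> List[float]:
    """Return a copy of `values` with every None (assumed gap marker) replaced."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot fill a series with no observed values")
    if method == "max":
        return [max(known) if v is None else v for v in values]
    if method == "mean":
        return [mean(known) if v is None else v for v in values]
    if method != "interpolate":
        raise ValueError(f"unknown method: {method}")

    filled = list(values)
    idx = [i for i, v in enumerate(values) if v is not None]
    # Gaps before the first or after the last observation copy the nearest value.
    for i in range(idx[0]):
        filled[i] = values[idx[0]]
    for i in range(idx[-1] + 1, len(values)):
        filled[i] = values[idx[-1]]
    # Interior gaps are placed on the straight line between their two neighbours.
    for left, right in zip(idx, idx[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def smooth_by_bin_means(values: List[float], bin_size: int) -> List[float]:
    """Replace each value with the mean of its bin.

    Assumption: bins are consecutive runs of `bin_size` samples, so the
    time order of the series is preserved; the last bin may be shorter.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    smoothed: List[float] = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        smoothed.extend([mean(chunk)] * len(chunk))
    return smoothed


def process_stream(readings: Iterable[Optional[float]],
                   buffer_size: int = 4,
                   bin_size: int = 2) -> Iterator[List[float]]:
    """Clean an incoming stream one buffer at a time (hypothetical online mode)."""
    buffer: List[Optional[float]] = []
    for reading in readings:
        buffer.append(reading)
        if len(buffer) == buffer_size:
            yield smooth_by_bin_means(fill_gaps(buffer), bin_size)
            buffer = []
    if buffer:  # flush whatever is left when the stream ends
        yield smooth_by_bin_means(fill_gaps(buffer), bin_size)
```

For example, `fill_gaps([1.0, None, 3.0, None, None, 6.0])` returns `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`, and `smooth_by_bin_means([1.0, 3.0, 10.0, 20.0, 7.0], 2)` returns `[2.0, 2.0, 15.0, 15.0, 7.0]`. In this sketch a buffer that contains no observed value at all raises `ValueError`; a real online system would need a policy for that case, such as carrying the last observed value over from the previous buffer.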