Current course topic: Big Data of Smart Grid > Chapter 7 Prospect of BDSG > Intelligent Disaster or Failure Recognition Means: Related Literature of Electric Power Vision Big Data > 4.3 Data Processing
OK, let's get started.
We are continuing with Part Two, Chapter 4.
In this lecture, the technical architecture of data processing will be introduced.
The problems of BDSG processing are complex and diverse.
Data processing time and data scale differ considerably from one business application to another.
Of these, data processing time is generally the most sensitive factor in a business application.
According to the required online processing time, applications are classified as online, near-line, or offline.
Online processing time is generally at the second or even millisecond level, and stream computing is usually used.
Near-line processing time is generally at the minute or hour level, and in-memory computing is usually used.
Offline processing time is generally measured in days, and offline computing is usually used.
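As a minimal sketch of this classification, a latency budget can be mapped to a category and its usual computing method. The cut-off values (one minute and one day) are illustrative assumptions, since the lecture only gives orders of magnitude, and the names `classify_workload`, `ONLINE_MAX`, and `NEAR_LINE_MAX` are hypothetical.

```python
from datetime import timedelta
from typing import Tuple

# Assumed cut-offs: the lecture only says "seconds/milliseconds",
# "minutes/hours", and "days", so these exact values are illustrative.
ONLINE_MAX = timedelta(minutes=1)
NEAR_LINE_MAX = timedelta(days=1)


def classify_workload(latency_budget: timedelta) -> Tuple[str, str]:
    """Return (category, usual computing method) for a latency budget."""
    if latency_budget < ONLINE_MAX:
        return "online", "stream computing"
    if latency_budget < NEAR_LINE_MAX:
        return "near-line", "in-memory computing"
    return "offline", "offline (batch) computing"


if __name__ == "__main__":
    for budget in (timedelta(milliseconds=200),
                   timedelta(minutes=15),
                   timedelta(days=2)):
        print(budget, "->", classify_workload(budget))
```

In practice the boundaries would be set per business application rather than fixed globally.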
Now let's take a look at the first data processing method, stream computing.
The basic idea of stream processing is that the value of data decreases as time passes.
The common goal of all stream computing models is therefore to analyze the latest data and deliver the results as quickly as possible.
In the smart grid, the big data application scenarios that require stream computing mainly include power system security and stability analysis, power equipment operating status evaluation, calculation of important production environment indicators, and customers' implementation needs.
To ensure real-time access to massive data and the performance of real-time computing and analysis, a distributed computing framework is introduced for smart grid big data.
The distributed stream computing frameworks in wide use at present mainly include Storm and Spark Streaming.
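The idea of analyzing the latest data as each record arrives can be sketched on a single machine with a sliding window. This is not Storm or Spark Streaming code, only a plain-Python illustration of the stream model; the function name `rolling_mean` and the sample readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean(readings: Iterable[float],
                 window: int) -> Iterator[Tuple[float, float]]:
    """Yield (latest reading, mean of the last `window` readings).

    A result is emitted for every arriving reading, so the analysis
    always reflects the newest data instead of waiting for a full batch.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is about to be evicted
        recent.append(value)
        total += value
        yield value, total / len(recent)


if __name__ == "__main__":
    # Hypothetical measurement stream; any iterable or generator works.
    for latest, mean in rolling_mean([1.0, 1.0, 4.0], window=2):
        print(latest, mean)
```

A distributed framework applies the same per-record logic across many nodes and adds fault tolerance, which this sketch does not attempt.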
In-memory computing in BDSG is mainly applied to complex iterative computation over massive, non-real-time static data.
By reducing disk I/O operations, it improves data read and write capability and accelerates distributed computing over massive data.
This computing framework is also widely used in BDSG for directed acyclic graph (DAG) computation, machine learning, and similar tasks.
Spark, GraphX, and MLlib are some commonly used tools.
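To illustrate why keeping data in memory helps iterative computation, the sketch below runs the same iterative estimate twice: once re-reading the file on every iteration, and once loading it a single time. It is a single-machine illustration only, not Spark code; the helper names, the file contents, and the simple gradient-step update are all assumptions made for the example.

```python
import os
import tempfile


def load_from_disk(path, counter):
    """Read all values from the file and count the disk read."""
    counter["reads"] += 1
    with open(path) as f:
        return [float(line) for line in f]


def iterate_mean(path, iterations, cache):
    """Estimate the mean by repeated gradient steps.

    With cache=True the data is loaded once and reused in memory;
    with cache=False every iteration goes back to disk.
    Returns (estimate, number of disk reads).
    """
    counter = {"reads": 0}
    data = load_from_disk(path, counter) if cache else None
    estimate = 0.0
    for _ in range(iterations):
        batch = data if cache else load_from_disk(path, counter)
        gradient = sum(estimate - x for x in batch) / len(batch)
        estimate -= 0.5 * gradient
    return estimate, counter["reads"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "load.txt")  # hypothetical data file
    with open(path, "w") as f:
        f.write("1.0\n2.0\n3.0\n")
    cached = iterate_mean(path, iterations=20, cache=True)
    uncached = iterate_mean(path, iterations=20, cache=False)

print("cached:", cached, "uncached:", uncached)
```

Both runs reach the same estimate, but the cached run touches the disk once while the uncached run touches it once per iteration.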
Now let's move on to offline computing.
Batch computing for smart grid big data is mainly applied to batch computation and processing of massive, non-real-time static data.
Batch computing is widely used in offline data processing because of its low cost, high reliability, and high scalability.
There are currently many offline computing frameworks.
An appropriate one must therefore be selected according to the characteristics of the data, from the perspectives of programming model, storage medium, and application type, so as to meet the needs of smart grid big data application scenarios.
Some commonly used offline computing tools are MapReduce, Pig, HiveQL, and Mahout.
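As a rough illustration of the MapReduce programming model mentioned here, the sketch below totals energy per meter in three phases: map, shuffle, and reduce. It runs in a single Python process and does not use Hadoop; the `(meter_id, kWh)` record layout and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (meter_id, kWh): assumed record layout


def map_phase(records: Iterable[Record]) -> Iterator[Record]:
    """Emit one (key, value) pair per input record."""
    for meter_id, kwh in records:
        yield meter_id, kwh


def shuffle(pairs: Iterable[Record]) -> Dict[str, List[float]]:
    """Group all values that share the same key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def total_energy_per_meter(records: Iterable[Record]) -> Dict[str, float]:
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    readings = [("m1", 1.5), ("m2", 2.0), ("m1", 0.5)]
    print(total_energy_per_meter(readings))
```

In a real batch framework the map and reduce phases run in parallel on many nodes and the shuffle moves data between them, which is where the scalability described above comes from.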
OK, that's all for this lecture.
Next time, we will learn some technical details of data analysis and mining.