4.3 Data Processing - Big Data of Smart Grid

OK, let's get started.

We are still in Part Two, Chapter 4.

In this lecture, the technical architecture of data processing will be introduced.

The problems of BDSG processing are complex and diverse.

Data processing time and data scale differ considerably from one business application to another.

In particular, data processing time is generally the most sensitive factor in a business application.

According to the required online processing time, data processing applications are classified as online, near-line, and offline.

Online processing time is generally at the second or even millisecond level, and stream computing is usually used.

Near-line processing time is usually at the minute or hour level, and in-memory computing is usually adopted.

Offline processing time is generally measured in days, and offline computing is usually adopted.
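
As a rough illustration of this three-way split, the sketch below maps a required response time to a processing tier and its usual computing mode. The function name and the exact cut-offs (60 seconds and one day) are assumptions chosen for illustration. The lecture only gives orders of magnitude: seconds or milliseconds, minutes or hours, and days.

```python
def classify_workload(max_latency_seconds):
    """Return (tier, computing mode) for a required response time in seconds.

    The thresholds are illustrative assumptions, not values from the lecture.
    """
    if max_latency_seconds < 0:
        raise ValueError("latency must be non-negative")
    if max_latency_seconds < 60:            # second or millisecond level
        return ("online", "stream computing")
    if max_latency_seconds < 24 * 3600:     # minute or hour level
        return ("near-line", "in-memory computing")
    return ("offline", "offline computing")  # measured in days
```

For example, a task that must answer within 50 milliseconds falls into the online tier, while a report that is due within three days falls into the offline tier.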

Now let's take a look at the first data processing mode, stream computing.

The basic idea of stream processing is that the value of data decreases over time.

The common goal of all stream computing models is therefore to analyze the latest data and deliver the analysis results as quickly as possible.
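
To make the idea concrete, here is a minimal sketch in plain Python, not in Storm or Spark Streaming. It emits an updated result for every incoming reading instead of waiting for the full data set. The function name, the window size, and the notion of a "reading" are assumptions made for illustration.

```python
from collections import deque


def sliding_mean(readings, window_size=3):
    """Yield the mean of the most recent `window_size` readings as each one arrives."""
    window = deque(maxlen=window_size)  # older readings drop out automatically
    for value in readings:
        window.append(value)
        # A result is available immediately after each new reading.
        yield sum(window) / len(window)
```

Because the function is a generator, `readings` can be any iterable, including an unbounded one, and each result depends only on the most recent readings.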

In the smart grid, the big data applications that need stream computing mainly include power system security and stability analysis, power equipment operating status evaluation, calculation of important production environment indicators, and the implementation needs of customers.

To ensure real-time access to massive data and real-time computing and analysis performance, a distributed computing framework is introduced for smart grid big data.

Storm and Spark Streaming are two of the most widely used distributed stream computing frameworks at present.

In-memory computing in BDSG is mainly applied to complex iterative computation over massive, non-real-time static data.

By reducing disk I/O operations, it improves data read and write capability and accelerates distributed computing over massive data.
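
The benefit for iterative computation can be sketched in plain Python, without any Spark API. The data is loaded once, kept in memory, and reused by every iteration. The function name, the `load_data` callable that stands in for a disk read, and the toy task of estimating a mean by gradient steps are all assumptions made for illustration.

```python
def iterate_with_cache(load_data, iterations=20, step=0.5):
    """Estimate the mean of a data set iteratively, loading the data only once.

    `load_data` is a hypothetical callable standing in for an expensive disk read.
    """
    data = list(load_data())  # single load; the copy stays in memory
    if not data:
        raise ValueError("data set is empty")
    estimate = 0.0
    for _ in range(iterations):
        # Every iteration reuses the in-memory copy instead of reading again.
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate
```

Calling `load_data()` inside the loop instead would repeat the read on every iteration, which is the kind of repeated disk I/O that in-memory computing avoids.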

This computing framework is also widely used in BDSG for tasks such as directed acyclic graph (DAG) computation and machine learning.

Spark, GraphX, and MLlib are some commonly used tools.

Now let's move on to offline computing.

Batch computing for smart grid big data is mainly applied to the batch computation and processing of massive, non-real-time static data.

Batch computing is widely used in offline data processing because of its low cost, high reliability, and high scalability.

Currently, there are many offline computing frameworks, so an appropriate one has to be selected according to the characteristics of the data, from the perspectives of programming model, storage medium, and application type, in order to meet the needs of smart grid big data application scenarios.

Some commonly used offline computing tools are MapReduce, Pig, HiveQL, and Mahout.
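
As a rough picture of the MapReduce programming model, the sketch below imitates its map, shuffle, and reduce phases in plain Python on a single machine. It does not use the Hadoop API, and the record layout (meter ID, energy in kWh) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for meter_id, kwh in records:
        yield meter_id, kwh


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group of values into a single result per key."""
    return {key: sum(values) for key, values in groups.items()}


def total_energy_per_meter(records):
    """Run the three phases in sequence on an in-memory batch of records."""
    return reduce_phase(shuffle(map_phase(records)))
```

In a real batch framework the map and reduce steps run in parallel on many nodes and the shuffle moves data between them. Here everything runs in one process, only to show how the phases fit together.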

OK, that's all for this lecture.

Next time, we will learn some technical details of data analysis and mining.
