Big Data of Smart Grid: 5.4.1 Analysis and Mining: Scheme and process
OK, welcome back to BDSG from Chang'an University.
In this lecture, let's talk about the first subsection of 5.4, data analysis and mining.
This section is divided into two parts.
The first part is Scheme and Process.
The big data analysis scheme in smart grid is shown in Figure 5.6.
We can see that there are six parts: business understanding, analytical method, data preparation, data modeling, model evaluation, and implementation feedback.
The smart grid big data technology architecture is shown in Figure 5.7.
It shows the technical procedures and common tools for the six parts.
OK, now let's talk about some details of Business Understanding.
A smart grid big data analysis project starts with business needs analysis.
The members of the data science team conduct multiple analyses and discussions with professional staff and key stakeholders to jointly formulate business requirements and form business problems.
Together with the project sponsor, they determine the analysis goals of the project and the final application scenarios to be implemented, and compile the corresponding functional design plan.
At the same time, they also need to evaluate the personnel, technology, time, and data that can be used for project implementation.
Now let's move to the Analytical Method.
The focus of this step is to transform business problems into analytical problems and form initial analysis hypotheses.
It also initially determines the analysis and mining methods to be used, so that the data can be clustered, classified, regressed, or mined for relationships according to the analysis objectives.
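The step of turning a business problem into an analytical problem can be sketched as a simple lookup. This is only an illustration: the business questions and the task assigned to each one are assumptions made for the example, not a mapping given in the lecture, and the names `AnalysisTask` and `to_analysis_problem` are hypothetical.

```python
from enum import Enum


class AnalysisTask(Enum):
    """The four kinds of analysis named in the lecture."""
    CLUSTERING = "clustering"
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    RELATIONSHIP_DISCOVERY = "relationship discovery"


# Assumed initial hypotheses: which task each (hypothetical) question maps to.
BUSINESS_TO_TASK = {
    "group customers by consumption pattern": AnalysisTask.CLUSTERING,
    "will this station area be heavily overloaded": AnalysisTask.CLASSIFICATION,
    "what is tomorrow's load for a large user": AnalysisTask.REGRESSION,
    "which equipment faults tend to occur together": AnalysisTask.RELATIONSHIP_DISCOVERY,
}


def to_analysis_problem(business_question: str) -> AnalysisTask:
    """Return the analysis task recorded for a known business question."""
    key = business_question.strip().lower().rstrip("?")
    if key not in BUSINESS_TO_TASK:
        raise KeyError(f"no analysis hypothesis recorded for: {business_question!r}")
    return BUSINESS_TO_TASK[key]
```

In a real project this table would come out of the discussions with stakeholders, and it would be revised as the hypotheses are tested.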
Let's move to Data Preparation.
That is the third step of application practice.
According to the characteristics of the data source, select the corresponding collection tool for data extraction, transformation, and loading (ETL).
For Excel and txt file data, use Kettle to collect the data into HDFS, Hive, and other big data storage media.
Streaming data (such as message data and log data) is collected with Flume.
For relational databases such as Oracle, PostgreSQL, and MySQL, data is extracted with Sqoop.
Tools such as R, SQL, Excel, and Pandas are also used to statistically explore data quality.
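The tool selection and the data quality check described in this step can be sketched as follows. The source-to-tool mapping follows the lecture; the category keys (`"file"`, `"stream"`, `"relational"`) and the function names are assumptions for illustration. The quality profile uses only the Python standard library instead of Pandas so that the sketch runs without extra packages.

```python
# Mapping stated in the lecture: kind of data source -> collection tool.
COLLECTION_TOOLS = {
    "file": "Kettle",       # Excel and txt files, loaded into HDFS or Hive
    "stream": "Flume",      # message data, log data
    "relational": "Sqoop",  # Oracle, PostgreSQL, MySQL
}


def choose_collection_tool(source_type: str) -> str:
    """Pick the collection tool for a source type (keys are hypothetical)."""
    try:
        return COLLECTION_TOOLS[source_type]
    except KeyError:
        raise ValueError(f"unknown source type: {source_type!r}") from None


def profile_column(values):
    """Basic quality statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if not present:
        return {"count": 0, "missing": missing, "mean": None, "min": None, "max": None}
    return {
        "count": len(present),
        "missing": missing,
        "mean": sum(present) / len(present),
        "min": min(present),
        "max": max(present),
    }
```

For example, `profile_column([1.0, None, 3.0])` reports two present values and one missing value, which is the kind of summary that tells the team whether a column is usable for modeling.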
Now let's move to the fourth step of application practice, data modeling.
Data modeling is the key to smart grid big data analysis.
According to the analysis hypotheses and the state of the data, the preliminarily chosen analysis method goes through model training, parameter tuning, and algorithm verification.
Through data exploration and variable selection, perform descriptive statistical analysis and exploratory modeling analysis to understand the relationships between variables.
Use traditional analysis and mining tools such as R and Matlab to statistically analyze a small amount of sampled data and build a model.
Based on the analysis hypotheses, the analysis goals, and the results of data exploration, choose one specific analysis method or one class of methods.
When analyzing and mining large-scale full-volume data, use the distributed algorithms in newer analysis and mining tools such as Mahout, RHadoop, and MLlib for model training.
During model training, the model parameters need to be tuned according to the results of the analysis method.
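Training on a small sample and tuning a parameter from the results can be sketched in plain Python. The lecture does not name a specific algorithm, so the k-nearest-neighbor regressor, the synthetic data, and the candidate values of `k` below are all assumptions chosen to keep the example short.

```python
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(train, points, k):
    """Mean absolute prediction error of the k-NN model on the given points."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in points) / len(points)


def tune_k(train, validation, candidates):
    """Return the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: mean_abs_error(train, validation, k))


# Synthetic sampled data (an assumption): a noisy linear relationship.
rng = random.Random(0)
data = [(float(x), 2.0 * x + rng.gauss(0, 1.0)) for x in range(100)]
rng.shuffle(data)
train, validation = data[:70], data[70:]

best_k = tune_k(train, validation, candidates=[1, 3, 5, 9])
print("selected k:", best_k)
```

The same loop (train, measure, adjust the parameter) is what a distributed library would run on the full-volume data; only the scale and the tooling change.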
Now let's move to the fifth step of application practice, model evaluation.
This step verifies the analysis method on actual data (data that was not used during training) and iteratively optimizes the analysis and mining model based on the verification results.
Combined with the project analysis goals and the designed business scenarios, the data dimensions or attributes are screened, and the corresponding presentation method is selected according to the purpose and the user group.
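Verifying a model on data that was not used in training can be sketched as a hold-out evaluation. The linear model, the RMSE metric, the 25% hold-out fraction, and the synthetic data are assumptions for illustration; the lecture does not prescribe any of them.

```python
import math


def holdout_split(rows, test_fraction=0.25):
    """Keep the last part of the rows as data the model never sees in training."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Ordinary least squares for y = a * x + b on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(model, rows):
    """Root mean squared error of the fitted line on the given rows."""
    a, b = model
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in rows) / len(rows))


# Synthetic data (an assumption): an exact linear relationship.
rows = [(float(x), 3.0 * x + 1.0) for x in range(20)]
train, test = holdout_split(rows)
model = fit_line(train)
test_error = rmse(model, test)
print("hold-out RMSE:", test_error)
```

If the hold-out error is much worse than the training error, that is the signal to go back and iterate on the model, as the step describes.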
Now let's take a look at implementation feedback.
Collect feedback during the implementation process, and determine whether model correction is needed based on the feedback on the results.
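One possible way to decide whether model correction is needed is to compare the errors reported in feedback against a tolerance. The function name, the threshold rule, and the minimum sample count below are assumptions; the lecture only says that the decision is based on the feedback on the results.

```python
def needs_correction(feedback_errors, threshold, min_samples=5):
    """Flag the model when the mean absolute feedback error exceeds the threshold.

    feedback_errors: prediction errors collected during implementation.
    threshold: largest acceptable mean absolute error (hypothetical tolerance).
    min_samples: do not decide on fewer feedback records than this.
    """
    if len(feedback_errors) < min_samples:
        return False  # too little feedback to justify a correction
    mean_error = sum(abs(e) for e in feedback_errors) / len(feedback_errors)
    return mean_error > threshold
```

A call such as `needs_correction(errors, threshold=0.5)` would be run periodically, and a `True` result would send the project back to the data modeling step.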
OK, that's all for this lecture.
In the next lecture we will talk about the second part of 5.4, the use-case analysis.
OK, see you next time. Bye bye.