What Is Data Analysis And The Best Tools To Use
When most people think of data analysis, they think of manipulating and analyzing data in a tool like Microsoft Excel. The reality is that data analysis encompasses a wide range of tools and many different methods for manipulating data and understanding the story it tells.
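What is data analysis? Broadly, it is the process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decisions. How it is applied varies a lot depending on whether you're working with business data, manufacturing data, marketing data, or data specific to the industry and business in which you operate.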
In this article, you’ll learn about the different aspects of data analysis, what they mean, and how they’re generally used across the board.
Data Collection
The first stage of any data analysis is data collection. This simply means gathering data from all of the sources that hold information you need.
Data can come from any of the following sources, and more:
- Manufacturing machinery controllers
- Someone manually entering data into a computer
- Sensors that measure temperature, pressure, and more
- Cloud-based data sources
- Information from the internet like weather or government databases
- Databases housed on your company network
A major challenge for many organizations is figuring out which technical tools are available to gather that information. Most of the time, software is required to connect to each remote device or data source and pull its data into an internal database or a data historian (a database built to store time-stamped process data).
Centralized storage like this is often referred to as a “data warehouse”.
Once information is collected into a data warehouse inside an organization, various tools can be used to conduct the actual data analysis.
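To make the collection step concrete, here is a minimal Python sketch that pulls readings from several sources into one central table. The source functions (`read_line_sensor`, `read_manual_entry`) and the `readings` table are hypothetical stand-ins; a real system would use vendor drivers or APIs, and an in-memory SQLite database plays the role of the data warehouse here.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical source readers. In practice each would talk to a PLC driver,
# a cloud API, a sensor gateway, or a manual-entry form.
def read_line_sensor():
    return [("line1.temperature", 71.5), ("line1.pressure", 30.2)]

def read_manual_entry():
    return [("line1.scrap_count", 3.0)]

SOURCES = {"sensor": read_line_sensor, "manual": read_manual_entry}

def collect(conn):
    """Pull every reading from every source into one central table."""
    now = datetime.now(timezone.utc).isoformat()
    rows = []
    for source_name, reader in SOURCES.items():
        for tag, value in reader():
            rows.append((now, source_name, tag, value))
    conn.executemany(
        "INSERT INTO readings (ts, source, tag, value) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the central data warehouse
conn.execute("CREATE TABLE readings (ts TEXT, source TEXT, tag TEXT, value REAL)")
inserted = collect(conn)
print(inserted)  # 3
for row in conn.execute("SELECT source, tag, value FROM readings ORDER BY tag"):
    print(row)
```

Once everything sits in one table with a source and timestamp on each row, the analysis tools described below can query it without knowing where each reading came from.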
Business Intelligence
Once data is collected, the next step is deciding what to do with it. In business intelligence, the goal is to use that data to help an organization make better business decisions.
Business Intelligence (BI) reports and dashboards help managers and other business leaders better understand trends and gain insights into various aspects of the business.
These aspects include:
- Supply chain needs or limitations
- Reducing costs
- Improving sales
- Customer needs and behaviors
- Predicting future sales or market demands
- Logistics and shipping
Gathering data from all of these different systems throughout your organization lets you connect pieces of information in ways that may not have been possible before.
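As an illustration of connecting data across systems, the sketch below joins a hypothetical sales table (as a CRM might hold it) with a hypothetical shipping table (as a logistics system might hold it) on a shared order ID, then summarizes by region. The table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
CREATE TABLE shipping (order_id INTEGER PRIMARY KEY, days_to_ship INTEGER);
INSERT INTO sales VALUES (1, 'North', 120.0), (2, 'North', 80.0),
                         (3, 'South', 200.0), (4, 'South', 50.0);
INSERT INTO shipping VALUES (1, 2), (2, 4), (3, 7), (4, 9);
""")

# Join the two systems on order_id, then aggregate per region.
query = """
SELECT s.region,
       SUM(s.amount)        AS total_sales,
       AVG(sh.days_to_ship) AS avg_days_to_ship
FROM sales AS s
JOIN shipping AS sh ON sh.order_id = s.order_id
GROUP BY s.region
ORDER BY s.region
"""
report = conn.execute(query).fetchall()
for region, total, avg_days in report:
    print(f"{region}: sales={total:.2f}, avg days to ship={avg_days:.1f}")
# North: sales=200.00, avg days to ship=3.0
# South: sales=250.00, avg days to ship=8.0
```

With both systems in one query, a manager can see, for instance, whether regions with slower shipping also have different sales totals.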
Manufacturing Intelligence
The main difficulty with gathering data from manufacturing processes is that there is usually so much of it.
In a typical manufacturing facility, every machine on the shop floor collects dozens to hundreds of data points, including:
- Temperatures and pressures
- Parts or products made
- Raw material used
- Bad parts scrapped
- Malfunction counts and alarms
In most cases, manufacturing equipment is automated with a programmable logic controller (PLC). These devices not only run the equipment according to their programming but also collect data from that equipment.
Getting data out of those PLCs involves software that runs on a server on the same network as those PLCs. There are many vendors that have written software to get data out of those controllers and into a data historian or a database.
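One common collection pattern is a loop that periodically reads tags from a controller and writes timestamped values to a historian. The sketch below assumes a simulated PLC and an in-memory historian; `SimulatedPLC`, `read_tag`, and `Historian` are hypothetical names, and a real collector would use a driver for the controller's protocol (such as OPC UA or Modbus TCP) instead.

```python
import time
from dataclasses import dataclass, field

@dataclass
class SimulatedPLC:
    """Hypothetical stand-in for a real controller and its protocol driver."""
    counter: int = 0

    def read_tag(self, tag: str) -> float:
        # Returns a deterministic fake value that drifts a little on each read.
        self.counter += 1
        return {"temperature": 70.0, "pressure": 30.0}[tag] + self.counter * 0.1

@dataclass
class Historian:
    """Minimal time-series store holding (timestamp, tag, value) rows."""
    rows: list = field(default_factory=list)

    def write(self, ts: float, tag: str, value: float) -> None:
        self.rows.append((ts, tag, value))

def poll(plc, historian, tags, cycles, interval_s=0.0):
    """Read every tag once per cycle and archive it with a shared timestamp."""
    for _ in range(cycles):
        ts = time.time()
        for tag in tags:
            historian.write(ts, tag, plc.read_tag(tag))
        time.sleep(interval_s)

plc = SimulatedPLC()
hist = Historian()
poll(plc, hist, ["temperature", "pressure"], cycles=3)
print(len(hist.rows))  # 6
```

Commercial historians add buffering, compression, and subscription-based reads on top of this basic idea.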
The leading data historians in this area include:
- OSIsoft: This company has been around for decades and offers “integrators” or drivers that can get data out of almost any kind of processor, sensor, or database.
- FactoryTalk: Longtime automation leader Rockwell Automation produces its own data historian, FactoryTalk Historian, to help its customers collect data from machine processors.
- AVEVA: Formerly known as Wonderware Historian, AVEVA Historian promises to provide “open access” to machine data such as process data, alarms, and events.
- ICONICS: A smaller player in the data historian marketplace, ICONICS promises “high-speed archiving” so that the resolution of the stored data matches what originally occurred on the machine.
Nearly all of these software providers include data analysis tools to go along with their data historian solution. Choosing the right data collection and analytics solution for your manufacturing facility really depends on the controllers you use, how you want to store the data, and how much you are willing to spend.
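The point about archiving resolution can be shown with a small example. This sketch uses made-up data: a pressure signal sampled once per second with a brief spike, compared against an archive that keeps only every tenth sample. It illustrates why a slower archive can miss short events entirely; it does not model how any particular historian stores or compresses data.

```python
# Hypothetical pressure signal sampled once per second for 60 seconds:
# steady at 100 except for a 2-second spike at t=23 and t=24.
signal = [100.0] * 60
signal[23] = signal[24] = 180.0

def downsample(values, every):
    """Keep one sample out of every `every`, as a slower archive would."""
    return values[::every]

full_res_max = max(signal)
low_res_max = max(downsample(signal, 10))  # keeps t = 0, 10, 20, 30, 40, 50
print(full_res_max, low_res_max)  # 180.0 100.0
```

At full resolution the spike is obvious; in the downsampled archive it never happened.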
Data Visualization
One of the most popular tools for collecting, analyzing, and visualizing business data is Microsoft Power BI.
Power BI is a powerful visualization tool from Microsoft that lets you bring in data from many different sources. You can then slice and dice the data across pie and bar charts, line graphs, tables, and more.
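“Slicing and dicing” comes down to filtering rows and then aggregating them along a chosen dimension. The sketch below does this in plain Python on a few hypothetical sales rows and prints a crude text bar chart; the field names and the `slice_and_dice` helper are illustrative only and are not how Power BI works internally.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of a combined data model.
rows = [
    {"region": "North", "product": "Widget", "sales": 120},
    {"region": "North", "product": "Gadget", "sales": 40},
    {"region": "South", "product": "Widget", "sales": 90},
    {"region": "South", "product": "Gadget", "sales": 150},
]

def slice_and_dice(rows, by, where=None):
    """Filter rows (slice), then total sales per value of `by` (dice)."""
    totals = defaultdict(int)
    for row in rows:
        if where and any(row[k] != v for k, v in where.items()):
            continue
        totals[row[by]] += row["sales"]
    return dict(totals)

def bar_chart(totals, scale=10):
    """Render totals as a text bar chart, one '#' per `scale` units."""
    for label, value in sorted(totals.items()):
        print(f"{label:<8}{'#' * (value // scale)} {value}")

by_product = slice_and_dice(rows, by="product")
north_only = slice_and_dice(rows, by="product", where={"region": "North"})
bar_chart(by_product)
bar_chart(north_only)
```

A BI tool does the same filtering and grouping interactively, with the chart redrawing each time you change the filter or the dimension.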
The ability to combine information from various data sources lets you find correlations you couldn't have found before. This is the magic of modern data analysis: it makes possible insights that were out of reach before tools could visualize data from many sources at once.
Power BI isn't the only app that can manipulate and visualize data in this way. In fact, there's a growing market for exactly these types of tools.
The leading data visualization tools today include:
- Metabase: An open-source (free) solution that touts itself as letting people in your organization “ask questions and learn from data”.
- Tableau: A popular data visualization platform used across many different industries. It can connect to many different data sources.
- Whatagraph: Popular among marketing agencies because it makes easy-to-understand reports simple to produce. The tool can generate reports automatically and email them to anyone.
- JasperReports: Another open-source reporting solution. Its strength is the ability to output reports in many formats, such as printed documents, PDFs, and web-based reports.
The option you decide to go with really depends on the investment you or your organization wants to make. Thankfully there are excellent open-source options available if that’s where you need to start.
Data Mining
One of the most powerful data analysis techniques is data mining.
Data mining focuses on using statistical modeling to pull patterns and trends out of a large volume of data in order to predict future trends.
Applications that can perform statistical analysis for data mining are highly specialized and often need to be customized for the use case at hand.
Types of data mining analysis include:
- Exploratory Data Analysis (EDA): This involves searching for patterns in data in order to identify new trends or learn new information.
- Confirmatory Data Analysis (CDA): This involves using the collected data to test whether suspected correlations actually hold.
Some of the leading data mining software tools available on the market today include:
- RapidMiner: An excellent open-source predictive analysis system written in Java. It's capable of machine learning, predictive analysis, and text mining.
- Sisense: Licensed software tailored for business intelligence, with the ability to scale up for large organizations. It includes an excellent reporting module.
- Oracle: One of the leading names in the data industry, Oracle offers data mining features within SQL that let organizations analyze data stored in an Oracle database.
- IBM Cognos: This software is capable of processing large volumes of data to identify important trends. These can be used to generate reports for management or others.
- SAS: Another big name in the data industry, Statistical Analysis System (SAS) was specifically designed to mine, manage, and even update data based on analytical results.
As you can see, data analysis has many facets, and the tools you need really depend on what you hope to learn from your data.
Data analysis continues to advance every year, and any company or organization that hopes to stay ahead in its industry needs to keep up with the available data analysis tools and use them to their fullest potential.