What is Big Data - A Simple Explanation with Example
The term Big Data is being used more and more, almost everywhere on the planet, online and offline. And it is not related to computers only. It comes under a blanket term called Information Technology, which is now part of almost all other technologies, fields of study, and businesses. Big Data is not a big deal. The hype surrounding it, though, is big enough to confuse you. This article takes a look at what Big Data is. It also contains an example of how Netflix used its data, or rather, Big Data, to better serve its clients’ needs.
What is Big Data
The data lying in the servers of your company was just data until yesterday, sorted and filed. Suddenly, the buzzword Big Data got popular, and now the data in your company is Big Data. The term covers each and every piece of data your organization has stored until now. It includes data stored in the cloud and even the URLs that you bookmarked. Your company might not have digitized all of its data. You may not have structured all of it yet. But all the data your company holds, digital or on paper, structured or unstructured, is now Big Data.
In short, all the data present in your servers, whether or not it is categorized, is collectively called Big Data. All this data can be used to get different results using different types of analysis. Not every analysis needs to use all the data. Different analyses use different parts of the Big Data to produce the results and predictions required.
Big Data is essentially the data that you analyze for results that you can use for predictions and other purposes. Once you use the term Big Data, your company or organization is suddenly working with top-level information technology to deduce different types of results from the same data that you stored, intentionally or unintentionally, over the years.
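To make the idea of "different analyses use different parts of the data" a little more concrete, here is a minimal Python sketch. The records, field names, and both functions are made up purely for illustration; they are assumptions of mine and not taken from any real system.

```python
from collections import Counter
from statistics import mean

# Hypothetical records standing in for the data a company has stored.
records = [
    {"customer": "A", "region": "north", "order_total": 120.0, "support_tickets": 1},
    {"customer": "B", "region": "south", "order_total": 80.0, "support_tickets": 0},
    {"customer": "C", "region": "north", "order_total": 200.0, "support_tickets": 3},
]

def average_order_total(rows):
    """Sales analysis: reads only the order_total field."""
    return mean(row["order_total"] for row in rows)

def tickets_by_region(rows):
    """Support analysis: reads only the region and support_tickets fields."""
    totals = Counter()
    for row in rows:
        totals[row["region"]] += row["support_tickets"]
    return dict(totals)

print(average_order_total(records))
print(tickets_by_region(records))
```

Both functions work on the same stored records, but each one touches only the fields it needs, which is roughly what I mean by different analyses using different parts of the data.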
How big is Big Data
Essentially, all the data combined is Big Data, but many researchers agree that Big Data, as such, cannot be manipulated using normal spreadsheets and regular database management tools. It needs special analysis tools like Hadoop (we’ll study this in a separate post) so that all the data can be analyzed in one go (which may include several iterations of analysis).
Contrary to the above, though I am not an expert on the subject, I would say that the data held by any organization, big or small, organized or unorganized, is Big Data for that organization, and that the organization may choose its own tools to analyze it.
Normally, to analyze data, people used to create different data sets based on one or more common fields so that the analysis became easier. In the case of Big Data, there is no need to create subsets before analyzing it. We now have tools that can analyze data irrespective of how huge it is. These tools probably categorize the data themselves even as they analyze it.
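The difference between the two approaches can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the rows, the "department" field, and the function names are hypothetical, and it does not show how Hadoop or any other real tool works internally.

```python
from collections import defaultdict

# Hypothetical rows; "department" plays the role of the common field.
rows = [
    {"department": "sales", "amount": 100},
    {"department": "support", "amount": 40},
    {"department": "sales", "amount": 60},
]

def split_then_analyze(data, field):
    """Older approach: build one subset per field value, then analyze each subset."""
    subsets = defaultdict(list)
    for row in data:
        subsets[row[field]].append(row)
    return {key: sum(r["amount"] for r in subset) for key, subset in subsets.items()}

def analyze_in_one_pass(data, field):
    """Single pass: categorize each row and update its running total as it is read."""
    totals = defaultdict(int)
    for row in data:
        totals[row[field]] += row["amount"]
    return dict(totals)

print(split_then_analyze(rows, "department"))
print(analyze_in_one_pass(rows, "department"))
```

Both functions give the same totals per department. The second one simply never stores the subsets separately, which matters more as the data gets larger.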
I find it important to mention two sentences from the book “Big Data” by Jimmy Guterman:
“Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system.”
-And-
“For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”
So you see that both volume and analysis are important parts of Big Data.
Big Data Concepts
This is another point on which people disagree. Some experts say that the concept of Big Data comes down to three V’s:
- Volume
- Velocity
- Variety
Others add a few more V’s to the concept:
- Visualization
- Veracity (Reliability)
- Variability
- Value
I will cover the concepts of Big Data in a separate article, as this post is already getting long. In my opinion, the first three V’s are enough to explain the concept of Big Data.
Big Data Example - How Netflix used it to fix its problems
Around 2008, there was an outage at Netflix that left many customers in the dark. While some could still access the streaming service, most could not. Some customers managed to get their rented DVDs, whereas others did not. According to a blog post on the Wall Street Journal, Netflix had only just started on-demand streaming at the time.
The outage made the management think about possible future problems, and so it turned to Big Data. Using that data, it analyzed high-traffic areas, susceptible points, network throughput, and so on, and worked on them to lower the downtime if a problem arose in the future as it went global. The Wall Street Journal blog is the place to look if you wish to check out the examples of Big Data.
The above summarizes what Big Data is in layman’s terms. You can call it a very basic introduction. I plan to write a few more articles on associated topics such as the concepts, analysis, tools, and uses of Big Data, the three V’s of Big Data, and so on. Meanwhile, if you would like to add anything to the above, please comment and share with us.