Use Excel as a Tool to Copy Data from the Web
Web scraping is the act of extracting data, information, or images from a website using an automated method. Think of it as copy and paste on full automatic.
We either write an app or use an existing one to visit the websites we choose and copy the specific things we want from them. It’s much more precise than downloading an entire website.
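To make “copy only the specific things we want” concrete, here is a minimal Python sketch that uses only the standard library’s `html.parser`. It assumes the page’s HTML is already in a string. The sample markup and the `TextByTag`/`scrape_tag` helpers are made up for illustration. The sketch keeps the text of one kind of element and ignores the rest of the page. It only illustrates the idea and is not how Excel works internally.

```python
from html.parser import HTMLParser
from typing import List


class TextByTag(HTMLParser):
    """Collect the text inside every element with one tag name (hypothetical helper)."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0            # > 0 while we are inside a matching element
        self.items: List[str] = []
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:   # closed the outermost match: save its text
                self.items.append(" ".join("".join(self._buf).split()))
                self._buf = []

    def handle_data(self, data):
        if self.depth:            # text outside matching elements is ignored
            self._buf.append(data)


def scrape_tag(html: str, tag: str) -> List[str]:
    parser = TextByTag(tag)
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample markup standing in for a downloaded page.
page = """<html><body>
<h2>Frequently used shortcuts</h2>
<p>Intro text we don't want.</p>
<h2>Other shortcuts</h2>
</body></html>"""

print(scrape_tag(page, "h2"))  # ['Frequently used shortcuts', 'Other shortcuts']
```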
Like any tool, web scraping can be used for good or evil.
Some of the better reasons for scraping websites would be ranking them in a search engine based on their content, price comparison shopping, or monitoring stock market information. You might even use it as a research tool of sorts.
How Can I Scrape Websites with Excel?
Believe it or not, Excel has had the ability to extract data from websites for a long time, at least since Excel 2003. It’s just that most people don’t think of web scraping at all, let alone of using a spreadsheet program to do the job. But it’s surprisingly easy and powerful. Let’s learn how it’s done by making a collection of Microsoft Office keyboard shortcuts.
Find the Sites You Want to Scrape
The first thing we’re going to do is find the specific web pages from which we want to get information. Let’s go to the source and search at https://support.office.com/. We’re going to use the search term “frequently used shortcuts”. We can make it more specific by using the name of the specific app, like Outlook, Excel, Word, and so on. It may be a good idea to bookmark the results page so we can easily get back there.
Click on the search result, “Keyboard shortcuts in Excel for Windows”. Once on that page, find the list of Excel versions and click on Newer Versions. Now we’re working with the latest and greatest.
We could go back to our search results page, open the results for all the other Office apps in their own tabs, and bookmark them. It’s a good idea, even for this exercise. This is where most people would stop collecting Office shortcuts, but not us. We’re going to put them in Excel so we can do whatever we want with them, whenever we want.
Open Excel and Scrape
Open Excel and start a new workbook. Save the workbook as Office Shortcuts. If you have OneDrive, save it there so the AutoSave feature will work.
Once the workbook is saved, click on the Data tab.
In the ribbon of the Data tab, click on From Web.
The From Web wizard window will open. This is where we put the web address, or URL, of the website from which we want to scrape data. Switch to your web browser and copy the URL.
Paste the URL into the URL field of the From Web wizard. We could choose to use this in Basic or Advanced mode. Advanced mode gives us a lot more options for how to access the data from the website. For this exercise, we only need Basic mode. Click OK.
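In the Power Query-based versions of Excel, Advanced mode lets you assemble the URL from several parts and add HTTP request headers. The Python sketch below is only a rough analogy. It does the same thing with the standard library’s `urllib`. The base URL, query parameter, and header value are hypothetical placeholders, and `build_request` is an illustrative helper. The request is only built, not sent.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base: str, params: Dict[str, str], headers: Dict[str, str]) -> Request:
    """Join a base URL and query parameters, then attach HTTP headers.

    Nothing is sent over the network here; pass the result to
    urllib.request.urlopen() when you actually want to fetch the page.
    """
    url = base + ("?" + urlencode(params) if params else "")
    return Request(url, headers=headers)


# Hypothetical values, for illustration only.
req = build_request(
    "https://example.com/search",
    {"q": "frequently used shortcuts"},
    {"Accept-Language": "en-US"},
)
print(req.full_url)  # https://example.com/search?q=frequently+used+shortcuts
```

Note that `Request` stores header names with only the first letter capitalized, so the header above is read back with `req.get_header("Accept-language")`.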
Excel will now attempt to connect to the website. This may take a few seconds; if it does, we’ll see a progress window.
The Navigator window will open, and we’ll see a list of tables from the website on the left. When we select one, we’ll see a table preview on the right. Let’s select the Frequently used shortcuts table.
If we need to look around for the table we want, we can click on the Web View tab to see the actual website. When we find the table, we can click on it and it will be selected for import.
Now, we click on the Load button at the bottom of this window. There are other options we could choose, but they are more complex and beyond the scope of our first scrape. Just be aware that they are there; Excel’s web scraping capabilities are very powerful.
The web table will be loaded in Excel after a few seconds. We’ll see the data on the left, where the number 1 is in the picture below. Number 2 highlights the query used to get the data from the website. When we have multiple queries in a workbook, this is where we select the one we need to use.
Notice that the data comes into the spreadsheet as an Excel table. It’s already set up so we can filter or sort the data.
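The filter and sort buttons in the table’s header row do roughly what the Python sketch below does. The rows are a small made-up sample shaped like the shortcuts table (description, keys). `filter_rows` and `sort_rows` are hypothetical helpers, not anything Excel exposes.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (description, keys)

# Made-up sample rows for illustration.
rows: List[Row] = [
    ("Save a workbook", "Ctrl+S"),
    ("Copy selection", "Ctrl+C"),
    ("Open a workbook", "Ctrl+O"),
    ("Close a workbook", "Ctrl+W"),
]


def filter_rows(rows: List[Row], text: str) -> List[Row]:
    """Keep rows whose description contains `text`, ignoring case (like a text filter)."""
    text = text.lower()
    return [row for row in rows if text in row[0].lower()]


def sort_rows(rows: List[Row], column: int = 0, descending: bool = False) -> List[Row]:
    """Sort by one column, like Sort A to Z (or Z to A when descending=True)."""
    return sorted(rows, key=lambda row: row[column].lower(), reverse=descending)


print(sort_rows(filter_rows(rows, "workbook")))
# [('Close a workbook', 'Ctrl+W'), ('Open a workbook', 'Ctrl+O'), ('Save a workbook', 'Ctrl+S')]
```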
We can repeat this process for all the other web pages that have the Office shortcuts that we want for Outlook, Word, Access, PowerPoint, and any other Office app.
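Repeating the same scrape for each Office app is essentially a loop. The Python sketch below assumes you already have some `scrape(url)` function that returns the table you want. For example, it could download the page with `urllib.request.urlopen` and pick out one table. `scrape_all` and `fake_scrape` are hypothetical names, and the page URLs are placeholders. A page that fails to download is reported and skipped rather than stopping the whole run.

```python
from typing import Callable, Dict, List
from urllib.error import URLError

Table = List[List[str]]


def scrape_all(pages: Dict[str, str], scrape: Callable[[str], Table]) -> Dict[str, Table]:
    """Run the same scrape on every page, keyed by app name."""
    results: Dict[str, Table] = {}
    for app, url in pages.items():
        try:
            results[app] = scrape(url)
        except URLError as err:  # e.g. a bad address, an HTTP error, or no connection
            print(f"Skipping {app}: {err.reason}")
    return results


# Hypothetical placeholder URLs; substitute the pages you bookmarked earlier.
pages = {
    "Outlook": "https://example.com/outlook-shortcuts",
    "Word": "https://example.com/word-shortcuts",
}


def fake_scrape(url: str) -> Table:
    """Stand-in scraper so the sketch runs offline."""
    if "word" in url:
        raise URLError("connection timed out")
    return [["To do this", "Press"]]


print(scrape_all(pages, fake_scrape))
```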
Keeping Scraped Data Current in Excel
As a bonus, we’re going to learn how to keep our scraped data fresh in Excel. This is a great way to illustrate just how powerful Excel is for data scraping, and even so, we’re only doing the most basic scraping that Excel can do.
For this example, let’s use a stock information web page like https://www.cnbc.com/stocks/.
Repeat what we did before, this time copying and pasting the new URL from the address bar.
You’ll get to the Navigator window and see the tables available. Let’s select the Major U.S. Stock Indices.
Once the data is scraped, we’ll see the following spreadsheet.
On the right, we see the query for Major U.S. Stock Indices.
Select that query so it is highlighted. Make sure we’re on the Design tab under Table Tools. Then click on the down arrow under Refresh, and then click on Connection Properties.
In the Query Properties window, under the Usage tab, we can control how this information refreshes. We can have it refresh at a set time interval, refresh the next time we open the workbook, refresh in the background, or any combination of these. Once we’ve chosen what we need, click OK to close the window and continue.
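Conceptually, refreshing at a set interval just means re-running the query on a timer. Here is a minimal Python sketch of that idea, assuming some `fetch()` function that re-scrapes the page. `refresh_every` is a hypothetical name. Unlike Excel’s background refresh, this version blocks while it waits.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def refresh_every(fetch: Callable[[], T], minutes: float, times: int,
                  sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call fetch() `times` times, waiting `minutes` between calls.

    Blocks the caller while waiting (no background refresh). `sleep` is
    injectable so the function can be tested without real delays.
    """
    snapshots: List[T] = []
    for i in range(times):
        if i > 0:
            sleep(minutes * 60)
        snapshots.append(fetch())
    return snapshots
```

For example, `refresh_every(my_fetch, minutes=15, times=4)`, with `my_fetch` standing in for your own scraping function, would collect four snapshots over 45 minutes.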
That’s it! Now you can track stock prices, sports scores, or any other frequently changing data from an Excel spreadsheet. If you’re good with Excel formulas and functions, you can do almost anything you want with the data.
Maybe try to identify stock trends, run a fantasy sports pool at work, or just keep track of the weather. Who knows? Your imagination and the data available on the Internet are the only limits.