Fantasy Football is a nerdy game. That said, there is no way to win the championship without studying it, and that is where web scraping comes in handy. In this tutorial, you will learn how to build a web scraping tool to do the job.
With it, you can gather valuable data into one spreadsheet automatically. This is much easier and faster than looking up stats from various sources and copying and pasting them.
Why Web Scraping?
According to one report, the average player spends 3 hours every week managing their team and 9 additional hours reading about trends. Around 30% of players manage their teams during their day jobs.
Tons of information is at your fingertips, yet accurately predicting players’ performance is tough. How can you pick second-tier players and still get a top-tier result? You need to track the game statistics and find hidden value.
What is Web Scraping?
Web scraping is a technique for automating data extraction from websites. Traditionally, you would need a programmer to write a script.
Today, a web scraping tool replaces the manual work of coding. Scraping is no longer a programmer’s privilege: anyone can extract valuable information from the internet and save it to local storage or the cloud.
In this article, I will walk you through how to extract fantasy football projected points from sports websites like fantasypros.com with a web scraping tool.
It isn’t necessary to capture the entire page. You can be more creative and get a leg up by making a side-by-side comparison with your opponents’ teams for a thorough analysis.
We then compare that with a Python script, so you will have an idea of how easy it is for all of us, especially Fantasy Football players, to keep track of the stats.
Disclaimer: I am new to Fantasy Football. This article doesn’t provide professional advice on draft strategies. Instead, it shares knowledge from a statistical perspective.
Web Scraping with Octoparse
Octoparse is a very intuitive web scraping tool. It has helped me overcome many obstacles in data analysis projects, and I consider it the best on the market. You can download it here.
Create a project:
Open Octoparse and click the little plus sign to build a new task in Advanced Mode. Enter the URL, and Octoparse will open the webpage in its built-in browser. We can then interact with the page and extract data by clicking on it.
First, click the player in the first row. Notice that Octoparse parses the website into single elements. It finds similar elements and highlights them in red.
This is great. Follow the Action Tip and click “select all sub-element.” The entire row is now selected. Octoparse will then remind you that it found similar rows that are ready to be selected. Follow the guide and click “Select-All.”
Notice that all rows are now selected successfully and highlighted in green.
Next, click “Extract data in the loop”. Congratulations! You have completed a crawler. [Download the crawler]
Last but not least, save the task and start the extraction with your choice of extraction type. You can extract locally, on the cloud, or on a schedule. In this case, I highly recommend setting a schedule, so the crawler scrapes the websites at regular intervals and you are always kept up to date.
The extracted data is delivered in structured formats, including Excel, txt, and JSON. Since we need to analyze the points, I export them into Excel, and it looks like this.
Web Scraping with Python
You can read the full Python work here. I broke down the process into a few steps:
- Browse to the desired page and copy the URL for later use.
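- Use Python’s requests and bs4 (Beautiful Soup) packages to get the whole web page as parsed HTML:

      import re
      import requests
      from bs4 import BeautifulSoup

      def get_html_data(url):
          # Download the page, then parse it with the html5lib parser
          # (html5lib is a separate package that must be installed)
          response = requests.get(url)
          return BeautifulSoup(response.content, "html5lib")
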
- Examine the HTML code carefully to find where the data you want to extract is. In this case, we are looking for “TR” (table row) elements.
- Locate the unique identifiers, like href links, class names, table rows, and table data, that surround the data you want.
- Try to extract different fields from a single row of data.
- Go through a few trial and error iterations.
- Normalize the data formats. Raw extracted data often comes with inconsistent formatting, so you need to clean the characters and make them consistent and readable.
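The row-by-row steps above can be sketched roughly as follows. This is a minimal illustration, not the script from the full Python work: the sample markup, the column order (player, team, points), and the helper names `RowCollector`, `clean_text`, `clean_points`, and `parse_projections` are all assumptions made for the example. To keep it runnable without extra installs, it uses the standard library's `html.parser` instead of Beautiful Soup; with Beautiful Soup, the row lookup would typically be `soup.find_all("tr")`.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup; a real projections page will look different.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Team</th><th>FPTS</th></tr>
  <tr><td><a href="/players/a.php">Player A</a></td><td> KC </td><td>24.5</td></tr>
  <tr><td><a href="/players/b.php">Player  B</a></td><td>BUF</td><td> 18.7 pts </td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only hold <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def clean_text(value):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_points(value):
    """Drop everything except digits, minus sign, and decimal point."""
    return float(re.sub(r"[^0-9.\-]", "", value))


def parse_projections(html):
    """Return one dict per data row; assumes exactly three cells per row."""
    collector = RowCollector()
    collector.feed(html)
    return [
        {"player": clean_text(name), "team": clean_text(team), "points": clean_points(points)}
        for name, team, points in collector.rows
    ]


if __name__ == "__main__":
    for row in parse_projections(SAMPLE_HTML):
        print(row)
```

On a real page, the trial-and-error part is mostly about finding which cells hold which fields and which stray characters the cleaning functions need to handle.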
Scraping sports projections is fast and easy, and with a web scraping tool you can accomplish the entire process in a few simple clicks. I spent 1 hour reading the Beautiful Soup documentation, experimenting with how to locate the precise fields, and writing Python code.
By contrast, I spent less than 10 minutes setting up the extraction with Octoparse. The best part is that once you have the crawler in hand, you can set a schedule and let it automate the extractions.
As a player, you can monitor several sources at the same time much more easily by setting up extraction crawlers:
- CBS – Jamey Eisenberg
- CBS – Dave Richard
- CBS – Average
- FOX Sports
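Once projections from several sources are collected, one simple way to combine them is to average each player's projected points and rank the result. The sketch below assumes the data has already been scraped into a dictionary keyed by source. The source labels, player names, and numbers are placeholders invented for illustration, not real projections, and `average_projections` and `rank_players` are hypothetical helper names.

```python
from statistics import mean

# Placeholder data: source -> {player: projected points}. Not real projections.
projections = {
    "Source 1": {"Player A": 21.0, "Player B": 15.5},
    "Source 2": {"Player A": 19.0, "Player B": 17.5},
    "Source 3": {"Player A": 23.0},
}


def average_projections(by_source):
    """Average each player's projection over the sources that list them."""
    collected = {}
    for source_points in by_source.values():
        for player, points in source_points.items():
            collected.setdefault(player, []).append(points)
    return {player: mean(values) for player, values in collected.items()}


def rank_players(averages):
    """Return (player, average) pairs, highest projection first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for player, points in rank_players(average_projections(projections)):
        print(f"{player}: {points:.1f}")
```

A plain average treats every source equally; weighting sources differently is a design choice this sketch leaves out.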
The more data you collect, the more comprehensive your analysis will be. Now you can obtain first-hand data as soon as it is published!