Scraping Web Data Using BeautifulSoup
I am trying to scrape the rain chance and the temperature/wind speed for each baseball game from rotowire.com. Once I scrape the data, I will then convert it to three columns - rai
Solution 1:
To locate all the data, see this example:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/baseball/daily-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

weather = []
# Each game card has a ".lineup__bottom" block that holds the weather text.
for tag in soup.select(".lineup__bottom"):
    # The team names sit in the preceding ".lineup__teams" element.
    header = tag.find_previous(class_="lineup__teams").get_text(
        strip=True, separator=" vs "
    )
    # The bold element holds the rain chance; the text after it is the forecast.
    rain = tag.select_one(".lineup__weather-text > b")
    forecast_info = rain.next_sibling.split()
    temp = forecast_info[0]
    wind = forecast_info[2]
    weather.append(
        {"Header": header, "Rain": rain.text.split()[0], "Temp": temp, "Wind": wind}
    )

df = pd.DataFrame(weather)
print(df)
Output:
        Header  Rain Temp     Wind
0   PHI vs CIN  100%  66°        8
1   CWS vs CLE    0%  64°        4
2    SD vs CHC    0%  69°        7
3   NYM vs ARI  Dome   In  Stadium
4   MIN vs BAL    0%  75°        9
5    TB vs NYY    0%  68°        9
6   MIA vs TOR    0%  81°        6
7   WAS vs ATL    0%  81°        4
8   BOS vs HOU  Dome   In  Stadium
9   TEX vs COL    0%  76°        6
10  STL vs LAD    0%  73°        4
11  OAK vs SEA  Dome   In  Stadium
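Note that for domed-stadium games the whitespace split puts "In" and "Stadium" into the Temp and Wind columns. One way to guard against that, and to get numeric values at the same time, is sketched below. The helper name split_forecast is hypothetical, and the input shapes ("100% Rain" with "66° Wind 8 mph In", or "Dome" with "In Domed Stadium") are assumptions inferred from the printed output, not from the page's documented markup.

```python
def split_forecast(rain_label: str, forecast_text: str) -> dict:
    """Split one game's weather text into numeric rain, temp and wind fields.

    Hypothetical helper. Assumes open-air games look like
    ("100% Rain", "66° Wind 8 mph In") and domed games look like
    ("Dome", "In Domed Stadium").
    """
    empty = {"Rain": None, "Temp": None, "Wind": None}

    tokens = rain_label.split()
    rain = tokens[0] if tokens else ""
    if not rain.endswith("%"):
        # No percentage means there is no forecast (e.g. a domed stadium).
        return empty

    parts = forecast_text.split()
    if len(parts) < 3:
        # Forecast text is shorter than the assumed "<temp>° Wind <speed> ..." shape.
        return empty

    return {
        "Rain": int(rain.rstrip("%")),      # "100%" -> 100
        "Temp": int(parts[0].rstrip("°")),  # "66°"  -> 66
        "Wind": int(parts[2]),              # "8"    -> 8
    }


print(split_forecast("100% Rain", "66° Wind 8 mph In"))
print(split_forecast("Dome", "In Domed Stadium"))
```

Inside the loop, this could be called as split_forecast(rain.text, rain.next_sibling) and the result merged with the header before appending. Leaving missing values as None lets pandas treat them as missing data rather than as text.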
Solution 2:
You can locate the cards with the game info and find the weather data at the bottom (if it is present):
from bs4 import BeautifulSoup as soup
import requests, re, pandas as pd

d = soup(requests.get('https://www.rotowire.com/baseball/daily-lineups.php').text, 'html.parser')
# Build one dict per game card, skipping boxes whose parent has the "is-tools" class.
r = [
    {
        'header': ' vs '.join(k.get_text(strip=True) for k in i.select('div.lineup__teams div.lineup__team')),
        # j is the weather text element: contents[1] carries the rain label, contents[2] the forecast string.
        'rain': (j := i.select_one('.lineup__bottom .lineup__weather .lineup__weather-text')).contents[1].text,
        # Leading digits are the temperature; no match means a domed stadium.
        'temperature': x[0] if (x := re.findall(r'^\d+', j.contents[2].strip())) else 'In Domed Stadium',
        # Everything after "Wind " is the wind speed and direction.
        'wind': x[0] if (x := re.findall(r'(?<=Wind\s)[\w\W]+', j.contents[2].strip())) else 'In Domed Stadium',
    }
    for i in d.select('div.lineup.is-mlb div.lineup__box')
    if 'is-tools' not in i.parent['class']
]
df = pd.DataFrame(r)
print(df)
Output:
        header       rain       temperature              wind
0   PHI vs CIN  100% Rain                66          8 mph In
1   CWS vs CLE    0% Rain                64         4 mph L-R
2    SD vs CHC    0% Rain                69          7 mph In
3   NYM vs ARI       Dome  In Domed Stadium  In Domed Stadium
4   MIN vs BAL    0% Rain                75         9 mph Out
5    TB vs NYY    0% Rain                68         9 mph R-L
6   MIA vs TOR    0% Rain                81         6 mph L-R
7   WAS vs ATL    0% Rain                81         4 mph R-L
8   BOS vs HOU       Dome  In Domed Stadium  In Domed Stadium
9   TEX vs COL    0% Rain                76             6 mph
10  STL vs LAD    0% Rain                73         4 mph Out
11  OAK vs SEA       Dome  In Domed Stadium  In Domed Stadium
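The two regular expressions in this solution can be tried without hitting the site. The sketch below runs them against sample strings; the strings themselves ("66° Wind 8 mph In" and so on) are assumptions reconstructed from the output above, and first_match is a hypothetical helper that mirrors the "x[0] if x else fallback" logic.

```python
import re

# Leading digits of the forecast string, i.e. the temperature.
TEMP_RE = re.compile(r"^\d+")
# Everything that follows "Wind " (fixed-width lookbehind, so "Wind " itself is not captured).
WIND_RE = re.compile(r"(?<=Wind\s)[\w\W]+")


def first_match(pattern, text, default="In Domed Stadium"):
    """Return the first regex match in text, or the default when nothing matches."""
    found = pattern.findall(text.strip())
    return found[0] if found else default


for sample in ("66° Wind 8 mph In", "76° Wind 6 mph", "In Domed Stadium"):
    print(repr(first_match(TEMP_RE, sample)), repr(first_match(WIND_RE, sample)))
```

Because neither pattern matches "In Domed Stadium", both columns fall back to the default for dome games, which is why those rows repeat the same text.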