How to Fix NameError and Missing Elements When Combining Selenium and BeautifulSoup
Why Does Combining Selenium and BeautifulSoup Break Your Code?
Combining the browser automation of Selenium with the parsing speed of BeautifulSoup 4 is a highly effective pattern for scraping dynamic websites. However, beginners often run into frustrating bugs like NameError: name 'variable' is not defined, or find that find_all() suddenly returns an empty list.
If your BeautifulSoup code worked perfectly on its own but broke the moment you introduced Selenium, the issue usually comes down to two things: incorrect timing (reading the HTML before JavaScript has finished loading the content) and unstable, generated class names. Let's break down exactly why this happens and how to fix it.
The Root Causes of the Problem
1. Taking the Page Source Snapshot Too Early
In your Selenium code, you likely wrote something like this:
Petdoc = BeautifulSoup(driver.page_source, "html.parser")
time.sleep(10)
Pettag2 = Petdoc.find_all(...)
This is the most common mistake when combining these two libraries. driver.page_source returns a plain string: a snapshot of the HTML at the moment it is called. Sleeping after you capture the page source does not update your Petdoc variable, because Petdoc was already built from the old string. If the dynamic content (like prices) hasn't finished loading when you call driver.page_source, BeautifulSoup will parse an incomplete page.
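To see why the order matters without launching a browser, here is a stdlib-only sketch. FakeDriver is a hypothetical stand-in for a Selenium WebDriver: like Selenium's page_source, its page_source is recomputed on every access, and the price element "renders" only after a short, assumed delay.

```python
import time


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (illustration only).

    page_source is recomputed on every access; the price element appears
    only after load_delay seconds, imitating JavaScript-rendered content.
    """

    def __init__(self, load_delay=0.2):
        self._start = time.monotonic()
        self._load_delay = load_delay

    @property
    def page_source(self):
        loaded = time.monotonic() - self._start >= self._load_delay
        price = '<span class="price">4.20</span>' if loaded else ""
        return f"<html><body><div id='prices'>{price}</div></body></html>"


driver = FakeDriver(load_delay=0.2)

early_snapshot = driver.page_source  # captured before the content "loads"
time.sleep(0.3)                      # sleeping does not modify early_snapshot
late_snapshot = driver.page_source   # a fresh read returns the updated HTML

print("4.20" in early_snapshot)  # False
print("4.20" in late_snapshot)   # True
```

The fix is therefore not to sleep longer after the snapshot, but to wait first and read page_source only once the content is there.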
2. Why the NameError Occurs
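In Python, a variable that is assigned only inside a loop body is never created if the loop runs zero times. Referencing it afterwards raises NameError (or, inside a function, UnboundLocalError, which is a subclass of NameError). For example: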
# If Pettag2 is empty, this loop is skipped entirely
for PetotherPrice in Pettag2:
    Petprice2 = float(PetotherPrice.text.split(' ')[0])
print(Petprice2)  # Throws NameError: name 'Petprice2' is not defined
Because your find_all() returned an empty list (due to the timing issue above), the loop never ran, Petprice2 was never created, and printing it caused your program to crash.
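The failure and its fix can be reproduced with plain strings in place of BeautifulSoup elements. In this minimal sketch, the function names last_price_buggy and last_price_fixed are hypothetical, and an empty list simulates find_all() finding nothing. Because the code runs inside functions, the buggy version raises UnboundLocalError, the NameError subclass mentioned above.

```python
def last_price_buggy(texts):
    for text in texts:
        price = float(text.split()[0])
    return price  # fails when texts is empty: price was never assigned


def last_price_fixed(texts, default=None):
    price = default  # bind the name before the loop so it always exists
    for text in texts:
        price = float(text.split()[0])
    return price


try:
    last_price_buggy([])  # simulates find_all() returning an empty list
except NameError as exc:  # also catches UnboundLocalError (a NameError subclass)
    caught = type(exc).__name__
    print(caught)  # UnboundLocalError

print(last_price_fixed([]))            # None
print(last_price_fixed(["4.20 USD"]))  # 4.2
```

Initializing to None keeps the program running, but the caller still has to check for None before using the value.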
3. Brittle, Generated Class Names
Websites built with modern frameworks (like React, Next.js, or Vue) often style components with CSS Modules or CSS-in-JS libraries, which generate class names with hashed suffixes like _text_j98bt_1 or _content-price_1xjd5_90. The hash is usually produced when the site is built, so it can change whenever the site is rebuilt and redeployed, making hardcoded selectors highly unreliable.
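The partial-match idea can be tried without a browser. The sketch below uses Python's built-in html.parser instead of BeautifulSoup (so it needs no third-party packages) to roughly emulate the selector div[class*='_content-price_']: it keeps the text of every div whose class attribute contains the given substring. The helper names, the HTML snippets, and the hashes are made-up examples, and unlike a real CSS engine it merges a nested match into its outer match.

```python
from html.parser import HTMLParser


class ClassSubstringCollector(HTMLParser):
    """Collect the text of <tag> elements whose class attribute contains
    `needle`, roughly like the CSS selector tag[class*='needle']."""

    def __init__(self, tag, needle):
        super().__init__()
        self.tag = tag
        self.needle = needle
        self.depth = 0     # > 0 while inside a matching element
        self.matches = []  # one text string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        class_attr = dict(attrs).get("class") or ""
        if tag == self.tag and self.needle in class_attr:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def texts_by_class_substring(html, tag="div", needle="_content-price_"):
    parser = ClassSubstringCollector(tag, needle)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Two hypothetical builds of the same page: only the generated hash differs.
build_a = '<div class="_content-price_1xjd5_90">4.20 USD</div>'
build_b = '<div class="_content-price_9kq2m_90 card">5.10 USD</div>'

print(texts_by_class_substring(build_a))  # ['4.20 USD']
print(texts_by_class_substring(build_b))  # ['5.10 USD']
# A hardcoded full class name from build A finds nothing in build B:
print(texts_by_class_substring(build_b, needle="_content-price_1xjd5_90"))  # []
```

Matching only the stable prefix is what lets the same selector survive a rebuild.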
The Solution: Best Practices and Refactored Code
To fix these issues, we need to:
- Use Selenium's explicit waits (WebDriverWait) to wait for elements to appear in the DOM before grabbing the page source.
- Define fallback values for variables to avoid NameError crashes.
- Use more robust CSS selectors (like partial class matching or structural selectors).
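WebDriverWait.until() works by polling: it calls a condition repeatedly until the condition returns something truthy or the timeout expires. The wait_until function below is a simplified, hypothetical re-implementation of that idea for illustration only. In real scraping code use Selenium's WebDriverWait, which raises its own TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=15.0, poll_frequency=0.5):
    """Simplified sketch of the polling behind WebDriverWait.until().

    Calls condition() until it returns a truthy value and returns that value;
    raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Simulate an element whose text "appears" about 0.3 s after start.
start = time.monotonic()
element_text = wait_until(
    lambda: "4.20" if time.monotonic() - start > 0.3 else None,
    timeout=2,
    poll_frequency=0.05,
)
print(element_text)  # 4.20

# A condition that never becomes true hits the timeout instead.
try:
    wait_until(lambda: None, timeout=0.2, poll_frequency=0.05)
    timed_out = False
except TimeoutError:
    timed_out = True
print(timed_out)  # True
```

Unlike a fixed time.sleep(10), the wait returns as soon as the condition holds, and it fails loudly instead of silently handing an incomplete page to BeautifulSoup.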
Here is the refactored solution. Note that the selectors target this particular page's markup and may need adjusting if the site changes:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

pets = []

# Initialize Chrome WebDriver
driver = webdriver.Chrome()
try:
    driver.get("https://starpets.gg/adopt-me/shop/pet/hot_doggo/17996")

    # Use explicit waits instead of implicit waits or arbitrary time.sleep()
    wait = WebDriverWait(driver, 15)

    # 1. Accept cookies if the button appears.
    # A shorter wait keeps the script from stalling 15 s when there is no banner.
    try:
        cookie_btn = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button.cookie__button"))
        )
        cookie_btn.click()
    except TimeoutException:
        print("Cookie banner not found or already closed.")

    # 2. Click the age dropdown
    # Locate it by its title attribute instead of a generated class name
    age_dropdown = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@title="Select age"]')))
    age_dropdown.click()

    # 3. Select "Newborn"
    newborn_option = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[text()="Newborn"]')))
    newborn_option.click()

    # CRITICAL: wait for the dynamic price elements before grabbing the page source.
    # We wait until at least one price container is visible.
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class*='_content-price_']")))

    # Now that the page has updated, capture the page source
    soup = BeautifulSoup(driver.page_source, "html.parser")

    # Extract the best price.
    # span[itemprop='price'] matches the itemprop attribute
    # (span.itemprop.price would instead look for the CSS classes "itemprop" and "price").
    pettag1 = soup.select("p[class*='_info__price__count'], span[itemprop='price']")
    Petprice1 = None
    for PetbestPrice in pettag1:
        try:
            Petprice1 = float(PetbestPrice.text.replace("$", "").strip())
            pets.append(Petprice1)
            print("The best price is:", Petprice1)
        except ValueError:
            continue

    # Extract other prices using partial class matching to avoid brittle selectors.
    # class*='_content-price_' matches any class attribute containing that substring.
    pettag2 = soup.select("div[class*='_content-price_']")
    Petprice2 = None  # Initialize variable to prevent NameError
    for PetotherPrice in pettag2:
        try:
            # Join the text with spaces, strip surrounding whitespace, and keep
            # the first token, e.g. "4.20" from "4.20 USD"
            raw_price = PetotherPrice.get_text(" ", strip=True).split()[0].replace("$", "")
            Petprice2 = float(raw_price)
            pets.append(Petprice2)
            print("Other price found:", Petprice2)
        except (ValueError, IndexError):
            continue

    # Print final results
    if not pets:
        print("No prices were successfully parsed.")
    else:
        print("All collected prices:", pets)
finally:
    driver.quit()
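Both extraction loops above repeat the same "strip the currency sign, then convert to float" logic. One option is to move it into a small helper. parse_price is a hypothetical name, and the sketch assumes prices use "." as the decimal separator with no thousands separators (so "1,299.00" would not parse correctly).

```python
import re

# First run of digits, optionally followed by a decimal part, e.g. "4.20"
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = PRICE_RE.search(text)
    return float(match.group()) if match else None


texts = ["$4.20", "  5.10 USD ", "Sold out"]
prices = [p for p in (parse_price(t) for t in texts) if p is not None]
print(prices)  # [4.2, 5.1]
```

Returning None instead of raising lets each loop skip unparseable entries with a simple "is not None" check.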
Key Improvements Explained
- Explicit Waits (WebDriverWait): Instead of guessing how many seconds a page needs to load with time.sleep(), WebDriverWait polls until the specific element is rendered and raises TimeoutException if it never appears within the timeout. This makes your scraper both faster and significantly less prone to timing errors.
- Partial Class Attribute Selectors: The CSS selector div[class*='_content-price_'], passed to BeautifulSoup's select(), matches any div whose class attribute contains the substring _content-price_. This ignores the generated hash that changes when the site is rebuilt.
- Variable Initialization: Declaring Petprice2 = None before the loop guarantees that the variable exists even if the loop finds zero items, preventing unexpected NameError crashes.