Why Does Combining Selenium and BeautifulSoup Break Your Code?

Combining the browser automation of Selenium with the parsing speed of BeautifulSoup 4 is a highly effective pattern for scraping dynamic websites. However, beginners often run into frustrating bugs such as NameError: name 'variable' is not defined, or find that find_all() suddenly returns an empty list.

If your BeautifulSoup code worked perfectly on its own but broke the moment you introduced Selenium, the issue usually boils down to two things: incorrect timing (asynchronous loading) and unstable dynamic class names. Let's break down exactly why this happens and how to fix it.

The Root Causes of the Problem

1. Taking the Page Source Snapshot Too Early

In your Selenium code, you likely wrote something like this:

Petdoc = BeautifulSoup(driver.page_source, "html.parser")
time.sleep(10)
Pettag2 = Petdoc.find_all(...)

This is the most common mistake when combining these two libraries. driver.page_source returns a plain string: a serialized snapshot of the DOM at the moment it is called. Sleeping after you capture the page source does not update your Petdoc variable, because Petdoc was built from that old string. If the dynamic content (like prices) hasn't finished loading when you call driver.page_source, BeautifulSoup will parse an incomplete page. The wait has to happen before the snapshot, not after it.
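
The snapshot behaviour is easy to reproduce without a browser. The sketch below uses a hypothetical FakeDriver class as a stand-in for Selenium's driver. Its page_source property returns the current markup, and a hypothetical finish_loading() method simulates JavaScript inserting a price after the page first appears. The string captured before loading never changes.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium driver, used only for illustration."""

    def __init__(self):
        self._dom = "<div id='prices'></div>"

    @property
    def page_source(self):
        # Like Selenium, return the markup as it is right now, as a new string
        return self._dom

    def finish_loading(self):
        # Simulate client-side JavaScript rendering the price later on
        self._dom = "<div id='prices'><span class='price'>$12.50</span></div>"


driver = FakeDriver()

early_snapshot = driver.page_source   # captured before the price exists
driver.finish_loading()               # the page keeps changing afterwards
late_snapshot = driver.page_source    # captured after the content is ready

print("$12.50" in early_snapshot)  # False: the old string is never updated
print("$12.50" in late_snapshot)   # True
```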

2. Why the NameError Occurs

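In Python, a variable that is first assigned inside a loop body only comes into existence when that body runs. If the loop iterates zero times, the name is never bound, and reading it afterwards fails. At module level this raises NameError. Inside a function it raises UnboundLocalError, which is a subclass of NameError. For example: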

Pettag2 = []  # find_all() found nothing because the page was captured too early

# Pettag2 is empty, so this loop is skipped entirely
for PetotherPrice in Pettag2:
    Petprice2 = float(PetotherPrice.text.split(' ')[0])

print(Petprice2)  # Raises NameError: name 'Petprice2' is not defined

Because your find_all() returned an empty list (due to the timing issue above), the loop never ran, Petprice2 was never created, and printing it caused your program to crash.
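
One way to make this pattern safe is to bind the name before the loop and skip anything that fails to parse. The sketch below uses a hypothetical last_price helper. It also uses plain strings in place of BeautifulSoup tags, so it runs without a browser.

```python
def last_price(price_texts):
    """Return the last parseable price, or None if nothing could be parsed.

    `price_texts` stands in for the .text of each tag that find_all() returned.
    """
    price = None  # bound up front, so it exists even when the loop never runs
    for text in price_texts:
        try:
            # "$1,234.00 USD" -> "$1,234.00" -> "1234.00"
            price = float(text.split()[0].replace("$", "").replace(",", ""))
        except (ValueError, IndexError):
            continue  # skip empty or non-numeric text instead of crashing
    return price


print(last_price([]))                                  # None instead of a NameError
print(last_price(["$12.50 USD", "n/a", "$9.99 USD"]))  # 9.99
```

The caller still has to handle None. Unlike a NameError, though, an empty result is now something you can check for and report.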

3. Brittle, Generated Class Names

Websites built with modern frameworks (like React, Next.js, or Vue) often use CSS Modules or CSS-in-JS libraries, which generate hashed class names like _text_j98bt_1 or _content-price_1xjd5_90. The hash is typically regenerated whenever the site is rebuilt, so it can change with any deployment. That makes hardcoded selectors highly unreliable.
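
To see why hardcoded hashes break, the sketch below parses two versions of the same hypothetical markup, before and after a rebuild changes the hash. It compares an exact class selector with a substring match. The HTML strings and the count_matches helper are made up for illustration.

```python
from bs4 import BeautifulSoup

EXACT = "div._content-price_1xjd5_90"       # tied to one particular build
PARTIAL = "div[class*='_content-price_']"   # matches any hash suffix

# Hypothetical markup from two builds of the same page: only the hash differs
old_build = '<div class="_content-price_1xjd5_90">$12.50 USD</div>'
new_build = '<div class="_content-price_8kq2z_90">$12.50 USD</div>'


def count_matches(html, selector):
    """Count how many elements in `html` the CSS `selector` matches."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


for html in (old_build, new_build):
    print(count_matches(html, EXACT), count_matches(html, PARTIAL))
# 1 1
# 0 1
```

A substring match also picks up any other class that happens to contain the same text, so choose a substring that is specific to the element you want.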

---

The Solution: Best Practices and Refactored Code

To fix these issues, we need to:

  • Use Selenium's Explicit Waits (WebDriverWait) to wait for elements to load in the DOM before grabbing the page source.
  • Define fallback values for variables to avoid NameError crashes.
  • Use more robust CSS selectors (like partial class matching or structural selectors).

Here is the refactored script with all three fixes applied:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

pets = []

# Initialize Chrome WebDriver
driver = webdriver.Chrome()
try:
    driver.get("https://starpets.gg/adopt-me/shop/pet/hot_doggo/17996")
    
    # Use Explicit Waits instead of implicit waits or arbitrary time.sleep()
    wait = WebDriverWait(driver, 15)
    
    # 1. Accept Cookies if the button appears
    try:
        cookie_btn = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.cookie__button")))
        cookie_btn.click()
    except Exception:
        print("Cookie banner not found or already closed.")

    # 2. Click the Age Dropdown
    # Locate it by its title attribute rather than a generated class name
    age_dropdown = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@title="Select age"]')))
    age_dropdown.click()
    
    # 3. Select "Newborn"
    newborn_option = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[text()="Newborn"]')))
    newborn_option.click()
    
    # CRITICAL: Wait for the dynamic price elements to load before grabbing the page source
    # We wait until at least one price container is visible
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class*='_content-price_']")))
    
    # Now that we are sure the page is updated, capture the page source
    soup = BeautifulSoup(driver.page_source, "html.parser")
    
    # Extract the best price; itemprop is an HTML attribute, so match it with [itemprop='price']
    pettag1 = soup.select("p[class*='_info__price__count'], span[itemprop='price']")
    Petprice1 = None
    for PetbestPrice in pettag1:
        try:
            Petprice1 = float(PetbestPrice.get_text(strip=True).replace("$", "").replace(",", ""))
            pets.append(Petprice1)
            print("The best price is:", Petprice1)
        except ValueError:
            continue
            
    # Extract other prices using partial class matching to avoid brittle selectors
    # class*='_content-price_' matches any class containing that substring
    pettag2 = soup.select("div[class*='_content-price_']")
    
    Petprice2 = None # Initialize variable to prevent NameError
    for PetotherPrice in pettag2:
        try:
            # Take the first whitespace-separated token, e.g. "$12.50" from "$12.50 USD"
            raw_price = PetotherPrice.get_text(" ", strip=True).split()[0].replace("$", "").replace(",", "")
            Petprice2 = float(raw_price)
            pets.append(Petprice2)
            print("Other price found:", Petprice2)
        except (ValueError, IndexError):
            continue

    # Print final results
    if not pets:
        print("No prices were successfully parsed.")
    else:
        print("All collected prices:", pets)

finally:
    driver.quit()
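
One further refinement is to keep the BeautifulSoup half in its own function. You can then test it against saved HTML without launching Chrome. The function name extract_prices and the sample markup below are hypothetical. The selectors are the ones used in the script. One small difference: this sketch keeps only the first token for every price, including the best price.

```python
from bs4 import BeautifulSoup


def extract_prices(html):
    """Parse every price the scraper's selectors can find in a page_source string."""
    soup = BeautifulSoup(html, "html.parser")
    selectors = (
        "p[class*='_info__price__count'], span[itemprop='price']",  # best price
        "div[class*='_content-price_']",                            # other offers
    )
    prices = []
    for selector in selectors:
        for tag in soup.select(selector):
            try:
                # Keep only the first token, e.g. "$12.50" from "$12.50 USD"
                raw = tag.get_text(" ", strip=True).split()[0]
                prices.append(float(raw.replace("$", "").replace(",", "")))
            except (ValueError, IndexError):
                continue  # skip tags whose text is not a price
    return prices


# Hypothetical markup, shaped like what the selectors expect
sample = """
<p class="_info__price__count_ab12c">$10.00</p>
<div class="_content-price_1xjd5_90">$12.50 USD</div>
<div class="_content-price_1xjd5_90">$9.99 USD</div>
"""
print(extract_prices(sample))  # [10.0, 12.5, 9.99]
```

In the Selenium script, the parsing section would then shrink to pets = extract_prices(driver.page_source), called after the explicit wait.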

Key Improvements Explained

  • Explicit Waits (WebDriverWait): Instead of guessing how many seconds a page needs with time.sleep(), WebDriverWait polls the page until the condition you pass it is met, or raises TimeoutException once the timeout expires. Execution resumes as soon as the element is ready, so the scraper is both faster and much less prone to timing errors.
  • Partial Class Attribute Selectors: By using Beautiful Soup's CSS selector div[class*='_content-price_'], we look for any class containing the substring _content-price_. This ignores the dynamic hashes that change during site builds.
  • Variable Initialization: Declaring Petprice2 = None before the loop guarantees that the variable exists even if the loop finds zero items, preventing unexpected NameError crashes.
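
Conceptually, an explicit wait is a polling loop. It calls a condition repeatedly until the condition returns something truthy or a deadline passes. The sketch below is a simplified illustration of that idea, not Selenium's actual implementation. The wait_until helper is hypothetical. Selenium's real WebDriverWait polls every 0.5 seconds by default and raises TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Returns that value, similar to how WebDriverWait.until() returns the element.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated page: the "price element" only appears on the third check
checks = {"count": 0}


def price_is_rendered():
    checks["count"] += 1
    return "$12.50" if checks["count"] >= 3 else None


print(wait_until(price_is_rendered, timeout=2, poll=0.01))  # $12.50
```

Because the loop returns as soon as the condition succeeds, a fast page costs only a few polls. A fixed time.sleep() always costs its full duration.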