How to scrape Google News

  • Extracting the source: Finds the div with the class 'vr1PYe' and uses the .get_text(strip=True) method to cleanly extract its text content.

  • Building absolute URLs: Converts relative URLs from the href attribute into absolute URLs by prepending https://news.google.com and stripping query parameters.
  • Error handling: Implements a try-except block to handle any exceptions during parsing, making sure that the script continues to process the remaining articles.
  • Limiting to 10 items: The loop stops after successfully extracting 10 news items, but you can customize it according to your needs.
    Example news container
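To make the extraction and URL-building steps above concrete, here is a minimal sketch. The class names mirror the ones used in the script, but the HTML snippet itself is a simplified, hypothetical stand-in for a real Google News article element:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Simplified stand-in for one Google News article element
html = '''
<article>
  <a class="gPFEn" href="./read/ABC123?hl=en-US&gl=US">Example headline</a>
  <div class="vr1PYe">  Example Source  </div>
</article>
'''
soup = BeautifulSoup(html, 'html.parser')

# .get_text(strip=True) trims surrounding whitespace from the text content
source = soup.find('div', class_='vr1PYe').get_text(strip=True)

# Build an absolute URL: drop the leading '.', strip query parameters,
# and prepend the Google News origin
relative = soup.find('a', class_='gPFEn')['href']
absolute = 'https://news.google.com' + relative.lstrip('.').split('?')[0]

print(source)    # Example Source
print(absolute)  # https://news.google.com/read/ABC123
```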

    The code for scraping Google News is now ready to run. To call the function and print the extracted content, include the following code:

    if __name__ == "__main__":
        news = get_google_news()
        for idx, item in enumerate(news, 1):
            print(f"{idx}. {item['source']}: {item['headline']}")
            print(f"   Link: {item['link']}\n")
    
    

    When run, the code above generates clearly formatted output, as shown in the following image:

    Output of the Google News Scraper

    4. Deploying to Apify

    There are several reasons to deploy your scraper code to a platform like Apify. In this specific case, deploying code to Apify helps you automate the data scraping process efficiently by scheduling your web scraper to run daily or weekly. It also offers a convenient way to store and download your data.

    To deploy your code to Apify, follow these steps:

    #1. Create an account on Apify

    #2. Create a new Actor

    mkdir google-news-scraper
    cd google-news-scraper
    
      • Then initialize the Actor by typing apify init.

    #3. Create the main.py script

  • Note that you'll need to make some changes to the previous script to make it Apify-friendly: importing the Apify SDK and updating the input/output handling.
      • You can find the modified script on GitHub.
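As a rough sketch of what those changes look like (assuming the Apify Python SDK, installed via pip install apify), the Apify-friendly entry point could be structured something like this. The input key "maxItems" is an illustrative assumption, not the exact input schema of the modified script on GitHub:

```python
import asyncio

async def main():
    # Assumes the Apify SDK is available in the Actor's environment: pip install apify
    from apify import Actor

    async with Actor:
        # Read the Actor input (the "maxItems" key is a hypothetical example)
        actor_input = await Actor.get_input() or {}
        max_items = actor_input.get("maxItems", 10)

        # get_google_news() is the function from the earlier script
        news_items = get_google_news()[:max_items]

        # Push the scraped items to the Actor's default dataset
        await Actor.push_data(news_items)

# In main.py, the script would end with: asyncio.run(main())
```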

    #4. Create the Dockerfile and requirements.txt

    FROM python:3.9-slim
    
    # Set the working directory
    WORKDIR /app
    
    # Copy the requirements file
    COPY requirements.txt ./
    
    # Install Python dependencies
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the rest of the application code
    COPY . .
    
    # Set the entry point to your script
    CMD ["python", "main.py"]
    
    And here's the requirements.txt:

    beautifulsoup4==4.9.3
    requests==2.25.1
    apify
    

    #5. Deploy

        • Type apify login to log in to your account. You can either use the API token or verify through your browser.
        • Once logged in, type apify push and you’re good to go.

      Once deployed, head over to Apify Console > Your Actor > Click “Start” to run the scraper on Apify.

      Deploying Google News scraper to Apify

      Once the run is successful, you can view the output of the scraper on the “Output” tab.

      To view and download output, click “Export Output”.

      Export Google News Scraper results with one click

      You can select/omit sections and download the data in different formats such as CSV, JSON, Excel, etc.

      Export your Google News data in multiple formats

      You can also schedule your Actor by clicking the three dots (•••) in the top-right corner of the Actor dashboard > Schedule Actor option.

      Scheduling the Google News Scraper Actor

      The complete code

      Here’s the complete script for building a Google News scraper with Beautiful Soup.

      import requests
      from bs4 import BeautifulSoup
      
      def get_google_news():
          url = "https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx1YlY4U0FtVnVHZ0pWVXlnQVAB?ceid=US:en&oc=3"
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
          }
          response = requests.get(url, headers=headers)
          soup = BeautifulSoup(response.content, 'html.parser')
      
          news_items = []
      
          # Finding all story containers
          containers = soup.find_all('div', class_='W8yrY')
      
          # Scraping at least 10 headlines
          for container in containers:
              try:
                  # Getting the primary article in each container
                  article = container.find('article')
                  if not article:
                      continue
      
                  # Extracting headline
                  headline_elem = article.find('a', class_='gPFEn')
                  headline = headline_elem.get_text(strip=True) if headline_elem else 'No headline'
      
                  # Extracting source
                  source_elem = article.find('div', class_='vr1PYe')
                  source = source_elem.get_text(strip=True) if source_elem else 'Unknown source'
      
                  # Extracting the link and converting it to an absolute URL
                  relative_link = headline_elem['href'] if headline_elem else ''
                  absolute_link = f'https://news.google.com{relative_link.lstrip(".").split("?")[0]}' if relative_link else ''
      
                  news_items.append({
                      'source': source,
                      'headline': headline,
                      'link': absolute_link
                  })
      
                  # Stop if you have 10 items
                  if len(news_items) >= 10:
                      break
      
              except Exception as e:
                  print(f"Error processing article: {str(e)}")
                  continue
      
          return news_items
      
      # Running and printing the results
      if __name__ == "__main__":
          news = get_google_news()
          for idx, item in enumerate(news, 1):
              print(f"{idx}. {item['source']}: {item['headline']}")
              print(f"   Link: {item['link']}\n")
      
      

      Using a ready-made Google News Scraper

      Although building your own program to scrape Google News sounds like a good option, it comes with real downsides: dealing with website blocking (when the site identifies your script as a bot and refuses to serve it) and, of course, hours of debugging and headaches.

      The best way to avoid all that is to use a ready-made scraper on a platform like Apify. By doing so, you can not only scrape the Google News front page but also scrape news for specific search queries, schedule runs, and extract news items from a specific time period.

      For example, if you want to scrape news about DeepSeek in a certain time period, you can go to Google News Scraper on Apify > Try for free, and fill in the input fields below:

      Scraping news from a certain query using a ready-made scraper
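If you'd rather drive the ready-made scraper programmatically, here is a rough sketch using the Apify API client (assuming pip install apify-client). The Actor ID "username/google-news-scraper" and the "query" input field are hypothetical placeholders; check the Actor's Apify page for the real identifiers before using this:

```python
def run_google_news_actor(api_token: str, query: str):
    """Run a Google News Actor on Apify and return its dataset items.

    The Actor ID and the "query" input field below are placeholders for
    illustration; look them up on the Actor's Apify page.
    """
    # Assumes the Apify API client is installed: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(api_token)
    # Start the Actor run and wait for it to finish
    run = client.actor("username/google-news-scraper").call(run_input={"query": query})
    # Collect the scraped items from the run's default dataset
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```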

      Once run, it’ll return the scraped information as follows:

      Results from Google News Scraper run

      Conclusion

      In this tutorial, we built a program that can scrape Google News using Python’s Beautiful Soup library and deployed it on Apify for easier management. We also showed you an easier way: running an off-the-shelf Actor designed for the task, Google News Scraper. The first option gives you the flexibility to build exactly the scraper you want; the second gets you data quickly without building a scraper from scratch.

      Frequently asked questions

      Can you scrape Google News?

      Yes, you can scrape Google News in two ways:

      1. Building a scraper from scratch. However, to handle issues like dynamically rendered content, you may need techniques such as headless browsers.
      2. Using ready-made scrapers on platforms like Apify.

      Is it legal to scrape Google News?

      Yes, it is legal to scrape Google News, as it is publicly available information. But you should be mindful of local and regional laws regarding copyright and personal data. If you want solid guidance on the legality and ethics of web scraping, Is web scraping legal? is a comprehensive treatment of the subject.

      How to scrape Google News?

      To scrape Google News, you can use Apify’s Google News Scraper. It allows you to scrape metadata from Google News, such as headlines, images, and URLs, while also supporting query-based search options.
