How to scrape Google News

  • Extracting the source: Finds the div with the class 'vr1PYe' and uses the .get_text(strip=True) method to cleanly extract its text content.

  • Building absolute URLs: Converts relative URLs from the href attribute into absolute URLs by prepending https://news.google.com and stripping query parameters.
  • Error handling: Implements a try-except block to handle any exceptions during parsing, making sure that the script continues to process the remaining articles.
  • Limiting to 10 items: The loop stops after successfully extracting 10 news items, but you can customize it according to your needs.
    Example news container
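To make the extraction and URL-building steps above concrete, here is a minimal sketch. The class names mirror the ones used in the script, but the HTML snippet itself is a simplified, hypothetical stand-in for a real Google News article element:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Simplified stand-in for one Google News article element
html = '''
<article>
  <a class="gPFEn" href="./read/ABC123?hl=en-US&gl=US">Example headline</a>
  <div class="vr1PYe">  Example Source  </div>
</article>
'''
soup = BeautifulSoup(html, 'html.parser')

# .get_text(strip=True) trims surrounding whitespace from the text content
source = soup.find('div', class_='vr1PYe').get_text(strip=True)

# Build an absolute URL: drop the leading '.', strip query parameters,
# and prepend the Google News origin
relative = soup.find('a', class_='gPFEn')['href']
absolute = 'https://news.google.com' + relative.lstrip('.').split('?')[0]

print(source)    # Example Source
print(absolute)  # https://news.google.com/read/ABC123
```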

    The code for scraping Google News is now ready to run. To call the function and print the extracted content, include the following code:

    if __name__ == "__main__":
        news = get_google_news()
        for idx, item in enumerate(news, 1):
            print(f"{idx}. {item['source']}: {item['headline']}")
            print(f"   Link: {item['link']}\n")
    
    

    When run, the code above generates clearly formatted output, as shown in the following image:

    Output of the Google News Scraper

    4. Deploying to Apify

    There are several reasons to deploy your scraper code to a platform like Apify. In this specific case, deploying code to Apify helps you automate the data scraping process efficiently by scheduling your web scraper to run daily or weekly. It also offers a convenient way to store and download your data.

    To deploy your code to Apify, follow these steps:

    #1. Create an account on Apify

    #2. Create a new Actor

    mkdir google-news-scraper
    cd google-news-scraper
    
      • Then initialize the Actor by typing apify init.

    #3. Create the main.py script

  • Note that you'll need to make some changes to the previous script to make it Apify-friendly: importing the Apify SDK and updating the input/output handling.
      • You can find the modified script on GitHub.
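As a rough sketch of what those changes look like (assuming the Apify Python SDK, installed via pip install apify), the Apify-friendly entry point could be structured something like this. The input key "maxItems" is an illustrative assumption, not the exact input schema of the modified script on GitHub:

```python
import asyncio

async def main():
    # Assumes the Apify SDK is available in the Actor's environment: pip install apify
    from apify import Actor

    async with Actor:
        # Read the Actor input (the "maxItems" key is a hypothetical example)
        actor_input = await Actor.get_input() or {}
        max_items = actor_input.get("maxItems", 10)

        # get_google_news() is the function from the earlier script
        news_items = get_google_news()[:max_items]

        # Push the scraped items to the Actor's default dataset
        await Actor.push_data(news_items)

# In main.py, the script would end with: asyncio.run(main())
```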

    #4. Create the Dockerfile and requirements.txt

    FROM python:3.9-slim
    
    # Set the working directory
    WORKDIR /app
    
    # Copy the requirements file
    COPY requirements.txt ./
    
    # Install Python dependencies
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the rest of the application code
    COPY . .
    
    # Set the entry point to your script
    CMD ["python", "main.py"]
    
    And here's the requirements.txt:

    beautifulsoup4==4.9.3
    requests==2.25.1
    apify
    

    #5. Deploy

        • Type apify login to log in to your account. You can either use the API token or verify through your browser.
        • Once logged in, type apify push and you’re good to go.

      Once deployed, head over to Apify Console > Your Actor > Click “Start” to run the scraper on Apify.

      Deploying Google News scraper to Apify

      Once the run is successful, you can view the output of the scraper on the “Output” tab.

      To view and download output, click “Export Output”.

      Export Google News Scraper results with one click

      You can select/omit sections and download the data in different formats such as CSV, JSON, Excel, etc.

      Export your Google News data in multiple formats

      You can also schedule your Actor by clicking the three dots (•••) in the top-right corner of the Actor dashboard > Schedule Actor option.

      Scheduling the Google News Scraper Actor

      The complete code

      Here’s the complete script for building a Google News scraper with Beautiful Soup.

      import requests
      from bs4 import BeautifulSoup
      
      def get_google_news():
          url = "https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx1YlY4U0FtVnVHZ0pWVXlnQVAB?ceid=US:en&oc=3"
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
          }
          response = requests.get(url, headers=headers)
          soup = BeautifulSoup(response.content, 'html.parser')
      
          news_items = []
      
          # Finding all story containers
          containers = soup.find_all('div', class_='W8yrY')
      
          # Scraping at least 10 headlines
          for container in containers:
              try:
                  # Getting the primary article in each container
                  article = container.find('article')
                  if not article:
                      continue
      
                  # Extracting headline
                  headline_elem = article.find('a', class_='gPFEn')
                  headline = headline_elem.get_text(strip=True) if headline_elem else 'No headline'
      
                  # Extracting source
                  source_elem = article.find('div', class_='vr1PYe')
                  source = source_elem.get_text(strip=True) if source_elem else 'Unknown source'
      
                  # Extracting the link and converting it to an absolute URL
                  relative_link = headline_elem['href'] if headline_elem else ''
                  absolute_link = f'https://news.google.com{relative_link.lstrip(".").split("?")[0]}' if relative_link else ''
      
                  news_items.append({
                      'source': source,
                      'headline': headline,
                      'link': absolute_link
                  })
      
                  # Stop if you have 10 items
                  if len(news_items) >= 10:
                      break
      
              except Exception as e:
                  print(f"Error processing article: {str(e)}")
                  continue
      
          return news_items
      
      # Running and printing the results
      if __name__ == "__main__":
          news = get_google_news()
          for idx, item in enumerate(news, 1):
              print(f"{idx}. {item['source']}: {item['headline']}")
              print(f"   Link: {item['link']}\n")
      
      

      Using a ready-made Google News Scraper

      Although building your own program to scrape Google News sounds like a good option, it comes with real downsides: dealing with website blocking (when the site identifies your script as a bot and refuses to serve it) and, of course, hours of debugging and headaches.

      The best way to avoid all that is to use a ready-made scraper on a platform like Apify. By doing so, you can not only scrape the Google News front page but also scrape news for specific search queries, schedule runs, and extract news items from a specific time period.

      For example, if you want to scrape news about DeepSeek in a certain time period, you can go to Google News Scraper on Apify > Try for free, and fill in the input fields below:

      Scraping news from a certain query using a ready-made scraper
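If you'd rather drive the ready-made scraper programmatically, here is a rough sketch using the Apify API client (assuming pip install apify-client). The Actor ID "username/google-news-scraper" and the "query" input field are hypothetical placeholders; check the Actor's Apify page for the real identifiers before using this:

```python
def run_google_news_actor(api_token: str, query: str):
    """Run a Google News Actor on Apify and return its dataset items.

    The Actor ID and the "query" input field below are placeholders for
    illustration; look them up on the Actor's Apify page.
    """
    # Assumes the Apify API client is installed: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(api_token)
    # Start the Actor run and wait for it to finish
    run = client.actor("username/google-news-scraper").call(run_input={"query": query})
    # Collect the scraped items from the run's default dataset
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```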

      Once run, it’ll return the scraped information as follows:

      Results from Google News Scraper run

      Conclusion

      In this tutorial, we built a program that can scrape Google News using Python’s Beautiful Soup library and deployed it on Apify for easier management. We also showed you an easier way: running an off-the-shelf Actor designed for the task, Google News Scraper. The first option gives you the flexibility to build exactly the scraper you want; the second gets you data quickly without building a scraper from scratch.

      Frequently asked questions

      Can you scrape Google News?

      Yes, you can scrape Google News in two ways:

      1. Building a scraper from scratch. However, to handle issues like dynamically rendered content, you may need techniques such as headless browsers.
      2. Using ready-made scrapers on platforms like Apify.

      Is it legal to scrape Google News?

      Yes, it is legal to scrape Google News, as it is publicly available information. But you should be mindful of local and regional laws regarding copyright and personal data. If you want solid guidance on the legality and ethics of web scraping, Is web scraping legal? is a comprehensive treatment of the subject.

      How to scrape Google News?

      To scrape Google News, you can use Apify’s Google News Scraper. It allows you to scrape metadata from Google News, such as headlines, images, and URLs, while also supporting query-based search options.
