How to find all links using DOM Crawler and PHP?

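You can find all links using DOM Crawler and PHP with either the filter or the filterXPath method. The two code samples below show how to use each of these methods. The code uses Guzzle to load the ScrapingBee website, so you may want to install that as well using Composer. Note that filter takes a CSS selector, which DOM Crawler converts to XPath using the separate symfony/css-selector package, so that package must be installed too. filterXPath works without it.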
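
To check that the two methods select the same elements, this minimal sketch runs both on a small inline HTML string instead of a live page, so no HTTP request is needed. The markup is hypothetical, and the sketch assumes symfony/dom-crawler and symfony/css-selector were installed with Composer. It collects hrefs with the Crawler's each() and attr() helpers rather than iterating over raw DOM nodes.

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector were installed with Composer
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical inline markup, so the example runs without a network request
$html = <<<'HTML'
<html>
  <body>
    <nav>
      <a href="/">Home</a>
      <a href="/blog/">Blog</a>
    </nav>
    <p>Read the <a href="https://example.com/docs">docs</a>.</p>
  </body>
</html>
HTML;

$crawler = new Crawler($html);

// CSS selector, translated to XPath by symfony/css-selector
$cssHrefs = $crawler->filter('a')->each(
    fn (Crawler $node) => $node->attr('href')
);

// The equivalent XPath expression needs no extra package
$xpathHrefs = $crawler->filterXPath('//a')->each(
    fn (Crawler $node) => $node->attr('href')
);

// Both lists hold the same hrefs, in document order
var_dump($cssHrefs === $xpathHrefs);
```

each() calls the closure once per matched node, wrapping that node in its own Crawler, and returns the closure results as an array.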

This example code uses the filter method:

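<?php

// Load Composer's autoloader (assumes guzzlehttp/guzzle, symfony/dom-crawler
// and symfony/css-selector are installed)
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;
use GuzzleHttp\Client;

// Create a client to make the HTTP request
$client = new Client();
$response = $client->get('https://www.scrapingbee.com/');
$html = (string) $response->getBody();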

// Load the HTML document
$crawler = new Crawler($html);

// Find all links on the page
$links = $crawler->filter('a');

// Loop over the links and print their href attributes
foreach ($links as $link) {
    echo $link->getAttribute('href') . PHP_EOL;
}

// Output:
// /
// https://app.scrapingbee.com/account/login
// https://app.scrapingbee.com/account/register
// /#pricing
// /#faq
// /blog/
// #
// /features/screenshot/
// /features/google/
// ...
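
Several hrefs in the output above are relative, for example /blog/ or /#pricing. If you need absolute URLs, one option is to pass the page URL as the Crawler's second constructor argument and call links(), which returns Link objects whose getUri() method resolves each href against that URL. This sketch uses hypothetical inline markup containing a few of the hrefs shown above and assumes symfony/dom-crawler and symfony/css-selector are installed.

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector were installed with Composer
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup reusing a few hrefs from the output above
$html = '<html><body>'
    . '<a href="/">Home</a>'
    . '<a href="/blog/">Blog</a>'
    . '<a href="https://app.scrapingbee.com/account/login">Login</a>'
    . '</body></html>';

// The second argument is the URI the document was fetched from;
// links() resolves relative hrefs against it
$crawler = new Crawler($html, 'https://www.scrapingbee.com/');

$absoluteUrls = [];
foreach ($crawler->filter('a')->links() as $link) {
    // Link::getUri() returns the href resolved against the base URI
    $absoluteUrls[] = $link->getUri();
}

print_r($absoluteUrls);
```

Without an absolute base URI, creating a Link for a relative href raises an exception, so keep the page URL around when you build the Crawler.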

This example code uses the filterXPath method:

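<?php

// Load Composer's autoloader (assumes guzzlehttp/guzzle and
// symfony/dom-crawler are installed)
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;
use GuzzleHttp\Client;

// Create a client to make the HTTP request
$client = new Client();
$response = $client->get('https://www.scrapingbee.com/');
$html = (string) $response->getBody();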

// Load the HTML document
$crawler = new Crawler($html);

// Find all links on the page
$links = $crawler->filterXPath('//a');

// Loop over the links and print their href attributes
foreach ($links as $link) {
    echo $link->getAttribute('href') . PHP_EOL;
}

// Output:
// /
// https://app.scrapingbee.com/account/login
// https://app.scrapingbee.com/account/register
// /#pricing
// /#faq
// /blog/
// #
// /features/screenshot/
// /features/google/
// ...
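
Because //a matches every anchor, an a element without an href (such as a named anchor) would print an empty line in the loop above, since DOMElement::getAttribute() returns an empty string for a missing attribute. A minimal sketch, assuming hypothetical inline markup and an installed symfony/dom-crawler, narrows the XPath to //a[@href], collects the values with the Crawler's extract() method and removes duplicates with array_unique().

```php
<?php

// Assumes symfony/dom-crawler was installed with Composer
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: one anchor has no href, and one href appears twice
$html = <<<'HTML'
<html><body>
  <a href="/features/google/">Google API</a>
  <a name="top">Back to top</a>
  <a href="/blog/">Blog</a>
  <a href="/features/google/">Google API again</a>
</body></html>
HTML;

$crawler = new Crawler($html);

// [@href] keeps only <a> elements that actually carry an href attribute
$hrefs = $crawler->filterXPath('//a[@href]')->extract(['href']);

// extract() with a single attribute returns a flat list of strings;
// array_unique() drops repeats and array_values() reindexes the result
$uniqueHrefs = array_values(array_unique($hrefs));

print_r($uniqueHrefs);
```

The same filter can be written as a CSS selector, a[href], if you prefer filter over filterXPath.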
