Can I use XPath selectors in DOM Crawler?

Yes, you can use XPath selectors in DOM Crawler. Here is some sample code that uses Guzzle to load the ScrapingBee website and then uses DOM Crawler's filterXPath method to extract and print the text content of the h1 tag:

use Symfony\Component\DomCrawler\Crawler;
use GuzzleHttp\Client;

// Create a client to make the HTTP request
$client = new Client();
$response = $client->get('https://www.scrapingbee.com/');
$html = (string) $response->getBody();

// Load the HTML document
$crawler = new Crawler($html);

// Select the h1 element on the page
// (//h1[1] matches any h1 that is the first h1 child of its parent)
$h1 = $crawler->filterXPath('//h1[1]');

// Get the text content of the h1 element
$text = $h1->text();

// Print the text content
echo $text; 

// Output: 
// "Tired of getting blocked while scraping the web?"

If you do not want to use Guzzle, take a look at this sample code that directly passes in an HTML string:

use Symfony\Component\DomCrawler\Crawler;

$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
	<title>Example Page</title>
</head>
<body>
	<h1>Hello, world!</h1>
	<p>This is an example page.</p>
</body>
</html>
EOD;

// Load the HTML document
$crawler = new Crawler($html);

// Select the h1 element on the page
// (//h1[1] matches any h1 that is the first h1 child of its parent)
$h1 = $crawler->filterXPath('//h1[1]');

// Get the text content of the h1 element
$text = $h1->text();

// Print the text content
echo $text; 

// Output: 
// "Hello, world!"
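
The filterXPath method is not limited to tag names. It also accepts other XPath 1.0 expressions, such as attribute predicates. The following is a minimal sketch rather than a definitive recipe: it assumes symfony/dom-crawler was installed with Composer (hence the autoload line), and the HTML snippet, the "doc" class, and the variable names are made up for illustration.

```php
<?php

// Assumption: dependencies were installed with Composer in the current directory
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup, used only for this illustration
$html = <<<EOD
<html>
<body>
    <h1>Hello, world!</h1>
    <ul>
        <li><a class="doc" href="/docs/a">Guide A</a></li>
        <li><a class="doc" href="/docs/b">Guide B</a></li>
        <li><a href="/about">About</a></li>
    </ul>
</body>
</html>
EOD;

$crawler = new Crawler($html);

// Attribute predicate: only the links whose class attribute is exactly "doc"
$docLinks = $crawler->filterXPath('//a[@class="doc"]');

// each() runs the callback on every matched node and collects the return values
$hrefs = $docLinks->each(function (Crawler $node) {
    return $node->attr('href');
});

// text() throws an exception when nothing matched, so check count() first
$missing = $crawler->filterXPath('//h2');
$h2Text = $missing->count() > 0 ? $missing->text() : null;

print_r($hrefs);   // the two "/docs/..." paths
var_dump($h2Text); // NULL, because the sample markup has no h2
```

Checking count() before calling text() keeps the script from failing on pages where the expression matches nothing, which is common when a site changes its markup.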
