Yes, you can use XPath selectors in DOM Crawler. Here is some sample code that uses Guzzle to load the ScrapingBee website and then uses DOM Crawler’s filterXPath method to extract and print the text content of the h1 tag:
use Symfony\Component\DomCrawler\Crawler;
use GuzzleHttp\Client;
// Create a client to make the HTTP request
$client = new Client();
$response = $client->get('https://www.scrapingbee.com/');
$html = (string) $response->getBody();
// Load the HTML document
$crawler = new Crawler($html);
// Find the first h1 element on the page
$h1 = $crawler->filterXPath('//h1[1]');
// Get the text content of the h1 element
$text = $h1->text();
// Print the text content
echo $text;
// Output:
// "Tired of getting blocked while scraping the web?"
If you do not want to use Guzzle, take a look at this sample code that directly passes in an HTML string:
use Symfony\Component\DomCrawler\Crawler;
$html = <<<EOD
<html>
<head><title>Example Page</title></head>
<body>
<h1>Hello, world!</h1>
<p>This is an example page.</p>
</body>
</html>
EOD;
// Load the HTML document
$crawler = new Crawler($html);
// Find the first h1 element on the page
$h1 = $crawler->filterXPath('//h1[1]');
// Get the text content of the h1 element
$text = $h1->text();
// Print the text content
echo $text;
// Output:
// "Hello, world!"
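Both examples above read a single node that is known to exist. As a minimal sketch of two common variations, the code below collects several nodes with filterXPath and each(), then guards against an XPath expression that matches nothing. The markup, the `/one` and `/two` links, and the Composer autoload path are assumptions made up for illustration, not part of the original examples.

```php
<?php
require 'vendor/autoload.php'; // assumes symfony/dom-crawler was installed with Composer

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup, made up for this sketch
$html = <<<EOD
<html>
<body>
<ul>
<li><a href="/one">First</a></li>
<li><a href="/two">Second</a></li>
</ul>
</body>
</html>
EOD;

$crawler = new Crawler($html);

// each() runs the callback once per matched node and collects the return values
$links = $crawler->filterXPath('//ul/li/a')->each(function (Crawler $node) {
    return ['text' => $node->text(), 'href' => $node->attr('href')];
});

foreach ($links as $link) {
    echo $link['text'] . ' -> ' . $link['href'] . PHP_EOL;
}

// This markup has no h1, so check count() before calling text(),
// which throws an exception on an empty node list
$h1 = $crawler->filterXPath('//h1[1]');
$heading = $h1->count() > 0 ? $h1->text() : 'No h1 element found';
echo $heading . PHP_EOL;
```

Checking count() first is one way to keep a scraper from failing when a page's layout changes and the expected element disappears.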