How to select values between two nodes in DOM Crawler and PHP?

You can select values between two nodes in DOM Crawler by using the filterXPath method with an XPath expression that selects the nodes lying between the two nodes you want to use as anchors.

Here is some sample code that prints the text content of all the elements between the h1 and h2 elements:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<EOD
<h1>Header 1</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Header 2</h2>
<p>Paragraph 3</p>
EOD;

// Load the HTML document
$crawler = new Crawler($html);

// Find all elements between the h1 and h2 elements
$nodesBetweenHeadings = $crawler->filterXPath('//h1/following-sibling::h2/preceding-sibling::*[preceding-sibling::h1]');

// Loop over the nodes and print their text content
foreach ($nodesBetweenHeadings as $node) {
    echo $node->textContent . PHP_EOL;
}
```
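DOM Crawler is built on PHP's DOM extension and, as far as I know, evaluates filterXPath expressions through DOMXPath. To try the expression without installing any Composer packages, you can run the same XPath directly with the built-in DOMDocument and DOMXPath classes. Here is a sketch of that approach. The sample markup is illustrative only.

```php
<?php

// Sketch: same XPath, evaluated with PHP's built-in DOM extension only.
// Illustrative markup: an h1, two paragraphs, an h2, then one more paragraph.
$html = <<<EOD
<h1>Header 1</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Header 2</h2>
<p>Paragraph 3</p>
EOD;

// libxml wraps the fragment in <html><body>, so the headings and
// paragraphs remain siblings of each other.
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h1/following-sibling::h2/preceding-sibling::*[preceding-sibling::h1]');

// Collect the text of each matched element (results come back in document order).
$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->textContent;
}

echo implode(PHP_EOL, $texts) . PHP_EOL; // Paragraph 1, then Paragraph 2
```

If this prints the two paragraphs, the same expression string should behave identically when passed to filterXPath.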

The XPath expression used above can be read like this:

  1. //h1: Select every h1 element in the document
  2. /following-sibling::h2: From the h1, move to every h2 element that is a later sibling of it
  3. /preceding-sibling::*[preceding-sibling::h1]: From that h2, select every earlier sibling element that also has an h1 before it, i.e. the elements after the h1 and before the h2 (* matches any element)
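Step 2 follows every h2 sibling after the h1, not only the first one. If the page has a second h2, the result also picks up that first h2 and everything after it. The sketch below demonstrates this with PHP's built-in DOMXPath and a made-up document that contains two h2 headings. It then restricts step 2 to the nearest h2 with the positional predicate [1], which on the forward following-sibling axis means "the closest following h2". The `texts()` helper is hypothetical and exists only for this example. The modified expression is plain XPath 1.0, so it should also work unchanged in filterXPath.

```php
<?php

// Made-up markup with two h2 headings after the h1.
$html = <<<EOD
<h1>Header 1</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Header 2</h2>
<p>Paragraph 3</p>
<h2>Header 3</h2>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: text content of every node matched by $expr.
function texts(DOMXPath $xpath, string $expr): array
{
    $result = [];
    foreach ($xpath->query($expr) as $node) {
        $result[] = $node->textContent;
    }
    return $result;
}

// Original expression: steps through *every* h2 that follows the h1.
$all = texts($xpath, '//h1/following-sibling::h2/preceding-sibling::*[preceding-sibling::h1]');

// h2[1] keeps only the nearest following h2, i.e. the end of the first section.
$firstSection = texts($xpath, '//h1/following-sibling::h2[1]/preceding-sibling::*[preceding-sibling::h1]');

print_r($all);          // Paragraph 1, Paragraph 2, Header 2, Paragraph 3
print_r($firstSection); // Paragraph 1, Paragraph 2
```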
