How to scrape using Guzzle, Simple HTML Dom and anyIP.io?

In this quick tutorial, we show you how to start scraping any website with Guzzle (a PHP HTTP client library) through rotating proxies from anyIP.io.

Guzzle installation

The recommended way to install Guzzle is through Composer.

composer require guzzlehttp/guzzle

How to use Guzzle

Following the documentation, opening a page using Guzzle is pretty simple:

PHP
require 'vendor/autoload.php'; // Composer autoloader

$client = new GuzzleHttp\Client();
$res = $client->request('GET', 'https://www.example.com');
echo $res->getBody();

To use a proxy, you have to add a proxy parameter:

PHP
$res = $client->request("GET", "https://www.example.com", [
  // Placeholders: use the username, password, and proxy address from your proxy account
  "proxy" => "https://username:password@<proxy-host>",
]);
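
If the proxy password contains characters such as @ or :, a proxy URL written by hand becomes ambiguous. Here is a minimal sketch, assuming the proxy accepts credentials embedded in the URL as in the example above. The helper name buildProxyUrl() is hypothetical (it is not part of Guzzle), and the host proxy.example.com and port 8080 are placeholders, not real anyIP.io values. cURL, which Guzzle typically uses under the hood, URL-decodes credentials found in a proxy string, so percent-encoding them should be safe, but check this against your own setup.

```php
<?php
// Hypothetical helper (not part of Guzzle): builds a proxy URL with
// percent-encoded credentials so characters like "@" or ":" in the
// password cannot be mistaken for URL delimiters.
function buildProxyUrl(string $user, string $pass, string $host, int $port, string $scheme = 'http'): string
{
    return sprintf(
        '%s://%s:%s@%s:%d',
        $scheme,
        rawurlencode($user),
        rawurlencode($pass),
        $host,
        $port
    );
}

// Placeholder values for illustration only.
$proxy = buildProxyUrl('username', 'p@ss:word', 'proxy.example.com', 8080);
echo $proxy, PHP_EOL; // http://username:p%40ss%3Aword@proxy.example.com:8080
```

The resulting string can be passed as the "proxy" option of $client->request(), or given once to the constructor (new GuzzleHttp\Client(['proxy' => $proxy])) so that every request made with that client goes through the proxy.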

How to parse the page?

The content of the page is in $res->getBody(). After checking that you actually got the expected result (the status code is 200, the Content-Type header is text/html or similar, etc.), you can start parsing the page. There are many options for this.
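
Before parsing, the checks described above can be sketched as a small helper. This is only an illustration: looksLikeHtmlPage() is a hypothetical name (not a Guzzle function), and the list of accepted content types is an assumption to adapt to the site you scrape.

```php
<?php
// Hypothetical helper: decides whether a response is worth parsing,
// based on its status code and Content-Type header value.
function looksLikeHtmlPage(int $statusCode, string $contentType): bool
{
    if ($statusCode !== 200) {
        return false;
    }

    // Drop parameters such as "; charset=UTF-8" and normalize case.
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));

    // Assumed list of acceptable types; extend as needed.
    return in_array($mediaType, ['text/html', 'application/xhtml+xml'], true);
}

var_dump(looksLikeHtmlPage(200, 'text/html; charset=UTF-8')); // bool(true)
var_dump(looksLikeHtmlPage(404, 'text/html'));                // bool(false)
var_dump(looksLikeHtmlPage(200, 'application/json'));         // bool(false)
```

With a Guzzle response, the two arguments would come from $res->getStatusCode() and $res->getHeaderLine('Content-Type').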

As a quick introduction to the scraping world, we will use Simple HTML Dom. After installing and initializing it, you can use any CSS selector to retrieve the content of your choice:

PHP
// Cast the response body (a stream) to a string before parsing
$simpleHTMLDom = str_get_html((string) $res->getBody());
$links = $simpleHTMLDom->find('a');
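
From there, a common next step is to pull the URL out of each link. Here is a minimal sketch, assuming each element returned by find('a') exposes its href attribute as a property, as Simple HTML Dom elements do. collectHrefs() is a hypothetical helper name, and the stand-in objects below only simulate parsed links so that the snippet runs on its own.

```php
<?php
// Hypothetical helper: collects unique, non-empty href values from a list
// of link elements (any objects exposing an "href" property).
function collectHrefs(iterable $links): array
{
    $hrefs = [];
    foreach ($links as $link) {
        $href = $link->href;
        // Skip links without a usable href (missing attribute or empty string).
        if (!is_string($href) || trim($href) === '') {
            continue;
        }
        $hrefs[] = trim($href);
    }

    // Remove duplicates while keeping the order of first appearance.
    return array_values(array_unique($hrefs));
}

// Stand-in objects simulating the result of $simpleHTMLDom->find('a').
$links = [
    (object) ['href' => '/pricing'],
    (object) ['href' => 'https://www.example.com/blog'],
    (object) ['href' => '/pricing'],
    (object) ['href' => false],
];

print_r(collectHrefs($links));
```

In the tutorial's flow, you would call collectHrefs($simpleHTMLDom->find('a')) instead of using the stand-in array. Relative URLs such as /pricing still need to be resolved against the address of the page they came from before you can request them.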
