How to scrape using Guzzle, Simple HTML Dom and AnyIP.io?


In this quick tutorial, we will show you how to start scraping a website using Guzzle (a PHP HTTP client library), Simple HTML Dom and rotating proxies from AnyIP.io.

Guzzle installation

The recommended way to install Guzzle is through Composer.

composer require guzzlehttp/guzzle

How to use Guzzle

Following the documentation, opening a page with Guzzle is pretty simple:

require 'vendor/autoload.php'; // Composer autoloader, makes the GuzzleHttp classes available

$client = new GuzzleHttp\Client();
$res = $client->request('GET', 'https://www.example.com');
echo $res->getBody();

To use a proxy, you have to add a proxy parameter:

$res = $client->request("GET", "https://www.example.com", [
    // Placeholder proxy URL: use the credentials, host and port from your AnyIP.io account
    "proxy" => "http://username:password@proxy-host:port",
]);
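
If every request should go through the same proxy, the option can also be set once when the client is created, since Guzzle treats the constructor array as default request options. The sketch below assumes this setup; the helper name `makeProxiedClient`, the proxy URL and the 30-second timeout are illustrative placeholders, not values from AnyIP.io.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

use GuzzleHttp\Client;

/**
 * Build a Guzzle client that sends every request through one proxy.
 * Hypothetical helper, named here only for illustration.
 */
function makeProxiedClient(string $proxyUrl): Client
{
    return new Client([
        'proxy'   => $proxyUrl, // applied to every request made with this client
        'timeout' => 30,        // seconds; an assumed value, tune it to your target site
    ]);
}
```

Calling `makeProxiedClient('http://username:password@proxy-host:port')->request('GET', 'https://www.example.com')` then routes the request through the proxy without repeating the option on each call.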

How to parse the page?

The content of the page is in $res->getBody(). After checking that you actually got the correct result (the status code is 200, the Content-Type header is text/html or similar, etc.), you can start to parse the page. There are many options for this.
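
The checks described above can be sketched as a small helper. It uses the standard PSR-7 methods `getStatusCode()` and `getHeaderLine()` that Guzzle responses implement; the function name `isParsableHtml` and the exact list of accepted content types are assumptions made for this example.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

use Psr\Http\Message\ResponseInterface;

/**
 * Return true when the response looks like an HTML page worth parsing.
 * Hypothetical helper; adjust the accepted content types to your needs.
 */
function isParsableHtml(ResponseInterface $res): bool
{
    // getHeaderLine() returns an empty string when the header is missing.
    $contentType = strtolower($res->getHeaderLine('Content-Type'));

    $isHtml = strpos($contentType, 'text/html') === 0
        || strpos($contentType, 'application/xhtml+xml') === 0;

    return $res->getStatusCode() === 200 && $isHtml;
}
```

Note that with Guzzle's default `http_errors` setting, 4xx and 5xx responses throw an exception before this check is reached, so the status test mostly matters when that option is disabled.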

As a quick introduction to the scraping world, we will use Simple HTML Dom. After installing and initializing it, you can use any CSS selector to retrieve the content of your choice:

// Cast the body stream to a string before handing it to the parser
$simpleHTMLDom = str_get_html((string) $res->getBody());
$links = $simpleHTMLDom->find('a');
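
To turn the matched elements into usable data, you can loop over them and read their attributes. The sketch below assumes the classic single-file distribution of Simple HTML Dom (`simple_html_dom.php`) placed next to the script; the function name `extractLinks` is made up for this example.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

/**
 * Collect the href value of every <a> element in an HTML string.
 * Hypothetical helper, named here only for illustration.
 */
function extractLinks(string $html): array
{
    $dom = str_get_html($html);
    if ($dom === false) {
        // The parser refuses empty input and documents above its size limit.
        return [];
    }

    $urls = [];
    foreach ($dom->find('a') as $link) {
        $href = $link->href; // not a string when the attribute is missing
        if (is_string($href) && $href !== '') {
            $urls[] = $href;
        }
    }

    $dom->clear(); // release the parser's internal references
    return $urls;
}
```

With a Guzzle response, the call would look like `extractLinks((string) $res->getBody())`. Relative URLs are returned as they appear in the page, so they still need to be resolved against the page address before being requested.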
