In this quick tutorial, we show you how to start scraping websites with Guzzle, a PHP HTTP client, while routing your requests through rotating proxies from anyIP.io (get your account now if you don't have one yet).
Guzzle installation
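The recommended way to install Guzzle is through Composer: run composer require guzzlehttp/guzzle in your project directory, then load the generated vendor/autoload.php file at the top of your script.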
How to use Guzzle
As the Guzzle documentation shows, opening a page with Guzzle is simple: you create a client and send a GET request with it.
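A minimal sketch of such a request, assuming Guzzle 7 installed with Composer. The URL https://example.com/ is only a stand-in for the page you want to scrape, and the timeout value is an arbitrary choice:

```php
<?php
// Composer generates this autoloader when Guzzle is installed.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client([
    'timeout' => 10, // seconds before Guzzle gives up on a request
]);

// Placeholder target: replace it with the page you want to scrape.
$res = $client->request('GET', 'https://example.com/', [
    'http_errors' => false, // return 4xx/5xx responses instead of throwing an exception
]);

echo $res->getStatusCode(), PHP_EOL;              // HTTP status code
echo $res->getHeaderLine('Content-Type'), PHP_EOL; // what kind of content came back
$html = (string) $res->getBody();                 // the raw page content as a string
```

Setting http_errors to false keeps error pages as ordinary responses, so your code can inspect the status code itself instead of catching exceptions.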
To send the request through a proxy, add Guzzle's proxy request option, set to the URL of your proxy.
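Here is a sketch of that option in use. The gateway host, port, username and password below are hypothetical placeholders (take the real values from your anyIP.io account), and it assumes the gateway accepts an HTTP proxy URL with the credentials embedded in it. The helper names buildProxyUrl and fetchThroughProxy are our own:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use Psr\Http\Message\ResponseInterface;

/**
 * Build an authenticated proxy URL (http://user:pass@host:port).
 * Credentials are URL-encoded so characters such as '@' or ':' in a
 * password do not break the URL.
 */
function buildProxyUrl(string $host, int $port, string $user, string $pass): string
{
    return sprintf('http://%s:%s@%s:%d', rawurlencode($user), rawurlencode($pass), $host, $port);
}

/** Send a GET request for $url through the given proxy. */
function fetchThroughProxy(Client $client, string $url, string $proxyUrl): ResponseInterface
{
    return $client->request('GET', $url, [
        'proxy' => $proxyUrl,   // a string proxy is used for every protocol
        'timeout' => 30,        // proxied requests can be slower than direct ones
        'http_errors' => false, // let the caller inspect 4xx/5xx responses
    ]);
}

// Usage (replace the hypothetical placeholders with your own anyIP.io values):
// $res = fetchThroughProxy(new Client(), 'https://example.com/',
//     buildProxyUrl('GATEWAY_HOST', 8080, 'USERNAME', 'PASSWORD'));
```

With a rotating proxy, each request may leave from a different IP address, so the same client can be reused for every page you fetch.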
How to parse the page
The content of the page is available through $res->getBody(), which returns a stream; cast it to a string to get the HTML. Before parsing, check that you actually got the page you asked for: the status code should be 200 and the Content-Type header should be text/html or a similar text type. There are many options for parsing:
- Use a regex
- Use PHP's built-in DOM extension (DOMDocument and DOMXPath)
- Use Simple HTML Dom
- Use Ultimate Web Scraper
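The response checks and the first two options can be sketched with nothing but PHP itself. The helper names isParseableResponse, titleWithRegex and linksWithDom are hypothetical, and accepting only text/html or application/xhtml+xml is an assumption you can loosen:

```php
<?php
/**
 * Decide whether a response is worth parsing: HTTP 200 and an HTML content type.
 * With Guzzle, pass $res->getStatusCode() and $res->getHeaderLine('Content-Type').
 */
function isParseableResponse(int $status, string $contentType): bool
{
    $type = strtolower($contentType);
    return $status === 200
        && (strpos($type, 'text/html') !== false || strpos($type, 'application/xhtml+xml') !== false);
}

/** Option 1: a regex. Quick for one small value, but brittle if the markup changes. */
function titleWithRegex(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m) === 1) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

/** Option 2: PHP's DOM extension, queried with XPath. Returns every link target. */
function linksWithDom(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Typical use with a Guzzle response $res:
// if (isParseableResponse($res->getStatusCode(), $res->getHeaderLine('Content-Type'))) {
//     $links = linksWithDom((string) $res->getBody());
// }
```

The regex version is the shortest, but it fails as soon as the markup differs from what the pattern expects; the DOM version tolerates messy HTML at the cost of a few more lines.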
As a quick introduction to the scraping world, we will use Simple HTML Dom. Once it is installed and initialized with the page's HTML, you can use CSS-style selectors to retrieve the content of your choice:
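A minimal sketch, assuming the classic single-file release of Simple HTML Dom (simple_html_dom.php, which provides str_get_html()) saved next to the script. The HTML string and the selectors are made up for illustration; in practice the string would be the body returned by Guzzle:

```php
<?php
// Assumption: simple_html_dom.php sits next to this script; adjust the path if needed.
require_once __DIR__ . '/simple_html_dom.php';

// Stand-in for (string) $res->getBody().
$html = '<html><body>'
    . '<h1 class="title">Product name</h1>'
    . '<div id="price">19.99</div>'
    . '<a href="/page/2" class="next">Next</a>'
    . '</body></html>';

// Initialize the parser from a string; it returns false for empty or oversized input.
$dom = str_get_html($html);
if ($dom === false) {
    exit("Could not parse the page\n");
}

// find($selector, $index) returns the Nth match or null; without an index it returns an array.
$title = $dom->find('h1.title', 0);
$titleText = $title ? trim($title->plaintext) : null;

$price = $dom->find('#price', 0);
$priceText = $price ? trim($price->plaintext) : null;

$nextLinks = [];
foreach ($dom->find('a.next') as $link) {
    $nextLinks[] = $link->href; // attributes are exposed as properties
}

// Free the memory held by the DOM tree (important when scraping many pages in a loop).
$dom->clear();

echo $titleText, PHP_EOL, $priceText, PHP_EOL, implode(', ', $nextLinks), PHP_EOL;
```

The values are copied into plain variables before clear() is called, because clearing the document also releases the element objects it returned.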