What is the best PHP web content crawler class? #web content crawler
by Adeagbo Moruf Adedeji - 10 years ago (2015-01-09)
Extracting content by passing the URL of a web site
The class should take the URL of a web site, extract the specified content, and save it in a database, continuing until all the related content has been extracted.
2 Recommendations
This class can parse and extract Web page information details.
It can retrieve a Web page from a given URL and parse it to extract details like:
- Page title
- Page head and body
- Meta tags
- Character set
- Links expanded to full path
- Images
- Page headers from H1 through H6
- Internal and external links, checking whether they are broken
- Page elements by class or id value
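To give a rough idea of what extracting a few of the details listed above involves, here is a minimal sketch that uses only PHP's built-in DOM extension. It is not the recommended class's API: the function names `extractPageDetails` and `expandUrl` are hypothetical, the HTML is assumed to be already fetched and non-empty, and the link expansion is deliberately simplified (it ignores ports, `../` segments, and query-only links).

```php
<?php
// Hypothetical helpers for illustration; not part of the recommended package.

// Expand a link to a full URL relative to the page it was found on (simplified).
function expandUrl(string $href, string $baseUrl): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href; // already absolute (or unparsable): leave untouched
    }
    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host'];
    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href; // protocol-relative link
    }
    if (strpos($href, '/') === 0) {
        return $root . $href; // root-relative link
    }
    // Relative link: resolve against the directory of the base path.
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}

// Parse an HTML string and collect the title, meta tags, headings, and links.
function extractPageDetails(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode ? trim($titleNode->textContent) : null;

    $meta = [];
    foreach ($xpath->query('//meta[@name and @content]') as $node) {
        $meta[$node->getAttribute('name')] = $node->getAttribute('content');
    }

    $headings = [];
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $node) {
        $headings[] = ['level' => $node->nodeName, 'text' => trim($node->textContent)];
    }

    $links = [];
    foreach ($xpath->query('//a[@href]') as $node) {
        $links[] = expandUrl($node->getAttribute('href'), $baseUrl);
    }

    return ['title' => $title, 'meta' => $meta, 'headings' => $headings, 'links' => $links];
}
```

Checking whether links are broken would need an extra HTTP request per link, which this sketch leaves out.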
by zinsou A.A.E.Moïse package author 6835 - 7 years ago (2017-12-22)
One may also need this...
PHP Scraper: Extract structured data from remote HTML pages
This class is meant to fetch remote HTML pages and parse them to extract structured information into arrays.
It can take a model that defines the structure of a given page and process it to extract the relevant fields of information.
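The idea of driving extraction from a model can be sketched as follows. This is only an illustration of the approach, not the package's actual model format: the function `scrapeWithModel`, the `item` and `fields` keys, and the sample XPath expressions are all assumptions made for this example.

```php
<?php
// Hypothetical model-driven scraper; the model layout is invented for this sketch.
function scrapeWithModel(string $html, array $model): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $records = [];
    // 'item' selects each repeated block; 'fields' are XPath expressions
    // evaluated relative to that block.
    foreach ($xpath->query($model['item']) as $item) {
        $record = [];
        foreach ($model['fields'] as $name => $expr) {
            // string(...) yields the text of the first match, or '' if none.
            $record[$name] = trim($xpath->evaluate("string($expr)", $item));
        }
        $records[] = $record;
    }
    return $records;
}

// Example model for a page that lists products (assumed markup).
$model = [
    'item'   => '//div[@class="product"]',
    'fields' => [
        'name'  => './/h2',
        'price' => './/span[@class="price"]',
        'url'   => './/a/@href',
    ],
];
```

Keeping the page structure in a data array like this means that a change in the page layout only requires editing the model, not the scraping code.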
by Manuel Lemos 26695 - 10 years ago (2015-01-21)
It seems you want to scrape information from Web pages, but in general the actual scraping configuration depends on the format of the pages you want to scrape.
This class can solve your problem if you pass it a model of the data you want to extract from the pages you want to scrape.
You need to add the scraped content to a database yourself with additional code, as that depends a lot on what you want to store in the database.
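As a rough sketch of that additional code, the example below stores scraped records with PDO and a prepared statement. Everything specific here is an assumption for illustration: the function name `saveRecords`, the `scraped_items` table and its columns, and the use of SQLite (the `INSERT OR REPLACE` syntax is SQLite-specific and the `pdo_sqlite` extension must be available). A real schema would depend on what you want to store.

```php
<?php
// Hypothetical storage helper; table and column names are invented for this sketch.
function saveRecords(PDO $pdo, array $records): int
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS scraped_items (
            url     TEXT PRIMARY KEY,
            title   TEXT NOT NULL,
            content TEXT
        )'
    );

    // Re-scraping the same URL overwrites the earlier row (SQLite syntax).
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO scraped_items (url, title, content)
         VALUES (:url, :title, :content)'
    );

    // One transaction for the whole batch keeps the inserts fast and atomic.
    $pdo->beginTransaction();
    foreach ($records as $record) {
        $stmt->execute([
            ':url'     => $record['url'],
            ':title'   => $record['title'],
            ':content' => $record['content'] ?? null,
        ]);
    }
    $pdo->commit();

    return count($records);
}

// Usage with an in-memory database, so nothing is written to disk.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
saveRecords($pdo, [
    ['url' => 'http://example.com/a', 'title' => 'Page A', 'content' => 'First page'],
]);
```

Using the URL as the primary key is one simple way to avoid duplicate rows when a crawler visits the same page more than once.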