PHP Classes

What is the best PHP web content crawler class?: Extracting content by passing the URL of a web site



by Adeagbo Moruf Adedeji - 10 years ago (2015-01-09)


The class should take the URL of a web site, extract the specified content, and save it in a database, continuing until all the related content has been extracted.

2 Recommendations

Very simple page details: Parse and extract Web page information details

This class can parse and extract Web page information details.

It can retrieve a Web page from a given URL and parse it to extract details like:

- Page title
- Page head and body
- Meta tags
- Character set
- Links expanded to full path
- Images
- Page headers from H1 through H6
- Internal and external links checking if they are broken
- Page elements by class or id value
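
For illustration, several of the details listed above can be pulled out with PHP's built-in DOM extension. This is only a minimal sketch and not the API of the recommended class: the function name `extractPageDetails()`, the sample markup, and the `https://example.com` base URL are assumptions made up for the example, and the link expansion handles root-relative paths only.

```php
<?php
// Hypothetical helper built on PHP's DOM extension (not the package's API).
function extractPageDetails(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally, since real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Naive expansion: only root-relative paths ("/x") are made absolute.
    $expand = function (string $url) use ($baseUrl): string {
        if ($url !== '' && $url[0] === '/' && substr($url, 0, 2) !== '//') {
            return rtrim($baseUrl, '/') . $url;
        }
        return $url;
    };

    $details = ['title' => null, 'meta' => [], 'links' => [], 'images' => [], 'headers' => []];

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $details['title'] = trim($titles->item(0)->textContent);
    }

    // Meta tags that carry a name attribute, keyed by that name.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->hasAttribute('name')) {
            $details['meta'][$meta->getAttribute('name')] = $meta->getAttribute('content');
        }
    }

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('href') !== '') {
            $details['links'][] = $expand($a->getAttribute('href'));
        }
    }

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $details['images'][] = $expand($img->getAttribute('src'));
        }
    }

    // Headers from H1 through H6, grouped by tag name.
    for ($level = 1; $level <= 6; $level++) {
        foreach ($doc->getElementsByTagName('h' . $level) as $header) {
            $details['headers']['h' . $level][] = trim($header->textContent);
        }
    }

    return $details;
}

// Sample markup standing in for a fetched page.
$sampleHtml = '<html><head><title> Demo </title>'
    . '<meta name="description" content="A test page"></head>'
    . '<body><h1>Main</h1><h2>Sub</h2>'
    . '<a href="/about">About</a><a href="https://other.example/x">X</a>'
    . '<img src="/logo.png"></body></html>';

$details = extractPageDetails($sampleHtml, 'https://example.com');
```

For a live page, the markup could be fetched first, for example with `file_get_contents($url)` when URL wrappers are enabled, and then passed in the same way. Checking links for breakage would need an extra HTTP request per link and is left out here.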

by zinsou A.A.E.Moïse (package author), Reputation 6835 - 7 years ago (2017-12-22)

One may also need this...


PHP Scraper: Extract structured data from remote HTML pages

This class is meant to fetch remote HTML pages and parse them to extract structured information into arrays.

It can take a model of the definition of the structure of a given page and process it to clip the relevant fields of information.
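
To make the idea of a page model concrete, here is a rough sketch that uses PHP's built-in `DOMXPath`. It does not reproduce the PHP Scraper class's actual model format: the function `scrapeWithModel()`, the XPath-based model, and the sample product markup are all assumptions invented for the example.

```php
<?php
// Hypothetical model-driven extraction: one XPath query selects the repeated
// items, and a map of field name => relative XPath query describes each field.
function scrapeWithModel(string $html, string $itemQuery, array $fieldQueries): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    foreach ($xpath->query($itemQuery) as $item) {
        $row = [];
        foreach ($fieldQueries as $field => $query) {
            // string(...) yields the text of the first match, or '' if none.
            $row[$field] = trim($xpath->evaluate('string(' . $query . ')', $item));
        }
        $rows[] = $row;
    }

    return $rows;
}

// Sample markup and model, both made up for illustration.
$sampleHtml = '<html><body>'
    . '<div class="product"><h2>Widget</h2><span class="price"> 9.99 </span>'
    . '<a href="/widget">more</a></div>'
    . '<div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>'
    . '</body></html>';

$model = [
    'name'  => './/h2',
    'price' => './/span[@class="price"]',
    'link'  => './/a/@href',
];

$products = scrapeWithModel($sampleHtml, '//div[@class="product"]', $model);
```

Keeping the page structure in a plain array means a different site layout only needs a different model, not different extraction code.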

by Manuel Lemos, Reputation 26695 - 10 years ago (2015-01-21)

It seems you want to scrape information from Web pages, but in general the actual scraping configuration depends on the format of the pages you want to scrape.

This class can solve your problem if you pass it a model of the data you want to extract from the pages you want to scrape.

You need to add the scraped content to a database yourself with additional code, as it depends a lot on what you want to store in the database.
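
As a rough idea of what that additional code could look like, the sketch below stores one row per page with PDO. Everything specific in it is an assumption for the example: the `pages` table and its columns, the function names, the in-memory SQLite database (which requires the pdo_sqlite extension), and the SQLite-only `INSERT OR REPLACE` statement. A real schema depends on what you decide to keep.

```php
<?php
// Hypothetical schema: one row per scraped page, keyed by URL.
function createPagesTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS pages (
            url     TEXT PRIMARY KEY,
            title   TEXT NOT NULL,
            content TEXT NOT NULL
        )'
    );
}

// Insert a page, or overwrite the stored row when the URL was seen before.
// INSERT OR REPLACE is SQLite syntax; other databases need their own upsert.
function savePage(PDO $pdo, string $url, string $title, string $content): void
{
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO pages (url, title, content)
         VALUES (:url, :title, :content)'
    );
    // Bound parameters keep scraped text from being interpreted as SQL.
    $stmt->execute([':url' => $url, ':title' => $title, ':content' => $content]);
}

// In-memory database so the sketch runs without any setup.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

createPagesTable($pdo);
savePage($pdo, 'https://example.com/a', 'First title', 'First body');
savePage($pdo, 'https://example.com/a', 'Updated title', 'Updated body');
savePage($pdo, 'https://example.com/b', 'Other page', 'Other body');
```

Keying the table by URL means that scraping the same page again updates its row instead of creating a duplicate.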

