Custom parsers
You can create custom advanced parsers if you have the necessary technical skills. Note that this section is intended for developers. We don't provide support for customization tasks, but we do offer custom advanced parser development as a service.
Before getting started with the new parser, we recommend enabling debug mode for the External Importer. To do so, add the following line to your wp-config.php:
In this mode, some helpers will be enabled to debug your parser:
Parsing task quick restart button
Names of parsers used
Raw product data
In debug mode, all requests to external websites are cached, so External Importer doesn't have to fetch the same pages again each time you rerun the parser. You can also check the page that was returned to the plugin bot.
All temporary cache files are stored in wp-content/uploads/ei-debug/.
After the parser is finished, remember to disable debug mode and manually delete the wp-content/uploads/ei-debug/ directory with all temporary files.
All custom parsers must be stored in wp-content/ei-parsers/. Create this directory if it doesn't exist. Never edit plugin files directly; otherwise, you risk losing all changes with the next plugin update.
Let's say we want to create a parser for https://www.example.com.
1. Create a file named ExamplecomAdvanced.php in wp-content/ei-parsers/.
2. Create a class named ExamplecomAdvanced that extends AdvancedParser.
3. Declare the namespace and specify the domain name in the comment block, as shown in the following example.
4. Now you need to implement methods to extract product data from the target website:
parseLinks()
parsePagination()
parseTitle()
parseDescription()
parsePrice()
parseOldPrice()
parseImage()
parseImages()
parseManufacturer()
parseInStock()
parseCategoryPath()
parseCurrencyCode()
parseFeatures()
parseReviews()
We recommend studying the implementation of the parsers bundled with the plugin in external-importer/application/libs/pextractor/parser/advanced/parsers/.
In most cases, you'll use XPath queries:
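To get a feel for XPath extraction before writing parser methods, here is a minimal standalone sketch. It uses PHP's built-in DOMDocument and DOMXPath classes rather than the plugin's own XPath helper. The markup, the class names in the queries, and the xpathFirst() helper are all assumptions made for illustration.

```php
<?php
// Standalone sketch: PHP's built-in DOM extension, NOT the plugin's XPath helper.

/**
 * Hypothetical helper: trimmed text of the first node matching $query, or null.
 */
function xpathFirst(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical product page markup (assumption, not taken from a real site).
$html = <<<'HTML'
<html><body>
  <h1 class="product-title"> Blue Widget </h1>
  <span class="price">19.99</span>
  <a class="product-link" href="/p/1">One</a>
  <a class="product-link" href="/p/2">Two</a>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Single values, as a title or price method would return.
$title = xpathFirst($xpath, "//h1[@class='product-title']");
$price = xpathFirst($xpath, "//span[@class='price']");

// Multiple values, as a links method would return for a listing page.
$links = [];
foreach ($xpath->query("//a[@class='product-link']/@href") as $attr) {
    $links[] = $attr->nodeValue;
}
```

Testing queries against a saved copy of the target page this way can make it easier to settle on selectors before moving them into the parser class.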
If you need to use regular expressions, this is how to access the HTML source:
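Regular expressions are mostly useful for values embedded in inline scripts, which XPath cannot reach. The sketch below shows the general pattern with PHP's preg_match(). The $html variable stands in for the page source however you obtain it inside the parser, and the sample markup and the matchFirst() helper are assumptions made for illustration.

```php
<?php
// Hypothetical HTML source; in a real parser this would be the fetched page.
$html = '<script>var product = {"sku":"AB-123","price":"24.50","currency":"USD"};</script>';

/**
 * Hypothetical helper: first capture group of $pattern in $html, or null.
 */
function matchFirst(string $pattern, string $html): ?string
{
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Pull values out of the inline JSON-like object.
$price = matchFirst('/"price"\s*:\s*"([\d.]+)"/', $html);
$currency = matchFirst('/"currency"\s*:\s*"([A-Z]{3})"/', $html);
```

Returning null when nothing matches keeps a missing value distinguishable from an empty one.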
Advanced parsers take priority when extracting data. You don't have to implement every method, though: implement only the ones you need and let the default structured parsers extract the rest.
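One way to picture this priority rule is as a merge in which values from the advanced parser override structured data only where the advanced parser returned something. The sketch below is a conceptual illustration only, not the plugin's code: the mergeProductData() function, the field names, and the assumption that empty results fall through to structured data are all hypothetical.

```php
<?php
// Conceptual illustration only; this is NOT how the plugin is implemented.

/**
 * Hypothetical merge: advanced values win, but only when they are non-empty.
 */
function mergeProductData(array $structured, array $advanced): array
{
    // Drop fields the advanced parser did not produce.
    $advanced = array_filter($advanced, function ($value) {
        return $value !== null && $value !== '' && $value !== [];
    });
    // With string keys, values from later arrays overwrite earlier ones.
    return array_merge($structured, $advanced);
}

// Data found by default structured parsers (hypothetical values).
$structured = ['title' => 'Blue Widget', 'price' => '19.99', 'currencyCode' => 'USD'];
// The custom parser implements only the title; the price came back empty.
$advanced = ['title' => 'Blue Widget (2-pack)', 'price' => null];

$product = mergeProductData($structured, $advanced);
```

Here the title comes from the custom parser, while the price and currency code fall back to the structured data.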