====== scrape Plugin ======

---- plugin ----
description: Include HTML parts from other websites in the wiki
author     : Andreas Gohr 
email      : dokuwiki@cosmocode.de
type       : syntax
lastupdate : 2023-09-14
compatible : 
depends    : 
conflicts  : 
similar    : 
tags       : include, html, jquery

downloadurl: https://github.com/cosmocode/dokuwiki-plugin-scrape/zipball/master
bugtracker : https://github.com/cosmocode/dokuwiki-plugin-scrape/issues
sourcerepo : https://github.com/cosmocode/dokuwiki-plugin-scrape/
donationurl: 

screenshot_img : 
----

This plugin lets you include HTML scraped from another website. The part to include is specified by a jQuery-like selector expression. To prevent abuse, all HTML is purified to strip malicious code, and only whitelisted URLs can be scraped.

===== Installation =====

[[https://www.cosmocode.de/en/open-source/dokuwiki-plugins/|{{ http://cosmocode.de/static/img/dokuwiki/dwplugins.png?recache|A CosmoCode Plugin}}]]

Search for and install the plugin using the [[plugin:extension|Extension Manager]]. Refer to [[:Plugins]] for how to install plugins manually.

===== Configuration =====

Every URL that should be scrapable must be matched by a regular expression defined in the plugin's configuration. URLs that do not match are not scraped.
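
As a rough illustration only, a pattern that whitelists a single site over HTTP and HTTPS could look like the following. The host ''example.com'' is a placeholder, and this page does not state whether the plugin expects delimiters or anchors around the pattern, so treat this as a sketch and adapt it to what the configuration setting actually accepts:

  ^https?://(www\.)?example\.com/

Escaping the dots and ending the host part with a slash keeps the pattern from also matching look-alike hosts such as ''example.com.evil.test''.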

===== Syntax/Usage =====

The general syntax is:

  {{scrape>url query|title}}

  * **url** is the URL of the website you want to scrape. It must be matched by the regular expression given in the config.
  * **query** is a ''querySelectorAll()''-like CSS selector that selects one or more page elements on the given website. When the query ends with a ''~'', the innerHTML of each match is used; otherwise the matched wrapping element itself is part of the output. When no query is given, ''body ~'' is used.
  * **title** is only used when your query matches the URL of an image file. In that case the image is embedded and the given title is added. The title is optional.

Example, including every paragraph (''p'' element) of the page:

  {{scrape>https://example.com p}}
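
A few further variations may help to show how the parts of the syntax combine. These are sketches based on the rules above: the ''#content'' selector and the image path are hypothetical, and the image form with no query is an assumption about how the title part is meant to be written. All URLs must still be matched by the configured regular expression.

Include only the inner HTML of the element with the ID ''content'', without the wrapping element itself:

  {{scrape>https://example.com #content ~}}

Omit the query, which is equivalent to using ''body ~'':

  {{scrape>https://example.com}}

Embed an image and give it a title:

  {{scrape>https://example.com/logo.png|Example logo}}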