plugin:aichat
email      : dokuwiki@cosmocode.de
type       : action, helper, syntax, CLI
lastupdate : 2024-05-16
compatible : Jack Jackrum, Kaos
depends    : sqlite
conflicts  :
similar    :
tags       : !experimental, openai, chatgpt, llm, ai, mistral, anthropic

downloadurl: https://github.com/cosmocode/dokuwiki-plugin-aichat/zipball/master
sourcerepo : https://github.com/cosmocode/dokuwiki-plugin-aichat/
donationurl:

updatemessage: Configuration changed. Read docs before upgrading!

screenshot_img : plugin:aichat.png

[[https://www.cosmocode.de/en/open-source/dokuwiki-plugins/|{{ https://www.cosmocode.de/static/img/dokuwiki/dwplugins.png?recache|A CosmoCode Plugin}}]]
  
This plugin adds the ability to chat with a bot in your wiki, powered by a Large Language Model (LLM), often called "artificial intelligence" (AI). It uses OpenAI's ChatGPT or other APIs to answer the user's natural language questions. The plugin provides the bot with context extracted from your wiki pages, so the bot can answer questions directly related to your content. This mechanism is also known as Retrieval-Augmented Generation (RAG).
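The general RAG flow can be sketched in a few lines. This is an illustrative Python sketch, not the plugin's PHP implementation; the toy word-count "embedding" stands in for a real embedding model:

```python
# Illustrative sketch of Retrieval-Augmented Generation (RAG), not the
# plugin's PHP code. The toy "embedding" is a word-count vector; the real
# plugin asks an embedding model for the vectors instead.
from collections import Counter
from math import sqrt

def embed(text):
    """Toy embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Index time: split wiki pages into chunks and store their embeddings.
chunks = {
    "wiki:backup": "run the backup script every night",
    "wiki:install": "install the plugin via the extension manager",
}
store = {page: embed(text) for page, text in chunks.items()}

# 2. Question time: embed the question and rank stored chunks by similarity.
question = "how do I install this plugin"
qvec = embed(question)
ranked = sorted(store, key=lambda page: cosine(qvec, store[page]), reverse=True)

# 3. The best matching chunks become the context the chat model answers from.
prompt = f"Answer using only this context:\n{chunks[ranked[0]]}\n\nQuestion: {question}"
print(ranked[0])  # → wiki:install
```

In the real plugin the vectors come from the configured embedding model, and the assembled prompt is sent to the configured chat model.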
  
You can read an [[https://forum.dokuwiki.org/d/21178-experimental-plugin-aichat-chat-with-your-wiki-pages-using-chatgpt|introductory post]] about how this plugin works in the forum. There's also a blog post with more background information on how the [[https://www.splitbrain.org/blog/2023-08/15-using_sqlite_as_vector_store_in_php|clustering in SQLite]] works.
  
===== Installation =====

:!: This plugin is experimental. Implementation details may change and require manual intervention on future updates.
  
:!: The use of this plugin creates API costs. Make sure you are aware of the pricing details of your selected API providers and of your usage patterns.
  
:!: This plugin requires PHP 8.1 or higher. Installing it on systems with lower versions will "crash" your wiki.
===== Configuration =====
  
If you need help with installing, configuring and fine-tuning this plugin, feel free to [[https://www.cosmocode.de/en/services/wiki/dokuwiki-ai/|contact us]] for a quote.
  
Use the [[config|Configuration Manager]] to adjust the settings described below.
  
==== 🧠 Models ====
  
Models are used to create embeddings (vectors describing your content), to rephrase questions and to actually answer them. The plugin currently supports models from different providers. You need to configure their respective API keys and other credentials to use them. Of course, you only need API keys for those providers whose models you want to use.
  
  * [[https://platform.openai.com/account/api-keys|OpenAI API keys]]
  * [[https://console.anthropic.com/settings/keys|Anthropic API keys]]
  * [[https://console.mistral.ai/api-keys/|Mistral API keys]]
  * [[https://dash.voyageai.com/api-keys|Voyage API keys]]
  * [[https://platform.reka.ai/apikeys|Reka API keys]]
  * [[https://console.groq.com/keys|Groq API keys]]
  
  
You can use different models for embeddings, rephrasing and chat. When changing the embedding model you need to rebuild the vector storage (see below). The rephrasing model is used to interpret a given question when selecting possible source documents from your wiki.
  
An overview of prices, properties and a short description of each model can be seen by running the ''models'' CLI command. Prices shown there may not be correct! Always check with the model provider's pricing list!
  
Which model is best for your use case depends on various factors and requires some experimentation. Contact CosmoCode for help with picking and configuring the right solution for you.
  
==== 📥 Vector Storage ====
  
The plugin needs to store embedding vectors to do semantic similarity searches. There are multiple options available; you only need to configure one.
  
The default uses a local SQLite database. This does not need any additional configuration. However, the more wiki pages you have, the longer it takes to search through all available embeddings. To mitigate this, this storage mechanism uses a clustering approach, separating similar page chunks into clusters and then searching only the nearest cluster when looking for similar chunks.
  
Clusters are automatically created when the embedding index is built the first time. However, when your wiki grows a lot, topics change drastically or you add a new language, the initial clusters may no longer be a good fit. In that case you should run the ''maintenance'' CLI command to rebuild the clusters.
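The clustering idea can be illustrated with a small sketch (illustrative Python with made-up 2-d vectors, not the plugin's actual SQLite implementation):

```python
# Why clustering speeds things up: instead of comparing the question vector
# against every stored chunk, we first pick the nearest cluster centroid and
# only scan the chunks inside that cluster. Illustrative sketch only, with
# made-up 2-d vectors; real embeddings have hundreds of dimensions.
from math import dist  # Euclidean distance, Python 3.8+

clusters = {
    "cluster-a": {"centroid": (0.0, 0.0),
                  "chunks": {"page1#0": (0.1, 0.2), "page2#0": (0.2, 0.1)}},
    "cluster-b": {"centroid": (5.0, 5.0),
                  "chunks": {"page3#0": (4.9, 5.2)}},
}

def nearest_chunk(query):
    # Step 1: one cheap comparison per cluster to find the closest centroid.
    best = min(clusters, key=lambda c: dist(query, clusters[c]["centroid"]))
    # Step 2: scan only that cluster's chunks instead of the whole wiki.
    members = clusters[best]["chunks"]
    return min(members, key=lambda chunk: dist(query, members[chunk]))

print(nearest_chunk((5.0, 4.8)))  # → page3#0
```

This also shows why stale clusters hurt: if new content no longer fits the original centroids, the nearest cluster may not contain the truly closest chunk, which is what rebuilding the clusters via the ''maintenance'' command fixes.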
  
The other storage options are dedicated vector storage solutions that each need their own configuration:
  
-Once you set the ''pinecone_baseurl''the plugin will use Pinecone for storage. Continue with the initial setup below.+  * [[https://www.pinecone.io/|PineCone]] a SaaS vector database with a free tier. After signing up, you need to create an index and give it a dimension matching your embed model (see ''models'' command) eg. ''1536''. As metric select ''cosine''. Your ''pinecone_baseurl'' can be found in the list of indexes. It should look like ''https://myname-something.svc.gcp-starter.pinecone.io'' or similar. 
 +  * [[https://www.trychroma.com/|Chroma]] is a vector storage you can host yourselfChroma has to be run in [[https://docs.trychroma.com/deployment|Server-Mode]]. 
 +  * [[https://qdrant.tech/|Qdrant]] can either be self-hosted or used as SaaS with a free tierAfter "creating an API", the shown curl snippet contains your baseurl and apikey
  
  
==== Finetuning ====
  
The plugin offers a few options to fine-tune how vector data is stored and how the LLM is queried.
  
  * chunkSize - The maximum size (in tokens) of the chunks your wiki pages will be split into.
  * similarityThreshold - The minimum percentage by which documents need to match a given question. Different embedding models return different similarity distances. Increase this value if the bot uses too many irrelevant sources, lower it if the bot finds too few relevant sources.
  * contextChunks - The maximum number of chunks to use as context when answering a question. More chunks also mean larger context and higher cost!
  * rephraseHistory - How many question/answer pairs of the previous conversation should be taken into account when interpreting the given question for document retrieval? More history means larger context and higher cost!
  * chatHistory - How many question/answer pairs of the previous conversation should be sent as additional context when answering questions? More history means larger context and higher cost!
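How ''similarityThreshold'' and ''contextChunks'' interact can be sketched like this (illustrative Python with made-up similarity scores and chunk ids, not the plugin's internals):

```python
# Sketch of the retrieval filtering: the threshold drops weak matches, the
# chunk limit caps how many of the remaining chunks are sent to the model.
# Values and chunk ids are made up for illustration.
similarity_threshold = 50  # percent; matches below this are ignored
context_chunks = 2         # at most this many chunks become context

# (chunk id, similarity in percent) as a retrieval step might return them
matches = [("a#0", 91), ("b#1", 74), ("c#0", 62), ("d#2", 40)]

relevant = [m for m in matches if m[1] >= similarity_threshold]
best_first = sorted(relevant, key=lambda m: m[1], reverse=True)
context = [chunk for chunk, _ in best_first[:context_chunks]]
print(context)  # → ['a#0', 'b#1']
```

Raising the threshold to 75 here would leave only ''a#0''; lowering the chunk limit to 1 would send just the single best chunk. More context chunks mean better grounding but larger prompts and higher cost.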
  
  
All chat conversations can optionally be logged to DokuWiki's logging facilities by enabling the ''logging'' option. They will show up as a new tab in the [[plugin:logviewer|LogViewer]].
  
Access to the chat functionality can be restricted to users and groups with the ''restrict'' option. Simply list them comma-separated. Prefix groups with an ''@'' as usual. By default, everyone may access.
  
===== Initial Setup =====
  
Once the plugin has been configured, the vector store has to be initialized. This is done via the command line interface.
  
  
Updates to the storage can be done by rerunning the command.
  
Depending on the size of your wiki this can take a while. Whenever you change certain settings you will need to completely clear and rebuild the storage. Settings that require a reindex are marked with 🔄 in the config manager. Use the clear parameter for that:
  
  php bin/plugin.php aichat embed --clear

  php bin/plugin.php aichat info
  
Shows a few statistics on the currently configured vector store and models.
  
  php bin/plugin.php aichat maintenance
  
  * Does this work on multilingual content?
    * It seems to work just fine. Most LLMs can work with many languages and will automatically match the language of the user's question to content in the same language.
  * Can I adjust the prompts used?
    * Yes, you'll find them in the ''lang/en/*.prompt'' files. Refer to [[https://www.dokuwiki.org/localization#changing_some_localized_texts_and_strings_in_your_installation|Localizing Texts]] on how to override them.
  * Will the chat bot hallucinate?
    * Yes. Even though the prompt asks the model not to, it has a tendency to state things with confidence that are not entirely true. Remember this is not real intelligence or understanding. It is just clever text prediction. However, as long as the question covers a topic that does have an answer in your wiki, the results are generally pretty good.
  * Will the bot answer questions that are unrelated to the wiki contents?
    * That might happen. No answers will be given if no fitting sources can be found at all -- check the ''similarityThreshold'' option. But often sources are found that are just slightly above the cut-off yet do not answer the question. The bot will then fall back to its LLM roots and happily answer completely off-topic questions. So far, attempts to avoid this with prompting have failed.
  * Can I use a different model? Or even run my own?
    * Models can be added relatively easily. Again, contact us and we can work something out.
  * The bot never seems to find any sources?
    * Check the ''similarityThreshold'' setting. You might need to lower it for some models.
  
  
===== Development =====
  
Developers can use this plugin in their own plugins. Via the helper component you get access to an implementation of AbstractModel, which gives you a Client to interact with the current model provider. You also get access to the Embedding object, which lets you do similarity searches on the wiki content.

===== See also =====

Forum thread: [[https://forum.dokuwiki.org/d/21178-experimental-plugin-aichat-chat-with-your-wiki-pages-using-chatgpt|Experimental Plugin: AIChat - chat with your wiki pages using ChatGPT]]
  
  
plugin/aichat.1706544420.txt.gz · Last modified: 2024-01-29 17:07 by andi

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International