Looking for Mental Health Support Datasets for building a Multi-turn Chatbot

thanhcao · September 14, 2024, 9:32am

Hi everyone,

I’m currently developing a multi-turn chatbot aimed at helping users manage anxiety, depression, and other mental health challenges. I’m seeking datasets of mental health support conversations to train the chatbot. So far, I have found two datasets: ESConv and AugESC.
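To use ESConv with a multi-turn chat template, the conversations usually have to be converted into alternating user/assistant messages. The sketch below assumes each ESConv conversation is a dict with a `dialog` list whose items carry `speaker` (`"seeker"` or `"supporter"`) and `content` fields. Please check that against the copy you download. The function name `esconv_to_messages` is hypothetical.

```python
from typing import Any

# Assumed ESConv layout: {"dialog": [{"speaker": "seeker" | "supporter", "content": str}, ...]}
ROLE_MAP = {"seeker": "user", "supporter": "assistant"}


def esconv_to_messages(conversation: dict[str, Any]) -> list[dict[str, str]]:
    """Convert one ESConv-style conversation into chat messages.

    Consecutive utterances by the same speaker are merged into one message,
    because most chat templates expect user and assistant turns to alternate.
    """
    messages: list[dict[str, str]] = []
    for turn in conversation["dialog"]:
        role = ROLE_MAP[turn["speaker"]]
        text = turn["content"].strip()
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"] += " " + text  # same speaker again: append
        else:
            messages.append({"role": role, "content": text})
    return messages


if __name__ == "__main__":
    sample = {
        "dialog": [
            {"speaker": "seeker", "content": "I can't sleep before exams."},
            {"speaker": "seeker", "content": "I keep worrying I'll fail."},
            {"speaker": "supporter", "content": "That sounds exhausting."},
        ]
    }
    print(esconv_to_messages(sample))
```

If the file is a JSON list of such conversations, loading it with `json.load` and mapping `esconv_to_messages` over the list would give one message list per conversation.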

However, these kinds of datasets seem to be quite rare. Could anyone recommend similar mental health conversation datasets or point me to other resources? I’d greatly appreciate any help or suggestions!

Thank you in advance!

John6666 · September 14, 2024, 12:35pm

I’ve seen some of these while searching for other datasets, but I don’t remember where…
There seem to be a few mental health LLMs out there, so why not try to find the datasets that were used to train them?
Searching for “GGUF” will bring up a lot of LLMs. Then add relevant keywords to narrow the results down.

Someone may also have put together a collection. If you can find a good one, that will save you a lot of time.

https://huggingface.co/victunes/TherapyBeagle-11B-v2
https://huggingface.co/victunes/TherapyLlama-8B-v1
https://huggingface.co/models?sort=modified&search=gguf
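If you want to repeat this search from a script, a minimal sketch could look like the following. It assumes the `huggingface_hub` package’s `list_models` function with its `search` and `limit` arguments. The helper names `narrow` and `search_hub`, and the choice to search for the topic word first and filter for “gguf” locally, are illustrative, not a Hub feature.

```python
from typing import Iterable


def narrow(model_ids: Iterable[str], required: str, any_of: Iterable[str]) -> list[str]:
    """Keep IDs containing `required` and at least one `any_of` term (case-insensitive)."""
    req = required.lower()
    topics = [t.lower() for t in any_of]
    return [m for m in model_ids if req in m.lower() and any(t in m.lower() for t in topics)]


def search_hub(topics: Iterable[str], required: str = "gguf", limit: int = 200) -> list[str]:
    """Search the Hub once per topic term, then filter the IDs locally for `required`.

    Needs network access and the `huggingface_hub` package. The import lives here
    so that `narrow` can be used (and tested) offline.
    """
    from huggingface_hub import list_models

    topic_list = list(topics)
    seen: dict[str, None] = {}  # insertion-ordered set of model IDs
    for topic in topic_list:
        for info in list_models(search=topic, limit=limit):
            seen.setdefault(info.id, None)
    return narrow(seen, required, topic_list)
```

For example, `search_hub(["therapy", "mental", "counsel"])` would return repo IDs that mention “gguf” and one of those words. Since Hub search appears to match on repo names, models whose IDs don’t contain these words will still be missed.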

P.S.

Here is a user I found with the method above. They seem to have published datasets as well, so this is one way to find them.
https://huggingface.co/CalebE

By the way, here is an example of what a collection is: my HF utility collection, which also includes Spaces that help with searching for datasets. You might find something like that useful.
https://huggingface.co/collections/John6666/spaces-for-model-space-useful-utilities-in-hugging-face-6685598385e2e2adac9d35a2

P.S.

This is a last resort, or in a sense the most legitimate approach: if you find someone who seems knowledgeable, you can open a Discussion on their relevant repo and ask them directly. That can shorten the whole process in one go.

thanhcao · September 14, 2024, 3:05pm

Thank you so much <3

John6666 · September 14, 2024, 10:20pm

You’re welcome. It seems like a worthwhile project.
I hope this helps.
HF’s search function misses quite a few results, so it might be faster to search with Google.
https://www.google.co.jp/search?q=mental+site%3Ahuggingface.co
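The query in that link is just the search words plus Google’s `site:` operator. If you want to try many keyword variants, a small sketch like this builds the same kind of URL with Python’s standard `urllib.parse.urlencode`. The function name `site_search_url` and the default `google.com` host are illustrative choices.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, site: str = "huggingface.co",
                    base: str = "https://www.google.com/search") -> str:
    """Build a web-search URL restricted to one site with the `site:` operator."""
    query = urlencode({"q": f"{terms} site:{site}"})  # spaces -> '+', ':' -> '%3A'
    return f"{base}?{query}"


if __name__ == "__main__":
    for term in ["mental", "counseling dialogue", "empathetic dialogue"]:
        print(site_search_url(term))
```

Passing `base="https://www.google.co.jp/search"` with the term `mental` reproduces the link above exactly.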

thanhcao · September 15, 2024, 2:04am

I really appreciate your guidance. I’ll try your suggestion, and if I have any further questions, I hope it’s okay to ask for your advice again. Your help means a lot! ^^

mohsinali046 · September 18, 2024, 7:13pm

Hi thanhcao,
I am currently working on a research project on fine-tuning a Llama 2 model with the LoRA and QLoRA techniques, using the dataset “Amod/mental health counseling dataset”. However, I noticed that there is no paper or detailed documentation associated with the dataset.

Could you please provide any additional information or resources related to this dataset? If you are aware of any related research papers or documentation, I would greatly appreciate it.
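For context on why LoRA keeps this kind of fine-tuning affordable, here is a rough worked count of the trainable parameters. It assumes Llama 2 7B’s shape (hidden size 4096, 32 decoder layers) and adapters on only `q_proj` and `v_proj`, a common setup. Other target modules or ranks change the number. Each adapted d_out × d_in weight gets two low-rank factors, A (r × d_in) and B (d_out × r), so it adds r(d_in + d_out) parameters.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """Parameters added by LoRA: A is (rank x d_in), B is (d_out x rank), per adapted matrix."""
    return n_matrices * rank * (d_in + d_out)


# Assumed Llama 2 7B shapes: hidden size 4096, 32 decoder layers.
# Adapting q_proj and v_proj (each 4096 x 4096) gives 2 matrices per layer.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

trainable = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, rank=8, n_matrices=2 * NUM_LAYERS)
print(f"{trainable:,}")  # 4,194,304
```

That is well under 0.1% of the roughly 6.7 billion base parameters. QLoRA trains the same kind of adapters but keeps the frozen base weights in 4-bit (NF4) precision, so the adapter count is unchanged and the savings come from the base model’s memory footprint.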

thanhcao · September 21, 2024, 4:01am

Hi Mohsinali046,
I came across the GitHub page associated with the dataset you’re interested in. The raw data appears to be available there in CSV format: Counsel Chat Dataset. Judging from the CSV file, the data originates from counseling questions asked on the Counsel Chat website, such as this example.
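If you want to rebuild prompt/response pairs from that CSV yourself, a minimal sketch with the standard `csv` module could look like this. The column names (`questionID`, `questionTitle`, `questionText`, `answerText`, `upvotes`) are assumptions based on the Counsel Chat CSV, so compare them with the file’s header row. Keeping only the most-upvoted answer per question is one possible choice, not necessarily how the Hugging Face dataset was built.

```python
import csv
import io
from typing import IO

# Assumed Counsel Chat column names; check them against the CSV header row.
Q_ID, TITLE, TEXT, ANSWER, UPVOTES = (
    "questionID", "questionTitle", "questionText", "answerText", "upvotes",
)


def top_answer_pairs(csv_file: IO[str]) -> list[dict[str, str]]:
    """One prompt/response pair per question, keeping the most-upvoted answer."""
    best: dict[str, tuple[int, dict[str, str]]] = {}
    for row in csv.DictReader(csv_file):
        votes = int(row[UPVOTES] or 0)
        prompt = f"{row[TITLE].strip()}\n{row[TEXT].strip()}".strip()
        pair = {"prompt": prompt, "response": row[ANSWER].strip()}
        qid = row[Q_ID]
        # Replace the stored answer only if this one has strictly more upvotes.
        if qid not in best or votes > best[qid][0]:
            best[qid] = (votes, pair)
    return [pair for _, pair in best.values()]


if __name__ == "__main__":
    sample = io.StringIO(
        "questionID,questionTitle,questionText,answerText,upvotes\n"
        "1,Can't sleep,I worry at night.,Try a wind-down routine.,2\n"
        "1,Can't sleep,I worry at night.,Consider talking to a therapist.,5\n"
    )
    print(top_answer_pairs(sample))
```

For a local copy of the file, open it with `newline=""` (as the `csv` documentation recommends) and pass the file object to `top_answer_pairs`.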

Let me know if you need further assistance!

DinoDS · March 4, 2026, 3:15pm


Hi, good question, and you’re right: high-quality mental health dialogue data is rare for privacy and safety reasons.

A few directions that usually help:

1. Broaden search terms beyond “mental health chatbot”.
   Try “empathetic dialogue”, “supportive conversation”, “counseling dialogue”, “distress support”, “crisis counseling”, and “peer support forum”.
2. Look at adjacent dialogue datasets.
   Even if they are not strictly clinical, empathetic and supportive conversation datasets can work well for training multi-turn responses; you then add your own safety rules for self-harm or crisis escalation.
3. Consider a hybrid approach.
   Use RAG over vetted mental health resources for factual guidance, and use conversation data mainly to teach tone, reflection, and gentle clarifying questions. This reduces the risk of the model inventing advice.
4. Safety note.
   If you deploy for anxiety or depression, plan for guardrails. Clear disclaimers, crisis escalation paths, and refusal behaviors for self-harm content are important.
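The guardrail point can be made concrete with a routing step that runs before the model sees a message. This is a minimal sketch: the keyword patterns, `route_message`, `respond`, and the reply text are all hypothetical placeholders. Keyword matching misses paraphrases and produces false positives, so a real system would likely pair it with a trained classifier and clinically reviewed escalation wording.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical, deliberately small pattern list; not a clinically validated screen.
CRISIS_PATTERNS = [
    r"\bkill(?:ing)? myself\b",
    r"\bsuicid\w*",
    r"\bend(?:ing)? my life\b",
    r"\bself[- ]harm\w*",
    r"\bhurt(?:ing)? myself\b",
]
CRISIS_RE = re.compile("|".join(CRISIS_PATTERNS), re.IGNORECASE)

# Placeholder wording; real escalation text should point to region-specific services.
CRISIS_REPLY = (
    "It sounds like you might be in danger. Please contact your local emergency "
    "number or a crisis line now. You don't have to go through this alone."
)


@dataclass
class Route:
    escalate: bool
    matched: Optional[str] = None


def route_message(text: str) -> Route:
    """Flag a user turn for the crisis path if any pattern matches."""
    match = CRISIS_RE.search(text)
    return Route(escalate=True, matched=match.group(0)) if match else Route(escalate=False)


def respond(text: str, generate: Callable[[str], str]) -> str:
    """Send ordinary turns to the model; short-circuit flagged turns to a fixed reply."""
    if route_message(text).escalate:
        return CRISIS_REPLY
    return generate(text)
```

Keeping the escalation reply outside the model means flagged turns never depend on what the fine-tuned model happens to generate.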

Quick question so people can give better pointers: are you aiming for general supportive coaching or clinical-style counseling, and do you need English-only or multilingual data?
