I’m currently developing a multi-turn chatbot aimed at helping users manage anxiety, depression, and other mental health challenges. I’m seeking datasets related to mental health support conversations to train the chatbot. So far, I have found two datasets: ESConv and AUGESC.
However, these kinds of datasets seem to be quite rare. Could anyone recommend similar mental health conversation datasets or point me to other resources? I’d greatly appreciate any help or suggestions!
I’ve seen some of these in my search for other datasets, but I don’t remember where…
There seem to be a few mental health LLMs out there, so why not look for the datasets that were used to train them?
Searching for GGUF will turn up a lot of LLMs. Then add mental-health-related keywords to narrow the results down.
Someone may have already put together a collection. If you can find a good one, it will save you a lot of time.
As a last resort, though it is a perfectly legitimate approach, if you find someone who seems to know a lot, you can open a Discussion on the relevant repo and ask them directly. That can shorten the search considerably.
I really appreciate your guidance. I’ll try your suggestion, and if I have any further questions, I hope it’s okay to ask for your advice again. Your help means a lot! ^^
Hi thanhcao,
I am currently working on a research project focused on fine-tuning the Llama 2 model with LoRA and QLoRA. I am using the dataset “Amod/mental health counseling dataset”. However, I noticed that there is no paper or detailed documentation associated with the dataset.
Could you please provide any additional information or resources related to this dataset? If you are aware of any related research papers or documentation, I would greatly appreciate a pointer.
Hi Mohsinali046,
I came across the GitHub page associated with the dataset you’re interested in. It seems the raw data can be found in CSV format on this GitHub page: Counsel Chat Dataset. After reviewing the CSV file, it looks like the data originates from counseling questions asked on the Counsel Chat website, such as this example.
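If it helps, here is a minimal sketch of turning a CSV like that into prompt/response pairs for fine-tuning. The column names `questionText` and `answerText` are my assumption, so check the header row of the actual file first, and the rows below are placeholders rather than real data.

```python
import csv
import io

# Assumed column names; verify them against the real CSV header.
QUESTION_COL = "questionText"
ANSWER_COL = "answerText"


def rows_to_pairs(csv_file, question_col=QUESTION_COL, answer_col=ANSWER_COL):
    """Return a list of {"prompt", "response"} dicts, skipping incomplete rows."""
    pairs = []
    for row in csv.DictReader(csv_file):
        question = (row.get(question_col) or "").strip()
        answer = (row.get(answer_col) or "").strip()
        if question and answer:
            pairs.append({"prompt": question, "response": answer})
    return pairs


# Placeholder rows standing in for the real file.
sample = io.StringIO(
    "questionText,answerText\n"
    "placeholder question 1,placeholder answer 1\n"
    "placeholder question 2,\n"
)
pairs = rows_to_pairs(sample)
print(len(pairs))  # the second row has no answer, so it is skipped
```

For the real file you would pass an open file handle instead, for example `with open(path, newline="", encoding="utf-8") as f: pairs = rows_to_pairs(f)`.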
Look at adjacent dialogue datasets
Even if they are not strictly clinical, empathetic and supportive conversation datasets can work well for training multi-turn responses. You can then add your own safety rules for self-harm or crisis escalation.
Consider a hybrid approach
Use RAG with vetted mental health resources for factual guidance, and use conversation data mainly to learn tone, reflection, and asking gentle clarifying questions. This reduces the risk of the model inventing advice.
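A rough sketch of the retrieval half of that idea, using only the standard library. The word-overlap ranking is a stand-in for a real embedding search, and the snippets, function names, and prompt wording are all hypothetical placeholders, not vetted content.

```python
import re

# Placeholder snippets; in practice these would come from vetted sources.
VETTED_SNIPPETS = [
    "Placeholder: vetted guidance about sleep routines.",
    "Placeholder: vetted guidance about breathing exercises.",
    "Placeholder: vetted guidance about seeking professional help.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, snippets, top_k=1):
    """Rank snippets by word overlap with the query (stand-in for embedding search)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        snippets,
        key=lambda s: len(query_tokens & tokenize(s)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(user_message, snippets):
    """Put the retrieved notes in front of the user message."""
    context = "\n".join(f"- {s}" for s in retrieve(user_message, snippets))
    return (
        "Use only the reference notes below for factual guidance.\n"
        f"Reference notes:\n{context}\n\n"
        f"User: {user_message}\nAssistant:"
    )


prompt = build_prompt("I have trouble with sleep lately", VETTED_SNIPPETS)
print(prompt)
```

The point of the split is that facts come from the retrieved notes, while the fine-tuned model only has to supply the tone.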
Safety note
If you deploy for anxiety or depression, plan for guardrails. Clear disclaimers, crisis escalation paths, and refusal behaviors for self-harm content are important.
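As a very rough illustration of an escalation path, the sketch below screens each message before it reaches the model. The pattern list and the escalation text are hypothetical and deliberately incomplete. Keyword matching alone will miss many cases, so treat this as a starting point to layer under a proper classifier and human review, not as a finished safeguard.

```python
import re

# Hypothetical and deliberately incomplete; a real list needs expert review.
CRISIS_PATTERNS = [
    r"\bkill myself\b",
    r"\bsuicid",
    r"\bself[- ]harm\b",
    r"\bend my life\b",
]

# Placeholder wording; a real deployment would point to local crisis resources.
ESCALATION_MESSAGE = (
    "It sounds like you may be in crisis. Please contact local emergency "
    "services or a crisis line right now."
)


def needs_escalation(message):
    """Return True if the message matches any crisis pattern."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in CRISIS_PATTERNS)


def respond(message, generate_reply):
    """Route crisis messages to the escalation path instead of the model."""
    if needs_escalation(message):
        return ESCALATION_MESSAGE
    return generate_reply(message)


# generate_reply stands in for whatever function calls your model.
print(respond("I feel nervous before exams", lambda m: "model reply"))
```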
Quick question so people can give better pointers: are you aiming for general supportive coaching or clinical-style counseling, and do you need it in English only or multilingual?