Looking for Data

I’m curious to know if anyone has made a synthetic therapy data set written from a first-person perspective.

Specifically, the AI assistant’s name would be inserted into the therapist’s name slot in the conversation log.

This data set should probably also include therapy and psychology research papers and textbooks.

My hope with this is to help with model alignment and make the model less sycophantic.


I don’t think there’s an existing standalone dataset that fits that purpose at this point.
Since it’s in the medical field, you might get some information by asking on Hugging Science.

Yes. I can help you with that.

A solid way is to build a synthetic, first-person, therapy-style dataset where the therapist name is a placeholder token, so you can swap in the assistant name later. The dataset should include many examples of non-sycophantic behavior, such as gentle disagreement, reality checking, asking clarifying questions, and setting boundaries, plus safe escalation patterns.
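
To make the placeholder idea concrete, here is a minimal Python sketch of what one row and the swap-in step might look like. Everything in it is an assumption for illustration: the `{{THERAPIST_NAME}}` token, the field names (`behavior`, `turns`, `speaker`, `text`), the `render` helper, and the sample dialogue are all made up, not taken from an existing dataset.

```python
import copy
import json

# Hypothetical placeholder token; any string unlikely to occur in real text works.
THERAPIST_TOKEN = "{{THERAPIST_NAME}}"

# One illustrative row. The field names and the "behavior" label are assumptions.
row = {
    "behavior": "gentle_disagreement",
    "turns": [
        {"speaker": "client", "text": "Everyone at work is against me, right?"},
        {
            "speaker": THERAPIST_TOKEN,
            "text": (
                "I can hear how isolating that feels. I'm not sure 'everyone' "
                "fits what you described earlier, though. Can we look at one "
                "specific interaction?"
            ),
        },
    ],
}


def render(row: dict, assistant_name: str) -> dict:
    """Return a copy of the row with the placeholder replaced by assistant_name."""
    out = copy.deepcopy(row)  # leave the template row untouched
    for turn in out["turns"]:
        turn["speaker"] = turn["speaker"].replace(THERAPIST_TOKEN, assistant_name)
        turn["text"] = turn["text"].replace(THERAPIST_TOKEN, assistant_name)
    return out


rendered = render(row, "Assistant")
print(json.dumps(rendered, indent=2))
```

Keeping the token in the stored rows and substituting only at training time means the same data can be reused for any assistant name, and the `behavior` label makes it easy to check that non-sycophantic patterns are well represented.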

For papers and textbooks, the safest approach is not to copy copyrighted text into training rows. Instead, keep the conversations original and reference research concepts, or use retrieval-augmented generation (RAG) with open-access sources for grounding when needed.

Do you want this mainly for internal alignment experiments, or to publish a gated dataset on Hugging Face?
