Pro tip2: You can treat HF datasets as versioned repos by pinning a specific revision (tag, branch or commit) when downloading files. 🧠
This ensures your data processing pipelines always use the exact dataset state before passing the data to the model. It enables reproducible pipelines and allows for reliable outputs of your ML system.
Pro tip: If you are finetuning any model with tensorboard logs enabled, be sure to upload them to HF Hub as event artifacts, they can be viewed instantly. 🚀
I previously remembered this done in the notus model release: argilla/notus-7b-v1