I have a number of datasets, which I create from a dictionary like so:
from datasets import Dataset, DatasetDict, DatasetInfo, load_dataset

# Metadata shared by all three splits
info = DatasetInfo(
    description="my happy lil dataset",
    version="0.0.1",
    homepage="https://www.myhomepage.co.uk",
)
train_dataset = Dataset.from_dict(prepare_data(data["train"]), info=info)
test_dataset = Dataset.from_dict(prepare_data(data["test"]), info=info)
validation_dataset = Dataset.from_dict(prepare_data(data["validation"]), info=info)
I then combine these into a DatasetDict.
# Create a DatasetDict
dataset = DatasetDict(
    {"train": train_dataset, "test": test_dataset, "validation": validation_dataset}
)
So far, so good. If I access dataset['train'].info.description, I see the expected result of "my happy lil dataset".
So I push to the hub, like so:
dataset.push_to_hub(f"{organization}/{repo_name}", commit_message="Some commit message")
And this succeeds too.
However, when I pull the dataset back down from the hub and access the information associated with it, like so:
pulled_data = load_dataset(f"{organization}/{repo_name}", use_auth_token=True)
# I expect the following to print out "my happy lil dataset"
print(pulled_data["train"].info.description)
# However, instead it returns ''
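One way to narrow this down might be to record the description of every split before the push and again after the load, then list the splits where the two differ. The sketch below is only illustrative: `split_descriptions`, `changed_splits` and `fake_split` are hypothetical helper names, not part of the `datasets` library, and the stand-in objects only mimic the `.info.description` attribute so the snippet runs without hub access.

```python
from types import SimpleNamespace


def split_descriptions(dataset_dict):
    """Map each split name to its info.description string."""
    return {name: split.info.description for name, split in dataset_dict.items()}


def changed_splits(before, after):
    """Return the split names whose description differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))


def fake_split(description):
    # Stand-in that only mimics the `.info.description` attribute of a Dataset.
    return SimpleNamespace(info=SimpleNamespace(description=description))


# Assumed shapes: `local` plays the role of the DatasetDict before pushing,
# `pulled` plays the role of what load_dataset returns afterwards.
local = {
    "train": fake_split("my happy lil dataset"),
    "test": fake_split("my happy lil dataset"),
}
pulled = {"train": fake_split(""), "test": fake_split("")}

before = split_descriptions(local)
after = split_descriptions(pulled)
print(changed_splits(before, after))
```

With the real objects, the same two helpers could presumably be called on `dataset` before the push and on `pulled_data` after the load, since both expose `.items()` and per-split `.info.description`.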
Am I loading my data in from the hub incorrectly? Am I pushing only my dataset and not the info somehow?
I feel like I’m missing something obvious, but I’m really not sure. Any help would be appreciated.