Dataloaders
manify.utils.dataloaders
Dataloaders Submodule.
The dataloaders module allows users to load datasets from Manify's datasets repo on Hugging Face.
The table below summarizes the available datasets, the components each one provides, and their original sources.
Earlier versions of Manify included scripts to process raw data, which we have replaced with a single, centralized Hugging Face repo and the function load_hf. For transparency, we have preserved the data generation code in the Dataset-Generation branch of Manify.
| Dataset | Task | Distance Matrix | Features | Labels | Adjacency Matrix | Source/Citation |
|---|---|---|---|---|---|---|
| cities | none | ✅ | ❌ | ❌ | ❌ | Network Repository: Cities |
| cs_phds | regression | ✅ | ❌ | ✅ | ✅ | Network Repository: CS PhDs |
| polblogs | classification | ✅ | ❌ | ✅ | ✅ | Network Repository: Polblogs |
| polbooks | classification | ✅ | ❌ | ✅ | ✅ | Network Repository: Polbooks |
| cora | classification | ✅ | ❌ | ✅ | ✅ | Network Repository: Cora |
| citeseer | classification | ✅ | ❌ | ✅ | ✅ | Network Repository: Citeseer |
| karate_club | none | ✅ | ❌ | ❌ | ✅ | Network Repository: Karate |
| lesmis | none | ✅ | ❌ | ❌ | ✅ | Network Repository: Lesmis |
| adjnoun | none | ✅ | ❌ | ❌ | ✅ | Network Repository: Adjnoun |
| football | none | ✅ | ❌ | ❌ | ✅ | Network Repository: Football |
| dolphins | none | ✅ | ❌ | ❌ | ✅ | Network Repository: Dolphins |
| blood_cells | classification | ❌ | ✅ | ✅ | ❌ | See datasets from Zheng et al. (2017), "Massively parallel digital transcriptional profiling of single cells": CD8+ Cytotoxic T-cells; CD8+/CD45RA+ Naive Cytotoxic T Cells; CD56+ Natural Killer Cells; CD4+ Helper T Cells; CD4+/CD45RO+ Memory T Cells; CD4+/CD45RA+/CD25- Naive T Cells; CD4+/CD25+ Regulatory T Cells; CD34+ Cells; CD19+ B Cells; CD14+ Monocytes |
| lymphoma | classification | ❌ | ✅ | ✅ | ❌ | See datasets from 10x Genomics: Hodgkin's Lymphoma; Healthy Donor PBMCs |
| cifar_100 | classification | ❌ | ✅ | ✅ | ❌ | Hugging Face Datasets: CIFAR-100 |
| mnist | classification | ❌ | ✅ | ✅ | ❌ | Hugging Face Datasets: MNIST |
| temperature | regression | ❌ | ✅ | ✅ | ❌ | [Citation] |
| landmasses | classification | ❌ | ✅ | ✅ | ❌ | Generated using basemap.is_land |
| neuron_33 | classification | ❌ | ✅ | ✅ | ❌ | Allen Brain Atlas |
| neuron_46 | classification | ❌ | ✅ | ✅ | ❌ | Allen Brain Atlas |
| traffic | regression | ❌ | ✅ | ✅ | ❌ | Kaggle: Traffic Prediction Dataset |
| qiita | none | ✅ | ✅ | ❌ | ❌ | NeuroSEED Git Repo |
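To pick datasets by task or by the components they ship with, the table above can be transcribed into a small lookup structure. The following is a minimal sketch: `DatasetInfo`, `CATALOG`, and `datasets_for` are hypothetical names used for illustration and are not part of Manify.

```python
from typing import NamedTuple, Optional, Sequence


class DatasetInfo(NamedTuple):
    """One row of the table above (hypothetical helper, not a Manify class)."""
    task: str        # "none", "regression", or "classification"
    dists: bool      # distance matrix available
    features: bool   # features available
    labels: bool     # labels available
    adjacency: bool  # adjacency matrix available


# Transcribed by hand from the table above.
CATALOG = {
    "cities": DatasetInfo("none", True, False, False, False),
    "cs_phds": DatasetInfo("regression", True, False, True, True),
    "polblogs": DatasetInfo("classification", True, False, True, True),
    "polbooks": DatasetInfo("classification", True, False, True, True),
    "cora": DatasetInfo("classification", True, False, True, True),
    "citeseer": DatasetInfo("classification", True, False, True, True),
    "karate_club": DatasetInfo("none", True, False, False, True),
    "lesmis": DatasetInfo("none", True, False, False, True),
    "adjnoun": DatasetInfo("none", True, False, False, True),
    "football": DatasetInfo("none", True, False, False, True),
    "dolphins": DatasetInfo("none", True, False, False, True),
    "blood_cells": DatasetInfo("classification", False, True, True, False),
    "lymphoma": DatasetInfo("classification", False, True, True, False),
    "cifar_100": DatasetInfo("classification", False, True, True, False),
    "mnist": DatasetInfo("classification", False, True, True, False),
    "temperature": DatasetInfo("regression", False, True, True, False),
    "landmasses": DatasetInfo("classification", False, True, True, False),
    "neuron_33": DatasetInfo("classification", False, True, True, False),
    "neuron_46": DatasetInfo("classification", False, True, True, False),
    "traffic": DatasetInfo("regression", False, True, True, False),
    "qiita": DatasetInfo("none", True, True, False, False),
}


def datasets_for(task: Optional[str] = None, require: Sequence[str] = ()) -> list:
    """Return dataset names matching `task` that provide every component in `require`.

    `task=None` means "do not filter by task"; pass the string "none" to select
    datasets that have no associated task.
    """
    return [
        name
        for name, info in CATALOG.items()
        if (task is None or info.task == task)
        and all(getattr(info, component) for component in require)
    ]


print(datasets_for(task="regression"))
print(datasets_for(require=("dists", "features")))
```

Because the catalog is copied by hand, it can drift from the Hugging Face repo; treat the repo itself as the authority on what each dataset contains.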
load_hf(name, namespace='manify')
Load a dataset from the Hugging Face Hub at {namespace}/{name}.
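As a rough illustration of how the two arguments combine, the sketch below builds the `{namespace}/{name}` repository ID and wraps the call to `load_hf`. The helpers `hub_repo_id` and `try_load` are hypothetical names introduced here. Because this section does not document the structure of the return value, the wrapper passes it through unchanged rather than unpacking it.

```python
def hub_repo_id(name: str, namespace: str = "manify") -> str:
    """Build the Hub repository ID in the {namespace}/{name} form described above."""
    return f"{namespace}/{name}"


def try_load(name: str, namespace: str = "manify"):
    """Call load_hf if Manify is installed; return None otherwise.

    Hypothetical convenience wrapper. The result of load_hf is returned as-is,
    since its structure is not specified in this section.
    """
    try:
        # Import lazily so this sketch still runs where Manify is not installed.
        from manify.utils.dataloaders import load_hf
    except ImportError:
        return None
    return load_hf(name, namespace=namespace)


print(hub_repo_id("cora"))                        # manify/cora
print(hub_repo_id("cora", namespace="my-org"))    # my-org/cora
```

Calling `try_load("polblogs")` needs network access to the Hugging Face Hub. Passing a different `namespace` (for example the placeholder `"my-org"` above) would point the loader at another account's copy of a dataset, assuming it follows the same layout.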
Source code in manify/utils/dataloaders.py