Original project: https://github.com/MICV-yonsei/CT2MRI/tree/main
1. Data Preprocessing
1.1 Generating HDF5 files
If the data is in .raw format with shape (num, height, width) = (160, 224, 168), i.e., the senior labmate's data, use the generate_total_hdf5_csv_niys.py script.
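Before conversion, a .raw volume of that shape can be loaded with NumPy. A minimal sketch; the float32 dtype and C-order memory layout are assumptions here, so verify them against the actual preprocessing script:

```python
import os
import tempfile

import numpy as np

# Assumed layout: (num, height, width) = (160, 224, 168).
# Both the float32 dtype and the C-order layout are assumptions;
# check them against the preprocessing script before relying on this.
num, height, width = 160, 224, 168

# Write a dummy .raw file purely for illustration.
path = os.path.join(tempfile.mkdtemp(), "subject_000.raw")
np.zeros((num, height, width), dtype=np.float32).tofile(path)

# Read it back as a flat buffer and restore the shape.
data = np.fromfile(path, dtype=np.float32).reshape(num, height, width)
print(data.shape)  # (160, 224, 168)
```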
The following command creates the brain training dataset:
python3 generate_total_hdf5_csv_niys.py --data_dir ../../data/preprocess_globalNormAndEnContrast --which_set train --hdf5_name 160_train_axial.hdf5
==========
Subject 070 done in 1.422 seconds.
Successfully written train_data_brain.hdf5 in 139.905 seconds.
(11200, 2)
(224, 168, 11200)
11200
The following command creates the brain test dataset:
python3 generate_total_hdf5_csv_niys.py --data_dir ../../data/preprocess_globalNormAndEnContrast --which_set test --hdf5_name 160_valid_axial.hdf5
===========
Successfully written test_data_brain.hdf5 in 40.116 seconds.
(4800, 2)
(224, 168, 4800)
4800
Create the pelvis training dataset:
python3 generate_total_hdf5_csv_niys.py --data_dir ../../data/preprocess_globalNormAndEnContrast_pelvis --which_set train --hdf5_name 160_train_axial.hdf5
================
Successfully written 160_train_axial.hdf5 in 264.063 seconds.
(6720, 2)
(240, 384, 6720)
6720
Create the pelvis test dataset:
python3 generate_total_hdf5_csv_niys.py --data_dir ../../data/preprocess_globalNormAndEnContrast_pelvis --which_set test --hdf5_name 160_valid_axial.hdf5
===================
Successfully written 160_valid_axial.hdf5 in 99.247 seconds.
(2880, 2)
(240, 384, 2880)
2880
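The shapes printed above can be sanity-checked against the generated HDF5 file with h5py. A minimal sketch; the dataset key "data" and the (H, W, num_slices) axis order are assumptions, since the real keys come from generate_total_hdf5_csv_niys.py:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny HDF5 file that mimics the generated layout, for illustration.
# The key name "data" and the (H, W, num_slices) axis order are assumptions.
path = os.path.join(tempfile.mkdtemp(), "demo_axial.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.zeros((224, 168, 4), dtype=np.float32))

# List every dataset with its shape -- a quick check against the
# shapes reported by the generation script.
with h5py.File(path, "r") as f:
    shapes = {name: ds.shape for name, ds in f.items()}
print(shapes)  # {'data': (224, 168, 4)}
```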
1.2 Creating the pkl files
On Linux, add execute permission to an sh script:
chmod u+x file.sh
Create the brain hist dataset: run the make_hist_dataset_niys_br.sh script in the shell/create_data/ folder.
Create the pelvis hist dataset: run the make_hist_dataset_niys_gp.sh script in the shell/create_data/ folder.
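The resulting .pkl files can be round-tripped with the standard pickle module for a quick check. A minimal sketch; the exact contents written by the make_hist_dataset_* scripts are an assumption here (a plain histogram array is used for illustration):

```python
import os
import pickle
import tempfile

import numpy as np

# Write a dummy histogram pickle for illustration. The real .pkl files are
# produced by make_hist_dataset_niys_br.sh / make_hist_dataset_niys_gp.sh,
# and their exact structure is an assumption in this sketch.
hist, _ = np.histogram(np.linspace(0.0, 1.0, 1000), bins=32)
path = os.path.join(tempfile.mkdtemp(), "hist_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(hist, f)

# Load it back and confirm the bin counts survived the round trip.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded.shape, int(loaded.sum()))  # (32,) 1000
```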
2. Training
2.1 Training the brain model
Enter the shell/train folder and run:
./train.sh
Log shortly before training finished:
Epoch: [8 / 40] iter: 79633 loss: 0.0602: 11%|█████████████▋ | 1234/11200 [23:43<3:06:30, 1.12s/it]Epoch 39818: reducing learning rate of group 0 to 1.5625e-06.
Epoch: [8 / 40] iter: 89599 loss: 0.0503: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11200/11200 [3:30:35<00:00, 1.13s/it]
training time: 3:30:36
validating epoch...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2400/2400 [13:46<00:00, 2.90it/s]
validating epoch success
saving latest checkpoint...
wandb: WARNING (User provided step: 7 is less than current step: 89599. Dropping entry: {'val_epoch/loss': 0.08186065405607224, '_timestamp': 1746142631.3914557}).
save top model start...
sampling loop time step: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:16<00:00, 3.00it/s]
sampling loop time step: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:16<00:00, 2.98it/s]
Epoch: [9 / 40] iter: 95133 loss: 0.0367: 49%|████████████████████████████████████████████████████████████▎ | 5534/11200 [1:44:20<1:46:10, 1.12s/it]Epoch 47568: reducing learning rate of group 0 to 7.8125e-07.
Epoch: [9 / 40] iter: 100799 loss: 0.0343: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11200/11200 [3:30:38<00:00, 1.13s/it]
training time: 3:30:38
validating epoch...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2400/2400 [13:48<00:00, 2.90it/s]
validating epoch success
saving latest checkpoint...
wandb: WARNING (User provided step: 8 is less than current step: 100799. Dropping entry: {'val_epoch/loss': 0.08267383277416229, '_timestamp': 1746156154.7018843}).
save top model start...
wandb: Waiting for W&B process to finish... (success).
wandb: Network error (TransientError), entering retry loop.
wandb: | 1.765 MB of 1.765 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb: loss/train █▆▅▆▆▄▄▆▂▃▃▂▁▄▂▄▄▃▃▃▃▃▂▃▃▂▄▁▃▄▃▄▃▂▃▃▃▄▄▃
wandb: loss/val_step ▆▅▆▃▅▆▂▅▆▃▄▄▃▁▂▃▄▅▂▅▄▄▃▅▇▄▅▇▆▄▄▄▄█▅▂▅▃▆▂
wandb:
wandb: Run summary:
wandb: loss/train 0.03431
wandb: loss/val_step 0.04754
wandb:
wandb: 🚀 View run 250430_160_BBDM_axial_DDIM_MR_global_hist_context at: https://wandb.ai/niys/niys-bbdm-2024/runs/bmtz28sn
2.2 Training the pelvis model
Enter the shell/train folder and run:
./train_gp.sh
In BBDM_base_gp.yaml under the config folder, the pelvis training parameters were modified: model_channels was reduced from 128 to 64 because of insufficient GPU memory, and batch was reduced from 2 to 1.
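The change amounts to something like the following fragment of BBDM_base_gp.yaml; the field names and flat nesting shown here are assumptions based on the parameters above, not the actual file layout:

```yaml
# Reduced to fit in GPU memory; the original values were 128 and 2.
model_channels: 64
batch: 1
```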
Log shortly before training finished:
Epoch: [6 / 40] iter: 80639 loss: 0.0476: 100%|██████████████████████████████████████████████████████████████████████████████| 13440/13440 [1:53:52<00:00, 1.97it/s]
training time: 1:53:52
validating epoch...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [06:39<00:00, 7.22it/s]
validating epoch success
saving latest checkpoint...
wandb: WARNING (User provided step: 5 is less than current step: 80639. Dropping entry: {'val_epoch/loss': 0.07331851869821548, '_timestamp': 1746236606.621776}).
save top model start...
remove top_model_epoch_5.pth
saving top checkpoint: average_loss=0.07331851869821548 epoch=6
sampling loop time step: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00, 7.50it/s]
sampling loop time step: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00, 7.47it/s]
Epoch: [7 / 40] iter: 84775 loss: 0.0862: 31%|████████████████████████▎ | 4136/13440 [35:13<1:18:42, 1.97it/s]Epoch 42389: reducing learning rate of group 0 to 1.5625e-06.
Epoch: [7 / 40] iter: 94079 loss: 0.0575: 100%|██████████████████████████████████████████████████████████████████████████████| 13440/13440 [1:53:55<00:00, 1.97it/s]
training time: 1:53:56
validating epoch...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [06:37<00:00, 7.24it/s]
validating epoch success
saving latest checkpoint...
wandb: WARNING (User provided step: 6 is less than current step: 94079. Dropping entry: {'val_epoch/loss': 0.07403839379549026, '_timestamp': 1746243851.1632428}).
save top model start...
sampling loop time step: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00, 7.49it/s]
sampling loop time step: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00, 7.47it/s]
Epoch: [8 / 40] iter: 96777 loss: 0.0361: 20%|███████████████▊ | 2698/13440 [23:04<1:30:53, 1.97it/s]Epoch 48390: reducing learning rate of group 0 to 7.8125e-07.
Epoch: [8 / 40] iter: 107519 loss: 0.0640: 100%|█████████████████████████████████████████████████████████████████████████████| 13440/13440 [1:53:58<00:00, 1.97it/s]
training time: 1:53:59
validating epoch...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [06:37<00:00, 7.24it/s]
validating epoch success
saving latest checkpoint...
wandb: WARNING (User provided step: 7 is less than current step: 107519. Dropping entry: {'val_epoch/loss': 0.07396171241998672, '_timestamp': 1746251096.8026478}).
save top model start...
wandb: Waiting for W&B process to finish... (success).
wandb: \ 1.888 MB of 1.888 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb: loss/train ▅▇▃▅▅▄▃▅▂▃▁▄▅▂▁▃▂▄▂▂▆▄▃█▄▄▁▃▄▂▂▅▃▅▅▄▆▅▁▃
wandb: loss/val_step ▆▇▆▃▃▅▅▁▁▆▄▆▂▂▅▅▆▅▄▃▇▃▂▃▇▃▅▁▇▇▂▃▃█▂▅▅▂▄▃
wandb:
wandb: Run summary:
wandb: loss/train 0.06405
wandb: loss/val_step 0.07617
wandb:
wandb: 🚀 View run 250502_96_BBDM_axial_DDIM_MR_global_hist_context at: https://wandb.ai/niys/niys-bbdm-2024/runs/p98ugv6m
wandb: ️⚡ View job at https://wandb.ai/niys/niys-bbdm-2024/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjYyNjMxMzYwMA==/version_details/v4
wandb: Synced 6 W&B file(s), 48 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250502_213852-p98ugv6m/logs
./train_gp.sh: line 34: 2609747 Segmentation fault (core dumped) python -u ../main.py --train --exp_name $exp_name --config ../configs/$config_name --HW $HW --plane $plane --batch $batch --ddim_eta $ddim_eta --sample_at_start --save_top --gpu_ids $gpu_ids
3. Testing
3.1 Testing the brain model
Run test.sh directly.
3.2 Testing the pelvis model
Run test_gp.sh.
Bugs
Bugs.1 ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/dell/anaconda3/envs/pytorch-gpu/lib/python3.9/site-packages/google/protobuf/pyext/_message.cpython-39-x86_64-linux-gnu.so)
Reference: https://blog.csdn.net/weixin_42166222/article/details/129402507
Bugs.2 wandb error: ERROR Error while calling W&B API: project not found (<Response [404]>)
Reference: https://blog.csdn.net/weixin_43835996/article/details/126955917
Bugs.3 ImportError: cannot import name 'deprecated' from 'typing_extensions'
pip install typing_extensions --upgrade
Bugs.4 ERROR: Could not find a version that satisfies the requirement albumentations==1.3.1 (from versions: none)
pip install albumentations==1.3.1 -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn