
Conditional Diffusion Model for X-Ray Segmentation Data Generation


DOI: 10.23977/jaip.2024.070102


Zehao Fang 1


1 Shanghai Pinghe School, Shanghai, China

Corresponding Author

Zehao Fang


Training a well-performing deep learning model requires a large amount of data, yet many medical scenarios lack training data because of privacy and legal constraints. In this paper, we propose using ControlNet, a novel approach that adds conditional control to Stable Diffusion models, to produce realistic and diverse medical images. ControlNet lets us specify extra conditions for the diffusion model to follow, such as edge maps, depth maps, segmentation masks, or CLIP image embeddings. These conditions help preserve the structure, shape, and semantics of the target organs or tissues, while still allowing the appearance, style, and context of the generated images to be manipulated. Specifically, we use ControlNet to generate X-ray images of patients with pulmonary nodules and show the resulting improvement.
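As a rough illustration of how an extra condition such as a segmentation mask can steer a pretrained diffusion model, the sketch below mimics the zero-initialised connection described in the original ControlNet paper (Zhang et al., 2023): y = F(x) + Z_out(F_copy(x + Z_in(c))). It is a toy under stated assumptions, not the implementation used in this work: features are flat lists of floats, `frozen_block` is a hypothetical stand-in for one pretrained block, and `ZeroConv` reduces a 1x1 convolution to a single scale and bias.

```python
from typing import Callable, List

Vector = List[float]


def frozen_block(x: Vector) -> Vector:
    """Hypothetical stand-in for one frozen block of the pretrained model."""
    return [2.0 * v + 1.0 for v in x]


class ZeroConv:
    """Toy 1x1 'zero convolution': one scale and one bias, both starting at 0."""

    def __init__(self) -> None:
        self.weight = 0.0
        self.bias = 0.0

    def __call__(self, x: Vector) -> Vector:
        return [self.weight * v + self.bias for v in x]


class ControlledBlock:
    """Frozen block plus a condition branch joined through zero convolutions."""

    def __init__(self, block: Callable[[Vector], Vector]) -> None:
        self.block = block
        # In a real model this would be a trainable deep copy of the block's
        # weights; the toy block has no weights, so the function is reused.
        self.trainable_copy = block
        self.zero_in = ZeroConv()
        self.zero_out = ZeroConv()

    def __call__(self, x: Vector, condition: Vector) -> Vector:
        # Inject the condition into the input seen by the trainable copy.
        injected = [a + b for a, b in zip(x, self.zero_in(condition))]
        # Scale the copy's output before adding it to the frozen output.
        control = self.zero_out(self.trainable_copy(injected))
        return [a + b for a, b in zip(self.block(x), control)]


features = [0.5, -1.0, 2.0]   # toy feature values (assumed)
mask = [0.0, 1.0, 1.0]        # toy segmentation-mask condition (assumed)

layer = ControlledBlock(frozen_block)
out_at_init = layer(features, mask)

# Simulate the effect of training by moving the zero-conv weights off zero.
layer.zero_in.weight = 1.0
layer.zero_out.weight = 0.1
out_after_update = layer(features, mask)

print(out_at_init == frozen_block(features))
print(out_after_update != out_at_init)
```

Because both zero convolutions start at zero, the controlled block initially reproduces the frozen block exactly, so the pretrained model's behaviour is untouched until training moves the weights; after that, the mask changes the output.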


ControlNet, Diffusion Model, Synthetic Medical Images


Zehao Fang, Conditional Diffusion Model for X-Ray Segmentation Data Generation. Journal of Artificial Intelligence Practice (2024) Vol. 7: 7-10. DOI:




All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.