d2f (digital 2 film)

deep learning-based film grain and color synthesis

📆 25.01.31 ~ Temporary suspension

🛠 Tech Stack

  • mlx framework
  • python / pytorch
  • etc.

🎯 Why Start This?

Since I'm unsure what I want to be (a businessman, a startup member, an independent developer, or something else), I've been feeling uneasy these days. One thing I have realized, though, is that I have a strong desire to solve problems and fulfill my own needs by learning new skills such as AI, circuit design, and system development. Intentionally or not, I have a day off today, so I'm setting aside my future plans and immersing myself in this project.

My final goals:

  1. gaining active users for my product
  2. reaching a level of completeness that makes it presentable

Praying for myself 🧯


😭 Why the suspension?

  1. Resource limitation -> I can't run EfficientDet (which targets efficiency, using feature-fusion techniques) or ViT

  2. The maximum capacity of my M2 Air (16GB) with a proper batch size is UNET-512

  3. Film grain and color complexity is not that easy. I thought that having my own data would be enough, but unlike tasks that deal with one specific purpose, handling the two factors in a single pipeline is complex.

  4. Deployment is not my style. If the product were fully built I could have tried, but without any motivation it was hard for me to learn the AWS tools.

  5. I also thought the data was sufficient (375 pictures) given its high resolution (2048x3089), but since it has to be resized to fit in memory while letting the model capture the overall color, it was not enough.


📗 What have I learned?

  1. Reading about unsupervised denoisers, I was surprised that the model itself is used to denoise a single image. Most approaches train a model and then expect inference from it in a frozen state, but this paper uses the model like a recursive function that is made to follow a prior, which I found fascinating, and working to understand that concept showed me how important probability is. I did not fully understand the ADMM method in particular, but the quality achieved while denoising from the noisy image alone was good, and I gained a new perspective on what a model can be.

  2. I picked up many PyTorch training conventions (centered on vision models). UEGAN in particular was published in IEEE T-IP, and unlike the fast-paced research at venues like CVPR and ECCV, I could feel the polish of the work; reading its source code taught me the detailed parts of PyTorch training. Perhaps because of the IEEE T-IP character, the code was organized very cleanly, and I saw how much clean code helps readability. Before, I had no intuition while reading code about whether a given part played an important role; after going through UEGAN and various other PyTorch implementations, I gained the intuition to tell which parts deserve close attention and which are simply convention.

  3. I felt the importance of memory and computation efficiency, and of capacity, in model training. Scaling a model up for better performance is a problem not only because the number of weights grows, but because the memory used to track their gradients grows too, so I saw how memory-intensive performance really is. Also, when the memory read/write ratio is high relative to the model's compute, inefficiency arises, so using a prefetching data loader, or measuring the compute/read efficiency early in training with tqdm and time, turns out to be a good profiling method.
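As a rough illustration of that compute-vs-read profiling idea, here is a minimal sketch; the 0.02 s sleep stands in for a slow disk read, and `load_fn`/`compute_fn` are hypothetical stand-ins, not code from this project:

```python
import time

def profile_step(load_fn, compute_fn, n_steps=5):
    """Average data-read vs. compute time per training step."""
    load_t = comp_t = 0.0
    for _ in range(n_steps):
        t0 = time.perf_counter()
        batch = load_fn()       # stands in for next(dataloader_iter)
        t1 = time.perf_counter()
        compute_fn(batch)       # stands in for forward/backward/step
        t2 = time.perf_counter()
        load_t += t1 - t0
        comp_t += t2 - t1
    return load_t / n_steps, comp_t / n_steps

# toy stand-ins: a "slow" loader vs. a cheap compute step
avg_load, avg_comp = profile_step(lambda: time.sleep(0.02) or [0.0],
                                  lambda batch: sum(batch))
print(f"read {avg_load * 1e3:.1f} ms / compute {avg_comp * 1e3:.1f} ms")
```

If the read time dominates like this, a prefetching loader is the first thing to try.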

  4. I learned that studying by reading the materials and papers needed for a single goal is genuinely interesting, and also how many papers one must read to become a real expert. I became convinced that, for any goal, reading widely across the field itself rather than only the papers written for that exact goal enables far more diverse approaches. The fact that even in CV, classification models actively use the attention mechanism from NLP shows that interest in AI as a whole is necessary. And while the force driving study can be interest in the field, I came to think that making clear the single value I want to create is far more important. In that sense, I believe a bold choice about my own field is waiting for me.

  5. While studying ViT, I also noticed that even something like DeepSeek's multi-head latent attention ultimately builds on the latent-space concept, which appears in VAEs as well; so regardless of the task, I should internalize these milestone concepts and make an effort to avoid fixed ways of thinking.

  6. And while the era of AI has clearly arrived, I learned that the importance and efficiency of legacy algorithms still must not be ignored. Window-based methods such as BM3D, used for ADMM, or the homogeneity block detection used for grain noise detection, will give me a wider field of view when I search for an approach that fits the performance I need within the resources I currently have.

  7. Hardware-optimization efforts and frameworks like mlx made me feel even more strongly the need for low-level programming languages that manage memory directly. Even if I remain a Python user, I want to become someone who can modify and build the source code itself when needed.

  8. I also came to clearly understand the advantage that unified memory, as in Apple silicon, has in today's heterogeneous computing systems of CPUs, GPUs, and TPUs. With CUDA, the pin_memory option makes the dataloader place data directly into page-locked memory that the OS cannot swap out, so DMA can copy it straight to the GPU. With pin_memory=False, an extra step of moving data from pageable to page-locked memory is needed, which can create a bottleneck in CPU-GPU data transfer. With unified memory, the GPU and CPU share one memory space, so there is no unnecessary data movement. This is useful in constrained environments like a laptop, but in large server systems I think having the GPU's own memory is far more efficient for scalability. Either way, I became convinced that even when working in software, hardware knowledge differentiates expertise. That said, NVIDIA's compute performance really is outstanding.

  9. Above all, I felt that once the content becomes specialized, carefully reading the official documentation is essential. No matter how much I talk with GPT, there are always variables specific to my environment, and reading the official docs reveals many features GPT never mentions. Approaching things quickly with GPT at first is fine, but for concepts that are at all complex or ambiguous, the bold choice of investing time in the official documentation is a must.



Things done ⬇️


๐Ÿ“ MLX framework


1.

mlx.data.buffer_from_vector

Error: a sample composed of dictionaries, sample = [{'image' : b'Path/'}],

running buffer_from_vector on this via mlx.data should return a byte array, but an empty array comes back ([], dtype=int8)

Solution:

Reset the Mac -> installed mlx-data not via pip but from source with the Python bindings

(judged that environment variables and various dependencies had gotten tangled)


2.

mlx framework insights

Buffer: loads images only when needed, preventing unnecessary loads. -> Apple really does seem impressive (having built Apple silicon, developing their own frameworks and protocols is inevitable, and if they stand out here too, I think the Apple cycle comes around again)

stream.prefetch(4, 4): sets the CPU to load 4 batches while the GPU computes, and the GPU to fetch 4 batches at once


3.

mx.compile problem with binary cross entropy

Not resolved ➡️ just set num_classes=2 and used mlx.nn.losses.cross_entropy

the problem seems to occur with the @partial(mx.compile) option on binary cross entropy


4.

Training overfitting (resnet44)

Confirmed that training overfits with 204 training images.

The data was center-cropped to (224,224); batch_size = 16, epochs = 50, lr = 1e-5

Epoch  Train Loss  Train Accuracy  Test Accuracy
49     0.040       0.990           0.688

📙 Why? Probably the data is insufficient. Cropping works well enough, but rather than cropping I should generate new 224x224 data and proceed.


5.

Split the images into 224x224 tiles and trained

💾 original data: 1.02GB -> 743MB (6.13GB on disk)

Train length: 42111
Test length: 4678

🙋🏾‍♂️ Even so, training accuracy converges to 1 within a single epoch, while test accuracy rises slowly but still shows overfitting

Honestly, whenever I ran a project or studied before, I would give up at these tedious hurdles and go back to diligently doing whatever school or society handed me. This month my goal is to face that side of myself and get past these hurdles calmly, quietly, and steadily. This time too I could just shrug and say "whatever, who knows" and move on, but this once I have to see it through to the end.

Looking into it, I shuffled the buffer randomly and plotted it, and found a considerable bias in the data. Everything was cropped to the same 224x224, but because one side is about 1.7x larger, roughly 3:1 more images came out of the cropping for one class. I need to rebalance the data and retrain.

In the code that loads and splits samples, I added a min_count so the data can be balanced 1:1 against the label with the fewest examples.
+ changed the optimizer from Adam to SGD (lr 0.01, momentum 0.9, weight_decay=5e-4), and modified it to run the optimizer update every 50 steps

📖 Loaded the .npz with module.load_weights and applied it to unseen data; it made the correct prediction in almost every case.
However, backtracking that result, applying just a 7x7 wiener2 filter was enough to fool the discriminator completely. + Adding noise didn't help either
So I need to process the data a bit more and retrain. I also think it's right to take the approach as a chain of several stages.


🧹 Denoiser Approach


6.

dncnn ๋„์ž…

๐Ÿ–‹๏ธ ํ˜„์žฌ resnet20 ๊ธฐ๋ฐ˜์˜ d2f_D(discriminator)๊ฐ€ ํ•„๋ฆ„์˜ ์ƒ‰๊ฐ์„ ํ•™์Šตํ–ˆ๋‹ค๊ธฐ ๋ณด๋‹ค๋Š” texture๋ฅผ ํ•™์Šตํ–ˆ๋‹ค๊ณ  ์ƒ๊ฐ๋œ๋‹ค.
DnCnn์„ ๊ธฐ๋ฐ˜์œผ๋กœ denoisingํ•œ ๊ฒฐ๊ณผ๋กœ inference๋ฅผ ํ•ด๋ณธ ๊ฒฐ๊ณผ๊ฐ€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค. (์™„๋ฒฝํžˆ Discriminator๋ฅผ ์†์ž„)

๋”ฐ๋ผ์„œ ์ง€๊ธˆ ํ˜„์žฌ์˜ dataset์„ ๋‘๊ฐœ๋กœ ๋ถ„๋ฆฌ
โ˜๏ธ DnCnn ์ ์šฉ๋œ set(๋™์ผํ•˜๊ฒŒ ์ ์šฉ) -> ํ•œ์ชฝ๋งŒ ์ ์šฉ์‹œ ๋˜ dncnn์˜ ํŠน์„ฑ์„ ํŒŒ์•…ํ• ๊ฒƒ ๊ฐ™๋‹ค๋Š” ํŒ๋‹จ
๐Ÿ’• ๊ธฐ์กด ์›๋ณธ ์ด๋ฏธ์ง€


↦ And given how the model so far behaves as a discriminator, for training stability it seems I need a pipeline of
2 modules rather than learning texture and color tone at the same time.


7.

Results of d2f_D on the DnCNN-applied dataset

Passing images through DnCNN does not remove the grain characteristics ➡️ they show up with completely different characteristics from the original images


Switched to Torch

💡 Performance on MLX wasn't dramatic, so I changed the goal to building an MVP quickly


8.

Grain Synthesis Weakness

FFT High Frequency Energy Ratios:
Digital Image: 28.44%
Film Image: 83.57%
Difference (Film - Digital): 55.13%

I should use this result to build a torch.fft-based discriminator component.
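A sketch of that high-frequency energy measurement (written with numpy here rather than torch.fft; the 0.25 low-frequency cutoff is my own assumption, since the exact threshold used above isn't stated):

```python
import numpy as np

def high_freq_energy_ratio(img, cutoff=0.25):
    """Share of spectral energy outside a central low-frequency square.

    `cutoff` is the fraction of each axis treated as "low frequency";
    it is an illustrative choice, not the measurement's exact threshold.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = power.shape
    ch, cw = int(h * cutoff), int(w * cutoff)
    low = np.zeros_like(power, dtype=bool)
    low[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw] = True
    return float(power[~low].sum() / power.sum())

rng = np.random.default_rng(0)
smooth = np.ones((64, 64))                              # "digital-like"
grainy = smooth + 0.5 * rng.standard_normal((64, 64))   # "film-like"
print(high_freq_energy_ratio(smooth), high_freq_energy_ratio(grainy))
```

A flat image puts all its energy at DC, so its ratio is near zero; adding grain-like noise pushes energy into the high frequencies, mirroring the digital/film gap above.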

No real progress ➡️ adding FFT only to the discriminator has limits in making the generator learn the high-frequency components.
Also, the current training size is 256x256, and I can see that a lot of texture information is thrown away in the bilinear resampling of the film images.

So the plan is to have the current UEGAN learn the color tone, and to additionally train a film grain model on paired data obtained by denoising the film images.


9.

Denoising networks

While looking at models for denoising, image resolution, and the like, I found a good site:

paperswithcode

Searching for a dataset lists the papers for the tasks that use it, so I could find the SOTA or the model that fits me best.
For now I'm focusing on models that use the KODAK24 dataset.

๐Ÿ“ : Restormer / SwinIR / DMID-d

DMID(diffusion based) : Github DMID
Restormer(transformer based) : Github Restormer


10.

Unsupervised Denoiser

I think this is the most impressive find

What I found was the partial gaussian noiser used in
Stimulating Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling, and it's amazing

📝 (Insane) AN UNSUPERVISED DEEP LEARNING APPROACH FOR REAL-WORLD IMAGE DENOISING

This is the direction I think AI should take / it keeps training on its own ➡️ and stops when it judges the result good enough

Key idea

  • SURE (Steinโ€™s Unbiased Risk Estimator)
  • UNet-based Enc / Dec -> learning picture-by-picture gaussian denoising
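For reference, the SURE idea can be written down: for a denoiser $f$ applied to a noisy image $y = x + n$ with $n \sim \mathcal{N}(0,\sigma^2 I)$, the standard textbook form of the estimator (not copied from the paper) is:

```latex
\mathrm{SURE}(f; y) \;=\; \frac{1}{n}\lVert y - f(y)\rVert^2 \;-\; \sigma^2 \;+\; \frac{2\sigma^2}{n}\,\nabla_y\!\cdot f(y)
```

Its expectation equals the true risk $\tfrac{1}{n}\,\mathbb{E}\lVert x - f(y)\rVert^2$, which is why a network can be trained from the noisy image alone; in practice the divergence term is usually approximated with a Monte Carlo probe.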

Training a deep model per image makes the cost high, but I think these approaches combined could produce great synergy

‼ After all, every phenomenon ends up following a Normal Distribution

😉 The problem is that it takes 50 minutes per image

Gave up partway through reading / I can follow it up to the Lagrangian, but the equations of ADMM and the augmented Lagrangian method are hard to understand

A problem to solve through an optimization theory course


11.

Neural Style Transfer

Given that UEGAN's architecture and loss function seem to preserve the color information well, I focused on how to solve the texture part

Planning to check the usage of vgg19 starting from the original style transfer paper

👍 Discovered that Gatys' gram matrix distinguishes, per vgg19 conv block, the layers that hold content vs. style information ➡️ I'll use this to pick photos with similar content but different styles and compare them

  • Early layers (conv1~conv3): features that are too low-level (edges, texture) → not enough content information.
  • Middle layers (conv4_2, conv5_2): structure and semantic information are well preserved → suitable as content features.
  • Deep layers (after conv5_4): abstraction becomes strong, so fine structure may be lost → unsuitable as content information.

☕️ Per-vgg-layer cosine similarity (film image vs digital image)

☕️ Per-vgg-layer cosine similarity (digital image vs digital image with GN)

I do think vgg has distinct content features and style features (adjusting the content loss and style loss weights is also an important factor)

Having tried various methods, this is definitely closer to injecting a specific pattern or color than to style transfer (when I first found it, the fact that content can be preserved with vgg alone struck me as a very meaningful method)


👾 Model approaches


12.

How does UEGAN preserve content details

Looking at UEGAN, it preserves the content while transforming the color information particularly well.

👹 Limitations so far

  1. The color conversion tends to be excessive (the L_qual weight probably needs adjusting; above 0.1 the transformation becomes too heavy)
  2. It cannot synthesize the grain part (given how DL works, trying to solve 2 tasks at once is expected to be more complex)

🤞 UEGAN's advantages

  1. Uses a GAM (global attention module), designed to better capture and focus on illumination and color
  2. Uses 3 losses (Quality, Fidelity, Identity), respectively for (enhancement, content preservation, preventing over-enhancement)

I think the core of this task is a near-perfect quality score


13.

RGB grain analysis

The difference is definitely large (both are the same top-left region, a patch of solid sky blue)

👺 To my eye this is the most important part..


14.

spatial analysis
Film 3D R channel value
Digital 3D R channel value
Overlay of two

15.

The key points!

Filling the positions holding max_val in a 3x3, stride=1 window with 255 gives this result -> in other words, we can infer in reverse that the film's silver halide grains exist.

And making the output carry that surface characteristic looks like the key (handling the RGB channels separately is definitely the right call)
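A minimal numpy sketch of that window scan (the replicate padding at the borders is my own choice; the original scan may treat edges differently):

```python
import numpy as np

def local_max_mask(img):
    """Mark pixels equal to the max of their 3x3 neighborhood with 255.

    Stride-1 scan; the neighborhood max is built from the 9 shifted
    views of the edge-padded image.
    """
    pad = np.pad(img, 1, mode="edge")
    neigh = np.stack([pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                      for dy in range(3) for dx in range(3)])
    return np.where(img == neigh.max(axis=0), 255, 0).astype(np.uint8)

img = np.array([[10, 20, 10],
                [20, 90, 20],
                [10, 20, 10]], dtype=np.uint8)
mask = local_max_mask(img)
print(mask)   # only the bright "grain" pixel in the center survives
```

Run per RGB channel, this yields exactly the kind of sparse 0/255 mask whose densities are tabulated below.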


16.

Max value interpolation

Since information about the min values is lost in interpolation, I think max_val pooling can represent the grain characteristics, but

whether that detail survives and is reproduced when the grainy pixel halide selection is used as the GAN training input is, I think, an open question

Combined pooling and interpolated results

Min pooling
Combined(Min/Max) pooling

The second one is distributed in a grainier way -> planning to use this as the input

Density of each RGB mask

Type            Density of R / 1s  Density of G / 1s  Density of B / 1s
Film full size  0.10262            0.10287            0.17325
Film crop size  0.10365            0.095427           0.1106
Digital full    0.43867            0.4313             0.44063

This seems generalizable

💡 For min pooling, I run max pooling on the 255-inverted image, so that the masking works correctly
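That inversion trick reads, in numpy form (a sketch; `neighborhood_max` is the same 3x3 edge-padded scan used for the max mask, reimplemented here so the block is self-contained):

```python
import numpy as np

def neighborhood_max(img):
    """3x3 neighborhood max (stride 1, edge-padded) via shifted views."""
    pad = np.pad(img, 1, mode="edge")
    return np.stack([pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                     for dy in range(3) for dx in range(3)]).max(axis=0)

def neighborhood_min(img):
    """Min pooling expressed through max pooling on the inverted image."""
    return 255 - neighborhood_max(255 - img)

img = np.array([[10, 200], [90, 30]], dtype=np.int32)
print(neighborhood_min(img))   # every 3x3 window here contains 10
```

The identity min(x) = 255 - max(255 - x) holds exactly for 8-bit data, so one pooling implementation covers both masks.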

์ฃผ์˜ํ•ด์•ผ๋˜๋Š”๊ฒŒ, ์•„์•  red๊ฐ€ 0์œผ๋กœ ๋˜๋ฒ„๋ฆฌ๋Š” ๊ตฌ๊ฐ„๋„ ์กด์žฌํ•˜๋‹ˆ๊นŒ ๊ทธ๋Ÿฐ ๊ตฌ๊ฐ„์—์„œ์˜ pp halide๋ฅผ ์–ด๋–ป๊ฒŒ ์„ค์ •ํ• ์ง€์— ๋Œ€ํ•œ ๊ณ ๋ฏผ์ด ํ•„์š”ํ•ด ๋ณด์ž„

python์œผ๋กœ loadingํ•œ data

250310: reached a first conclusion on the pp halide / confirmed that masking the peaks and valleys in a 3x3 window still guarantees randomness

If I run max/min pooling for the pp halide and build a network over that mask and the result png (a lossless image file), I expect the reconstruction to come out beyond expectations

For digital images, scatter a brightness-based pp halide mask and have the network reconstruct based on the values at those pixels

(Why? Since a digital image is already smoothed out, I judged there is no harm in setting min/max to the pixel itself)

For the color tone, though, I expect it's fine to just transform the pp halide distribution into a representative 1d vector,

because I'm honestly not sure pixel values really reflect structural, global information from the lens and the film fixing

Here, in the dark areas the blue mask was captured at very high density (strictly, for film only the max value carries meaning, but since there's no need to imitate it exactly, I decided to extract the near-zero-density parts as pp halide as well -> keeps detail and edge information)

Since the pp halide random seed will be applied to the digital image with different densities for dark and bright areas, I don't think this is much of a problem


17.

Dataloader composition

  1. 256x256 Crop
  2. RandomHorizontalFlip(0.5)
  3. manual_seed so the same regions are cropped for the 3 image inputs (pp halide (result, mask), og img)
  4. cropping the same image 3 random times for data augmentation
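The seeded paired-crop step above can be sketched without torch (a plain-Python stand-in for torch.manual_seed + RandomCrop; the function name and shapes are illustrative):

```python
import random

def paired_crop_coords(height, width, crop, seed):
    """Top-left corner of a random crop, reproducible from a seed.

    Reusing one seed per sample keeps the result/mask/original triple
    cropped at identical coordinates.
    """
    rng = random.Random(seed)
    return rng.randint(0, height - crop), rng.randint(0, width - crop)

imgs = {"result": None, "mask": None, "original": None}  # placeholder triple
coords = [paired_crop_coords(2048, 3089, 256, seed=42) for _ in imgs]
print(coords)   # identical coordinates for all three inputs
```

Drawing a fresh seed per sample (and per augmentation repeat) keeps crops random across the dataset while staying aligned within each triple.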

🤔 hmm… I think attention could be used here; in a way the mask already plays the attention role 🎀 What's uncertain is whether the model can actually bring out the detail, and whether, when the rgb channels are trained independently, the harmony between them can be maintained; I expect that to be the crux


18.

UNET 512 channel based output

Trial   Model    Configuration
Trial1  UNET128  RGB + harmony
Trial2  UNET128  harmony removed
Trial3  UNET128  RGB + harmony
Trial4  UNET512  crop size 256 / 4
Trial5  UNET512  crop size 16 / 128
  • It takes about 150 epochs of training for the MSE loss to reach around 0.0044 (10 epochs isn't enough training -> trials 1, 2, 3 are worth retrying)
  • In the trial4 setup the mask image values tended to be somewhat emphasized; suspecting a generalization effect, I judged that pixel values can be interpolated well enough even from 16x16 and ran trial 5

16x16 does produce a somewhat sharper, less salty image.

But I also suspect the saltiness here is just the difference between png and jpg. (Merely rendering it on the homepage already made it smoother.)

📲 What is certain is that it fails to capture the detailed parts (the image's RGB channels seem to smear together somewhat), so rather than UNET I find myself wanting to consider expansion in the h,w direction instead of the channel direction


19.

Digital Image Application (statistic based) - Failed

The plan: simply work out the relation between the per-RGB-channel masks and brightness, build a mask from it, and try reconstruction with the trial5 model

Analyzing dataset: 100%|██████████| 375/375 [01:11<00:00, 5.25it/s]
Average Mask R pixel density:
Value 0: 0.8809 (±0.0329)
Value 255: 0.1191 (±0.0329)

Average Mask G pixel density:
Value 0: 0.8902 (±0.0149)
Value 255: 0.1098 (±0.0149)

Average Mask B pixel density:
Value 0: 0.8727 (±0.0418)
Value 255: 0.1273 (±0.0418)

Disappointing, yes

It brings back almost identical results -> and of course strange artifacts appear too (there are rainbow artifacts)

🥅 Goal revision -> since we now know that reconstruction from the mask and result (pp halide) is possible, let's treat this as the problem of how to distribute the pp halide over a digital photo

The original purpose of approaching things through the pp halide was to make grain synthesis easier, so first see the results on whether grain synthesis is even possible for digital images, and only then move on to color

So, dropping the min pooling that had been added purely for detail, I'll check whether UNET can actually reconstruct correctly from a dataset built with max pooling only


20.

Max pooling (only) dataset training - for the grain

Wow, the most effective method discovered so far!

  1. Defined DIANET (a structure opposite to the UNET above, where the channels and h,w expand and then shrink -> skip connections also exist)
  2. If the channel count grows too large, the model size becomes enormous
  3. Trained with 32x32 patches x 64
  4. Doubled the pp halide density on the digital images (raising the original min-max density of 0.1 by 2x and 3x clearly reduces the artifacts and produces a grainy image)
  5. (limitation1) At a larger size (3024x3024) a memory shortage occurred that had not happened with UNET
  6. (limitation2) The color error is severe ➡️ it feels like the film L2 loss could not focus on the color content, kept focusing on the structural content, and then training ended
  • Density * 3 (approx 0.3)
  • Density * 1

Of course, I distributed it using not only the density but also statistics on the correlation between the RGB pp halides

💭 Takeaways

  • UNET and DIANET clearly differ, as expected, in how they generate the fine parts.
  • Checking the enc / dec training outputs, the model first tries to capture the structural regions, and once the structural content has more or less saturated, it starts trying to grasp the color information
  • The L2 losses of UNET (0.0044) and DIANET (0.017) differ a lot, and DIANET fails to preserve much of the structural information (why? presumably because min pooling was removed, it is weak at inference on edges and black spots)
  • DIANET may have fewer parameters, but because it upscales the image with transposedConv2d while additionally growing the channels, memory usage spikes
  • UNET, on the other hand, grows the channels while h,w also shrinks, so no memory explosion occurs

only max

max min pooling

Digital applied Comparison

UNET (Min-Max)
UNET (Max)
DIANET (Max) + density * 2

At the deepest layer, the float32 memory usage of UNET512 and DIANET (3 enc) is:

Model   Output tensor shape  Elements     Memory usage           Note
UNET    512x16x16            131,072      0.5 MB (524,288 B)     -
DIANET  24x1024x1024         25,165,824   96 MB (100,663,296 B)  192x
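Those two rows can be reproduced with a few lines of arithmetic (batch dimension and gradient buffers excluded):

```python
def activation_bytes(shape, bytes_per_elem=4):
    """Element count and float32 bytes of one activation tensor."""
    n = 1
    for d in shape:
        n *= d
    return n, n * bytes_per_elem

unet = activation_bytes((512, 16, 16))        # deepest UNET512 output
dianet = activation_bytes((24, 1024, 1024))   # deepest DIANET output
print(unet, dianet, dianet[1] // unet[1])     # ratio -> 192
```

The 192x gap comes entirely from h,w: DIANET's spatial expansion dwarfs the channel growth that UNET pays for it.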


💭 Takeaways

  • UNET has far more model parameters, but DIANET uses more memory (this shows the strength UNET has)
  • trade-off (memory vs parameter size)


๐Ÿž๏ธ Loss functions


21.

Importance of loss function

Working in LAB made the contrast come out far too strong

Using a dynamic loss function

Today I moved away from the previous approach of using only a simple RGB-based MSE loss and introduced a dynamic loss that combines RGB MSE, LPIPS, SSIM, and a Gram matrix (texture) term. I chose a strategy of adjusting the weights so that the early phase focuses on structural (SSIM) and perceptual (LPIPS) information and the later phase focuses on color (RGB) and detailed grain representation (Gram loss). I also thought about normalization or adaptive weighting strategies to resolve the scale imbalance between the individual losses, and set things up to track and visualize the losses in real time with TensorBoard. 🌟

  • The MSE-based term does capture structural information, but not strongly
  • It cannot express local texture (grain) information (fine details), because to reduce the loss, rather than predicting the fine randomness, it just kills the texture with the mean, flattens it, and matches only the overall trend

If anything, after setting up the loss function this way, I suspect DIANET is being pushed past its capacity for expressing detail (it seems too fit to 32x32, so at 2048x2048 the output comes out somewhat dull, as broad artifacts; I'll watch until epoch 150 and then decide)

So let's set the loss dynamically and watch whether UNET can actually bring out the fine detail.


🥊 The loss function is composed as follows:

Total Loss = α × (RGB MSE) + β × (LPIPS) + γ × (SSIM) + δ × (Gram Loss)

📌 Stage-wise scaling factors (weight adjustment)

Progress (Epoch %)  RGB MSE (α)  LPIPS (β)  SSIM (γ)  Gram Loss (δ)  Focus
0 ~ 30%             0.2          0.7        0.8       0.3            Structure (SSIM), Perceptual (LPIPS)
30 ~ 60%            0.5          0.4        0.4       0.5            Balanced training
60 ~ 100%           0.8          0.3        0.2       0.6            Color fine-tuning (RGB), texture (Gram)

  • Early (0~30%): focus on learning structural and perceptual information
  • Middle (30~60%): balanced training
  • Late (60~100%): optimize the detailed parts such as fine color and grain
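The staged schedule above, as a plain-Python sketch (phase boundaries and values taken from the table; how the progress fraction is computed is left to the training loop):

```python
def loss_weights(progress):
    """(alpha, beta, gamma, delta) for RGB MSE / LPIPS / SSIM / Gram,
    given training progress in [0, 1], following the staged table."""
    if progress < 0.3:
        return 0.2, 0.7, 0.8, 0.3   # structure + perceptual phase
    if progress < 0.6:
        return 0.5, 0.4, 0.4, 0.5   # balanced phase
    return 0.8, 0.3, 0.2, 0.6      # color + grain phase

# e.g. inside the epoch loop: w = loss_weights(epoch / num_epochs)
print(loss_weights(0.1), loss_weights(0.45), loss_weights(0.9))
```

A smooth interpolation between phases would also be possible, but hard boundaries keep the behavior easy to read off the TensorBoard curves.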

I started visualizing the losses with tensorboard and built a gif of the same image to watch the training process; for a start, once the perceptual loss enters, the model barely learns the color information

And it's hard to say the loss decreases over training; I'd say it plateaus.

The important point here: simply comparing the image's color or structural information pixel value by pixel value seems, in the end, meaningless

➡️ because the film image itself seems to carry a lot of randomness, and approaching it with a pixel-by-pixel loss or a structural loss doesn't seem to make training easy

➡️ I also feel there is a limit to the model's capacity, and learning only identical patterns is meaningless (same reasoning as above).

So I think it would be good to have a part that injects random features

https://medium.com/storm-shelter/the-importance-of-film-grain-255f0246cd64

Grain synthesis for video and grain synthesis in film photography can be easy to confuse

In commercial cameras, the Digital Image pipeline (ISP) seems to carry considerable weight

Since it is the first processing step from RAW to JPG, I think it's the most important stage where a company shows off its technical capability and color character

🧐 Oh, a great approach comes to mind -> instead of synthesizing with no base, use the dataset I supply, patch by patch or via statistics, and attach patches that fit the digital image, like a collage of colored paper -> then a transformer might even be usable (what if I compute values between the k,q of my image and the dataset I hold?)

[SCW06] Stefano A. D., Collis W., White P. R.: Synthesising and reducing film grain. Journal of Visual Communication and Image Representation 17, 1 (2006), 163–182.

They apparently did exactly that here; I'll read it later!


💨 Other Approaches


22.

CVAE Method pre-test

🌈 In the dataloader, even without explicitly configuring multiple patches per image, you can adjust the dataset's __len__ property so that multiple patches come out of a single image

CVAE dropped

Lessons

  • Trained with 8x8 patches at batch size 64 and found that the GPU was not being used efficiently
  • That is, the 8x8 patches caused even more image R/W operations, so the bottleneck landed there instead
  • So I switched to keeping the images in memory and extracting the 8x8 patches from there
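The __len__ trick plus the in-memory fix, as a torch-free sketch (the class and parameter names are made up for illustration; a real version would subclass torch.utils.data.Dataset):

```python
import random

class PatchDataset:
    """Yields several random patches per preloaded image by
    inflating __len__, instead of re-reading files per patch."""

    def __init__(self, images, patch=8, patches_per_image=16):
        self.images = images            # images kept in memory, not on disk
        self.patch = patch
        self.per_image = patches_per_image

    def __len__(self):
        return len(self.images) * self.per_image

    def __getitem__(self, idx):
        img = self.images[idx // self.per_image]   # map index -> source image
        h = random.randint(0, len(img) - self.patch)
        w = random.randint(0, len(img[0]) - self.patch)
        return [row[w:w + self.patch] for row in img[h:h + self.patch]]

imgs = [[[0] * 32 for _ in range(32)] for _ in range(3)]
ds = PatchDataset(imgs)
print(len(ds))   # 3 images * 16 patches = 48
```

Since each __getitem__ only slices an in-memory array, the per-patch file I/O that caused the bottleneck disappears.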


23.

Some techniques (prefetch_generator, pin_memory)

prefetch_generator(BackgroundGenerator)

For 375 image data (3089 × 2048)

  • w/o BackgroundGenerator: 45.65s
  • w BackgroundGenerator: 26.78s


📋 Note: the effect only shows up when I give the training a delay of 0.1ms; just cycling the dataloader on its own shows no effect.
So I think it will help on apple silicon devices, where num_workers can't be used for training

pin_memory

Memory comes in pageable memory, which the OS can manage by swapping it to disk when unused, and page-locked memory, which the OS cannot swap.
With pin_memory set to False, data is loaded into pageable memory, and getting it to GPU device memory takes 2 steps: a copy into page-locked memory, plus a DMA transfer to GPU VRAM. That's where the overhead arises.
But with pin_memory enabled, the data is loaded into page-locked memory from the start, so the overhead shrinks.

Speed comparison with and without pin_memory, summing 1000 random tensors

์‚ฌ์šฉ ์ค‘์ธ ๋””๋ฐ”์ด์Šค: mps

  • ๐Ÿ“ˆ ์ฆ๊ฐ€ํ•œ Page-Locked Memory: 1078.12 MB
  • ๐Ÿ•’ pin_memory=False ์†Œ์š” ์‹œ๊ฐ„: 0.80์ดˆ

์‚ฌ์šฉ ์ค‘์ธ ๋””๋ฐ”์ด์Šค: mps

  • ๐Ÿ“ˆ ์ฆ๊ฐ€ํ•œ Page-Locked Memory: 1080.55 MB
  • ๐Ÿ•’ pin_memory=True ์†Œ์š” ์‹œ๊ฐ„: 0.81์ดˆ

On the M2 MacBook's unified memory this seems not to apply; the test shows essentially no difference.
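A sketch of this kind of timing comparison (hypothetical script, not the one that produced the numbers above; on CPU-only builds the pinned path is skipped since there is no pinned allocator):

```python
import time
import torch

def timed_transfer(pin: bool, n: int = 200) -> float:
    """Sum n random host tensors on the device; with pin=True each tensor
    is first staged into page-locked memory via Tensor.pin_memory()."""
    if torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device, pin = "cpu", False  # no pinned allocator without an accelerator
    total = torch.zeros(256, 256, device=device)
    start = time.time()
    for _ in range(n):
        t = torch.randn(256, 256)
        if pin:
            t = t.pin_memory()  # the explicit pageable -> page-locked copy
        total += t.to(device, non_blocking=True)
    float(total.sum())  # sync point: forces pending async copies to finish
    return time.time() - start

for pin_flag in (False, True):
    print(f"pin_memory={pin_flag}: {timed_transfer(pin_flag):.2f} sec")
```

On mps both runs should come out near-identical, consistent with the numbers above and with the allocator source discussed below.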

pytorch์˜ ์†Œ์Šค์ฝ”๋“œ๋ฅผ ๋ณด๋ฉด
1042๋ฒˆ์ค„์—์„œ mps device์—์„œ์˜ pin_memory option์€ ๊ด€๋ฆฌ๋ฅผ ํ•˜์ง€ ์•Š์Œ. mps ์ž์ฒด์—์„œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๊ด€๋ฆฌ๋ฅผ ํ•˜๋Š”๋“ฏ ๋ณด์ธ๋‹ค. ์ด๊ฒŒ UM์˜ ํŠน์„ฑ์ด๋ผ๊ณ  ์ƒ๊ฐ๋จ.

pytorch dataloader.py source code

pytorch mps MPSAllocator.mm source code

.mm file์€ objective-C์™€ C++์„ ์„ž์€ ํ˜•ํƒœ โžก๏ธ pytorch(c++ ๊ธฐ๋ฐ˜), Metal API๋Š” Object-C ๊ธฐ๋ฐ˜ โžก๏ธ 2๊ฐœ ์—ฐ๊ฒฐํ•˜๋ ค๋ฉด .mm file๋กœ ์—ฐ๊ฒฐ

📊 Mach Virtual Memory Statistics (Page size: 16,384 bytes)

vm_stat command

**Available Memory**
  • Pages free: 18,064 (≈ 281MB)
**Memory Usage**
  • Pages active: 302,460 (≈ 4.7GB)
  • Pages inactive: 300,448 (≈ 4.7GB)
  • Pages speculative: 933 (≈ 15MB)
**Page-Locked Memory (Pinned Memory)**
  • Pages wired down: 113,599 (≈ 1.8GB)
**Cached & Purgeable Memory**
  • Pages purgeable: 7,490 (≈ 120MB)
  • File-backed pages: 139,514 (≈ 2.2GB)
  • Anonymous pages: 464,327 (≈ 7.4GB)
**Memory Compression**
  • Pages stored in compressor: 722,020 (≈ 11.5GB)
  • Pages occupied by compressor: 275,678 (≈ 4.4GB)
  • Decompressions: 2,864,785
  • Compressions: 5,081,194
**Swap & Paging**
  • Pageins: 2,560,652
  • Pageouts: 18,275
  • Swapins: 122,093
  • Swapouts: 661,024
**Additional Info**
  • Translation faults: 74,863,824
  • Pages copy-on-write: 5,963,520
  • Pages zero filled: 29,007,477
  • Pages reactivated: 2,205,490
  • Pages purged: 634,520
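The MB figures above follow directly from the page counts and the 16,384-byte page size:

```python
PAGE_SIZE = 16_384  # bytes per page, from the vm_stat header on Apple silicon

def pages_to_mb(pages: int) -> float:
    """Convert a vm_stat page count to megabytes."""
    return pages * PAGE_SIZE / 2**20

# "Pages wired down: 113,599" is the page-locked (pinned) figure above
print(f"{pages_to_mb(113_599):,.0f} MB")  # ≈ 1.8 GB
```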

23.

Model architecture visualization

netron.start("unet.onnx")

DIANET visualization

ONNX (Open Neural Network Exchange): acts as an IR for models trained in various frameworks, so hardware companies only need to optimize their compilers and software stacks against ONNX

NVIDIA : ONNX -> TensorRT -> CUDA kernel
Intel : ONNX -> OpenVINO -> ?
Qualcomm : ONNX -> SNPE -> HexagonDSP
APPLE : ONNX -> CoreML -> ANE(apple neural engine)


24.

Model architecture search (clean code done!)

Drop the VGG perceptual loss lol -> it fails at color reproduction; keeping only the MSE loss works better

Tries

  1. relu -> leakyrelu
  2. an extra conv right before each concat
  3. sigmoid not possible (image input is normalized to [-1, 1]) -> this was a big problem

DiaNet๋„ ๊ตฌ์กฐ๊ฐ€ ๋ฌธ์ œ์ธ์ง€ ํ•™์Šต๋ฐฉ๋ฒ•์ด ๋ฌธ์ œ์ธ์ง€๋Š” ๋ชจ๋ฅด๊ฒ ๋Š”๋ฐ, ์ž˜ ์•ˆ๋˜๋Š”๊ฑด ํ™•์‹คํ•จ


25.

Homogeneity block detection

์‹คํ—˜์ ์œผ๋กœ ์„ค์ •ํ•œ parameter : def extract_pure_color_patches(image_path, patch_size=16, stride=8, variance_threshold=300, edge_threshold=0.01):
homogeniety block detection์„ ์œ„ํ•œ parameter

Saved every detected patch, around 50k per image, until it ate the MacBook's entire disk, and even deleting them took ages because of all the file reads. Really feeling how slow the SSD can be (should either build the dataset at a reasonable size or extract patches on the fly in the dataloader)
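The signature above suggests a reconstruction along these lines (my guess at the filtering logic; the real implementation is not in this log):

```python
import numpy as np
from PIL import Image

def extract_pure_color_patches(image_path, patch_size=16, stride=8,
                               variance_threshold=300, edge_threshold=0.01):
    """Keep only homogeneous blocks: low intensity variance and low
    gradient energy (thresholds are the experimentally chosen ones above)."""
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    h, w, _ = img.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            p = img[y:y + patch_size, x:x + patch_size]
            if p.var() > variance_threshold:
                continue  # too much intensity spread: not a flat color block
            gy, gx = np.gradient(p.mean(axis=2))
            if np.abs(gy).mean() + np.abs(gx).mean() > edge_threshold * 255:
                continue  # visible edge structure
            patches.append(p)
    return patches
```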

The task I'm working on sits between low-level vision tasks (denoising, enhancement, SR) and high-level vision tasks (semantic tasks).

Style transfer, which has to consider both, is said to be called a mid-level vision task.

📃 Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training

Here, given the randomness and irregularity of the noise itself, fitting it with an L1 loss is inappropriate (non-convergence).
So noise is treated as a random variable: MLE (to track the Irn distribution) and Dd-based clean-image alignment are set up so that the model imitates the noise arising during image formation.
This paper uses a Realistic Discriminator. UEGAN, which has shown the best performance and stability so far, goes one step beyond the Realistic Discriminator with the Relativistic average HingeGAN (RaHingeGAN).

PNGAN์—์„œ๋„ ํ•ด๋‹น ๊ธฐ๋ฒ•์„ ์ ์šฉํ•ด ๋ณผ ์ˆ˜ ์žˆ๊ฒ ๋‹ค.


26.

PNGAN denoiser + UEGAN (or others)

UEGAN์ด ์ผ๋‹จ grain์ด ์žˆ์„๋•Œ๋„ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ธด ํ–ˆ๋Š”๋ฐ, denoising์ด ๋œ ๊ฒฝ์šฐ์— ์–ด๋–ป๊ฒŒ ํ•™์Šต์ด ์ง„ํ–‰๋˜๋Š”์ง€ ๋น„๊ต๋ฅผ ์œ„ํ•ด์„œ ํ•ด๋ณด์ž.

There doesn't seem to be an existing method that synthesizes colour and grain simultaneously..

https://developer.apple.com/videos/play/wwdc2024/10160/

How to fine-tune a model inside the Torch ecosystem and deploy it

https://www.youtube.com/watch?v=SN-BISKo2lE


27.

TPU HBM max allocation

Tried to denoise all the film images with PNGAN and hit out-of-memory again (even a 32 GB Colab TPU ran out)

RuntimeError: Bad StatusOr access: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 31.40G of 7.48G hbm. Exceeded hbm capacity by 23.92G.

Total hbm usage >= 31.92G: reserved 530.00M program 31.40G arguments 0B
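For reference, a generic way to cap peak memory when denoising full-resolution images (not something tried here) is tile-by-tile inference with overlap averaging:

```python
import torch

@torch.no_grad()
def denoise_tiled(model, img, tile=512, overlap=32):
    """Run a denoiser over a full-resolution B x C x H x W image tile by
    tile so peak memory depends on the tile size, not the image size."""
    _, _, h, w = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            # clamp so edge tiles stay full-sized instead of shrinking
            y0 = min(y, max(h - tile, 0))
            x0 = min(x, max(w - tile, 0))
            out[:, :, y0:y0 + tile, x0:x0 + tile] += model(
                img[:, :, y0:y0 + tile, x0:x0 + tile])
            weight[:, :, y0:y0 + tile, x0:x0 + tile] += 1
    return out / weight.clamp(min=1)  # average where tiles overlap
```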

๋‹ค์‹œ ์ƒ๊ฐํ•ด๋ณด๋‹ˆ๊นŒ, ์• ์ดˆ์— UEGAN์—์„œ๋Š” resize๋ฅผ ํ•ด์„œ noise ์„ฑ๋ถ„์ด ๊ฑฐ์˜ ์˜ํ–ฅ์„ ์•ˆ๋ฏธ์ณค์„ ๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐ๋œ๊ธดํ•จ

๋”ฐ๋ผ์„œ denoising์„ ํ™œ์šฉํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋‚ฎ์„ ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ


28.

EfficientDet

Re-read RCTNet while planning an RCTNet + UEGAN + PNGAN implementation; it says it uses EfficientDet's feature fusion, so I read that too

JAX seems to be on the rise these days lol

EfficientDet์€ ๊ธฐ์กด์˜ feature์„ ๋” ํšจ์œจ์ ์œผ๋กœ ์—ฐ์‚ฐ๋Ÿ‰์„ ํšจ์œจ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋ฉด ๋” ์œ ์˜๋ฏธํ•œ feature๋ฅผ ๋ฝ‘์•„๋‚ด๊ธฐ ์œ„ํ•œ ์‹œ๋„๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Œ

์ง€๊ธˆ ํ๋ฆ„์ด

  1. RCTNet
  2. UEGAN
  3. Stochastic film grain synthesis
  4. PNGAN

and based on these, I'm currently reading follow-up and prior-work papers in order to fuse the corresponding algorithms, models, and training methods

  1. EfficientDet: Scalable and Efficient Object Detection (cited by RCTNet)

  2. MAXIM: Multi-Axis MLP for Image Processing (cites UEGAN). I'm starting to suspect that low-level vision tasks will also need multi-stage networks. Reading it, there are moments when a concept is so new that a single pass over the text simply doesn't register -> it gives me a serious headache and some self-doubt, but I push on, clearing my head as much as I can and re-reading, with the mindset that every attempt is one step forward

heuristic-based scaling approach: an approach relying on intuition, empirical judgment, and simple rules

For image enhancement, even on Papers with Code the SOTA is mostly decided by qualitative comparison, so it's a fairly subjective area

💭💭💭 So I think this is a field that needs an engineer's artistic sense, and that's probably why it interests me. (Purely numerical improvements in SSIM, PSNR, etc. don't excite me much)

CUDA vs MPS

  • MPS (M2, 16 GB): average process time 0.4536 sec / average prepare time 0.0005 sec / average compute efficiency 1.00
  • CUDA (Colab T4): average process time 0.0309 sec / average prepare time 0.0004 sec / average compute efficiency 0.98

⊳ Different Norms

For an input of shape B×C×H×W = 4×64×16×16:

LayerNorm: Normalizes each sample over the last dimension (C, channels-last layout), yielding B×H×W = 4×16×16 = 1024 averages. 😊

BatchNorm: Normalizes each channel over the entire batch (B, H, W), resulting in C = 64 averages. 🚀

InstanceNorm: Normalizes each channel per sample over spatial dimensions (H, W), giving B×C = 4×64 = 256 averages. 👍

These results match the PyTorch documentation and deep learning literature (Ba et al., Ioffe & Szegedy).
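These counts can be checked directly by computing the statistics by hand (shape assumptions as above: B=4, C=64, H=W=16):

```python
import torch

B, C, H, W = 4, 64, 16, 16
x = torch.randn(B, C, H, W)

# BatchNorm: one statistic per channel, computed over (B, H, W)
bn_means = x.mean(dim=(0, 2, 3))
assert bn_means.numel() == C  # 64

# InstanceNorm: one statistic per (sample, channel), computed over (H, W)
in_means = x.mean(dim=(2, 3))
assert in_means.numel() == B * C  # 256

# LayerNorm over the channel dim (channels-last layout): one per (b, h, w)
ln_means = x.permute(0, 2, 3, 1).mean(dim=-1)
assert ln_means.numel() == B * H * W  # 1024
```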


📌 References

🧷 https://ml-explore.github.io/mlx-data/build/html/index.html
🧷 https://ml-explore.github.io/mlx/build/html/index.html
🧷 https://blog.jaeyoon.io/2017/12/jekyll-image.html
🧷 https://gyumpic.tistory.com/511


📃 Papers

An Unsupervised Deep Learning Approach for Real-World Image Denoising
Image Style Transfer Using Convolutional Neural Networks
MAXIM: Multi-Axis MLP for Image Processing
EfficientDet: Scalable and Efficient Object Detection
Investigating properties of film grain noise for film grain management
Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training
Rapid and Reliable Detection of Film Grain Noise
Simulating Film Grain using the Noise-Power Spectrum
Texture Synthesis Using Convolutional Neural Networks
Film-GAN: towards realistic analog film photo generation
Stimulating Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
Computational Simulation of Alternative Photographic Processes
A Stochastic Film Grain Model for Resolution-Independent Rendering
A Large-scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs
Contrastive Learning for Unpaired Image-to-Image Translation
Deep-based Film Grain Removal and Synthesis
Global and Local Enhancement Networks for Paired and Unpaired Image Enhancement
Representative Color Transform for Image Enhancement
Local Color Distributions Prior for Image Enhancement
PieNet: Personalized Image Enhancement Network
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network
U-Net: Convolutional Networks for Biomedical Image Segmentation
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks