DCT Module
area optimization

Methodologies for VLSI optimization

🎯 Project Overview

This project aggressively reduced the area and power consumption of a forward Discrete Cosine Transform (DCT) module for image/video compression, while keeping image quality at PSNR > 30 dB and meeting timing constraints.


🏆 Key Numerical Results

  • Final Area: Reduced from 6,973,098 to 1,441,982 um² (~79% reduction)
  • Final Power: Reduced from 142 mW to 53 mW (~63% reduction)
  • PSNR: Consistently above 30 dB throughout all optimization steps

⚡ Optimization Skills & Methods

| Step | Main Technique | Area (um²) | Power (mW) | Change from previous (area in um², power in mW) |
|---|---|---|---|---|
| Baseline | - | 6,973,098 | 142 | - |
| Coefficient/Bitwidth Quant. | 8-bit coefficients, 9-bit intermediates | 3,729,472 | 94 | -3.24M, -48 |
| Coefficient Symmetry | Exploit 12x12 matrix symmetry | 2,784,806 | 81 | -0.94M, -13 |
| Sub-expression Sharing | ASU common sub-expression sharing | 2,783,910 | 82 | (minimal) |
| Coefficient Compression | Fine-tune coefficients by frequency basis | 2,301,881 | 70 | -0.48M, -12 |
| TP Memory Merging | Merge two TP_MEMs into one | 1,784,102 | 50 | -0.52M, -20 |
| Glitch/Overflow Handling | Add clamping logic | 1,560,684 | 60 | -0.22M, +10 |
| High-Frequency Removal | Mask out10/out11, both DCT stages | 1,441,982 | 53 | -0.12M, -7 |
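
The per-step changes and the overall percentages can be re-derived from the raw area and power figures in the table. The snippet below is only a small arithmetic check, not part of the original design flow; the names `steps`, `deltas`, and `total_reduction_pct` are illustrative.

```python
# (step, area in um^2, power in mW), copied from the table above
steps = [
    ("Baseline", 6_973_098, 142),
    ("Coefficient/Bitwidth Quant.", 3_729_472, 94),
    ("Coefficient Symmetry", 2_784_806, 81),
    ("Sub-expression Sharing", 2_783_910, 82),
    ("Coefficient Compression", 2_301_881, 70),
    ("TP Memory Merging", 1_784_102, 50),
    ("Glitch/Overflow Handling", 1_560_684, 60),
    ("High-Frequency Removal", 1_441_982, 53),
]


def deltas(rows):
    """Return (step, area change, power change) relative to the previous row."""
    return [
        (cur[0], cur[1] - prev[1], cur[2] - prev[2])
        for prev, cur in zip(rows, rows[1:])
    ]


def total_reduction_pct(first, last):
    """Percentage reduction from the first value to the last."""
    return 100.0 * (first - last) / first


for name, d_area, d_power in deltas(steps):
    print(f"{name:30s} {d_area / 1e6:+.2f}M um^2  {d_power:+d} mW")

area_pct = total_reduction_pct(steps[0][1], steps[-1][1])
power_pct = total_reduction_pct(steps[0][2], steps[-1][2])
print(f"total: area -{area_pct:.1f}%, power -{power_pct:.1f}%")
```

Note that the glitch/overflow step trades 10 mW of extra power for a smaller area, which the final high-frequency removal step partly recovers.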


{
    "type": "bar",
    "data": {
        "labels": ["Glitching solved", "Co quantization", "Symmetry", "Sharing", "tp_BW=9", "Co compressed", "TP merged", "overflow_reduced", "High frequency"],
        "datasets": [
            {
                "label": "Power (mW)",
                "data": [142, 94, 81, 82, 73, 70, 50, 60, 53],
                "backgroundColor": "rgba(54, 162, 235, 0.7)",
                "borderColor": "rgba(54, 162, 235, 1)",
                "borderWidth": 1,
                "yAxisID": "y-power"
            },
            {
                "label": "Area (um^2)",
                "data": [6973098, 3729472, 2784806, 2783910, 2471659, 2301881, 1784102, 1560684, 1441982],
                "borderColor": "rgba(255, 99, 132, 1)",
                "backgroundColor": "rgba(255, 99, 132, 0)",
                "borderWidth": 2,
                "yAxisID": "y-area",
                "type": "line",
                "fill": false,
                "pointBackgroundColor": "rgba(255, 99, 132, 1)",
                "pointRadius": 4,
                "pointHoverRadius": 6
            }
        ]
    },
    "options": {
        "responsive": true,
        "title": {
            "display": true,
            "text": "Power and Area Comparison Across Optimization Techniques"
        },
        "tooltips": {
            "mode": "index",
            "intersect": false
        },
        "scales": {
            "yAxes": [
                {
                    "id": "y-power",
                    "type": "linear",
                    "position": "left",
                    "scaleLabel": {
                        "display": true,
                        "labelString": "Power (mW)"
                    },
                    "ticks": {
                        "beginAtZero": true
                    }
                },
                {
                    "id": "y-area",
                    "type": "linear",
                    "position": "right",
                    "scaleLabel": {
                        "display": true,
                        "labelString": "Area (um^2)"
                    },
                    "gridLines": {
                        "drawOnChartArea": false
                    },
                    "ticks": {
                        "beginAtZero": true
                    }
                }
            ]
        }
    }
}


📋 Optimization Techniques (Summary)

  • Bitwidth & Coefficient Quantization: MATLAB simulations selected 8-bit coefficients and 9-bit intermediate values, a setting that keeps PSNR > 30 dB.
  • Coefficient Symmetry: Leveraged symmetry in the 12x12 DCT matrix to minimize unique multipliers, significantly reducing area.
  • Sub-expression Sharing: Identified and shared common expressions in ASUs. The gain was minimal (area 2,784,806 to 2,783,910 um², power 81 to 82 mW) because the synthesis tools already performed similar optimizations.
  • Coefficient Compression: Adjusted less significant frequency coefficients (e.g., C10), further reducing logic without PSNR loss.
  • TP Memory Merging: Combined two transpose memories into one, enabling simultaneous read/write and cutting memory area by ~30%.
  • Glitch/Overflow Management: Added clamping logic at the 12-bit truncation point so that out-of-range values are clamped, preventing overflow/underflow artifacts.
  • High-Frequency Computation Reduction: Masked highest frequency outputs (out10, out11) in both DCT stages, preserving PSNR while reducing computations and area.
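
As a rough illustration of the first two techniques, the Python sketch below quantizes a 12-point DCT coefficient matrix to 8 signed bits and counts how many distinct magnitudes remain. It rests on assumptions the source does not state: an unnormalized DCT-II definition, a scale factor of 127, and round-to-nearest quantization. The names `MAG`, `coeff`, and `unique_mags` are hypothetical, and the actual design was tuned in MATLAB and may differ.

```python
import math

N = 12                               # transform size (12x12 matrix)
COEFF_BITS = 8                       # signed coefficient width
SCALE = 2 ** (COEFF_BITS - 1) - 1    # 127: assumed scale so that 1.0 fits in 8 signed bits

# Quantized magnitudes of cos(pi * m / (2N)) for m = 0..N.
# Every matrix entry is +/- one of these values.
MAG = [round(SCALE * math.cos(math.pi * m / (2 * N))) for m in range(N + 1)]


def coeff(k, n):
    """Quantized, unnormalized DCT-II coefficient cos(pi * (2n + 1) * k / (2N))."""
    m = ((2 * n + 1) * k) % (4 * N)  # the cosine argument is periodic in m with period 4N
    if m > 2 * N:                    # cos(2*pi - x) = cos(x)
        m = 4 * N - m
    sign = 1
    if m > N:                        # cos(pi - x) = -cos(x)
        m, sign = 2 * N - m, -1
    return sign * MAG[m]


C = [[coeff(k, n) for n in range(N)] for k in range(N)]

# Symmetry: each row is even or odd about its center, so half the products can be
# replaced by a pre-addition or pre-subtraction of mirrored inputs.
unique_mags = sorted({abs(c) for row in C for c in row if c != 0})
print(len(unique_mags), "distinct nonzero magnitudes among", N * N, "entries")
```

Because row k satisfies C[k][N-1-n] = (-1)^k * C[k][n], mirrored input samples can be added or subtracted before multiplication, which is one common way such symmetry reduces the number of multipliers.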

🔬 Synthesis & Quality Results

  • Area: 79% reduction
  • Power: 63% reduction
  • Image Quality: PSNR always > 30 dB (e.g., windmill image PSNR ~31–34 dB after all optimizations)
  • Timing: All timing constraints met; critical path remains in main DCT computation.
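
The PSNR criterion and the effect of masking out10 and out11 in both DCT stages can be sketched in Python as follows. This is a floating-point model under stated assumptions (orthonormal DCT-II, 8-bit pixel peak of 255, a synthetic smooth test block); the hardware uses quantized integer arithmetic, and all function names here are illustrative.

```python
import math

N = 12
MASKED = (10, 11)   # output indices zeroed in each 1D stage (out10, out11)

# Orthonormal floating-point DCT-II matrix (assumption; the RTL uses quantized integers).
D = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def dct_1d(x, mask=True):
    y = [sum(D[k][n] * x[n] for n in range(N)) for k in range(N)]
    if mask:
        for k in MASKED:
            y[k] = 0.0
    return y


def idct_1d(y):
    return [sum(D[k][n] * y[k] for k in range(N)) for n in range(N)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_2d(block, mask=True):
    stage1 = [dct_1d(row, mask) for row in block]              # first 1D stage on rows
    stage2 = [dct_1d(col, mask) for col in transpose(stage1)]  # second stage after transposition
    return transpose(stage2)


def inverse_2d(coeffs):
    stage1 = [idct_1d(row) for row in coeffs]
    stage2 = [idct_1d(col) for col in transpose(stage1)]
    return transpose(stage2)


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D blocks."""
    errs = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    mse = sum(errs) / len(errs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Synthetic smooth block (hypothetical data, not the windmill image).
block = [[100 + 5 * i + 3 * j for j in range(N)] for i in range(N)]
recon = inverse_2d(forward_2d(block, mask=True))
print(f"PSNR with out10/out11 masked: {psnr(block, recon):.1f} dB")
```

Smooth content carries little energy in the two highest frequency indices, so masking them costs little PSNR in this model; real images with sharp detail will lose more, which is why the 30 dB threshold has to be checked on actual test images.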

📈 Optimization Process (Log Highlights)

  • Early Trials: Focused on overflow/glitching fixes.
  • Quantization: Established 8/9-bit setting for coefficients/intermediates.
  • Symmetry & Sharing: Reduced multipliers and optimized ASUs.
  • Coefficient Compression: Fine-tuned coefficients for further reduction.
  • TP_MEM Merge: Major area savings by merging memories.
  • High-Frequency Removal: Zeroed high-frequency outputs with no PSNR loss.
  • Final: Explicitly handled zeroed memory locations for further optimization.
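
The overflow/glitching fixes mentioned in the log come down to clamping a value when it is truncated to 12 bits, instead of letting it wrap around. The Python sketch below contrasts the two behaviors; it assumes a signed two's-complement 12-bit range, which the source does not state explicitly, and the function names are illustrative.

```python
def clamp_signed(value, bits=12):
    """Saturate an integer to the signed two's-complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap_signed(value, bits=12):
    """Plain truncation: keep the low `bits` bits and reinterpret them as signed."""
    v = value & ((1 << bits) - 1)
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


for x in (100, 2047, 2048, -2049, 5000):
    print(f"{x:6d} -> wrap {wrap_signed(x):6d}, clamp {clamp_signed(x):6d}")
```

A value just above the positive limit wraps to a large negative number under plain truncation, which shows up as a visible artifact in the image; clamping keeps it at the nearest representable value at the cost of a comparator and a multiplexer.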

Summary: By systematically applying quantization, symmetry, coefficient compression, memory architecture simplification, and high-frequency output masking, the forward DCT module achieved area and power reductions of roughly 79% and 63% while consistently meeting image quality and timing requirements.


Summarized using Perplexity (Claude 3.7 Sonnet) · Retouched by Duhyeon Kim
