Input: 32-frame grayscale sequence (112×112)
→ 3D-CNN (3 layers, 64–128–256 filters, kernel 3×3×3)
→ Temporal Transformer Encoder (4 heads, 2 layers)
→ Two heads:

- Intensity: MSE loss (regression)
- Authenticity: BCE loss (binary)

Training: 80/10/10 split, AdamW (lr=1e-4), batch size 64, 50 epochs.

| Task | Metric | Gülümseme (original) | Gülümseme 2 (ours) | Improvement |
|------|--------|----------------------|---------------------|--------------|
| Smile detection (binary) | Accuracy | 84.3% | 94.1% | +9.8% |
| Intensity estimation | MAE | 0.94 | 0.41 | -56% |
| Authenticity (spontaneous vs. posed) | F1-score | 0.75 | 0.89 | +0.14 |
| Cross-cultural generalization (leave-one-group-out) | ΔAcc | -12% | -3.2% | - |
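
The Improvement column appears to mix two conventions: the accuracy and F1 entries match an absolute difference (new minus old), while the MAE entry matches a change relative to the original value. The short sketch below recomputes the three filled-in entries from the table's own numbers under that reading. The helper names `absolute_change` and `relative_change_pct` are illustrative and do not come from the source.

```python
# Recompute the "Improvement" column from the values in the results table.
# Assumption: accuracy and F1 are reported as absolute differences,
# MAE as a relative change. Helper names are hypothetical.


def absolute_change(old: float, new: float) -> float:
    """Return new - old, in the metric's own units."""
    return new - old


def relative_change_pct(old: float, new: float) -> float:
    """Return the change relative to the old value, in percent."""
    return (new - old) / old * 100.0


acc_gain = absolute_change(84.3, 94.1)        # percentage points
mae_change = relative_change_pct(0.94, 0.41)  # percent of the original MAE
f1_gain = absolute_change(0.75, 0.89)         # F1 units

print(f"Accuracy: {acc_gain:+.1f} percentage points")
print(f"MAE:      {mae_change:+.0f}% relative")
print(f"F1-score: {f1_gain:+.2f}")
```

Read this way, the accuracy gain of 9.8 is in percentage points rather than a relative percentage, which is worth keeping in mind when comparing it with the MAE row.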