Arnesh Batra — Research Portfolio

01 / Research

Three threads,
one curiosity.

My research lives at the intersection of self-supervised learning, signal processing, and deep-learning theory & interpretability — asking how networks build representations, what invariances they preserve, and where those representations quietly fail.

I · Self-Supervised Learning

Representations that generalize.

Theoretical work on self-supervised learning — relational representation learning, robustness, and the conditions under which invariances transfer across tasks.

II · Signal Processing

Frequency, phase, multi-view.

Spectrogram transformers, Fourier-phase forensics, and multi-view cross-attention for 2D / 3D signals — from birdsong (INTERSPEECH) to spinal MRI (CVPR), dental scans and AI-image detection.

III · Theory & Interpretability

Inside the latent.

Uncovering what intermediate layers actually encode: generalization gaps, representation geometry, and why certain features only surface deep inside the network. Work accepted as a Spotlight at ICML 2026.

Founder · Lead Researcher

StarkVision

An independent, student-led research community I founded in 2025 — where undergraduates publish at top venues as first authors.

20+ members. Three active projects. Mentorship on research methodology, writing, and publication strategy. Everything below carries student-only authorship.

Output · 2024 — 2025

✔

M-SCAN · Lumbar Stenosis Grading

Multi-view cross-attention for spinal MRI

CVPR '25 · Demo

✔

Melody or Machine · Synthetic Music Detection

Cross-modal contrastive alignment

TMLR

✔

SocialDF · Deepfake Detection

Benchmark + model for harmful synthetic content

ICMR '25 · Workshop

✔

CLAM · AI for Music

Safeguarding authenticity in the music industry

NeurIPS '25 · Workshop

○

More coming

In prog.

02 / Publications

Papers &
pre-prints.

Ten works across journals, top-tier conferences and workshops. Asterisk (*) denotes equal contribution.

★ Featured · ICML 2026 Spotlight

Uncovering the Latent Potential of Deep Intermediate Representations

Arnesh Batra, Arush Gumber, Aniket Khandelwal, Jashn Khemani, Anubha Gupta

A theoretical & empirical study of what intermediate layers actually encode — reframing them as latent reservoirs of structure, and showing how to recover and exploit that signal for generalization. Accepted as a Spotlight at ICML 2026.

ICML 2026 · Spotlight · DL Theory · Interpretability · Camera-ready in progress

★ C.0 · 2026

Uncovering the Latent Potential of Deep Intermediate Representations

Arnesh Batra, Arush Gumber, Aniket Khandelwal, Jashn Khemani, Anubha Gupta

ICML 2026 · Spotlight · Accepted

Coming soon

J.1 · 2025

Melody or Machine: Benchmarking and Detecting Synthetic Music via Cross-Modal Contrastive Alignment

Arnesh Batra, Dev Sharma, Krish Thukral, Ruhani Bhatia, Naman Batra, Aditya Gautam

TMLR · Transactions on Machine Learning Research

PDF ↗

C.1 · 2024

Audio Spectrogram Transformer Guided Classification & Information Retrieval for Birds

Yashwardhan Chaudhuri*, Paridhi Mundra*, Arnesh Batra*, Orchid Chetia Phukan, Arun Balaji Buduru

INTERSPEECH 2024 · ISCA

PDF ↗

C.2 · 2025

M-SCAN: A Multistage Framework for Lumbar Spinal Canal Stenosis Grading Using Multi-View Cross Attention

Arnesh Batra, Arush Gumber, Anushk Kumar

CVPR 2025 · Demo Track

arXiv ↗

W.1 · 2025

Relational Representation Learning

Lucas Maes, Ian K Hajra, Arnesh Batra, Hugues Van Assel, Damien Scieur, Randall Balestriero

NeurIPS 2025 Workshop · UniReps — Unifying Representations in Neural Models

PDF ↗

W.2 · 2025

CLAM: Safeguarding Authenticity & Addressing Implications for the Music Industry

Arnesh Batra, Dev Sharma, Krish Thukral, Ruhani Bhatia, Naman Batra, Aditya Gautam

NeurIPS 2025 Workshop · AI for Music

PDF ↗

W.3 · 2025

SocialDF: Benchmark Dataset & Detection Model for Mitigating Harmful Deepfake Content on Social Media

Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta

ACM ICMR 2025 Workshop · MAD'25

ACM ↗

W.4 · 2025

DentalNet: Geometric Aware Multi-View Transformer for Occlusion Grade Prediction in Dental 3D Scans

Arnesh Batra, Arush Gumber, Vaibhav Sharma, Peemit, Rinkle Sardana, Tulika Tripathi, Anubha Gupta

NeurIPS 2025 Workshop · Imageomics

PDF ↗

03 / Experience

Labs &
collaborators.

Working across academia and industry — from FAIR/Brown to medical-imaging labs at IIIT Delhi and startups building production-grade AI.

May 2025 — present

NivionAI

Co-Founder

Re-imagining industry workflows through AI-driven innovation grounded in state-of-the-art research. Cleared Stage 1 of the AI Grand Challenge organized by NCIIPC, Government of India.

● Delhi · IN

May 2025 — Dec 2025

Brown University

Research Collaborator · w/ Prof. Randall Balestriero (FAIR/Brown)

Theoretical research on self-supervised learning frameworks — representation robustness, generalization, mathematical analysis and empirical validation. Work accepted at NeurIPS '25 UniReps.

● Remote

May 2025 — present

SBI Lab · IIIT Delhi

Research Intern · w/ Prof. Anubha Gupta

Medical image analysis across 2D and 3D modalities: blood-cell, dental and spinal imaging. Designing hybrid transformer architectures that integrate volumetric and multi-view data, with state-of-the-art performance on multiple datasets. Work under review at ECCV 2026.

● Delhi · IN

Feb 2025 — May 2025

UC San Diego

Research Intern · w/ Prof. Pengtao Xie

Built the first LLM framework for DNA-RNA interaction prediction. Curated a 100k+ sequence dataset and proposed lightweight genomics architectures with substantial scalability and efficiency gains.

● Remote

May 2025 — Jul 2025

Mythyaverse

AI Engineering Intern

Explainable deep-learning models for ECG signal interpretation. Built a real-time sign-language translation system using transformer architectures — 95%+ accuracy on Arabic Sign Language. Shipped LLM + multimodal pipelines into production.

● Remote

May 2024 — Oct 2024

HMI Lab · IIIT Delhi

Research Intern · w/ Prof. Jainendra Shukla

Multimodal LLMs for dynamic tasks — live sports commentary and fine-grained video captioning. Designed and benchmarked video encoders and attention-based fusion pipelines.

● Delhi · IN

04 / Elsewhere

Projects, awards
& leadership.

Selected projects

FOCUS FLOW: Adaptive Learning Platform · 2024

AI-driven learning platform for neurodiverse students with real-time attention tracking via Mediapipe + OpenCV.

Super-Resolution for Gravitational Lensing · 2024

GANs, SRResNet and diffusion-based denoising — AUROC 0.9978 · 98.3% acc · SSIM 0.94.

RE-DACT: Secure Data Redaction · 2024

Transformer-based redactor for text, PDF, images and video with 99%+ accuracy. 2nd place at SIH 2024 Grand Finals.

Leadership

StarkVision · Founder & Lead Researcher · 2025 →

Research community of 20+ members mentoring undergraduate-led research. See the dedicated section above.

Coordinator · Cyborg, IIIT-D · 2023 →

Led a hands-on neural-networks workshop for 40+ students; mentored first-time model implementations.

Honors & awards

Amazon ML Challenge 2025 · Rank 13 / 20,000

Price prediction from product image + description.

Smart India Hackathon 2024 · Top 2 · Finals

RE-DACT selected for grand finale at IIT Kharagpur.

Grand AI Challenge PS-03 · Top 6 / 100+

NCIIPC · Government of India · cleared Stage 1.

F1nalyze · Formula 1 Datathon · 1st place

Kaggle competition.

ML Hackathon · IIT Jodhpur · 4th place

FILL THE VOID() · IIT Jammu · 5th place

JEE Main · 99.72 percentile · AIR 3334

Out of 1.2 million candidates.

Academic service

Reviewer · ACM ICMR 2025

Reviewer · NeurIPS Workshops 2025

Teaching · 50+ students

Neural networks & data analysis fundamentals (NumPy, Pandas).

05 / Off-hours

Away from the terminal.

When I'm not reading papers or training models — I'm usually somewhere with a camera, a racquet, or a window seat.

01 · Photography

Making light sit still.

Street, low-light, and long exposures. I shoot to train my eye on composition — the rule of thirds is basically attention.

@arnesh_photography_ ↗

02 · Badminton

Fast, cross-court.

Played since school. It's the one place I don't overthink — reaction, footwork, the satisfying click of a clean smash.

Doubles preferred

03 · Travel

New cities, same notebook.

Somewhere between changing skylines and unfamiliar menus — most of my best ideas arrive on a train.

Next: open
