Daniele Paliotta

Hi 👋! I’m Daniele. I am a PhD candidate in the Machine Learning Group at the University of Geneva, under the supervision of François Fleuret. My research focuses on transformers, LLM systems and efficiency, and alternative architectures.
Recently, I was a research intern at Cartesia working on multimodal and TTS foundation models (transformers, state space models, linear RNNs) with a strong focus on efficiency.
Previously, I was a researcher at Together AI, supervised by Tri Dao, where I worked on LLM distillation and efficient inference, and developed speculative decoding for Mamba and linear RNNs.
In a previous life, I worked as a software engineer, did machine learning at Truelayer, and played in Capture the Flag competitions.
I also love sailing, reading, writing fiction, cooking, and playing guitar.
selected publications
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models. In Advances in Neural Information Processing Systems 38 (NeurIPS 2024), Vancouver, BC, Canada, December 10-15, 2024.
- Fast Causal Attention with Dynamic Sparsity. In Workshop on Efficient Systems for Foundation Models @ ICML 2023.
- Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, December 10-16, 2023.
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners, 2025.
- Understanding and Minimising Outlier Features in Transformer Training. In Advances in Neural Information Processing Systems 38 (NeurIPS 2024), Vancouver, BC, Canada, December 10-15, 2024.
- Leveraging the true depth of LLMs, 2025.
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models, 2025.