Parsa Kavehzadeh

I am Parsa Kavehzadeh, an NLP researcher specializing in the development and optimization of Large Language Models (LLMs). Currently, I work at Huawei Technologies Canada, where I introduced Sorted LLaMA, a method for integrating nested submodels into a single LLM, and developed a confidence-based early exiting mechanism to accelerate inference.

I earned my MSc in Computer Science at York University under the supervision of Professor Enamul Hoque, focusing on natural language interactions with visualizations, including chart comprehension and reasoning. My research also encompassed a novel chart pretraining method and a survey on chart question answering. Previously, I completed my BSc in Computer Engineering at Amirkabir University of Technology, exploring deep learning for anomaly detection and text chunking.

My interests include efficient LLM training, inference acceleration, and multi-modal systems integrating NLP and computer vision. I’ve published in top-tier conferences like EACL, EMNLP, and EuroVis and received recognition such as MVP at Huawei Noah’s Ark Lab.

selected publications

UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning

Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, and 2 more authors

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

Abs

Charts are widely used for data analysis, providing visual representations and insights into complex data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently such as chart question answering and chart summarization. However, existing methods for these tasks often rely on pretraining on language or vision-language tasks, neglecting the explicit modeling of chart structures (e.g., how chart elements are related to each other). To address this, we first build a large corpus of charts covering diverse topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder for text generation. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. Our experiments demonstrate that pretraining UniChart on a large corpus with chart-specific objectives, followed by fine-tuning, yields state-of-the-art performance on four downstream tasks. Moreover, our model exhibits superior generalizability to unseen chart corpus, surpassing previous approaches that lack chart-specific objectives and utilize limited chart resources.
S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, and 5 more authors

Dec 2024

Abs

Deployment of autoregressive large language models (LLMs) is costly, and as these models increase in size, the associated costs will become even more considerable. Consequently, different methods have been proposed to accelerate the token generation process and reduce costs. Speculative decoding (SD) is among the most promising approaches to speed up the LLM decoding process by verifying multiple tokens in parallel and using an auxiliary smaller draft model to generate the possible tokens. In SD, usually, one draft model is used to serve a specific target model; however, in practice, LLMs are diverse, and we might need to deal with many target models or more than one target model simultaneously. In this scenario, it is not clear which draft model should be used for which target model, and searching among different draft models or training customized draft models can further increase deployment costs. In this paper, we first introduce a novel multi-target scenario for the deployment of draft models for faster inference. Then, we present a novel, more efficient sorted speculative decoding mechanism that outperforms regular baselines in multi-target settings. We evaluated our method on Spec-Bench in different settings, including base models such as Vicuna 7B, 13B, and LLama Chat 70B. Our results suggest that our draft models perform better than baselines for multiple target models at the same time.
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, and 3 more authors

In Findings of the Association for Computational Linguistics: EACL 2024, Mar 2024

Abs

Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference by leveraging the modularity in networks and sorting sub-models based on computation/accuracy in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any Pre-Training and by only replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT). Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference. We show that this approach can unlock the potential of intermediate layers of transformers in generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and transition costs between different computational/latency budgets. The efficacy of our proposed method was demonstrated by applying it to tune LLaMA 2 13B on the Stanford Alpaca dataset for instruction following and TriviaQA for closed-book question answering. Our results show the superior performance of sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit), all achieved with very efficient tuning and without additional memory usage during inference.