PhD Student - Research Assistant
Karahan Şahin
I am a PhD student and Research Assistant at the University of Surrey, where I work on multimodal AI for sign language technologies.
My current research focuses on sign language understanding and translation, with an emphasis on representation learning across pose, video, and language. Before starting my PhD, I worked in NLP and applied machine learning, including Turkish NLP, text summarization, conversational RAG, and LLM-based research systems.
You can find selected publications, projects, and contact links below.
Profile
Applied ML research with production engineering experience.
My background combines linguistics, cognitive science, and software engineering. I work across model design, data pipelines, experiment tracking, annotation tooling, backend services, and research writing. My earlier work was mostly in NLP and applied ML; my PhD research now centers on multimodal sign language representation learning and translation.
Experience
Research, engineering, and applied ML roles.
-
Research Assistant in Deep Learning and Computer Vision
April 2025 - Present
- Working on SignGPT on Vision Language Toolkit as part of the research initiative.
- Managing Multi-view Motion Capture for Sign Language Production pipelines.
-
Visiting Researcher
Jun 2024 - Aug 2024
- Designed a wav2vec2-based architecture adapted for sign language processing.
- Implemented contextual representation models for continuous pose sequences with temporal dynamics suited to lower-frame-rate signing data.
- Led multimodal LLM fine-tuning experiments with LLaVA, aligning image and pose encoders for translation quality.
- Collaborated on self-supervised pretraining for robust pose sequence encoding.
-
Research Engineer
Jan 2024 - Jul 2024
- Worked on conversational RAG modules over decentralized vector storage systems.
- Led a project on cross-model latent representations for information discovery.
- Developed and deployed vector storage management services as a pub/sub application.
-
ML Engineer
Feb 2023 - Jan 2024
- Built production-focused research and model implementations for text summarization.
- Developed a news trend clustering algorithm using dynamic time warping.
- Created a user category interest model with zero-shot learning and knowledge distillation.
-
Research Engineer
Jul 2022 - Jan 2023
- Developed a BERT-variant multi-category classification model for Huawei AppGallery categorization over 100 subcategories.
- Automated text annotation with a probabilistic set theory approach for linguistically coherent classification.
- Served as a visiting lecturer for deep learning with NLP in the COP4490 AI Applications course at Bahcesehir University.
-
Graduate Research Assistant
May 2022 - Present
- Teaching assistant for LING360 Computational Methods in Linguistics.
- Developed an ellipsis prediction algorithm using linear dependency graph similarity.
- Built an annotation interface with ReactJS and Flask for the Ellipsis in Turkish project.
-
Data Engineer
Jul 2021 - Jan 2022
- Developed user-agent bots for crawling job-posting websites with Scrapy.
- Created dynamic dashboards for US and UK job databases using Tableau and MongoDB.
Education
Linguistics, cognitive science, and computational methods.
-
PhD in Computer Science
Feb 2026 - Present
- Research on multimodal learning for sign language understanding and translation.
-
MA in Cognitive Science
Sep 2022 - Jan 2025
- Teaching and research assistant.
- Thesis on sign language recognition and translation.
- Member of the Perceptual Intelligence Lab.
-
BA in Linguistics
Sep 2016 - Jun 2022
- Graduated with distinction.
- Thesis on neural morphological analysis.
- Worked in the Text Analytics and Bioinformatics Lab.
Publications
Conference proceedings and workshop papers.
- Decoding Sign Languages: The SL-FE Framework for Phonological Analysis and Automated Annotation. Karahan Şahin, Kadir Gokgoz. 11th Workshop on the Representation and Processing of Sign Languages, 2024.
- TransMorpher: A Phonologically Informed Transformer-based Morphological Analyzer. Karahan Şahin, Umit Atlamaz. ALTNLP, 2022.
- Overcoming the Challenges in Morphological Annotation of Turkish in Universal Dependencies Framework. Talha Bedir, Karahan Şahin, Onur Gungor, Suzan Uskudarli, Arzucan Ozgur, Tunga Gungor, Balkiz Ozturk Basaran. LAW-DMR, 2021.
Projects
Research projects and initiatives.
-
SignGPT
- Working as a Research Assistant on the SignGPT project, led by the University of Surrey, the University of Oxford, and DCAL.
- Building foundation models for sign language understanding and translation with multimodal approaches.
-
Ellipsis in Turkish
- Computational modeling of Turkish ellipsis to test the limits of pretrained LLMs in hierarchical linguistic processing.
- Built retrieval experiments over large corpora and an annotation interface with Flask, MongoDB, and ReactJS.
-
Linguistics-supported Turkish NLP Platform
- Annotated Turkish corpora with the Universal Dependencies framework.
- Automated dependency annotation decisions using morphosyntactic analysis, Python, regex, and UDPipe.
Skills
Engineering stack, research tools, and languages.
-
Programming
Python, PyTorch, Pandas, NumPy, Scikit-learn, R, HTML/CSS, JavaScript, TypeScript, SQL, MongoDB.
-
Tools
Linux, Bash, Zsh, Docker, Git, LaTeX, Overleaf, R Markdown, Tableau, Firebase.
-
Research
NLP, sign language recognition and translation, multimodal learning, LLM fine-tuning, self-supervised learning.
-
Languages
Turkish (native), English (professional proficiency), Turkish Sign Language (upper-intermediate), German (beginner).
Contact
Academic contact and profile links.