Karahan Şahin

PhD Student - Research Assistant

I am a PhD student and Research Assistant at the University of Surrey, where I work on multimodal AI for sign language technologies.

My current research focuses on sign language understanding and translation, with an emphasis on representation learning across pose, video, and language. Before starting my PhD, I worked in NLP and applied machine learning, including Turkish NLP, text summarization, conversational RAG, and LLM-based research systems.

You can find selected publications, projects, and contact links below.

Location
Guildford, United Kingdom
Current Role
PhD Student and Research Assistant at the University of Surrey
Research
Sign language technologies, multimodal learning, language understanding
Previous Work
NLP, conversational RAG, text summarization, Turkish NLP

Profile

Applied ML research with production engineering experience.

My background combines linguistics, cognitive science, and software engineering. I work across model design, data pipelines, experiment tracking, annotation tooling, backend services, and research writing. My earlier work was mostly in NLP and applied ML; my PhD research now centers on multimodal sign language representation learning and translation.

Experience

Research, engineering, and applied ML roles.

  1. Research Assistant in Deep Learning and Computer Vision

    University of Surrey - Guildford, United Kingdom
    Apr 2025 - Present
    • Working on SignGPT on the Vision Language Toolkit as part of the research initiative.
    • Managing multi-view motion capture pipelines for sign language production.
    Python PyTorch Docker Self-supervised learning Multimodal LLMs
  2. Visiting Researcher

    Johns Hopkins University - Baltimore, MD
    Jun 2024 - Aug 2024
    • Designed a wav2vec2-based architecture adapted for sign language processing.
    • Implemented contextual representation models for continuous pose sequences with temporal dynamics suited to lower-frame-rate signing data.
    • Led multimodal LLM fine-tuning experiments with LLaVA, aligning image and pose encoders for translation quality.
    • Collaborated on self-supervised pretraining for robust pose sequence encoding.
    Python PyTorch Docker Self-supervised learning Multimodal LLMs
  3. Research Engineer

    Index Network Inc. - New York, NY
    Jan 2024 - Jul 2024
    • Worked on conversational RAG modules over decentralized vector storage systems.
    • Led a project on cross-model latent representations for information discovery.
    • Developed and deployed vector storage management services as a pub/sub application.
    Python PyTorch TypeScript Neo4j GraphQL
  4. ML Engineer

    Aposto Media Technologies - Istanbul, Turkey
    Feb 2023 - Jan 2024
    • Built production-focused research and model implementations for text summarization.
    • Developed a news trend clustering algorithm using dynamic time warping.
    • Created a user category interest model with zero-shot learning and knowledge distillation.
    PyTorch NumPy Pandas Scikit-learn MongoDB
  5. Research Engineer

    Huawei - Istanbul, Turkey
    Jul 2022 - Jan 2023
    • Developed a BERT-variant multicategory classification model for Huawei App Gallery categorization over 100 subcategories.
    • Automated text annotation with a probabilistic set theory approach for linguistically coherent classification.
    • Served as a visiting lecturer for deep learning with NLP in the COP4490 AI Applications course at Bahcesehir University.
    Hugging Face PyTorch TensorFlow SQL Technical documentation
  6. Graduate Research Assistant

    Boğaziçi University - Istanbul, Turkey
    May 2022 - Present
    • Teaching assistant for LING360 Computational Methods in Linguistics.
    • Developed an ellipsis prediction algorithm using linear dependency graph similarity.
    • Built an annotation interface with ReactJS and Flask for the Ellipsis in Turkish project.
    Python NLTK spaCy ReactJS Flask
  7. Data Engineer

    De Canaria - New York, NY, Remote
    Jul 2021 - Jan 2022
    • Developed user-agent bots for crawling job-posting websites with Scrapy.
    • Created dynamic dashboards for US and UK job databases using Tableau and MongoDB.
    Scrapy JavaScript MongoDB Linux Tableau

Education

Linguistics, cognitive science, and computational methods.

  1. PhD in Computer Science

    University of Surrey - Guildford, UK
    Feb 2026 - Present
    • Research on multimodal learning for sign language understanding and translation.
  2. MA in Cognitive Science

    Boğaziçi University - Istanbul, Turkey
    Sep 2022 - Jan 2025
    • Teaching and research assistant.
    • Thesis on sign language recognition and translation.
    • Member of the Perceptual Intelligence Lab.
  3. BA in Linguistics

    Boğaziçi University - Istanbul, Turkey
    Sep 2016 - Jun 2022
    • Graduated with distinction.
    • Thesis on neural morphological analysis.
    • Worked in the Text Analytics and Bioinformatics Lab.

Publications

Conference proceedings and workshop papers.

  • Decoding Sign Languages: The SL-FE Framework for Phonological Analysis and Automated Annotation. Karahan Şahin, Kadir Gokgoz. 11th Workshop on the Representation and Processing of Sign Languages, 2024.
  • TransMorpher: A Phonologically Informed Transformer-based Morphological Analyzer. Karahan Şahin, Umit Atlamaz. ALTNLP, 2022.
  • Overcoming the Challenges in Morphological Annotation of Turkish in Universal Dependencies Framework. Talha Bedir, Karahan Şahin, Onur Gungor, Suzan Uskudarli, Arzucan Ozgur, Tunga Gungor, Balkiz Ozturk Basaran. LAW-DMR, 2021.

Projects

Research projects and initiatives.

  • SignGPT

    UKRI - Jun 2025 to Present
    • Working as a Research Assistant on the SignGPT project, led by the University of Surrey, the University of Oxford, and DCAL.
    • Building foundation models for sign language understanding and translation with multimodal approaches.
  • Ellipsis in Turkish

    Boğaziçi University - May 2022 to Jan 2024
    • Computational modeling of Turkish ellipsis to test the limits of pretrained LLMs in hierarchical linguistic processing.
    • Built retrieval experiments over large corpora and an annotation interface with Flask, MongoDB, and ReactJS.
  • Linguistics-supported Turkish NLP Platform

    Boğaziçi University - Nov 2021 to Dec 2021
    • Annotated Turkish corpora with the Universal Dependencies framework.
    • Automated dependency annotation decisions using morphosyntactic analysis, Python, regex, and UDPipe.

Skills

Engineering stack, research tools, and languages.

  • Programming

    Python, PyTorch, Pandas, NumPy, Scikit-learn, R, HTML/CSS, JavaScript, TypeScript, SQL, MongoDB.

  • Tools

    Linux, Bash, Zsh, Docker, Git, LaTeX, Overleaf, R Markdown, Tableau, Firebase.

  • Research

    NLP, sign language recognition and translation, multimodal learning, LLM fine-tuning, self-supervised learning.

  • Languages

    Turkish (native), English (professional proficiency), Turkish Sign Language (upper-intermediate), German (beginner).