Sandra and Brage successfully defended their Master's thesis!


Date
May 7, 2023 1:00 PM — 3:00 PM
Location
Chalmers University of Technology, Sweden

Abstract: The annotation of CRISPR-related articles and extraction of key content has traditionally relied on manual efforts. Manual annotation is error-prone and time-consuming. This thesis presents an alternative approach using transfer learning and pre-trained models based on the Transformer architecture. Specifically, Sentence Transformer models are fine-tuned using a CRISPR-related dataset. The dataset contains articles and key sentences, enabling automatic extraction of keyphrases. The study explores various modifications to the models and data to enhance performance for this task. The results demonstrate the effectiveness of fine-tuning Sentence Transformer models for keyphrase extraction, achieving an Average R-precision of 90.4%. Future research could focus on alternative approaches or further automation to identify entities and relations within key sentences. Key sentence extraction is complex due to the varying definitions of key content, content location, and specific use cases. However, the potential benefits of time savings and improved workflow efficiency make this approach highly valuable.
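For readers unfamiliar with the metric reported in the abstract, here is a minimal sketch of how an average R-precision is commonly computed. It assumes the standard information-retrieval definition: for an article with R annotated key sentences, R-precision is the fraction of the model's top R ranked sentences that are annotated as key. The thesis may define or aggregate the metric differently, and the function names and toy data below are hypothetical.

```python
from typing import Sequence, Set


def r_precision(ranked_ids: Sequence[int], relevant_ids: Set[int]) -> float:
    """Fraction of the top-R ranked items that are relevant, where R = number of relevant items."""
    r = len(relevant_ids)
    if r == 0:
        # Assumption: an article without annotated key sentences scores 0.
        return 0.0
    top_r = ranked_ids[:r]
    hits = sum(1 for sentence_id in top_r if sentence_id in relevant_ids)
    return hits / r


def average_r_precision(
    rankings: Sequence[Sequence[int]], relevants: Sequence[Set[int]]
) -> float:
    """Mean R-precision over several articles (one ranking and one gold set per article)."""
    scores = [r_precision(rk, rel) for rk, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: sentence indices ranked by a model, and gold key-sentence indices.
rankings = [[3, 1, 2, 0], [0, 1, 2, 3]]
relevants = [{1, 3}, {1, 3}]
score = average_r_precision(rankings, relevants)
```

In this toy example the first article has both key sentences in its top two positions and the second has only one, so the two per-article scores are averaged.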

Read the full thesis here.

Kudos 👏🏻 to Sandra and Brage for their excellent Master's thesis work. The model they developed is now used in AddCell’s CRISPR search engine, where it greatly improves the search results. The thesis was conducted as a collaboration between the life science and computer science departments at Chalmers University of Technology, under the supervision of Mehrdad Farahani and Rasool Saghaleyni. Special thanks to Richard Johansson for his valuable feedback and support.

Rasool Saghaleyni
Staff Research Scientist

My research interests include multi-omics integration, metabolic modeling, genome editing and structural biology.