Senior Design Projects

ECS193 A/B Winter & Spring 2021

Software and algorithm development for low complexity protein sequence identification and characterization from genomic databases.

Email **********
Name Dylan Murray
Affiliation Department of Chemistry UC Davis

Project details

Project title Software and algorithm development for low complexity protein sequence identification and characterization from genomic databases.
Background Low complexity sequences are present in 30% of the proteins encoded by the human genome. These pseudo-degenerate sequences are biased toward a subset of the twenty naturally occurring amino acids that make up proteins. Within this class of proteins, the specific compositional biases vary significantly from one protein to another. Proteins containing low complexity sequences have become a major focus of modern biological research because of their ability to promote self-assembly processes in living organisms. It is not yet known which characteristics of these sequences give rise to this behavior.
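The idea that a low complexity sequence is "biased toward a subset" of the twenty amino acids, and that the bias differs between proteins, can be made concrete by looking at residue composition. The Python sketch below is a minimal illustration under stated assumptions, not a method taken from this proposal: the function `dominant_residues`, its `coverage` parameter, and the example sequences are all hypothetical.

```python
from collections import Counter


def dominant_residues(seq: str, coverage: float = 0.8) -> list[str]:
    """Return the smallest list of residues, most frequent first, whose combined
    frequency in `seq` reaches at least `coverage` (a fraction in (0, 1]).

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    if not seq:
        return []
    counts = Counter(seq)
    chosen: list[str] = []
    covered = 0
    # Counter.most_common() lists residues by descending count
    # (ties keep first-seen order), so the greedy pick is minimal.
    for residue, count in counts.most_common():
        chosen.append(residue)
        covered += count
        if covered / len(seq) >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical sequences with different compositional biases.
    print(dominant_residues("GGYGSSGGYGSSGQGGYGQS"))  # ['G', 'S', 'Y']
    print(dominant_residues("PQQPQQAPQQPQ"))          # ['Q', 'P']
```

Both example sequences are dominated by two or three residues, but not by the same ones, which mirrors the variation in individual biases described above. A sequence using all twenty residues evenly would need sixteen of them to reach 80% coverage.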
Description Motivation

Scientists in advanced research laboratories around the world are studying the self-assembly behavior of these proteins for biomedical and agricultural applications. Experimental efforts are throughput-limited and stand to benefit from Big Data-driven experimental design. The project aims to accelerate experimental discovery in areas such as human disease and biotechnology by facilitating the mining of genomic data from humans, animals, plants, and bacteria.

Project Description

The ultimate goal of the project is to develop and implement an open source software tool that collects, from genomic databases, low complexity protein sequences sharing common features specified by user-adjustable parameters. The software development team will work closely with a team of experimental scientists on the specifics of the software design. Implementation will proceed in three stages, with regular feedback from and interaction with the experimental team.
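The proposal leaves open which features users will be able to select on. As one hypothetical example of "common features specified by user-adjustable parameters", the sketch below reads FASTA-formatted text and keeps sequences that meet a minimum length and in which a chosen residue subset makes up at least a given fraction of the sequence. `FilterParams`, its fields and default values, the helper functions, and the example records are illustrative assumptions, not part of the project specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FilterParams:
    """Hypothetical user-adjustable selection criteria (illustrative defaults)."""
    residues: str = "GSYQ"     # residue subset of interest
    min_fraction: float = 0.5  # minimum combined fraction of those residues
    min_length: int = 10       # skip sequences shorter than this


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first
    word after '>' and multi-line sequences are joined."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def matches(seq: str, params: FilterParams) -> bool:
    """True if `seq` is long enough and enriched in the chosen residues."""
    if len(seq) < params.min_length:
        return False
    counts = Counter(seq)
    hits = sum(counts[r] for r in set(params.residues.upper()))
    return hits / len(seq) >= params.min_fraction


def select(text: str, params: FilterParams) -> list[str]:
    """Return identifiers of FASTA records that satisfy `params`."""
    return [name for name, seq in parse_fasta(text).items() if matches(seq, params)]


# Hypothetical example records (not taken from any database).
EXAMPLE_FASTA = """\
>seqA hypothetical G/S/Y-rich
GGYGSSGGYG
SSGQGGYGQS
>seqB hypothetical diverse
ACDEFGHIKLMNPQRSTVWY
>seqC too short
GGSG
"""

if __name__ == "__main__":
    print(select(EXAMPLE_FASTA, FilterParams()))  # ['seqA']
    print(select(EXAMPLE_FASTA, FilterParams(residues="W", min_fraction=0.04)))  # ['seqB']
```

Grouping the criteria into one parameter object is one possible way to let a standalone tool and a web interface share the same selection logic; the actual criteria would be settled with the experimental team.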

■ Stage One: Optimize a protocol that uses an existing algorithm to identify low complexity sequences in databases of known genomes.
■ Stage Two: Design algorithms to detect characteristic features in low complexity sequences.
■ Stage Three: Implement a software interface for use by scientists around the world.
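The proposal does not name the existing algorithm to be used in Stage One. Widely used tools such as SEG flag low complexity segments by scoring compositional complexity in sliding windows along a sequence. The sketch below is a deliberately simplified stand-in for that idea, not SEG itself: it computes the Shannon entropy of each window and reports merged regions where the entropy is at or below a threshold. The function names, the `window` and `max_entropy` parameters, and their default values are hypothetical.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of one window."""
    n = len(window)
    counts = Counter(window)
    # Sum of p * log2(1/p), written as log2(n/c) so the result is never -0.0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_regions(
    seq: str, window: int = 12, max_entropy: float = 2.2
) -> list[tuple[int, int]]:
    """Flag sliding windows whose entropy is at or below `max_entropy` and merge
    overlapping flagged windows into half-open (start, end) regions.

    Simplified illustration only; this is NOT the SEG algorithm, and the default
    `window` and `max_entropy` values are arbitrary, hypothetical settings.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    regions: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical test sequence: a diverse flank, a poly-Q tract, a diverse flank.
    diverse = "ACDEFGHIKLMNPQRSTVWY"
    print(low_complexity_regions(diverse + "Q" * 20 + diverse))
```

A window of twelve distinct residues scores log2(12), about 3.58 bits, while a poly-Q window scores 0, so the reported region covers the poly-Q tract plus whatever flanking residues fall in windows that are still below the threshold. Tuning `window` and `max_entropy` against sequences the experimental team already knows is one way the Stage One protocol could be optimized.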
Deliverable A software package for use on standalone workstations or through a web interface.
Skill set desirable N/A
Phone number **********
Client time availability 30-60 min weekly or more
IP requirement Open source project
Selected No
Team members N/A
TA N/A