
QUESTION

Background: PROSITE (http://au.expasy.org/prosite/) is a database of protein domains, families and functional sites. Each PROSITE record is often associated with a pattern or profile that describes the protein domain or functional site. Please look at the record PDOC00300 (http://prosite.expasy.org/PDOC00300), which describes a GATA-type zinc finger domain that binds to DNA sites with the consensus sequence (A/T)GATA(A/G). This type of "zinc finger" domain has the consensus sequence C-x2-C-x17-C-x2-C, which means one Cys, any two amino acids, one Cys, any 17 amino acids, one Cys, any two amino acids, and one Cys. Please use this consensus sequence to write the equivalent regular expression pattern.

1) Record the regular expression pattern:
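One possible translation of the consensus sequence, offered as a sketch rather than the official answer: each "x" becomes "." (any character) and each repeat count goes in braces, giving C.{2}C.{17}C.{2}C. The class and method names below are hypothetical, and the test fragment is the third line of the sample sequence shown later in this question. Note that "." does not match line breaks, so the sequence lines must be joined into one string before matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerPattern {
    // C-x2-C-x17-C-x2-C: "x" (any residue) becomes ".", the count goes in braces.
    static final Pattern GATA_ZINC_FINGER = Pattern.compile("C.{2}C.{17}C.{2}C");

    /** Returns the first matching site in the sequence, or null if there is none. */
    static String firstSite(String sequence) {
        Matcher m = GATA_ZINC_FINGER.matcher(sequence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Residues 101-150 of the sample protein (NP_001070736.1) from the question.
        String fragment = "RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK";
        System.out.println(firstSite(fragment)); // prints CSGCLSFSAAVPRTGNTQQKVCKQC
    }
}
```

A stricter variant could replace "." with [A-Z] so that only letters count as residues; that is a design choice, not something the question requires.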

2) Please develop a Java program that uses this regular expression pattern to search a FASTA-format file containing more than 300 protein sequences. All of them contain "zinc finger" in their title lines, but not all of them contain this pattern in their sequences. In the output of the program, print the title line, the position of the pattern in the sequence, and then the sequence itself. Here is the protein sequence file. Please copy and paste it into a text editor and save it as zincFinger.txt:

>gi|289547725|ref|NP_620138.2| zinc finger protein 653 [Homo sapiens]
MAERALEPEAEAEAEAGAGGEAAAEEGAAGRKARGRPRLTESDRARRRLESRKKYDVRRVYLGEAHGPWV
DLRRRSGWSDAKLAAYLISLERGQRSGRHGKPWEQVPKKPKRKKRRRRNVNCLKNVVIWYEDHKHRCPYE
PHLAELDPTFGLYTTAVWQCEAGHRYFQDLHSPLKPLSDSDPDSDKVGNGLVAGSSDSSSSGSASDSEES
PEGQPVKAAAAAAAATPTSPVGSSGLITQEGVHIPFDVHHVESLAEQGTPLCSNPAGNGPEALETVVCVP
VPVQVGAGPSALFENVPQEALGEVVASCPMPGMVPGSQVIIIAGPGYDALTAEGIHLNMAAGSGVPGSGL
GEEVPCAMMEGVAAYTQTEPEGSQPSTMDATAVAGIETKKEKEDLCLLKKEEKEEPVAPELATTVPESAE
PEAEADGEELDGSDMSAIIYEIPKEPEKRRRSKRSRVMDADGLLEMFHCPYEGCSQVYVALSSFQNHVNL
VHRKGKTKVCPHPGCGKKFYLSNHLRRHMIIHSGVREFTCETCGKSFKRKNHLEVHRRTHTGETPLQCEI
CGYQCRQRASLNWHMKKHTAEVQYNFTCDRCGKRFEKLDSVKFHTLKSHPDHKPT

>gi|289547716|ref|NP_689492.3| zinc finger protein 585B [Homo sapiens]
MPASWTSPQKSSALAPEDHGSSYEGSVSFRDVAIDFSREEWRHLDLSQRNLYRDVMLETYSHLLSVGYQV
PKPEVVMLEQGKEPWALQGERPRHSCPGEKLWDHNQHRKIIGYKPASSQDQKIYSGEKSYECAEFGKSFT
WKSQFKVHLKVPTGEKLYVCIECGRAFVQKPEFITHQKTHMREKPYKCNECGKSFFQVSSLFRHHRIHTG
EKLYECSECGKGFPYNSDLSIHEKIHTGERHHECTDCGKAFTQKSTLKIHQKIHTGERSYICIECGQAFI
QKTQLIAHRRIHSGEKPYECNNCGKSFISKSQLQVHQRVHTRVKPYICTEYGKVFSNNSNLITHEKIQSR
EKSSICTECGKAFTYRSELIIHQRIHTGEKPYECSDCGRAFTQKSALTVHQRIHTGEKSYICMKCGLAFI
RKAHLITHQIIHTGEKPYKCGHCGKLFTSKSQLHVHKRIHTGEKPYVCNKCGKAFTNRSNLITHQKTHTG
EKSYICSKCGKAFTQRSDLITHQRIHTGEKPYECNTCGKAFTQKSNLNIHQKIHTGERQYECHECGKAFN
QKSILIVHQKIHTGEKPYVCTECGRAFIRKSNFITHQRIHTGEKPYECSDCGKSFTSKSQLLVHQPVHTG
EKPYVCAECGKAFSGRSNLSKHQKTHTGEKPYICSECGKTFRQKSELITHHRIHTGEKPYECSDCGKSFT
KKSQLQVHQRIHTGEKPYVCAECGKAFSNRSNLNKHQTTHTGDKPYKCGICGKGFVQKSVFSVHQSSHA
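Reading the file could be sketched as follows. This is a minimal example, assuming each title line starts with ">" and sits on its own line, the sequence follows on one or more lines, and titles are unique (the gi numbers suggest they are). FastaReader and parse are hypothetical names; the file name zincFinger.txt comes from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FastaReader {
    /** Maps each title line to its sequence, joined into one string, in file order. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, StringBuilder> building = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines separate records in the pasted file
            }
            if (line.startsWith(">")) {
                current = new StringBuilder();
                building.put(line, current); // assumption: titles are unique
            } else if (current != null) {
                current.append(line); // sequence lines are concatenated
            }
        }
        Map<String, String> records = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : building.entrySet()) {
            records.put(e.getKey(), e.getValue().toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "zincFinger.txt";
        Map<String, String> records = parse(Files.readAllLines(Paths.get(file)));
        for (Map.Entry<String, String> e : records.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().length() + " residues)");
        }
    }
}
```

Joining the sequence lines before searching matters because a zinc finger site can span a line break in the file.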

Below is part of a sample output of the program:

>gi|116268103|ref|NP_001070736.1| zinc finger FYVE domain-containing protein 19 [Homo sapiens]

contains the zinc finger site: CSGCLSFSAAVPRTGNTQQKVCKQC

at locations:

103 128

MNYDSQQPPLPPLPYAGCRRASGFPALGRGGTVPVGVWGGAGQGREGRSW

GEGPRGPGLGRRDLSSADPAVLGATMESRCYGCAVKFTLFKKEYGCKNCG

RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK

KRVAALEAKQKPSTSQSQGLTRQDQMIAERLARLRQENKPKLVPSQAEIE

ARLAALKDERQGSIPSTQEMEARLAALQGRVLPSQTPQPAHHTPDTRTQA

QQTQDLLTQLAAEVAIDESWKGGGPAASLQNDLNQGGPGSTNSKRQANWS

LEEEKSRLLAEAALELREENTRQERILALAKRLAMLRGQDPERVTLQDYR

LPDSDDDEDEETAIQRVLQQLTEEASLDEASGFNIPAEQASRPWTQPRGA

EPEAQDVDPRPEAEEEELPWCCICNEDATLRCAGCDGDLFCARCFREGHD

AFELKEHQTSAYSPPRAGQEH
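The sample output above could be produced along these lines. This is a sketch under a few assumptions: the two numbers are taken to be Matcher.start() (0-based) and Matcher.end() (exclusive), which agrees with 103 and 128 for this protein; the sequence is wrapped at 50 residues per line as in the sample; and records without a site are skipped. ZincFingerSearch and report are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerSearch {
    static final Pattern SITE = Pattern.compile("C.{2}C.{17}C.{2}C");
    static final int LINE_WIDTH = 50; // line width used in the sample output

    /** Builds the report for one record, or returns null when the sequence has no site. */
    static String report(String title, String sequence) {
        Matcher m = SITE.matcher(sequence);
        if (!m.find()) {
            return null;
        }
        StringBuilder out = new StringBuilder(title).append('\n');
        do {
            // One block per non-overlapping match, in the order the matcher finds them.
            out.append("contains the zinc finger site: ").append(m.group()).append('\n');
            out.append("at locations:\n");
            out.append(m.start()).append(' ').append(m.end()).append('\n');
        } while (m.find());
        // Print the sequence itself, wrapped at LINE_WIDTH residues per line.
        for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
            out.append(sequence, i, Math.min(i + LINE_WIDTH, sequence.length())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical short record: ten residues followed by residues 101-150 of the sample.
        String seq = "MNYDSQQPPL" + "RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK";
        System.out.print(report(">hypothetical test record", seq));
    }
}
```

In the full program, report would be called once per FASTA record and its result printed whenever it is not null.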

3) Please modify the printout of the sequence above to include a label underneath the zinc finger consensus sequence. See the sample output below:

>gi|116268103|ref|NP_001070736.1| zinc finger FYVE domain-containing protein 19 [Homo sapiens]

contains the zinc finger site: CSGCLSFSAAVPRTGNTQQKVCKQC

at locations:

103 128

MNYDSQQPPLPPLPYAGCRRASGFPALGRGGTVPVGVWGGAGQGREGRSW

GEGPRGPGLGRRDLSSADPAVLGATMESRCYGCAVKFTLFKKEYGCKNCG

RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK

   *************************

KRVAALEAKQKPSTSQSQGLTRQDQMIAERLARLRQENKPKLVPSQAEIE

ARLAALKDERQGSIPSTQEMEARLAALQGRVLPSQTPQPAHHTPDTRTQA

QQTQDLLTQLAAEVAIDESWKGGGPAASLQNDLNQGGPGSTNSKRQANWS

LEEEKSRLLAEAALELREENTRQERILALAKRLAMLRGQDPERVTLQDYR

LPDSDDDEDEETAIQRVLQQLTEEASLDEASGFNIPAEQASRPWTQPRGA

EPEAQDVDPRPEAEEEELPWCCICNEDATLRCAGCDGDLFCARCFREGHD

AFELKEHQTSAYSPPRAGQEH
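The label line could be generated as sketched below: after each wrapped sequence line, any part of the site that falls on that line is marked with asterisks, padded on the left with spaces. The names ZincFingerLabel and annotate are hypothetical. The sketch assumes start and end follow the Matcher.start() and Matcher.end() convention, omits trailing spaces after the asterisks, and also handles a site that runs across a line break, a case the sample output does not show.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerLabel {
    /** Wraps seq at width residues per line and marks [start, end) with '*' underneath. */
    static String annotate(String seq, int start, int end, int width) {
        StringBuilder out = new StringBuilder();
        for (int lineStart = 0; lineStart < seq.length(); lineStart += width) {
            int lineEnd = Math.min(lineStart + width, seq.length());
            out.append(seq, lineStart, lineEnd).append('\n');
            // Part of the site that overlaps this line, if any.
            int from = Math.max(start, lineStart);
            int to = Math.min(end, lineEnd);
            if (from < to) {
                for (int i = lineStart; i < to; i++) {
                    out.append(i < from ? ' ' : '*');
                }
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Residues 101-150 of the sample protein from the question.
        String fragment = "RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK";
        Matcher m = Pattern.compile("C.{2}C.{17}C.{2}C").matcher(fragment);
        if (m.find()) {
            System.out.print(annotate(fragment, m.start(), m.end(), 50));
        }
    }
}
```

For the fragment in main, the site starts at offset 3 within the line, so the label line is three spaces followed by 25 asterisks.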
