Background: PROSITE (http://au.expasy.org/prosite/) is a database of protein domains, families and functional sites. Each PROSITE record is often associated with a pattern or profile that describes the protein domain or functional site. Please look at the record PDOC00300 (http://prosite.expasy.org/PDOC00300), which describes the GATA-type zinc finger domain that binds to DNA sites with the consensus sequence (A/T)GATA(A/G). Zinc finger domains of this type contain the consensus sequence C-x2-C-x17-C-x2-C, which means one Cys, any two amino acids, one Cys, any 17 amino acids, one Cys, any two amino acids, and one Cys. Please use this consensus sequence to write the equivalent regular expression pattern.
1) Record the regular expression pattern:
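One way to translate the consensus into a Java regular expression, offered as a sketch: each "x" (any amino acid) becomes `.`, and the counts x2 and x17 become the quantifiers `{2}` and `{17}`. `.` matches any character except a line terminator, so this assumes each sequence is searched as one joined string; `[A-Z]` would be a stricter alternative. The class name `ZincFingerPattern` is hypothetical.

```java
import java.util.regex.Pattern;

public class ZincFingerPattern {
    // C-x2-C-x17-C-x2-C: "x" becomes ".", "x2" becomes ".{2}", "x17" becomes ".{17}".
    public static final String REGEX = "C.{2}C.{17}C.{2}C";
    public static final Pattern PATTERN = Pattern.compile(REGEX);

    public static void main(String[] args) {
        // The 25-residue site from the sample output should match the whole pattern.
        String site = "CSGCLSFSAAVPRTGNTQQKVCKQC";
        System.out.println(PATTERN.matcher(site).matches()); // true
    }
}
```

Any match is exactly 25 residues long (1 + 2 + 1 + 17 + 1 + 2 + 1).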
2) Please develop a Java program that uses this regular expression pattern to search through a FASTA format file containing more than 300 protein sequences, all of which contain "zinc finger" in their title lines, although not all of them contain this pattern in their sequences. The program should print the title line, the position of the pattern in the sequence, and then the sequence itself. Here is the protein sequence file; copy and paste it into a text editor and save it as zincFinger.txt:
>gi|289547725|ref|NP_620138.2| zinc finger protein 653 [Homo sapiens]
MAERALEPEAEAEAEAGAGGEAAAEEGAAGRKARGRPRLTESDRARRRLESRKKYDVRRVYLGEAHGPWV
DLRRRSGWSDAKLAAYLISLERGQRSGRHGKPWEQVPKKPKRKKRRRRNVNCLKNVVIWYEDHKHRCPYE
PHLAELDPTFGLYTTAVWQCEAGHRYFQDLHSPLKPLSDSDPDSDKVGNGLVAGSSDSSSSGSASDSEES
PEGQPVKAAAAAAAATPTSPVGSSGLITQEGVHIPFDVHHVESLAEQGTPLCSNPAGNGPEALETVVCVP
VPVQVGAGPSALFENVPQEALGEVVASCPMPGMVPGSQVIIIAGPGYDALTAEGIHLNMAAGSGVPGSGL
GEEVPCAMMEGVAAYTQTEPEGSQPSTMDATAVAGIETKKEKEDLCLLKKEEKEEPVAPELATTVPESAE
PEAEADGEELDGSDMSAIIYEIPKEPEKRRRSKRSRVMDADGLLEMFHCPYEGCSQVYVALSSFQNHVNL
VHRKGKTKVCPHPGCGKKFYLSNHLRRHMIIHSGVREFTCETCGKSFKRKNHLEVHRRTHTGETPLQCEI
CGYQCRQRASLNWHMKKHTAEVQYNFTCDRCGKRFEKLDSVKFHTLKSHPDHKPT
>gi|289547716|ref|NP_689492.3| zinc finger protein 585B [Homo sapiens]
MPASWTSPQKSSALAPEDHGSSYEGSVSFRDVAIDFSREEWRHLDLSQRNLYRDVMLETYSHLLSVGYQV
PKPEVVMLEQGKEPWALQGERPRHSCPGEKLWDHNQHRKIIGYKPASSQDQKIYSGEKSYECAEFGKSFT
WKSQFKVHLKVPTGEKLYVCIECGRAFVQKPEFITHQKTHMREKPYKCNECGKSFFQVSSLFRHHRIHTG
EKLYECSECGKGFPYNSDLSIHEKIHTGERHHECTDCGKAFTQKSTLKIHQKIHTGERSYICIECGQAFI
QKTQLIAHRRIHSGEKPYECNNCGKSFISKSQLQVHQRVHTRVKPYICTEYGKVFSNNSNLITHEKIQSR
EKSSICTECGKAFTYRSELIIHQRIHTGEKPYECSDCGRAFTQKSALTVHQRIHTGEKSYICMKCGLAFI
RKAHLITHQIIHTGEKPYKCGHCGKLFTSKSQLHVHKRIHTGEKPYVCNKCGKAFTNRSNLITHQKTHTG
EKSYICSKCGKAFTQRSDLITHQRIHTGEKPYECNTCGKAFTQKSNLNIHQKIHTGERQYECHECGKAFN
QKSILIVHQKIHTGEKPYVCTECGRAFIRKSNFITHQRIHTGEKPYECSDCGKSFTSKSQLLVHQPVHTG
EKPYVCAECGKAFSGRSNLSKHQKTHTGEKPYICSECGKTFRQKSELITHHRIHTGEKPYECSDCGKSFT
KKSQLQVHQRIHTGEKPYVCAECGKAFSNRSNLNKHQTTHTGDKPYKCGICGKGFVQKSVFSVHQSSHA
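The file is in FASTA format: a line starting with `>` is a title line, and the lines after it, up to the next `>`, hold that protein's sequence. A minimal reader, sketched under the assumption that every record starts with `>` and that sequence lines should be joined with all whitespace removed (so a line break never splits a motif). `FastaReader`, `Entry` and `parse` are hypothetical names, not BioJava's class of the same name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FastaReader {
    /** One FASTA entry: the title line (including '>') and the joined sequence. */
    public static class Entry {
        public final String title;
        public final String sequence;

        Entry(String title, String sequence) {
            this.title = title;
            this.sequence = sequence;
        }
    }

    /** Parses FASTA text: a '>' line opens a new entry, later lines extend its sequence. */
    public static List<Entry> parse(Reader source) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String title = null;
        StringBuilder seq = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.startsWith(">")) {
                    if (title != null) {
                        entries.add(new Entry(title, seq.toString())); // close previous entry
                    }
                    title = line;
                    seq.setLength(0);
                } else if (title != null) {
                    seq.append(line.replaceAll("\\s+", "")); // drop embedded whitespace
                }
            }
        }
        if (title != null) {
            entries.add(new Entry(title, seq.toString())); // close the last entry
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "zincFinger.txt";
        for (Entry e : parse(Files.newBufferedReader(Paths.get(file)))) {
            System.out.println(e.title + " (" + e.sequence.length() + " residues)");
        }
    }
}
```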
Below is part of a sample output of the program:
>gi|116268103|ref|NP_001070736.1| zinc finger FYVE domain-containing protein 19 [Homo sapiens]
contains the zinc finger site: CSGCLSFSAAVPRTGNTQQKVCKQC
at locations:
103 128
MNYDSQQPPLPPLPYAGCRRASGFPALGRGGTVPVGVWGGAGQGREGRSW
GEGPRGPGLGRRDLSSADPAVLGATMESRCYGCAVKFTLFKKEYGCKNCG
RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK
KRVAALEAKQKPSTSQSQGLTRQDQMIAERLARLRQENKPKLVPSQAEIE
ARLAALKDERQGSIPSTQEMEARLAALQGRVLPSQTPQPAHHTPDTRTQA
QQTQDLLTQLAAEVAIDESWKGGGPAASLQNDLNQGGPGSTNSKRQANWS
LEEEKSRLLAEAALELREENTRQERILALAKRLAMLRGQDPERVTLQDYR
LPDSDDDEDEETAIQRVLQQLTEEASLDEASGFNIPAEQASRPWTQPRGA
EPEAQDVDPRPEAEEEELPWCCICNEDATLRCAGCDGDLFCARCFREGHD
AFELKEHQTSAYSPPRAGQEH
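A minimal sketch of part 2, assuming the whole file fits in memory and that only entries containing the motif are printed (the sample shows a matching entry only). `ZincFingerSearch`, `report` and `LINE_WIDTH` are hypothetical names. For the FYVE sequence in the sample, `Matcher.start()` returns 103 and `Matcher.end()` returns 128, which are the numbers printed under "at locations:". Both are 0-based and `end()` is exclusive, so the site occupies 1-based positions 104 to 128.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerSearch {
    // C-x2-C-x17-C-x2-C written as a Java regular expression.
    static final Pattern ZINC_FINGER = Pattern.compile("C.{2}C.{17}C.{2}C");
    // Residues per printed line, matching the layout of the sample output.
    static final int LINE_WIDTH = 50;

    /** Builds the report for one entry, or returns null if the sequence lacks the motif. */
    public static String report(String title, String sequence) {
        Matcher m = ZINC_FINGER.matcher(sequence);
        if (!m.find()) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        out.append(title).append('\n');
        out.append("contains the zinc finger site: ").append(m.group()).append('\n');
        out.append("at locations:\n");
        // start() is the 0-based index of the first residue; end() is one past the last.
        out.append(m.start()).append(' ').append(m.end()).append('\n');
        for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
            out.append(sequence, i, Math.min(i + LINE_WIDTH, sequence.length())).append('\n');
        }
        return out.toString();
    }

    // Prints the report for a finished entry if it contains the motif.
    private static void emit(String title, StringBuilder seq) {
        if (title == null) {
            return;
        }
        String r = report(title, seq.toString());
        if (r != null) {
            System.out.print(r);
        }
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "zincFinger.txt";
        String title = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : Files.readAllLines(Paths.get(file))) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                emit(title, seq); // flush the previous entry
                title = line;
                seq.setLength(0);
            } else {
                seq.append(line);
            }
        }
        emit(title, seq); // flush the last entry
    }
}
```

This reports only the first (leftmost) site; wrapping the `find()` call in a `while` loop would report every non-overlapping site instead.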
3) Please modify the printout of the sequence above to include a label (a row of asterisks) underneath the zinc finger site that matches the consensus sequence. A sample output is shown below:
>gi|116268103|ref|NP_001070736.1| zinc finger FYVE domain-containing protein 19 [Homo sapiens]
contains the zinc finger site: CSGCLSFSAAVPRTGNTQQKVCKQC
at locations:
103 128
MNYDSQQPPLPPLPYAGCRRASGFPALGRGGTVPVGVWGGAGQGREGRSW
GEGPRGPGLGRRDLSSADPAVLGATMESRCYGCAVKFTLFKKEYGCKNCG
RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK
*************************
KRVAALEAKQKPSTSQSQGLTRQDQMIAERLARLRQENKPKLVPSQAEIE
ARLAALKDERQGSIPSTQEMEARLAALQGRVLPSQTPQPAHHTPDTRTQA
QQTQDLLTQLAAEVAIDESWKGGGPAASLQNDLNQGGPGSTNSKRQANWS
LEEEKSRLLAEAALELREENTRQERILALAKRLAMLRGQDPERVTLQDYR
LPDSDDDEDEETAIQRVLQQLTEEASLDEASGFNIPAEQASRPWTQPRGA
EPEAQDVDPRPEAEEEELPWCCICNEDATLRCAGCDGDLFCARCFREGHD
AFELKEHQTSAYSPPRAGQEH
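For part 3, one possible approach: for each 50-residue output line, compute the overlap between that line and the match span [start, end), then print spaces up to the overlap and one `*` per matched residue. In the sample above the asterisks appear flush left; this sketch assumes the intent is to align them under the matched residues, and it also handles a site that wraps across two lines. `ZincFingerLabel` and `labeledSequence` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerLabel {
    static final Pattern ZINC_FINGER = Pattern.compile("C.{2}C.{17}C.{2}C");
    static final int LINE_WIDTH = 50; // residues per printed line

    /** Wraps the sequence and adds a '*' label line under the residues of the first site. */
    public static String labeledSequence(String sequence) {
        Matcher m = ZINC_FINGER.matcher(sequence);
        int start = -1;
        int end = -1;
        if (m.find()) {
            start = m.start();
            end = m.end();
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
            int lineEnd = Math.min(i + LINE_WIDTH, sequence.length());
            out.append(sequence, i, lineEnd).append('\n');
            // Overlap of the site [start, end) with this line [i, lineEnd); empty if none.
            int from = Math.max(start, i);
            int to = Math.min(end, lineEnd);
            if (from < to) {
                for (int k = i; k < from; k++) {
                    out.append(' '); // indent to the first matched residue on this line
                }
                for (int k = from; k < to; k++) {
                    out.append('*'); // one asterisk per matched residue
                }
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "MNYDSQQPPLPPLPYAGCRRASGFPALGRGGTVPVGVWGGAGQGREGRSW"
                + "GEGPRGPGLGRRDLSSADPAVLGATMESRCYGCAVKFTLFKKEYGCKNCG"
                + "RAFCSGCLSFSAAVPRTGNTQQKVCKQCHEVLTRGSSANASKWSPPQNYK";
        System.out.print(labeledSequence(seq));
    }
}
```

In the part 2 program, this method could replace the plain 50-residue wrapping loop.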