QUESTION

The genome of an organism can be expressed as some number G of "base pairs" (see http://en.wikipedia.org/wiki/Base_pair). Typical sizes of various genomes are given in http://en.wikipedia.org/wiki/Genome. String matching can be used to find particular sequences in a genome. Several string matching algorithms are described in http://en.wikipedia.org/wiki/String_matching.

Consider a program to find whether a particular sequence of base pairs occurs in a genome and, if so, where and how many times.

Your program will run on a cluster with the following properties:

Number of nodes: 20
Processors per node: 16 (2.6 GHz Xeon)
Memory per node: 16 GB
GPUs per node: 2 (NVIDIA CUDA), 1024 stream processors and 4 GB RAM, running at 1.5 GHz
Local drives: 1 TB SATA, 6 GB/sec
NFS drive: 10 TB RAID, bandwidth limited by network
Network: switched Ethernet
Latency: L = 20 microseconds
Bandwidth: B = 1 Gb/sec (about 100 Mbytes/sec) for messages larger than 32 Kbytes

You may not need all of the above information. If you feel you need some other system property, feel free to assume a reasonable value (try Wikipedia).

Assume the genome you are exploring and the sequence you are trying to find are both initially files on the NFS disk.

Deliverables:

1. Parallel string match algorithm in MPI, OpenMP, CUDA, or some combination of these. A description in English and/or pseudocode is sufficient. Is your algorithm data parallel, task parallel, or both? Describe data transfer during computation (disk to program, process to process, CPU to GPU, and node to node). Describe how data is partitioned between processes, shared between processes, or replicated at each process.

2. You may not need all the hardware available for your algorithm. You may use the entire cluster or any part of it. Describe what resources your algorithm will use to execute. Explain your choice.

3. Estimate how your algorithm would perform on the computer system described above. Consider:
    a. Complexity; communication costs.
    b. Is there some file size (in bytes, number of elements, or both) that is too small for your algorithm to work efficiently? Given the wide range of genome sizes (see http://en.wikipedia.org/wiki/Genome), is there some range of sizes that you expect would be best for your algorithm?
    c. How much speedup would you expect on the given hardware compared to running on a single CPU? Justify your answer.
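
As a starting point for deliverables 1 and 3a, here is a minimal sketch of one possible data-parallel partitioning, written as serial Python so it can be run anywhere. It is not a full answer and not the only approach. Each worker owns a contiguous range of start positions and reads m - 1 extra bases past its range (m is the pattern length), so a match that straddles a chunk boundary is found exactly once. The function names (chunk_bounds, scan_chunk, parallel_match, transfer_seconds) are hypothetical, the loop over workers stands in for MPI ranks or GPU thread blocks, and the cost model T = L + n/B is an assumption that only uses the latency and bandwidth figures quoted in the question.

```python
from typing import List, Tuple

LATENCY_S = 20e-6       # L = 20 microseconds, taken from the question
BANDWIDTH_BPS = 100e6   # B = 100 Mbytes/sec, taken from the question


def chunk_bounds(n: int, workers: int, m: int) -> List[Tuple[int, int, int]]:
    """Split the valid start positions 0..n-m among the workers.

    Returns (lo, hi, read_end) per worker: the worker owns start
    positions lo..hi-1 and must read genome[lo:read_end], which
    includes an overlap of m - 1 bases past hi.
    """
    if m < 1 or workers < 1:
        raise ValueError("pattern length and worker count must be >= 1")
    num_starts = max(n - m + 1, 0)
    bounds = []
    for w in range(workers):
        lo = w * num_starts // workers
        hi = (w + 1) * num_starts // workers
        bounds.append((lo, hi, min(hi + m - 1, n)))
    return bounds


def scan_chunk(genome: str, pattern: str, lo: int, read_end: int) -> List[int]:
    """Find every (possibly overlapping) match inside one worker's slice."""
    local = genome[lo:read_end]
    hits = []
    pos = local.find(pattern)
    while pos != -1:
        hits.append(lo + pos)               # convert to a global position
        pos = local.find(pattern, pos + 1)  # step by 1 to allow overlaps
    return hits


def parallel_match(genome: str, pattern: str, workers: int) -> List[int]:
    """Serial stand-in for the parallel run: scan each chunk, then gather."""
    hits: List[int] = []
    for lo, _hi, read_end in chunk_bounds(len(genome), workers, len(pattern)):
        hits.extend(scan_chunk(genome, pattern, lo, read_end))
    return hits


def transfer_seconds(nbytes: float) -> float:
    """Assumed simple cost model for one large message: T = L + n / B."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


if __name__ == "__main__":
    positions = parallel_match("ACGTACGTAC", "ACGT", workers=3)
    print(positions, len(positions))
    print(transfer_seconds(100e6))
```

In this layout the pattern is small and would be replicated at every worker, while the genome is partitioned. The cost model suggests that for large genomes the time to pull data off the NFS drive over the network, rather than the matching itself, may be the term to examine first when estimating speedup.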