Protein Design for Fibroblast Growth Factor-1 (FGF)
Principal Investigator
Project Summary
The goal of the project is to design and build a thermostable variant of human Fibroblast Growth Factor-1 (FGF). FGF is an angiogenic factor: its presence induces the body to grow new arteries, capillaries, and veins. Phase II clinical trials are underway to test FGF as a treatment for coronary heart disease (CHD). In this treatment, FGF is injected directly into the heart muscle, where it causes new arteries and capillaries to form and so improves blood flow to the heart muscle. This treatment is likely to supplant current treatments for CHD such as bypass surgery and angioplasty.
The difficulty is that FGF is only marginally thermostable. At physiological temperatures, FGF converts from its active folded state to an inactive, aggregation-prone unfolded form. As a result, FGF is expensive to produce, difficult to store for long periods, and less potent than it could be. To increase its stability, additives such as heparin are added to the FGF solution. Heparin is a sulfated polysaccharide extracted from pig and cow intestines. It has a number of potential side effects in humans and creates a route for infectious agents to enter the body. Recently, a contaminated batch of heparin was linked to numerous preventable deaths in US hospitals.
An alternative route is to design and build a modified version of FGF that is active, non-immunogenic, and more thermally stable. This can be done by designing a new amino-acid sequence that has the same fold and the same external surface as FGF but greater stability. The amino acids in the core (those that are buried rather than on the surface in the folded form) will be altered to stabilize the protein. The difficulty is that there are many possible core sequences. FGF has approximately 12 primary core positions. If any one of the 20 amino acids can be placed at each of those positions, there are 20^12 (about 4 × 10^15) possible sequences. If the positions that flank the core are included, the total number of designable sites is closer to 30, and the number of possible sequences grows to about 20^30 (about 10^39).
Nutpack is a Monte Carlo code designed to find the optimal or a near-optimal sequence among these 20^12 or 20^30 possibilities. It works by trying different sequences, and different side-chain packing arrangements for those sequences, and selecting the sequence that scores best under a physics-based energy function. The predicted sequences will be constructed by a collaborator (Dr. Michael Blaber, College of Medicine, FSU), and the measured properties of the resulting proteins will be compared with Nutpack's predictions. We plan to decide which sequences to build by the end of the summer at the latest. Construction of these mutant FGFs will start immediately afterward.
Software
The program is currently called Nutpack. It is written in ANSI C++ using only the standard C++ library plus MPI. So far, portability has not caused us any problems. The executable itself is under 2 MB; during execution, however, it currently allocates about 2 GB of heap memory. This large footprint is a deliberate tradeoff of memory for speed: some quantities are stored rather than recomputed at every Monte Carlo step. I am currently switching back to a less memory-intensive version, which should run about 5x slower but allocate less than 200 MB of dynamic memory.
The I/O is relatively simple. At start-up, the head node reads four files:
- input.txt: a text file of commands to execute.
- designFile.txt: a small text file listing the amino-acid positions to modify and the set of amino acids allowed at each position.
- start.pdb: a small structure file containing the starting structure of the system.
- DunbrackMod.txt: a 50 MB library of side-chain conformations (rotamers).
The contents of these files are then passed from the head node to all other nodes. Next come the Monte Carlo calculations, which account for most of the run time. Apart from stdout and stderr, the only output is written at the end of the run: about 1000 text structure files, p.0000.pdb through p.1000.pdb, all written by the head node.
Project Resources
USF HPC
Number of procs and duration available for project:
- 44 Processors for 48 Hours
Date Available:
Operating System and Kernel:
- CentOS 5.1, Kernel 2.6.18-53.1.14
Libraries:
- OpenMPI 1.2.6 (with GCC 4.1, Intel 9.1, and PGI 7.0/7.1)
Available Hardware:
- 22 nodes, each with dual Opteron 248 processors @ 2.2 GHz and 8 GB RAM; Myrinet with MX stack; 2.1 TB PVFS2 parallel file system
Job Manager:
System Documentation:
Systems Administrator Contact:
FSU HPC
Number of procs and duration available for project:
Date Available:
Operating System and Kernel:
- CentOS release 4.4 / Kernel 2.6.9-42.0.2.ELsmp
Libraries:
- OpenMPI (GNU, Intel, or PGI)
- MPICH v1 (GNU, Intel, or PGI)
- MPICH v2 (GNU, Intel, or PGI)
Hardware:
- Two dual-core AMD Opteron 2220 processors, 16 GB RAM per compute node
- 100 GB shared parallel storage on a Panasas file system
- DDR InfiniBand
Job Manager/Scheduler:
System Documentation:
Systems Admin Contact:
UF HPC
Number of procs and duration available for project:
Date Available:
Operating System and Kernel:
- CentOS release 4.5 / Kernel 2.6.18-8.1.14.el5.L-1642
Libraries:
- OpenMPI (GNU, Intel)
- MPICH v1 (GNU, Intel)
- MPICH v2 (GNU, Intel)
Hardware:
- Two dual-core AMD Opteron 275 processors, 4 GB or 8 GB RAM per compute node
- 1 TB shared parallel storage on a Lustre file system
- SDR InfiniBand
Job Manager/Scheduler:
System Documentation:
Systems Admin Contact:
Notes