Monday, November 28, 2005

The human computer

Writing the fastest code, by hand, for fun
By John Markoff
http://news.com.com/Writing+the+fastest+code%2C+by+hand%2C+for+fun/2100-1022_3-5972844.html
Story last modified Sun Nov 27 19:50:00 PST 2005

SEATTLE--There was a time long ago when the word "computer" was a job description referring to the humans who performed the tedious mathematical calculations for huge military and engineering projects.

It is in the same sense that Kazushige Goto's business card says simply "high performance computing."

Goto, who is 37, might even be called the John Henry of the information age.

But instead of competing against a steam drill, Goto, a research associate at the Texas Advanced Computing Center at the University of Texas at Austin, has bested the work of a powerful automated system and entire teams of software developers in producing programs that run the world's fastest supercomputers.

He has done it alone at his keyboard the old-fashioned way--by writing code that reorders, one at a time, the instructions given to microprocessor chips.

At one point recently, Goto's software--collections of programs called subroutines--dominated the rarefied machines competing for the title of the world's fastest supercomputer. In 2003 his handmade code was used by seven of the 10 fastest supercomputers. (The Japanese Earth Simulator, then the world's fastest machine, did not use his software.)

In the most recent ranking of supercomputers, IBM machines overtook a number of supercomputers using Goto's software to capture the top three spots in the fastest computer rankings. Still, the Goto Basic Linear Algebra Subprograms, or BLAS, as his programs are known, were used by four of the world's 11 fastest computers.

Goto has become a legend in the supercomputing community because of his solitary crusade. And he shows no signs of flagging in the contest to wring every ounce of computing speed from the world's fastest microprocessor chips.

But for all the acclaim he has received, Goto is a relative newcomer to the supercomputing field, having made his breakthrough about a decade ago.

"At first I didn't know anything," he said in an interview at the annual supercomputing conference held in Seattle in mid-November. "This was all trial and error, but now I have experience."

The value of his work goes far beyond setting speed records. Because his programs can more efficiently solve complex linear equations, they can offer better solutions to virtually every computational science and engineering problem. For example, the subroutines are used in simulation programs to model the flow of air over the surface of a plane or a car more precisely.

Chip versus the hand
One of Goto's principal rivals is a software project known as Atlas, created by a group of researchers working with Jack Dongarra, a computer scientist at the University of Tennessee. Atlas is an automated effort to find the most efficient way to carry out linear algebra operations on specific microprocessors--a task that Goto does meticulously by hand.

Like chess-playing software, the Atlas project tries to overcome the shortcomings of different kinds of computer designs by systematically testing thousands of solutions for each chip to find the most efficient one for each type of microprocessor.
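
The idea of testing many candidates and keeping the fastest can be sketched in a few lines of Python. This is only an illustration of empirical tuning in general, not the Atlas code itself: the toy kernel chunked_sum, the helper pick_fastest and the candidate chunk sizes are all hypothetical names and values chosen for the example.

```python
import time


def chunked_sum(values, chunk):
    """Toy kernel (hypothetical): add up the values one chunk at a time."""
    total = 0.0
    for start in range(0, len(values), chunk):
        total += sum(values[start:start + chunk])
    return total


def pick_fastest(kernel, data, candidates, repeats=3):
    """Time the kernel once per candidate setting and return the quickest.

    Each candidate is run `repeats` times and its best time is kept,
    which reduces the effect of unrelated activity on the machine.
    """
    timings = {}
    for candidate in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data, candidate)
            best = min(best, time.perf_counter() - start)
        timings[candidate] = best
    winner = min(timings, key=timings.get)
    return winner, timings


data = [1.0] * 1000
winner, timings = pick_fastest(chunked_sum, data, [8, 64, 512])
print("fastest chunk size on this machine:", winner)
```

The winning setting can differ from one machine to the next, which is the reason an automated search would be rerun for each type of chip.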

By contrast, Goto uses only a program called a software debugger that allows him to track how data moves among different components of a microprocessor.

He then reorganizes the individual software instructions so that his subroutines perform crucial algebraic functions more quickly to gain small amounts of processing speed from a specific type of computer chip.

Typically these are highly repetitive operations that can consume vast amounts of computing capacity. For example, one challenging type of calculation requires the microprocessor to multiply together the numbers from two tables stored in memory.
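
Multiplying two tables of numbers, or matrices, can be written as three nested loops. The sketch below is the plain textbook version in Python, shown only to make the repetition visible; it is not Goto's code, and the function name matmul is an assumption made for the example.

```python
def matmul(a, b):
    """Multiply an n-by-k table by a k-by-m table with the textbook triple loop."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):            # each row of a
        for j in range(m):        # each column of b
            total = 0.0
            for p in range(k):    # one multiply and one add per step
                total += a[i][p] * b[p][j]
            c[i][j] = total
    return c


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The innermost line runs n * k * m times, so two 1,000-by-1,000 tables require a billion multiplications and a billion additions. That is why small gains in this one loop matter so much.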

Dongarra acknowledges that Goto's hand-tuned programs are more efficient and can still outperform Atlas.

"I tell them that if they want the fastest they should still turn to Goto," said Dongarra, who is one of the researchers who maintains the Top500 listing of the world's fastest-performing computers from a computing speed race held twice a year.

Goto came to his passion for supercomputing almost by accident. Educated in power engineering at Waseda University in Tokyo, he worked as an employee of the Japanese Patent Office, doing research on early inventions like video recorders.

In 1994, to help in his work, Goto purchased a Digital Equipment workstation based on the Alpha microprocessor to perform a simulation.

But when it arrived he could not understand why it was performing so slowly. So he explored the Alpha's design to see where the performance bottlenecks were.

He later purchased a second Alpha-based computer and by rewriting the crucial subroutines was able to improve its performance to 78 percent of its theoretical peak calculating speed, up from 44 percent.

No formal training
Although he was not formally trained in computer or software design, he perfected his craft by learning from programmers on an Internet mailing list focusing on the Linux operating system for the Alpha chip. His curiosity quickly became a passion that he pursued in his free time and during his twice-daily two-hour train commute between his job in Tokyo and his home in Kanagawa Prefecture.

"I would frequently work on these problems until midnight," he said. "I did it to relax."

As a teenager, Goto developed a passion for electronic design, building his own stereo equipment from the most basic components.

His current interest, he says, is not in the pure mathematics of the linear equations, but rather in finding clever ways to overcome the shortcomings of the architecture and internal organization of microprocessors that are used in every kind of computer, from hand-held devices to supercomputers.

Modern computers are organized to offer the programmer a hierarchical series of data storage areas that range from the computer's disk drive to DRAM memory to relatively small temporary memory areas called caches. Typically, the fastest memories are also the smallest.

One of the simplest ways to speed a program is to keep the calculation in the memory unit that is closest to the microprocessor's calculating engine.

Every time the calculation engine is required to stop what it is doing to get new data from a more distant memory area, processing speed slows. But in some cases, keeping data in the closest memory cache may not be as efficient as keeping it in a larger cache that is farther away.
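
One common way to keep data close to the calculating engine is to split a large matrix multiplication into small tiles and finish all the work on one tile before moving to the next. The Python sketch below shows only that loop structure. It is a generic illustration, not Goto's actual scheme; the name blocked_matmul and the default tile size of 2 are assumptions, and Python itself will not show the speed gain that the same reordering can give in low-level code.

```python
def blocked_matmul(a, b, block=2):
    """Multiply two matrices tile by tile (a generic cache-blocking sketch)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The outer three loops choose a tile of c, a and b.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # The inner three loops touch only that tile, so the same
                # small set of numbers is reused many times in a row.
                for i in range(i0, min(i0 + block, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


tiled = blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The result is the same as with the plain triple loop; only the order of the operations changes. The best tile size depends on how large each cache is, which is one reason the tuning has to be redone for each type of chip.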

Robert A. van de Geijn, a computer scientist who works with Goto at the Texas Center, said that Goto's special skill was in the step-by-step reordering of software instructions to take the greatest advantage of the performance trade-offs offered by each type of chip.

"He combines both scientific insight and engineering skills," van de Geijn said.

They met in 2002 when Goto took a sabbatical from his job at the patent office to spend a year at the Texas center. (He has since resigned from the patent office.)

Once Goto arrived in Texas, he turned his attention to optimizing the speed of the Pentium 4 microprocessor. When computer scientists at the University at Buffalo added Goto BLAS to their Pentium-based supercomputer, the calculating power of the system jumped from 1.5 trillion to 2 trillion mathematical operations per second out of a theoretical limit of 3 trillion.
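
The Buffalo figures can be restated as a fraction of the machine's theoretical limit. The short Python calculation below uses only the three numbers quoted above; the function name fraction_of_peak is an assumption made for the example.

```python
def fraction_of_peak(achieved, peak):
    """Return achieved speed as a fraction of the theoretical peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


PEAK = 3e12  # theoretical limit, operations per second

before = fraction_of_peak(1.5e12, PEAK)  # without Goto BLAS
after = fraction_of_peak(2e12, PEAK)     # with Goto BLAS
speedup = 2e12 / 1.5e12                  # ratio of the two measured speeds

print(f"before: {before:.0%}, after: {after:.0%}, speedup: {speedup:.2f}x")
```

In other words, the system went from half of its theoretical limit to about two-thirds, a gain of roughly one-third in delivered speed on the same hardware.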

The increase was so astounding that the record keepers for the Top500 supercomputing list called the researchers in Buffalo because they did not think such a speed was credible.

"I teased them and suggested that the speed of light was faster in Buffalo than it was in Tennessee," van de Geijn recalled.

Recently there has been a quiet controversy around the Goto BLAS because Goto has been slow to offer his work as open-source software, the free model of software distribution.

Some programmers have suggested that Goto has not joined the open-source movement because he wants to protect his secrets and strategies from competitors.

That is not so, he said recently, noting that the Goto BLAS software is freely available for noncommercial use. And he said he was preparing an open-source version.

He said his next big challenge was to expose chip designers to his ideas to help speed their processors.

"Computer architects are stubborn," he observed. "They have their own ideas." His ideas on computing efficiency, he said, speak for themselves.

Entire contents, Copyright © 2005 The New York Times. All rights reserved.
Copyright ©1995-2005 CNET Networks, Inc. All rights reserved.
