Samuel Paik's VLIW bibliography
WARNING: you may want to print this one to read it...
Here is something I put together from my library last year in response
to a question about VLIW. It hasn't been updated.
Date: Wed, 10 Mar 93 03:31:55 -0500
From: paik@mlo.dec.com (Samuel S. Paik)
Subject: Re: VLIW Architecture
In article you write:
> I have to write a presentation about VLIW Architecture and the only
> reference I found in my University's library is :
Here are some references:
Note that many of the references cover compiler techniques, since
compilers are very important for VLIW architectures. The list is by
no means complete!
From my library:
B. Ramakrishna Rau, David W. L. Yen, Wei Yen, and Ross A. Towle. "The
Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions,
and Trade-offs." Computer, Vol. 22, No. 1, January 1989, pp. 12-35.
Describes the Cydra 5, a commercial VLIW based on polycyclic
("directed dataflow") architecture concepts. A nice discussion
of their architecture and the underlying design decisions.
Alexandru Nicolau. Percolation Scheduling: A Parallel Compilation
Technique. TR 85-678, Dept. of Computer Science, Cornell University,
May 1985.
Describes percolation scheduling, a technique for moving code past
basic block boundaries based on program transformations. Uses the
BULLDOG compiler as a base.
Alexander Aiken and Alexandru Nicolau. Optimal Loop Parallelization.
Cornell University, Dept of Computer Science Technical Report. Also
in Proceedings of the 1988 ACM SIGPLAN Conference on Programming
Language Design and Implementation, June 1988.
An early work on software pipelining, by greedy scheduling on the
unrolled loop.
Kemal Ebcioglu. "Some Design Ideas for a VLIW Architecture for
Sequential-Natured Software." Proc. IFIP WG 10.3 Working Conference
on Parallel Processing, 1988, pp. 3-17.
Describes a VLIW design with non-trapping instructions, predicated
execution, and multi-way branches.
Shekhar Borkar, Robert Cohn, George Cox, Sha Gleason, Thomas Gross, H.
T. Kung, Monica Lam, Brian Moore, Craig Peterson, John Pieper, Linda
Rankin, P. S. Tseng, Jim Sutton, John Urbanski, and Jon Webb. "iWarp:
An Integrated Solution to High-Speed Parallel Computing."
Supercomputing '88, pp. 330-339.
Describes iWarp, a long instruction word processor with extensive
communication capabilities intended for systolic applications.
Joseph A. Fisher. "The VLIW Machine: A Multiprocessor for Compiling
Scientific Code." Computer, July 1984, pp. 45-53.
Brief description of Bulldog and ELI.
Rajiv Gupta. "Employing Register Channels for the Exploitation of
Instruction Level Parallelism." Second ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, SIGPLAN Notices, Vol.
25, No. 3, March 1990, pp. 118-127.
Describes a multiprocessor architecture with extensions to give some
of the advantages of VLIW architectures.
Guang R. Gao, Yeu-Bong Wong, and Qi Ning. "A Timed Petri-Net Model for
Fine-Grain Loop Scheduling." Proceedings of the ACM SIGPLAN '91
Conference on Programming Language Design and Implementation, SIGPLAN
Notices, Vol. 26, No. 6, June 1991, pp. 204-218.
A different model for scheduling loops.
Suneel Jain. "Circular Scheduling: A New Technique to Perform
Software Pipelining." Proceedings of the ACM SIGPLAN '91 Conference
on Programming Language Design and Implementation, SIGPLAN Notices,
Vol. 26, No. 6, June 1991, pp. 219-228.
A simple technique to do software pipelining.
Hester Bakewell, Donna J. Quammen, Pearl Y. Wang. "Mapping Concurrent
Programs to VLIW Processors." Third ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, SIGPLAN Notices, Vol.
26, No. 7, July 1991, pp. 21-27.
Examines mapping occam (CSP) programs to iWarp.
B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker. "Register
Allocation for Software Pipelined Loops." Proceedings of the ACM
SIGPLAN '92 Conference on Programming Language Design and Implementation,
SIGPLAN Notices, Vol. 27, No. 7, July 1992, pp. 283-299.
Discusses register allocation on the Cydra 5, which had a "rotating"
register file.
Carl E. Love, Harry F. Jordan. "An Investigation of Static Versus
Dynamic Scheduling." Proceedings of the 17th Annual International
Symposium on Computer Architecture, Computer Architecture News,
Vol. 18, No. 2, June 1990, pp. 192-201.
Compares a VLIW to a decoupled architecture. I'm not very
confident in their findings.
Samuel Ho, Lawrence Snyder. "Balance in Architectural Design."
Proceedings of the 17th Annual International Symposium on Computer
Architecture, Computer Architecture News, Vol. 18, No. 2, June
1990, pp. 302-310.
Attempts to derive analytic formulae for design and analysis of
hardware features based on a performance metric.
Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan
Porterfield, Burton Smith. "The Tera Computer System." Proceedings
of the 1990 International Conference on Supercomputing, Computer
Architecture News, Vol. 18, No. 3, September 1990, pp. 1-22.
Describes the Tera computer system, a medium-scale parallel, LIW
architecture, designed for very high speed, latency tolerance, and
ease of compilation.
Francois Bodin, Francois Charot. "Loop Optimization for Horizontal
Microcoded Machines." Proceedings of the 1990 International
Conference on Supercomputing, Computer Architecture News, Vol. 18, No.
3, September 1990, pp. 164-176.
Describes a software pipelining algorithm based on unrolling and
pattern recognition.
Andrew Wolfe and John P. Shen. "A Variable Instruction Stream
Extension to the VLIW Architecture." Fourth International Conference
on Architectural Support for Programming Languages and Operating
Systems, Computer Architecture News, Vol. 19, No. 2, April 1991, pp.
2-14.
Describes another parallel architecture that can act like a VLIW.
William Mangione-Smith, Santosh G. Abraham, Edward S. Davidson.
"Vector Register Design for Polycyclic Vector Scheduling." Fourth
International Conference on Architectural Support for Programming
Languages and Operating Systems, Computer Architecture News, Vol. 19,
No. 2, April 1991, pp. 154-163.
Extensions of polycyclic scheduling (i.e. directed dataflow) to
vector computers.
B. Ramakrishna Rau. "Pseudo-Randomly Interleaved Memory."
Proceedings of the 18th Annual International Symposium on Computer
Architecture, Computer Architecture News, Vol. 19, No. 3, May 1991,
pp. 74-83.
Discusses high-bandwidth memory systems needed by vector and VLIW
architectures.
Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Nancy J. Warter,
Wen-mei W. Hwu. "IMPACT: An Architectural Framework for
Multiple-Instruction-Issue Processors." Proceedings of the 18th
Annual International Symposium on Computer Architecture, Computer
Architecture News, Vol. 19, No. 3, May 1991, pp. 266-275.
Discusses a framework for testing compiler techniques for multiple
issue (superscalar and VLIW) architectures.
Tzi-cker Chiueh. "Multi-Threaded Vectorization." Proceedings of the
18th Annual International Symposium on Computer Architecture, Computer
Architecture News, Vol. 19, No. 3, May 1991, pp. 352-361.
Discusses a "compromise" between vector and VLIW architectures. Seems
to be fairly close to the polycyclic Cydra 5.
Stephen W. Keckler and William J. Dally. "Processor Coupling:
Integrating Compile Time and Runtime Scheduling for Parallelism."
Proceedings of the 19th Annual International Symposium on Computer
Architecture, Computer Architecture News, Vol. 20, No. 2, May 1992,
pp. 202-213.
Another parallel/VLIW hybrid.
Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu, B. Ramakrishna Rau,
Michael S. Schlansker. "Sentinel Scheduling for VLIW and Superscalar
Processors." Proceedings of the Fifth International Conference on
Architectural Support for Programming Languages and Operating Systems,
SIGPLAN Notices, Vol. 27, No. 9, Sept 1992, pp. 238-247.
Introduces hardware and software support for correct detection of
exceptions with speculative execution.
Michael D. Smith, Monica S. Lam, and Mark Horowitz. "Boosting Beyond
Static Scheduling in a Superscalar Processor." Proceedings of the
17th Annual International Symposium on Computer Architecture, Computer
Architecture News, Vol. 18, No. 2, June 1990, pp. 344-354.
Discusses a hardware technique for side-effect free speculative
execution with correct detection of exceptions.
Michael D. Smith, Mark Horowitz, Monica S. Lam. "Efficient
Superscalar Performance Through Boosting." Proceedings of the Fifth
International Conference on Architectural Support for Programming
Languages and Operating Systems, SIGPLAN Notices, Vol. 27, No. 9, Sept
1992, pp. 248-259.
Describes a code scheduling algorithm similar to percolation, with
extensions for processors with "Boosting."
These references (which I don't have) are referenced in the above papers:
"Cydra 5 Departmental Supercomputer Product Summary." Cydrome, Inc.
Milpitas, Ca., 1988.
B. R. Rau, C. D. Glaeser, and R. L. Picard. "Efficient Code
Generation for Horizontal Architectures: Compiler Techniques and
Architectural Support." Proc. Ninth Ann. Int'l Symp. Computer
Architecture, M411, Computer Society Press, Los Alamitos, Ca, 1982,
pp. 131-139.
J. A. Fisher. "Very Long Instruction Word Architectures and the
ELI-512." Proc. 10th Ann. Int'l Symp. Computer Architecture, M473,
Computer Society Press, Los Alamitos, Ca, 1983, pp. 140-150.
A. E. Charlesworth. "An Approach to Scientific Array Processing: The
Architectural Design of the AP-120B/FPS-164 Family." Computer, Vol.
14, No. 9, Sept. 1981, pp. 18-27.
Y. N. Patt, W.-M. Hwu, and M. Shebanow. "HPS, a New
Microarchitecture: Rationale and Introduction." Proc. 18th Ann.
Workshop Microprogramming, M653, Computer Society Press, Los Alamitos,
Ca, 1985, pp. 103-108.
D. Cohen. "A Methodology for Programming a Pipeline Array Processor."
Proc. 11th Ann. Workshop Microprogramming, M204, Computer Society
Press, Los Alamitos, Ca, 1978, pp. 82-89.
J. R. Ellis. Bulldog: A Compiler for VLIW Architectures. MIT Press,
Cambridge, Ma, 1986.
J. A. Fisher, J. R. Ellis, J. C. Ruttenberg, and A. Nicolau. "Parallel
Processing: A Smart Compiler and a Dumb Machine." Proc. of the ACM
Symposium on Compiler Construction, 1984.
J. A. Fisher. "An Effective Packing Method for Use with 2^n-way Jump
Instruction Hardware." Proc. 13th Ann. Microprogramming Workshop,
SIGMICRO, 1980.
J. A. Fisher. "Trace Scheduling: A Technique for Global Microcode
Compaction." IEEE Transactions on Computers, Vol. C-30, No. 7, July
1981, pp. 478-490.
H. T. Kung. "Let's Design Algorithms for VLSI Systems." Proceedings
of the Conference on Very Large Scale Integration: Architecture,
Design, Fabrication. California Institute of Technology, January
1979, pp. 65-90.
A. Nicolau. Parallelism, Memory Anti-Aliasing, and Correctness for
Trace Scheduling Compilers. Yale University Ph. D. Thesis, 1984.
A. Nicolau and J. Fisher. "Measuring the Parallelism Available for
Very Long Instruction Word Architectures." IEEE Transactions on
Computers, November 1984.
A. Nicolau. Loop Quantization, or Unwinding Done Right. Cornell
University, Dept of Computer Science Technical Report, 1984.
A. Aiken and A. Nicolau. Loop Quantization: An Analysis and Algorithm.
Tech Rep. 87-821, Cornell University, 1987.
A. Aiken and A. Nicolau. "A Development Environment for Horizontal
Microcode." IEEE Transactions on Software Engineering, Vol. 14, No. 5,
May 1988, pp. 584-594. Also available as Cornell Tech Report TR
86-785.
A. Aiken and A. Nicolau. "Perfect Pipelining: A New Loop
Parallelization Technique." European Symposium on Programming, pp.
221-235, Springer-Verlag Lecture Notes in Computer Science No. 300.
Also available as Cornell Tech Rep. TR 87-873.
J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. "Conversion
of Control Dependence to Data Dependence." Proc. of the 1983 Symp. on
Principles of Programming Languages, Jan 1983, pp. 177-189.
R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth, P. K.
Rodman. "A VLIW Architecture for a Trace Scheduling Compiler."
Proceedings of the Second International Conference on Architectural
Support for Programming Languages and Operating Systems, 1987, pp.
180-192. Also in IEEE Transactions on Computers, Vol. 37, No. 8,
August 1988, pp. 967-979.
K. Ebcioglu. "A Compilation Technique for Software Pipelining of Loops
with Conditional Jumps." Proc. Micro-20, ACM Press, Dec 1987.
J. A. Fisher. The Optimization of Horizontal Microcode within and
beyond Basic Blocks: An Application of Processor Scheduling with
Resources. Ph. D. Thesis, Dept of Computer Science, New York
University, Oct 1979.
J. A. Fisher and J. J. O'Donnell. "VLIW Machines: Multiprocessors We Can
Actually Program." Proc. Compcon '84, February 1984.
C. C. Foster and E. M. Riseman. "Percolation of Code to Enhance
Parallel Dispatching and Execution." IEEE Transactions on Computers,
December 1972, pp. 1411-1415.
K. Karplus and A. Nicolau. "Efficient Hardware for Multi-way Branches
and Pre-fetches." Proc. of the 18th Annual Workshop on
Microprogramming, 1985.
J. Lah and D. E. Atkins. "Tree Compaction of Microprograms." Proc.
16th Annual Microprogramming Workshop, Oct 1983.
B. R. Rau and C. D. Glaeser. "Some Scheduling Techniques and an
Easily Schedulable Horizontal Architecture for High Performance
Scientific Computing." Proc. 14th Annual Microprogramming Workshop,
October 1981, pp. 183-198.
M. S. Lam. A Systolic Array Optimizing Compiler. Ph. D. Thesis,
Carnegie Mellon University, May 1987.
M. Lam. "Software Pipelining: An Effective Scheduling Technique for
VLIW Machines." ACM SIGPLAN '88 Conference on Programming Language
Design and Implementation, June 1988, pp. 318-328.
R. Gupta and M. L. Soffa. "A Reconfigurable LIW Architecture." Proc.
of the International Conf. on Parallel Processing, Aug 1987, pp.
893-900.
R. Gupta and M. L. Soffa. "Region Scheduling: An Approach for
Detecting and Redistributing Parallelism." IEEE Transactions on
Software Engineering.
R. Gupta and M. L. Soffa. "Compilation Techniques for a
Reconfigurable LIW Architecture." The Journal of Supercomputing, Vol.
3, 1989, pp. 271-304.
A. Aiken. Compaction-Based Parallelization. Ph. D. Thesis, Tech
Report TR 88-922, Cornell University, 1988.
A. Aiken and A. Nicolau. "A Realistic Resource-Constrained Software
Pipelining Algorithm." Proceedings of the Third Workshop on
Programming Languages and Compilers for Parallel Computing, Irvine,
Ca, August 1990.
Kemal Ebcioglu and Alexandru Nicolau. "A Global Resource-Constrained
Parallelization Technique." ACM SIGARCH '89 International Conference
on Supercomputing, Crete, Greece, June 1989.
A. Nicolau, K. Pingali, A. Aiken. "Fine-Grain Compilation for Parallel
Machines." The Journal of Supercomputing, 1988.
A. Nicolau, K. Pingali, and A. Aiken. "Fine-Grain Compilation for
Pipelined Machines." Technical Report TR 88-934, Dept. of Computer
Science, Cornell University, 1988.
Robert Cohn, Thomas Gross, Monica Lam, and P. S. Tseng. "Architecture
and Compiler Tradeoffs for a Long Instruction Word Microprocessor."
Third International Conference on Architectural Support for
Programming Languages and Operating Systems, April 1989, pp. 1-13.
James C. Dehnert, Peter Y.-T. Hsu, and Joseph P. Bratt. "Overlapped
Loop Support in the Cydra 5." Proceedings of the Third International
Conference on Architectural Support for Programming Languages and
Operating Systems, April 1989, pp. 26-38.
K. Ebcioglu and Toshio Nakatani. "A New Compilation Technique for
Parallelizing Loops with Unpredictable Branches on a VLIW
Architecture." Proceedings of the Second Workshop on Programming
Languages and Compilers for Parallel Computing, Urbana-Champaign, Il,
1989, pp. 213-229.
L. J. Hendren, et al. "Register Allocation Using Cyclic Interval
Graphs: A New Approach to an Old Problem." ACAPS Technical Memo 33,
Advanced Computer Architecture and Program Structures Group, McGill
University, Montreal, Canada, 1992.
P. Y. T. Hsu. Highly Concurrent Scalar Processing. Thesis report
UILU-ENG-86-2203. Also in ISCA-13, June 1986, pp. 386-395.
B. R. Rau. "Data Flow and Dependence Analysis for Instruction Level
Parallelism." Proceedings of the Fourth Workshop on Languages and
Compilers for Parallel Computing, Santa Clara, August 1991.
B. R. Rau, et al. "Register Allocation for Modulo Scheduled Loops:
Strategies, Algorithms, and Heuristics." HP Labs Technical Report
HPL-92-48. Hewlett-Packard Laboratories, Palo Alto, Ca, 1992.
B. R. Rau, et al. "Code Generation Schema for Modulo Scheduled
DO-Loops and WHILE-Loops." HP Labs Technical Report HPL-92-47.
Hewlett-Packard Laboratories, Palo Alto, Ca, 1992.
D. Landskov, S. Davidson, B. Shriver, and P. W. Mallett. "Local
Microcode Compaction Techniques." ACM Computing Surveys, Vol. 12, No.
3, September 1980, pp. 261-294.
M. Breternitz Jr. VLIW Compilation and Architecture Synthesis. Ph.
D. Thesis, Carnegie Mellon, 1991.
R. P. Colwell, et al. "Architecture and Implementation of a VLIW
Supercomputer." Supercomputing '90, IEEE Computer Society Press, Nov
1990, pp. 910-919.
J. Labrousse and G. Slavenburg. "A 50 MHz microprocessor with a VLIW
architecture." ISSCC '90, IEEE, 1990.
B. R. Rau, M. S. Schlansker, and D. W. L. Yen. "The Cydra 5
Stride-Insensitive Memory System." Proc. of International Conference
on Parallel Processing, 1989, pp. 242-246.
K. Anantha and F. Long. "Code Compaction for Parallel Architectures."
Software Practice and Experience, Vol. 20, No. 6, June 1990, pp.
537-554.
P. P. Chang and W. W. Hwu. "Trace Selection for Compiling Large C
Application Programs to Microcode." Proceedings of the 21st Annual
Workshop on Microprogramming and Microarchitectures, San Diego, Ca,
Nov 1988, pp. 21-29.
P. P. Chang and W. W. Hwu. "Forward Semantic: A Compiler-Assisted
Instruction Fetch Method for Heavily Pipelined Processors."
Proceedings of the 22nd Annual Workshop on Microprogramming and
Microarchitectures, Dublin, Ireland, Aug 1989.
P. P. Chang, S. A. Mahlke, W. Y. Chen, W. W. Hwu. "Code Optimization
Techniques for Multiple-Instruction-Issue Architectures." Center for
Reliable and High-Performance Computing Report, University of
Illinois.
M. C. Golumbic and V. Rainish. "Instruction Scheduling Beyond Basic
Blocks." IBM Journal of Research and Development, Vol. 34, No. 1,
January 1990, pp. 93-97.
M. A. Howland, R. A. Mueller, and P. H. Sweany. "Trace Scheduling
Optimization in a Retargetable Microcode Compiler." Proceedings of
the 20th International Microprogramming Workshop, Colorado Springs,
Co, 1987.
W. W. Hwu and P. P. Chang. "Exploiting Parallel Microarchitectures
with a Compiler Code Generator." Proceedings of the 15th Annual
International Symposium on Computer Architecture, Honolulu, Hawaii,
May 1988.
A. Nicolau. "Uniform Parallelism Exploitation in Ordinary Programs."
Proceedings of the International Conference on Parallel Processing,
Aug 1985, pp. 614-618.
G. S. Sohi and S. Vajapeyam. "Tradeoffs in Instruction Format Design
for Horizontal Architectures." Proceedings of the Third International
Conference on Architectural Support for Programming Languages and
Operating Systems, April, 1989.
P. Tirumalai, M. Lee, and M. Schlansker. "Parallelization of Loops
with Exits on Pipelined Architectures." Proceedings of Supercomputing
'90, Nov 1990.
Here are a few more from an ASPLOS V tutorial I took (copyright
A. Wolfe and J. P. Shen):
M. Breternitz Jr. "Tradeoffs Between Pipelining and Multiple
Functional Units in Fine-Grain Parallelism Exploitation." ICS, 1989.
M. Danelutto and M. Vanneschi. "VLIW-in-the-Large: A Model for Fine
Grain Parallelism Exploitation of Distributed Memory Multiprocessors."
Proceedings of the 23rd Annual Workshop on Microprogramming and
Microarchitecture, November 1990, pp. 7-16.
J. Labrousse and G. Slavenburg. "CREATE-LIFE: A Modular Design
Approach for High Performance ASIC's." COMPCON '90.
A. Wolfe, et al. "The White Dwarf: A High-Performance
Application-Specific Processor." ISCA-15, June 1988, pp. 212-222.
B. Su, et al. "A Software Pipelining Based VLIW Architecture and
Optimizing Compiler." Proceedings of the 23rd Annual Workshop on
Microprogramming and Microarchitecture, November 1990, pp. 17-27.
M. A. Schuette and J. P. Shen. "An Instruction-Level Performance
Analysis of the Multiflow TRACE 14/300." Proceedings of the 24th Annual
Workshop on Microarchitecture, November 1991, pp. 2-11.
S. Davidson, D. Landskov, B. D. Shriver, and P. W. Mallett. "Some
Experiments in Local Microcode Compaction for Horizontal Machines."
IEEE Transactions on Computers, Vol. C-30, No. 7, July 1981, pp.
460-477.
D. Bernstein, J. M. Jaffe, and M. Rodeh. "Scheduling Arithmetic and
Load Operations in Parallel with no Spilling." SIAM J. Comp, Vol. 18,
No. 6, December 1989, pp. 1098-1127.
P. Chang, W. Chen, S. Mahlke, and W. Hwu. "Comparing Static and
Dynamic Code Scheduling for Multiple-Instruction-Issue Processors."
Proceedings 24th Annual Workshop on Microarchitecture, November 1991.
H. Shapiro. "A Comparison of Various Methods for Detecting and
Utilizing Parallelism in a Single Instruction Stream." ICPP, 1977, pp.
67-76.
T. Nakatani and K. Ebcioglu. "Using a Lookahead Window in a
Compaction-Based Parallelizing Compiler." Proceedings of the 23rd
Annual Workshop on Microprogramming and Microarchitecture, November
1990, pp. 57-68.
--
Samuel Paik / Digital Equipment Corporation / 3D Device Support
paik@mlo.dec.com / 508-493-4048 / I speak only for myself
People are the only mirror we have to see ourselves in.
Lois McMaster Bujold, _Mirror Dance_