Newsgroups: comp.parallel
From: lpease@admin.ogi.edu (Linda M. Pease)
Subject: OGI High Performance Computing short courses
Organization: Oregon Graduate Institute of Science & Technology
Date: 29 May 1995 15:16:32 GMT
Message-ID: <3qcogg$21f@usenet.srv.cis.pitt.edu>

Oregon Graduate Institute of Science & Technology, Office of Continuing
Education, offers three short courses on High Performance Computing at the
OGI campus near Portland, Oregon. This information can also be found on
the World Wide Web at http://www.ogi.edu/Continuing-Ed/.

HIGH PERFORMANCE COMPUTING
      Delivering high performance from modern computing systems requires
more than aggressive semiconductor technology.  Novel processor and system
architectures are required, as well as software that is tuned to the
capabilities and limitations of the system.  This three-week sequence
studies modern high performance processor and system architectures, as
well as analyses and transformations that compilers use to optimize
performance for these systems.


HIGH PERFORMANCE COMPUTER SYSTEMS      M-F, July 17-21, 1995, 8:30am-5pm

Course fee: $1,695 (includes course instruction, materials, break
refreshments and lunches, Monday night reception and Thursday night
dinner)

Course Instructor: Michael Wolfe
   
      Modern computer systems achieve high performance through a variety
of methods.  Within a processor, pipelined functional units and the
ability to issue multiple instructions at a time contribute to that
goal. Other options are to execute instructions out of order, to
use aggressive branch lookahead or prediction, and to integrate high speed
cache memories with the processor. Each solution brings with it a number
of different problems.  Choosing an appropriate processor architecture
requires a careful balance between the available technology, the
instruction set and various mechanisms to improve the instruction
bandwidth.
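As a rough illustration of the branch prediction mentioned above (this
sketch is not from the course materials; the function name and trace
encoding are invented for illustration), a 2-bit saturating counter
predictor can be simulated in a few lines:

```python
def predict(trace):
    """Simulate one 2-bit saturating-counter branch predictor.

    trace is a sequence of actual branch outcomes (True = taken).
    Returns the number of correctly predicted branches.
    """
    state = 2              # states 0,1 predict not-taken; 2,3 predict taken
    correct = 0
    for taken in trace:
        if (state >= 2) == taken:
            correct += 1
        # saturate the counter: move toward 3 on taken, toward 0 otherwise
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

print(predict([True, True, True, False, True, True]))
```

The two-bit counter's hysteresis is the point of the design: a single
mispredicted iteration (e.g. a loop exit) does not immediately flip the
prediction for the next encounter of the branch.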

      Additional performance improvements are possible by combining
multiple processors in a single system. Many multiprocessor designs have
been proposed and constructed; with small modifications, standard
processors can be used in multiprocessor systems, ranging from shared
memory mainframes to scalable message passing multicomputers and even
networks of workstations. The best performance comes from optimizing the
network latency and bandwidth, staying within the constraints of an
affordable system. Another option is to change the processor architecture
specifically to support multiprocessor configurations. In certain
applications, it is effective to design and build special-purpose systems
that are optimized for a small class of problems.  This course presents
the many architectural solutions to performance problems, with numerous
examples that demonstrate each problem, how a given solution addresses
it, and what new problems that solution introduces.
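Speedup, one of the topics listed in the outline below, is commonly
bounded by Amdahl's law. As a quick illustration (not taken from the
course materials; the function name is invented here):

```python
def amdahl_speedup(serial_fraction, processors):
    """Upper bound on parallel speedup when serial_fraction of the
    work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Even 5% serial work caps a 16-processor system well below 16x.
print(round(amdahl_speedup(0.05, 16), 2))
```

This is why the course's emphasis on network latency and bandwidth
matters: communication overhead effectively enlarges the serial
fraction and further lowers the achievable speedup.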

Course Outline

      Limits of semiconductor technology, faster cycle times, deeper
      pipelines, VLIW vs. superscalar control units, out-of-order    
      execution, reservation stations, multimedia instructions.

      Interleaved memory banks, cache memory, Harvard 
      architecture, multilevel caches, performance characteristics,
      virtual memory support.

      Vector registers, vector instructions, performance measurement.

      Consistency, cache coherence, races, synchronization,        
      deadlock, speedup.

      Network topologies and metrics, routing, conflict resolution,
      combining networks.

      SIMD systems, front-end/back-end, MIMD systems,             
      commodity processors.

      Latency management, cache directories, scalable coherent
      interface (SCI), cache-only memory architecture, software-
      assisted cache coherence, novel processor architectures:
      decoupled, multithreading.

      Systolic arrays, regular algorithms, regular structures.

      Historical perspective, static dataflow, tagged token
      dataflow, explicit token store, I-structures.


COMPILER ANALYSIS AND OPTIMIZATION     M-F, July 24-28, 1995, 8:30am-5pm

Course fee: $1,695 (includes course instruction, materials, break
refreshments and lunches, Monday night reception and Thursday night
dinner)

Course Instructor: Michael Wolfe
      
      Many classical code analysis and optimization algorithms have been
developed over the past 35 years.  While these rest on a solid
theoretical foundation, many optimizations have become significantly more
complex due to aggressive processor architectures.  For instance, register
allocation and instruction scheduling are complicated because pipeline and
memory latencies, functional unit and datapath reservations, and explicit
delayed operations must all be taken into account.  Moreover, the
parallelism inherent in current processor designs opens up code generation
options that would make no sense on a simple sequential processor.
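As a small taste of the classical optimizations the course covers, here
is a sketch of constant propagation and folding over a single basic
block (not course code; the tuple encoding of three-address statements
is invented for this example):

```python
def fold_constants(block):
    """One pass of constant propagation/folding over a basic block.

    Statements are (dest, op, arg1, arg2) tuples; arguments are either
    variable names or integer literals.  Returns the simplified block,
    with fully folded statements rewritten as (dest, 'const', value).
    """
    consts = {}                        # variables known to hold a constant
    out = []
    for dest, op, a, b in block:
        a = consts.get(a, a)           # substitute known constant values
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            val = {'+': a + b, '-': a - b, '*': a * b}[op]
            consts[dest] = val
            out.append((dest, 'const', val))
        else:
            consts.pop(dest, None)     # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out

block = [('t1', '+', 2, 3), ('t2', '*', 't1', 10), ('t3', '+', 't2', 'n')]
print(fold_constants(block))
```

Even this toy version shows the flavor of the real problem: once `n` is
unknown, everything downstream of it must be treated conservatively.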

Course Outline

      Front end, optimizer, code generation, peephole 
      optimizations, basic blocks, control flow graph, expression 
      trees or DAGs.

      Logic, sets, relations, lattices, monotonicity, graphs, trees,
      graph algorithms, strongly connected components, dominators.

      Monotone data-flow framework, iterative solution, live
      variables, reaching definitions, dominators, syntax-based            
      analysis, interval analysis, slotwise analysis, sparse
      evaluation graphs. 

      Dominator trees, fast dominator algorithm, loop analysis.

      Within a basic block, across basic blocks, constant 
      propagation, availability, redundancy elimination, value
      numbering, range-check optimizations.

      Efficient use-def chains, constant propagation.

      Code floating, partial redundancy elimination, strength
      reduction, linear test replacement, unswitching.

      Leaf procedures, tail recursion, tail calls, constant 
      propagation, modify/reference analysis, parameter              
      alias analysis, call graph construction.

      Loop rotation, procedure integration, loop unrolling,
      basic block cloning.

      Optimality, local allocation across basic blocks.

      Coloring algorithms, spill heuristics, coalescing, live           
      range splitting.

      Hierarchical allocation, allocation in loops, cliques.

      Basic block scheduling, filling delay slots, scheduling
      across basic blocks, code replication, control dependence, 
      speculative scheduling, conditional execution, interaction 
      with register allocation.

      Software pipelining, loop unrolling.      

      Tail merging, jump optimizations, branch prediction, 
      instruction placement.
 
      Currency of values, reporting values and position.



 
COMPILERS FOR PARALLEL COMPUTING       M-F, July 31-August 4, 1995, 8:30am-5pm

Course fee: $1,695 (includes course instruction, materials, break
refreshments and lunches, Monday night reception and Thursday night
dinner)

Course Instructor: Michael Wolfe 

      Automatic identification of parallelism and restructuring is now
well-accepted technology, and is the basis of commercial vectorizing and
autotasking compilers.  Optimizing for multiprocessor systems should take
into account such architectural features as memory locality, memory
latency and interprocessor communication.  The recent Fortran 90 (F90) and
High Performance Fortran (HPF) languages require additional analysis and
optimization for effective translation. In addition, the course will
review other proposed parallel languages and study the compilation issues.
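One classical dependence solver of the kind covered under "dependence
equations" below is the GCD test. As a minimal sketch (not course code;
the function name is invented here), it gives a necessary condition for
a dependence between array references A[a*i] and A[b*j]:

```python
from math import gcd

def gcd_test(a, b, c):
    """Necessary condition for integer solutions of a*i - b*j = c.

    If c is not divisible by gcd(a, b), the equation has no integer
    solution, so the two array references cannot touch the same
    element and the loop carries no dependence between them.
    """
    return c % gcd(a, b) == 0

# A[2*i] written, A[2*j + 1] read: 2*i - 2*j = 1 has no integer
# solution, so the loop can be safely parallelized.
print(gcd_test(2, 2, 1))
```

Note the test is only necessary, not sufficient: when it passes, a
dependence may still be ruled out by stronger solvers that account for
the loop bounds.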

Course Outline

   Front end, high-level optimizations, low-level optimizations,        
   code generation, peephole optimizations, basic blocks, control
   flow graph, parallel language constructs, dynamic memory
   allocation.

   Graphs, linear algebra, systems of linear equalities and
   inequalities, solution methods, Fourier-Motzkin projection.

   Dominators, control dependence, loops.

   Types of dependence, distance vectors, direction vectors.

   Use-def chains, induction variables, constant propagation,        
   scalar dependence analysis.

   Dependence equations, solvers, effectiveness of solvers. 

   I/O, static aliasing, structure aliasing, pointer aliasing,
   dynamic pointer allocation, array kill analysis.

   Hierarchical task graph.

   Loop fusion, fission, reversal, interchanging, skewing,
   general linear transformations, strip mining, tiling.

   Characterizing reuse, tiling, prefetching.

   Array assignments, FORALL statements, conformance           
   checking.

   Vectorization, vector loops, array assignments, scalar 
   expansion, strip mining, reductions.

   HPF data layout directives, conformance checking, HPF       
   parallel loops, guard regions, interprocedural analysis, 
   extensions.

   Data layout, parallel code generation, remote data access,
   automatic data layout, communication optimizations, 
   node program generation, sparse matrix  implementations.

   Global cache coherence, local cache coherence, latency
   tolerance. 

   Dataparallel C, SISAL, Crystal.


  About the Instructor
      Michael Wolfe, Ph.D., is an associate professor of Computer Science
and Engineering at Oregon Graduate Institute of Science & Technology.  His
research interests are compiler optimizations for high performance
computer systems, programming languages and computer architecture.  Dr.
Wolfe received his Ph.D. from the University of Illinois in 1982, where he
was one of the key developers in the Parafrase project.  He co-founded
Kuck and Associates, Inc., a leading developer of commercial parallelizing
compiler tools, and was vice president until 1988, when he joined OGI.  He
has served on numerous conference program committees, including POPL '88,
DMCC '91, Supercomputing '92, PPoPP '93, and PLDI '94, and is program chair
for ICS '95 in Barcelona.  He has been invited to give lectures in the
United States, Europe and Japan.  His research group at OGI is exploring
the effectiveness and efficiency of novel compiler optimization
techniques.


For a complete course brochure contact:
Linda M. Pease, Director
Office of Continuing Education
Oregon Graduate Institute of Science & Technology
PO Box 91000
Portland, OR  97291-1000
+1-503-690-1259
+1-503-690-1686 (fax)
e-mail: continuinged@admin.ogi.edu
WWW home page: http://www.ogi.edu/Continuing-Ed/

