Call for Talks Opens: Tuesday, May 30, 2017
Call for Talks Closes: Friday, September 1, 2017, at 11:59 PM (Pacific)
Final Notifications Sent By: Friday, September 15, 2017
If you have any questions or would like to reach the GTC Content Team, please contact us.
Submissions must describe your own work using GPUs, whether completed or currently in progress. Each submission must provide actual or expected results and must demonstrate a significant innovation or improvement achieved through GPU computing.
Talk Review Process
The GTC Content Committee will review, rate, and select submissions based on:
If your submission is focused on a service, technology, or a new product your company is offering, please contact us for information on sponsored session opportunities.
WHAT TO SUBMIT?
You will be required to provide the following information in your submission:
WHAT MAKES A SUCCESSFUL SPEAKER SUBMISSION?
EXAMPLES OF STELLAR SUBMISSIONS
How to Write a Great Session Title
Your session title is what gets the reader to read the first sentence of the session description. A clearly articulated title with a well-defined learning objective, plus a dash of pizzazz, greatly increases the chance that conference attendees will attend your session.
The title should:
How to Write a Great Session Description
The first sentence should describe what the attendee can expect to learn from your presentation (e.g., "Learn about extensions that enable efficient use of PGAS models."). Avoid background your audience already knows (e.g., "Originally designed as graphics accelerators, GPUs have evolved into powerful parallel processors capable of accelerating many compute-intensive applications."). Subsequent sentences should offer more details about what will be covered and why the reader should attend. In general, go for clarity over cleverness.
The description should begin with an action word such as:
Select the correct duration for your submission:
GTC is soliciting submissions that provide concrete examples and contain both practical and theoretical information.
Examples of GTC submission descriptions:
Session Title: Faster, Cheaper, Better – Hybridization of Linear Algebra for GPUs
Session Description: Learn how to develop faster, cheaper, and better linear algebra software for GPUs through a hybridization methodology built on (1) representing linear algebra algorithms as directed acyclic graphs, where nodes correspond to tasks and edges to the dependencies among them, and (2) scheduling the execution of those tasks over hybrid architectures of GPUs and multicore CPUs. Examples will be given using MAGMA, a new generation of linear algebra libraries that extends the sequential LAPACK-style algorithms to highly parallel, heterogeneous GPU and multicore architectures.
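The hybridization idea in the description above, tasks as DAG nodes and dependencies as edges, scheduled across heterogeneous devices, can be sketched in a few lines. This is an illustrative toy, not MAGMA's actual scheduler; the task names and device assignments are invented for the example:

```python
from collections import deque

def schedule(tasks, deps):
    """Topologically order tasks so each runs after its dependencies.

    tasks: dict mapping task name -> preferred device ("cpu" or "gpu")
    deps:  list of (before, after) edges in the DAG
    """
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for before, after in deps:
        children[before].append(after)
        indegree[after] += 1

    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append((t, tasks[t]))  # dispatch t to its preferred device
        for child in children[t]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

# A toy factorization pipeline: panel factorization on the CPU,
# trailing-matrix work on the GPU (hypothetical task names).
tasks = {"factor": "cpu", "solve": "gpu", "update": "gpu"}
deps = [("factor", "solve"), ("solve", "update")]
print(schedule(tasks, deps))
```

In a real hybrid scheduler, small latency-sensitive tasks typically go to the CPU cores while large data-parallel updates go to the GPU; the DAG ensures correctness while leaving the device mapping free to change.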
Session Title: Analysis-Driven Performance Optimization: A New Approach to Determining Performance Thresholds
Session Description: The goal of this session is to demystify performance optimization by transforming it into an analysis-driven process. There are three fundamental limiters of kernel performance: instruction throughput, memory throughput, and latency. In this session we will describe how to use profiling tools and source code instrumentation to assess the significance of each limiter, which optimizations to apply for each limiter, and how to determine when hardware limits have been reached. Concepts will be illustrated with examples and are equally applicable to both CUDA and OpenCL development. It is assumed that registrants are already familiar with fundamental optimization techniques.
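The three limiters named in this description can be illustrated with a toy classifier. The peak figures and the 70% threshold below are invented for illustration only; a real analysis would use measured profiler counters, not this heuristic:

```python
def classify_limiter(achieved_instr, peak_instr, achieved_mem, peak_mem,
                     threshold=0.7):
    """Toy heuristic: a kernel near peak instruction throughput is
    compute-bound, one near peak memory throughput is memory-bound,
    and one far from both peaks is likely latency-bound."""
    instr_frac = achieved_instr / peak_instr
    mem_frac = achieved_mem / peak_mem
    if instr_frac >= threshold and instr_frac >= mem_frac:
        return "instruction throughput"
    if mem_frac >= threshold:
        return "memory throughput"
    return "latency"

# A kernel reaching 80% of memory bandwidth but only 30% of
# instruction peak (hypothetical numbers) is memory-bound:
print(classify_limiter(achieved_instr=3.0, peak_instr=10.0,
                       achieved_mem=400.0, peak_mem=500.0))
```

The point of the session's analysis-driven approach is exactly this kind of comparison: measure what the kernel achieves, compare against hardware limits, and only then choose which optimization to apply.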
Extended Abstract Example
Our runtime designs focus on performance while ensuring truly one-sided communication progress, which is critical for PGAS models. These designs demonstrate significant potential for users of PGAS models, as well as hybrid MPI+PGAS models (as available in MVAPICH2-X), to take advantage of NVIDIA GPUs. The extensions in OpenSHMEM, coupled with an optimized runtime, improve the latency of the GPU-to-GPU shmem-getmem operation by up to 90%, 45%, and 42% for intra-IOH (I/O Hub), inter-IOH, and inter-node configurations, respectively. The proposed extensions and the associated runtime reduce the GPU-to-GPU latency of a 4-byte Put to 2.7 us. The proposed enhancements improve the performance of the Stencil2D kernel by 65% on a cluster of 192 GPUs and the performance of the BFS kernel by 12% on a cluster of 96 GPUs. As part of the talk, we will use benchmarks from the popular OSU micro-benchmark suite and application kernels to demonstrate how to use the new extensions and extract performance benefits from the associated runtime designs.