Second EAGE Workshop on High Performance Computing for Upstream
- Conference date: September 13, 2015
- Location: Dubai, United Arab Emirates
- Published: 13 September 2015
High Level Methodologies for Performance Characterization and Prediction
Authors: A. Farjallah, C. Andreolli, T. Guillet, O. Awile and P. Thierry
Summary: Designing a supercomputer to satisfy the needs of future applications and workloads within a given power envelope, while considering the rapidly evolving high technology environment, is not an easy task.
In this context, performance prediction can serve many different needs, from designing a new microarchitecture or memory hierarchy to defining the interconnect and storage of the future.
Several tools already exist for analyzing the different aspects of application characterization and performance prediction. They have, however, so far rarely been connected due to their different precisions and resolutions.
Based on a first approximation of the application behavior, mostly involving memory bandwidth (BW) and floating point (FP) demands, we can demonstrate that realistic performance predictions can be easily obtained at the application level for single and multiple node configurations.
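A first approximation of this kind is essentially a roofline estimate: attainable performance is capped by either the peak floating point rate or the product of arithmetic intensity and memory bandwidth. The sketch below illustrates the arithmetic; the machine figures and the kernel intensity are illustrative assumptions, not numbers from the paper.

```cpp
#include <algorithm>
#include <cstdio>

// Minimal roofline-style estimate: attainable performance is capped by
// either the peak floating point rate or the memory bandwidth times the
// kernel's arithmetic intensity (flops per byte moved).
double attainable_gflops(double peak_gflops, double peak_bw_gbs,
                         double arithmetic_intensity) {
    return std::min(peak_gflops, peak_bw_gbs * arithmetic_intensity);
}

int main() {
    // Hypothetical machine: 500 GFLOP/s peak, 60 GB/s sustained bandwidth.
    const double peak = 500.0, bw = 60.0;
    // Illustrative intensity of 0.5 flops/byte, typical of bandwidth-bound
    // stencil sweeps; not a figure from the paper.
    const double ai = 0.5;
    std::printf("predicted: %.1f GFLOP/s\n", attainable_gflops(peak, bw, ai));
    return 0;
}
```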
Simulating Multiple Realizations of Very Large Reservoir Models using MPI and GPU Acceleration
Authors: K. Esler, V. Natoli, D. Dembeck, K. Mukundakrishnan, J. Shumway, B. Suchoski, Y.P. Zhang, J. Gilman, O. Angola and H.Z. Meng
Summary: Demands for higher fidelity and predictive capability from reservoir simulation have resulted in the development of reservoir simulation models with hundreds of millions of cells. When combined with the need to quickly assess many realizations of such models, the cluster size and power requirements can become impractical. GPUs provide an extremely dense and efficient computational platform that can help reduce the required hardware footprint and power envelope. We discuss our attempt to efficiently scale reservoir simulation to a GPU cluster using a combination of CUDA and MPI. We describe several potential performance bottlenecks and our strategies to address them. We give examples on synthetic and real-field models, assessing both performance and accuracy. Finally, we discuss the surrounding workflow challenges which must be met to make best use of an extremely high-performance simulator.
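Scaling a simulator this way hinges on exchanging subdomain boundaries between ranks. Below is a minimal sketch of the MPI side for a 1-D domain decomposition; the function name and layout are illustrative, and in a GPU build the buffers would either live in device memory (with CUDA-aware MPI) or be staged through the host.

```cpp
#include <mpi.h>
#include <vector>

// Exchange one layer of ghost cells with the left/right neighbours in a
// 1-D domain decomposition (illustrative sketch, not the paper's code).
void halo_exchange(std::vector<double>& cells, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    const int n = static_cast<int>(cells.size());

    MPI_Request reqs[4];
    // Receive into the ghost slots cells[0] and cells[n-1]...
    MPI_Irecv(&cells[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&cells[n - 1], 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    // ...while sending the adjacent interior cells.
    MPI_Isend(&cells[1],     1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&cells[n - 2], 1, MPI_DOUBLE, right, 0, comm, &reqs[3]);
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
}
```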
Programming Perspectives for Pre-exascale Systems
Authors: F. Courteille and J. Eaton
Summary: Reservoir simulation of large-scale projects is becoming increasingly complex, requiring more than simple black oil models and vertical well models to capture the behaviour of unconventional, fractured and highly heterogeneous production zones. Nvidia provides an array of accelerated linear algebra libraries to deal with the equations that must be solved in these situations. Accelerating sparse linear algebra on the latest GPU architectures has real potential for performance gains of hundreds of percent over carefully tuned multi-core CPU-only implementations, but at what cost in complexity? This talk will address the programming approaches needed to utilize GPUs at scale for today's most challenging problems, and give a glimpse of the path forward to pre-exascale applications.
Supporting the Scheduling over Local HPC and Cloud Platforms: An FWI Case Study
Authors: A.P.D. Binotto, L.P. Tizzei, K. Mantripragada and M.A.S. Netto
Summary: Full Waveform Inversion (FWI) is a mathematically and computationally challenging inverse problem of finding a quantitative rock-property description of the subsurface that matches observed seismogram data. The inversion employs a forward model, which relates the subsurface to the observed seismograms. FWI and other seismic applications require High Performance Computing (HPC) to simulate the dynamics of such complex models. Not long ago, companies, research institutes, and universities acquired clusters of computers and maintained them on premises. Recently, cloud computing has become an alternative, posing a challenge to end-users, who have to decide whether to execute their applications on their local clusters or burst them to a remote cloud provider. In this paper, we present a decision support method for choosing the right environment considering trade-offs such as resource costs, performance, and availability on such heterogeneous execution platforms. We evaluated the system using our FWI application, and preliminary results indicate that users of HPC applications can benefit from such a cloud advisory system to reduce costs, shorten turnaround times, and even boost local platforms.
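To make the trade-off concrete, here is a toy version of such a decision rule in code; the fields, the budget threshold, and the rule itself are illustrative assumptions, not the paper's method.

```cpp
#include <string>

// Toy decision rule in the spirit of a cloud advisory system: pick the
// platform with the lower estimated turnaround, unless the cloud run would
// exceed a cost budget. All numbers and the rule itself are illustrative.
struct Estimate {
    double queue_wait_h;  // expected wait before the job starts
    double runtime_h;     // expected execution time
    double cost_usd;      // expected monetary cost
};

std::string choose_platform(const Estimate& local, const Estimate& cloud,
                            double cloud_budget_usd) {
    const double local_turnaround = local.queue_wait_h + local.runtime_h;
    const double cloud_turnaround = cloud.queue_wait_h + cloud.runtime_h;
    if (cloud.cost_usd > cloud_budget_usd) return "local";
    return cloud_turnaround < local_turnaround ? "cloud" : "local";
}
```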
Towards Fast Reverse Time Migration Kernels using Multi-threaded Wavefront Diamond Tiling
Summary: Today's high-end multicore systems are characterized by a deep memory hierarchy, i.e., several levels of local and shared caches, with limited size and bandwidth per core. The ever-increasing gap between processor and memory speed will further exacerbate the problem and has led the scientific community to revisit numerical software implementations to better suit the underlying memory subsystem, for performance (data reuse) as well as energy efficiency (data locality). The authors propose a novel multi-threaded wavefront diamond blocking (MWD) implementation in the context of stencil computations, which represent the core operation of seismic imaging in the oil industry. The diamond formulation of the stencil introduces temporal blocking for high data reuse in the upper cache levels. The wavefront optimization technique ensures data locality by allowing multiple threads to share common adjacent stencil points. MWD is therefore able to take up the aforementioned challenges by alleviating the cache size limitation and relieving pressure on the memory bandwidth. Performance comparisons against an optimized 25-point stencil standard seismic imaging scheme using spatial and temporal blocking demonstrate the effectiveness of MWD.
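MWD combines temporal blocking with wavefront parallelism; as a much simpler illustration of the cache-blocking idea it builds on, the sketch below spatially tiles a 7-point stencil sweep. This is a generic example with made-up coefficients, not the authors' MWD scheme or their 25-point stencil.

```cpp
#include <algorithm>
#include <vector>

// Spatially blocked 7-point stencil sweep over an n^3 grid. Tiling the j/k
// loops keeps the working set cache-resident; MWD goes further by also
// blocking in time and marching diamond tiles as wavefronts. Sketch only.
void stencil_sweep(const std::vector<float>& in, std::vector<float>& out,
                   int n, int tile) {
    auto idx = [n](int i, int j, int k) { return (i * n + j) * n + k; };
    for (int jj = 1; jj < n - 1; jj += tile)
      for (int kk = 1; kk < n - 1; kk += tile)
        for (int i = 1; i < n - 1; ++i)
          for (int j = jj; j < std::min(jj + tile, n - 1); ++j)
            for (int k = kk; k < std::min(kk + tile, n - 1); ++k)
              out[idx(i, j, k)] = 0.5f * in[idx(i, j, k)] +
                  0.1f * (in[idx(i - 1, j, k)] + in[idx(i + 1, j, k)] +
                          in[idx(i, j - 1, k)] + in[idx(i, j + 1, k)] +
                          in[idx(i, j, k - 1)] + in[idx(i, j, k + 1)]);
}
```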
Heterogeneous Architecture Library
Authors: P. Souza, C.J. Newburn and L. Borges
Summary: Today's platforms are becoming increasingly heterogeneous. A given platform may have many different computing elements in it: CPUs, coprocessors and GPUs of various kinds. And over time, the platforms on which seismic codes run may change, such that even if a given platform doesn't have much variety, the same code base needs to be portable across a wide variety of targets. How can seismic applications support this kind of portability?
One answer is the heterogeneous architecture library that Petrobras has created.
This library has been in production use at Petrobras for RTM since early 2010. It has three back ends: CUDA, OpenCL and regular CPUs. A new back end is being deployed to support the Intel® hStreams library, which provides a streaming abstraction for heterogeneous platforms. The main RTM application and the code that manages the various kinds of devices are fully portable.
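One plausible shape for such a library is an abstract device interface with one implementation per back end. The sketch below is illustrative only and does not reproduce the Petrobras API.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>

// Sketch of a back-end abstraction: the application codes against an
// abstract device interface, and one back end per target (CUDA, OpenCL,
// plain CPU, hStreams) implements it. Names here are hypothetical.
class Backend {
public:
    virtual ~Backend() = default;
    virtual void* alloc(std::size_t bytes) = 0;
    virtual void  copy_to_device(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void  run_rtm_kernel(float* wavefield, std::size_t n) = 0;
};

class CpuBackend : public Backend {
public:
    void* alloc(std::size_t bytes) override { return ::operator new(bytes); }
    void  copy_to_device(void* dst, const void* src, std::size_t bytes) override {
        std::memcpy(dst, src, bytes);  // no device: host-to-host copy
    }
    void  run_rtm_kernel(float* wavefield, std::size_t n) override {
        // ... plain C++/OpenMP implementation of the propagator ...
        (void)wavefield; (void)n;
    }
};

// The application selects a back end once at start-up and never mentions
// CUDA or OpenCL directly; that is what keeps the main RTM code portable.
std::unique_ptr<Backend> make_backend() {
    // A real library would inspect the available hardware; this sketch
    // always falls back to the CPU back end.
    return std::make_unique<CpuBackend>();
}
```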
Impact of CPU Choice on Large Scale Reservoir Simulations
Authors: A. Alturki, M. Baddourah and O. Saadoon
Summary: In this paper, we discuss how to achieve the optimal price/performance and price/energy values in a complex compute environment like reservoir simulation. We also discuss our experimentation with different CPU models and the impact of our selection on the users' experience and on data center electric power consumption, using varied CPU models of the same generation with different numbers of cores and different clock speeds. For example, Intel® Ivy Bridge CPUs come in different SKUs with different power consumption, numbers of cores, threads, clock speeds, etc. Buying the highest clock speed with the highest number of cores may not be the optimal choice for a large system in a computer center. Some applications will not scale well as the number of cores increases while the number of memory channels stays the same, and some applications will not benefit from a high clock speed.
Improving Cluster Production with Allinea’s Tools
Summary: Today, science and engineering projects continue to gain tremendous value from the parallel capability offered by new and existing hardware architectures. However, maintaining this trend whilst keeping energy consumption low is a technological challenge - especially when it comes to applications. During this technical session, Allinea will demonstrate how the expansion of our product portfolio is enabling our customers in the Oil & Gas market to address these challenges head on. Today, Allinea is the leading company in the development of empowering, informative and intuitive software tools designed to help you maximize the value of your HPC investments.
Using Reduced Order Modeling Algorithm for Reverse Time Migration
Summary: Wave field simulations are needed by many algorithms in seismic data processing; reverse time migration and full waveform inversion are two such algorithms. Simulating wave propagation at industrial scale carries a huge computational cost, and this extreme cost is the main challenge of reverse time migration and full waveform inversion, making these algorithms less applicable at industrial scale.
This paper presents the reduced order modeling technique as a very efficient method for reducing the computation required to simulate wave propagation. We examine the efficiency of this algorithm in terms of the computational cost of reverse time migration. A small part of the 2D SEG/EAGE synthetic model is used for this purpose, and the performance of the reduced order modeling technique is compared with that of a conventional finite element algorithm.
The obtained results demonstrate the capability and accuracy of the reduced order modeling technique for simulating wave propagation in cases, such as reverse time migration, that need many simulations. The efficiency of the method increases as the number of simulations, or the simulated time, increases.
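For reference, the generic mechanics of projection-based model order reduction look as follows; this is a standard Galerkin sketch, not necessarily the authors' exact formulation.

```latex
% Semi-discrete wave equation with N degrees of freedom:
M \ddot{u} + K u = f, \qquad u \in \mathbb{R}^N .
% Choose a reduced basis V \in \mathbb{R}^{N \times r} with r \ll N
% (e.g. from snapshots of training simulations), set u \approx V a,
% and Galerkin-project onto the basis:
(V^{\mathsf{T}} M V)\,\ddot{a} + (V^{\mathsf{T}} K V)\,a = V^{\mathsf{T}} f,
\qquad a \in \mathbb{R}^r .
% Each time step now works with r-by-r matrices, so the savings grow with
% the number of simulations that reuse the same basis, as in RTM.
```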
Scalable and Robust BDDC Preconditioners for Reservoir and Electromagnetics Modeling
Authors: S. Zampini, O.B. Widlund and D.E. Keyes
Summary: The purpose of the study is to show the effectiveness of recent algorithmic advances in Balancing Domain Decomposition by Constraints (BDDC) preconditioners for the solution of elliptic PDEs with highly heterogeneous coefficients, discretized by means of the finite element method. Applications to large linear systems generated by div- and curl-conforming finite element discretizations, commonly arising in reservoir and electromagnetics modelling, will be presented.
GeoInv3D: A Scalable Forward Modeling Framework for Full Waveform Inversion Problem
Authors: L. Combe, R. Brossier, L. Metivier, V. Monteiller, S. Operto and J. Virieux
Summary: Full Waveform Inversion is a high-resolution imaging method that has raised considerable interest in the oil industry over the past decade. It has mainly been used as a P-wave velocity model building tool, while extension to multi-parameter elastic anisotropic reconstruction is now an active field of research. In this context, designing computationally efficient and versatile FWI software, in which different numerical schemes for seismic modeling, wave physics and optimization algorithms can be interfaced easily, is of crucial interest. In this study we describe an object-oriented framework based on the definition of abstract interfaces for the components involved in a frequency- or time-domain FWI workflow. We identify the numerical modeling of the full seismic wavefield as the central component and show that the proposed design can interface different modeling approaches, involving different discretisations and physics, with the optimization kernel. We then demonstrate the capability of the framework to preserve the parallel scalability and efficiency of kernels even in an object-oriented programming context. Lastly, we present a concrete realisation of this abstract framework via an application of acoustic 3D time-domain FWI on the Valhall field using a staggered-grid finite difference scheme.
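The sketch below shows the kind of abstract interface such a design implies: the inversion loop sees only a modeling engine, so time-domain and frequency-domain implementations can be swapped behind it. All names are illustrative, not the GeoInv3D API.

```cpp
#include <vector>

// Hypothetical abstract interface in the spirit of the paper's design:
// the inversion driver depends only on this class, so different
// discretisations and physics can be plugged in behind it.
struct Model { std::vector<float> parameters; };  // e.g. P-wave velocity
struct Shot  { /* source/receiver geometry and observed traces */ };

class ModelingEngine {
public:
    virtual ~ModelingEngine() = default;
    // Forward-model synthetic data for one shot.
    virtual std::vector<float> simulate(const Model& m, const Shot& s) = 0;
    // Adjoint-based gradient of the misfit for one shot.
    virtual std::vector<float> gradient(const Model& m, const Shot& s,
                                        const std::vector<float>& residual) = 0;
};

// A time-domain staggered-grid FD engine and a frequency-domain engine can
// both derive from ModelingEngine; the FWI driver is agnostic to which.
void fwi_iteration(Model& m, const std::vector<Shot>& shots,
                   ModelingEngine& eng) {
    for (const Shot& s : shots) {
        std::vector<float> synthetic = eng.simulate(m, s);
        (void)synthetic;
        // ... form the residual against observed data, accumulate the
        //     gradient via eng.gradient(), then update m with the chosen
        //     optimization scheme ...
    }
}
```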
Advanced Algebraic Multigrid Solvers for Subsurface Flow Simulation
Summary: In this research we are particularly interested in extending the robustness of multigrid solvers to handle complex systems related to subsurface reservoir applications for flow problems in porous media. In many cases, the step of solving for the pressure field in subsurface flow simulation becomes a bottleneck for the performance of the simulator. For solving the large sparse linear systems arising from MPFA discretization, we choose multigrid methods as the linear solver. The possible difficulties and issues will be addressed and the corresponding remedies will be studied. With multigrid methods as the linear solver, the simulator can be parallelized (although this is not trivial) and high-resolution simulation becomes feasible, which is the ultimate goal we desire to achieve.
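As background, the skeleton below shows a textbook multigrid V-cycle; the per-level operations are left as declared stubs, and an algebraic multigrid such as the one studied here would build its hierarchy from the MPFA matrix itself rather than from a mesh.

```cpp
#include <cstddef>
#include <vector>

// Skeleton of a multigrid V-cycle for the pressure system (generic sketch,
// not the paper's solver). Each level carries its operator and transfers.
struct Level {
    void smooth(std::vector<double>& x, const std::vector<double>& b) const;
    std::vector<double> residual(const std::vector<double>& x,
                                 const std::vector<double>& b) const;
    std::vector<double> restrict_(const std::vector<double>& r) const;
    std::vector<double> prolong(const std::vector<double>& e) const;
    void solve_coarse(std::vector<double>& x, const std::vector<double>& b) const;
};

void v_cycle(const std::vector<Level>& levels, std::size_t l,
             std::vector<double>& x, const std::vector<double>& b) {
    if (l + 1 == levels.size()) {        // coarsest level: direct solve
        levels[l].solve_coarse(x, b);
        return;
    }
    levels[l].smooth(x, b);              // pre-smoothing (e.g. Jacobi)
    auto r  = levels[l].residual(x, b);  // fine-level residual
    auto rc = levels[l].restrict_(r);    // restrict to the coarser level
    std::vector<double> ec(rc.size(), 0.0);
    v_cycle(levels, l + 1, ec, rc);      // coarse-level correction
    auto e = levels[l].prolong(ec);      // interpolate correction back
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += e[i];
    levels[l].smooth(x, b);              // post-smoothing
}
```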
Efficient Sparse Matrix-vector Multiplication for Geophysical Electromagnetic Codes on Xeon Phi Coprocessors
Authors: S. Rodriguez Bernabeu, V. Puzyrev, M. Hanzich and S. Fernandez
Summary: Sparse matrix-vector multiplication (spMV) is a fundamental building block of iterative solvers in many scientific applications. spMV is known to perform poorly on modern processors due to excessive pressure on the memory system, the overhead of irregular memory accesses, and load imbalance due to non-uniform matrix structures. Achieving higher performance requires taking advantage of the features of the matrix and choosing the right sparse storage format to better exploit the target architecture. In this paper we describe an efficient spMV for geophysical electromagnetic simulations on Intel Xeon Phi coprocessors. The unique features of the matrices arising from electromagnetic problems make them hard to handle with classical sparse storage formats. We propose a matrix decomposition and a tuned storage format that obtain a 4.13x performance improvement over the optimized CSR spMV kernel on Xeon Phi coprocessors.
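For context, the CSR baseline the authors compare against looks roughly like the sketch below (a standard kernel, not the paper's tuned format); its indirect access to x and its variable row lengths are exactly the irregularity and imbalance the abstract mentions.

```cpp
#include <vector>

// Baseline CSR sparse matrix-vector product y = A*x.
struct CSR {
    std::vector<int>    row_ptr;  // size nrows + 1
    std::vector<int>    col_idx;  // size nnz
    std::vector<double> val;      // size nnz
};

void spmv(const CSR& A, const std::vector<double>& x, std::vector<double>& y) {
    const int nrows = static_cast<int>(A.row_ptr.size()) - 1;
    #pragma omp parallel for schedule(dynamic, 64)  // rows vary in length
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x[A.col_idx[k]];  // indirect, irregular access
        y[i] = sum;
    }
}
```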
An Algorithm for the Automated Generation of MPI Communication Patterns
By: A. St-Cyr
Summary: Large scale partial differential equation (PDE) solvers use some form of message passing to handle communications between compute nodes (Gropp, Hoefler, Thakur, & Lusk, 2014). Message passing can be handled explicitly by the application developer or implicitly by the programming language. The prime example of explicit message passing is the Message Passing Interface (MPI), while Chapel (Chamberlain, 2007) and UPC (Draper, 1999) are examples of the PGAS programming model, which makes communications implicit.
It could be argued that the best performing parallel applications are the ones using carefully crafted explicit message passing. The principal reason is that the message passing implementation can be made as efficient as possible for a very specific problem. That flexibility is, however, lost if changes are required in the fundamentals of the algorithm. The converse is true for implicit message passing: higher flexibility, but the highest performance remains unachievable (Coarfa, 2005).
For a developer, the two approaches solve different problems. In applications using MPI, usually only a handful of calls to the message passing API are present in the whole application. Indeed, most of the grunt work resides in finding out how messages are transacted between processes, and in setting up buffers and ways to fill or empty them.
In this work, I propose a library that helps the PDE application developer perform those low-level tasks in an automated fashion. The library is also useful for refactoring existing PDE codes for use on supercomputers. By pairing a spatial hashing function with minimal geometrical knowledge extracted from the application, the communication pattern is discovered in P log(N/P) operations, where N is the global number of hashes and P the number of processes. This pattern is then used to create the actual buffers the developer needs, and the library handles all blocking, non-blocking and one-sided communications.
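To show what "discovering a pattern" means in practice, here is one common building block (a generic sketch, not the paper's hashing algorithm): each rank announces its per-neighbour demand with a single collective, after which every rank knows exactly who will contact it.

```cpp
#include <mpi.h>
#include <vector>

// Runtime discovery of a sparse communication pattern (illustrative only):
// needed_from[p] is how many remote values this rank needs from rank p;
// the returned requested_by[p] is how many values rank p wants from us.
std::vector<int> discover_pattern(const std::vector<int>& needed_from,
                                  MPI_Comm comm) {
    int size;
    MPI_Comm_size(comm, &size);
    std::vector<int> requested_by(size, 0);
    MPI_Alltoall(needed_from.data(), 1, MPI_INT,
                 requested_by.data(), 1, MPI_INT, comm);
    return requested_by;  // nonzero entries identify our send neighbours
}
```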
Optimizing Fully Anisotropic Elastic Propagation on Intel Xeon Phi Coprocessors
Authors: D. Caballero, A. Farres, A. Duran, M. Hanzich, S. Fernández and X. Martorell
Summary: The current trend in seismic imaging aims at using an improved physical model which considers that the Earth is not a rigid but an elastic body. This new model takes simulations closer to the real physics of the problem, at the cost of raising the required computational resources. On the hardware front, recently developed high-performing devices, called accelerators or coprocessors, have shown that they can outperform their general purpose counterparts by orders of magnitude in terms of performance per watt. These new alternatives may provide the resources necessary to make it possible to represent complex wave physics in a reasonable time. There might, however, be a penalty associated with the use of such devices, as some portion of the simulation code may need rewriting, or new optimization strategies may need to be explored and applied. In this work we show some optimization strategies evaluated and applied to an elastic propagator based on a Fully Staggered Grid, running on the Intel® Xeon Phi(TM) coprocessor. It is important to remark that the propagator is able to reproduce elastic wave propagation even for arbitrary anisotropy.
Reservoir Simulation History Matching with Fully Integrated HPC Architecture
Authors: I. C. Pallister, T. Dodd and C. Ozgen
Summary: Reservoir simulation history matching is a computationally expensive process in which model parameters are varied according to guiding algorithms, and simulation runs are performed until the results form an acceptable match to the known reservoir dynamic history. HPC has had more than a decade of successful application in speeding up each individual reservoir simulation run. In this work, a software system is described, with an example application, in which HPC is extended into the accompanying history matching algorithms, allowing a more thorough, multi-solution approach that might otherwise not be achievable within a given time frame.
Encapsulating HPC System Level Complexities via Software Framework
By: S. Das
Summary: Leveraging petascale systems effectively and efficiently is not a trivial challenge. A good harmony between system architecture, algorithms, and application architecture is the key to success. In addition, when complex algorithms build up a workflow, it becomes ever more challenging to achieve optimal execution. A successful business application requires not only innovation in the algorithm space but also efficient execution leveraging the HPC environment. Positioning these complex technologies in a robust, scalable, commercially usable form makes the business successful.
Here is a case study in which different technologies are integrated in a software framework leveraging the HPC environment. Leveraging an HPC system also introduces the additional complexity of achieving a high level of parallelism. In the HPC group, as we started commercializing these technologies, we added them to a software framework to reduce end-user complexity, improve technology uptake, and make HPC resource utilization effective.
OpenVec Portable SIMD Intrinsics
Authors: P. Souza, L. Borges, C. Andreolli and P. Thierry
Summary: Today, the widest vector units found on a mass production processor are in the Intel Xeon Phi coprocessor, with its 512-bit vector registers. These vector units have a theoretical single precision peak performance gain of 16x for single flop operations. In practice, due to limiting factors like memory access latency, I/O demand, serial code sections, and global synchronization, the real performance improvement is typically much lower.
In this work, we present a solution to take advantage of vector units across various processor SIMD architectures with a single, portable source code. This is accomplished by simply adding a vector type and hardware intrinsics support to the C/C++ language through a header file that is compatible with gcc and with commercially available compilers in general. We hide different hardware/compiler feature sets under a common portable programming syntax. In addition, the implementation supports a scalar backend alternative to target unknown architectures.
This implementation has been successfully demonstrated on multiple SIMD architectures including Intel SSE/AVX/AVX-512/IMCI, ARM NEON and IBM Power VSX using only a common header file to enable the compiler to generate highly optimized code with proper SIMD instructions for the given underlying architecture.
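The sketch below illustrates the general pattern behind such a header (the names and macro structure are illustrative, not the OpenVec source): a single vector type and a handful of inline wrappers map to real intrinsics where available and to plain scalars otherwise.

```cpp
#include <cstddef>
#if defined(__AVX__)
  #include <immintrin.h>
#endif

// One vector type per build, mapped to real intrinsics where available and
// to plain scalars otherwise, under a common syntax (illustrative sketch).
#if defined(__AVX__)
typedef __m256 vfloat;                                  // 8 floats
static inline vfloat vload(const float* p) { return _mm256_loadu_ps(p); }
static inline void   vstore(float* p, vfloat v) { _mm256_storeu_ps(p, v); }
static inline vfloat vadd(vfloat a, vfloat b) { return _mm256_add_ps(a, b); }
enum { VLEN = 8 };
#else
typedef float vfloat;                                   // scalar fallback
static inline vfloat vload(const float* p) { return *p; }
static inline void   vstore(float* p, vfloat v) { *p = v; }
static inline vfloat vadd(vfloat a, vfloat b) { return a + b; }
enum { VLEN = 1 };
#endif

// The same loop body compiles to AVX on one machine and to scalar code on
// another, which is the portability property the paper describes.
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i + VLEN <= n; i += VLEN)
        vstore(c + i, vadd(vload(a + i), vload(b + i)));
    for (std::size_t i = n - n % VLEN; i < n; ++i)      // scalar remainder
        c[i] = a[i] + b[i];
}
```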
Full Waveform Inversion: A Modular Approach for Flexibility and Speed
By: G. Clark
Summary: Full waveform inversion (FWI) is an increasingly popular algorithm for automatically improving earth models in seismic exploration. The method is incredibly computationally expensive because the earth model is built up with many iterations of forward modelling and reverse time migration (RTM). Current research is helping to reduce the number of iterations and improve the robustness of convergence, but the prohibitive cost of FWI makes running real-world datasets impractical for many researchers. Furthermore, information about the geological region can be used to accelerate convergence. An FWI implementation must therefore be both high-performance and flexible. This commercial case study outlines our approach to minimizing the cost of a flexible implementation.
The key feature of our approach is that we divide the algorithm into low cost and high cost steps. Fortunately, the low cost steps in the FWI algorithm are also the ones subject to the most research. The extremely high cost full wave modelling steps are comparatively consistent between variations of the FWI algorithm. Therefore, the full wave modelling is hardware optimized and all other steps can be quickly rewritten in Python.
Using Heterogeneous Nodes for In-Situ Visualization
By: P. Messmer
Summary: Running complex simulations is only one of the challenges in the computational geophysics workflow. The massive amounts of data produced by simulations need to be analyzed and visualized in order to extract scientific understanding from the computations. The traditional post-processing workflow is increasingly showing its limitations due to the growing discrepancy between compute and I/O performance in HPC systems. Data should therefore be analyzed and visualized where it was generated, and ideally even while it is being generated. In this talk, we will present techniques ranging from in-situ visualization to remote rendering that help computational scientists avoid moving large datasets between compute and analysis systems, and show how GPUs can be used in this process.