Building A Robust, Company-Wide Data Science Pipeline Using Programming Abstraction And Virtualization
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, First EAGE/PESGB Workshop Machine Learning, Nov 2018, Volume 2018, p.1 - 4
Abstract
The oil and gas industry presents a challenging and exciting environment for data projects due to the size, complexity, and variability in formatting, type, and quality of the data collected. This environment makes delivering and maintaining a data science pipeline from source systems through to the end user an enormous challenge in many companies (Scully et al., 2014). Many projects fail before any analytics can even be applied to the data, owing to difficulties handling legacy systems, data silos, complex dependencies between data sources, and more. In other cases, data science projects advance in only one area or division of a company because of differences in data handling, despite having broad applicability across the company’s assets. This presentation will discuss California Resources Corporation’s new company-wide data analytics effort as a case study of how we have used technologies such as data virtualization (Van Der Lans, 2018) and programming architectural principles such as abstraction to tackle difficult data integration and data quality problems, constructing a data science pipeline capable of delivering results company-wide. Many of these problems have frustrated multimillion-dollar attempts to address them in the recent past.
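The abstract does not describe the authors' implementation, but the core idea of using programming abstraction to hide source-format differences can be illustrated with a minimal sketch. All class and field names below (`DataSource`, `CsvSource`, `LegacyFixedWidthSource`, the well/oil-rate fields) are hypothetical, not taken from the presentation: each source system is wrapped in an adapter exposing one uniform interface, so downstream analytics never touches legacy formats directly.

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Uniform interface that hides each system's storage details."""
    @abstractmethod
    def records(self) -> list[dict]:
        ...

class CsvSource(DataSource):
    """Adapter for a simple CSV export (hypothetical format)."""
    def __init__(self, text: str):
        self._text = text

    def records(self) -> list[dict]:
        lines = self._text.strip().splitlines()
        header = lines[0].split(",")
        return [dict(zip(header, row.split(","))) for row in lines[1:]]

class LegacyFixedWidthSource(DataSource):
    """Adapter for a hypothetical legacy fixed-width export:
    well id in columns 0-7, oil rate in columns 8-13."""
    def __init__(self, text: str):
        self._text = text

    def records(self) -> list[dict]:
        return [{"well": line[:8].strip(), "oil_rate": line[8:14].strip()}
                for line in self._text.splitlines() if line.strip()]

def daily_rates(sources: list[DataSource]) -> dict[str, float]:
    """Downstream analytics sees only dicts, never the source formats."""
    out: dict[str, float] = {}
    for src in sources:
        for rec in src.records():
            out[rec["well"]] = float(rec["oil_rate"])
    return out
```

With this shape, adding a new division's source system means writing one adapter rather than modifying the pipeline, which is one way the "broad applicability across assets" goal can be met:

```python
rates = daily_rates([
    CsvSource("well,oil_rate\nW-001,12.5"),
    LegacyFixedWidthSource("W-002   8.25"),
])
# both sources arrive in one uniform result, e.g. {"W-001": 12.5, "W-002": 8.25}
```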