Probabilistic computation holds too much promise for it to be stifled by playing zero-sum games with data.
Learn how to add big data to your organization's business processes.
A deep dive into Uber's engineering effort to optimize geospatial queries in Presto.
The O'Reilly Podcast: Dave Cassel on building a unified enterprise database to store and query any type of data.
Learn about some of the common issues you will encounter when developing algorithms for a modern anomaly detection system.
Learn the difference between live and streaming anomaly detection systems and how to address the challenges different data velocities pose.
See examples of the many traps you can fall into if you use off-the-shelf anomaly detection techniques.
The O’Reilly Data Show Podcast: Ion Stoica and Matei Zaharia explore the rich ecosystem of analytic tools around Apache Spark.
Learn to identify problems that may indicate data team dysfunction.
Learn some of the benefits of real-time data processing for certain use cases.
6 lessons learned to get a quick start on productivity.
A look at the Layer API, TFLearn, and Keras.
Applications of CNNs for real-time image classification in the enterprise.
Building a production-grade real-time image classification system.
Why machine learning needs real-time data infrastructure.
The O’Reilly Data Show Podcast: Kenneth Stanley on neuroevolution and other principled ways of exploring the world without an objective.
Lorena Barba explores how we can build the capacity to support reproducible research into the design of tools like Jupyter.
Nadia Eghbal explores how money can support open source development without changing its incentives.
William Merchan shares fundamental trends driving the adoption of Jupyter and its deployment in large organizations.
Andrew Odewahn explains how O’Reilly Media applied the Jupyter architecture to create the next generation of technical content.
Brett Cannon looks at how healthy expectations can maintain a balanced relationship between open source users and project maintainers.
Jeremy Freeman describes a growing ecosystem of scientific solutions, many of which involve Jupyter.
Fernando Perez explains how Project Jupyter fits into a vision of collaborative development of tools that are applicable to research, education, and industry.
Labz ‘N Da Wild 2.0: Teaching signal and data processing at scale using Jupyter notebooks in the cloud
Demba Ba explains how he designed and implemented two Harvard courses that use cloud-based Jupyter notebooks.
Watch highlights from JupyterCon in New York 2017, covering Jupyter notebooks, data management, collaborative data science, and more.
Peter Wang talks about the co-evolution of Jupyter and Anaconda and looks at what’s needed to sustain an open and innovative future.
Rachel Thomas shares her experience using Jupyter notebooks to help students understand deep learning through experimentation.
Wes McKinney makes the case for a shared infrastructure for data science.
Recent trends in practical use and a discussion of key bottlenecks in supervised machine learning.
The toughest part of machine learning with Spark isn't what you think it is.
Human-guided ML pipelines for data unification and cleaning might be the only way to provide complete and trustworthy data sets for effective analytics.
The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications.
As a data professional, you are invited to share your valuable insights. Help us understand the demographics, work environments, tools, and compensation of practitioners in our growing field. All responses are reported in aggregate to ensure your anonymity. The survey takes approximately 5-10 minutes to complete.
Using a single cloud provider is a thing of the past.
Practical questions to help you make a decision.
Tamr’s Eliot Knudsen on algorithms that work alongside human experts.
Jupyter in education, Jupyter-in-the-loop, and reproducibility in science.
A step-by-step tutorial on how to install and run JupyterHub on gcloud.
The O’Reilly Data Show Podcast: Soumith Chintala on building a worthy successor to Torch and on deep learning within Facebook.
A multi-model approach to transforming data from a liability to an asset.
A framework for moving from data to wisdom.
Learn how to use PixieDust in Jupyter Notebooks to create quick, easy, and powerful visualizations for exploring your data.
Authors Julia Silge and David Robinson discuss the power of tidy data principles, sentiment lexicons, and what they're up to at Stack Overflow.
Recapping winners of the Strata San Jose Startup Showcase.
The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services.
Dimitar Filev on bringing cutting-edge computational intelligence to cars and the factories that build them.
It’s pretty easy to grasp the concept, but it’s a tricky algorithm to implement.
Stewart Rogers on building and managing products with embedded analytics.
A new architecture for today’s data-rich modern applications.
The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches.
The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI.
An algorithm that generates Bézier curves using an increasing number of control points.
Integrate and access any form of data using a multi-model database.
The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series.
An algorithm for rubber-banding random points.
Jupyter for sharing and prototyping, Jupyter in academia, and FAIR principles.
Exploring a reference architecture solution.
To succeed in digital transformation, businesses need to adopt tools that enable collaboration, sharing, and rapid deployment. Jupyter fits that bill.
Karley Yoder on what GE Healthcare has learned as it embraces artificial intelligence.
The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models.