Phil is an internationally recognised speaker and friend with expertise that lies between cloud-native software and data science. He is a strong believer that we should be avoiding the same mistakes caused by the disintegration of dev and ops in data science. And that we need a new breed of engineers that span data-dev-ops.
One of Phil’s businesses, Winder Research, provides cloud-native data science consultancy, training and development. He also runs the website https://TrainingDataScience.com which provides free and paid data science training courses and workshops to help the next generation of engineering-focused data scientists.
Phil’s talks focus on being fun and inclusive but also deliver value to businesses through knowledge transfer. Make sure you find and talk to him; ask him about brewing beer!
The term Serverless has become synonymous with AWS Lambda. Decoupling from AWS has two benefits; it avoids lock in and improves flexibility.
The misnomer Serverless, is a set of techniques and technologies that abstract away the underlying hardware completely. Obviously these functions still run on “servers” somewhere, but the point is we don’t care. Developers only need to provide code as a function. Functions are then used or consumed via an API, usually REST, but also through message bus technologies (Kafka, Kinesis, Nats, SQS, etc.).
This provides a comparison and recommendation for a Serverless framework for the Kubernetes platform.
Infrastructure as code has become a paradigm, but infrastructure scripts are often written and run only once. This works for simplistic infrastructure requirements (e.g. k8s deployments). But when there is a requirement for more varied infrastructure or greater resiliency then testing infrastructure code becomes a requirement. This blog post introduces a current project that has found tools and patterns to deal with this problem.
Technology choices in data-driven products are, as you would expect, largely directed by the type and amount of data. The first and most crucial decision to make is whether the data will be processed in a batch or streaming fashion.
Data Science has become an important part of any business because it provides a competitive advantage. Very early on, Amazon’s data on book purchases allowed them to deliver personalised recommendations whilst customers were browsing their site. Their main competitor in the US at the time was Borders, who mainly operated in physical stores. This physicality prevented them from seamlessly providing customers with personalised recommendations . This example highlights how strategic business decisions and data science are inextricably linked.
A current project required a list of Amazon Machine Images (AMIs) for all regions for use in terraform. I couldn’t find a script to do this for me, so here you will find one that uses the aws cli, jq and a bit of Bash.
https://prometheus.io is an open source time series database that focuses on capturing measurements and exposing them via an API. I love Prometheus because it it so simple; it’s minimalism is its greatest feature. It achieves this by pulling metrics from instrumented applications, not pulling like many of its competitors. In other words Prometheus “scrapes” the metrics from the application.
This means that it works very well in a distributed, cloud-native environment. All of the services are unburdened by load on the monitoring system. This has knock on effects meaning that HA is supported through simple duplication and scaling is supported through segmentation.
What do you mean by monitoring? Why do you need it? What are the real needs and are you monitoring them? Ask yourself these questions. Can you answer them? If not, you’re probably doing monitoring wrong.
This post asks the basic question. What is monitoring? How does it compare to logging and tracing? Let’s find out.