Confidence, by Spotify

Technology, Information and Internet

Stockholm, Stockholm County · 1,637 followers

Experimentation at scale — built by Spotify.

About us

Confidence is an experimentation and feature flagging platform built by Spotify.

Website
confidence.spotify.com
Industry
Technology, Information and Internet
Company size
5,001–10,000 employees
Headquarters
Stockholm, Stockholm County

Updates

  • On June 16th, together with Operators & Friends, we're hosting a small group of founders & engineers to talk about how the best teams are using AI to build products. In discussion with Kristian Lindwall (Senior Director of Engineering, Spotify) and Vijay Edupuganti (Agent Engineering Lead, Sierra). From experimentation at scale to shipping AI agents in production: how the best teams validate, iterate, and build with confidence. 👉 Join the livestream: https://lnkd.in/enzpmUKb

    The best teams ship and validate at the speed of light. On 16 June in London, we're exploring what it takes to build and ship product in the AI era with Confidence, by Spotify; Spotify; and Sierra. Sierra hit $100M ARR in just 7 quarters and reached a $15.8B valuation with its latest Series E, led by Tiger Global and GV (Google Ventures) alongside Benchmark, Sequoia Capital and Greenoaks. Its UK and London presence is growing fast. Spotify reached a market cap of more than $100B around its 20th anniversary by building products people love. AI has changed the rules of shipping and validating, so we'll hear directly from the people doing the work: Kristian Lindwall, Senior Director of Engineering at Spotify, and Vijay Edupuganti, Agent Engineering Lead at Sierra. Join us in London to learn what it takes to win when code is commoditised.

  • Confidence, by Spotify reposted this

    AI is writing the code. So who decides what ships? 🤔 Build velocity is becoming less of a constraint. Engineering teams at Anthropic ship 8× more code per quarter, and at Spotify, Honk merges 1,000 PRs every 10 days. But the gap in learning and validation may be widening. Data from Faros found that teams with high AI adoption saw bugs per developer rise 54%, incidents rise 58%, and review times increase roughly 5×. More output means more surface area for risk and less time for humans to review every change. Closing that gap requires experimentation infrastructure that scales with build velocity: guardrails that catch what humans can't review, and feedback loops that compound over time. The companies that pull ahead won't just build faster. They'll learn faster. Read the full piece on the Confidence blog to see why experimentation is becoming the operating system for AI-powered software development 👇 https://hubs.li/Q04kwvdR0 #Experimentation #AIEngineering #SpotifyEngineering

  • Confidence, by Spotify reposted this

    After an insanely popular first edition in Stockholm, we're bringing back Building and Shipping in the AI Era together with Confidence, by Spotify & Spotify on June 16th in London 🇬🇧 AI is changing how top teams ship, and we'll hear directly from builders doing the work: Kristian Lindwall, Senior Director of Engineering at Spotify, and Vijay Edupuganti, Agent Engineering Lead at Sierra. We're keeping space for a small group of founders & engineers to join the discussion. Last time we talked about what it takes to run code reviews on agent-written code, token-maxxing, and more :) Apply in the comments!

  • We're excited to announce a new experimentation miniseries on the NerdOut@Spotify podcast, hosted by two of our finest experimentation nerds: Luke Frake and Mårten Schultzberg 🚀 For the first episode, they sat down with Bhavic Patel, Data Director at Huel, to talk about building an experimentation culture at a company where not every question can be answered with a traditional A/B test. Huel sells direct-to-consumer, on Amazon, in supermarkets, and on TikTok Shop, so measurement means synthetic control groups, geo-lift tests, difference-in-differences analysis, and a lot of cross-platform data stitching. They also get into why experiment relevance matters more than velocity, how AI is accelerating experiment design, and what it takes to make experimentation everyone's job, not just the data team's. Whether you're scaling an experimentation program or just getting started, this one's worth a listen. 🎧 Listen now: https://hubs.li/Q04jbslh0 🧪🎧🧪🎧

  • Most sample size calculators are wrong for your experiment. 🎯 Not because the math is broken, but because the calculator was never connected to the analysis in the first place. If your platform uses sequential testing, corrects for multiple success metrics, or applies variance reduction, many calculators give you incorrect numbers by construction: you plan one experiment and run a completely different one. A team using always-valid inference but calculating sample size for a fixed-sample test can underestimate the required number of observations by a third or more. Add multiple-metric corrections on top, and the errors compound quietly. At Spotify, experiment bandwidth is always scarce relative to the number of ideas we want to test. An underpowered experiment wastes a slot that could have gone to something conclusive. An overpowered one ties up traffic for weeks longer than necessary. That constraint is why we've spent more time than most getting sample size calculations right. It is also why we think the fix isn't being more careful with the inputs but using a calculator that actually knows how your experiment will be analyzed. In our latest blog post, we discuss in detail what makes a good sample size calculator. Link in comments 👇
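The Python sketch below shows one way to connect planning to analysis. The sample size calculator reads the same analysis settings that the analysis will use, so a correction such as a multiple-metric adjustment cannot silently diverge between the two. `AnalysisPlan` and `sample_size_per_group` are hypothetical names, not Confidence's API. The sketch assumes a fixed-horizon, two-sided, two-sample z-test on a continuous metric, with a Bonferroni correction across success metrics. It deliberately does not model sequential testing or variance reduction, which would each need their own adjustment to the formula.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical settings shared by the sample size planner and the analysis."""
    alpha: float = 0.05          # overall false-positive rate across success metrics
    power: float = 0.8           # desired probability of detecting the MDE
    n_success_metrics: int = 1   # Bonferroni splits alpha across these metrics


def sample_size_per_group(plan: AnalysisPlan, mde: float, sd: float) -> int:
    """Users needed per group for a fixed-horizon, two-sided two-sample z-test.

    mde: minimum detectable effect (absolute difference in means)
    sd:  metric standard deviation, assumed equal in both groups
    """
    if mde <= 0 or sd <= 0:
        raise ValueError("mde and sd must be positive")
    # Bonferroni: each success metric is tested at alpha / m.
    per_metric_alpha = plan.alpha / plan.n_success_metrics
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - per_metric_alpha / 2)
    z_beta = std_normal.inv_cdf(plan.power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mde ** 2
    return math.ceil(n)


if __name__ == "__main__":
    one_metric = AnalysisPlan()
    three_metrics = AnalysisPlan(n_success_metrics=3)
    print(sample_size_per_group(one_metric, mde=0.1, sd=1.0))
    print(sample_size_per_group(three_metrics, mde=0.1, sd=1.0))
```

Suppose a team plans with `one_metric` but later analyzes three corrected success metrics. The experiment would then be underpowered. Having both the planner and the analysis read from a single plan object removes that mismatch.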

  • 📚 Intro to Experimentation: why we need randomization. The first lesson in the Confidence Bootcamp doesn't start with statistics. It starts with the IKEA effect. We overvalue things we built ourselves. We seek evidence that confirms what we already believe. These aren't character flaws. They're defaults. They apply equally to the PM reviewing user research and the exec reviewing the business case. Randomized controlled trials exist precisely because of this. When you randomly assign users to treatment and control, you remove your judgment from the assignment, so the only systematic difference between the groups is the feature being tested. The lesson: we don't experiment because data is better than intuition. We experiment because randomization is the only reliable way to separate the effect of your feature from the effect of wanting it to work. Link to the Confidence Bootcamp lesson in comments 👇
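One common way to take human judgment out of assignment is to derive each user's group from a salted hash of their ID. The resulting split behaves randomly, is reproducible, and is independent of anything the team believes about the feature. The Python sketch below is an illustration, not a description of how Confidence itself assigns users. It assumes SHA-256 hashing, 10,000 buckets, and a 50/50 default split. `assign_variant` and the salt strings are hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 10_000  # assumed bucket granularity for traffic splits


def assign_variant(user_id: str, salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    The salt (for example, an experiment identifier) keeps assignments in
    different experiments independent of each other.
    """
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and reduce it to a bucket.
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    return "treatment" if bucket < treatment_share * NUM_BUCKETS else "control"


if __name__ == "__main__":
    counts = Counter(
        assign_variant(f"user-{i}", "hypothetical-exp-1") for i in range(10_000)
    )
    print(counts)
```

The same user always lands in the same group for a given salt, and across many users the groups come out close to the requested share. No one on the team picks who sees the feature.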

  • Confidence, by Spotify reposted this

    At Spotify, 12% of experiments win. Doesn't sound so great. But the actual number that matters is 64%. That's our learning rate: the fraction of experiments where teams made a confident ship, hold, or iterate decision based on the results. The other 36% were inconclusive, usually underpowered or too timidly implemented to produce a clear answer. The gap between 12% and 64% is where most of the ROI lives, and it's the part almost nobody tracks.

    The accounting error is almost universal: teams justify experimentation by counting winners. With industry win rates between 10% and 33%, that framing makes experimentation look like a terrible investment. An 88% "failure rate" doesn't survive a budget review.

    That framing misses two categories of value.

    - Prevented harm. At Spotify, 42% of experiments result in the team deciding not to ship after guardrail metrics flag regressions. Each of those catches a product degradation before it reaches 750M users. No dashboard counts that as a win, but a single prevented regression on a core metric can be worth months of incremental optimization.
    - Compounding knowledge. Teams with longer experimentation histories on a surface generate better hypotheses. On Spotify's mobile home screen, teams went from 250 experiments per year to 520. Better tooling and more teams contributed, but the teams that had been experimenting longest on that surface also wasted fewer tests on dead ends.

    If you run an experimentation program and need to make the case for it, three numbers work better than win rate alone: your rollback rate, your learning rate, and your informative experiments per team per quarter (experiments that reached adequate power and led to a clear ship or no-ship decision). The metric that matters most is how fast your organization learns. (Link to full post in comments)
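A few of these program-level numbers can be computed from a simple log of experiment outcomes. The Python sketch below is illustrative only. The `Experiment` record, the outcome labels, and the sample data are hypothetical and are not Spotify's numbers. Learning rate follows the post's definition: the share of experiments ending in a confident ship, hold, or iterate decision. An "informative" experiment is one that reached adequate power and ended in a clear ship or no-ship decision. Rollback rate is left out because it depends on how a team records rollbacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    team: str
    quarter: str
    outcome: str   # hypothetical labels: "win", "no_ship", "iterate", "inconclusive"
    powered: bool  # reached the planned sample size / power


CONFIDENT = {"win", "no_ship", "iterate"}  # confident ship, hold, or iterate decision
CLEAR_SHIP_DECISION = {"win", "no_ship"}   # clear ship or no-ship decision


def _require_nonempty(exps: list[Experiment]) -> None:
    if not exps:
        raise ValueError("need at least one experiment")


def win_rate(exps: list[Experiment]) -> float:
    _require_nonempty(exps)
    return sum(e.outcome == "win" for e in exps) / len(exps)


def learning_rate(exps: list[Experiment]) -> float:
    _require_nonempty(exps)
    return sum(e.outcome in CONFIDENT for e in exps) / len(exps)


def informative_per_team_quarter(exps: list[Experiment]) -> float:
    _require_nonempty(exps)
    informative = sum(e.powered and e.outcome in CLEAR_SHIP_DECISION for e in exps)
    # Average over the (team, quarter) pairs that ran at least one experiment.
    team_quarters = {(e.team, e.quarter) for e in exps}
    return informative / len(team_quarters)


# Hypothetical data for illustration only.
SAMPLE = [
    Experiment("A", "Q1", "win", True),
    Experiment("A", "Q1", "no_ship", True),
    Experiment("A", "Q1", "inconclusive", False),
    Experiment("A", "Q2", "iterate", True),
    Experiment("B", "Q1", "no_ship", True),
    Experiment("B", "Q1", "inconclusive", False),
    Experiment("B", "Q2", "win", True),
    Experiment("B", "Q2", "no_ship", True),
]

if __name__ == "__main__":
    print(win_rate(SAMPLE), learning_rate(SAMPLE), informative_per_team_quarter(SAMPLE))
```

In this made-up log, only a quarter of experiments "win", yet three quarters end in a confident decision. That gap is the one the post argues win rate alone hides.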

  • Confidence, by Spotify reposted this

    When building gets cheaper, bad ideas stop dying in planning. That's the risk AI coding tools introduced to my team last year. We got roughly 30% more productive. We could build more, ship faster, and attempt ideas that would have been cut a year earlier because there wasn't time. The implicit filter was gone. When building was expensive, there wasn't time for everything, so teams debated, pressure-tested, and cut. That friction was painful, but it was doing real work. Now ideas that used to get killed early make it to production. They come with convincing prototypes and realistic implementations. Without rigorous validation, they ship. This is the judgment gap: the distance between how fast you can build and how fast you can validate what you built. There's good speed and bad speed. Good speed compresses learning cycles: you test more, discard faster, and compound what you learn into better decisions. Bad speed compresses only build cycles: you ship more without knowing what worked. At Spotify, 58 teams ran 520 experiments on the mobile Home page alone in 2025. That's good speed. Getting there took years of investing in experimentation infrastructure as seriously as we invest in engineering velocity. And that investment compounds. Every experiment builds institutional knowledge that makes the next decision faster and more accurate. If your team adopted AI coding tools but didn't invest equally in validation infrastructure, you're getting faster at building the wrong things. (Link to blog in comments)
