This repository has been archived by the owner on Jan 1, 2024. It is now read-only.

Collaborate on experimentation library concepts #395

Open
jejacks0n opened this issue Dec 15, 2022 · 3 comments

Comments

@jejacks0n

Hi! I just stumbled across this project and am bummed that I didn't know about it before I started my own project (intentionally not linked here), so I wanted to open an issue to see if there's any interest in figuring out how we might collaborate on the concepts.

I think the two libraries approach the challenge from slightly different perspectives, so I see an opportunity to consolidate a couple of things, or at least talk about how one might take advantage of the other.

Background: I wrote a library while working at GitLab that allowed me to explore what worked well and what didn't for the engineers who used it. That library was initially based on the scientist gem, which is also really cool. I took what I learned on that project and applied it to the ActiveExperiment library. Please check it out, and let's talk about what we might collaborate on if that's interesting.

@bensheldon
Collaborator

@jejacks0n so happy you opened up this issue! I've been helping maintain Vanity for a minute :-)

I'm really excited about seeing a slimmer, more contemporary, Rails-specific product/design experimentation tool. I can list out the things I've thought of (I did peek at your library, but I want to keep this freeform in the spirit of your comment):

  • Rails-specific. I'm excited to see other Ruby frameworks outside of Rails, but being framework-agnostic is really challenging and Rails has a big footprint.
  • ActiveRecord-native. I like that Vanity uses the database (especially necessary if running experiments in background jobs or rake tasks outside a web request). I think persistence is a necessity, especially in my experience collaborating with data scientists.
  • Dashboard. Vanity has a great dashboard which makes eyeballing experiment progress easy.
  • Multivariate / multi-armed bandit. I think Vanity is pretty unique at doing this. Keep. (A simple sketch of the bandit idea follows this list.)
  • Support complex goals, like streaks and DAU/MAU ratios. I've spent a lot of time hacking around Vanity to meet product/data-science requests and I think these use cases are important.
  • Calculating validity. Again, surfacing whether sample sizes are valid. I'm not a data scientist 😁
  • Performance. Vanity has N+1 challenges to be fixed.
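On the bandit point: a multi-armed bandit adaptively shifts traffic toward better-performing variants rather than holding a fixed split. Vanity's own implementation is Bayesian; the epsilon-greedy version below is only a minimal illustration of the idea, and every name in it is made up (it's not Vanity's or ActiveExperiment's API):

```ruby
# Epsilon-greedy bandit: explore a random variant 10% of the time,
# otherwise exploit the variant with the best observed conversion rate.
# All names here are illustrative.
EPSILON = 0.1

def choose_variant(stats) # stats: { variant_name => { plays:, rewards: } }
  if rand < EPSILON
    stats.keys.sample # explore: pick any variant at random
  else
    stats.max_by { |_, s| s[:rewards].fdiv([s[:plays], 1].max) }.first # exploit
  end
end

stats = {
  control: { plays: 1_000, rewards: 120 },
  red:     { plays: 1_000, rewards: 150 },
}
choose_variant(stats) # => :red about 95% of the time
```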

I'll say, from my experience using Vanity and maintaining GoodJob, that being opinionated about how a tool should be used, and judicious about declining use cases that fall outside those opinions, is important.

Hopefully it's obvious I'm excited 👍🏻

@jejacks0n
Author

Yeah! I saw that you were maintaining it @bensheldon, so thanks for responding!

I agree with everything you've written, and it might be worth explaining my mental model, which simplifies into three high-level concerns:

  1. Rollout: includes controlling whether the experiment is active, and things like variant weights.
  2. Reporting: includes what data is stored and where, and can generate knowledge (summaries, fancy graphs, etc.)
  3. Refinement: includes an understanding of reporting, and can drive changes back into Rollout.

The rollout layer should have an interface and/or API, because deploying code just to toggle a feature flag, start or stop an experiment, or adjust variant weights isn't optimal.

The reporting layer can be as simple as logging, and might do multiple things, like incrementing a count and pushing complex event data into some local or remote datastore.

The refinement layer can be automated or can happen through manual intervention, using knowledge gained from the reporting layer. Refinements can be as simple as no longer running a specific variant because it raises an exception, and I've often considered how this could be automated using detailed data from the reporting layer, or even simple rollups/counts.
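To make that split concrete, here's a minimal sketch of the three layers. Every class and method name below is hypothetical, not taken from ActiveExperiment or Vanity:

```ruby
# Hypothetical three-layer split; names are illustrative only.

# 1. Rollout: decides whether an experiment runs and which variant is served.
class Rollout
  attr_accessor :enabled, :weights

  def initialize(enabled: true, weights: { control: 50, treatment: 50 })
    @enabled = enabled
    @weights = weights
  end

  def pick_variant
    point = rand(weights.values.sum)
    weights.each do |variant, weight|
      return variant if (point -= weight) < 0
    end
  end
end

# 2. Reporting: records what happened; can be as simple as a log line.
class Reporting
  def record(experiment, variant, event)
    puts "experiment=#{experiment} variant=#{variant} event=#{event}"
  end
end

# 3. Refinement: feeds knowledge back into Rollout, e.g. disabling a
#    variant that keeps raising by zeroing its weight.
class Refinement
  def disable_variant(rollout, variant)
    rollout.weights[variant] = 0
  end
end

rollout = Rollout.new
variant = rollout.pick_variant
Reporting.new.record(:my_experiment, variant, :impression)
Refinement.new.disable_variant(rollout, :treatment) # e.g. after repeated errors
```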

The way ActiveExperiment attempts to handle all of that is to provide the basic (hopefully elegant) functionality and outsource the more advanced stuff to adapter layers. Given your other projects this probably makes sense, and ActiveJob was a source of inspiration in how ActiveExperiment was written. Take the Unleash adapter, for example, where the complex tasks of rollout, variant definition, and counting impressions are handled for us.
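As an illustration of what that adapter boundary could look like, here's a hypothetical rollout adapter wrapping the unleash gem's client (assuming its is_enabled?/get_variant methods). The adapter class and its method names are made up, not ActiveExperiment's actual adapter API:

```ruby
# Hypothetical adapter boundary, in the spirit of ActiveJob's queue
# adapters. Only the Unleash::Client calls come from the unleash gem;
# everything else is illustrative.
require "unleash"

class UnleashRolloutAdapter
  def initialize(client = Unleash::Client.new)
    @client = client # assumes Unleash has already been configured
  end

  # Ask Unleash whether the experiment's feature toggle is enabled.
  def enabled?(experiment_name, context = nil)
    @client.is_enabled?(experiment_name, context)
  end

  # Let Unleash pick the variant (and own the weighting logic).
  def variant_for(experiment_name, context = nil)
    @client.get_variant(experiment_name, context)&.name&.to_sym
  end
end
```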

I want to continue to build out handling of these three concerns, and I see Vanity as a good example of that. I'm still kicking around what's worth doing myself, what's worth doing as adapter layers, and what's worth collaborating on. So: are there things you think would be worth sharing or extracting from Vanity? Things that you'd like to remove and have in a different, more reusable project? Is it worth me writing an adapter layer for Vanity, even though it already provides a lot of the same kinds of concepts?

And so I can understand your experience better, what's your mental model of all the concerns in experimentation?

@bensheldon
Collaborator

@jejacks0n ooh, that sounds really interesting. I think adapter layers would solve some of the various challenges I've had with Vanity.

Having thought about "what is my mental model" all day yesterday, I think I came to the conclusion that my thoughts are somewhat orthogonal to experiments:

  • Metrics/goals/events/funnel/product-performance. "Metrics" are what Vanity calls the goal conditions, which Vanity allows tracking independently of experiments (the "Vanity" of "vanity metrics"; a rough sketch of that DSL follows this list). I've always tried to use some form of Pirate Metrics in my products/features (I tend to operate as a "Growth Hacker" despite my loathing of that term). Ideally, my dream tool would ensure the metrics/funnel of the product are instrumented prior to launch (e.g. in my bucket of good practice: tests, CI/CD ... metrics). And I wrote "funnel" because I think there's a story in which these events are complicated and non-atomic, like retention/usage. I would want the API to drive questions like "what do we want people to do inside the thing we're building, and how will we know?"
  • Identity/identification/re-identification. Who is using the thing, sorta. I think about the flow of someone between session/cookies and across account creation, and also making sure it works across multiple devices as well. Also, some metrics/goals operate at a multi-user level (organization, team). Again, these are non-atomic things (e.g. "Within a given week, for each team, we want to see 50%+ of members using feature X").
  • And lastly, experiments. I'll be honest, I don't think much about experiments because Vanity just kind of works. It's "an Experiment has a goal, it has variants, and in practice, it has participants". If I were going to rebuild Vanity, this is the stuff (the mathy/statistics stuff) that I'd simply extract.
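For reference on that first point, Vanity's metric DSL looks roughly like this. I'm going from memory of its docs, so treat the details as approximate: metrics are defined and tracked independently of experiments, and experiments reference them:

```ruby
# experiments/metrics/signups.rb -- a Vanity metric, defined independently
# of any experiment (details approximate; check Vanity's docs).
metric "Signups" do
  description "New account signups"
end

# experiments/price_options.rb -- an experiment that measures that metric.
ab_test "Price options" do
  description "Which price converts best?"
  metrics :signups
  alternatives 19, 25, 29
end

# Anywhere in the app: record a conversion for the metric.
Vanity.track!(:signups)
```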

So maybe that's helpful 🤷🏻 The more I think about it, the more orthogonal it seems.

So if I'm pie-in-the-sky dreaming, I want:

  • Instrument activity on the site (links and buttons and forms and scroll-depth/intersection-observer, and payment and email and feature usage, and first-time-on-site UTM stuff)
  • Roll up the data across users (teams/organizations) into complex freeform goal conditions.
  • (Experiments) be able to serve different variations to different people/teams/organizations
  • Regress over all of it, or some of it (statistical significance is hard; a minimal check is sketched after this list). I realize it's probably spicy not to force pre-registering one's experiments, but I think one needs both to be able to guess at "why might these people be performing so much differently than everyone else?" and to then validate the resulting hypothesis.
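On the significance point in that last item: a two-proportion z-test is about the simplest check one could run over a control/variant split. This is a generic statistics sketch, not anything from Vanity or ActiveExperiment:

```ruby
# Two-proportion z-test: is the variant's conversion rate significantly
# different from the control's? |z| >= 1.96 ~ 95% confidence (two-tailed).
def z_score(conv_a, total_a, conv_b, total_b)
  p_a = conv_a.fdiv(total_a)
  p_b = conv_b.fdiv(total_b)
  pooled = (conv_a + conv_b).fdiv(total_a + total_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / total_a + 1.0 / total_b))
  (p_b - p_a) / se
end

z_score(120, 1_000, 150, 1_000) # => ~1.96, right at the 95% threshold
```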

I think my principles/values on a lot of this stuff are "help small teams do big things", "build for speed of learning", and "do less work, with more purpose behind the work that is done". Which probably lends itself less to a full-featured A/B testing framework, and more towards a tightly focused metrics framework that can do variations. I dunno, just noodling here 😄
