Shopify, a leading provider of essential internet infrastructure for commerce, is relied on by millions of merchants worldwide to market and grow their retail businesses. This past Black Friday and Cyber Monday weekend alone, Shopify merchants achieved a record-breaking $6.3 billion in sales. Wendy Foster, Shopify’s Director of Engineering and Data Science, leads the charge in empowering this diverse set of merchants through the development and deployment of sophisticated AI systems.
Why did you first get into data science and machine learning?
Foster: It was a long and twisty road. Early on in my transition from the humanities to technology, my passion was in game development – particularly building interactive and generative experiences. That interest evolved as I moved from a creative academic context into an industry one. Ultimately, it led me into data science, which didn’t formally exist as a broadly established discipline when I took my first role in that space – at the time, companies often struggled with how to define roles beyond the generic “analyst.”
How would you describe your role at Shopify?
Foster: I see my role as supporting and enabling groups of incredibly talented data scientists, data engineers, and software developers to build the future for Shopify merchants’ insights, experiences, and operational workflows. On paper, I’m the Director of Data Science and Engineering within the Commerce Intelligence group at Shopify, but I see my actual role as an enabler and unblocker for great merchant-focused AI innovation.
What type of models is Shopify deploying into production and how are they helping merchants? Product Categorization looks like a pretty amazing application at Shopify, for example.
Foster: Machine learning models are used broadly across Shopify to empower merchants. Product categorization, which one of the teams in my group works on, is one way we support merchant workflows. With product categorization, the goal is to reduce the operational toil that can be associated with defining product type in a standardized way for merchant inventories.
It’s worth underlining that reduced toil and enhanced decision support, because those are the key outcomes that we are always striving for across Shopify with any machine learning-driven feature development. My team’s principal mission and vision statement always has that merchant lens – we want to make the things that are hard for merchants today easy.
Product taxonomy is a small but powerful example. Automatically defining product type can have an incredible impact on interoperability between sales and marketplace channels, as well as on the discoverability of a merchant’s products to potential customers. In terms of process, we automatically suggest the appropriate standard type for a product; a merchant can then accept the suggestion, reject it, or enter a custom type if the suggestion is incorrect or they want to define it differently. By establishing a standard taxonomic definition up front – one that feeds store search, SEO, and product collections – we dramatically reduce the effort required of merchants. Instead of manually categorizing hundreds or thousands of products one by one, merchants can spend time on the important parts of their business.
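The suggest-then-accept/reject/override flow Foster describes can be sketched in a few lines. This is purely illustrative – the function names, the confidence threshold, and the taxonomy labels are assumptions for the example, not Shopify’s actual implementation:

```python
# Illustrative sketch of a suggestion workflow (not Shopify's code):
# surface a standardized product-type suggestion only when the classifier
# is confident enough, and let the merchant accept, reject, or override it.

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for auto-suggesting a type

def suggest_product_type(predictions):
    """predictions: list of (taxonomy_label, score) pairs from a classifier.

    Returns the top label if its score clears the threshold, else None
    (i.e., no suggestion is surfaced to the merchant)."""
    if not predictions:
        return None
    label, score = max(predictions, key=lambda p: p[1])
    return label if score >= CONFIDENCE_THRESHOLD else None

def resolve_product_type(suggestion, merchant_override=None):
    """The merchant's explicit choice always wins over the model suggestion."""
    return merchant_override if merchant_override is not None else suggestion

# A confident prediction yields a suggestion the merchant can accept...
confident = suggest_product_type([("Apparel > T-Shirts", 0.93),
                                  ("Apparel > Hoodies", 0.05)])
# ...while an ambiguous one yields no suggestion, and the merchant can
# always enter a custom type instead.
ambiguous = suggest_product_type([("Home > Decor", 0.41),
                                  ("Home > Kitchen", 0.39)])
custom = resolve_product_type(ambiguous, merchant_override="Handmade > Ceramics")
```

The key design point mirrored here is that the model only reduces toil – the merchant retains final say over the taxonomy.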
This must be an interesting time for the millions of businesses relying on Shopify as they navigate record demand, supply chain challenges, inflation and a lot more. Is the current environment creating any interesting issues around model performance – and how is your team monitoring and troubleshooting things like drift?
Foster: It has actually been an unprecedented time for going on three years now, so what would be considered anomalous is now historical continuity – though we fully expect that to change, too, making benchmarking and monitoring for things like drift all the more important.
We care deeply about data and machine learning operations (MLOps) at Shopify. Across our products that leverage machine learning, we routinely monitor, alert, and deep-dive on feature drift, differentials between predictions and production outcomes, and shifts in dataset distributions that can and do impact model performance. We are really lucky to have a top in-house machine learning platform team that helps automate some of that monitoring and observability. It’s a key part of our practice, and it’s routine not just for the product categorization team but for the whole group.
Broadly, we are dedicated to mirroring Shopify’s core values by ensuring high trust and reliability for all models we serve into production.
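One common statistic behind the kind of distribution-shift monitoring described above is the Population Stability Index (PSI), which compares a feature’s training-time baseline distribution against recent production data. This is a hedged sketch of the general technique – the function, bin counts, and alert threshold are illustrative assumptions, not Shopify’s in-house tooling:

```python
# Illustrative drift check (not Shopify's tooling): Population Stability
# Index (PSI) between a baseline histogram and a production histogram of
# the same feature. A PSI above ~0.2 is often treated as meaningful drift.
import math

def psi(baseline_counts, production_counts, eps=1e-6):
    """PSI over matching histogram bins; 0.0 means identical distributions."""
    b_total = sum(baseline_counts)
    p_total = sum(production_counts)
    score = 0.0
    for b, p in zip(baseline_counts, production_counts):
        # Clamp fractions away from zero so the log term stays defined.
        b_frac = max(b / b_total, eps)
        p_frac = max(p / p_total, eps)
        score += (p_frac - b_frac) * math.log(p_frac / b_frac)
    return score

# Nearly identical distributions produce a PSI near zero...
stable = psi([100, 200, 300], [105, 195, 290])
# ...while a reversed distribution produces a large PSI that would
# trigger an alert and a deep-dive.
drifted = psi([100, 200, 300], [300, 200, 100])
```

In a monitoring pipeline, a statistic like this would be computed per feature on a schedule, with alerts routed to the owning team when the threshold is crossed.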
How do you approach collaborating with business counterparts in terms of not just quantifying AI ROI but also ensuring broader governance goals, responsible AI and/or AI risk management?
Foster: Partnerships across governance stakeholder groups at Shopify typically include product, operations, trust and security, legal as well as our broader R&D organization – with data and engineering as close partners.
If I could distill our machine learning team’s governance goals into three pillars, they would be: accuracy, compliance, and consistency. It is more nuanced than that of course, but those pillars require coordinated efforts. That’s why we have working groups and dedicated teams working on these issues, including a strong data platform center of excellence. The data platform in particular is key to ensuring governance pillars are baked into our data infrastructure and then by default into the data products we build.
From an ROI perspective, we take a number of approaches depending on the business or product that machine learning is being deployed into. The common thread is that everything is always measured by whether we are doing positive things for our merchants.
Your academic background spans both the humanities and technical disciplines; how does that inform your perspective as an executive and thought leader on AI ethics?
Foster: I actually think about this a lot, unpacking it to see what I can mine for my teams in terms of best practices for critical thinking from the humanities or other disciplines.
My stream in the humanities was cultural studies. Generally, your job in cultural studies is to interrogate how the things we make impact the worlds they are applied to – so moving into technology was a natural ladder for the application of those studies.
How I carry that over into the job today is in strategy – knowing the right things to build that bring the highest degree of merchant value. In my teams, I also try to sponsor the lessons that underlie the legacy of humanities research around consciousness, equitable technology, and social impact. The humanities instill a praxis of constant interrogation, and that approach by its nature feeds back into my work and the mindset I try to inculcate in my teams.
Working at Shopify – where our mission is to empower entrepreneurs with platform tools and opportunities to bridge gaps in access to an online market that can be life-changing – feels very connected to the humanities, especially since a lot of my research was focused in that area.
You’ve talked about giving users greater agency and control over AI products — why is this so important? Is anyone doing this really well today?
Foster: AI ethics is such an important and broad space, with many different loci of concern, so it’s worth emphasizing that the user is just one part of the overall umbrella of what matters. With that context, I care deeply about the agency of the user. I don’t believe any group or application should have an unfettered ability to make bad or dangerous decisions – that kind of pure agency – so there is a lot of nuance, from a responsibility perspective, around the agency that AI products and their application layer (much less the technology development layer) introduce.
Day-to-day, I’m focused on the impacts that the application of machine learning has on users’ ability to make decisions effectively – so if I talk about the parts of Shopify that do this really well, it’s when we understand merchant workflows and make those less toilsome. An example from an adjacent domain is when a user is served a recommendation or a relevant product, there’s probably an underlying bias for the business that is trying to influence their decision making. Where I think that business can add value is to allow end-users to customize that output – saying it works or doesn’t work for them – and reweight it so the outputs are more meaningful and support their decision making process more effectively. It is a win-win for the potential customer and the merchant.
It’s worth being careful, however, in extrapolating too much from this – taking away every guardrail from framing and shaping the output may not always be desirable.
One other thing you mentioned recently is that explainability isn’t inherently the best approach – there’s a broader need for observability (at Arize, we often talk in similar terms on not relying on explainability alone as a cure-all). Can you outline your thoughts on this?
Foster: I’ll start by saying that I’m a huge fan of the work Arize AI does. Arize was really the first in-market to put the emphasis firmly on ML observability, and I think the reason I connect so much with Arize’s mission is that, for me, observability is the cornerstone of operational excellence in general – and it drives accountability. Explainability or interpretability, on the other hand, sits a step removed from accountability – that’s how I feel and how I’ve seen it play out in applications. Observability in a DevOps practice incites action-based remediation; in my experience, that’s the goal of observing any metric in a DevOps environment.
Do you think enterprises should have a codified AI ethics framework?
Foster: Yes and no. For me, a principled framework is a strong yes; a genericized checklist is a no or a “meh.” I feel strongly that principles guide decisions and catalyze learning moments, whereas checklists invite apathy or best efforts – and those don’t always align with good practice. Building a culture that bakes ethical considerations into development as a given – AI ethics by default – should be the goal, and I worry that checklists become empty artifacts.
Why is representation so critical to the future of AI, not just in terms of hiring but also in data used to train models – and do you think there are prerequisites for organizations to take representation seriously?
Foster: Others have said this better, but the underlying principle is that we’re building for the world – and the world that we’re building for is best served by diverse makers and diverse datasets that really reflect reality.
I believe that representation needs to be a mission focus at organizations and there should absolutely be hiring disclosures and things that create public accountability. Shopify, for example, discloses hiring and diversity in our yearly sustainability report, which is so important not only to our organizational culture but in ensuring longitudinal stability.
What inspires you about the millions of entrepreneurs and small businesses that Shopify works with?
Foster: Honestly, so much! Shopify’s merchant focus and authentic orientation around merchant success is the core reason I joined the company. We have very close connections to our merchants. Every time I speak to a merchant, I’m overwhelmed by how tenacious they are – their fierceness, their goals, and how they are self-learners and self-drivers.
Shopify’s merchants live out one of our key internal values – “thrive on change” – and embody our core principles and our core vision. Thinking about them, their stories and their success is quite honestly what gets me up in the morning. It has been a real honor to meet so many of them and build AI-powered products for them.