Interview: Mark Scarr, Senior Director of Data Science at Atlassian

Gabe Barcelos

Founding Engineer

Mark Scarr is the Senior Director of Data Science at Atlassian, where he heads up the Core Machine Learning Team. We talked to him about what the team is working on, Atlassian’s use cases, cloud migration, and what’s in store for the future.

Please briefly introduce yourself and outline your role at Atlassian

I lead the Core Machine Learning Team at Atlassian. It’s a cross-functional agile team, and we have quite a broad remit. We work with various other teams across the organization, from marketing to growth and product, on a number of different initiatives. For a little additional context: I’ve been at Atlassian about four years, and since joining I’ve grown the team from a tiny seed. Prior to Atlassian, I worked for a couple of years at Box, also in the B2B space. Before that I worked at PayPal and, further back, Yahoo. I’ve been in the tech space for a number of years on both the B2C and B2B sides, so I’ve seen a 360-degree view of how things have evolved in those different areas.

How many models do you have in production and how big is the ML team at Atlassian?

We’re a relatively small ML team and, as I said, we have a very cross-functional remit, so the projects we work on span the whole gamut of the organization across marketing and growth. On the ML side we have worked with Trello, which is one of our products, as well as Confluence and Jira. In terms of the types of models we’ve been involved with, recommendation engines are one use case and propensity modeling is another. More recently we’ve started to explore the generative AI space, which we consider certain to be part of Atlassian’s future in a number of different capacities.

What are Atlassian’s primary machine learning use cases across the Jira family (Software, Service Management, etc.), Confluence, Trello, and the rest of your product portfolio?

We work very closely with our performance marketing team on building out a framework for harvesting search keywords and optimizing bidding; we recently filed some patents in that space as well. That framework allows us to take a pool of keywords and augment it by looking for other keywords that we think would be suitable and fall within the same bucket. You can apply lots of interesting NLP-type approaches to explore that space and increase the pool, such as clustering methods and semantic similarity, and there’s a whole bunch of other techniques under the NLP umbrella that we could also employ. That’s one area within marketing where we continue to do a lot of work and build out functionality. It’s an ongoing project, and I think the ultimate vision would be to have the system fully dynamic and fully automated end-to-end.
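To make the keyword-augmentation idea concrete, here is a minimal sketch that grows a seed pool by keeping candidates that are similar enough to at least one seed. It is an illustration only, not Atlassian’s framework: the function names, the example keywords, and the 0.4 threshold are all hypothetical, and character trigram overlap is used as a simple lexical stand-in for the semantic similarity (for example, embedding-based) that a real system would more likely use.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a padded, lowercased string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_keywords(seeds, candidates, threshold=0.4):
    """Return (candidate, score) pairs whose best seed similarity meets the threshold."""
    seed_vecs = [char_ngrams(s) for s in seeds]
    scored = []
    for cand in candidates:
        vec = char_ngrams(cand)
        best = max(cosine(vec, sv) for sv in seed_vecs)
        if best >= threshold:
            scored.append((cand, round(best, 3)))
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical seed pool and candidate keywords.
seeds = ["project management software", "issue tracking tool"]
candidates = ["project management tools", "bug tracking tool", "banana bread recipe"]
expanded = expand_keywords(seeds, candidates)
print(expanded)
```

Swapping `char_ngrams` for a sentence-embedding function would turn the same loop into a semantic rather than lexical expansion; the thresholding and ranking logic would not need to change.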

We’ve also done a lot of work around modeling the customer lifetime value. Customer lifetime value in the B2B space is a lot more nuanced than in the B2C space. There have been a lot of modeling and data challenges in coming up with a decent way to actually compute that, so we’ve done some interesting modeling work there. And that actually ties into the application I was just mentioning, because clearly customer lifetime value–or a derivative of that–is actually used as part of the input into making those bidding predictions for what the downstream value would be of a particular customer.

Can you talk a little about Atlassian’s migration to the cloud and how it has impacted the business and the ML team in particular?

Cloud migration has been a game changer for our business opportunities. Since we’re talking about ML, the biggest impact is really the prevalence of more data. In cloud-based applications, you have a much richer data set accessible to you, which allows you to train models a lot more efficiently and effectively compared to a hosted solution. So the biggest game changer really from the cloud migration, from an ML perspective, is the additional richness of data that we have available for model training. Cloud migration is ongoing, but we’re committed to cloud solutions across our product portfolio because it’s advantageous from an ML and data richness perspective. Another key associated issue with that is obviously instrumentation. If you’re going to be relying on or having a lot of this additional data available to you, then you really want to make sure that your instrumentation is there to be able to capture all these new data points. So, that’s another area we’ve explored and been investing in as well.

One of the benefits of cloud migration is the ability to instrument things a lot easier, a lot faster, in a cookie cutter fashion across all different types of use cases. Are you applying business metrics or is that something that’s still a little bit down the line?

No, it’s critical that you can actually measure the lift or the change in those business metrics. The business metric may or may not be your objective function, depending on the problem, but it’s key that you can demonstrate a change in those metrics. So we work very closely with the experimentation team: if we’re going to run an experiment, those are the metrics we want to move the needle on, rather than standard internal model metrics such as MRR, precision/recall, or whatever the metric happens to be. Internally, obviously, we monitor those, but it’s really the external business metrics that matter, which is why having the business stakeholder as part of that partnership is so important. The same goes for the analytics person, because they have a handle on those metrics and can typically help establish the baseline that we want to move the needle from.
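As a rough illustration of measuring lift in a business metric against a baseline, the sketch below compares conversion rates between a control group and a treatment group using a two-proportion z-test. The function name and the experiment counts are made up for the example, and this is one common textbook approach rather than a description of how Atlassian’s experimentation team works.

```python
import math


def conversion_lift(control_conv, control_n, treat_conv, treat_n):
    """Absolute lift, relative lift, and two-sided p-value for a conversion metric."""
    if control_conv == 0:
        raise ValueError("relative lift is undefined when the baseline rate is zero")
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    abs_lift = p_t - p_c
    rel_lift = abs_lift / p_c
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = abs_lift / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"abs_lift": abs_lift, "rel_lift": rel_lift, "z": z, "p_value": p_value}


# Hypothetical experiment: 10,000 users per arm.
result = conversion_lift(control_conv=400, control_n=10_000,
                         treat_conv=460, treat_n=10_000)
print(result)
```

The control arm plays the role of the baseline mentioned above; the model’s internal metrics never appear in the calculation, only the business outcome.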

My academic background is statistics, so I’m a big fan of a gentleman by the name of William Edwards Deming. He was a statistician, and he coined the Plan, Do, Study, Act cycle, which has many, many variants out there. It’s very much a cyclical, iterative approach to building and developing models where you can fail fast and move on. I find that a very illuminating way to frame the engagement and what that process looks like. It’s been adopted in various guises over the years in a lot of different disciplines, and I find it very useful as a framework.

How do you collaborate with business and product leads and tie model metrics to business results?

We adopt a very flexible engagement model that varies from project to project, depending on the team we’re working with. But typically we partner very closely with our analytics compatriots, and any project we work on is driven through a business stakeholder. If there isn’t a business stakeholder involved, then we wouldn’t work on it. We’re not in the business of building something we think is “cool” and then shopping it around; that’s a recipe for failure. I’ve been through that: the ML scientists working in that space are super excited because they’re building something cool, but the business folks are like, actually we asked X and you’ve answered Y. So it’s a lose all round. If you flip it around and build something the business stakeholders want, they’re happy that they’re getting a solution to their business problem and the ML scientists are happy that they’re making an impact, so it’s a win-win. So we definitely adopt that approach.

It takes a village, as the old saying goes, to build, train and develop a model and put it into production, so we make sure that all of those different parties are in the loop and that they each bring something to the table. The analysts are usually the experts in that particular area and know the metrics and the data. The business stakeholder keeps everybody honest, making sure we’re answering the right question. The ML scientists bring in the modeling expertise. And then engineering, design and all the other pieces play their roles as well. Typically that’s how we think about engagement, and it could be more or less embedded depending on the project. It’s quite flexible and fluid in that sense.

How do you view the evolving MLOps and ML infrastructure space?

It’s a very rich dynamic space. There are a lot of companies operating in that area, producing lots of tools and software for post-production model monitoring, bias, and feature drift, so there are lots of tools out there to help with that. But what is less clear to me is how that tooling fits in organizationally. And so I think that’s where the MLOps team comes in as kind of analogous to DevOps, where they can actually own the models post-production, and be responsible for the monitoring and the refinement of deployment–if there’s additional models that you want to deploy once that model’s in production.

So I think they’re definitely what I would consider a separate team, because it’s almost a separate skillset from your standard engineering teams or your standard platform teams who would work with them. I think that’s something where, as you scale, you definitely want to have dedicated resources. I see an ML scientist as very different from an ML engineer, who’s very different from a data engineer, who’s very different from an ML platform engineer, and so on and so forth. Those are all quite distinct disciplines within the space. There is clearly overlap, but I think some of those disciplines require not only different skills but different types of people. There are certain people who gravitate more towards the R&D aspects of ML science, and people who gravitate to the engineering side of things. I think it’s good for us to acknowledge that there are these differences and that it’s not one size fits all.

Is there anything your team is working on that you’re excited about? Anything in the generative space you’re starting to explore?

We’ve been working on some recommendation engine applications for our Atlassian apps, and one of the aspirations there is to build that out into a full ecosystem. In the product space we’ve also done some work with the Confluence team on recommendations. In the generative AI space, you can imagine that with Confluence there are lots of ideas around NLP functionality; auto-completion and text summarization are just two that come to mind. In the Jira space, we’re thinking about automatically generating tickets, for example. Those are some ideas that we’re exploring, amongst others.

And so that brings us into the specifics around generative AI. There are applications within Atlassian we’ve identified where we’re looking at generating text, video, image, and voice as well. So, maybe rather than interacting through a keyboard, you actually speak to Confluence or Jira. I’m definitely excited about this space.

And I think for a company like Atlassian, these publicly available APIs and services aren’t necessarily a game changer, nor do they give us a competitive advantage in and of themselves, because they’re publicly available. You might get a short-term first-mover advantage, but you’re not going to get the long-term competitive advantage you need. I think where companies like Atlassian really get the advantage is in pre-trained LLMs coupled with what I would call a secret sauce. That’s going to include the richness of data in your domain-specific area and what you do with that data, along with the business knowledge and feature engineering that you apply on top of that data, and any augmented models that you train on top of the LLMs. I think that’s where the competitive advantage is, and we’ve seen a little bit of that in some of the exploration work that we’ve done.

Can you talk a little bit about Atlassian’s recent releases?

The big news was the announcement at Team ‘23 of Atlassian Intelligence and the company’s commitment to infusing AI functionality into our portfolio of products. That takes many different forms, and I don’t necessarily want to go into any great detail, but all the press releases around Team ‘23 have more information about that. A lot of it focuses on generative AI applications and leveraging LLMs to make the end user’s life a lot better. It’s very exciting that we’ve made that commitment to AI.

What are you most excited about over the next 3 to 6 months either in Team ‘23 or in just the generative space?

What I find most exciting is the passion and the engagement around LLMs, and the fact that there’s been a really unprecedented level of democratization around ML, which I think is overall fantastic. There’s a darker side to that too, with well-documented shortcomings of LLMs. Hallucination is one example, and we could go on and probably spend an hour talking about other things as well, and the limitations. But I think just focusing on the positive and what’s exciting is the adoption and the fact that there are all these applications out there and we’re starting to explore them, and include them in our product portfolio.

What’s interesting from a machine learning perspective is how we start thinking about measuring the efficacy of these models. If I have two or three different models that I want to try for a particular application, how do I decide which is the best? There are some standardized frameworks out there at the moment. The CRFM folks at Stanford have produced a framework called HELM and OpenAI has the Evals framework, for example. These are very much standard generic frameworks for model comparison. But when you get down to your individual applications, say I’m providing some text summarization capabilities in Confluence, how do I know model A is better than model B? How do I measure that? I think that becomes a much more interesting problem from a data science perspective. There’s lots of internal discussion around that, and I think that’s where science can add value, but it requires a lot more thought, and it’s dependent on the use case. There are lots of different ways we can approach that, but coming up with a framework that allows us to do it is going to be critical to measuring the efficacy of these models.
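One simple way to frame the “is model A better than model B” question for a task like summarization is a pairwise preference study: reviewers see both outputs for the same input and pick the one they prefer. The sketch below is a hypothetical example of that setup, not HELM, OpenAI Evals, or any Atlassian tooling. It computes A’s win rate and an exact two-sided sign test, with ties dropped; the judgment data is invented for illustration.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test p-value; ties must already be excluded."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided if preferences were 50/50.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare_models(preferences):
    """Summarize pairwise judgments given as 'A', 'B', or 'tie' labels."""
    wins_a = preferences.count("A")
    wins_b = preferences.count("B")
    decided = wins_a + wins_b
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": preferences.count("tie"),
        "win_rate_a": wins_a / decided if decided else 0.5,
        "p_value": sign_test_p_value(wins_a, wins_b),
    }


# Hypothetical reviewer judgments on 28 document summaries.
judgments = ["A"] * 18 + ["B"] * 6 + ["tie"] * 4
summary = compare_models(judgments)
print(summary)
```

Which judgments to collect, and from whom, is the use-case-dependent part; the statistics are the easy bit once that has been decided.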

How do you see workplace interactions with generative AI shifting?

Work will evolve, and so will how we work and how we interact. If you think about how we interact with our computers through a keyboard, it’s very inefficient. Even the keyboard layout itself is well known to be inefficient. So I think there’s room for improvement. As we get more comfortable with these virtual assistants, which may take many forms, there are going to be a lot of privacy challenges around avoiding data leakage. But moving beyond that, I think how we interact with machines will evolve. The logical next step is speech to text. There’s plenty of speech-to-text software out there already that works very well, so you’re already starting to see some of that. I was recently in the ER at my local hospital, and there was a whole wall of desks where the doctors were all sitting with some kind of recorder. They were all speaking into these devices, and it was transcribing directly onto the screen. That’s a much more efficient way, in a high-pressure environment like an emergency room, for them to get their thoughts down into their reports. That’s probably ahead of the curve, but I can see the natural evolution of work trending in that direction. And then there’s also the augmented reality track, where I think there are going to be some advances as well.

How do you see data scientists and machine learning engineers evolving with these changes?

The bulk of the ML space is still supervised learning. If you were to draw a pie chart or some kind of graph, the chunk that’s related to generative AI is much smaller than the supervised learning pieces, and I don’t see that changing in the short to medium term. There’s still going to be a need for what you might nowadays call traditional supervised models, and I think that space will evolve. But anyone who’s ever built any kind of supervised model knows that we’re already at a point where the model is commoditized. You build your training data, set up your pipelines, and basically take an off-the-shelf model, whether it’s a standard logistic or linear regression model or even a deep learning model, and plug it into your pipeline. Localized, specialized knowledge is still critical to that process, and that’s something I don’t see changing dramatically. The most exciting potential piece of the current LLMs is the notion of plugins, or what LangChain calls agents, where you basically have an LLM that acts like a conductor in an orchestra and farms out services. That’s where it could get really interesting: you could construct a whole ML pipeline through natural language if you had the right pieces. I’m sure people are starting to think about it from that perspective, but I haven’t looked into it in any great detail.

You co-authored a paper called “Measuring Productivity and User Engagement from Workspace and Network Interactions.” How did you approach studying this and what are some of the interesting takeaways?

That was an interesting project that grew out of some of the more well-established, well-known social and professional network graphs. Companies like LinkedIn have a professional network graph, and obviously Facebook has its social graph, so the idea was that we could build a similar network graph to model workplace interactions or collaboration. Given that a lot of software nowadays is geared towards folks who collaborate digitally, how do you measure productivity? The idea was that if you could measure productivity for a given product, then you could attach value to it. If you can show that there’s a change or an improvement in productivity as a result of that particular product, then obviously there is value to the end user. That was the motivation behind the paper, because measuring productivity, especially white-collar productivity, is quite hard to do. So the idea was to use a collaboration graph approach, drawing on network theory, to ask: can we measure productivity in terms of how people actually collaborate on projects? The company I worked for at the time, which was prior to Atlassian, had a product similar to Confluence called Box Notes, where people could produce content and share it with other people. Other folks could then edit it, comment on it, like it, and so on and so forth. This creates a network of information sharing, right? You can then create a network graph based on that, and see how information disseminates over time and how the network expands and contracts. Then you apply classical network theory metrics and methods, and look for communities and influencers in the graph. That information can be used to make strategic business decisions and provide insight into how people are actually using the product. It’s very useful for targeting from a marketing perspective.
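The collaboration-graph idea can be sketched in a few lines: treat each share, edit, or comment as an edge between two people, then apply basic network metrics. The example below is a simplified illustration with invented users and events, not the method from the paper. Degree centrality stands in for “influencers”, and connected components stand in for “communities”, where a real analysis would more likely use richer centrality measures and a proper community-detection algorithm.

```python
from collections import defaultdict


def build_graph(events):
    """Build an undirected collaboration graph from (person, person) interactions."""
    graph = defaultdict(set)
    for author, collaborator in events:
        if author != collaborator:
            graph[author].add(collaborator)
            graph[collaborator].add(author)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other people each person has collaborated with."""
    n = len(graph)
    return {person: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for person, nbrs in graph.items()}


def components(graph):
    """Connected components, used here as a crude proxy for communities."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical events: (document owner, person who edited or commented).
events = [("ana", "ben"), ("ana", "chen"), ("ben", "chen"),
          ("ana", "dee"), ("eli", "fay")]
graph = build_graph(events)
centrality = degree_centrality(graph)
groups = components(graph)
most_connected = max(centrality, key=centrality.get)
print(most_connected, groups)
```

Rebuilding the graph over successive time windows would show how it expands and contracts, which is the dissemination-over-time view described above.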

What are some tips you would give to ML leaders or even anyone getting started in the space?

I think that probably the key one is flexibility: the ability to be flexible in how you think about things. With this influx of generative AI models, how do we think about them in a positive light? It’s being able to say: okay, generative AI is empowering for these reasons, and now I need to focus more on some things I wouldn’t have before. So the one thing I would say is this notion of having a very flexible thought process and mindset, and being able to adapt, because it’s never going to go back to how it was before. I think it’s about acknowledging that, being comfortable with that, and being able to adapt and embrace this new technology. Ultimately, I think that’s one of the reasons why we’re in this space in the first place, and it’s such a privilege to be at the forefront of it. It’s a great time for machine learning and AI, because we’ve opened up a whole new dimension here. It’s going to be exciting over the next 6 to 12 months to see how it evolves.

Are you hiring right now?

There are always opportunities for the right candidates. Anyone with a robust background in different realms of machine learning is always in demand, so we’re always open to those types of candidates.