Tom Carr - Analyzing for Knowledge
Welcome to "AigoraCast", conversations with industry experts on how new technologies are transforming sensory and consumer science!
Tom Carr has used statistics to solve problems in the consumer products industries his entire career. He is very product focused. He has worked closely with his clients’ organizations to introduce new products and to improve existing ones. His work has exposed him to a wide variety of disciplines including engineering, chemistry, rheology, sensory, psychology, consumer science, and more. Tom is motivated by the challenge of figuring out how to use the information from these diverse sources to help his clients achieve their goals.
Tom has a BA in Mathematics from the University of Dayton and an MS in Statistics from Colorado State University.
Transcript (Semi-automated, forgive typos!)
John: Tom, welcome to the show.
Tom: Thanks, John. I'm really happy to be here.
John: Wonderful. Alright, Tom. So you obviously have a very deep background in the consumer products field, and you've been in statistics and sensometrics for 30-plus years, 40 years even, at this point.
Tom: Yeah, it's a little past 40.
John: Wow. That's really great. So you're, I think, an ideal candidate to be on the show. Just talk about how you've seen things change over the previous 40 years. You know, when you came into the field, maybe talk a little bit about what that was like. And then you can take us forward into the present and especially talk about what you see from your clients as far as what their needs are, how the field has changed, and the incorporation of new technologies and new quantitative methods.
Tom: Oh, absolutely, John. Thanks. Yeah, I thought a lot about exactly that topic and what has happened over the last 40 years after you invited me to come on the cast, and I go back to my early days at Best Foods when I worked in the research and development division there. And the biggest thing that jumps to mind about the impact of technology, of course, is the speed with which we can get stuff done. And it just continues to this day. People don't even know what I'm talking about some days when I talk about a 300 baud modem and an acoustic coupler, where you actually picked up the phone and dialed a remote time-sharing computer. I can remember if we had a big balanced incomplete block design consumer test that we had to analyze, we quite literally would send it off in SAS at the end of the day to be analyzed, and we would come back the next morning and it would be typing out on our computers. Well, our terminal; we didn't have a computer. And now, you know, with just the laptop, it's a matter of 10 seconds for a huge study with a thousand people in it, and you're done.
John: Even then, people are impatient. They're wondering why it's taking so long.
Tom: You know, I'm so glad you brought that up, because I think that's the good and the bad of that speed. Obviously, the ability to get this stuff done quickly is a wonderful benefit. But I'm reminded of something Howard Moskowitz said years ago. He said, you know, some of his greatest insights came late at night when he was laboring manually over data analysis in his lab, and that's when the great discoveries came to him. And I think that although fast analyses enable us to have a lot more information and run a lot more what-if scenarios, a fast analysis doesn't automatically translate to faster insights. I think it still takes a certain amount of time to digest the information that we've obtained from the analysis, to connect it with other information that we have or that we know exists. And then it takes time to integrate all of that and to come up with ideas about what's going on. I think we have to resist the temptation to feel obligated to give a complete answer just because we can get the analysis done more quickly. I think, you know, we have to fight that urge and give ourselves the time to think it through and figure out what we really think is going on.
John: Yeah, I mean, that's really a deep point you're making there, Tom, because I totally relate to what you're saying. In fact, I was on a client visit earlier this week, and I'm old enough now, I'll be 43 in December, that there are actually people who are younger than me, you know, in the field. And I was explaining to people in the room that when I took statistics, we had to actually compute sums of squares by hand, with a piece of paper and a pencil. When I was doing an ANOVA, we would write out the sums of squares, and that whole idea was, I think, very foreign to them. But in the course of doing that, I do feel like there was almost a physical involvement with the data that was informative. And then gradually it's gotten more removed. Like, okay, then you introduce a calculator, you're pushing some buttons, and then pretty soon you've got, you know, SAS or SPSS. You have any of these statistical programs that do these calculations for you. But you still have to do them for every variable. Then things get compiled up and now you have things being run in batch. Now, eventually, you have something like AutoML where you kind of push a button and lots of models get run. Right? And so it seems like, yes, on the one hand, you're getting what would seem like a wider breadth of knowledge. But then you have to ask yourself, are you really getting more knowledge? I mean, what are you getting when you use these tools? So maybe you can talk a little bit about that in your experience. I think that's a really deep point about how the tools might be leading, in some sense, to less knowledge.
Tom: Well, yeah, I think that the temptation is to immediately go to the very sophisticated analyses because they are so readily available. And, you know, there's the old trite joke that the first three rules of real estate are location, location, location. And the first three rules of data analysis are plot your data, plot your data, plot your data. Jumping to a very sophisticated analysis when there are obvious outliers present in the data, you know, doesn't necessarily get you to the right answer. But because it's so convenient to have access to all these very sophisticated tools, I think sometimes we jump the gun and start to apply them before we've kind of, like you said, gotten dirty with the data, gotten physically familiar with what's going on. And I mean, we all know that bad data creeps in from time to time in our studies. I always used to tell people to look for typographical errors in their data set. Well, with the automated data acquisition we have now, that's less and less prevalent. But, you know, things still happen. And so I think we've got to be careful not to distance ourselves too much from the raw data before we start applying the sophisticated tools.
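Editor's note: Tom's advice to look at the raw data before modeling can be sketched in a few lines. The example below is only an illustration, not a method described in the conversation: it screens a column of scores with Tukey's 1.5 × IQR fences, and the function name `flag_outliers` and the liking scores are hypothetical. A flagged value is a prompt to go back and inspect the record, not an automatic deletion.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles() with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical 9-point liking scores; 77 stands in for a keying error.
liking = [6, 7, 7, 8, 6, 7, 5, 8, 7, 77]
print(flag_outliers(liking))
```

A plot (a histogram or dot plot of the same column) would show the same stray value at a glance; the fences are just a quick numeric companion to it.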
John: Yeah, that's really interesting, Tom. It reminds me of something that, you know, we were talking about a little bit before this call: experimental design and its importance. You know, there's this temptation now that if you just get a lot of data and you have a lot of computing power, you can somehow get some insight. It's like the observational approach to science. Right? I mean, just a purely empirical approach where there's not really been any model. There's been no theory, there's no design. Can you talk a little bit about what you think we're missing when we just go with the pure big data approach?
Tom: Yeah. What you said reminded me of a conversation I had years ago with a pretty well-known statistician, George Box. He was talking exactly about what you just brought up, the comparison of observational science versus experimental science. And he said that a scientific discovery is, I'm hoping I can remember this well, it was very eloquent at the time, the intersection of an interesting event with an interested observer. And so in observational science, there's all sorts of interesting events that take place all the time. But there isn't always somebody watching, or somebody realizing that what they've seen is quite interesting. Whereas when experimental science kicked in, you might think of Millikan's oil drop experiment or whatever you want to pick, the growth of scientific knowledge expanded exponentially, and it continues to. And I think, if I can borrow from George's story, in experimental science the experimenter is creating a lot of interesting events. And just by the fact that he's doing or she's doing that, they're automatically an interested observer. They're creating changes and they're observing what happens. So I think it's only natural that you're creating a lot more of those intersections of interesting events with interested observers. And I think, I'm going to sound like an old codger, but that's one of the concerns I have with a lot of the machine learning and the big data and the web research or...
John: Web scraping, looking at reviews.
Tom: Social-media-based and other web scraping research. We don't necessarily have our eyes open for interesting things. And just having a lot of data doesn't necessarily mean that the interesting stuff is going to emerge from it. So I'm still a real strong advocate for experimental design. I think the question has to come first. So what do we want to learn? What are we trying to figure out? And you can't have an experimental design until you have some hypothesis, so that is a necessary first step, anytime. What are we trying to answer? And then the thought flows, you know: well, what are the things that might influence that answer? Those become our experimental variables. So I think that the whole experimental design paradigm, design of experiments in statistics, is one approach that really forces that thoughtfulness at the beginning of a research project, and that really aids in its being ultimately successful. And it also highlights the barriers, the secondary considerations, the logistical concerns that are involved in any sort of scientific undertaking, you know. Can we measure what we're trying to measure? In one of your earlier podcasts, with Chris, it came up, you know: is overall liking really the right endpoint for us to be paying attention to? Or do we want interest or motivation or involvement to be the endpoint that we should be focusing on? I think all of that has to factor into our planning. And just running out and collecting a bunch of data and throwing it into a machine learning algorithm doesn't necessarily yield the kinds of information people need. I've never had anybody answer the question, you know, is a machine learning model going to tell a product developer whether two or two and a half percent salt is the better concentration? I have a hard time imagining that is actually going to emerge.
John: I mean, that one I think is a deep question. Okay, so there's kind of a few different topics here. I mean, one thing I would say is that there's a weakness in a lot of that kind of publicly available data that's being fed into this, especially for software as a service. You see a lot of this where you've got people scraping data from publicly available sources and then looking for patterns, looking for trends, you know, that kind of thing. Those data are not coming from any kind of scientific process, really. They're not being scientifically collected, typically. I mean, some companies are better than others, but they're not being generated according to some sort of thoughtful design. I mean, it's really just like river sampling where, you know, you've got things coming along and whatever you get is what you get. Right?
Tom: That's right.
John: I think that's one weakness. Now, I do think that if you have a lot of data inside a company where you have years of information, where you've got percentages of ingredients, proprietary data inside a company like Kraft or something like that, where maybe they have years of data, I do think if you got those data organized in a reasonable way, you might be able to mine that for at least some interesting ideas about percentage of salt. But I think that's not what people are doing right now. Right now, it's just easy to get data, whereas it would be a lot of work to organize all of the Kraft data, for example, into a form where you can ask that question.
Tom: We've actually had several projects over the past, I'd say, decade. Carr Consulting is in its third decade now, and probably even less than that, maybe over the last seven years, we're getting more and more clients coming back and saying exactly that: we've got decades of similar studies still available to us. You know, what can we get out of that? And you're right. The lion's share of the effort in those kinds of projects is getting the data into a compatible format where you can access all of it. It's a minor aside, but I think a lot of people don't realize that probably the vast majority of the statistician's time is spent doing data management. It's getting that information that we acquire from people into a format that we can begin to use. And I think that's another aspect of, you know, getting dirty with the data, where you discover things in the data you never expected to be there. I mean, we have scale reversals. People think they've run the studies the same way all the time. We have different numbers of categories: in some studies the hedonic is on a seven-point scale, sometimes it's on a nine. So, of course, that's going to wreak havoc with any summary measures if you don't catch that beforehand.
John: I mean, I have time span. You're right. Like, right now, it's the same with my clients as well in terms of trying to do something with historical data. That, of course, is a very hot topic right now because people are sitting on tons of data and they feel like there must be something to do with it. And I have come to believe that graph databases are actually really the answer for the right way to organize the data set or the database. I am working on that with a few clients now, and I'm going to be proposing some talks in sensometrics on this topic, actually, with some use cases as well. So I do think it's the right path. But you're right. Sometimes with that historical data, it's like a crime scene, trying to figure out what happened.
Tom: So, yeah. And in some of the projects, actually, with one of our clients, we would filter out all of the studies that were based on experimental designs, because they wanted sort of, if you will, a best performance measure. You know, what were good prototypes, and how did they perform with consumers? In experimental designs, in those studies, you intentionally have products that, you know, are suboptimal. You know, they're at the extremes of the design space where everybody knows, you know, that isn't where we want to be. But we need to push those limits. We would filter those out and not include them in the database because they would skew the information.
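Editor's note: the two pitfalls Tom mentions, reversed scales and mixed seven- and nine-point hedonics, can be made concrete with a small sketch. It assumes a simple linear rescaling onto a nine-point range, which is only one of several possible harmonization choices and not something prescribed in the conversation; the study records and the names `rescale`, `unreverse`, and `harmonize` are hypothetical.

```python
def rescale(score, old_points, new_points=9):
    """Linearly map a score on a 1..old_points scale onto 1..new_points."""
    if not 1 <= score <= old_points:
        raise ValueError(f"score {score} is outside 1..{old_points}")
    return 1 + (score - 1) * (new_points - 1) / (old_points - 1)

def unreverse(score, points):
    """Flip a score recorded on a reversed scale (where 1 meant 'like extremely')."""
    return points + 1 - score

def harmonize(study):
    """Return a study's scores on a common 9-point, higher-is-better scale."""
    scores = study["scores"]
    if study["reversed"]:
        scores = [unreverse(s, study["points"]) for s in scores]
    return [rescale(s, study["points"]) for s in scores]

# Hypothetical historical studies with explicit scale metadata.
studies = [
    {"id": "A", "points": 9, "reversed": False, "scores": [7, 8, 6]},
    {"id": "B", "points": 7, "reversed": True, "scores": [2, 1, 3]},
]
for study in studies:
    print(study["id"], harmonize(study))
```

The code is the easy part. The hard part, as Tom says, is discovering which studies were reversed or ran on a different scale in the first place, which is why the scale metadata is recorded explicitly per study in this sketch.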
John: Oh, interesting. Your norms, like the historical norms on what liking scores should be or whatever, would be dragged down. Right. That's interesting. Alright. So we talked a little bit about, you know, people doing something with their data. And I do want to come back to the idea that somehow things can become too easy. I think it is true that if you've got tools where there's essentially no friction, you push a button and get results, then there's no depth to someone's thinking anymore. Everything's on the surface at that point. So I'd like to come back to that, but before we do, I'd like to hear, you know, what else you are hearing from your clients. What are the other concerns they have right now, the other problems that they have? Like, what else is on your clients' minds these days?
Tom: Yes, it's not exactly new, but it's more and more prevalent. Sensory has gone way beyond being just a collection of methods. It's at a point now where we're really trying to answer questions that can't be or haven't been answered by other technologies up to this point. And the holy grail, as I see it, is predicting in-market performance. Can we do something with internal measures, preferably internal late-stage product development measures, that would be predictive of how likely it is that the product will be successful moving forward in the market? And I think everybody would agree, you know, we don't think that overall liking is going to do it for us. I mean, that's not to say that overall liking doesn't have value in many contexts. But, you know, John, it's the "What? So what? Now what?" paradigm. We have a lot of tools; those are the "whats". The "so what" is, so what can we do with them? And then the "now what" is, how do we move forward? Chris Simon was talking about involvement and engagement, and that maybe that would be more predictive of success in the market than overall liking, and about the issues of context. I'm getting more questions from clients about whether or not it's necessary to be testing in context to get reliable information in the end. I think the answer is sometimes yes, sometimes no. Chris even mentioned that some products, or I should say responses to some products, are probably more influenced by context than others. So learning the details around that is interesting from a methodological point of view. But the clients aren't asking us for methods. They're asking us for answers. And it's that prediction. I think that's one of the great benefits of these new and emerging technologies: they might offer up approaches to solving problems that we haven't had a good approach for in the past.
And I think the stage that we're at right now with these new technologies is that we need to integrate them with what we've been using in the past. Sometimes the new things will replace what's been used in the past because they do a better job of it. Other times they're going to complement approaches that have been used for years because those older approaches still deliver a great deal of value. I don't think it's ever been the case that you just completely abandon approaches that have been used in the past. It's not an us-versus-them situation. It's how do you get the most value out of everything we use, and not feel like you have to close one door just to open another.
John: I mean, for example, it's hard to see difference testing going away completely just because, you know, sometimes the question is, are we confident that people aren't going to be able to tell the difference when we make this ingredient substitution? That question, I think, is going to be around forever. And I think we have a good method for answering it. So I suppose eventually you might get your predictive models up to such a high degree that you could actually predict responses. But I don't see that coming anytime in the immediate future. Well, I see kind of two ways these new tools are really coming in: one on the data collection side and the other on the data analytic side. So I'm wondering, are you seeing new types of data come in, you know, at Carr Consulting? Are you seeing more neurobiological data or reaction time? Are you seeing more diverse data now than you used to see?
Tom: We are, more metadata. Like you said, the response time to answer questions is becoming a pretty regular thing for us to look at now. We are not getting a lot of neurophysiological data from the folks that we're working with. In many circumstances, we're trying to integrate, I won't call it methods, it's ancillary information. So what were the conditions of the test? Sometimes environmental conditions, sometimes things like time of day, temperature, environmental factors, and seeing if those are having any sort of, from an old statistician's point of view, you know, blocking effects or covariate effects. There are these uncontrolled, or unintended but unavoidable, differences that are taking place during the test situation. They might be having an influence, but we don't know. So there is more attention to that.
John: So there are richer models. You are developing richer models than you might have once upon a time. I think more different types of data.
Tom: Yeah. But I think you touched on a very important point that, you know, Mark Dancer, a very, very talented statistician and researcher from France, brought up at a sensometrics meeting years ago. We really live in different worlds physiologically. He said that there are about 450 or so proteins, or receptor sites, on the olfactory bulb, or there can be; each person has about 200 to 300 of these 400-plus possible receptors. So you and I actually live in a different olfactory world because you probably have a different set of receptors than I do. Now, for better or for worse, there's apparently enough overlap that, you know, we can both smell a rose and say, yes, that's a rose. Now, what you're smelling and what I'm smelling physiologically are probably different. But, you know, like I say, this overlap is sufficient for us to agree. But, you know, that is a daunting challenge. That's why I'm hesitating to talk. You know, how are we going to deal with that, and are we ever going to deal with that unless we somehow can customize food on a customer-by-customer, well, even, you know, an individual-by-individual basis to really optimize? And I think heretofore, you know, we have really been relying on that overlap to allow us to make a product that has relatively broad appeal. And not even knowing why.
John: Yeah. And I think that's right. I mean, the big lesson I got from Pangborn this year was the idea that it probably is the case that when two people taste a food, the actual experience is different. It's not simply that they have different preferences; when people give different liking scores, it could well be that their experience of the food is different. And, you know, Chris Finley has some interesting ideas about doing genotyping or something, you know, getting really detailed information about people, putting that into the model, and then starting to make predictions that take into account those person-by-person differences. Now, of course, that's really, like you said, daunting, but it seems like that may ultimately be what's required. If you're trying to get that level of predictive power, to really nail down individual differences, you might have to know a lot about people.
Tom: To nail down the individual differences, yes. You know, I'm a product guy, and I'm a consumer products guy, you know, the mass market. So I think it's going to be a hugely interesting and enlightening area of research. I think it'll be quite a long time before the consumer products industries are going to be able to carry that. I think you got to be looking for a massive.
John: You go up to a Coke Freestyle machine and it says, spit in the cup, we'll make you the perfect drink.
Tom: That's right. It'll be you know, maybe my great grandchildren.
John: Alright. So we have a couple of open loops here to wrap up. One is the idea that these tools for analytics make life so easy that people are actually ending up with less knowledge than they might have had if they had used a simpler tool that required more involvement or more engagement. So do you have any thoughts on that? And how would you recommend that a sensory analyst proceed these days with this wealth of tools available? How should we really use those tools in order to make sure that we increase our knowledge instead of just going faster?
Tom: Well, I think that preliminary checks on the quality of the data are essential. You know, it's the old garbage in, garbage out phenomenon. You can't analyze information into data; no matter how good your data analytics tools are, they can't create information in a data set that doesn't exist to begin with. So ensuring that you've eliminated anything that's going to dilute the information that's in the data that you have, that's the essential preliminary step. And I think it's also important to always remember that our job is not to analyze data. Our job is to answer questions. And we have to put the question first and always go through the "What? So what? Now what?" paradigm to say, okay, here's what we've seen in these data analysis results. Now the work begins. And we have to give ourselves the time, no matter how fast our data analytic tools are, to interpret what the results mean relative to the question that we're trying to answer. You know, the work only begins when the printer quits printing. Then you have to think about what it all means and what you are going to do about it. We have to give ourselves permission to take the time to think that through.
John: Right. I totally agree with that. And that kind of leads us back to the last open loop here, something you and I have been talking about, which is the idea that there are these new tools, and it isn't, like you said, a new paradigm that's opposed to the old paradigm, where there's data scientists versus statisticians, for example. There's a set of tools that are available, and we need to find the right tool for the job, but keep the science, you know, as the guiding principle. So what thoughts do you have on that? Just to kind of wrap up, how do we move forward into this new world where suddenly we have many more tools available to us?
Tom: Well, I think it might seem overly simplistic, but it's one that I said before. You have to start with the question. And you have to start with a goal-oriented perspective on what you're trying to find out and then bring to bear the best collection of tools to answer it. And I can take it back to, we mentioned discrimination testing a moment ago. My dear colleague and mentor, Morten Meilgaard, was a staunch advocate for biasing the respondent pool in discrimination tests. He worked for Stroh Brewery at the time, and he would say that if there was a concern about a particular fage being present in the brewery, he would intentionally recruit respondents who had a track record of being highly sensitive to the flavor impact that fage might present. And, you know, it was a very de minimis, risk-averse approach. He wanted to identify a problem if any consumer would be able to notice it. And a lot of people would say, well, no, no, no. You know, we should recruit consumers, or naive consumers, to participate in our discrimination testing. Well, they don't realize that by doing that, they've actually changed the question that they're answering. You know, it's different to say, can anybody notice a difference, versus saying, can the average consumer notice a difference? And neither question is wrong. But I think we should be aware of how we've actually changed the character of the information that we're going to obtain based on how we choose to run the study. So really thinking through what our question is, and what risks we're willing to take on and which ones we want to avoid, I think is essential to good experimental planning.
John: Right. Yeah. So understand our questions, design our study relevantly, and then use appropriate tools. You know, at every decision point, make our choices consciously based on our goals instead of just being carried away with new technology.
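Editor's note: to make the discrimination-testing example concrete, here is a minimal sketch of the exact binomial calculation commonly used for such tests. It assumes a triangle test (chance of a correct pick is 1/3), which is our assumption since the conversation does not name a specific protocol; the function name `binomial_tail` and the panel counts are hypothetical. The arithmetic is identical whether the panel is made up of highly sensitive assessors or naive consumers. What changes, as Tom points out, is the question the result answers.

```python
from math import comb

def binomial_tail(correct, n, p_chance=1/3):
    """P(X >= correct) when each of n assessors guesses correctly with probability p_chance."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(correct, n + 1)
    )

# Hypothetical panel: 30 assessors, 16 of whom picked the odd sample.
p_value = binomial_tail(16, 30)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value from a screened, sensitive panel supports "somebody can notice the difference"; the same p-value from a naive consumer panel supports "typical consumers can notice it". Neither reading is wrong, but they are not interchangeable.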
Tom: Right. You know, apply the best tool to the job. I don't use as many handsaws as I did when I was a kid. Not to say that a handsaw is a bad thing, but boy, oh boy, it's a lot easier to use an electric one.
John: Right. Okay. Well, great. That's really insightful, Tom. Thank you so much for being on the show. Just to kind of wrap up here: how can people find you? I mean, if someone wanted to follow up or wanted to work with you, how should they reach out to connect with you?
Tom: Oh, thanks. Yeah. Well, the best way to get ahold of me is through my email: firstname.lastname@example.org. And then I'm on LinkedIn also. Not a big social media person, but I have a lot of contacts on LinkedIn. So that's another good route to get in touch with.
John: And I'm sure you'll be at a variety of conferences as well next year, if someone wants to reach out to you.
Tom: I'm going to be still active throughout the coming year and moving into 2021 up in Vancouver. I'm looking forward to that meeting.
John: Wonderful. Okay, we'll put all the links in the notes for the show. So thanks again, Tom. It's great having you on. And I'll see you before too long, I'm sure.
Tom: Sure thing, John. Thanks for having me.
John: Okay. That's it. Hope you enjoyed this conversation. If you did, please help us grow our audience by telling a friend about AigoraCast and leaving us a positive review on iTunes. Thanks.