
MAY 21, 2024

AI Can Never Do It Alone

Artificial Intelligence Versus Actual Intelligence (Part 2 of 2)


Originally published by our sister publication Anesthesiology News

By Bruce Ramshaw, MD

One of the most famous and successful artificial intelligence (or, as this series argues, actual intelligence) applications was demonstrated over three days of competition on “Jeopardy,” the long-running game show. Ken Jennings and Brad Rutter, the two most successful champions in the game’s history, were pitted against IBM’s Watson computer. Knowing how Watson was programmed to beat the best of the best “Jeopardy” players can help us understand how to apply data science tools appropriately in healthcare.

Watson’s somewhat surprising victory didn’t depend on simply filling the computer with searchable facts (à la Google, Wikipedia or all the encyclopedias in the world). A key insight was the human programming team’s recognition that they needed to provide context for the information programmed into the computer, so that Watson could identify the appropriate patterns. Instead of asking the “Jeopardy” champions to recommend what knowledge should be put into the computer, the team realized that the real context would come from the game show’s question writers. They programmed into Watson every prior question and answer ever written for “Jeopardy,” going back to the first episode on March 30, 1964.

IBM Watson’s programming team also needed to teach the computer to understand the English language as it is used in playing “Jeopardy.” Through years of trial and error, the team developed an approach called DeepQA, an example of ensemble learning: networking together many distinct algorithms, each developed and tuned in its own local environment. They combined more than 100 different natural language processing programs so that the computer could understand and answer the questions correctly most of the time, and usually faster than the human competitors.
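To make the ensemble idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration, the toy scorers, the hand-set weights and the example candidates; DeepQA’s real components were far more sophisticated, and its weights were learned from years of past “Jeopardy” question-and-answer pairs rather than set by hand.

```python
# Minimal sketch of weighted ensemble scoring, the general idea behind
# combining many independent algorithms. All scorers, weights and data
# below are toy stand-ins, not IBM's actual components.

from typing import Callable

Scorer = Callable[[str], float]  # maps a candidate answer to a confidence


def make_evidence_scorer(evidence: str) -> Scorer:
    """Toy scorer: confidence = word overlap between a candidate answer
    and one piece of retrieved evidence text."""
    evidence_words = set(evidence.lower().split())

    def score(candidate: str) -> float:
        words = set(candidate.lower().split())
        return len(words & evidence_words) / max(len(words), 1)

    return score


# Each scorer consults a different "source"; in a real system the weights
# would be learned from historical performance, not fixed by hand.
scorers: list[tuple[Scorer, float]] = [
    (make_evidence_scorer("watson is an ibm question answering computer"), 0.6),
    (make_evidence_scorer("deep blue was an ibm chess computer"), 0.4),
]


def best_candidate(candidates: list[str]) -> str:
    """Pick the candidate with the highest weighted combined confidence."""
    return max(candidates, key=lambda c: sum(w * s(c) for s, w in scorers))


print(best_candidate(["what is watson", "what is deep blue"]))
```

The design point is that no single scorer needs to be very good on its own; the weighted combination, tuned against past results, can outperform any individual component.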

In a multiyear, iterative process with many testing phases, Watson’s ability to predict the correct answer improved over time. There were failures along the way, and the outcome of the competition was not guaranteed, but the result was impressive: Watson scored $77,147, compared with Jennings’ $24,000 and Rutter’s $21,600.


Since the early days of computing, there has been a primal, lower-brain fear that computers could replace or even endanger humans. Many famous and intelligent people, including Elon Musk, have popularized these fears. In part, they stem from the Turing test: the idea that one day we will not be able to tell whether we are conversing with a human or a computer. But the Turing test is flawed, and these fears are misguided, because we will never have computers that can think like humans. We humans have the potential to be creative and use critical thinking to discover new knowledge and innovations. Computers will always depend on humans to program what goes into them and to interpret what comes out.

Before Watson, IBM had another supercomputer, Deep Blue, which was programmed to play chess. In a famous 1997 rematch, Deep Blue defeated world champion Garry Kasparov, and computers have consistently beaten the world’s greatest chess players ever since. But with the appropriate application of data science, there is something that now regularly defeats the fastest supercomputers at chess: centaur teams. This human–computer symbiosis combines the human players’ intuition, creativity and empathy with the supercomputer’s massive computing capability. These different strengths are complementary.

Our hernia team learned these data science principles over the past decade by applying a human–computer symbiosis to real patient care. It wasn’t an easy process; we had no road map or textbook. We were trying to take the principles of a scientific paradigm used in other fields, such as finance and baseball, and apply them to healthcare.

One issue was that data science in other industries was used to improve the organization’s revenue and profit or win against competitors. It was not being used to enhance the value for customers as we were attempting to do. So, we had a lot of trial and error.

At first, we measured far too many data points. Because I’m such a hernia nerd, I wanted to see everything we could think of. We spent too much time and too many resources capturing data, and not nearly enough figuring out how to measure outcomes in terms of value. Once we realized we were tracking too many process measures, we went from collecting more than 600 process data points to only a few dozen, focusing on the ones we thought mattered most.

We also needed various analytics and data visualization tools to gain insight through feedback loops and improve value. After a few years, the hospital noticed it was no longer losing money on our complex hernia patients. It even began to make a modest positive net margin on each one.

Our most unexpected and important discovery came a few years after we started. It took another few years to fully understand the impact of that finding and to learn what improvements we could implement in response.

One day, at a hernia team clinical quality improvement (CQI) meeting, we were reviewing the data for patients with complications and other less-than-ideal outcomes. We examined our operative techniques and typical patient factors such as body mass index and smoking, but nothing seemed to explain the pattern of bad outcomes. Then Remi, our patient care manager, spoke up and noted that the patients with less-than-optimal results seemed to be the same patients who had been more challenging before surgery.

Remi described patterns in these patients: Some were angry; some had unrealistic expectations or were looking for a “quick fix”; some had high anxiety, depression or a controlling personality. We didn’t yet know how to measure this, but Remi convinced us there was a pattern, and we needed some sort of measurement tool. Lacking much expertise in this area at the time, we settled on a subjective measure we called “emotional complexity” and sorted patients into one of three categories: high, medium or low.

Over the next six to nine months, we recorded emotional complexity along with a few dozen other data points. A subsequent factor analysis showed that emotional complexity was the strongest modifiable predictor of our patients’ outcomes. The only elements with a higher correlation to outcomes were hernia size and the number of prior hernia recurrences, neither of which could be modified.
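As a rough illustration of this kind of ranking analysis, here is a minimal sketch in Python using pandas. The dataset, column names and the 0/1/2 coding of emotional complexity are all invented; the article does not describe the team’s actual data or statistical tooling.

```python
# Toy illustration of ranking candidate factors by the strength of their
# association with a binary outcome. All data below are invented; the
# hernia team's real dataset and analysis are not shown here.

import pandas as pd

# Each row is one hypothetical patient; 'complication' is the outcome.
df = pd.DataFrame({
    "hernia_size_cm":       [3, 14, 5, 15, 4, 12, 10, 6],
    "prior_recurrences":    [0, 3, 0, 2, 0, 2, 2, 0],
    "emotional_complexity": [0, 2, 1, 2, 0, 2, 1, 0],  # low/medium/high -> 0/1/2
    "bmi":                  [27, 31, 29, 35, 26, 33, 30, 24],
    "complication":         [0, 1, 0, 1, 0, 1, 1, 0],  # binary outcome
})

# Absolute Pearson correlation of each factor with the outcome,
# sorted from strongest to weakest association.
ranking = (
    df.drop(columns="complication")
      .corrwith(df["complication"])
      .abs()
      .sort_values(ascending=False)
)
print(ranking)
```

In this toy data, prior recurrences and hernia size happen to rank above emotional complexity, echoing the pattern described above. Deciding which of the top-ranked factors are modifiable, as the team did in setting aside hernia size and prior recurrences, is a human judgment layered on top of the numbers, which is exactly where the symbiosis comes in.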

Because this was such an important factor, we invited a small group of social scientists to our next CQI meeting to help develop a more robust measurement tool. The result was a 12-question instrument designed to identify the specific issues that might affect outcomes. Two of the questions asked about anger toward a prior surgeon and anger toward a mesh company. We found that unresolved anger could harm a patient’s outcome if it wasn’t healed and addressed before surgery.

This discovery didn’t come from any preconceived belief in the importance of a patient’s emotional state. Remember, I’m a surgeon. At that time, I believed a patient’s outcome was almost entirely due to my excellent surgical skills. These findings came from continuous improvement and data science principles, applied with various analytical tools as part of our human–computer symbiosis. At first, I was shocked. How could a patient’s emotional state be more important than my surgical skills?

As our team learned more about the neurocognitive and emotional state that can develop after traumatic experiences (violence, sexual abuse, financial stress, psychological abuse, multiple previous surgeries and so on), we discovered that a neurophysiologic change in the brain can lead to a chronic stress state. This chronic stress can negatively affect the hormonal and immune systems and increase inflammation in the body.

Fortunately, we also learned that various cognitive therapies can rewire the brain to help resolve this chronic stress state. Combining our data analysis with a creative search for solutions in other medical disciplines, we implemented prehabilitation with preoperative cognitive behavioral therapy for patient subpopulations with chronic pain and other signs of prior trauma. Since implementing prehabilitation, we’ve seen a positive impact on a complex group of patients suffering from chronic pain after hernia repair, and we have published our findings.


Read part 1 from our March issue!


Over the past decade, as we learned to apply these principles, we made many mistakes and overcame many challenges. But as we became more skilled with data science, we saw the potential for our healthcare system to be transformed. Computers alone, AI by itself, could never have discovered this new knowledge for our hernia program. Through human–computer symbiosis, actual intelligence, measured outcomes can be improved for patients suffering from any health problem. The challenge ahead is implementing a data and analytics infrastructure in each local clinical environment so the potential of human–computer symbiosis can be realized. We’ve done it in chess. We’ve done it on “Jeopardy” and in baseball. Why not in healthcare?


Ramshaw is a general surgeon and data scientist in Knoxville, Tenn., and co-founder and CEO at CQInsights.
