How Hertz Helped an AI Company Take Shape
Diffeo, a startup company recently acquired by Salesforce, has deep roots in the Hertz community.
The seeds of an idea can sprout in many ways. A motivated person working alone can follow a passion toward invention. In a classroom or research lab, a team might purposefully set out on a quest to discover something new. But at other times, an idea is born of chance meetings and coincidences.
Hertz Fellow John Frank says it was this kind of organic collaboration that sprouted Diffeo, a startup company with deep roots in the Hertz community. Diffeo’s story illustrates how much entrepreneurship depends on connecting with other people.
“There’s a lot of value in ‘showing up.’ There’s no substitute for going on-site to meet people. It opens you up for serendipity,” says Frank. “Collective IQ can create totally new opportunities.”
Diffeo’s origin also shows how organizations like the Hertz Foundation provide fertile soil in which such emergent ideas can sprout and take hold.
“Diffeo came directly out of the Hertz Foundation,” says Hertz Fellow Dan Roberts, another co-founder of the company. “And so did my PhD, and so did many other positive things in my life. You can trace their paths back to interactions with the foundation and other fellows and friends of fellows.”
In 2010, physicists John Frank and Dan Roberts sat down next to each other on a bus. Frank was a 1998 Hertz Fellow and a graduate student at the Massachusetts Institute of Technology. Roberts was a 2009 Hertz Fellow and was midway through two years as a Marshall Scholar, studying theoretical physics in the UK. That July, they were both on their way to a Hertz Foundation Summer Workshop in Virginia.
They quickly found common interests in technology and entrepreneurship. Frank had just sold his first startup, MetaCarta, which had developed text mining software to link documents to geographic maps. Like many Hertz Fellows, Roberts was excited about startups as a platform for innovation that often moves faster than academia and large laboratories. The two naturally started talking about what might come next on the frontiers of artificial intelligence.
That encounter between Frank and Roberts could have ended with that conversation. Fortunately, a year later, the pair crossed paths once again, at the next annual Hertz Summer Workshop. This time, Frank, Roberts and a new Hertz Fellow — Max Kleiman-Weiner, who studied computational cognitive science at MIT — were grouped together to brainstorm a new technology. All three were interested in artificial intelligence, and they began to wonder how AI might predict which information would be added to a human-powered knowledge base like Wikipedia. If successful, the AI would be capable of automatically distilling the deluge of data being produced on the web into a format accessible to people — effectively proposing high-quality edits to Wikipedia.
“Assigning people to brainstorm groups was really just meant as an icebreaker. It was a fun way for fellows to start interacting even before the retreat,” recalls Roberts.
“Each brainstorm group was supposed to make a draft of a future TED talk. Making that talk took on a life of its own,” said Kleiman-Weiner.
Frank, Roberts and Kleiman-Weiner wrote code to digest Wikipedia edit history and determine how long it takes for a news article to be cited in a Wikipedia article. They found that the average time lag between a news article’s initial publication and its eventual use as a citation in Wikipedia is about one year. However, some articles get cited very quickly (in minutes or hours) and others lie dormant for years, waiting for a human editor to take interest. When an editor decides to update an article, they often search retrospectively to find missing citations. As a result, gaps in the knowledge base get filled haphazardly after the fact.
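The kind of measurement described above can be sketched in a few lines. This is only an illustration, not the trio's actual code: the function name `citation_lags_days` and the sample dates are invented, and it assumes each record pairs a news article's publication time with the time it was first cited.

```python
from datetime import datetime
from statistics import mean, median

def citation_lags_days(records):
    """Return the lag in days for each (published, cited) pair.

    Pairs where the citation predates publication are skipped,
    since they most likely reflect bad timestamps.
    """
    return [
        (cited - published).total_seconds() / 86400
        for published, cited in records
        if cited >= published
    ]

# Hypothetical sample: one article cited within minutes,
# one after a year, one after three years.
sample = [
    (datetime(2011, 1, 1, 12, 0), datetime(2011, 1, 1, 12, 30)),
    (datetime(2010, 6, 1), datetime(2011, 6, 1)),
    (datetime(2008, 3, 1), datetime(2011, 3, 1)),
]

lags = citation_lags_days(sample)
print(f"mean lag: {mean(lags):.1f} days, median lag: {median(lags):.1f} days")
```

With lags this skewed, a few long-dormant articles can pull the mean well above the median, so it may be worth reporting both.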
Growing an Idea
The trio of researchers kept chatting after the 2011 workshop. Their quick project on Wikipedia had sparked an idea: Could they devise an AI system to accelerate the integration of new information into Wikipedia?
The challenge, they found, was that for certain types of information, humans were much better than current algorithms at determining the relevance of an article or webpage. Frank, for instance, would get Google Alerts for content that mentioned his name, “John R. Frank,” but wasn’t about him. While he would immediately recognize which references were about him, Google did not.
Beyond that initial task of disambiguating references to a named entity, a simple email alert or search result might be on topic but have no new information. Traditional keyword searches allow the user to express what they want to find. However, what if a user wants to learn about something for which they don’t yet know the keywords?
“In studying these trade-offs, it’s clear that humans have essential roles to play in judging and ascribing value to pieces of knowledge,” explains Roberts. “The question is, how should AI help?”
To dig into this question, the trio proposed a new competition to the US government’s Text Retrieval Conference, or TREC, which is hosted each year by the National Institute of Standards and Technology.
“Ever since I had first learned about TREC, I had wanted to organize a TREC track, and this kind of hard problem in human-machine collaboration was a perfect opportunity,” explains Frank. They named the competition “TREC Knowledge Base Acceleration” and it ran for three years, from 2012 to 2014.
“That’s how Diffeo got off the ground. We started the company in order to gather a corpus of billions of news articles and blog posts with time stamps of when they were published,” says Kleiman-Weiner. The dataset is still the largest publicly released corpus of time-stamped natural language documents for evaluating algorithms (see https://trec.nist.gov/data/kba.html). By launching TREC KBA, the Diffeo team established a test bed for figuring out how machines can accelerate the work of the limited number of editors who try to keep massive knowledge bases up to date.
In the first year of TREC KBA, “we basically showed that all the algorithms were worse than random. They actually wasted the users’ time because they recommended so many irrelevant links. It’s like making a Google Alert for too broad of a concept, and then weeding through every hit,” explains Frank. However, by developing quantitative measures for comparing the different approaches, the evaluation program illuminated several paths forward.
As a result of the public TREC KBA competition and corpus, a burgeoning community of researchers emerged. Many computer scientists had been interested in automatic knowledge base population, so the opportunity to test those ideas in a human-in-the-loop context was attractive. One critical improvement came from realizing that human readers have a deep understanding of the relationships between people, organizations, places and ideas — or in the case of Wikipedia, the links between the articles in a knowledge graph. Researchers found that a user can quickly digest new content if it’s presented as a suggested connection between people. When the AI presents snippets of text that explain the nature of the relationship, people’s comprehension increases even more.
A second key innovation occurred through a fortunate coding bug. “I built a simple baseline algorithm as a test case,” explains Roberts. “It scored well but mostly didn’t match the target article. Something seemed broken. Then I realized that there was a sign error. It had inverted the ranking of a large batch of top-ranked results.” This pulled hits out of the long tail and up to the top. The team realized that boosting missing information would help users, and Diffeo’s entity-centric novelty ranking was born.
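A toy example shows how a single flipped sign can invert a ranking. This is not Diffeo's algorithm or Roberts's baseline: the `rank_by_score` helper, the document names and the similarity scores are all made up for illustration, and the sketch assumes each candidate is scored by how closely it matches the target article.

```python
def rank_by_score(candidates, sign=1):
    """Order candidate names by sign * score, highest first."""
    ordered = sorted(candidates, key=lambda c: sign * c[1], reverse=True)
    return [name for name, _score in ordered]

# Hypothetical candidates: (name, similarity to the target article).
candidates = [("doc_a", 0.9), ("doc_b", 0.6), ("doc_c", 0.2)]

intended = rank_by_score(candidates, sign=1)   # most similar first
flipped = rank_by_score(candidates, sign=-1)   # sign error: least similar first

print(intended)
print(flipped)
```

In this sketch the flipped ordering surfaces the candidate that looks least like what is already known, which is roughly the effect the team decided to keep.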
This became a core aspect of the collaborative intelligence product that Diffeo then took to market. By 2015, this approach was scoring well on the evaluation methods developed by KBA. It surprised the user with a useful new relationship in roughly one out of every three recommendations. By recommending related people and companies that are not yet in your knowledge base, the AI accelerates your work.
The Flexibility to Follow Ideas
Funded by Hertz Fellowships, Roberts, Frank and Kleiman-Weiner each had the flexibility to follow their interests, rather than being tied to a strict timeline for their graduate degrees. Ultimately, the results of TREC KBA helped them develop a prototype for a product and launch their startup, Diffeo.
To lead the company, Frank put his PhD work on hold — something he had already done once when he founded MetaCarta and was comfortable doing again. While he was passionate about his graduate work, he saw these business ventures as once-in-a-lifetime opportunities. Thanks to Hertz, he says, he didn’t feel pressured to stay in school full time.
Roberts and Kleiman-Weiner found ways to balance their graduate work and Diffeo — and they also credit Hertz for allowing that to happen.
“The opportunity that we had to pursue cutting-edge ideas in both academia and startups was certainly only possible because of the support of Hertz,” says Roberts.
“Hertz encouraged us to push the limits, so we did,” says Frank. Diffeo stayed in “science mode” longer than most startups can tolerate, and that paid off. While Diffeo’s recommender engine uses some of the same algorithmic ideas that social media platforms use to drive clicks, Diffeo stayed focused on products that will accelerate human knowledge. As a collaborative research tool, it uses the click feedback loop to help users understand their world better.
By 2015, the team had a working product, and over the following few years, Diffeo expanded, raised seed funds and gathered clients spanning a wide range of research analysts. “Hertz was crucial throughout all these steps,” says Frank. “Members of the Hertz community were investing in our company and making connections for us.” The team won the Newman Entrepreneurial Award, the NGA Disparate Data Challenge and the MassChallenge FinTech 2019 Diamond Prize, along with several other awards.
In 2016, Diffeo acquired Meta and entered the financial services marketplace. Jason Briggs, the CEO of Meta, became the COO at Diffeo. Interest from Wall Street surged.
Turning Over a New Leaf
In 2017, Diffeo joined the Salesforce Incubator, a five-month program that helps startup companies grow their businesses and build new technologies. Ultimately, in 2019, Salesforce acquired the 22-person company, and Roberts, Frank, Briggs and others from Diffeo joined Salesforce. Diffeo’s collaborative intelligence product is now an offering on the salesforce.com platform called “Einstein Relationship Insights” that helps people understand the relationships that drive their business.
As a researcher at Salesforce, Roberts authored a book, “The Principles of Deep Learning Theory,” published by Cambridge University Press, that applies effective theory techniques from theoretical physics to understanding deep learning.
Kleiman-Weiner founded a new company, Common Sense Machines, based on some of the research he started during his PhD. Common Sense Machines is creating AI that learns to create 3-D models of the world and uses them to enable autonomous systems to perceive, track, and reason about everyday objects, spaces and agents. “My experience as a founder of Diffeo and the support and mentorship of the broader Hertz community gave me the confidence to launch a new venture with a big mission,” says Kleiman-Weiner.
While supporting Salesforce National Security as vice president of collaborative AI, Frank is finishing his PhD after two decades. “Luckily, my thesis adviser has been very patient,” he says. “Thanks to Hertz, I’ve had a unique path.”
He says his interests have recently been veering toward geoengineering and forestry. As always, he’s trying to surround himself with creative people — and showing up, in person, to let serendipity happen. Recently, he’s been exploring habitat restoration projects on the Big Island of Hawaii and in Nova Scotia.
“If I hadn’t showed up at that retreat in Virginia, Diffeo wouldn’t have happened,” he says. “The Hertz Foundation has this ability to create relationships between amazing people that spur things like this.”
About the Fannie and John Hertz Foundation
The Fannie and John Hertz Foundation identifies the nation’s most promising innovators in science and technology and empowers them to pursue solutions to our toughest challenges. Launched in 1963, the Hertz Fellowship is the most exclusive fellowship program in the United States, fueling more than 1,200 leaders, disruptors and creators who apply their remarkable talents where they’re needed most — from the future of health care to the future health of our environment. Hertz Fellows hold 3,000+ patents, have founded 375+ companies, and have received 200+ major national and international awards, including two Nobel Prizes, eight Breakthrough Prizes, the National Medal of Technology, the Fields Medal and the Turing Award. Learn more at HertzFoundation.org.
Hertz Fellows John Frank, Dan Roberts and Max Kleiman-Weiner co-founded Diffeo, an AI startup company later acquired by Salesforce, after meeting at the Hertz Summer Workshop.