Contrary Research Rundown #85

AI's Hunger For Data Continues, plus new memos on Chime, The Browser Company, and more

May 11, 2024

On May 22nd, we're hosting our next Tech Talk with Ramp, Hex, Sourcegraph, Semgrep, Onehouse, and Weaviate in San Francisco. The evening will feature eng demos and conversations with senior leaders from the featured companies, followed by food and drinks. Register for the chance to join!

Research Rundown

The cycle of breakneck AI news is just spinning faster and faster. Everyone just close your eyes, hold your breath, and dive in. Because this week is no exception. As usual, OpenAI is at the center of a lot of the attention. Is it launching a ChatGPT feature that can search the web and cite sources, or maybe a Google search competitor, or maybe… none of those things? It’s unclear.

But one thing that was clear is that OpenAI’s relationship with data and copyright really spilled out into the open this week. First up, Stack Overflow and OpenAI announced a partnership for OpenAI to get access to the company’s data. We’ve seen similar partnerships between the likes of Reddit and Google after companies with data pushed to get compensated for their platforms being used as training data for LLMs.

Stack Overflow’s users didn’t react positively to the news, with many of them deleting their contributions to the site en masse. Stack Overflow responded by banning those users. As Gergely Orosz points out:

“The reality is that now StackOverflow and Reddit are both sites where they are free to use to provide answers because you are providing training data for AI models with every one of your answers (and past ones). The only way to opt out is to stop contributing to them.”

Stack Overflow wasn’t the only data set in OpenAI’s crosshairs. A class action lawsuit brought against OpenAI by the Authors Guild alleges that OpenAI used 100K+ copyrighted books in training its models, and then deleted the datasets. That’s just one of the many lawsuits from artists, writers, and publishers that are pushing back on their work being used in training data.

OpenAI is attempting to respond by announcing a tool called “Media Manager,” which would “allow ‘creators and content owners to tell [OpenAI] what they own’ and specify ‘how they want their works to be included or excluded from machine learning research and training.’” But details are sparse, and the execution seems complicated. Even companies like Reddit, which have made access deals, are attempting to introduce public content policies to better regulate how the data on their platforms is being used.

Ironically, OpenAI is the pot calling the kettle black on the other side of copyright law. The company recently made a copyright complaint against the ChatGPT subreddit for using the OpenAI logo.

The unfortunate reality is that the same copyright battle will continue to play out across writing, art, music, and more, because what we thought was the unstoppable force of the open internet is now running into the immovable object of AI companies’ exorbitant demand for data. And that demand is only going up as we see ever more competition in the space. Even Microsoft, the sugar daddy of OpenAI, is working on its own in-house LLM called MAI-1, which is explicitly meant to compete with OpenAI.

Arc is the core product of The Browser Company, which attempts to reinvent the browser experience, rethinking its core purpose, base primitives, and fundamental usage patterns. To learn more, read our full memo here and check out some open roles below:

Staff Software Engineer, Infrastructure - Remote, North America
Software Engineer, Swift - Remote, North America

Crux aims to enable buyers and sellers to exchange tax credits efficiently and transparently in a marketplace that will drive financing for current and future generations of renewable energy and infrastructure projects. To learn more, read our full memo here and check out some open roles below:

Customer Success - Remote

Chime is a neobank that provides consumer banking services through its mobile app. It started as a mobile banking app that offered no-fee bank accounts, debit cards, and a way to get paychecks two days in advance. To learn more, read our full memo here and check out some open roles below:

Business Intelligence Engineer - San Francisco, CA
Backend Engineer - San Francisco, CA

Check out some standout roles from this week.

OneSchema | San Francisco, CA - Software Engineer
Railway App | Remote - Product Designer, Senior Full-Stack Engineer - Product
Power | Remote - Product Designer

The neobank for startups, Mercury, has announced its expansion into new products, like bill pay and spend management. This puts Mercury in competition not only with established companies like Ramp and Brex, but also with fellow recent entrants into the space, like Rippling.
Instacart announced a partnership with Uber Eats to enable restaurant delivery alongside its grocery offerings. Some predictions pointed to the possible advantage this could offer to Instacart by increasing traffic on the company’s app, which would lift advertising revenue.
Fitness wearable WHOOP was featured in a post outlining the company’s story over the course of 12 years, and the competition with products like the Apple Watch.
Databricks has announced the launch of its new Vector Search product, which is “a vector database that is built into the Databricks Data Intelligence Platform,” competing with other vector database companies, like Pinecone.
Wiz recently announced it had raised $1 billion of additional funding at a $12 billion valuation, marking the most funding a cybersecurity company has ever raised.
Tesla released a new demo of Optimus, its humanoid robot, using a neural network trained to sort battery cells, further advancing the company’s robotics stack.
In TikTok’s ongoing legal battle with the U.S. government, some have pointed out that the Chinese company’s legal filings undermined many of its arguments, including the company’s independence from Beijing and any engineering capabilities to make the app more independent.
It’s been a big week for biotechnology, from a breakthrough gene therapy to restore hearing to the world’s first eye transplant. On top of that, we saw a $1 billion seed round for a new AI biotech company, and AI-generated CRISPR proteins.

At Contrary Research, our vision is to become the best starting place to understand private tech companies. We can't do it alone, nor would we want to. We focus on bringing together a variety of different perspectives.

That's why we're opening applications for our Research Fellowship. In the past, we've worked with software engineers, product managers, investors, and more. If you're interested in researching and writing about tech companies, apply here!
