<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="/styles.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Epoch AI Narrations</title>
    <language>en-gb</language>
    <copyright>© 2026 All rights reserved</copyright>
    <itunes:author>Epoch AI</itunes:author>
    <itunes:type>episodic</itunes:type>
    <itunes:explicit>false</itunes:explicit>
    <podcast:locked owner="infra@epoch.ai">yes</podcast:locked>
    <description>Audio narrations of Epoch AI's research into the driving forces, progress, and impacts of artificial intelligence. Episodes are AI-narrated versions of our written publications, available at epoch.ai.</description>
    <image>
      <url>https://files.type3.audio/clients/epoch/cover.jpg?v=2</url>
      <title>Epoch AI Narrations</title>
      <link>https://epoch.ai</link>
    </image>
    <item>
      <title>“AI doesn’t get better at this board game with practice” by Benjamin Ou, Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: Our latest benchmark suggests AI struggles to learn from experience. &lt;/p&gt;  &lt;p&gt; Can AI systems improve at challenging tasks on the fly, performing them over and over and learning from mistakes? It's one of the biggest open questions in AI capabilities right now, with large economic and safety implications. Our latest benchmark, EBR-bench, tests AI systems for this ability by having them play Earthborne Rangers, a complex board game, repeatedly. So far, we see little evidence of AI learning from experience. With EBR-bench as part of our benchmarking suite, we have a new tool for detecting if and when that changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Learning to play games is a proxy for important capabilities&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; An AI system that could pick up unfamiliar tasks on the fly would be much more capable than we’re used to. Even if it didn’t perform well out of the box on some economically relevant task, it could still learn “on the job”. It would also be harder to determine whether it had dangerous capabilities prior to release, since it could gain such capabilities through learning. We think learning to play games is a reasonable proxy for these more impactful kinds of learning. Whether it's a [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:53) Learning to play games is a proxy for important capabilities&lt;/p&gt;&lt;p&gt;(02:41) AI doesn't improve at EBR with repeated play&lt;/p&gt;&lt;p&gt;(04:20) Agents manage tactical execution poorly&lt;/p&gt;&lt;p&gt;(06:38) Agents underexplore strategic options&lt;/p&gt;&lt;p&gt;(07:55) Agents struggle even when given an explicit strategy guide&lt;/p&gt;&lt;p&gt;(09:18) Elicitation gaps may remain&lt;/p&gt;&lt;p&gt;(10:04) Out-of-distribution generalization remains limited--for now&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 1st, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/earthborne-rangers-benchmark?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/earthborne-rangers-benchmark&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-1.png" alt="Line chart titled 'AI systems show no improvement upon repeated play of Earthborne Rangers'. It shows scores out of 21 across 30 playthroughs for Gemini 3.1 Pro, Claude Opus 4.8, and GPT-5.5, all far below the expert human baseline of 20." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-3.png" alt="Line chart of fatigue per round across 30 playthroughs for Gemini 3.1 Pro, GPT-5.5, and Opus 4.8, with reference lines for random and expert human performance. Most points fall below the random line and above the expert human line." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-4.png" alt="Horizontal stacked bar chart showing deck archetype distribution across AI models' explore playthroughs, with top deck percentages ranging from 50% to 100%. Each shade represents a deck archetype." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/earthborne-rangers-benchmark/figure-5.png" alt="Bar chart comparing scores out of 21 for Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro under basic and 'with strategy guide' settings, all well below the expert human line at 20, but with the strategy guide setting a few points higher." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 02 Jul 2026 15:50:37 GMT</pubDate>
      <guid isPermaLink="false">06492eaa-3fc0-4fc6-9159-5256055f74a6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/06492eaa-3fc0-4fc6-9159-5256055f74a6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Benjamin%2520Ou%252C%2520Greg%2520Burnham&amp;title=%22AI%20doesn%E2%80%99t%20get%20better%20at%20this%20board%20game%20with%20practice%22%20by%20Benjamin%20Ou%2C%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fearthborne-rangers-benchmark&amp;created_at=2026-07-02T15%3A50%3A19.822369%2B00%3A00&amp;duration=708" length="8496288" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/earthborne-rangers-benchmark</link>
      <itunes:duration>708</itunes:duration>
    </item>
    <item>
      <title>“What we learned from 1,604 Chinese AI job postings” by Cheryl Wu, Jean-Stanislas Denain, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Inferring Chinese AI labs’ strategies from their job descriptions. &lt;/p&gt;  &lt;p&gt; What's going on inside Chinese AI companies like Alibaba and DeepSeek? Western observers usually answer this question in two ways: (1) reading through their technical papers, and (2) voraciously consuming news reports and following everything about Chinese AI on X. But there's a third approach that people have rarely explored: scraping Chinese AI job postings.&lt;/p&gt;
&lt;p&gt; Chinese labs need to hire the right people, so their job descriptions have to reveal what skills or expertise they’re looking for. These give us direct clues about what constraints they face and what they hope to build.&lt;/p&gt;
&lt;p&gt; So, as we previously did for Western labs, we scoured over 1,600 job postings across six of the most notable Chinese AI companies: DeepSeek, MiniMax, Moonshot, Z.ai, ByteDance, and Alibaba. Here's what we found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Chinese AI labs still rely on Nvidia, but they’re exploring domestic alternatives&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Many people care about whether Chinese companies still use Nvidia because it means they still depend on “Western” AI infrastructure. And at least for now, that seems true.&lt;/p&gt;&lt;p&gt; Consider ByteDance. One of its open roles is called “Inference GPU Performance Optimization Expert”. Whoever [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:15) Chinese AI labs still rely on Nvidia, but they're exploring domestic alternatives&lt;/p&gt;&lt;p&gt;(03:57) Chinese startups are renting domestic cloud compute, and building data centers too&lt;/p&gt;&lt;p&gt;(05:50) Chinese AI startups have pretty varied commercial strategies&lt;/p&gt;&lt;p&gt;(08:08) Startups stay model-centric; platform companies make a wider range of research bets&lt;/p&gt;&lt;p&gt;(09:50) Job postings are more spread out than in the US&lt;/p&gt;&lt;p&gt;(11:07) Chinese AI jobs require less prior experience&lt;/p&gt;&lt;p&gt;(12:41) The complex reality of Chinese AI firms&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 24th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/what-we-learned-from-1604-chinese-ai-job-postings?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/what-we-learned-from-1604-chinese-ai-job-postings&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/deepseek_data_center.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/deepseek_data_center.png" alt="DeepSeek job posting for a data center operations role based in Inner Mongolia." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/gtm_composition.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/gtm_composition.png" alt="Composition of go-to-market job postings by company, comparing B2B sales versus marketing roles across Z.ai, MiniMax, and Moonshot." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/business_development_america.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/business_development_america.png" alt="Z.ai job posting for US market business development, targeting Fortune Global 500 companies." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/deepseek_model_harness_agent.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/deepseek_model_harness_agent.png" alt="DeepSeek 'Agent Harness R&amp;amp;D Engineer' job description, written largely in English." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/china_postings_map.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/china_postings_map.png" alt="Bubble map of mainland-Chinese cities sized by number of AI job postings, with Beijing, Hangzhou, and Shanghai as the largest hubs." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/hacky_stated_experience_graph.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/hacky_stated_experience_graph.png" alt="Bar chart of the mean minimum years of prior work experience required by firm, with US labs far higher than Chinese labs." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/us_law_job_ads.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/what-we-learned-from-1604-chinese-ai-job-postings/us_law_job_ads.png" alt="US Equal Employment Opportunity guidance indicating that screening applicants by years of experience can raise legal concerns." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 24 Jun 2026 23:49:31 GMT</pubDate>
      <guid isPermaLink="false">6ef14eb5-4721-4956-8c90-e187313531c0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/6ef14eb5-4721-4956-8c90-e187313531c0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Cheryl%2520Wu%252C%2520Jean-Stanislas%2520Denain%252C%2520Anson%2520Ho&amp;title=%22What%20we%20learned%20from%201%2C604%20Chinese%20AI%20job%20postings%22%20by%20Cheryl%20Wu%2C%20Jean-Stanislas%20Denain%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhat-we-learned-from-1604-chinese-ai-job-postings&amp;created_at=2026-06-24T23%3A49%3A18.147087%2B00%3A00&amp;duration=878" length="10530432" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/what-we-learned-from-1604-chinese-ai-job-postings</link>
      <itunes:duration>878</itunes:duration>
    </item>
    <item>
      <title>“Toward an O*NET for AI R&amp;D” by Jean-Stanislas Denain, Joe Kwon, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Proposing a new way to track AI research automation. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt; What trends are we extrapolating?&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; A common way that experts forecast AI timelines is so simple it's hard to believe: trend extrapolation. Sure, they also use numerical models that bake in things like runaway feedback loops, but the bread and butter of AI forecasting is to draw a line on a graph and extend it as far as you dare. Somehow this works well enough to be a state-of-the-art approach. However, the trends they extrapolate share a common weakness: they lean heavily on easy-to-measure things, not what we directly care about, namely how close AI is to doing AI research itself.&lt;/p&gt;&lt;p&gt; Many experts want to know when we’ll fully automate AI research, because this would massively speed up AI progress, kicking off an “intelligence explosion”. If that's right, it's hugely important to know how close we are. But historically, there hasn’t been much direct evidence to point to, because full automation of AI R&amp;amp;D has been so hard to measure. Instead, researchers have been forced to rely on proxies.&lt;/p&gt;&lt;p&gt; One such proxy lies in key AI inputs like compute, data, and energy. Take Situational Awareness [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:19) What trends are we extrapolating?&lt;/p&gt;&lt;p&gt;(04:14) An O*NET for AI R&amp;amp;D&lt;/p&gt;&lt;p&gt;(08:28) A first proposal&lt;/p&gt;&lt;p&gt;(11:59) What's next?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 17th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/toward-an-onet-for-ai-rnd?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/toward-an-onet-for-ai-rnd&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/situational_awareness.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/situational_awareness.png" alt="Graph showing effective compute over time, extrapolated until the subjective threshold of an automated AI researcher or engineer." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/superhuman_coder_extrapolation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/superhuman_coder_extrapolation.png" alt="Probability density of a superhuman coder at various points in time, based on AI 2027's time horizon extension methodology." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/onet_task_samples.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/onet_task_samples.png" alt="List of 15 tasks from the O*NET occupation 'Computer and Information Research Scientists'." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/o-net.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/o-net.png" alt="Diagram showing all six primary categories of the O*NET for AI R&amp;amp;D." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/aird_onet_sample.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/aird_onet_sample.png" alt="Example tasks in the O*NET for AI R&amp;amp;D." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/expectation_reality.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/toward-an-onet-for-ai-rnd/expectation_reality.png" alt="Diagram showing how the pre vs post-training split is overly simplified." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 18 Jun 2026 07:32:19 GMT</pubDate>
      <guid isPermaLink="false">c55c19ae-5a3f-4e77-b8f9-279a8ec8ae75</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/c55c19ae-5a3f-4e77-b8f9-279a8ec8ae75.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Joe%2520Kwon%252C%2520Anson%2520Ho&amp;title=%22Toward%20an%20O*NET%20for%20AI%20R%26D%22%20by%20Jean-Stanislas%20Denain%2C%20Joe%20Kwon%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Ftoward-an-onet-for-ai-rnd&amp;created_at=2026-06-18T07%3A32%3A26.106094%2B00%3A00&amp;duration=887" length="10643328" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/toward-an-onet-for-ai-rnd</link>
      <itunes:duration>887</itunes:duration>
    </item>
    <item>
      <title>“Are Mythos’ cyber capabilities overhyped?” by Timothée Chauvin, Alexander Barry, Jean-Stanislas Denain, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Compiling all the public evidence on Mythos Preview's cyber abilities. &lt;/p&gt;  &lt;p&gt; If what Anthropic says is true, then the Claude Mythos family is a massive leap forward in AI's cyber capabilities. When they announced Mythos Preview, they considered it so dangerous that they had to launch a $100+ million initiative to “secure the world's most critical software”. Then on Tuesday, they one-upped themselves by releasing Claude Mythos 5, which improves modestly on cyber benchmarks.&lt;/p&gt;
&lt;p&gt; But skeptics have argued that Anthropic was exaggerating — or at least, people should chill out about Mythos. For instance, some people have pointed out that GPT-5.5 is on par with Mythos Preview on a range of cyber benchmarks, and yet its launch didn’t lead to a cyber catastrophe.&lt;/p&gt;
&lt;p&gt; So is Mythos actually a big leap for cyber capabilities? To figure this out, we looked at all the public evidence we could get our hands on. Most of this evidence applies to Mythos Preview, but the conclusions should hold for Mythos 5 too. This post describes what we found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Discovering vs exploiting code vulnerabilities&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; To start off, let's take a closer look at what Anthropic actually claimed when they released [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:21) Discovering vs exploiting code vulnerabilities&lt;/p&gt;&lt;p&gt;(02:49) Mythos Preview was a major advance in exploit development&lt;/p&gt;&lt;p&gt;(05:40) It's unclear how large of a practical advance Mythos Preview is in vulnerability discovery&lt;/p&gt;&lt;p&gt;(10:34) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 9 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 11th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/are-mythos-cyber-capabilities-overhyped?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/are-mythos-cyber-capabilities-overhyped&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/are-mythos-cyber-capabilities-overhyped/cyber-eci-vs-date.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/are-mythos-cyber-capabilities-overhyped/cyber-eci-vs-date.png" alt="A graph showing a cyber-domain Epoch Capabilities Index over time, where Mythos Preview is around 7 months ahead of trend." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/are-mythos-cyber-capabilities-overhyped/cve-explorer.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/are-mythos-cyber-capabilities-overhyped/cve-explorer.png" alt="A graph showing code vulnerabilities found over time across 21 notable organizations. There is a major spike in the graph that coincides with Mythos Preview's release." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 12 Jun 2026 07:05:57 GMT</pubDate>
      <guid isPermaLink="false">e96c1eca-82eb-4100-a399-06b5b43623d9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/e96c1eca-82eb-4100-a399-06b5b43623d9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Timoth%25C3%25A9e%2520Chauvin%252C%2520Alexander%2520Barry%252C%2520Jean-Stanislas%2520Denain%252C%2520Anson%2520Ho&amp;title=%22Are%20Mythos%E2%80%99%20cyber%20capabilities%20overhyped%3F%22%20by%20Timoth%C3%A9e%20Chauvin%2C%20Alexander%20Barry%2C%20Jean-Stanislas%20Denain%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fare-mythos-cyber-capabilities-overhyped&amp;created_at=2026-06-12T07%3A05%3A45.172519%2B00%3A00&amp;duration=737" length="8841600" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/are-mythos-cyber-capabilities-overhyped</link>
      <itunes:duration>737</itunes:duration>
    </item>
    <item>
      <title>“Controlling the capital after AGI” by Phil Trammell, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: A simple taxonomy of the main proposals for post-AGI universal redistribution.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; AGI might generate immense economic output, but it could take many people's jobs in the process and leave them with no way to earn a decent living. Those with little savings during the “AGI transition” would then be unable to support themselves on the other side. Less drastically, even if many well-paying jobs remain after AGI, the capital share may rise sharply, which would tend to greatly increase inequality.&lt;/p&gt;&lt;p&gt; If this happens, how might the gains be redistributed? More concretely, putting aside the question of how the state raises tax revenues after AGI, and what percentage of GDP is raised, how do existing proposals for redistributing this revenue differ?&lt;/p&gt;&lt;p&gt; Proposals for universal benefits abound, including:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; 
&lt;p&gt; Universal basic income (UBI): The government pays everyone cash. This is the best-known proposal, and has been endorsed by Elon Musk, Vinod Khosla, Geoffrey Hinton, and many others. As part of this, the government might impose restrictions on the extent to which people could borrow against their future payments, just as it is illegal today to borrow against your social security, to prevent people from impoverishing themselves in [...]&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:18) Introduction&lt;/p&gt;&lt;p&gt;(02:37) The main axis: control of the capital&lt;/p&gt;&lt;p&gt;(05:26) Why care who controls the capital?&lt;/p&gt;&lt;p&gt;(08:32) Why have the state give people control of capital, instead of letting people buy it themselves?&lt;/p&gt;&lt;p&gt;(11:41) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 4 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 9th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/controlling-the-capital-after-agi?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/controlling-the-capital-after-agi&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/controlling-the-capital-after-agi/wealth-distribution-v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/controlling-the-capital-after-agi/wealth-distribution-v2.png" alt="A vertical spectrum of wealth distribution proposals ordered by how much control over capital they give individuals, from 'No control over capital' at the top to 'Strong control over capital' at the bottom: philanthropic giving, universal basic income, sovereign wealth funds, universal basic capital, and universal basic capital plus kill switches." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 10 Jun 2026 05:52:49 GMT</pubDate>
      <guid isPermaLink="false">2b1da414-a76a-4cb0-b7b9-88f15b1112c0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/2b1da414-a76a-4cb0-b7b9-88f15b1112c0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Phil%2520Trammell%252C%2520Anson%2520Ho&amp;title=%22Controlling%20the%20capital%20after%20AGI%22%20by%20Phil%20Trammell%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fcontrolling-the-capital-after-agi&amp;created_at=2026-06-10T05%3A53%3A25.201703%2B00%3A00&amp;duration=803" length="9629856" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/controlling-the-capital-after-agi</link>
      <itunes:duration>803</itunes:duration>
    </item>
    <item>
      <title>“Is a compute crunch coming?” by Luke Emberson, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: We estimated trends in global inference capacity and found that token demand appears to be growing much faster than supply.&lt;/p&gt;  &lt;p&gt; Much has been made of AI-driven capex in the past year. Hyperscalers have been clamoring to construct massive data centers, spending hundreds of billions in the process. The St. Louis Fed estimates that AI-related investment contributed about 1 percentage point — almost 40% of the total — to US real GDP growth in the first three quarters of 2025, exceeding the IT investment contribution at the height of the dot-com boom. Whether the current AI buildout constitutes a bubble depends largely on whether there will be sufficient demand for the computing infrastructure being built.&lt;/p&gt;
&lt;p&gt; It's tough to estimate future demand for tokens, as it depends heavily on hard-to-forecast trends in capabilities and diffusion. However, we have a much more concrete picture of the supply side. In this article, we do our best to answer how many tokens per second the world could produce with the chips we have today.&lt;/p&gt;
&lt;p&gt; To do this, we dig into the technical details of inference. We model prefill and decode runtimes, account for two common efficiency techniques (chunked prefill and [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:30) Introducing our setting&lt;/p&gt;&lt;p&gt;(05:13) Inference settings&lt;/p&gt;&lt;p&gt;(05:17) Hardware specs&lt;/p&gt;&lt;p&gt;(05:27) What happens during inference?&lt;/p&gt;&lt;p&gt;(06:59) Prefill&lt;/p&gt;&lt;p&gt;(09:46) Decode&lt;/p&gt;&lt;p&gt;(13:48) Chunked prefill&lt;/p&gt;&lt;p&gt;(16:21) Speculative decoding&lt;/p&gt;&lt;p&gt;(18:24) Calibrating against inference benchmarks&lt;/p&gt;&lt;p&gt;(21:18) The present and future of inference&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 17 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 25th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/is-a-compute-crunch-coming?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/is-a-compute-crunch-coming&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/prefill-and-decode-times.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/prefill-and-decode-times.png" alt="Two line charts showing how Kimi-K2.6’s compute and bandwidth times scale with batch size, for each of prefill and decode. Compute dominates in prefill, while bandwidth dominates during decoding." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/naive-vs-chunked-prefill.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/naive-vs-chunked-prefill.png" alt="Diagram comparing naive inference versus chunked prefill approaches for AI inference." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/uncalibrated-vs-calibrated-throughput-predictions.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/uncalibrated-vs-calibrated-throughput-predictions.png" alt="Two scatter plots comparing theoretical estimates versus calibrated GPU throughput predictions. The theoretical estimates are overly optimistic, but calibration results in a reasonably tight and unbiased fit." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/llm-inference-supply.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/is-a-compute-crunch-coming/llm-inference-supply.png" alt="Line chart showing projected growth in Kimi K2.6 token supply on the world’s Blackwell chips. The plot covers 2026 to 2032 across three ISL configurations, and predicts that throughput at each ISL will grow at 3.4x per year." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">163869e4-4e69-4be5-9d46-c356f7744187</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/163869e4-4e69-4be5-9d46-c356f7744187.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Luke%2520Emberson%252C%2520Jaime%2520Sevilla&amp;title=%22Is%20a%20compute%20crunch%20coming%3F%22%20by%20Luke%20Emberson%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fis-a-compute-crunch-coming&amp;created_at=2026-05-27T11%3A45%3A12.253236%2B00%3A00&amp;duration=1707" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/is-a-compute-crunch-coming</link>
      <itunes:duration>1707</itunes:duration>
    </item>
    <item>
      <title>“Frontier labs don’t use most AI compute (yet)” by Josh You</title>
      <description>&lt;p&gt; Subtitle: But Anthropic and OpenAI may rapidly grow their compute share in the next few years. After that, continued scaling would require an economic transformation.&lt;/p&gt;  &lt;p&gt; Disclaimer: the estimates of frontier developer compute discussed below are more tentative than our standard data work.&lt;/p&gt;
&lt;p&gt; OpenAI kicked off the AI boom when it launched ChatGPT in 2022. Frontier LLMs soon accrued hundreds of millions of users and billions in revenue, sparking a massive investment boom in AI compute infrastructure, with Nvidia's AI-related sales spiking more than fourfold in 2023. Global AI computing power has now grown to the equivalent of around 20 million Nvidia H100s, funded by hundreds of billions of dollars in annual capital expenditures.&lt;/p&gt;
&lt;p&gt; Yet while OpenAI launched the compute boom, they don’t dominate AI compute usage. I estimate that the compute OpenAI uses for research, training, and inference as of the end of 2025 made up around 10% to 15% of the world's operational AI compute supply, and this share was even smaller a year ago. Even after adding the other most well-resourced frontier developers — Anthropic, xAI, and the AI labs within Google and Meta — the combined total is probably still under half of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:25) Most AI compute probably doesn't go to frontier AI&lt;/p&gt;&lt;p&gt;(06:28) Will Anthropic and OpenAI absorb the rest of global AI compute?&lt;/p&gt;&lt;p&gt;(11:39) What happens if frontier labs run out of headroom?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 20 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 20th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/frontier-labs-dont-use-most-ai-compute?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/frontier-labs-dont-use-most-ai-compute&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/frontier-labs-dont-use-most-ai-compute/frontier-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/frontier-labs-dont-use-most-ai-compute/frontier-compute.png" alt="Bar chart showing AI compute distribution in H100-equivalents as of end-2025, with dedicated frontier labs like OpenAI and Anthropic representing a small fraction compared to Google, Meta, and rest of world." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/frontier-labs-dont-use-most-ai-compute/openai-line.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/frontier-labs-dont-use-most-ai-compute/openai-line.png" alt="Line chart showing projected growth of H100-equivalent chips from 2023 to 2025, comparing world total versus OpenAI's share on a logarithmic scale." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">c7b50a0e-b04a-4e93-9eeb-477e151ab309</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/c7b50a0e-b04a-4e93-9eeb-477e151ab309.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You&amp;title=%22Frontier%20labs%20don%E2%80%99t%20use%20most%20AI%20compute%20(yet)%22%20by%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Ffrontier-labs-dont-use-most-ai-compute&amp;created_at=2026-05-26T17%3A25%3A25.296167%2B00%3A00&amp;duration=911" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/frontier-labs-dont-use-most-ai-compute</link>
      <itunes:duration>911</itunes:duration>
    </item>
    <item>
      <title>“The economics of superstar AI researchers” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: What might explain AI researcher pay, and why it matters. &lt;/p&gt;  &lt;p&gt; AI is one of those fields where the best winds up much better off than the rest. Superstar researchers at frontier labs earn over ten times more than most of their colleagues, who earn measly million-dollar salaries. They might even earn over a hundred times more than your average AI postdoc:&lt;/p&gt;
&lt;p&gt;Ballpark estimates of AI researcher compensation. Postdoc compensation is estimated using NSF report data. For tenure-track professors, I anchor on the Taulbee 2024 survey of computer scientists. Compensation for frontier lab researchers is estimated from Levels.fyi for L4-L5 OpenAI researchers, and from news reports for superstars.&lt;/p&gt;
&lt;p&gt; So why are the differences in pay so large? The naive explanation is that some researchers are just vastly superior. Perhaps the superstar researchers have excellent research taste in designing algorithms and experiments. Or they have a knack for pulling off “yolo runs” — training runs that implement many ambitious changes all at once, relying on deep intuition, whereas most people would need to systematically test the individual changes to make sure they work. Under this framing, superstars are the “10× researchers” that Silicon Valley so deeply reveres [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:38) The superstar effect&lt;/p&gt;&lt;p&gt;(04:31) Why this applies to AI&lt;/p&gt;&lt;p&gt;(05:34) Race dynamics amplify the effect&lt;/p&gt;&lt;p&gt;(06:24) Reality is complicated, and so is managing an army of AIs&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 13th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/economics-of-superstar-ai-researchers?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/economics-of-superstar-ai-researchers&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/ai_researcher_distribution-vert.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/ai_researcher_distribution-vert.png" alt="Bar chart showing annual AI researcher compensation on a logarithmic scale, ranging from around $50K for postdocs to over $30M for superstar researchers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/artist_distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/artist_distribution.png" alt="Horizontal bar chart showing estimated 2025 Spotify earnings for five artists, with Taylor Swift leading at approximately $65 million, followed by Lana Del Rey, Ed Sheeran, Charli XCX, and Blackpink." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/superstar_bars.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/economics-of-superstar-ai-researchers/superstar_bars.png" alt="Bar chart showing wage dispersion ratios for five occupations, with actors having the highest ratio at 4.2x and truck drivers the lowest at 1.4x." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 13 May 2026 19:52:55 GMT</pubDate>
      <guid isPermaLink="false">eede2edb-f466-4dfb-a4cb-5c429a44a48f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/eede2edb-f466-4dfb-a4cb-5c429a44a48f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22The%20economics%20of%20superstar%20AI%20researchers%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Feconomics-of-superstar-ai-researchers&amp;created_at=2026-05-18T19%3A53%3A11.459674%2B00%3A00&amp;duration=564" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/economics-of-superstar-ai-researchers</link>
      <itunes:duration>564</itunes:duration>
    </item>
    <item>
      <title>“Introducing the AI Chip Components Explorer” by Venkat Somala</title>
      <description>&lt;p&gt; Subtitle: Our new AI Chip Components explorer tracks how much advanced-node logic, memory, and advanced packaging capacity is consumed by leading AI chip designers.&lt;/p&gt;  &lt;p&gt; AI compute capacity is growing exponentially. But as spending on AI chips climbs into the hundreds of billions, the semiconductor supply chain is increasingly strained. To help researchers, policymakers, and the public understand semiconductor inputs and production constraints, we are launching the AI Chip Components explorer.&lt;/p&gt;
&lt;p&gt; Building on our AI Chip Sales explorer, which tracks completed chips, the AI Chip Components explorer looks further up the supply chain at the components used to build chips. We estimate how much global chip component supply is consumed by the four leading US AI chip designers: Nvidia, AMD, Google, and Amazon. We further break down the consumption of components by chip type, designer, and quarter. Our scope is limited to the chip itself. We do not cover rack-level components or networking equipment, which are also significant inputs to AI infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; The explorer tracks three critical chip components&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Modern AI chips rely on several specialized inputs: advanced logic wafers that perform the core computation, high-bandwidth memory (HBM) that stores data and feeds it to the compute [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:19) The explorer tracks three critical chip components&lt;/p&gt;&lt;p&gt;(02:17) Packaging was the major bottleneck in late 2024 and early 2025&lt;/p&gt;&lt;p&gt;(03:44) Memory became the bottleneck in 2025&lt;/p&gt;&lt;p&gt;(05:22) Advanced logic was a softer constraint in 2024 and 2025&lt;/p&gt;&lt;p&gt;(06:46) Chip Component Spend More than Doubled from 2024 to 2025&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 8th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-ai-chip-components-explorer?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-ai-chip-components-explorer&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image5.png" alt="Stacked bar chart showing quarterly consumption share of global CoWoS wafers from Q1 2024 to Q4 2025" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image3.png" alt="Stacked bar chart showing global high-bandwidth memory consumption from Q1 2024 to Q4 2025, with Nvidia's share of total consumption increasing from roughly 45% to over 75%." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image2.png" alt="Stacked bar chart showing the share of global advanced logic wafer consumption by major tech companies and other non-AI uses from Q1 2024 to Q4 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image4.png" alt="Stacked bar chart showing total annual component costs in USD for 2024 and 2025, broken down by memory, logic, packaging, and auxiliary components." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-components-explorer/image1.png" alt="Stacked bar chart showing the share of total component cost across memory, logic, packaging, and auxiliary components from Q1 2024 to Q4 2025, with memory comprising the largest share throughout the period." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">03a2dcaf-af39-443c-a8ec-cacad1f9746f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/03a2dcaf-af39-443c-a8ec-cacad1f9746f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Venkat%2520Somala&amp;title=%22Introducing%20the%20AI%20Chip%20Components%20Explorer%22%20by%20Venkat%20Somala&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-ai-chip-components-explorer&amp;created_at=2026-05-18T15%3A39%3A29.7775%2B00%3A00&amp;duration=529" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-ai-chip-components-explorer</link>
      <itunes:duration>529</itunes:duration>
    </item>
    <item>
      <title>“RIP Classic Reasoning Benchmarks. What’s Next?” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: Give up at least one of: text only, short time horizon, easy to grade, and expert human superiority.&lt;/p&gt;  &lt;p&gt; There's a familiar recipe for reasoning benchmarks: tasks are text-only, output is easy to grade, and expert humans can do the tasks in several hours. Unfortunately, this recipe is now obsolete. As an emblematic case, consider GPQA: a benchmark consisting of graduate-level science questions. It had remarkable staying power but by now it's clearly saturated.&lt;/p&gt;

&lt;p&gt; The same is true for many classical reasoning benchmarks, whether in science, math, or coding. What's next? I think the old recipe points to a new recipe. Just relax one of the elements: text only, easy to grade, short time horizon, and expert human superiority. I see each of these categories as extremely fruitful to pursue, and far from saturated. The tradeoff is just that it takes more time and money to create such benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Keep the classic format, but make it multimodal&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; It's hard to say precisely, but to my eyes AI visual and spatial reasoning lags behind text-only reasoning. Still growing rapidly, but from a lower base. At any rate, it still seems comparably easy to create meaningful multimodal reasoning [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:24) Keep the classic format, but make it multimodal&lt;/p&gt;&lt;p&gt;(02:41) Keep the classic format, but push the time horizons&lt;/p&gt;&lt;p&gt;(04:38) Bite the bullet on hard-to-grade outputs&lt;/p&gt;&lt;p&gt;(06:44) Target well above human expert ability&lt;/p&gt;&lt;p&gt;(08:24) What about common sense?&lt;/p&gt;&lt;p&gt;(09:48) Reasoning benchmarks aren't dead yet&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 5th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/rip-classic-benchmarks?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/rip-classic-benchmarks&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/gpqa-saturating.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/gpqa-saturating.png" alt="Graph showing AI model accuracy on reasoning benchmarks from 2023 to 2026." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/micke-parts.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/micke-parts.jpg" alt="Which step is the part labeled 6 first used in? AI systems can’t say." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/ptb.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/ptb.png" alt="Results on PostTrainBench. There’s no reason to stop at 51.1%!" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/gsm8k_problem.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/rip-classic-benchmarks/gsm8k_problem.png" alt="Math problem with solution showing calculation for delivery trucks needed for flagstones." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">70ac76ed-5246-4340-98c4-0c1f57584112</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/70ac76ed-5246-4340-98c4-0c1f57584112.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22RIP%20Classic%20Reasoning%20Benchmarks.%20What%E2%80%99s%20Next%3F%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Frip-classic-benchmarks&amp;created_at=2026-05-18T13%3A36%3A23.023886%2B00%3A00&amp;duration=657" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/rip-classic-benchmarks</link>
      <itunes:duration>657</itunes:duration>
    </item>
    <item>
      <title>“What you need to know about AI chips” by Epoch AI</title>
      <description>&lt;p&gt; Subtitle: A look at the specialized hardware driving modern AI — why chips cost tens of thousands of dollars each, and why demand continues to outstrip supply.&lt;/p&gt;  &lt;p&gt; One of the biggest factors shaping AI progress today is access to a very specific kind of computer chip, manufactured almost entirely by a single company in Taiwan. These specialized AI chips, also sometimes called AI accelerators, power every frontier AI product, from chatbots to image generators, and are the most important physical input to the training and deployment of AI systems. Some prominent examples of AI chips are Nvidia's Blackwell and Hopper GPUs (named for the graphics chips they descend from), Google's TPU, and Amazon's Trainium series.&lt;/p&gt;
&lt;p&gt; An Nvidia Blackwell GPU. Credit: Nvidia.&lt;/p&gt;
&lt;p&gt; Who manufactures AI chips, who can buy them, and whether there is enough electricity to power them at scale — these questions are shaping which companies can build the most capable AI, which countries can support an AI industry, and how fast the technology advances.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Why AI companies want more chips than they can get&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; When a company wants to train a new AI model, one of the most important things they need is [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:19) Why AI companies want more chips than they can get&lt;/p&gt;&lt;p&gt;(06:09) AI chips get more cost-effective every year&lt;/p&gt;&lt;p&gt;(09:09) Electricity efficiency is increasing, but so is total consumption&lt;/p&gt;&lt;p&gt;(11:07) AI chips sit at the center of progress in AI&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 1st, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/chips-topic-overview?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/chips-topic-overview&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/chips-topic-overview/nvidia-blackwell-chip.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/chips-topic-overview/nvidia-blackwell-chip.jpg" alt="An image of an Nvidia Blackwell chip." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/chips-topic-overview/euv-tool.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/chips-topic-overview/euv-tool.png" alt="Technicians in cleanroom suits and red hard hats near large semiconductor manufacturing equipment in a sterile fabrication facility" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">caffe483-c54e-4410-a743-d43bf9b57c70</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/caffe483-c54e-4410-a743-d43bf9b57c70.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Epoch%2520AI&amp;title=%22What%20you%20need%20to%20know%20about%20AI%20chips%22%20by%20Epoch%20AI&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fchips-topic-overview&amp;created_at=2026-05-18T15%3A39%3A30.746858%2B00%3A00&amp;duration=737" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/chips-topic-overview</link>
      <itunes:duration>737</itunes:duration>
    </item>
    <item>
      <title>“Diversion and resale: estimating compute smuggling to China” by Isabel Juniewicz</title>
      <description>&lt;p&gt; Subtitle: We estimate that between 290,000 and 1.6 million H100-equivalents (H100e) were smuggled to China through 2025. Our median estimate of 660,000 H100e would be roughly a third of China's total compute.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Key takeaways&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Substantial quantities of AI chips have been sent to China in violation of US export controls. Evidence of diverted or missing chips, drawn from indictments and investigative reporting, points to nearly 300,000 Nvidia H100-equivalents by the end of 2025. This would equal roughly a quarter of the compute China acquired through legal channels or domestic production. Because much smuggling goes undetected, the true total is likely higher.&lt;/li&gt;
&lt;li&gt; We estimate, with 90% confidence, that between 290,000 and 1.6 million H100-equivalents of compute were smuggled through the end of 2025. Our median estimate of 660,000 represents roughly 3% of the global compute stockpile, comparable to what xAI, a leading US AI lab, had at the time. The upper bound of our estimate would mean that, by the end of 2025, the majority of China's AI compute had been smuggled.&lt;/li&gt;
&lt;li&gt; We are uncertain about many variables, notably the magnitude of undetected smuggling and the proportion of chips allegedly diverted or missing that ultimately reached [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:32) Key takeaways&lt;/p&gt;&lt;p&gt;(01:45) Overview&lt;/p&gt;&lt;p&gt;(07:13) Evidence on smuggled chips&lt;/p&gt;&lt;p&gt;(07:17) Compute diversion&lt;/p&gt;&lt;p&gt;(07:47) Compute resale&lt;/p&gt;&lt;p&gt;(10:02) Estimation methodology&lt;/p&gt;&lt;p&gt;(10:15) Compute diversion&lt;/p&gt;&lt;p&gt;(12:27) Compute resale&lt;/p&gt;&lt;p&gt;(15:15) Combined results&lt;/p&gt;&lt;p&gt;(16:43) Comparison to other estimates&lt;/p&gt;&lt;p&gt;(18:51) Quarterly extrapolation&lt;/p&gt;&lt;p&gt;(19:35) Conclusion&lt;/p&gt;&lt;p&gt;(20:58) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 20 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 29th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/chip-smuggling?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/chip-smuggling&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/chip-smuggling/smuggling-diagram.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/chip-smuggling/smuggling-diagram.jpg" alt="Diagram showing smuggling routes for Chinese AI chips through diverters and brokers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/chip-smuggling-dot-whisker.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/chip-smuggling-dot-whisker.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/chip-smuggling-density.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/chip-smuggling-density.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/chip-smuggling-quarterly.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/chip-smuggling-quarterly.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4f640280-899e-4c6c-8eab-cc6d6db3c961</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4f640280-899e-4c6c-8eab-cc6d6db3c961.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Isabel%2520Juniewicz&amp;title=%22Diversion%20and%20resale%3A%20estimating%20compute%20smuggling%20to%20China%22%20by%20Isabel%20Juniewicz&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fchip-smuggling&amp;created_at=2026-05-18T15%3A39%3A31.742704%2B00%3A00&amp;duration=1286" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/chip-smuggling</link>
      <itunes:duration>1286</itunes:duration>
    </item>
    <item>
      <title>“How Fast Could Robot Production Scale Up?” by Jean-Stanislas Denain, Yann Rivière</title>
      <description>&lt;p&gt; Subtitle: We look at reference classes, factory buildout timelines, and upstream component supply to estimate plausible production rates for humanoids, quadrupeds, robotic arms, wheeled robots, and drones.&lt;/p&gt;  &lt;p&gt; Suppose that in the next few years, robotics capabilities take a large leap forward. Humanoid robots or mobile manipulators become able to perform most manual tasks that humans can. The potential market is enormous: billions of people do physical work, and a robot that could substitute for a human worker at a fraction of the cost would face nearly unlimited demand.&lt;/p&gt;
&lt;p&gt; But robots are physical objects. While software can be copied and deployed nearly instantly, each robot must be manufactured from real components in real factories by real workers. Even if capabilities jumped overnight, production would take time to catch up.&lt;/p&gt;
&lt;p&gt; How much time? In this post, we aim to produce numbers useful for people trying to answer that question. We focus on five form factors: humanoids, quadrupeds, robotic arms, wheeled robots, and drones. While the future of robotics may involve form factors that don’t yet exist at scale, or coordinated fleets of different kinds of robots, we believe that our analysis of existing form factors is still useful [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(04:59) Reference classes for robot production scaling&lt;/p&gt;&lt;p&gt;(05:03) Where robot production stands today&lt;/p&gt;&lt;p&gt;(08:38) How fast can production scale following demand shocks?&lt;/p&gt;&lt;p&gt;(12:19) Inside view: what are the actual constraints?&lt;/p&gt;&lt;p&gt;(12:42) Building factories&lt;/p&gt;&lt;p&gt;(16:33) Getting the components&lt;/p&gt;&lt;p&gt;(17:24) Three component tiers&lt;/p&gt;&lt;p&gt;(21:59) Putting it together&lt;/p&gt;&lt;p&gt;(25:40) Acknowledgements&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 22nd, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/how-fast-could-robot-production-scale-up?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/how-fast-could-robot-production-scale-up&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/robot-production-trends.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/robot-production-trends.png" alt="Cumulative production of humanoids, quadrupeds, robotic arms, wheeled robots, and drones on a log scale from 2015 to 2025. Humanoids and quadrupeds are scaling fastest, but wheeled robots and drones dominate in absolute volume." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">d308b388-df03-4bf9-a79b-825ecfae7e9c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/d308b388-df03-4bf9-a79b-825ecfae7e9c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Yann%2520Rivi%25C3%25A8re&amp;title=%22How%20Fast%20Could%20Robot%20Production%20Scale%20Up%3F%22%20by%20Jean-Stanislas%20Denain%2C%20Yann%20Rivi%C3%A8re&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhow-fast-could-robot-production-scale-up&amp;created_at=2026-05-18T15%3A39%3A32.748967%2B00%3A00&amp;duration=1569" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/how-fast-could-robot-production-scale-up</link>
      <itunes:duration>1569</itunes:duration>
    </item>
    <item>
      <title>“OpenAI Stargate: where the US sites stand” by Elliot Stewart, Ben Cottier</title>
      <description>&lt;p&gt; Subtitle: The $500 billion AI data center initiative is projected to exceed 9 gigawatts of capacity by 2029, with 0.3 gigawatts already operational in Abilene and six more US sites under active construction.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The United States is in the middle of an unprecedented build-out of AI infrastructure. No project illustrates the scale of that effort more than Stargate, a $500 billion endeavor involving AI developer OpenAI, cloud provider Oracle, and investment company SoftBank.&lt;/p&gt;&lt;p&gt; Stargate has seven locations across the US, all of which are now showing active development. The most advanced, in Abilene, Texas, is already operating at an estimated capacity of 0.3 gigawatts (GW). The six other sites include two more in Texas, as well as facilities in New Mexico, Wisconsin, Michigan, and Ohio. Together, the seven sites add up to over 9 gigawatts of planned capacity, which is comparable to the peak power demand of New York City. This will be enough to power the equivalent of 20 million Nvidia H100 GPUs, which was the total amount of AI compute in the world by the end of 2025.&lt;/p&gt;
&lt;table&gt;&lt;tr&gt;&lt;th&gt;Site&lt;/th&gt;&lt;th&gt;Current capacity (GW)&lt;/th&gt;&lt;th&gt;Projected capacity (GW)&lt;/th&gt;&lt;th&gt;Construction began&lt;/th&gt;&lt;th&gt;Projected completion&lt;/th&gt;&lt;th&gt;Power sources&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Abilene, Texas&lt;/td&gt;&lt;td&gt;0.3&lt;/td&gt;&lt;td&gt;1.2&lt;/td&gt;&lt;td&gt;Q2 2024&lt;/td&gt;&lt;td&gt;Q4 2026&lt;/td&gt;&lt;td&gt;On-site gas, Grid&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shackelford County [...]&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Introduction&lt;/p&gt;&lt;p&gt;(02:16) The sites&lt;/p&gt;&lt;p&gt;(02:18) Abilene, Texas&lt;/p&gt;&lt;p&gt;(03:37) Shackelford County, Texas&lt;/p&gt;&lt;p&gt;(04:39) Doña Ana County, New Mexico&lt;/p&gt;&lt;p&gt;(05:32) Milam County, Texas&lt;/p&gt;&lt;p&gt;(06:38) Port Washington, Wisconsin&lt;/p&gt;&lt;p&gt;(07:35) Saline Township, Michigan&lt;/p&gt;&lt;p&gt;(08:28) Lordstown, Ohio&lt;/p&gt;&lt;p&gt;(09:39) The road ahead&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 17th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/openai-stargate-where-the-us-sites-stand?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/openai-stargate-where-the-us-sites-stand&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/abilene.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/abilene.png" alt="Satellite image of the Abilene, Texas Stargate site. Image © Airbus DS 2026, captured 2026-03-24." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/shackelford.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/shackelford.png" alt="Satellite image of the Shackelford County, Texas Stargate site. Image © 2026 Vantor, captured 2026-04-14." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/new-mexico.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/new-mexico.png" alt="Satellite image of the Doña Ana County, New Mexico Stargate site. Image © Airbus DS 2026, captured 2026-02-06." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/milam.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/milam.png" alt="Satellite image of the Milam County, Texas Stargate site. Image © Airbus DS 2026, captured 2026-03-23." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/wisconsin.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/wisconsin.png" alt="Satellite image of the Port Washington, Wisconsin Stargate site. Image © Airbus DS 2026, captured 2026-02-14." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/michigan.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/michigan.png" alt="Satellite image of the Saline Township, Michigan Stargate site. Image © 2026 Vantor, captured 2026-03-12." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/ohio.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/openai-stargate-where-the-us-sites-stand/ohio.png" alt="Satellite image of the Lordstown, Ohio Stargate site. Image © Airbus DS 2026, captured 2026-02-27." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">7fc1474a-854b-4ec7-8541-8e63a274fe31</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/7fc1474a-854b-4ec7-8541-8e63a274fe31.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Elliot%2520Stewart%252C%2520Ben%2520Cottier&amp;title=%22OpenAI%20Stargate%3A%20where%20the%20US%20sites%20stand%22%20by%20Elliot%20Stewart%2C%20Ben%20Cottier&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fopenai-stargate-where-the-us-sites-stand&amp;created_at=2026-05-18T15%3A39%3A31.371374%2B00%3A00&amp;duration=642" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/openai-stargate-where-the-us-sites-stand</link>
      <itunes:duration>642</itunes:duration>
    </item>
    <item>
      <title>“Have AI Capabilities Accelerated?” by Jean-Stanislas Denain, Alexander Barry</title>
      <description>&lt;p&gt; Subtitle: We investigate progress trends on four capability metrics to determine whether AI capabilities have recently accelerated. Three of four metrics show strong evidence of acceleration, driven by reasoning models.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We investigate progress trends on four capability metrics to determine whether AI capabilities have recently accelerated. We do this by fitting several candidate curves to historical data (for example, a simple linear trend vs. a hyperbolic trend) and comparing how well each curve predicts data it hasn’t seen yet.&lt;/p&gt;&lt;p&gt; The following interactive plot shows how each candidate curve fits the historical data. Use the tabs to switch between the time series view and the cross-validation accuracy of each curve.&lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: Performance over time: Epoch Capabilities Index &lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Three of four metrics show acceleration, seemingly driven by reasoning models&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Three of the four metrics (ECI, log METR 50% time horizon, and a math-focused index we constructed from several math benchmarks) show strong evidence that progress has sped up relative to a global linear trend fit to data from 2023 onward.&lt;/p&gt;&lt;p&gt; The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Introduction&lt;/p&gt;&lt;p&gt;(01:11) Three of four metrics show acceleration, seemingly driven by reasoning models&lt;/p&gt;&lt;p&gt;(04:14) Methodology&lt;/p&gt;&lt;p&gt;(04:17) AI Capability Metrics&lt;/p&gt;&lt;p&gt;(07:15) Dataset preparation modes&lt;/p&gt;&lt;p&gt;(08:15) Candidate fits&lt;/p&gt;&lt;p&gt;(10:16) Assessing fit quality&lt;/p&gt;&lt;p&gt;(11:44) Constructing "best-performing fits" sets&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 16th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/have-ai-capabilities-accelerated?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/have-ai-capabilities-accelerated&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/detect-accelerations-heatmap.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/detect-accelerations-heatmap.png" alt="Heatmap showing that fitting separate linear trends for reasoning and non-reasoning models makes the best predictions on three of four metrics." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">8fc33b04-87e1-4b3a-95b9-4cdca236ea0e</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/8fc33b04-87e1-4b3a-95b9-4cdca236ea0e.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Alexander%2520Barry&amp;title=%22Have%20AI%20Capabilities%20Accelerated%3F%22%20by%20Jean-Stanislas%20Denain%2C%20Alexander%20Barry&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhave-ai-capabilities-accelerated&amp;created_at=2026-05-18T15%3A39%3A34.706671%2B00%3A00&amp;duration=765" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/have-ai-capabilities-accelerated</link>
      <itunes:duration>765</itunes:duration>
    </item>
    <item>
      <title>“MirrorCode: Evidence that AI can already do some weeks-long coding tasks” by Tom Adamczewski, David Rein, David Owen, Florian Brand</title>
      <description>&lt;p&gt; Subtitle: In our new benchmark, MirrorCode, Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.&lt;/p&gt; 


&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We present early results from MirrorCode, a benchmark (co-developed with METR) of long-horizon coding tasks derived from real software applications. We find that AI models can autonomously reimplement complex existing software without access to the original program's source code, provided there is a detailed, checkable specification. For example, Claude Opus 4.6 successfully reimplemented gotree — a bioinformatics toolkit with ~16,000 lines of Go and 40+ commands. We guess this same task would take a human engineer without AI assistance 2–17 weeks. We see continued gains from inference scaling on larger projects, suggesting they may be solvable given enough tokens.&lt;/p&gt;&lt;p&gt; AI models are increasingly capable at autonomous coding. Several notable software engineering (SWE) benchmarks have seen rapid progress. However, these usually measure fairly short coding tasks; for example, only about 100 of the 731 SWE-bench Pro tasks involve diffs larger than 100 lines. Meanwhile, recent demos of AI coding (for example, to develop a new C compiler or a new browser) are impressive but hard to evaluate. The completeness of the resulting [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Introduction&lt;/p&gt;&lt;p&gt;(04:01) Methodology&lt;/p&gt;&lt;p&gt;(09:34) Preliminary results&lt;/p&gt;&lt;p&gt;(09:37) Recent AI models can fully reimplement real programs&lt;/p&gt;&lt;p&gt;(12:05) Opus 4.6 solved gotree through perseverance, and its engineering was better than older models&lt;/p&gt;&lt;p&gt;(14:08) Further inference scaling might solve Pkl&lt;/p&gt;&lt;p&gt;(16:13) Limitations&lt;/p&gt;&lt;p&gt;(21:13) Discussion and conclusion&lt;/p&gt;&lt;p&gt;(24:31) Data&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 25 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 10th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/mirrorcode-preliminary-results?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/mirrorcode-preliminary-results&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure1.png" alt="AI can rebuild complex software from behavior alone" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure2.png" alt="Recent AI models perform better in the MirrorCode benchmark" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure3.png" alt="MirrorCode performance scales with inference budget" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/mirrorcode-pre-figure4.png" alt="Performance versus inference budget on the Pkl target program, for Claude Opus 4.6" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">80b33f1e-3e4c-443f-bfa7-7a1bdb5b4b22</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/80b33f1e-3e4c-443f-bfa7-7a1bdb5b4b22.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tom%2520Adamczewski%252C%2520David%2520Rein%252C%2520David%2520Owen%252C%2520Florian%2520Brand&amp;title=%22MirrorCode%3A%20Evidence%20that%20AI%20can%20already%20do%20some%20weeks-long%20coding%20tasks%22%20by%20Tom%20Adamczewski%2C%20David%20Rein%2C%20David%20Owen%2C%20Florian%20Brand&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fmirrorcode-preliminary-results&amp;created_at=2026-05-18T16%3A09%3A23.915941%2B00%3A00&amp;duration=1556" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/mirrorcode-preliminary-results</link>
      <itunes:duration>1556</itunes:duration>
    </item>
    <item>
      <title>“What does the war in Iran mean for AI?” by Josh You</title>
      <description>&lt;p&gt; Subtitle: A prolonged Hormuz crisis probably won't derail the compute buildout, but it could slow data center expansion and disrupt Gulf investment flows into AI.&lt;/p&gt; 
&lt;p&gt; Disclaimer: My background is in the economics of AI and compute. I’m not an expert in war, diplomacy, or oil and gas, but I’m familiar with what economic inputs matter for AI. I am also writing about a very dynamic situation. So specific claims about the situation in Iran and Hormuz and its impacts on supply chains should be read as tentative and based on relatively quick research.&lt;/p&gt;
&lt;p&gt; Since the US and Israel went to war with Iran at the end of February, shipping through the Strait of Hormuz, the sole sea route out of the Persian Gulf, has mostly shut down. This has disrupted around 10% of the world's supply of oil, as well as exports of natural gas, helium, urea, aluminum, and other commodities. Iran has also struck targets in the Gulf states, notably oil and gas facilities and a few data centers.&lt;/p&gt;
&lt;p&gt; On April 8, the US and Iran agreed to a two-week ceasefire that would reopen the Strait of Hormuz, though it is not clear [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:10) The Iran War's impact on energy&lt;/p&gt;&lt;p&gt;(04:53) Energy for fabs&lt;/p&gt;&lt;p&gt;(08:22) Energy for data centers&lt;/p&gt;&lt;p&gt;(10:37) Helium&lt;/p&gt;&lt;p&gt;(12:06) Gulf data centers and investment flows&lt;/p&gt;&lt;p&gt;(16:47) Takeaways&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 15 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 10th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/war-in-iran-and-ai?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/war-in-iran-and-ai&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/war-in-iran-and-ai/owid-gas.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/war-in-iran-and-ai/owid-gas.png" alt="Courtesy of Ember Energy Institute and OWID, who also provide a nice world map" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/war-in-iran-and-ai/owid-gulf.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/war-in-iran-and-ai/owid-gulf.png" alt="Courtesy of Ember Energy Institute and Our World in Data" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">6351a123-e978-4265-8a47-5214e0ff7d88</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/6351a123-e978-4265-8a47-5214e0ff7d88.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You&amp;title=%22What%20does%20the%20war%20in%20Iran%20mean%20for%20AI%3F%22%20by%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwar-in-iran-and-ai&amp;created_at=2026-05-18T13%3A37%3A36.005657%2B00%3A00&amp;duration=1094" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/war-in-iran-and-ai</link>
      <itunes:duration>1094</itunes:duration>
    </item>
    <item>
      <title>“AI is a common workplace tool: half of employed AI users now use it for work” by Caroline Falkman Olsson, Yafah Edelman</title>
      <description>&lt;p&gt; Subtitle: We surveyed over 2,000 Americans on how they use AI at work: who uses it, how much, which services, and whether it's replacing or creating tasks.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; What we found&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; AI is becoming a mainstream work tool. Half of employed Americans who used AI in the past week reported using AI tools at least as much for work as for personal tasks.&lt;/li&gt;
&lt;li&gt; AI is changing what people do at work. It has replaced existing tasks for 27% of employed AI work users and created new ones for 21%.&lt;/li&gt;
&lt;li&gt; AI work use is higher among paid subscribers. Employer-paid subscribers are far more likely to use AI for work than free-tier users, and self-payers fall in between.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; AI tools have moved from a niche technology to a part of everyday life. In a new Epoch AI/Ipsos survey of over 2,000 U.S. adults, half reported using AI tools in the past week.&lt;/p&gt;
&lt;p&gt; But adoption rate alone does not capture the full picture of how AI is used. Among employed users, it has become a work tool that is already changing the tasks they perform, with substantially higher workplace use among paid subscribers than free-tier users.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; [...]&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) What we found&lt;/p&gt;&lt;p&gt;(01:33) AI is now a workplace tool, not just a personal one&lt;/p&gt;&lt;p&gt;(02:41) About this survey&lt;/p&gt;&lt;p&gt;(03:36) AI both creates and replaces tasks at work&lt;/p&gt;&lt;p&gt;(05:09) Work use is higher among paid subscribers&lt;/p&gt;&lt;p&gt;(08:25) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 9th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/half-of-employed-ai-users-now-use-it-for-work?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/half-of-employed-ai-users-now-use-it-for-work&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/ipsos-article-work-personal.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/ipsos-article-work-personal.png" alt="Bar chart showing how employed past-week AI users split their use between work and personal tasks: 51% use AI at least as much for work as for personal, 47% mostly for personal, 2% not sure. Based on a weighted subsample of 665 employed past-week AI users. Source: Epoch AI/Ipsos KnowledgePanel, March 3–5, 2026." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/ipsos-article-automation-augmentation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/ipsos-article-automation-augmentation.png" alt="Paired bar chart showing how AI reshapes work among employed users who use AI at least partly for work: 27% say AI has replaced existing tasks (automation), 21% say AI has created new tasks (augmentation), with 13% reporting both. Source: Epoch AI/Ipsos KnowledgePanel, March 2026." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/ipsos-article-sub-status-work-personal.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/ipsos-article-sub-status-work-personal.png" alt="Stacked bar chart comparing work vs. personal AI use by subscription status: among paid subscribers (n = 279), 29% use AI mostly for personal, 37% mostly for work, 34% equally for both. Among those with no paid subscription (n = 374), 62% mostly for personal, 18% mostly for work, 20% equally for both. Among employed past-week AI users, excluding 'Not sure' responses. Source: Epoch AI/Ipsos KnowledgePanel, March 3–5, 2026." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/ipsos-article-diverging-paid.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/ipsos-article-diverging-paid.png" alt="Butterfly bar chart showing work vs. personal AI use among paid subscribers by service. Microsoft Copilot leads on both sides: 18.4% for work, 7.5% for personal. ChatGPT follows at 12.6% for work, 4.3% for personal. Google Gemini: 7.5% work, 2.7% personal. Claude: 2.8% work, 0.7% personal. Grok: 1.5% work, 1.3% personal. Perplexity: 1.0% work, 0.7% personal. Any paid service overall: 30.2% work, 12.6% personal. Among employed past-week AI users with valid work/personal response. Source: Epoch AI/Ipsos KnowledgePanel, March 3–5, 2026." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f10988c0-a9b7-4a90-8d26-bb1bfdea1d7c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f10988c0-a9b7-4a90-8d26-bb1bfdea1d7c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Caroline%2520Falkman%2520Olsson%252C%2520Yafah%2520Edelman&amp;title=%22AI%20is%20a%20common%20workplace%20tool%3A%20half%20of%20employed%20AI%20users%20now%20use%20it%20for%20work%22%20by%20Caroline%20Falkman%20Olsson%2C%20Yafah%20Edelman&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhalf-of-employed-ai-users-now-use-it-for-work&amp;created_at=2026-05-18T16%3A35%3A49.122427%2B00%3A00&amp;duration=571" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/half-of-employed-ai-users-now-use-it-for-work</link>
      <itunes:duration>571</itunes:duration>
    </item>
    <item>
      <title>“Keeping up with the GPTs” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Can Chinese and open model companies compete with the frontier through e.g. distillation and talent?&lt;/p&gt;  &lt;p&gt; If the last decade of AI has taught us one lesson, it's that scaling compute builds better models. This sounds great — until you realize your competitors have ten times more compute than you.&lt;/p&gt;
&lt;p&gt; This is the situation that many Chinese and open model companies find themselves in; relative to frontier companies, they’re “compute-poor”. Just last year, Anthropic spent over ten times more on compute than Minimax and Zhipu AI combined, and the gap is even wider for OpenAI:&lt;/p&gt;
&lt;p markdown="1"&gt;Data from Epoch's data on AI companies and Data Insights.&lt;/p&gt;
&lt;p&gt; You don’t need to be an AI expert to see that this is a huge handicap. With less compute, it's harder to run experiments, train bigger models, and serve many users.&lt;/p&gt;
&lt;p&gt; But compute-poor AI labs have an ace up their sleeve. Even lacking frontier-level compute, they can try to use theirs more efficiently to punch well above their weight. That's how DeepSeek was on the heels of OpenAI despite using a fraction of the training compute (at least on some benchmarks), driving the stock market bananas.&lt;/p&gt;
&lt;p&gt; The big [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:35) Breaking down the efficiency gains&lt;/p&gt;&lt;p&gt;(02:45) Approach 1: Innovate faster than the compute-rich labs&lt;/p&gt;&lt;p&gt;(05:17) Approach 2: Replicate innovations from frontier labs&lt;/p&gt;&lt;p&gt;(08:44) Approach 3: Leverage the capabilities of frontier models&lt;/p&gt;&lt;p&gt;(14:06) Putting things together&lt;/p&gt;&lt;p&gt;(15:29) What does this mean for the future of AI?&lt;/p&gt;&lt;p&gt;(15:59) Compute-poor = Chinese AI labs?&lt;/p&gt;&lt;p&gt;(20:15) Compute-poor = Open models?&lt;/p&gt;&lt;p&gt;(23:47) The bottom line&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 20 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 7th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/keeping-up-with-the-gpts?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/keeping-up-with-the-gpts&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/frontier-spend.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/frontier-spend.png" alt="Data from Epoch’s data on AI companies and Data Insights." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/openai-compute-spend.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/openai-compute-spend.png" alt="Infographic showing OpenAI's 2024 compute expenses split between R&amp;amp;D and inference categories." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/training_fraction.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/training_fraction.png" alt="Bar graph titled &amp;quot;Final training runs are a small fraction of R&amp;amp;D compute spending&amp;quot; showing three companies' training compute percentages." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/combined_plot.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/combined_plot.png" alt="Graph comparing model accuracy on Math and GPQA diamond benchmarks versus training compute." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/eci_us_china.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/eci_us_china.png" alt="A graph showing &amp;quot;Chinese models have lagged behind US models a roughly constant amount.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/compute_us_china.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/compute_us_china.png" alt="A graph showing training compute of US versus Chinese AI models over time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/compute_open_closed.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/compute_open_closed.png" alt="Graph showing training compute over time for open versus closed AI models, titled &amp;quot;Open models have not fallen dramatically behind closed models in training compute.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/eci_open_closed.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/eci_open_closed.png" alt="A graph titled &amp;quot;Open models have lagged behind closed models a roughly constant amount&amp;quot; showing ECI scores over time." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/nathan_lambert_open_weight_incentives.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/nathan_lambert_open_weight_incentives.png" alt="Nathan Lambert tweets: &amp;quot;An existential risk for near term open-weight models. In the coming years, the only places with business reasons for building them 1) non profits -- good for research/the world 2) nvidia's -- keep their hardware up with ai 3) meta's -- commodotize their complements&amp;quot;. The quoted tweet, by Carl Franzen, reads: &amp;quot;Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source venturebeat.com/technology/did...&amp;quot;." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/pareto-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/pareto-chart.png" alt="Graph showing capabilities versus cheapness pareto frontier for compute-rich and compute-poor labs." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/openrouter.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/keeping-up-with-the-gpts/openrouter.png" alt="Leaderboard showing top 10 LLM models ranked by token usage on OpenRouter." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 07 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">178fee10-52f2-4050-8fd7-11c97bc3ed71</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/178fee10-52f2-4050-8fd7-11c97bc3ed71.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22Keeping%20up%20with%20the%20GPTs%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fkeeping-up-with-the-gpts&amp;created_at=2026-05-18T13%3A38%3A31.836373%2B00%3A00&amp;duration=1563" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/keeping-up-with-the-gpts</link>
      <itunes:duration>1563</itunes:duration>
    </item>
    <item>
      <title>“Introducing the AI Chip Owners Explorer” by Josh You, Venkat Somala</title>
      <description>&lt;p&gt; Subtitle: We announce our new AI Chip Owners explorer, showing which companies own the world's leading AI chips.&lt;/p&gt;  &lt;p&gt; Computing capacity (“compute”) is a critical input to the development, training, and deployment of AI systems. How much AI-optimized compute exists in the world, and who owns it? Earlier this year, we launched the AI Chip Sales explorer to track the first question. Today, we’re launching our AI Chip Owners explorer to track the second.&lt;/p&gt;
&lt;p&gt; Our AI Chip Owners explorer contains interactive visualizations of our analysis of the number of leading AI chips owned by the largest US hyperscalers and cloud companies, one frontier AI developer (xAI), and Chinese customers — with breakdowns by chip family, chip model, and shifts in ownership over time. We build upon our estimates of the total volumes of Nvidia, Google TPU, Amazon Trainium, AMD, and Huawei chips from the AI Chip Sales explorer, and distribute these chips among major owners using estimates from analysts and industry researchers, company financial disclosures, capital spending, and our analysis of frontier-scale AI data centers.&lt;/p&gt;
&lt;p&gt; The AI Chip Owners explorer is intended as a resource for researchers, policymakers, and anyone tracking the strategic landscape of AI compute. You can [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:36) Hyperscalers own the majority of global AI compute&lt;/p&gt;&lt;p&gt;(03:41) Chinese customers own just 5% of global AI compute&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 6th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-ai-chip-owners-explorer?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-ai-chip-owners-explorer&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image1.png" alt="Stacked area chart showing &amp;quot;AI chip ownership over time&amp;quot; share of cumulative compute capacity." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image2.png" alt="A stacked bar chart titled &amp;quot;AI chip ownership&amp;quot; showing cumulative compute capacity by company and chip family." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image3.png" alt="Stacked area chart showing &amp;quot;AI chip ownership over time&amp;quot; by company share from 2024 Q1 to 2025 Q4." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-owners-explorer/image4.png" alt="Stacked bar chart showing &amp;quot;China chip ownership over time&amp;quot; cumulative compute capacity growth through 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fdb65d70-ebb8-4c92-b93c-78fe3ecafa82</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fdb65d70-ebb8-4c92-b93c-78fe3ecafa82.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You%252C%2520Venkat%2520Somala&amp;title=%22Introducing%20the%20AI%20Chip%20Owners%20Explorer%22%20by%20Josh%20You%2C%20Venkat%20Somala&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-ai-chip-owners-explorer&amp;created_at=2026-05-18T16%3A27%3A11.274696%2B00%3A00&amp;duration=398" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-ai-chip-owners-explorer</link>
      <itunes:duration>398</itunes:duration>
    </item>
    <item>
      <title>“What do frontier AI companies’ job postings reveal about their plans?” by Jean-Stanislas Denain, Campbell Hutcheson</title>
      <description>&lt;p&gt; Subtitle: A fast increase in go-to-market roles, and hints about upcoming products. &lt;/p&gt;  &lt;p&gt; AI companies guard their strategies closely. Their hiring pages, however, are public.&lt;/p&gt;
&lt;p&gt; And those posts contain clues about what products a company is developing, who it hopes to sell them to, and which bottlenecks it sees coming. A posting for a “Camera ISP Software Engineer” suggests a device with a camera. A search for “Forward Deployed Engineers” hints at the challenges of deploying AI inside companies. A cluster of roles mentioning robotics implies ambitions well beyond chatbots.&lt;/p&gt;
&lt;p&gt; We analyzed open roles at the leading foundation labs, including OpenAI, Anthropic, xAI and Google DeepMind. Here is what we found:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; First, sales and sales-related hiring has increased sharply at both Anthropic and OpenAI over the past year. Anthropic's go-to-market share of open roles grew from 17% to 31% and OpenAI's from 18% to 28%. This increase has been particularly concentrated in technical roles that help clients deploy AI to their companies.&lt;/li&gt;
&lt;li&gt; Second, open roles can provide insight into the product roadmap at the labs. For example, OpenAI and DeepMind are both investing in hardware products, such as robotics and consumer devices. In [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:29) Go-to-market is the top hiring category at OpenAI and Anthropic&lt;/p&gt;&lt;p&gt;(06:09) Job postings shed light on new product bets at OpenAI and DeepMind&lt;/p&gt;&lt;p&gt;(08:46) Job postings also offer clues about how labs secure compute and data&lt;/p&gt;&lt;p&gt;(10:53) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 24th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/ai-lab-job-postings?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/ai-lab-job-postings&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/ai-lab-job-postings/main.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/ai-lab-job-postings/main.png" alt="Bar chart showing &amp;quot;Open roles at OpenAI and Anthropic, March 2026&amp;quot; comparing adoption and sales roles." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">af301aa9-fc98-475c-86a2-b3e82c89eec8</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/af301aa9-fc98-475c-86a2-b3e82c89eec8.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Campbell%2520Hutcheson&amp;title=%22What%20do%20frontier%20AI%20companies%E2%80%99%20job%20postings%20reveal%20about%20their%20plans%3F%22%20by%20Jean-Stanislas%20Denain%2C%20Campbell%20Hutcheson&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fai-lab-job-postings&amp;created_at=2026-05-18T13%3A40%3A41.620446%2B00%3A00&amp;duration=702" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/ai-lab-job-postings</link>
      <itunes:duration>702</itunes:duration>
    </item>
    <item>
      <title>“Final training runs account for a minority of R&amp;D compute spending” by Jean-Stanislas Denain, Cheryl Wu</title>
      <description>&lt;p&gt; Subtitle: New evidence following the MiniMax and Z.ai IPOs. &lt;/p&gt;  &lt;p&gt; In the popular picture of how AI companies use compute, there are two big buckets: training and inference.
But in reality, the R&amp;amp;D side is more complex. The final training run — the one that produces the model with a name — is only the last step in a long, expensive process of exploration. Before that run begins, companies burn through compute on running experiments at various scales, generating synthetic data, testing which ideas work before committing to a final run, and training models that are never released.&lt;/p&gt;
&lt;p&gt; This distinction matters. When people discuss compute thresholds or the cost of training a frontier model, they often mean the final training run. However, the full cost of developing that model is much higher. And if most of the spending is exploration rather than execution, then a competitor who learns what works from the frontier could replicate the results for a fraction of the original cost.&lt;/p&gt;
&lt;p&gt; So there's more to R&amp;amp;D compute than final training runs, but how much? Last year, we estimated the breakdown of OpenAI's 2024 compute spending at around $5 billion on R&amp;amp;D compute that [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:54) Breaking down MiniMax and Z.ai's compute spending&lt;/p&gt;&lt;p&gt;(06:35) Final training runs are a small fraction of R&amp;amp;D compute spending&lt;/p&gt;&lt;p&gt;(08:30) R&amp;amp;D compute and catch-up growth&lt;/p&gt;&lt;p&gt;(10:42) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 23rd, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/r-and-d-vs-training-compute?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/r-and-d-vs-training-compute&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/r-and-d-vs-training-compute/main.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/r-and-d-vs-training-compute/main.png" alt="A bar graph showing training compute spending as share of R&amp;amp;D compute spending." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 23 Mar 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f39e2843-4f4b-4934-8b36-21633484b4f0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f39e2843-4f4b-4934-8b36-21633484b4f0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Cheryl%2520Wu&amp;title=%22Final%20training%20runs%20account%20for%20a%20minority%20of%20R%26D%20compute%20spending%22%20by%20Jean-Stanislas%20Denain%2C%20Cheryl%20Wu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fr-and-d-vs-training-compute&amp;created_at=2026-05-18T13%3A41%3A39.857923%2B00%3A00&amp;duration=706" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/r-and-d-vs-training-compute</link>
      <itunes:duration>706</itunes:duration>
    </item>
    <item>
      <title>“The least understood driver of AI progress” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: An opinionated guide to “algorithmic progress” and why it matters. &lt;/p&gt;  &lt;p&gt; AI software progress is one of those things that everyone vaguely knows about, but whose significance only a handful of people in the world truly understand. Consider that many of the most fervent debates in AI to date depend enormously on it: How did DeepSeek seem to catch up to OpenAI's o1 within months while using less training compute? When will the world develop AGI? And if we automate AI research, will AI progress accelerate like crazy à la Situational Awareness and AI 2027?&lt;/p&gt;
&lt;p&gt; I don’t know your stances on these questions, but I do know that you can’t have a well-informed opinion on them without understanding software progress. So I figured I should write a post describing the most important things that you need to know, starting from the basics and leading up to the current frontier.&lt;/p&gt;
&lt;p&gt; Here are the main takeaways, one for each section of the post:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; AI software progress is about reducing the training compute you need to get to the same level of capability, through better algorithms or data. This is commonly called “algorithmic progress” including by [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:40) 1. AI software progress: Doing more with what we have&lt;/p&gt;&lt;p&gt;(07:42) 2. How fast is AI software progress?&lt;/p&gt;&lt;p&gt;(14:27) 3. What drives software progress? (Or, why all the estimates we just saw are misleading)&lt;/p&gt;&lt;p&gt;(23:37) 4. How this impacts the software intelligence explosion debate&lt;/p&gt;&lt;p&gt;(30:03) 5. Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 29 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 25th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-least-understood-driver-of-ai-progress?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-least-understood-driver-of-ai-progress&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/scaling_curve.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/scaling_curve.png" alt="Graph showing relationship between log of training compute and capabilities, with Transformer marked on an upward diagonal line." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/compute_efficiency_gain.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/compute_efficiency_gain.png" alt="Graph showing relationship between training compute and capabilities, titled “Capabilities”. Innovation reduces compute needed for same capability level." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/more_capabilities_same_compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/more_capabilities_same_compute.png" alt="Graph showing capabilities increase with training compute, steeper after innovation." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/effective_compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/effective_compute.png" alt="Graph showing capabilities versus log of training compute with innovation trajectory and efficiency gains." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/efficiency-gains-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/efficiency-gains-v1.png" alt="You can find the full data and sources in the Appendix." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/rosetta_stone_algo_progress.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/rosetta_stone_algo_progress.png" alt="The “lines on a graph” approach to estimating the rate of AI software progress. The rate of progress is given by the slope of the line." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/gpt-oss.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/gpt-oss.png" alt="gpt-oss-20b does substantially better than GPT-3 on MMLU, despite using the same amount of training compute. 
If we look at the relationship between pre-training compute and MMLU performance among non-reasoning GPT models, we can back out a rate of algorithmic improvement, which works out to around 2-5× per year." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/compute_efficiency_gain.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/compute_efficiency_gain.png" alt="Graph showing relationship between training compute and capabilities, titled “Capabilities”. Innovation reduces compute needed for same capability level." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/scale_dependent_innovation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/scale_dependent_innovation.png" alt="Graph showing capabilities versus log of training compute, with steeper slope after innovation." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/modern_transformer_vs_lstm.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/modern_transformer_vs_lstm.png" alt="Graph comparing validation loss versus compute for modern transformers and LSTMs, titled “Modern Transformer vs LSTM Scaling.”" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/efficiency_gain_breakdown.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/efficiency_gain_breakdown.png" alt="Note that the paper in question uses the Deep Learning Era trendline from Epoch’s “Notable AI models” dataset as the frontier of training compute. This is why the “compute frontier” is close to 10 to the 23rd power FLOP in 2024-2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/returns_to_rnd_idea.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/returns_to_rnd_idea.png" alt="Diagram showing research effort leading to software progress through returns to AI software R&amp;amp;D." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/software-explosion-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/software-explosion-4.png" alt="Estimates of the returns to AI software R&amp;amp;D are very uncertain but straddle 1. This suggests that an intelligence explosion is plausible, but it’s also plausible that the feedback loop of AIs improving themselves simply fizzles out." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/assumed_vs_actual_model.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/assumed_vs_actual_model.png" alt="Diagram comparing assumed versus realistic models for estimating R&amp;amp;D returns with compute scaling." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/really_scale_dependent.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/really_scale_dependent.png" alt="Graph showing capabilities versus log of training compute, illustrating innovation impact on performance." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/big_scale_independent.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/the-least-understood-driver-of-ai-progress/big_scale_independent.png" alt="Graph showing capabilities versus log of training compute with two trend lines and annotation about scale-independent innovations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">c53a5f56-d199-4f45-bea0-d3c133c41987</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/c53a5f56-d199-4f45-bea0-d3c133c41987.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22The%20least%20understood%20driver%20of%20AI%20progress%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-least-understood-driver-of-ai-progress&amp;created_at=2026-05-18T20%3A11%3A31.237224%2B00%3A00&amp;duration=2081" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-least-understood-driver-of-ai-progress</link>
      <itunes:duration>2081</itunes:duration>
    </item>
    <item>
      <title>“Expanding our analysis of biological AI models” by David Atanasov, Niccolò Zanichelli, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: We release a database of over 1,100 biological AI models across nine categories. We analyze their safeguards, accessibility, training data sources, and the foundation models they build on.&lt;/p&gt;  &lt;p&gt; This report presents an expanded database of AI models in biology, commissioned by Sentinel Bio and building on our 2024 collaboration, in which Sentinel Bio funded Epoch AI to collect and organize information about AI models in biology. The goal of this new report is to expand coverage to new categories of biology-relevant AI models and to capture releases since September 2024.&lt;/p&gt;
&lt;p&gt; To build the database, we searched major academic databases and preprint servers for papers introducing AI models in biology, then used language models to filter candidates and extract structured metadata from the remaining papers. The most important models received additional manual review. The full methodology is described in the Appendix.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Key findings&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The final database contains 1,196 models, of which 1,124 were annotated using AI assistance only while 72 received dedicated manual annotation. We also manually checked every entry for which we reported safeguards being used. Here are the main findings from our analysis:&lt;/p&gt;&lt;p&gt; Pre-release risk assessments and risk-related evaluations are rare. Only 2.5% of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:18) Key findings&lt;/p&gt;&lt;p&gt;(03:52) Data access&lt;/p&gt;&lt;p&gt;(04:14) Methodology overview&lt;/p&gt;&lt;p&gt;(05:45) Analysis&lt;/p&gt;&lt;p&gt;(05:48) 1. Dataset Overview&lt;/p&gt;&lt;p&gt;(06:15) Category distribution&lt;/p&gt;&lt;p&gt;(07:33) Geographic distribution&lt;/p&gt;&lt;p&gt;(08:38) Institutional distribution&lt;/p&gt;&lt;p&gt;(09:47) Notable models&lt;/p&gt;&lt;p&gt;(10:50) 2. Risk management practices&lt;/p&gt;&lt;p&gt;(15:31) 3. Accessibility&lt;/p&gt;&lt;p&gt;(16:58) 4. Building Block Models&lt;/p&gt;&lt;p&gt;(18:38) 5. Training Data, Parameters, and Compute&lt;/p&gt;&lt;p&gt;(18:43) Training datasets&lt;/p&gt;&lt;p&gt;(19:38) Parameters, data size, and compute&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 20th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/expanding-our-analysis-of-biological-ai-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/expanding-our-analysis-of-biological-ai-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-category-distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-category-distribution.png" alt="Protein engineering and small biomolecule design account for over half of models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-geographic-distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-geographic-distribution.png" alt="The US and China produce the majority of biological AI models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-institution-distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-institution-distribution.png" alt="Universities produce the majority of biological AI models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-safeguards-notable.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-safeguards-notable.png" alt="AI models in biology rarely conduct pre-release risk assessments and risk-related evaluations" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-safeguards-overall.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-safeguards-overall.png" alt="AI models in biology rarely have safeguards" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a 
href="https://epoch.ai/assets/images/charts/sentinelbio-inference-time.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-inference-time.png" alt="Frontier LLMs rely more on inference-time filtering than biology-specific models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-accessibility.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-accessibility.png" alt="Most biological AI models share their code or data" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-finetuning-sources.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-finetuning-sources.png" alt="ESM-2 is the most common base model for finetuning" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sentinelbio-training-datasets.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sentinelbio-training-datasets.png" alt="The Protein Data Bank is the most commonly used training dataset" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a2e17ae9-19ec-4758-a142-8e047bf1b4a8</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a2e17ae9-19ec-4758-a142-8e047bf1b4a8.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Atanasov%252C%2520Niccol%25C3%25B2%2520Zanichelli%252C%2520Jean-Stanislas%2520Denain&amp;title=%22Expanding%20our%20analysis%20of%20biological%20AI%20models%22%20by%20David%20Atanasov%2C%20Niccol%C3%B2%20Zanichelli%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fexpanding-our-analysis-of-biological-ai-models&amp;created_at=2026-05-18T16%3A27%3A12.196293%2B00%3A00&amp;duration=1313" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/expanding-our-analysis-of-biological-ai-models</link>
      <itunes:duration>1313</itunes:duration>
    </item>
    <item>
      <title>“How persistent is the inference cost burden?” by Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: Toby Ord argues that RL scaling primarily increases inference costs, creating a persistent economic burden. While the framing is useful, the cost to reach a given capability level falls fast, and the RL scaling data is thin.&lt;/p&gt;  &lt;p&gt; Toby Ord has written a thoughtful post on how RL and inference compute scale for frontier AI models.&lt;/p&gt;
&lt;p&gt; As I understand it, the core of his argument is:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; (1) RL scaling primarily bears fruit by enabling models to productively use longer outputs, which means you need to scale inference compute to realize the gains.&lt;/li&gt;
&lt;li&gt; (2) RL scaling itself delivers poor returns, requiring roughly 10,000x more compute to match what 100x more inference provides.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; Combined with the fact that inference costs are per-use and can’t be amortized like training costs, this paints a picture of a significant and persistent economic burden as we shift away from pretraining scaling.&lt;/p&gt;
&lt;p&gt; There's a lot I agree with in Toby's analysis, and I find the framing useful. However, I think both claims above may be overstated. On (1): even though inference costs are per-use, the dollar cost to reach a given capability level falls rapidly over time, so [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:41) What I agree with&lt;/p&gt;&lt;p&gt;(02:46) Fixed-capability costs fall fast&lt;/p&gt;&lt;p&gt;(06:33) The returns to RL scaling might be higher&lt;/p&gt;&lt;p&gt;(09:07) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 5 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 16th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-persistent-is-the-inference-cost-burden?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-persistent-is-the-inference-cost-burden&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-persistent-is-the-inference-cost-burden/ScaleRL.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-persistent-is-the-inference-cost-burden/ScaleRL.png" alt="Graph comparing ScaleRL performance with other methods across GPU hours and pass rates." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">b4b89f2a-1ad6-4168-ba16-7a9ed773abd7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/b4b89f2a-1ad6-4168-ba16-7a9ed773abd7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain&amp;title=%22How%20persistent%20is%20the%20inference%20cost%20burden%3F%22%20by%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-persistent-is-the-inference-cost-burden&amp;created_at=2026-05-18T13%3A42%3A36.408425%2B00%3A00&amp;duration=615" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-persistent-is-the-inference-cost-burden</link>
      <itunes:duration>615</itunes:duration>
    </item>
    <item>
      <title>“What do “economic value” benchmarks tell us?” by Florian Brand, Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: These benchmarks track a wide range of digital work. Progress will correlate with economic utility, but tasks are too self-contained to indicate full automation.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We review three recently developed benchmarks that aim to measure whether AI systems can perform real-world, digital, non-coding tasks of economic value: Remote Labor Index (RLI), GDPval, and APEX-Agents.&lt;/p&gt;&lt;p&gt; We expect progress on these benchmarks to correlate with real utility. However, the benchmark tasks are well-defined and relatively self-contained. High scores on the benchmarks, therefore, would not imply end-to-end automation of digital professions. Instead, they would imply a shift in how these jobs are done, away from manual execution and toward delegating work to AI, similar to the effect that coding agents have on software engineering today.&lt;/p&gt;&lt;p&gt; The benchmarks also have important differences. We give a short take-away for each.&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; RLI measures AI ability to do multimedia projects that take humans several days. The first batch of evaluations likely under-elicited models, but this has been improved recently, and top scores are still very low (&amp;lt;5%).&lt;/li&gt;
&lt;li&gt; APEX-Agents measures AI ability to do tasks across classically high-paid white-collar jobs that take experts a couple of hours. Task instructions are self-contained, but the task [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(02:13) Example tasks&lt;/p&gt;&lt;p&gt;(04:46) Tasks are sourced in different ways, from different fields&lt;/p&gt;&lt;p&gt;(08:19) Lack of interaction and environment messiness affect task realism&lt;/p&gt;&lt;p&gt;(09:59) Estimated human time-to-complete varies substantially&lt;/p&gt;&lt;p&gt;(11:49) Web access makes evaluation a bit less reliable for GDPval&lt;/p&gt;&lt;p&gt;(13:12) Evaluation strategies differ substantially&lt;/p&gt;&lt;p&gt;(14:42) Models have made different amounts of progress on the benchmarks&lt;/p&gt;&lt;p&gt;(16:02) The chosen scaffolds may under-elicit capabilities&lt;/p&gt;&lt;p&gt;(19:28) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 13th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-do-economic-value-benchmarks-tell-us?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-do-economic-value-benchmarks-tell-us&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/what-do-economic-value-benchmarks-tell-us/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/what-do-economic-value-benchmarks-tell-us/figure-1.png" alt="Bar graph titled “Human task completion time varies considerably across 'economic value' benchmarks” showing completion times for three benchmarks." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">6e5db8a2-a3ce-4f63-9335-a28e7e5cfa5d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/6e5db8a2-a3ce-4f63-9335-a28e7e5cfa5d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Florian%2520Brand%252C%2520Greg%2520Burnham&amp;title=%22What%20do%20%E2%80%9Ceconomic%20value%E2%80%9D%20benchmarks%20tell%20us%3F%22%20by%20Florian%20Brand%2C%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-do-economic-value-benchmarks-tell-us&amp;created_at=2026-05-18T16%3A27%3A13.308507%2B00%3A00&amp;duration=1267" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-do-economic-value-benchmarks-tell-us</link>
      <itunes:duration>1267</itunes:duration>
    </item>
    <item>
      <title>“Where Autonomy Works: Evaluating Robot Capabilities in 2026” by Yann Rivière, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: We assess the current state of autonomous robotics by evaluating robot performance on concrete tasks across industrial, household, and navigation domains.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Impressive demos are not hard to come by in autonomous robotics. Forming a precise understanding of real-world capabilities is much harder: a task that looks solved in a demonstration may be brittle in deployment. This report assesses robot performance on concrete tasks across three domains (industrial, household, and navigation). For each task, we review the available evidence on reliability, speed, cost, and the ability to adapt (“transfer”) to new environments and objects.&lt;/p&gt;&lt;p&gt;&lt;strong&gt; Key takeaways&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Navigation is deployed commercially, while most industrial and household tasks are not. Autonomous robots already deliver food in multiple cities, transport goods in warehouses, and inspect infrastructure in remote environments with high reliability. Most tasks requiring robots to handle, assemble, or manipulate objects remain largely in the lab.&lt;/li&gt;
&lt;li&gt; Manipulation is commercially deployed in controlled environments with simple tasks, but mostly not beyond. Warehouse picking is the clearest example: robots can handle thousands of object types reliably, because the environment is stable, can be designed around the robot, and the task itself is straightforward. The further we move from that [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Introduction&lt;/p&gt;&lt;p&gt;(01:01) Key takeaways&lt;/p&gt;&lt;p&gt;(02:50) Foundation models have become the default for robot manipulation&lt;/p&gt;&lt;p&gt;(05:20) Methodology&lt;/p&gt;&lt;p&gt;(07:21) 1. Industrial applications&lt;/p&gt;&lt;p&gt;(10:13) Tasks we assess&lt;/p&gt;&lt;p&gt;(11:10) Industrial applications: Takeaways&lt;/p&gt;&lt;p&gt;(12:37) 1. Connect cables inside a PC case&lt;/p&gt;&lt;p&gt;(14:39) 2. Assemble IKEA furniture&lt;/p&gt;&lt;p&gt;(16:08) 3. Insert small objects with high precision&lt;/p&gt;&lt;p&gt;(19:01) 4. Sort and handle packages on a conveyor belt&lt;/p&gt;&lt;p&gt;(20:50) 5. Pick and place items of varying fragility and weight&lt;/p&gt;&lt;p&gt;(23:13) 2. Household&lt;/p&gt;&lt;p&gt;(24:42) Tasks we assess&lt;/p&gt;&lt;p&gt;(25:15) Household tasks: Takeaways&lt;/p&gt;&lt;p&gt;(26:59) 1. Cook simple meals&lt;/p&gt;&lt;p&gt;(30:01) 2. Water plants&lt;/p&gt;&lt;p&gt;(31:58) 3. Clean a kitchen&lt;/p&gt;&lt;p&gt;(35:13) 4. Take out the trash&lt;/p&gt;&lt;p&gt;(36:45) 5. Fold basic laundry&lt;/p&gt;&lt;p&gt;(39:59) 6. Tidy a bedroom&lt;/p&gt;&lt;p&gt;(42:18) 3. Navigation&lt;/p&gt;&lt;p&gt;(43:31) Tasks we assess&lt;/p&gt;&lt;p&gt;(44:12) Navigation: Takeaways&lt;/p&gt;&lt;p&gt;(45:23) 1. Transport a human in a building&lt;/p&gt;&lt;p&gt;(47:49) 2. Deliver food&lt;/p&gt;&lt;p&gt;(49:42) 3. 
Maneuver underwater&lt;/p&gt;&lt;p&gt;(53:30) Looking ahead&lt;/p&gt;&lt;p&gt;(54:35) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 10th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/where-autonomy-works-evaluating-robot-capabilities-in-2026?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/where-autonomy-works-evaluating-robot-capabilities-in-2026&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image8.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image8.png" alt="Chart showing autonomous robot development stages across industrial, household, and navigation tasks by deployment readiness level." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image18.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image18.png" alt="UC Berkeley’s robot assembling a simple IKEA shelf" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image11.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image11.jpg" alt="Figure’s robot handling packages on a conveyor belt" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image12.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image12.png" alt="Vulcan picking a pack of Skittles" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a 
href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image15.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image15.jpg" alt="Stretch unloading cardboard boxes from a container" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image7.png" alt="Zippy’s robotic arms cooking with pans and pots" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image6.png" alt="MindOn’s Unitree G1 robot watering flowers" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image13.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image13.png" alt="Figure’s robot loading dishes and adding detergent" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image3.png" alt="Physical Intelligence’s robotic 
arms storing cooking utensils" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image2.png" alt="Physical Intelligence’s robotic arms scraping food and cleaning dishes" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image5.png" alt="Figure’s robot folding towels" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image9.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image9.png" alt="Physical Intelligence’s robot moving clothes into a laundry basket" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image1.png" alt="Physical Intelligence’s robot making a bed" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image4.png" target="_blank"&gt;&lt;img 
src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image4.png" alt="WHILL’s autonomous wheelchair transporting someone into an elevator*" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image14.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image14.jpg" alt="Starship’s wheeled robot delivering food for Uber Eats*" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image16.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image16.png" alt="Zipline’s drone delivering food" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image17.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/where-autonomy-works-evaluating-robot-capabilities-in-2026/image17.jpg" alt="Advanced Navigation’s Hydrus navigating underwater" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 10 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">64334a76-2fdd-4fec-bf02-9407b1a88d18</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/64334a76-2fdd-4fec-bf02-9407b1a88d18.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Yann%2520Rivi%25C3%25A8re%252C%2520Jean-Stanislas%2520Denain&amp;title=%22Where%20Autonomy%20Works%3A%20Evaluating%20Robot%20Capabilities%20in%202026%22%20by%20Yann%20Rivi%C3%A8re%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhere-autonomy-works-evaluating-robot-capabilities-in-2026&amp;created_at=2026-05-18T16%3A09%3A20.442856%2B00%3A00&amp;duration=3302" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/where-autonomy-works-evaluating-robot-capabilities-in-2026</link>
      <itunes:duration>3302</itunes:duration>
    </item>
    <item>
      <title>“How close is AI to taking my job?” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Beyond benchmarks as leading indicators for task automation. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt; 1. Searching under the streetlight&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; How can we anticipate when AI will be able to do our jobs? AI researchers have mainly tried to answer this question by building complex AI benchmarks. The problem is that this approach is fundamentally flawed.&lt;/p&gt;&lt;p&gt; A good example of this is OpenAI's GDPval. On paper, it's a cool benchmark that captures AI performance on a wide range of real-world job tasks in the US economy. The benchmark tasks were meticulously constructed to be realistic, involving the hard work of hundreds of experts and likely millions of dollars, placing it among the most expensive economics papers of all time. If there's one benchmark that could be the leading indicator of AI job automation, it's GDPval.&lt;/p&gt;&lt;p&gt; Unfortunately, the benchmark seems to have fallen prey to the same issue plaguing most other benchmarks. Shortly after release, AI models beat the human baseline: GPT-5.2 reached parity with industry experts, and Claude Opus 4.6 likely does even better. And yet, the actual economic impacts of AI remain muted. The benchmark doesn’t fully reflect the economic effects, and so it's falling short in its role [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:16) 1. Searching under the streetlight&lt;/p&gt;&lt;p&gt;(03:09) 2. Trying to automate my job away (for science)&lt;/p&gt;&lt;p&gt;(04:48) Task 1: Replicating an interactive web interface for an economic model&lt;/p&gt;&lt;p&gt;(09:05) Task 2: Writing an article&lt;/p&gt;&lt;p&gt;(14:43) Task 3: Publishing an article&lt;/p&gt;&lt;p&gt;(19:56) 3. 
What this all means for my job, and perhaps yours too&lt;/p&gt;&lt;p&gt;(23:18) What about you and your job?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 9 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 6th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/announcing-gate-banner.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/announcing-gate-banner.png" alt="Dashboard showing &amp;quot;GATE — AI and Automation Scenario Explorer&amp;quot; with multiple graphs on automation and compute trends." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/claude_code_gate_reimplementation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/claude_code_gate_reimplementation.png" alt="Interactive economic simulation dashboard showing GWP growth projections and automation trends over time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/revenue_graph_feedback.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/revenue_graph_feedback.png" alt="Bar chart titled &amp;quot;AI Lab Revenue: Forecasters Underestimated by 90%&amp;quot; showing three revenue bars." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/substack_footnotes.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/substack_footnotes.png" alt="ChatGPT Agent successfully ports the main text of our last Gradient Update, but messes up the footnotes." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/daniel_litt_podcast.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/daniel_litt_podcast.png" alt="The bottleneck is shifting from us Epochians preparing good questions in advance, to being able to understand them and ask good follow ups during real conversation. But there may still be some time before Humanity’s Last Podcast. (Source: Epoch After Hours)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/cherny_claude_code.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-close-is-ai-to-taking-my-job/cherny_claude_code.png" alt="Boris Cherny tweets: &amp;quot;1/ I run 5 Claudes in parallel in my terminal. I number my tabs 1-5, and use system notifications to know when a Claude needs input&amp;quot;. The tweet includes a screenshot showing a terminal window with multiple tabs running Claude instances, displaying code with TypeScript imports and type definitions, along with bash commands for running typecheck and linting operations, with one tab showing &amp;quot;build completed successfully!&amp;quot; message." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 06 Feb 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ca72203a-3d35-47d1-ab90-08155c185b7f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ca72203a-3d35-47d1-ab90-08155c185b7f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22How%20close%20is%20AI%20to%20taking%20my%20job%3F%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-close-is-ai-to-taking-my-job&amp;created_at=2026-05-18T13%3A43%3A32.572499%2B00%3A00&amp;duration=1578" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job</link>
      <itunes:duration>1578</itunes:duration>
    </item>
    <item>
      <title>“Can AI companies become profitable?” by Jaime Sevilla, Hannah Petrovic, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Lessons from GPT-5's economics. &lt;/p&gt; 
&lt;p&gt; This post was written in collaboration with Exponential View.&lt;/p&gt;
&lt;p&gt; Update (March 6, 2026): We’ve revised our estimates based on new information and feedback from people familiar with the matter. In particular, we’ve 1) increased our estimate of inference costs given new information, and 2) lowered our estimate of sales and marketing spending after excluding inference compute costs associated with serving free users. This article reflects these updated figures.&lt;/p&gt;
&lt;p&gt; Are AI models profitable? If you ask Sam Altman and Dario Amodei, the answer seems to be yes — it just doesn’t appear that way on the surface.&lt;/p&gt;
&lt;p&gt; Here's the idea: running each AI model generates enough revenue to cover its own R&amp;amp;D costs. But that surplus gets outweighed by the costs of developing the next big model. So, despite making money on each model, companies can lose money each year.&lt;/p&gt;
&lt;p&gt; This is big if true. In fast-growing tech sectors, investors typically accept losses today in exchange for big profits down the line. So if AI models are already covering their own costs, that would paint a healthy financial outlook for AI companies.&lt;/p&gt;
&lt;p&gt; But we can’t take Altman and [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:29) Part I: How profitable is running AI models?&lt;/p&gt;&lt;p&gt;(08:13) Part II: Are models profitable over their lifecycle?&lt;/p&gt;&lt;p&gt;(11:07) Part III: Will AI models become profitable?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 26 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 28th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/can-ai-companies-become-profitable?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/can-ai-companies-become-profitable&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/can-ai-companies-become-profitable/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/can-ai-companies-become-profitable/figure-1.png" alt="A waterfall chart showing GPT-5 revenue breakdown from total revenue to net loss." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/can-ai-companies-become-profitable/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/can-ai-companies-become-profitable/figure-2.png" alt="Graph comparing OpenAI's GPT-5 R&amp;amp;D costs from April to August 2025 versus profits August to December." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">eb0c6eec-2ed1-47da-a774-4db629cc9896</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/eb0c6eec-2ed1-47da-a774-4db629cc9896.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Hannah%2520Petrovic%252C%2520Anson%2520Ho&amp;title=%22Can%20AI%20companies%20become%20profitable%3F%22%20by%20Jaime%20Sevilla%2C%20Hannah%20Petrovic%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fcan-ai-companies-become-profitable&amp;created_at=2026-05-18T13%3A45%3A40.211649%2B00%3A00&amp;duration=1007" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/can-ai-companies-become-profitable</link>
      <itunes:duration>1007</itunes:duration>
    </item>
    <item>
      <title>“How well did forecasters predict 2025 AI progress?” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Mostly right about benchmarks, mixed results on real-world impacts. &lt;/p&gt;  &lt;p&gt; This post was written in collaboration between the AI Futures Project, the AI Digest, and Epoch AI. It analyzes the results of the 2025 AI Digest survey. You can take the 2026 AI forecasting survey here.&lt;/p&gt;
&lt;p&gt; Every other AI paper I read seems to start with some version of “AI progress has been fast”. And sure, that's obviously true — a year ago there was no GPT-5, no DeepSeek-R1, and not even Claude 3.7 Sonnet! But few people seem to say exactly how fast things have been, and whether people saw it coming. So when the AI Digest released a survey for people to forecast AI progress over the last year, I was excited.&lt;/p&gt;
&lt;p&gt; The survey helps track something akin to an “AI 2027 worldview”, where automating AI R&amp;amp;D leads to a surge in AI capabilities and hence a range of risks to humanity, especially from handing power off to AI systems. You can see this in the question topics: about half of the survey is about forecasting performance on benchmarks related to AI R&amp;amp;D. The other half looks at real-world impacts — think AI-enabled [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:48) Demographics: Junior, short-ish timelines, high risk of AI catastrophe&lt;/p&gt;&lt;p&gt;(04:11) Benchmarks related to AI R&amp;amp;D: The median forecast was on the money (for the most part)&lt;/p&gt;&lt;p&gt;(08:46) OpenAI preparedness scores: mixed results on risks&lt;/p&gt;&lt;p&gt;(11:55) AI's prominence: underestimated revenue, and overestimated public perception&lt;/p&gt;&lt;p&gt;(12:32) Frontier lab revenues&lt;/p&gt;&lt;p&gt;(15:32) Public attention on AI&lt;/p&gt;&lt;p&gt;(16:59) Takeaways from the survey&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 16th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-well-did-forecasters-predict-2025-ai-progress?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-well-did-forecasters-predict-2025-ai-progress&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_experience.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_experience.png" alt="A histogram showing distribution of respondents' years of AI experience with median at 2 years." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_hlmi_years.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_hlmi_years.png" alt="A histogram showing respondent predictions for when HLMI will be developed, median 2030." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_hlmi_risk.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/demographics_hlmi_risk.png" alt="Histogram titled &amp;quot;Respondents vary significantly on the chances of HLMI risk&amp;quot; showing probability distribution." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/frontiermath_hlmi_year.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/frontiermath_hlmi_year.png" alt="Box plots comparing FrontierMath forecast performance by expected HLMI development timeline." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/eli_ege_timelines_crux.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/eli_ege_timelines_crux.png" alt="Note that at the time of writing the tweet, Eli had a median AGI timeline of around 2031, which still counts as having short timelines, even if he doesn’t fall into the “HLMI-by-2030” group of respondents." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/ai-2025-forecast-new.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/how-well-did-forecasters-predict-2025-ai-progress/ai-2025-forecast-new.png" alt="For the most updated version of this, see the “Summary” tab of the original forecasting survey." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 16 Jan 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">dac60b8d-622e-4d2d-9efc-ef4650bd6a44</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/dac60b8d-622e-4d2d-9efc-ef4650bd6a44.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22How%20well%20did%20forecasters%20predict%202025%20AI%20progress%3F%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-well-did-forecasters-predict-2025-ai-progress&amp;created_at=2026-05-18T20%3A12%3A22.6747%2B00%3A00&amp;duration=1200" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-well-did-forecasters-predict-2025-ai-progress</link>
      <itunes:duration>1200</itunes:duration>
    </item>
    <item>
      <title>“Epoch AI 2025 impact report” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: In 2025, Epoch AI published over a hundred outputs, more than doubled its reach and raised over ten million dollars.&lt;/p&gt;  &lt;p&gt; In 2025, we saw AI continue to increase in scale and importance. AI companies reached annual revenues totalling tens of billions of dollars, and are building data centers that individually cost comparable amounts. Leading benchmarks show capabilities accelerating, propped up by the establishment of reasoning models, such as OpenAI's oN model series. And we have seen an incredible diffusion of capabilities, with Chinese open-weight models such as DeepSeek R1 closing the gap with US frontier models released only months before.&lt;/p&gt;
&lt;p&gt; Epoch AI has responded with new and expanded initiatives to advance its mission of sharing up-to-date information about – and making sense of – the trajectory of AI. We are excited to share a recap of our work in 2025, and our plans for 2026.&lt;/p&gt;
&lt;p&gt; We are raising $3 million to execute a more ambitious version of our plans. Donations can be made directly through our website. For those considering a substantial contribution, or commissioning a project, please contact us at donate@epoch.ai.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Highlights from 2025&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt; AI data centers &amp;amp; compute clusters&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; AI [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:29) Highlights from 2025&lt;/p&gt;&lt;p&gt;(01:32) AI data centers &amp;amp; compute clusters&lt;/p&gt;&lt;p&gt;(02:29) The Benchmarking Hub &amp;amp; the Epoch Capabilities Index (ECI)&lt;/p&gt;&lt;p&gt;(03:56) FrontierMath Tier 4&lt;/p&gt;&lt;p&gt;(05:17) Growth and AI Transition Endogenous (GATE) model&lt;/p&gt;&lt;p&gt;(06:19) Data Insights &amp;amp; Gradient Updates&lt;/p&gt;&lt;p&gt;(07:44) AI in 2030&lt;/p&gt;&lt;p&gt;(08:44) Epoch AI by the numbers&lt;/p&gt;&lt;p&gt;(08:47) Outputs&lt;/p&gt;&lt;p&gt;(09:32) Reach&lt;/p&gt;&lt;p&gt;(10:22) Finances and organization&lt;/p&gt;&lt;p&gt;(10:53) Press and citations&lt;/p&gt;&lt;p&gt;(13:17) Paid engagements&lt;/p&gt;&lt;p&gt;(14:43) Events and other engagements&lt;/p&gt;&lt;p&gt;(15:46) Testimonials from our audience&lt;/p&gt;&lt;p&gt;(19:32) Governance and Transparency&lt;/p&gt;&lt;p&gt;(20:23) Our plans for 2026&lt;/p&gt;&lt;p&gt;(21:02) Data &amp;amp; Trends&lt;/p&gt;&lt;p&gt;(22:54) Evaluations &amp;amp; Benchmarks&lt;/p&gt;&lt;p&gt;(24:55) Research &amp;amp; Consultations&lt;/p&gt;&lt;p&gt;(26:03) Website and communications&lt;/p&gt;&lt;p&gt;(26:55) Support our work&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 16th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/epoch-impact-report-2025?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/epoch-impact-report-2025&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/datacenters.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/datacenters.png" alt="Datacenters" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/benchmarking.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/benchmarking.png" alt="Benchmarking Hub" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/fm-tier-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/fm-tier-4.png" alt="FrontierMath Tier 4" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/gate.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/gate.png" alt="GATE" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/dis-and-gus.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/dis-and-gus.png" alt="Data Insights and Gradient Updates" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/ai-2030.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/epoch-impact-report-2025/ai-2030.png" 
alt="AI in 2030" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/federal-reserve.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/federal-reserve.svg" alt="FEDERAL RESERVE (FEDS NOTES) logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/financial-times.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/financial-times.svg" alt="FINANCIAL TIMES logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/imf.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/imf.svg" alt="IMF logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/economist.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/economist.svg" alt="THE ECONOMIST logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/iea.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/iea.svg" alt="INTERNATIONAL ENERGY AGENCY (IEA) logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/bank-of-england.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/bank-of-england.svg" alt="BANK OF ENGLAND logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/oecd-ai.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/oecd-ai.svg" alt="OECD.AI (AI WONK) logo" style="max-width: 100%;" 
/&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/us-congress.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/us-congress.svg" alt="U.S. CONGRESS / CRS (CONGRESS.GOV) logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/ted.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/ted.svg" alt="TED TALKS logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/ai-2027.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/ai-2027.png" alt="AI 2027 logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 16 Jan 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">dc9943e9-77bf-4191-82db-a6c065aef1a0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/dc9943e9-77bf-4191-82db-a6c065aef1a0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Epoch%20AI%202025%20impact%20report%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fepoch-impact-report-2025&amp;created_at=2026-05-18T16%3A27%3A14.106549%2B00%3A00&amp;duration=1747" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/epoch-impact-report-2025</link>
      <itunes:duration>1747</itunes:duration>
    </item>
    <item>
      <title>“Introducing the AI Chip Sales Data Explorer” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We announce our new AI Chip Sales data explorer, which uses financial reports, company disclosures, and more to estimate compute, power usage, and spending over time for a wide variety of AI chips.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Discussions about AI progress increasingly hinge on computing capacity – aka compute – which is essential in order to develop, train, and deploy AI systems. But public data on the total capacity of AI computing hardware can be fragmented and incomplete.&lt;/p&gt;&lt;p&gt; To address this,
we are releasing a new AI Chip Sales data explorer, estimating and visualizing both the number and capacity of AI accelerators that have been sold or delivered in recent years. We leverage data and evidence from earnings reports, company disclosures, analyst coverage, and media reporting to produce estimates of AI chip counts across major vendors: Nvidia, Google, Amazon, AMD, and Huawei, broken down by AI chip model.&lt;/p&gt;&lt;p&gt; We believe this release provides the most complete publicly available picture to date on the global stock of AI compute.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Compute&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We find that cumulative global AI compute capacity has reached the equivalent of more than 15 million Nvidia H100 GPUs, measured using each chip's respective peak specifications in 8-bit operations [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:26) Introduction&lt;/p&gt;&lt;p&gt;(01:23) Compute&lt;/p&gt;&lt;p&gt;(02:18) Costs and power&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 13th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-ai-chip-sales-data-explorer?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-ai-chip-sales-data-explorer&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-1.png" alt="Stacked area chart titled “AI Chip Sales” showing cumulative compute capacity by company." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-2.png" alt="A stacked bar chart showing AI chip sales share of cost by chip type over quarters." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-3.png" alt="A stacked bar chart showing AI chip sales cost by designer across quarters." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2026/introducing-the-ai-chip-sales-data-explorer/figure-4.png" alt="Stacked bar chart showing cumulative AI chip sales power by designer across quarters." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ff64b35f-9382-457e-a18e-2b47b609072c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ff64b35f-9382-457e-a18e-2b47b609072c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Introducing%20the%20AI%20Chip%20Sales%20Data%20Explorer%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-ai-chip-sales-data-explorer&amp;created_at=2026-05-18T16%3A27%3A15.171171%2B00%3A00&amp;duration=216" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-ai-chip-sales-data-explorer</link>
      <itunes:duration>216</itunes:duration>
    </item>
    <item>
      <title>“An FAQ on Reinforcement Learning Environments” by Jean-Stanislas Denain, Chris Barber</title>
      <description>&lt;p&gt; Subtitle: We interviewed 18 people across RL environment startups, neolabs, and frontier labs about the state of the field and where it's headed.&lt;/p&gt;  &lt;p&gt; This post is a collaboration between guest author Chris Barber and JS Denain from Epoch AI.&lt;/p&gt;
&lt;p&gt; Reinforcement learning (RL) environments have become central to how frontier AI labs train their models. In September 2025, The Information reported that Anthropic had discussed spending over $1 billion on RL environments over the following year. As Andrej Karpathy put it in his 2025 year-in-review: by training LLMs on a wide range of verifiable tasks across different environments, “the LLMs spontaneously develop strategies that look like ‘reasoning’ to humans.”&lt;/p&gt;
&lt;p&gt; This wave of RL for capabilities started with OpenAI's o1, which was trained on math and coding problems with verifiable answers. Since then, labs have expanded the range of tasks they train on, all the while scaling the amount of compute spent on RL training.&lt;/p&gt;
&lt;p&gt; Without diverse, high-quality environments and tasks to train on, throwing more compute at RL risks wasting much of it. As a result, creating those tasks and environments has become a key bottleneck for scaling capabilities, and a growing market that remains largely [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:47) What are RL environments and tasks?&lt;/p&gt;&lt;p&gt;(05:50) How are RL environments used by labs?&lt;/p&gt;&lt;p&gt;(08:06) Which companies build RL Environments?&lt;/p&gt;&lt;p&gt;(10:08) How much do environments and tasks cost?&lt;/p&gt;&lt;p&gt;(12:33) What domains do RL environments cover?&lt;/p&gt;&lt;p&gt;(16:22) What are the top priorities and challenges?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 10 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 12th, 2026 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/state-of-rl-envs?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/state-of-rl-envs&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2026/state-of-rl-envs/example_rl.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2026/state-of-rl-envs/example_rl.png" alt="Diagram showing reinforcement learning agent interacting with simulated Slack workspace environment." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 12 Jan 2026 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">5f10b116-fe3d-48dd-927f-5e221c306e48</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/5f10b116-fe3d-48dd-927f-5e221c306e48.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Chris%2520Barber&amp;title=%22An%20FAQ%20on%20Reinforcement%20Learning%20Environments%22%20by%20Jean-Stanislas%20Denain%2C%20Chris%20Barber&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fstate-of-rl-envs&amp;created_at=2026-05-18T13%3A47%3A36.235206%2B00%3A00&amp;duration=1272" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/state-of-rl-envs</link>
      <itunes:duration>1272</itunes:duration>
    </item>
    <item>
      <title>“How far can decentralized training over the internet scale?” by Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: Decentralized training over the internet promises to scale training to the limits of the internet.&lt;/p&gt;  &lt;p&gt; Previously, I discussed decentralized training in the context of hyperscalers. Microsoft, Google and other giants are building interconnected gigawatt-scale datacenters, which could be used to train models at an unprecedented computational scale. The decentralization could sidestep the difficulty of securing 10 gigawatts of power in a single location by splitting one massive run into ten more manageable gigawatt-scale blocks.&lt;/p&gt;
&lt;p&gt; But when people think of decentralized training, they don’t first think of gigantic datacenters, owned by the same company, training models across large distances. Instead, they imagine thousands of small datacenters, or individual consumers, pooling their spare compute over the internet to orchestrate a training run larger than any single actor could manage alone.&lt;/p&gt;
&lt;p&gt; Many companies are pursuing this vision: Pluralis Research, Prime Intellect and Nous Research have already trained models decentrally at scale. But in practice, training decentrally over the internet has lagged far behind more centralized training. Even their largest models (Pluralis’ 8B Protocol Model, Prime Intellect's INTELLECT-1, and Nous’ Consilience 40B) have been trained with 1,000x less compute than today's frontier models (such as xAI's Grok [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:42) Is decentralized training over the internet feasible?&lt;/p&gt;&lt;p&gt;(04:04) Decentralized data parallelism&lt;/p&gt;&lt;p&gt;(08:26) Decentralized model parallelism&lt;/p&gt;&lt;p&gt;(10:28) Decentralized RL training&lt;/p&gt;&lt;p&gt;(12:49) Putting it all together: decentralized internet training at frontier scale is likely feasible&lt;/p&gt;&lt;p&gt;(15:15) Can decentralized developers amass the necessary compute?&lt;/p&gt;&lt;p&gt;(20:22) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 29th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-far-can-decentralized-training-over-the-internet-scale?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-far-can-decentralized-training-over-the-internet-scale&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-1.png" alt="Slide showing three categories of decentralized training techniques with bullet points and examples." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-2.png" alt="Figure from Douillard et al. (2023). Each node holds a replica of the model, and trains independently for a number of inner steps before synchronizing across nodes." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-3.png" alt="Figure from the INTELLECT-2 release post, illustrating asynchronous RL training." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-4.png" alt="Graph showing training compute growth from 2021 to 2026, comparing centralized and decentralized models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-decentralized-training-over-the-internet-scale/figure-5.png" alt="Bar graph comparing effective throughput of decentralized AI networks to Bitcoin equivalent." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 29 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">08393229-b670-49bd-9499-640989e670ea</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/08393229-b670-49bd-9499-640989e670ea.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla&amp;title=%22How%20far%20can%20decentralized%20training%20over%20the%20internet%20scale%3F%22%20by%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-far-can-decentralized-training-over-the-internet-scale&amp;created_at=2026-05-18T13%3A49%3A37.683763%2B00%3A00&amp;duration=1393" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-far-can-decentralized-training-over-the-internet-scale</link>
      <itunes:duration>1393</itunes:duration>
    </item>
    <item>
      <title>“Why benchmarking is hard” by Florian Brand, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: Running benchmarks involves many moving parts, each of which can influence the final score. The two most impactful components are scaffolds and API providers.&lt;/p&gt;  &lt;p&gt; This post is part of our Gradient Updates newsletter, which shares more opinionated or informal takes about big questions in AI progress. These posts solely represent the views of the authors, and do not necessarily reflect the views of Epoch AI as a whole.&lt;/p&gt;
&lt;p&gt; Benchmarks play a crucial role in the AI landscape: They inform everyone, from AI researchers to the general public, about the current state of capabilities and the overall rate of progress. Third-party organizations, such as Epoch AI, independently run and collate benchmark results on a page like the benchmarking hub.&lt;/p&gt;
&lt;p&gt; However, benchmarking isn’t easy. At each stage of the benchmarking pipeline, there are many moving parts and degrees of freedom that can affect the final result, which makes it hard to compare any two evaluation scores. Moreover, each stage can introduce bugs or mistakes that make the results invalid or costly to obtain.&lt;/p&gt;

&lt;p&gt; In this post, we dive into the different steps of the benchmarking process, which we split into two main parts:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; Benchmark [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:03) Main takeaways&lt;/p&gt;&lt;p&gt;(02:33) The Benchmark Setup&lt;/p&gt;&lt;p&gt;(02:55) Prompts &amp;amp; Sampling Parameters&lt;/p&gt;&lt;p&gt;(06:12) Scaffolds continue to have an outsized impact&lt;/p&gt;&lt;p&gt;(08:01) Execution Environment&lt;/p&gt;&lt;p&gt;(09:15) Scoring&lt;/p&gt;&lt;p&gt;(10:01) Model Access&lt;/p&gt;&lt;p&gt;(10:15) API &amp;amp; SDK&lt;/p&gt;&lt;p&gt;(11:10) API Aggregator&lt;/p&gt;&lt;p&gt;(11:41) Model Provider&lt;/p&gt;&lt;p&gt;(15:29) Model Deployment&lt;/p&gt;&lt;p&gt;(16:04) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 5 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 23rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/why-benchmarking-is-hard?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/why-benchmarking-is-hard&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/benchmark-evaluation-pipeline-v3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/benchmark-evaluation-pipeline-v3.png" alt="Diagram showing benchmarking stages with choices impacting final scores in two columns." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/swebench_comparison-colors.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/swebench_comparison-colors.png" alt="Bar chart showing SWE-bench Verified scores across three agent scaffolds for GPT-5 and Kimi K2 models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/GLM_4.6_GPQA_Diamond-def.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/GLM_4.6_GPQA_Diamond-def.png" alt="Graph showing GLM-4.6's accuracy scores on GPQA Diamond across different API providers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/Kimi_K2_0905_Instruct_GPQA_Diamond.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-benchmarking-is-hard/Kimi_K2_0905_Instruct_GPQA_Diamond.png" alt="A scatter plot showing accuracy scores for Kimi K2 0905 Instruct across different API providers." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">0e04ffc5-41ad-487a-8322-1dd992bc77e8</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/0e04ffc5-41ad-487a-8322-1dd992bc77e8.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Florian%2520Brand%252C%2520Jean-Stanislas%2520Denain&amp;title=%22Why%20benchmarking%20is%20hard%22%20by%20Florian%20Brand%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhy-benchmarking-is-hard&amp;created_at=2026-05-18T20%3A12%3A50.660009%2B00%3A00&amp;duration=1046" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/why-benchmarking-is-hard</link>
      <itunes:duration>1046</itunes:duration>
    </item>
    <item>
      <title>“Top 10 Data Insights and Gradient Updates of 2025” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: In 2025 we released over 70 short-form investigations of AI. We review the 10 most popular ones on our website.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In 2025, we ramped up our public communication to keep pace with rapid developments in AI.&lt;/p&gt;&lt;p&gt; Our Data Insights offer short, visual, self-contained investigations of key trends and metrics in AI.&lt;/p&gt;&lt;p&gt; Gradient Updates is our outlet for leading-edge commentary by specific authors (also offered as a newsletter on Substack), without necessarily representing the views of Epoch AI as a whole.&lt;/p&gt;&lt;p&gt; Over the year, we published 36 Data Insights and 37 Gradient Updates.&lt;/p&gt;&lt;p&gt; Here, we bring you our top 10 most popular Data Insights and Gradient Updates in 2025.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Most popular Data Insights&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt; LLM inference prices have fallen rapidly but unequally across tasks&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; 🠊 In short: Between April 2023 and March 2025, we saw a drop of more than 10x in the price per token at an equivalent performance level.&lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: LLM inference prices have fallen 9x to 900x per year, depending on the task &lt;/p&gt;&lt;p&gt; 🠊 Why this matters: API cost reductions indicate a more competitive market and large gains in efficiency, making AI more affordable to [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:23) Introduction&lt;/p&gt;&lt;p&gt;(01:06) Most popular Data Insights&lt;/p&gt;&lt;p&gt;(01:10) LLM inference prices have fallen rapidly but unequally across tasks&lt;/p&gt;&lt;p&gt;(01:55) Frontier AI performance becomes accessible on consumer hardware within a year&lt;/p&gt;&lt;p&gt;(03:03) Most of OpenAI's 2024 compute went to experiments&lt;/p&gt;&lt;p&gt;(03:52) The stock of computing power from NVIDIA chips is doubling every 10 months&lt;/p&gt;&lt;p&gt;(04:38) GPT-5 and GPT-4 were both major leaps in benchmarks from the previous generation&lt;/p&gt;&lt;p&gt;(05:21) Most popular Gradient Updates&lt;/p&gt;&lt;p&gt;(05:25) How much energy does ChatGPT use?&lt;/p&gt;&lt;p&gt;(06:20) How has DeepSeek improved the Transformer architecture?&lt;/p&gt;&lt;p&gt;(07:29) How far can reasoning models scale?&lt;/p&gt;&lt;p&gt;(08:21) How big could an "AI Manhattan Project" get?&lt;/p&gt;&lt;p&gt;(09:20) Most AI value will come from broad automation, not from R&amp;amp;D&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 23rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/top-10-data-insights-and-gradient-updates-of-2025?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/top-10-data-insights-and-gradient-updates-of-2025&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/data-insights/llm-inference-price-trends.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/data-insights/llm-inference-price-trends.png" alt="LLM inference prices have fallen 9x to 900x/year, depending on the task" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/data-insights/consumer-gpu-model-gap-gpqa-diamond.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/data-insights/consumer-gpu-model-gap-gpqa-diamond.png" alt="Models that fit on a single consumer GPU trail the absolute frontier by less than a year." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/data-insights/openai-compute-spend.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/data-insights/openai-compute-spend.png" alt="Most of OpenAI’s 2024 compute went to experiments" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/nvidia-chip-production.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/nvidia-chip-production.png" alt="Total installed NVIDIA computing power by GPU generation" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/data-insights/gpt-capabilities-progress.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/data-insights/gpt-capabilities-progress.png" alt="GPT-5 is an incremental update to the frontier, but a major leap from GPT-4" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a 
href="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-much-energy-does-chatgpt-use-figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-much-energy-does-chatgpt-use-figure-1.png" alt="Bar graph titled &amp;quot;Energy consumption per ChatGPT query is small compared to everyday electricity use&amp;quot;" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-has-deepseek-improved-the-transformer-architecture-figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-has-deepseek-improved-the-transformer-architecture-figure-1.png" alt="Architecture diagram of DeepSeekMoE transformer model with MLA routing system." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-far-can-reasoning-models-scale-figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-far-can-reasoning-models-scale-figure-1.png" alt="Graph showing AI training compute scaling trends and projected convergence by 2025-2026." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-big-could-an-ai-manhattan-project-get-figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/how-big-could-an-ai-manhattan-project-get-figure-3.png" alt="Bar chart showing investment levels as percentage of GDP from initial through 2027, titled &amp;quot;Manhattan Project levels of investment would capture most of US NVIDIA compute.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/most-ai-value-will-come-from-broad-automation-not-from-r-d-figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/top-10-data-insights-and-gradient-updates-of-2025/most-ai-value-will-come-from-broad-automation-not-from-r-d-figure-1.png" alt="A stacked bar chart showing GDP growth rate and R&amp;amp;D contributions across periods." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">81d9130b-35aa-4599-b63c-e7f1454f7813</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/81d9130b-35aa-4599-b63c-e7f1454f7813.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Top%2010%20Data%20Insights%20and%20Gradient%20Updates%20of%202025%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftop-10-data-insights-and-gradient-updates-of-2025&amp;created_at=2026-05-18T16%3A27%3A16.252721%2B00%3A00&amp;duration=693" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/top-10-data-insights-and-gradient-updates-of-2025</link>
      <itunes:duration>693</itunes:duration>
    </item>
    <item>
      <title>“The changing drivers of LLM adoption” by Jean-Stanislas Denain, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Public data as well as our original polling suggest LLM adoption is roughly on trend, but the underlying drivers are shifting.&lt;/p&gt;  &lt;p&gt; In the world of AI, half a year is a very long time. Back in July, we saw LLMs being adopted faster than almost any other technology in history. Five months later we’re still seeing rapid growth, but we’re also seeing early winds of change — both in who uses AI and how they do so.&lt;/p&gt;
&lt;p&gt; Using the latest public data and a poll of US adults we conducted with Blue Rose Research, this post shares an updated picture of the state of LLM adoption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; How quickly are consumers adopting LLMs?&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt; More people are using LLMs — but they’re increasingly using different LLMs, different products, and in different places&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Through the first half of 2025, ChatGPT's user base grew at a remarkable pace, from under 400 million weekly active users in January to nearly 800 million by August — roughly 50 million new users per month. Since then, growth may have slowed slightly, though it's a bit soon to tell how much of this is noise versus a lasting trend change:&lt;/p&gt;&lt;p&gt; Does this mean [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:52) How quickly are consumers adopting LLMs?&lt;/p&gt;&lt;p&gt;(00:56) More people are using LLMs -- but they're increasingly using different LLMs, different products, and in different places&lt;/p&gt;&lt;p&gt;(03:57) Consumers are using LLMs much more intensely on AI apps, while web traffic has stagnated&lt;/p&gt;&lt;p&gt;(07:13) AI company revenues have continued to grow incredibly fast, in line with previous trends&lt;/p&gt;&lt;p&gt;(08:07) How embedded is AI in daily tasks and jobs?&lt;/p&gt;&lt;p&gt;(08:24) AI has entered the workplace beyond formal enterprise adoption&lt;/p&gt;&lt;p&gt;(10:11) Most consumers use AI to seek information&lt;/p&gt;&lt;p&gt;(12:41) AI use is stratified by income and job type, and less so by gender&lt;/p&gt;&lt;p&gt;(14:44) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 19th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-changing-drivers-of-llm-adoption?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-changing-drivers-of-llm-adoption&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/GPT-users.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/GPT-users.png" alt="Graph showing weekly active users of ChatGPT from 2024 to 2026." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/GPT-vs-gemini.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/GPT-vs-gemini.png" alt="Line graph showing weekly visits to major LLM chatbots from September to December 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/work-side-by-side.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/work-side-by-side.png" alt="Two bar graphs showing AI usage at work and workplace AI chatbot access among registered voters." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/use-cases.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/use-cases.png" alt="Bar graph showing AI usage by registered voters for different tasks in the past week." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/income.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/income.png" alt="Bar chart showing AI service usage across income groups over past 7 days." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/age.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-changing-drivers-of-llm-adoption/age.png" alt="Bar graph showing AI service usage by age group over past 7 days." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 19 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">47d5aa16-8b8c-4c73-baaa-113fc9ffbde7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/47d5aa16-8b8c-4c73-baaa-113fc9ffbde7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Anson%2520Ho&amp;title=%22The%20changing%20drivers%20of%20LLM%20adoption%22%20by%20Jean-Stanislas%20Denain%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-changing-drivers-of-llm-adoption&amp;created_at=2026-05-18T14%3A04%3A21.648713%2B00%3A00&amp;duration=950" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-changing-drivers-of-llm-adoption</link>
      <itunes:duration>950</itunes:duration>
    </item>
    <item>
      <title>“Is almost everyone wrong about America’s AI power problem?” by Anson Ho, Yafah Edelman, Josh You, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: Why power is less of a bottleneck than you think.&lt;/p&gt;  &lt;p&gt; In AI circles, there's a common argument that goes: “The US is horrible at building power, but China's great at it. And since power is so important for the AI race, China wins by default.”&lt;/p&gt;
&lt;p&gt; This line of reasoning is everywhere. NVIDIA CEO Jensen Huang used it to argue that “China is going to win the AI race” last month. It features in Situational Awareness, a series of essays about how the world's in a fierce race to AGI, which received a seal of endorsement from Ivanka Trump. There's even an entire Dwarkesh podcast episode called “China is killing the US on energy. Does that mean they’ll win AGI?”.&lt;/p&gt;
&lt;p&gt; But we think this argument is overstated — power bottlenecks likely won’t dramatically or permanently impede the data center buildout in the US. Claims about America's AI power predicament are partially based on a misunderstanding, and there are multiple promising approaches to meet America's AI power demands. That means that people are overrating the strength of the power bottleneck, and how much that impacts the “race to AGI”.&lt;/p&gt;
&lt;p&gt; So why do we believe this, and [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:28) America's AI power predicament -- or not?&lt;/p&gt;&lt;p&gt;(04:26) Plucking fruit from the power tree&lt;/p&gt;&lt;p&gt;(04:45) Natural gas: Digging power out of the ground&lt;/p&gt;&lt;p&gt;(09:15) Solar: Raining power down from the sky&lt;/p&gt;&lt;p&gt;(13:13) Demand response: Acquiring power "out of thin air"&lt;/p&gt;&lt;p&gt;(15:55) Adding up the numbers&lt;/p&gt;&lt;p&gt;(18:20) So is everyone wrong about this?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 16 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/is-almost-everyone-wrong-about-americas-ai-power-problem?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/is-almost-everyone-wrong-about-americas-ai-power-problem&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/us_china_power.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/us_china_power.png" alt="Graph showing &amp;quot;American and Chinese Power Generation&amp;quot; comparing USA and China electricity generation trends with AI demand projections." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/real-power-price.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/real-power-price.png" alt="Data from the Federal Reserve Bank of St Louis, dividing average electricity prices by the Consumer Price Index (with January 2025 = 100)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/forecasted-total-capacity.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/forecasted-total-capacity.png" alt="Graph showing &amp;quot;Forecasted total capacity of U.S. AI data centers&amp;quot; with multiple projection scenarios through 2030." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/natural_gas_buildout.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/natural_gas_buildout.png" alt="Bar graph showing &amp;quot;US gas-fired generation capacity additions (GW)&amp;quot; over time with historical and forecast data." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/learning_curves.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/learning_curves.png" alt="Line graph showing electricity price changes from new power plants between 2009 and 2024." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/computational-costs.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-almost-everyone-wrong-about-americas-ai-power-problem/computational-costs.png" alt="Bar chart comparing power costs versus capital expenditures on compute per gigawatt." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 17 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">cf1f9dee-01b1-47de-8461-d4e275e6bee5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/cf1f9dee-01b1-47de-8461-d4e275e6bee5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Yafah%2520Edelman%252C%2520Josh%2520You%252C%2520Jean-Stanislas%2520Denain&amp;title=%22Is%20almost%20everyone%20wrong%20about%20America%E2%80%99s%20AI%20power%20problem%3F%22%20by%20Anson%20Ho%2C%20Yafah%20Edelman%2C%20Josh%20You%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fis-almost-everyone-wrong-about-americas-ai-power-problem&amp;created_at=2026-05-18T14%3A35%3A53.409623%2B00%3A00&amp;duration=1364" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/is-almost-everyone-wrong-about-americas-ai-power-problem</link>
      <itunes:duration>1364</itunes:duration>
    </item>
    <item>
      <title>“A Rosetta Stone for AI benchmarks” by Anson Ho, Jean-Stanislas Denain, David Atanasov, Samuel Albanie, Rohin Shah</title>
      <description>&lt;p&gt; Subtitle: Most benchmarks saturate too quickly to study long-run AI trends. We solve this using a statistical framework that stitches benchmarks together, with big implications for algorithmic progress and AI forecasting.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We rely on benchmarks to measure AI capabilities, but even the best benchmarks are just narrow glimpses into what AI can do.&lt;/p&gt;&lt;p&gt; Consider a benchmark. If a model is really bad, it will score 0% on the benchmark. But the same is true for a model that's extremely bad — the benchmark offers no signal to distinguish these two models, even though one is much better than the other.&lt;/p&gt;&lt;p&gt; Similarly, a model that's really good will score 100% — but so will a model that's extremely good. We can’t tell these good models apart either.&lt;/p&gt;&lt;p&gt; We can only compare models when they’re in the middle — not too good and not too bad. And since models improve so quickly, their time in the middle is really short, so we can’t see long-run trends in whether AI progress is speeding up, slowing down, or hitting a wall.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; A New Approach&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; So how do we solve this? We propose a new approach: using a statistical model, we [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Introduction&lt;/p&gt;&lt;p&gt;(01:33) A New Approach&lt;/p&gt;&lt;p&gt;(03:55) What this framework tells us&lt;/p&gt;&lt;p&gt;(07:10) Next steps&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 2nd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/a-rosetta-stone-for-ai-benchmarks?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/a-rosetta-stone-for-ai-benchmarks&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-1.png" alt="Graph showing &amp;quot;Benchmark performance&amp;quot; with model capability on x-axis and percentage on y-axis." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-2.png" alt="Diagram showing how disparate AI benchmarks unify into a single quantitative difficulty scale." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-3.png" alt="Graph showing benchmark performance versus model capability minus benchmark difficulty, with three distinct regimes labeled." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-4.png" alt="Two bar graphs showing AI model capability scores and benchmark difficulty scores with error bars." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-5.png" alt="Scatter plot titled &amp;quot;Frontier model capabilities have grown steadily over time&amp;quot; showing estimated capability versus release date." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-6.png" alt="Graph showing AI capability forecast from 2023 to 2029 with confidence interval." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-7.png" alt="Graph showing training compute decreasing over time as algorithmic quality improves from July 2024 to January 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-8.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/a-rosetta-stone-for-ai-benchmarks/figure-8.png" alt="Graph showing model capability acceleration detection over time from 2020 to 2028." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 02 Dec 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">49cadc91-7b27-41f2-b85b-dc28b187314d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/49cadc91-7b27-41f2-b85b-dc28b187314d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Jean-Stanislas%2520Denain%252C%2520David%2520Atanasov%252C%2520Samuel%2520Albanie%252C%2520Rohin%2520Shah&amp;title=%22A%20Rosetta%20Stone%20for%20AI%20benchmarks%22%20by%20Anson%20Ho%2C%20Jean-Stanislas%20Denain%2C%20David%20Atanasov%2C%20Samuel%20Albanie%2C%20Rohin%20Shah&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fa-rosetta-stone-for-ai-benchmarks&amp;created_at=2026-05-18T16%3A27%3A17.166212%2B00%3A00&amp;duration=557" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/a-rosetta-stone-for-ai-benchmarks</link>
      <itunes:duration>557</itunes:duration>
    </item>
    <item>
      <title>“Benchmark Scores = General Capability + Claudiness” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: Is this because skills generalize very well, or because developers are pushing on all benchmarks at once?&lt;/p&gt;  &lt;p&gt; The Gemini 3 release included a massive table showing how the model was state-of-the-art on nineteen diverse benchmarks. Such tables are commonplace by now, but they add up to an odd statistical situation. Benchmarks ostensibly measure different things, but since models tend to improve on many benchmarks at once, the dataset of benchmark scores is dominated by a single “General Capability” dimension.&lt;/p&gt;
&lt;p&gt; In this post, I’ll describe the statistics of this dataset, look into what's left when you factor out this dominant dimension (hint: it's “Claudiness”), and discuss how this relates to an important question about cross-task generalization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Benchmarking data is dominated by a single underlying dimension&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; This is one of the lessons of our recent work on the Epoch Capabilities Index (ECI), which combines thirty-nine benchmarks into a single capabilities score. If benchmarks were generally uncorrelated with each other, you’d expect to see large residuals: the benchmark scores predicted by a model's ECI number wouldn’t match the model's actual benchmark scores. As it turns out, we see a very good match. In other words, our nominally high-dimensional [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:00) Benchmarking data is dominated by a single underlying dimension&lt;/p&gt;&lt;p&gt;(03:08) Benchmarking data shows a smaller "Claudiness" dimension&lt;/p&gt;&lt;p&gt;(04:16) Is the "general capability" dimension deep, or contingent?&lt;/p&gt;&lt;p&gt;(06:09) A trillion dollar question&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 3 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 20th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/benchmark-scores-general-capability-claudiness?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/benchmark-scores-general-capability-claudiness&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/benchmark-scores-general-capability-claudiness/residuals.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/benchmark-scores-general-capability-claudiness/residuals.png" alt="Scatter plot titled “The Epoch Capabilities Index (ECI) predicts benchmark scores very well in aggregate” showing positive correlation." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/benchmark-scores-general-capability-claudiness/pca.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/benchmark-scores-general-capability-claudiness/pca.png" alt="Scree plot titled “A single dimension explains nearly half of the variance in benchmark scores.”" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a0c0fe61-5514-4385-9f01-a65914602578</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a0c0fe61-5514-4385-9f01-a65914602578.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22Benchmark%20Scores%20%3D%20General%20Capability%20%2B%20Claudiness%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fbenchmark-scores-general-capability-claudiness&amp;created_at=2026-05-18T20%3A09%3A24.772021%2B00%3A00&amp;duration=496" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/benchmark-scores-general-capability-claudiness</link>
      <itunes:duration>496</itunes:duration>
    </item>
    <item>
      <title>“The software intelligence explosion debate needs experiments” by Anson Ho, Parker Whitfill</title>
      <description>&lt;p&gt; Subtitle: The existing debate rests on data and assumptions that are shakier than most people realize. To make progress, we need better evidence, and experiments are the best way to get it on the margin.&lt;/p&gt;  &lt;p&gt; Suppose you had a million AIs, each surpassing humanity's best AI researchers. If they all worked on advancing AI, how much would AI progress accelerate?&lt;/p&gt;
&lt;p&gt; This might sound like science fiction, but it may be the most consequential question about the future of AI. The problem is that the experts disagree wildly on the answer.&lt;/p&gt;
&lt;p&gt; Some foresee a positive feedback loop. These AIs are smart enough to find new algorithms to make smarter AIs, which make even smarter AIs, and so on. Very soon, we could see multiple years of AI progress compressed into a single year just through software advances: a “software intelligence explosion”.&lt;/p&gt;
&lt;p&gt; Others agree that AI progress would speed up, but think that something will block the explosive feedback loop. For example, increasing difficulty in finding new algorithms might bottleneck AI self-improvement, or software improvements might depend heavily on physical resources like compute, which can’t be scaled as easily.&lt;/p&gt;
&lt;p&gt; And we really need to know [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:55) Flawed data&lt;/p&gt;&lt;p&gt;(06:42) Flawed models&lt;/p&gt;&lt;p&gt;(09:32) To make progress, we need experiments&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 19 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 14th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-software-intelligence-explosion-debate-needs-experiments?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-software-intelligence-explosion-debate-needs-experiments&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-software-intelligence-explosion-debate-needs-experiments/software-explosion-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-software-intelligence-explosion-debate-needs-experiments/software-explosion-4.png" alt="Estimates of the returns to software R&amp;amp;D for three different domains of AI: (1) computer vision, (2) reinforcement learning, and (3) language models. Technical details are provided in the appendix." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 14 Nov 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f60aa9cc-79c3-4285-aade-76259ed6cf82</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f60aa9cc-79c3-4285-aade-76259ed6cf82.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Parker%2520Whitfill&amp;title=%22The%20software%20intelligence%20explosion%20debate%20needs%20experiments%22%20by%20Anson%20Ho%2C%20Parker%20Whitfill&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-software-intelligence-explosion-debate-needs-experiments&amp;created_at=2026-05-18T11%3A24%3A57.43223%2B00%3A00&amp;duration=807" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-software-intelligence-explosion-debate-needs-experiments</link>
      <itunes:duration>807</itunes:duration>
    </item>
    <item>
      <title>“Introducing the Frontier Data Centers Hub” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We announce our new Frontier Data Centers Hub, a database tracking large AI data centers using satellite and permit data to show compute, power use, and construction timelines.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Companies are building AI data centers at an unprecedented scale. These facilities have the power capacity of small countries and cost tens of billions to construct. Yet until now, the details of their true capacity and progress have remained opaque. To help the public, researchers, policymakers, and investors understand the scale of this new infrastructure wave, Epoch AI has created the Frontier Data Centers Hub.&lt;/p&gt;&lt;p&gt; This open database tracks the construction and capacity of major AI data centers using satellite imagery, public permits, and other open sources. It's the most detailed public resource to date on how much power, land, and hardware the largest AI companies are deploying — and when.&lt;/p&gt;&lt;p&gt; The 13 large U.S. data centers tracked in the hub account for a substantial share of total compute stock globally: about 2.5 million (~15%) of the roughly 15 million H100-equivalents that have been delivered to customers in the past several years as of late 2025.&lt;/p&gt;&lt;p&gt; You can read more about how AI data centers work in our [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(01:40) Power&lt;/p&gt;&lt;p&gt;(04:23) Compute&lt;/p&gt;&lt;p&gt;(06:21) Cost&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 4th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-frontier-data-centers-hub?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-frontier-data-centers-hub&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/anthropic-amazon-new-carlisle.gif" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/anthropic-amazon-new-carlisle.gif" alt="Build-out of Anthropic/Amazon New Carlisle.
Image source: Copernicus Sentinel data 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/openai-stargate-abilene.gif" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/openai-stargate-abilene.gif" alt="Build-out of OpenAI Stargate Abilene.
Image source: Copernicus Sentinel data 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/openai-stargate-abilene.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-frontier-data-centers-hub/openai-stargate-abilene.png" alt="High-resolution satellite image of OpenAI Stargate Abilene on September 26, 2025: Building 1 and Building 2 likely fully operational.
Image source: Copernicus Sentinel data 2025. From Airbus, delivered by Apollo Mapping." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">7a2dfb97-1349-4791-acde-bb99f8642d29</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/7a2dfb97-1349-4791-acde-bb99f8642d29.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Introducing%20the%20Frontier%20Data%20Centers%20Hub%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-frontier-data-centers-hub&amp;created_at=2026-05-18T16%3A29%3A48.358541%2B00%3A00&amp;duration=521" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-frontier-data-centers-hub</link>
      <itunes:duration>521</itunes:duration>
    </item>
    <item>
      <title>“What you need to know about AI data centers” by Ben Cottier, Yafah Edelman</title>
      <description>&lt;p&gt; Subtitle: AI companies are planning a buildout of data centers that will rank among the largest infrastructure projects in history. We examine their power demands, what makes AI data centers special, and what all this means for AI policy and the future of AI.&lt;/p&gt; &lt;p&gt; This report accompanies our Frontier Data Centers Hub.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; It's difficult to appreciate the historic scale of AI data centers. They represent some of the largest infrastructure projects humanity has ever created.&lt;/p&gt;&lt;p&gt; To get a sense of the scale, consider that OpenAI's Stargate Abilene data center will need:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Enough electricity to serve the population of Seattle&lt;/li&gt;
&lt;li&gt; More than 250× the computing power of the supercomputer that trained GPT-4&lt;/li&gt;
&lt;li&gt; A plot of land larger than 450 soccer fields&lt;/li&gt;
&lt;li&gt; $32 billion in construction and IT equipment costs&lt;/li&gt;
&lt;li&gt; A few thousand construction workers&lt;/li&gt;
&lt;li&gt; Around two years for construction&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt; And that's just a small part of the picture. Companies are currently building many other data centers like Stargate Abilene. By the end of 2027, AI data centers could collectively see hundreds of billions in investment, rivalling the Apollo program and Manhattan Project.&lt;/p&gt;&lt;p&gt; This raises many questions, such as:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:32) Introduction&lt;/p&gt;&lt;p&gt;(01:56) Power - the most important thing to know about an AI data center&lt;/p&gt;&lt;p&gt;(02:02) Power determines where AI data centers are built&lt;/p&gt;&lt;p&gt;(05:07) Where power comes from&lt;/p&gt;&lt;p&gt;(07:07) What's so special about AI data centers?&lt;/p&gt;&lt;p&gt;(07:11) AI data centers have exceptionally high power densities&lt;/p&gt;&lt;p&gt;(08:52) Huge power densities call for unique cooling systems&lt;/p&gt;&lt;p&gt;(11:03) What does this all mean for AI progress and policy?&lt;/p&gt;&lt;p&gt;(11:08) AI's broad climate impact isn't very big (yet)&lt;/p&gt;&lt;p&gt;(12:31) Companies probably won't need to decentralize AI training over the next two years&lt;/p&gt;&lt;p&gt;(13:45) Gigawatt-scale AI data centers are hard to secure&lt;/p&gt;&lt;p&gt;(15:06) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 25 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 4th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-you-need-to-know-about-ai-data-centers?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-you-need-to-know-about-ai-data-centers&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/map.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/map.png" alt="Some of the most compute-intensive AI data centers in the US are planned to be built in the Midwest and the Southern US, where power is typically easier to access. These states tend to have greater natural gas availability and less regulatory “red tape”." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/server-rack.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/server-rack.jpg" alt="An NVIDIA NVL72 server rack holds 72 Blackwell GPUs, which are arranged close together in rows on the rack. Source: Travis P Ball/Sipa USA via Reuters Connect" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/cooling-plate.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/cooling-plate.png" alt="Example of a plate that’s mounted onto a GPU to help with liquid cooling. Coolant flows in through one outlet and out the other, before transferring heat to water in the main cooling infrastructure of the data center. 
Source: jetcool" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/stargate.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-you-need-to-know-about-ai-data-centers/stargate.png" alt="Zooming in on OpenAI’s Stargate Abilene data center, we see rows of cooling equipment (in black) surrounding data center buildings (in white/orange). Source: from Airbus, delivered by Apollo Mapping." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">35478e9f-37a7-46d5-978a-14e4c2624725</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/35478e9f-37a7-46d5-978a-14e4c2624725.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier%252C%2520Yafah%2520Edelman&amp;title=%22What%20you%20need%20to%20know%20about%20AI%20data%20centers%22%20by%20Ben%20Cottier%2C%20Yafah%20Edelman&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-you-need-to-know-about-ai-data-centers&amp;created_at=2026-05-18T16%3A27%3A19.226647%2B00%3A00&amp;duration=982" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-you-need-to-know-about-ai-data-centers</link>
      <itunes:duration>982</itunes:duration>
    </item>
    <item>
      <title>“What does OSWorld tell us about AI’s ability to use computers?” by Florian Brand, Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: We review OSWorld, a prominent computer use benchmark. Its tasks are relatively simple, many don’t require GUIs, and success often hinges on interpreting ambiguous instructions. It is also not stable over time.&lt;/p&gt; &lt;p&gt; OSWorld: Computer use&lt;/p&gt; &lt;p&gt; OSWorld is a benchmark for evaluating large language models on computer use tasks. A model is given task instructions and an Ubuntu virtual machine and must execute actions to perform the task.&lt;/p&gt;  &lt;ul&gt; &lt;li&gt; 
 Size: 361 computer use tasks &lt;/li&gt; &lt;li&gt; 
 Data sourcing: Humans, forums, tutorials, etc. &lt;/li&gt; &lt;li&gt; 
 Scoring method: Evaluation function &lt;/li&gt; &lt;li&gt; 
 Contamination risk: Medium &lt;/li&gt; &lt;/ul&gt; 
&lt;p&gt; If AI systems are to be digital coworkers, they will need to be able to use computers. In this article we review OSWorld, a popular benchmark designed to measure progress toward this milestone. What is in this benchmark, how should we interpret progress, and what will it mean when AI systems score near-perfect, i.e. have saturated it?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Main Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; Saturation on OSWorld means a model can execute simple, realistic tasks in Linux-based environments using popular open-source applications. These include things like adding page numbers to a document or exporting a CSV file from [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:21) OSWorld setup and evaluation&lt;/p&gt;&lt;p&gt;(04:56) OSWorld task instructions are continually updated, making through-time comparisons of uncertain value&lt;/p&gt;&lt;p&gt;(05:42) Saturation on OSWorld means a model can do simple, realistic tasks in Linux-based environments using popular open-source applications&lt;/p&gt;&lt;p&gt;(06:14) Most tasks are simple&lt;/p&gt;&lt;p&gt;(07:49) Tasks use a variety of applications&lt;/p&gt;&lt;p&gt;(08:30) The titular OS is Linux and applications are open source, but this is probably not a major issue&lt;/p&gt;&lt;p&gt;(09:31) Terminal use and Python scripting can go a long way&lt;/p&gt;&lt;p&gt;(09:47) About 15% of tasks can be completed using only a terminal&lt;/p&gt;&lt;p&gt;(10:59) About 30% of tasks can substitute terminal use and Python scripting for much GUI use&lt;/p&gt;&lt;p&gt;(12:58) Many instructions are borderline-ambiguous and discerning the instruction's intent is a significant part of succeeding&lt;/p&gt;&lt;p&gt;(16:05) About 10% of OSWorld tasks have serious errors&lt;/p&gt;&lt;p&gt;(18:11) About 10% of OSWorld tasks rely on live data from the Internet, and thus their difficulty may change over time&lt;/p&gt;&lt;p&gt;(19:10) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 9 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 30th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-does-osworld-tell-us-about-ais-ability-to-use-computers?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-does-osworld-tell-us-about-ais-ability-to-use-computers&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-1.png" alt="Example starting state for a task, with a LibreOffice Impress file open.
The task instruction is, “Make a duplicate of the last two slides for me, please.”" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/os-world-task-steps.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/os-world-task-steps.png" alt="Most OSWorld tasks can be completed in fewer than 10 steps" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/os-world-applications.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/os-world-applications.png" alt="OSWorld tasks involve a diverse set of applications" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-2.png" alt="Terminal window showing Python package installation for pandas and openpyxl." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-3.png" alt="Slide 3 of the associated file, with the relevant textbox selected." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-does-osworld-tell-us-about-ais-ability-to-use-computers/figure-4.png" alt="Slide 4 of the associated file, with the relevant group object selected, showing its two constituent textboxes." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 30 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f65a3139-6064-4ed8-a9f0-854c1b184aa0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f65a3139-6064-4ed8-a9f0-854c1b184aa0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Florian%2520Brand%252C%2520Greg%2520Burnham&amp;title=%22What%20does%20OSWorld%20tell%20us%20about%20AI%E2%80%99s%20ability%20to%20use%20computers%3F%22%20by%20Florian%20Brand%2C%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-does-osworld-tell-us-about-ais-ability-to-use-computers&amp;created_at=2026-05-18T16%3A52%3A33.213425%2B00%3A00&amp;duration=1218" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-does-osworld-tell-us-about-ais-ability-to-use-computers</link>
      <itunes:duration>1218</itunes:duration>
    </item>
    <item>
      <title>“Could decentralized training solve AI’s power problem?” by Jaime Sevilla, Anton Troynikov</title>
      <description>&lt;p&gt; Subtitle: We illustrate a decentralized 10-gigawatt training run across a dozen sites spanning thousands of kilometers. Developers are likely to scale datacenters to multi-gigawatt levels before adopting decentralized training.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In their quest to make smarter AI, companies vie to build, and power, the largest datacenters.&lt;/p&gt;&lt;p&gt; xAI built the 350-megawatt Colossus datacenter in Memphis, which they plan to expand to 1.5 gigawatts, while OpenAI's 240-megawatt Abilene Stargate datacenter is planned to reach 1.2 gigawatts. At full scale, these single datacenters will rival the most power-hungry facilities in the world today, such as the 1.2-gigawatt Maaden or the 1.6-gigawatt Bahrain aluminium smelters.&lt;/p&gt;&lt;p&gt; And if trends continue, companies might soon exceed this, with training clusters projected to reach 10 gigawatts by the end of the decade. This is larger than the capacity of the US's largest power plant, the Grand Coulee Dam, and nearly matches the total installed power capacity for all NVIDIA AI chips at the end of 2024.&lt;/p&gt;&lt;p&gt; But utilities are already struggling to supply the power AI hyperscalers demand. 
John Ketchum, CEO of the largest US utility, NextEra, stated last year that while some sites could readily support one gigawatt [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Introduction&lt;/p&gt;&lt;p&gt;(03:45) Planning a ten-gigawatt training run&lt;/p&gt;&lt;p&gt;(05:37) What kind of model would we train?&lt;/p&gt;&lt;p&gt;(07:17) How would we decentralize training?&lt;/p&gt;&lt;p&gt;(09:15) Can decentralized training maintain sufficient throughput at very large scales?&lt;/p&gt;&lt;p&gt;(10:25) How long to process each batch?&lt;/p&gt;&lt;p&gt;(11:47) How much time will we spend on the network?&lt;/p&gt;&lt;p&gt;(13:07) Would the propagation latency be low enough?&lt;/p&gt;&lt;p&gt;(14:48) Could bandwidth be a bottleneck?&lt;/p&gt;&lt;p&gt;(17:26) Is the network cost feasible?&lt;/p&gt;&lt;p&gt;(20:09) Will companies actually turn to decentralized training at scale?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 28 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 28th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/could-decentralized-training-solve-ais-power-problem?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/could-decentralized-training-solve-ais-power-problem&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-1.png" alt="Map showing fiber optic network loop connecting hypothetical data centers at power plants across the United States." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-2.png" alt="A log-log graph showing critical batch size versus training tokens for language models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-3.png" alt="Learn more about our assumptions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-4.png" alt="Infographic comparing network installation costs of $410 million to $90 billion data center construction costs."
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/could-decentralized-training-solve-ais-power-problem/figure-5.png" alt="Comparison of pros and cons for decentralized versus centralized training at gigawatt scale." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 28 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ac18fca9-4f2b-446a-8cdf-fb73e746814a</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ac18fca9-4f2b-446a-8cdf-fb73e746814a.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Anton%2520Troynikov&amp;title=%22Could%20decentralized%20training%20solve%20AI%E2%80%99s%20power%20problem%3F%22%20by%20Jaime%20Sevilla%2C%20Anton%20Troynikov&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fcould-decentralized-training-solve-ais-power-problem&amp;created_at=2026-05-18T16%3A52%3A36.401732%2B00%3A00&amp;duration=1707" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/could-decentralized-training-solve-ais-power-problem</link>
      <itunes:duration>1707</itunes:duration>
    </item>
    <item>
      <title>“Less than 70% of FrontierMath is within reach for today’s models” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: 57% of problems have been solved at least once.&lt;/p&gt;  &lt;p&gt; The best we have seen a model perform on a single run of FrontierMath is 29%. If you want to use a model to solve FrontierMath-style problems, that's the number to consider.&lt;/p&gt;
&lt;p&gt; But there's another way to gauge state-of-the-art performance: how many FrontierMath problems have been solved by any model, on any run, even once? This tells us more about what is “within reach” for today's models. It's also more forward-looking: if today's models can generate the right ideas to solve a problem at all, then that makes it more likely that tomorrow's models will be able to solve the problem reliably. In other words, we can get a view of the future that's a bit more concrete than just extrapolating accuracy trends.&lt;/p&gt;
&lt;p&gt; To make matters more interesting, there's some empirical evidence that if you run an LLM on a benchmark N times, the percentage of problems correctly solved at least once (known as pass@N) increases proportionally to log(N). If that's true in general, then, since log(N) is unbounded, we should expect pass@N to approach 100% as the models are given more tries. Could FrontierMath's saturation [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:26) GPT-5's pass@N caps out below 50%&lt;/p&gt;&lt;p&gt;(03:14) GPT-5 Pass@N on FrontierMath Tiers 1-3&lt;/p&gt;&lt;p&gt;(04:21) Pass@the-kitchen-sink likely caps out below 70%&lt;/p&gt;&lt;p&gt;(06:48) ChatGPT Agent Pass@N on FrontierMath Tiers 1-3&lt;/p&gt;&lt;p&gt;(08:52) This gives us something to watch as models improve on FrontierMath&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-1.png" alt="Graph showing GPT-5 performance on FrontierMath problems by sample size, with logarithmic fit." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-2.png" alt="Bar chart showing FrontierMath problem solve rates across three difficulty tiers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models/pass-n-3.png" alt="Graph showing AI accuracy progress on FrontierMath benchmark from 2024 to 2030." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f8026796-23a0-4bc0-bb4d-b6b6d112e971</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f8026796-23a0-4bc0-bb4d-b6b6d112e971.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22Less%20than%2070%25%20of%20FrontierMath%20is%20within%20reach%20for%20today%E2%80%99s%20models%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fless-than-70-percent-of-frontiermath-is-within-reach-for-todays-models&amp;created_at=2026-05-18T20%3A13%3A11.257546%2B00%3A00&amp;duration=643" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/less-than-70-percent-of-frontiermath-is-within-reach-for-todays-models</link>
      <itunes:duration>643</itunes:duration>
    </item>
    <item>
      <title>“OpenAI is projecting unprecedented revenue growth” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: No company has gone from 10 billion dollars to 100 billion dollars as fast as OpenAI projects to do.&lt;/p&gt;  &lt;p&gt; Epoch's new AI companies database shows the remarkable level and pace of growth of OpenAI's revenue. It first exceeded $1 billion in 2023 and will exceed $10 billion in 2025. This is impressive, but not unprecedented — a few other companies have matched this growth rate historically.&lt;/p&gt;
&lt;p&gt; OpenAI's projections, however, are a different story. According to The Information, in Q3 2025 OpenAI projected its 2028 revenue to be $100 billion. I couldn’t find any examples of a company growing its revenue from around $10 billion to $100 billion in such a short period of time.&lt;/p&gt;
&lt;p&gt; What happens if OpenAI falls short of these projections? At a minimum, it would likely have to scale back its plans for large compute build-outs. The recently-announced deals with Nvidia, AMD, and Broadcom imply expenditures of roughly $1.3 trillion within the next decade, and some of this is presumably expected to be financed by revenue or debt raised against revenue.&lt;/p&gt;
&lt;p&gt; But the second-order effects of a miss could be larger. This is because investors and other companies are increasingly betting [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:18) OpenAI's revenue grew very quickly from $1B to $10B&lt;/p&gt;&lt;p&gt;(03:51) No company has grown its revenue from $10 billion to $100 billion in three years&lt;/p&gt;&lt;p&gt;(06:34) Can OpenAI do it?&lt;/p&gt;&lt;p&gt;(09:29) What if not?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 15th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/openai-is-projecting-unprecedented-revenue-growth?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/openai-is-projecting-unprecedented-revenue-growth&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-2-margins.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-2-margins.png" alt="Graph comparing revenue growth trajectories of OpenAI, Google, Uber, Cheniere, and Moderna over five years." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-3-margins.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-3-margins.png" alt="A line graph showing OpenAI's projected revenue growth compared to major tech companies." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-1-margins.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/openai-is-projecting-unprecedented-revenue-growth/revenue-1-margins.png" alt="A line graph showing OpenAI's annualized revenue growth from 2024 to 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 15 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">04865af2-4f44-475f-b1c0-e171faeaf2f4</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/04865af2-4f44-475f-b1c0-e171faeaf2f4.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22OpenAI%20is%20projecting%20unprecedented%20revenue%20growth%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fopenai-is-projecting-unprecedented-revenue-growth&amp;created_at=2026-05-18T14%3A04%3A25.621063%2B00%3A00&amp;duration=615" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/openai-is-projecting-unprecedented-revenue-growth</link>
      <itunes:duration>615</itunes:duration>
    </item>
    <item>
      <title>“Evaluating Gemini 2.5 Deep Think’s math capabilities” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: It has improved at using background knowledge and doing precise computations. It can be a helpful research assistant and may take a more conceptual approach to geometry. It shows limited creativity and sometimes struggles with citations.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We evaluated the math capabilities of Gemini 2.5 Deep Think (hereafter, Deep Think), the publicly available version of the model that got a gold medal-equivalent score on the International Mathematical Olympiad (IMO). What are its strengths and weaknesses, both in absolute terms and relative to other models? To our knowledge, this is the most comprehensive third-party evaluation conducted to date on such a “high compute” model setting.&lt;/p&gt;&lt;p&gt; Note: This work was commissioned by Google. Epoch maintained editorial control over the output. We offer timely and in-depth evaluation as a service to model developers; email info@epoch.ai for details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive Summary&lt;/strong&gt;&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Deep Think set a new record on FrontierMath Tiers 1–3 (29%) and Tier 4 (10%), representing an improved ability to solve short-answer math problems that require deep background knowledge and precise execution of computations.&lt;/li&gt;
&lt;li&gt; Two professional mathematicians characterized Deep Think as a generally helpful research assistant, broadly on par with the best available models.&lt;/li&gt;
&lt;li&gt; While this [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Introduction&lt;/p&gt;&lt;p&gt;(01:13) Executive Summary&lt;/p&gt;&lt;p&gt;(02:32) Methodology&lt;/p&gt;&lt;p&gt;(03:46) Deep Think's performance on FrontierMath indicates advances in background knowledge and executing complex computations&lt;/p&gt;&lt;p&gt;(06:48) Deep Think Performance vs. Problem Ratings&lt;/p&gt;&lt;p&gt;(10:44) Mathematicians characterized Deep Think as a helpful research assistant, though one noted a weakness at citing the literature&lt;/p&gt;&lt;p&gt;(16:01) Deep Think did well on the 2025 IMO, but failed to solve two older IMO problems requiring more creative and intricate proofs&lt;/p&gt;&lt;p&gt;(26:17) Deep Think's approach to geometry is more conceptual than we have seen from other models&lt;/p&gt;&lt;p&gt;(31:06) We observed Deep Think making one mistake that is reminiscent of classical human cognitive biases&lt;/p&gt;&lt;p&gt;(32:59) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 19 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 9th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/deep-think-math?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/deep-think-math&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image1.png" alt="Bar chart comparing AI model accuracy on FrontierMath benchmark across difficulty tiers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image2.png" alt="A bar graph showing Deep Think accuracy scores across FrontierMath difficulty tiers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image3.png" alt="Mathematical problem about finite words, permutations, and symmetric groups." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image4.png" alt="Mathematical text defining a function F on words and asking about expected descents in permutations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image5.png" alt="Citation and clarification for Bousquet-Mélou paper on sorted permutations from 2000."
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image6.png" alt="Mathematical equations showing recurrence relations for expected number of descents with solution formulas." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image7.png" alt="Exercise problem about drawing a simply connected region and logarithm branches." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image8.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image8.png" alt="Graph showing 'Simply connected region Ω for k=2' with circular paths and labeled points." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image9.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image9.png" alt="Two grids showing purple paths with arrows, marked with letter M." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image10.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image10.png" alt="Strategy explanation for Turbo's game with three attempts described."
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image11.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image11.png" alt="Mathematical text explaining adversarial strategy requiring minimum 2023 attempts in worst case." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image12.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image12.png" alt="Mathematical problem about aquaesulian functions on rational numbers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image13.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image13.png" alt="Mathematical proof discussing aquaesulian functions and proving an upper bound." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image14.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image14.png" alt="Mathematical text describing characterization of aquaesulian functions, including additive property equation." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image15.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image15.png" alt="Text snippet discussing re-examining an argument about symmetry property." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image16.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image16.png" alt="Text excerpt on dark background discussing an argument in a thought block." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image17.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image17.png" alt="Graph showing a parabola opening upward and a straight line intersecting it." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image18.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image18.png" alt="Mathematical solution showing rotation of parabola and intersection point analysis." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image19.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image19.png" alt="Mathematical text describing geometric properties of triangle AMN and its circumcircle." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image20.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image20.png" alt="Mathematical equations and calculations showing circle and distance relationships." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/deep-think-math/image21.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/deep-think-math/image21.png" alt="Educational text explaining how to calculate the number of trucks needed to transport flagstones." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 09 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">8fd2aa78-ae18-4c43-b2b8-3530c1cfe7e6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/8fd2aa78-ae18-4c43-b2b8-3530c1cfe7e6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22Evaluating%20Gemini%202.5%20Deep%20Think%E2%80%99s%20math%20capabilities%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fdeep-think-math&amp;created_at=2026-05-18T16%3A52%3A37.655696%2B00%3A00&amp;duration=2069" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/deep-think-math</link>
      <itunes:duration>2069</itunes:duration>
    </item>
    <item>
      <title>“How many digital workers could OpenAI deploy?” by Jean-Stanislas Denain, Anson Ho, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: OpenAI has the inference compute to deploy tens of millions of digital workers, but only on a narrow set of tasks – for now.&lt;/p&gt;  &lt;p&gt; The core argument for how AI could drive explosive economic growth is that you can dramatically scale up the number of AI “digital workers”. The idea is that growth is constrained by labor, so rapidly expanding the workforce would hugely accelerate growth rates.&lt;/p&gt;
&lt;p&gt; This is where AI comes in. While you can’t double the human population each year, you can double the number of AI chips – as we’ve seen with NVIDIA and OpenAI. So if AI can fully substitute for human workers, the workforce could grow many times faster than today. As a result, the economy could grow many times faster too.&lt;/p&gt;
&lt;p&gt; If this framing is right, then to know how far we are from explosive growth, we need to answer three questions. First, how many AI “digital workers” can be deployed today? Second, how far is AI from fully substituting for human workers? And third, how are both of these changing over time?&lt;/p&gt;
&lt;p&gt; In this post, we’ll take a stab at the first question: On the tasks that [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:16) Estimating the number of digital workers that frontier labs can deploy&lt;/p&gt;&lt;p&gt;(06:26) What do these numbers tell us?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 14 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 3rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-many-digital-workers-could-openai-deploy?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-many-digital-workers-could-openai-deploy&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-many-digital-workers-could-openai-deploy/digital_worker_distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-many-digital-workers-could-openai-deploy/digital_worker_distribution.png" alt="Distribution graph titled “Simulated Workers” showing statistical data with 400K to 320M range." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-many-digital-workers-could-openai-deploy/gu-digital-workers-margins.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-many-digital-workers-could-openai-deploy/gu-digital-workers-margins.png" alt="Infographic comparing GPT-5 daily token generation capacity to human token processing on automatable tasks." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 03 Oct 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">044ad9d0-df22-4b02-a262-641720874955</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/044ad9d0-df22-4b02-a262-641720874955.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Anson%2520Ho%252C%2520Jaime%2520Sevilla&amp;title=%22How%20many%20digital%20workers%20could%20OpenAI%20deploy%3F%22%20by%20Jean-Stanislas%20Denain%2C%20Anson%20Ho%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-many-digital-workers-could-openai-deploy&amp;created_at=2026-05-18T14%3A04%3A26.825628%2B00%3A00&amp;duration=626" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-many-digital-workers-could-openai-deploy</link>
      <itunes:duration>626</itunes:duration>
    </item>
    <item>
      <title>“Introducing the AI Companies Data Hub” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: Our new AI Companies Data Hub tracks key economic and operational data, including frontier AI companies’ revenue, funding, valuations, staff counts, compute spending, and product usage. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The AI industry has changed rapidly in recent years, with frontier companies like OpenAI and Anthropic seeing fast exponential growth in their revenues and valuations. This growth has important implications for AI's trajectory: AI companies are continually improving their technology by scaling up compute and labor inputs, and their revenue and usage track how AI is already impacting the world.&lt;/p&gt;&lt;p&gt; To help researchers, policymakers, and the public understand these trends, we have created our AI Companies Data Hub. The hub tracks financial and operational metrics—revenue, funding, staff, usage, and compute spend—for the key companies developing frontier AI models, along with interactive visualizations. This supplements our data hubs on AI models and GPU clusters and provides a more holistic view of the resource inputs and economic impact of the AI industry.&lt;/p&gt;&lt;p&gt; Our data shows that the combined revenues of OpenAI and Anthropic grew around 10x since early 2024. OpenAI's annualized revenue reached $13 billion in August 2025, up from $5 billion at the beginning of the year, while Anthropic's revenue [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 30th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-ai-companies-data-hub?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-ai-companies-data-hub&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-1.png" alt="Line graph showing annualized revenue growth trends for four AI companies." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-2.png" alt="Step chart showing cumulative funding over time for OpenAI, Anthropic, xAI, and Mistral AI." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-3.png" alt="Line graph showing staff count growth of AI companies from 2017 to 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/introducing-the-ai-companies-data-hub/figure-4.png" alt="Bar chart showing annual compute spend in USD by AI companies from 2022 to 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 30 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">3f8a0158-889f-4189-94e5-a6ca7c268d80</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/3f8a0158-889f-4189-94e5-a6ca7c268d80.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Introducing%20the%20AI%20Companies%20Data%20Hub%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-ai-companies-data-hub&amp;created_at=2026-05-18T16%3A52%3A38.353004%2B00%3A00&amp;duration=248" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-ai-companies-data-hub</link>
      <itunes:duration>248</itunes:duration>
    </item>
    <item>
      <title>“Why GPT-5 used less training compute than GPT-4.5 (but GPT-6 probably won’t)” by Yafah Edelman, Jean-Stanislas Denain, Jaime Sevilla, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: OpenAI focused on scaling post-training on a smaller model. &lt;/p&gt;  &lt;p&gt; Out of all the GPT models, GPT-5 is the odd one out. Unlike all previous versions of GPT, it was likely trained on less compute than its immediate predecessor, GPT-4.5.&lt;/p&gt;
&lt;p&gt;While the exact numbers are uncertain, GPT-4.5 very likely used more training compute than GPT-5.&lt;/p&gt;
&lt;p&gt; But this leads to a puzzle: Models trained with more compute tend to be better, so why did OpenAI train GPT-5 with less compute than GPT-4.5? And what will this mean for future OpenAI models?&lt;/p&gt;
&lt;p&gt; In this post, we’ll argue that the answers to these questions are the following:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; GPT-5 used less training compute than GPT-4.5 because OpenAI focused on scaling post-training. New post-training techniques made it possible to outperform GPT-4.5 with less training compute, but these methods likely weren’t yet mature enough to be applied at GPT-4.5's compute scale. Doing so would’ve taken more time (and compute), which OpenAI likely chose not to do due to strong market pressures.&lt;/li&gt;
&lt;li&gt; OpenAI's next flagship model (“GPT-6”) will probably be trained on more compute than GPT-4.5: When OpenAI figures out how to productively scale post-training, they’ll likely shift [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:22) GPT-5 used less training compute than GPT-4.5 because OpenAI focused on scaling post-training&lt;/p&gt;&lt;p&gt;(05:58) GPT-6 will probably be trained on more compute than GPT-4.5&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 12 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 26th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont/gpt-5-gu-big.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont/gpt-5-gu-big.png" alt="While the exact numbers are uncertain, GPT-4.5 very likely used more training compute than GPT-5." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">68be4ecf-a173-4890-a5ec-200a9b777c68</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/68be4ecf-a173-4890-a5ec-200a9b777c68.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Yafah%2520Edelman%252C%2520Jean-Stanislas%2520Denain%252C%2520Jaime%2520Sevilla%252C%2520Anson%2520Ho&amp;title=%22Why%20GPT-5%20used%20less%20training%20compute%20than%20GPT-4.5%20(but%20GPT-6%20probably%20won%E2%80%99t)%22%20by%20Yafah%20Edelman%2C%20Jean-Stanislas%20Denain%2C%20Jaime%20Sevilla%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhy-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont&amp;created_at=2026-05-18T14%3A04%3A27.530307%2B00%3A00&amp;duration=518" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont</link>
      <itunes:duration>518</itunes:duration>
    </item>
    <item>
      <title>“The huge potential implications of long-context inference” by Jean-Stanislas Denain, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Continual learning, scaling RL, and research feedback loops. &lt;/p&gt;  &lt;p&gt; On paper, modern LLMs can ingest many books’ worth of text in one go. For example, Gemini 2.5 Pro has a “context window” of 1 million tokens, enough to stuff in ten copies of Harry Potter and the Philosopher's Stone. But what if we could do lots of inference with much longer contexts? What if LLMs could take in 10 billion tokens of context, and we had the hardware and algorithms to make this usable in practice?&lt;/p&gt;
&lt;p&gt; The naive use case is being able to take in ever-longer documents. But we think the implications of long context inference could be much greater:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; It provides an angle of attack on the ability to continually learn new knowledge after the model is deployed, one of the biggest bottlenecks to the real-world utility of current AI systems.&lt;/li&gt;
&lt;li&gt; It supports a ton of RL scaling: doing more reasoning, verifying model outputs, and generating high-quality RL environments.&lt;/li&gt;
&lt;li&gt; But there are also bottlenecks. As RL scales to longer runs, research iteration cycles slow down. And you’ll also need a lot of hardware and algorithmic progress so that long-context inference [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:59) Extremely long context inference provides an angle of attack on continual learning&lt;/p&gt;&lt;p&gt;(06:23) Being able to do lots of long context inference supports more RL scaling&lt;/p&gt;&lt;p&gt;(09:32) Bottlenecks: Slower research iteration times and potentially rising costs&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 19th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-huge-potential-implications-of-long-context-inference?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-huge-potential-implications-of-long-context-inference&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/context_windows_science.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/context_windows_science.png" alt="A scatter plot showing growth in LLM context window and benchmark performance over time from 2023-2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/karpathy_continual_learning.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/karpathy_continual_learning.png" alt="Algorithm description explaining a meta-prompt learning system for task improvement." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/insight_output_length.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-huge-potential-implications-of-long-context-inference/insight_output_length.png" alt="Model responses to benchmark questions are getting longer over time, especially in reasoning models that are typically trained with RL. This increases the demand for long-context inference." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 19 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">e3bb6f76-4a16-4b78-864e-959451f051ab</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/e3bb6f76-4a16-4b78-864e-959451f051ab.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jean-Stanislas%2520Denain%252C%2520Anson%2520Ho&amp;title=%22The%20huge%20potential%20implications%20of%20long-context%20inference%22%20by%20Jean-Stanislas%20Denain%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-huge-potential-implications-of-long-context-inference&amp;created_at=2026-05-18T14%3A04%3A28.36414%2B00%3A00&amp;duration=722" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-huge-potential-implications-of-long-context-inference</link>
      <itunes:duration>722</itunes:duration>
    </item>
    <item>
      <title>“What will AI look like in 2030?” by David Owen</title>
      <description>&lt;p&gt; Subtitle: If scaling persists to 2030, AI investments will reach hundreds of billions of dollars and require gigawatts of power. Benchmarks suggest AI could improve productivity in valuable areas such as scientific R&amp;amp;D.&lt;/p&gt; &lt;p&gt; This report was commissioned by Google DeepMind. All points of view and conclusions expressed are those of the authors and do not necessarily reflect the position or endorsement of Google DeepMind.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; What will happen if AI scaling persists to 2030? We are releasing a report that examines what this scale-up would involve in terms of compute, investment, data, hardware, and energy. We further examine the future AI capabilities this scaling will enable, particularly in scientific R&amp;amp;D, which is a focus for leading AI developers. We argue that AI scaling is likely to continue through 2030, despite requiring unprecedented infrastructure, and will deliver transformative capabilities across science and beyond.&lt;/p&gt;&lt;p&gt; Scaling is likely to continue until 2030: On current trends, frontier AI models in 2030 will require investments of hundreds of billions of dollars, and gigawatts of electrical power. Although these are daunting challenges, they are surmountable. Such investments will be justified if AI can generate corresponding economic returns by increasing productivity. If AI lab [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:39) Introduction&lt;/p&gt;&lt;p&gt;(02:25) Scaling is likely to continue to 2030&lt;/p&gt;&lt;p&gt;(06:17) AI will accelerate scientific R&amp;amp;D across several domains&lt;/p&gt;&lt;p&gt;(13:06) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 3 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 16th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-will-ai-look-like-in-2030?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-will-ai-look-like-in-2030&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-1.png" alt="Graph showing hardware costs of AI supercomputers from 2019 to 2025, doubling yearly." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-2.png" alt="SWE-Bench-Verified: a coding benchmark based on solving real-world GitHub issues with associated unit tests. Results include those reported from model cards, including those with private methodology such as Claude Sonnet 4. RE-Bench: a research engineering benchmark based on tasks similar to take-home assessments for job candidates, taking approximately eight hours for humans." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-3.png" alt="Results show general-purpose LLMs only, excluding domain-specific systems like AlphaProof and AlphaGeometry2. AIME: a high school mathematics exam used for determining entry to the US Mathematical Olympiad, integer answers. USAMO: US Mathematical Olympiad, a high school mathematics exam with proof-based answers. FrontierMath: a mathematics benchmark focused on challenging questions up to expert level, but still offering straightforwardly-verifiable answers (numeric or simple expressions)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-4.png" alt="PoseBusters-v2: a benchmark for protein-ligand docking (spatial interaction). We only include blind results, where the protein’s binding pocket is not provided. ProtocolQA: a benchmark for questions about biology wet lab protocols, here evaluated without multiple choice answers. Protein-protein interactions: there is significant progress predicting protein-protein interactions, but predictions for arbitrary pairs have a high false positive rate. Our illustration of progress is highly uncertain, and would depend on benchmark details." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-will-ai-look-like-in-2030/figure-5.png" alt="Timeline showing AI progress on weather prediction from 2015 to 2030 and beyond." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 16 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">25620b48-70a6-4a58-b3f4-f1480d3082e6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/25620b48-70a6-4a58-b3f4-f1480d3082e6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Owen&amp;title=%22What%20will%20AI%20look%20like%20in%202030%3F%22%20by%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-will-ai-look-like-in-2030&amp;created_at=2026-05-18T15%3A39%3A27.881116%2B00%3A00&amp;duration=828" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-will-ai-look-like-in-2030</link>
      <itunes:duration>828</itunes:duration>
    </item>
    <item>
      <title>“Three challenges facing compute-based AI policies” by Venkat Somala, Anson Ho, Séb Krier</title>
      <description>&lt;p&gt; Subtitle: 'Training compute' is constantly evolving, and compute-based AI policies must adapt to remain relevant. &lt;/p&gt;  &lt;p&gt; This week's post is a collaboration between writers from Google DeepMind's AI Policy Perspectives substack, and Epoch AI.&lt;/p&gt;
&lt;p&gt; When the EU AI Act was drafted, pre-training compute was a reasonable proxy for model capabilities. At the time, pre-training accounted for 90-99% of total training compute, and the relationship was relatively reliable: more compute meant larger models pre-trained on more data, which consistently translated to stronger capabilities.&lt;/p&gt;
&lt;p&gt; This simple proxy has been steadily breaking down. While pre-training compute remains a primary driver of capabilities, modern AI development leans heavily on distillation, synthetic data generation, reward models, and reasoning post-training. These methods can consume significant compute and drive capability gains, yet are often unaccounted for in current regulatory frameworks.&lt;/p&gt;
&lt;p&gt; The standard approach for measuring compute, used by the now-defunct Biden AI executive order, is to sum compute across two stages: “pre-training” and “post-training.” If the sum crosses some predefined threshold, the model is subject to additional scrutiny. But as training methods continue to evolve, this metric risks measuring an increasingly narrow slice of the factors that produce advanced capabilities.&lt;/p&gt;
&lt;p&gt; The [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:07) 1. Not all uses of compute contribute equally to model capabilities&lt;/p&gt;&lt;p&gt;(06:36) 2. AI labs can use compute for methods besides pre/post-training&lt;/p&gt;&lt;p&gt;(07:08) Knowledge distillation&lt;/p&gt;&lt;p&gt;(08:47) Synthetic data generation&lt;/p&gt;&lt;p&gt;(10:11) Reward models&lt;/p&gt;&lt;p&gt;(11:17) The diversity in training methods challenges standardized compute metrics&lt;/p&gt;&lt;p&gt;(12:54) 3. When deployed, an AI model's downstream capabilities depend on more than the compute used to train it&lt;/p&gt;&lt;p&gt;(14:44) What does this mean for AI public policy?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 11th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/three-issues-undermining-compute-based-ai-policies?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/three-issues-undermining-compute-based-ai-policies&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/o1_outperforms_gpt4o.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/o1_outperforms_gpt4o.png" alt="o1-high substantially outperforms GPT-4o on GPQA Diamond and MATH level 5 after a small amount of reasoning training. If this additional compute used for o1 post-training had instead been spent on further pre-training GPT-4o, the capability gains would have been negligible. This suggests that reasoning training FLOP yield far higher marginal returns than pre-training at this scale." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/gu_messy_compute_2_v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/gu_messy_compute_2_v2.png" alt="While prior regulations have largely focused on a simple pre/post-training distinction, in practice training compute can come in myriad forms that are intertwined in a complex pipeline." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/gu_messy_compute_v5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/three-issues-undermining-compute-based-ai-policies/gu_messy_compute_v5.png" alt="Tool use boosts model performance without additional training compute. As shown here, o3 + tools significantly outperforms o3, highlighting how compute-based thresholds miss the impact of scaffolding and tools. Note that the estimates of training compute are speculative." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">de522c6d-f815-4a8d-988d-d3a2fd201104</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/de522c6d-f815-4a8d-988d-d3a2fd201104.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Venkat%2520Somala%252C%2520Anson%2520Ho%252C%2520S%25C3%25A9b%2520Krier&amp;title=%22Three%20challenges%20facing%20compute-based%20AI%20policies%22%20by%20Venkat%20Somala%2C%20Anson%20Ho%2C%20S%C3%A9b%20Krier&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthree-issues-undermining-compute-based-ai-policies&amp;created_at=2026-05-18T14%3A04%3A29.842015%2B00%3A00&amp;duration=1088" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/three-issues-undermining-compute-based-ai-policies</link>
      <itunes:duration>1088</itunes:duration>
    </item>
    <item>
      <title>“Compute scaling will slow down due to increasing lead times” by Yafah Edelman, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: A heavily underappreciated dynamic when thinking about AI timelines.&lt;/p&gt;  &lt;p&gt; The massive compute scaling that has driven AI progress since 2020 is likely to slow down soon, due to increasing economic uncertainty and longer development cycles.&lt;/p&gt;
&lt;p&gt; While investors could theoretically scale compute by several orders of magnitude, the hundreds of billions of dollars required, combined with uncertain returns, will push them toward incremental scaling: investing, deploying products to gauge returns, then reevaluating further investment. Additionally, as the required compute grows larger, the time between project initiation and product deployment (i.e. “lead time”) lengthens significantly, creating a feedback loop that naturally slows the pace of compute scaling.&lt;/p&gt;
&lt;p&gt; In particular, our current best guess is that every additional 10× increase in compute scale lengthens lead times by around a year. For example, OpenAI likely has over $15 billion worth of compute, and this compute stock has been growing by around 2.2× each year. At that pace, current trends would predict a trillion-dollar cluster around 2030, but longer lead times would delay this to around 2035.&lt;/p&gt;
&lt;p markdown="1"&gt;The “extrapolation with lead times” is determined by taking the direct extrapolation, and adjusting it such that each additional 10× [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:31) Uncertainties about investment returns prevent a "YOLO scaleup"&lt;/p&gt;&lt;p&gt;(04:23) Lead times are getting longer, causing investments (and hence compute scaling) to slow down&lt;/p&gt;&lt;p&gt;(11:28) What does this mean for AI progress over the next few years?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 14 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 5th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/compute-scaling-will-slow-down-due-to-increasing-lead-times?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/compute-scaling-will-slow-down-due-to-increasing-lead-times&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/compute-scaling-will-slow-down-due-to-increasing-lead-times/gu_lead_times_investment_1_v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/compute-scaling-will-slow-down-due-to-increasing-lead-times/gu_lead_times_investment_1_v2.png" alt="The “extrapolation with lead times” is determined by taking the direct extrapolation, and adjusting it such that each additional 10× increase in compute stock increases lead times by a year. These accumulate so that the total delay is larger at greater compute scales." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/compute-scaling-will-slow-down-due-to-increasing-lead-times/gu_lead_times_investment_v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/compute-scaling-will-slow-down-due-to-increasing-lead-times/gu_lead_times_investment_v2.png" alt="Plot of estimated lead times at different levels of compute investment. Data is shown in Table 1. If a given primary constraint has a range of compute investment, we take the geometric mean." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 05 Sep 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a8c359f0-cb21-46bf-a18b-73f94f584171</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a8c359f0-cb21-46bf-a18b-73f94f584171.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Yafah%2520Edelman%252C%2520Anson%2520Ho&amp;title=%22Compute%20scaling%20will%20slow%20down%20due%20to%20increasing%20lead%20times%22%20by%20Yafah%20Edelman%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fcompute-scaling-will-slow-down-due-to-increasing-lead-times&amp;created_at=2026-05-18T14%3A37%3A17.766521%2B00%3A00&amp;duration=852" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/compute-scaling-will-slow-down-due-to-increasing-lead-times</link>
      <itunes:duration>852</itunes:duration>
    </item>
    <item>
      <title>“Why future AI agents will be trained to work together” by Anson Ho, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: Many multi-agent setups are based on fancy prompts, but this is unlikely to persist. &lt;/p&gt; 
&lt;p&gt; We’re moving towards a world where multi-agent systems will be near-ubiquitous, and where they won’t just look like prompt engineering on steroids.&lt;/p&gt;
&lt;p&gt; Over the last few years, we’ve increasingly seen AI systems spin up multiple LLM instances to solve problems. OpenAI has a multi-agent team that was involved in its recent IMO gold medal. Grok 4 Heavy involves multiple agents working in parallel on the same task. Claude Research coordinates multiple instances of Claude 4. Claude Code uses the Task tool to delegate subtasks to subagents. And over the last year, OpenAI, Anthropic, and Google DeepMind have all posted jobs seeking expertise in multi-agent systems.&lt;/p&gt;
&lt;p markdown="1"&gt;Anthropic's Claude Research is based on a multi-agent setup. (Image source)&lt;/p&gt;
&lt;p&gt; We expect this trend toward multi-agent systems to continue: as task lengths increase, the benefits of parallelization will become too large to ignore. Importantly, while these LLM instances will work in parallel, they often won’t work independently: they’ll need to coordinate to avoid stepping on each other's toes, and performance improves when they can share key context and learnings. Currently [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:51) The enormous gains from parallelization&lt;/p&gt;&lt;p&gt;(06:39) Parallel LLM instances will interact and coordinate&lt;/p&gt;&lt;p&gt;(08:13) Moving away from hard-coded multi-agent systems&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 22nd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/why-future-ai-agents-will-be-trained-to-work-together?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/why-future-ai-agents-will-be-trained-to-work-together&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-future-ai-agents-will-be-trained-to-work-together/claude_research_multi_agent.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-future-ai-agents-will-be-trained-to-work-together/claude_research_multi_agent.png" alt="Anthropic’s Claude Research is based on a multi-agent setup. (Image source)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-future-ai-agents-will-be-trained-to-work-together/claude_research_prompt.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-future-ai-agents-will-be-trained-to-work-together/claude_research_prompt.png" alt="Guidelines for determining subagent count based on query complexity levels." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 22 Aug 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4c3a6095-7112-4579-b988-79f2d19581e1</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4c3a6095-7112-4579-b988-79f2d19581e1.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Jean-Stanislas%2520Denain&amp;title=%22Why%20future%20AI%20agents%20will%20be%20trained%20to%20work%20together%22%20by%20Anson%20Ho%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhy-future-ai-agents-will-be-trained-to-work-together&amp;created_at=2026-05-18T14%3A04%3A31.920952%2B00%3A00&amp;duration=746" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/why-future-ai-agents-will-be-trained-to-work-together</link>
      <itunes:duration>746</itunes:duration>
    </item>
    <item>
      <title>“How much power will frontier AI training demand in 2030?” by Josh You, David Owen</title>
      <description>&lt;p&gt; Subtitle: The power required to train the largest frontier models is growing by more than 2x per year, and is on trend to reach multiple gigawatts by 2030.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The electrical power required to train individual frontier AI models has been growing rapidly, driven by the growth in total training compute and the size of training clusters. Previously, we found that the power required to train a frontier model has been more than doubling every year. If trends continue, how high could these power demands become?&lt;/p&gt;&lt;p&gt; In a new white paper, “Scaling Intelligence: The Exponential Growth of AI's Power Needs”, written in collaboration with EPRI, we analyze the factors driving power growth for frontier training, and forecast this growth out to 2030. We conclude that the largest individual frontier training runs in 2030 will likely draw 4-16 gigawatts (GW) of power, or enough to power millions of US homes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Forecasting power demands using model training compute&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Power demands for frontier training runs have historically grown at a rate of 2.2x per year, with the largest runs now exceeding 100 megawatts. This has primarily been driven by frontier training compute, which has been growing at 4 [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:23) Introduction&lt;/p&gt;&lt;p&gt;(01:14) Forecasting power demands using model training compute&lt;/p&gt;&lt;p&gt;(05:16) Implications for the energy sector&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 11th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/power-demands-of-frontier-ai-training?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/power-demands-of-frontier-ai-training&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/power-demands-of-frontier-ai-training/projected-power-growth.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/power-demands-of-frontier-ai-training/projected-power-growth.png" alt="Figure 1: Historic trend and forecast for the power demand of the largest individual frontier training runs. The shaded interval is the 10th and 90th percentiles of our forecast based on trends in training compute, efficiency, and training run duration growth. The main source of upside uncertainty relative to the historic trend is whether limits to training duration motivate accelerated scaling in training hardware." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/power-demands-of-frontier-ai-training/forecasted-total-capacity.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/power-demands-of-frontier-ai-training/forecasted-total-capacity.png" alt="Figure 2: Projections of growth in total US AI data center capacity, based on several estimates or extrapolations from current trends, assuming the US maintains a 50% share of worldwide AI capacity. The current US baseline of 5 GW is an estimate. See the full report for details." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 11 Aug 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">22cf3c61-ac9e-41a4-b1b7-655ef1127bd6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/22cf3c61-ac9e-41a4-b1b7-655ef1127bd6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You%252C%2520David%2520Owen&amp;title=%22How%20much%20power%20will%20frontier%20AI%20training%20demand%20in%202030%3F%22%20by%20Josh%20You%2C%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fpower-demands-of-frontier-ai-training&amp;created_at=2026-05-18T16%3A52%3A39.240132%2B00%3A00&amp;duration=466" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/power-demands-of-frontier-ai-training</link>
      <itunes:duration>466</itunes:duration>
    </item>
    <item>
      <title>“We didn’t learn much from the IMO” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: The problems gave AI only a slim chance to show new capabilities. &lt;/p&gt;  &lt;p&gt; A few weeks ago I laid out what I thought the IMO might tell us about AI math capabilities. The IMO has now happened, with Google and OpenAI both announcing experimental LLMs that solved the same 5 of the IMO's 6 problems—just enough for a gold medal. What did we learn?&lt;/p&gt;
&lt;p&gt; There was understandably a lot of excitement about the gold medals, but I think a closer look shows that this achievement tells us little about capabilities progress. This is due to bad luck: the 5 solved problems happen to be no harder than problems AI systems could already solve, and the one unsolved problem was much harder than anything any system has solved.&lt;/p&gt;
&lt;p&gt; I’ll use this post to make the case that we didn’t learn much from the IMO. I’ll take both an “outside” view, using statistics about the problems and the performance of prior AI systems, and an “inside” view, taking a closer look at the specific problems and the AI solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Viewed from outside, the sample of problems looks uninformative&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The main issue is the difficulty distribution of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:12) Viewed from outside, the sample of problems looks uninformative&lt;/p&gt;&lt;p&gt;(02:49) AI systems were already performing at this level&lt;/p&gt;&lt;p&gt;(03:12) Prior to the IMO, Deep Think had already solved a "medium-hard" problem, and older models seemed close to doing so&lt;/p&gt;&lt;p&gt;(04:30) Some currently-available models did decently on the IMO&lt;/p&gt;&lt;p&gt;(05:31) We can't even conclude that LLMs caught up to AlphaProof&lt;/p&gt;&lt;p&gt;(06:22) An inside view confirms that these easy-to-medium problems didn't require new capabilities&lt;/p&gt;&lt;p&gt;(12:21) The IMO was at least an interesting case study in reliability on hard-to-verify domains&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 7th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/we-didnt-learn-much-from-the-imo?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/we-didnt-learn-much-from-the-imo&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/GU_2025_IMO_problems_2_v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/GU_2025_IMO_problems_2_v2.png" alt="Bar graph showing AI performance on 2025 IMO problems by difficulty rating." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p1.png" alt="Mathematical problem about sunny lines and nonnegative integers k." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p1_configuration.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p1_configuration.png" alt="Diagram showing vertical lines with blue circles and one diagonal line." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/google_theorem.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/google_theorem.png" alt="Mathematical theorem and proof about line configurations covering points in a plane." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p3.png" alt="Mathematical definition of a bonza function with divisibility condition." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/we-didnt-learn-much-from-the-imo/imo_p6.png" alt="Mathematical problem about placing rectangular tiles on a 2025 by 2025 grid." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 07 Aug 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">3314871e-0e21-4387-9911-bf6d11b1a322</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/3314871e-0e21-4387-9911-bf6d11b1a322.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22We%20didn%E2%80%99t%20learn%20much%20from%20the%20IMO%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwe-didnt-learn-much-from-the-imo&amp;created_at=2026-05-18T20%3A13%3A33.607612%2B00%3A00&amp;duration=825" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/we-didnt-learn-much-from-the-imo</link>
      <itunes:duration>825</itunes:duration>
    </item>
    <item>
      <title>“Quantifying the algorithmic improvement from reasoning models” by Anson Ho, Arden Berg</title>
      <description>&lt;p&gt; Subtitle: Reasoning models were as big of an improvement as the Transformer, at least on some benchmarks. &lt;/p&gt; 
&lt;p&gt; Almost a year ago, OpenAI introduced o1, the world's first “reasoning model”. Compared to its likely predecessor GPT-4o, o1 is more heavily optimized to do multi-step reasoning when solving problems. So it's perhaps no surprise that it does much better on common math and science benchmarks.&lt;/p&gt;
&lt;p markdown="1"&gt;o1 performs far better than GPT-4o on GPQA diamond (PhD-level multiple-choice science questions) and MATH level 5 (high-school math competition problems). Data is taken from Epoch AI's benchmarking hub.&lt;/p&gt;
&lt;p&gt; By itself, this performance improvement was already a big deal. But even more important was how it was achieved: not by using a lot more training compute, but as the byproduct of a major algorithmic innovation. o1 went through a period of “reasoning training”, where its chain-of-thought was fine-tuned on reasoning traces and optimized using reinforcement learning. This allows the model to spend more time “reasoning” before responding to user queries.&lt;/p&gt;
&lt;p&gt; But how can we quantify the importance of this algorithmic innovation? One way to do this is to interpret its importance in terms of a hypothetical increase [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:02) On GPQA, MATH, and Mock AIME, early reasoning models yielded on the order of a 10x increase in compute equivalent gain&lt;/p&gt;&lt;p&gt;(06:35) How much should we believe these estimates?&lt;/p&gt;&lt;p&gt;(08:37) A wide range of common benchmarks saw performance improvements due to reasoning&lt;/p&gt;&lt;p&gt;(12:29) Big open questions remain about generalization and test-time scaling&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 2nd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/quantifying-the-algorithmic-improvement-from-reasoning-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/quantifying-the-algorithmic-improvement-from-reasoning-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_outperforms_gpt4o.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_outperforms_gpt4o.png" alt="o1 performs far better than GPT-4o on GPQA diamond (PhD-level multiple-choice science questions) and MATH level 5 (high-school math competition problems). Data is taken from Epoch AI’s benchmarking hub." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_4o_gpqa_ceg.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_4o_gpqa_ceg.png" alt="Graph showing training compute versus performance on GPQA diamond benchmark, comparing GPT-4o and o1-high models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_ceg_1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/o1_ceg_1.png" alt="Probably the most speculative pair of reasoning/non-reasoning models here is GPT-4o to o3. For example, it’s very possible that o3 was based on GPT-4.1. Nevertheless, we show the implied CEG assuming that the relevant non-reasoning model is GPT-4o, since we also consider this plausible. Note that we don’t have data on o1-high’s performance on Mock AIME."
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/claude_ceg_1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/claude_ceg_1.png" alt="The CEG when transitioning from Claude 3.5 Sonnet (Oct 2024) to Claude 3.7 Sonnet, both with and without extended thinking. The versions that include extended thinking specify the token budget, e.g. “Claude 3.7 Sonnet (16K thinking)” has a budget of 16k output tokens." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/reasoning_vs_nonreasoning_token_usage.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/reasoning_vs_nonreasoning_token_usage.png" alt="Data on the average number of output tokens is taken from Epoch’s benchmarking runs." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/reasoning_vs_nonreasoning.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/reasoning_vs_nonreasoning.png" alt="Bar chart titled “Reasoning models outperformed non-reasoning models on most benchmarks in Epoch's benchmarking hub.”" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/geobench_vs_aime.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/quantifying-the-algorithmic-improvement-from-reasoning-models/geobench_vs_aime.png" alt="Reasoning models seem to show little improvement on GeoBench, but substantially exceed the non-reasoning model trend on OTIS Mock AIME 2024-2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 02 Aug 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fd275231-d561-4d70-a468-8ab6ce341bc3</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fd275231-d561-4d70-a468-8ab6ce341bc3.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Arden%2520Berg&amp;title=%22Quantifying%20the%20algorithmic%20improvement%20from%20reasoning%20models%22%20by%20Anson%20Ho%2C%20Arden%20Berg&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fquantifying-the-algorithmic-improvement-from-reasoning-models&amp;created_at=2026-05-18T14%3A04%3A32.991923%2B00%3A00&amp;duration=917" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/quantifying-the-algorithmic-improvement-from-reasoning-models</link>
      <itunes:duration>917</itunes:duration>
    </item>
    <item>
      <title>“Why China isn’t about to leap ahead of the West on compute” by Veronika Blablová, Robi Rahman</title>
      <description>&lt;p&gt; Subtitle: Chinese hardware is closing the gap, but major bottlenecks remain. &lt;/p&gt;  &lt;p&gt; We keep hearing that China is catching up with the West in AI compute. A great example of this comes from NVIDIA's CEO Jensen Huang, who recently claimed that China has made “enormous progress” in the last few years, and that “China is right behind us. We’re very, very close.”&lt;/p&gt;
&lt;p&gt; And China has indeed been making a ton of progress. As we’ll see, Chinese hardware has been closing the gap across a range of metrics relating to computational power and data transfer, both of which are crucial aspects of AI workloads.&lt;/p&gt;
&lt;p&gt; But despite this progress, we don’t think China is about to leap ahead of the West on AI compute. China's top developers—including Alibaba, ByteDance, Baidu, and DeepSeek—still rely primarily on NVIDIA chips. And major roadblocks still remain before China can leap ahead.&lt;/p&gt;
&lt;p&gt; The first bottleneck lies in chip manufacturing. U.S. export controls of chipmaking equipment make it more costly for China to produce chips at the massive scale needed for frontier model training and inference.&lt;/p&gt;
&lt;p&gt; The second bottleneck lies in China's weaker software ecosystem. Unlike NVIDIA's CUDA stack, Chinese chips operate [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:53) On paper, China's hardware is closing the gap&lt;/p&gt;&lt;p&gt;(05:47) China still has to overcome major bottlenecks in domestic AI compute&lt;/p&gt;&lt;p&gt;(05:53) In practice, reliance on Western chips persists&lt;/p&gt;&lt;p&gt;(07:01) Manufacturing Limitations and Export Controls&lt;/p&gt;&lt;p&gt;(10:09) Software Ecosystem Gaps&lt;/p&gt;&lt;p&gt;(11:21) Will China overcome these bottlenecks?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 20 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 26th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute/gu-diffusion-1-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute/gu-diffusion-1-v1.png" alt="A line graph titled &amp;quot;Chinese chips narrowly trail Western chips since 2020&amp;quot; showing FP16 performance trends." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute/gu-china-2-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute/gu-china-2-v1.png" alt="Bar chart titled &amp;quot;Chinese developers train most of their models on Western chips&amp;quot; showing LLM hardware origins by year." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 26 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">93f462ce-ee80-4f2d-8382-7736e7c7d20f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/93f462ce-ee80-4f2d-8382-7736e7c7d20f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Veronika%2520Blablov%25C3%25A1%252C%2520Robi%2520Rahman&amp;title=%22Why%20China%20isn%E2%80%99t%20about%20to%20leap%20ahead%20of%20the%20West%20on%20compute%22%20by%20Veronika%20Blablov%C3%A1%2C%20Robi%20Rahman&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhy-china-isnt-about-to-leap-ahead-of-the-west-on-compute&amp;created_at=2026-05-18T14%3A39%3A01.639687%2B00%3A00&amp;duration=810" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/why-china-isnt-about-to-leap-ahead-of-the-west-on-compute</link>
      <itunes:duration>810</itunes:duration>
    </item>
    <item>
      <title>“Evaluating Grok 4’s math capabilities” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: It's good at involved computations, improving at proofs from a low base, and useful for literature search. It still favors low-level grinds and leans on background knowledge.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; xAI commissioned Epoch AI to evaluate Grok 4's math capabilities. What are its strengths and weaknesses, absolutely and relative to other models? This report goes beyond headline numbers, aiming to characterize how Grok 4 approaches mathematical tasks. Such qualitative investigation informs a broader understanding of progress: it helps identify signs of novel capabilities before they show up in headline numbers, and suggests additional benchmarks that would be useful going forward.&lt;/p&gt;&lt;p&gt; Note: while this work was compensated, Epoch maintains full editorial control over the output. We offer timely and in-depth evaluation as a service to model developers; email info@epoch.ai for details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive Summary&lt;/strong&gt;&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Grok 4 is state-of-the-art at “grinding out” solutions on medium-hard high school math competitions.&lt;/li&gt;
&lt;li&gt; Grok 4 is near the state-of-the-art at solving proof-based problems from challenging high school math competitions, though much headroom remains on proofs in general.&lt;/li&gt;
&lt;li&gt; Professional mathematicians say Grok 4 may be the best available model for mathematical literature search.&lt;/li&gt;
&lt;li&gt; Grok 4 shows an interesting tendency [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:23) Introduction&lt;/p&gt;&lt;p&gt;(01:12) Executive Summary&lt;/p&gt;&lt;p&gt;(02:35) Methodology&lt;/p&gt;&lt;p&gt;(04:35) Grok 4 is state-of-the-art at "grinding out" solutions on medium-hard high school math competitions&lt;/p&gt;&lt;p&gt;(06:15) Solving these problems requires moderate knowledge and high diligence&lt;/p&gt;&lt;p&gt;(07:11) What does it look like to "grind out" a problem?&lt;/p&gt;&lt;p&gt;(09:11) Grok 4 is at the frontier of "grinding out" problems&lt;/p&gt;&lt;p&gt;(12:01) While a few problems on these competitions remain unsolved by AI, that probably won't remain the case for long&lt;/p&gt;&lt;p&gt;(14:14) Disregard Settings Involving Coding Tools&lt;/p&gt;&lt;p&gt;(15:06) Grok 4 is near the frontier of solving proof-based problems, but much headroom remains&lt;/p&gt;&lt;p&gt;(16:23) Self-Reported Grading Makes Interpretation Harder&lt;/p&gt;&lt;p&gt;(17:25) Solving these problems requires deeper mathematical skills&lt;/p&gt;&lt;p&gt;(18:38) Grok 4 Heavy made novel progress on a challenging USAMO problem, thanks in part to its background knowledge&lt;/p&gt;&lt;p&gt;(23:36) Grok 4 did not shine on the 2025 IMO&lt;/p&gt;&lt;p&gt;(28:58) Mathematicians say Grok 4's proof-writing abilities are hit or miss&lt;/p&gt;&lt;p&gt;(30:42) Sense Check: Grok 4 did fine on FrontierMath&lt;/p&gt;&lt;p&gt;(32:48) Grok 4 is good at mathematical literature search&lt;/p&gt;&lt;p&gt;(36:42) Grok 4 shows a tendency to catch its own mistakes&lt;/p&gt;&lt;p&gt;(36:58) Grok 4 gets my favorite "trick" question&lt;/p&gt;&lt;p&gt;(39:45) Grok 4 also usually gets a counterintuitive geometry problem&lt;/p&gt;&lt;p&gt;(42:24) That said, Grok 4 can still make simple mistakes&lt;/p&gt;&lt;p&gt;(43:26) Grok 4's mathematical reasoning is not very human-like&lt;/p&gt;&lt;p&gt;(43:52) Grok 4 relies on Cartesian 
coordinates where humans would use spatial intuition&lt;/p&gt;&lt;p&gt;(45:21) Grok 4 doesn't solve problems with off-the-beaten-path thinking&lt;/p&gt;&lt;p&gt;(49:33) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 13 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 25th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/grok-4-math?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/grok-4-math&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image1.png" alt="Graph showing &amp;quot;Top-scoring Models on Medium-Hard High School Math Competitions&amp;quot; with accuracy over time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image2.png" alt="Graph showing a teal parabola opening upward and an orange parabola opening downward." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image3.png" alt="Geometric diagram showing triangles with points P and Q, and probability text explanation." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image4.png" alt="Geometric diagram showing rectangle ABCD with diagonal, points P and Q, and accompanying probability calculation text." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image5.png" alt="Geometry problem showing hexagon with points P and Q on a dashed line." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image6.png" alt="Mathematical text discussing probability calculation for a regular hexagon with equations and integrals." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image7.png" alt="Two triangles divided into smaller green and pink triangular sections." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image8.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image8.png" alt="Two geometric diagrams showing triangles subdivided into smaller labeled triangles." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image9.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image9.png" alt="Table comparing AI model performance with accuracy, cost, and scores across six categories." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image10.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image10.png" alt="Bar chart showing FrontierMath evaluation results comparing accuracy across AI models." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image11.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image11.png" alt="Translucent green and red geometric plastic toy pieces stacked together." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image12.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image12.png" alt="Math problem showing relationship between cups and plates with equations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image13.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image13.png" alt="Geometric diagram of triangle ABC with points D, E, F, G, M, N and quadrilateral regions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/grok-4-math/image14.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/grok-4-math/image14.png" alt="Two grids showing purple paths marked with M, arrows indicating downward direction." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 25 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fdd03f8d-4467-41a9-94db-0cff2062269b</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fdd03f8d-4467-41a9-94db-0cff2062269b.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22Evaluating%20Grok%204%E2%80%99s%20math%20capabilities%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fgrok-4-math&amp;created_at=2026-05-18T16%3A55%3A29.530328%2B00%3A00&amp;duration=3051" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/grok-4-math</link>
      <itunes:duration>3051</itunes:duration>
    </item>
    <item>
      <title>“After the ChatGPT moment: Measuring AI’s adoption” by Arden Berg, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: How quickly has AI been diffusing through the economy?&lt;/p&gt;  &lt;p&gt; In February 2023, ChatGPT made headlines for purportedly being the fastest-growing consumer app in history. It reached 100 million users within two months, years faster than both Instagram and Netflix, making it a clear example of speedy technology adoption.&lt;/p&gt;
&lt;p&gt; Two years on, work on AI has been awarded two Nobel Prizes, and major AI companies have collectively grown their annualized revenues over ten-fold to reach multi-billion-dollar scales. Two years is a long time in the world of AI.&lt;/p&gt;
&lt;p&gt; With all these changes, it's time to take a new look at the evidence on AI diffusion. How fast has AI been diffusing throughout the economy? How many people are using AI systems in the US, and how are they doing so?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; AI is being adopted faster than most technologies in history&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt; Technologies are being adopted more quickly over time&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; To put the speed of AI adoption into context, we can first look at data on other technologies as a reference point. Conveniently for us, Nicholas Felton and Karl Hartig prepared a graph that shows this for a range of technologies, ranging from electricity to the internet.&lt;/p&gt;&lt;p&gt; [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:05) AI is being adopted faster than most technologies in history&lt;/p&gt;&lt;p&gt;(01:11) Technologies are being adopted more quickly over time&lt;/p&gt;&lt;p&gt;(04:06) AI system adoption is likely faster than these historical trends would predict&lt;/p&gt;&lt;p&gt;(07:18) Average AI use has likely been increasing, but it's unclear by how much&lt;/p&gt;&lt;p&gt;(07:49) Most users don't use state-of-the-art AI systems very much, and the fraction of users that do has likely been declining&lt;/p&gt;&lt;p&gt;(10:12) The average number of tokens processed per user has probably been growing a lot&lt;/p&gt;&lt;p&gt;(11:41) Surveys provide mixed evidence about increases in the frequency of AI use&lt;/p&gt;&lt;p&gt;(12:47) Overall verdict&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/after-the-chatgpt-moment-measuring-ais-adoption?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/after-the-chatgpt-moment-measuring-ais-adoption&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/history-of-products.gif" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/history-of-products.gif" alt="The percentage of US households that use a technology over time, for a range of different technologies. Most of these technologies are consumer electronics that have been especially notable over the 20th century. Source: Andrew Gelman" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-1-v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-1-v2.png" alt="Fitted and extrapolated diffusion times at different levels of adoption, as measured by the percentage of US households with the technology (e.g. electricity, microwaves). The line of best fit for time to 10% does slope slightly upward, indicating increasing times to 10% adoption for later technologies, but this is unlikely to be significant given that we’re looking at a very small dataset. For example, using all of the technologies from Nicholas Felton and Karl Hartig (not just those that reach at least 70% adoption), we find zero slope in the line of best fit at 10% adoption." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-2-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-2-v1.png" alt="Line graph titled &amp;quot;Portion of the U.S. using ChatGPT weekly&amp;quot; showing growth from January 2023 to January 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/apps-since-launch-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/apps-since-launch-1.png" alt="ChatGPT has been adopted more quickly than a wide range of other widely-used products, such as Instagram and Spotify. Source: AI Impacts." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-3-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-3-v1.png" alt="We use data from Ramp on the percentage of customers paying for AI by sector and weight it by the makeup of US business entities in order to mitigate bias from a tech and finance-heavy customer base. We’re still somewhat wary of this data as Ramp customers may be more broadly early adopters." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-4-v1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/gu-diffusion-4-v1.png" alt="Bar graph showing token usage for Claude versions from July 2024 to July 2025." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/pew-survey.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/after-the-chatgpt-moment-measuring-ais-adoption/pew-survey.png" alt="Table showing frequency of AI interaction across three survey periods." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 17 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f8d16e3a-e21c-40a2-a98a-f5b853fa19c5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f8d16e3a-e21c-40a2-a98a-f5b853fa19c5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Arden%2520Berg%252C%2520Anson%2520Ho&amp;title=%22After%20the%20ChatGPT%20moment%3A%20Measuring%20AI%E2%80%99s%20adoption%22%20by%20Arden%20Berg%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fafter-the-chatgpt-moment-measuring-ais-adoption&amp;created_at=2026-05-18T14%3A39%3A05.704624%2B00%3A00&amp;duration=854" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/after-the-chatgpt-moment-measuring-ais-adoption</link>
      <itunes:duration>854</itunes:duration>
    </item>
    <item>
      <title>“How to run SWE-bench Verified in one hour on one machine” by Tom Adamczewski</title>
      <description>&lt;p&gt; Subtitle: We are releasing a public registry of optimized Docker images for SWE-bench. This allows us to run SWE-bench Verified in 62 minutes on a single GitHub actions VM.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We are releasing a public registry of Docker images for SWE-bench, to help the community run more efficient and reproducible SWE-bench evaluations. By making better use of layer caching, we reduced the total size of the registry to 67 GiB for all 2290 SWE-bench images (10x reduction), and to 30 GiB for 500 SWE-bench Verified images (6x reduction). This allows us to run SWE-bench Verified in 62 minutes on a single GitHub actions VM with 32 cores and 128GB of RAM.&lt;/p&gt;&lt;p&gt; We’re hiring an experienced engineer to lead our benchmarking efforts and be my new manager. Details at the bottom of the post.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Background&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; SWE-bench is a benchmark designed to evaluate large language models on real-world software engineering tasks. It consists of 2,294 GitHub issues from 12 popular Python repositories, paired with the actual pull requests that resolved those issues.&lt;/p&gt;&lt;p&gt; For each task, the AI system is given access to the repo in its state immediately before the pull request was merged, along with the issue description. [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(01:10) Background&lt;/p&gt;&lt;p&gt;(02:32) SWE-bench and Docker&lt;/p&gt;&lt;p&gt;(04:46) Docker layering&lt;/p&gt;&lt;p&gt;(06:48) Anatomy of a SWE-bench Dockerfile&lt;/p&gt;&lt;p&gt;(08:52) Moving the git clone operation&lt;/p&gt;&lt;p&gt;(11:28) Should the git history be included?&lt;/p&gt;&lt;p&gt;(12:32) The matplotlib 1.9 GB top layer&lt;/p&gt;&lt;p&gt;(13:35) Disabling the pip cache (or how to go insane)&lt;/p&gt;&lt;p&gt;(16:46) Impact on size&lt;/p&gt;&lt;p&gt;(19:23) Running SWE-bench Verified in about an hour&lt;/p&gt;&lt;p&gt;(21:09) How to use our image registry&lt;/p&gt;&lt;p&gt;(22:02) Come be my boss?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 4 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 10th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/swebench-docker?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/swebench-docker&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_1.png" alt="The dive tool output for the Django 13371 image." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_3.png" alt="The dive output for the Django 13371 image, focused on the final (topmost) layer." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_2.png" alt="The optimized final layer reduced from 330MB to 40MB after moving the git clone operation to the env stage." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_4.png" alt="GitHub comment from 2016 expressing surprise that one needs to set PIP_NO_CACHE_DIR=0 to disable the pip cache." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img_5.png" alt="GitHub comment from 2018 discussing backward compatibility." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/image-20250619172345623.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/image-20250619172345623.png" alt="Screenshot showing reported disk space requirements for SWE-bench evaluations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/swebench-docker/img.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/swebench-docker/img.png" alt="Pie chart showing &amp;quot;Distribution of SWE-bench tasks&amp;quot; across 12 GitHub repositories." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 10 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">5a6699d4-ec21-4193-9487-ceee998e853d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/5a6699d4-ec21-4193-9487-ceee998e853d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tom%2520Adamczewski&amp;title=%22How%20to%20run%20SWE-bench%20Verified%20in%20one%20hour%20on%20one%20machine%22%20by%20Tom%20Adamczewski&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fswebench-docker&amp;created_at=2026-05-18T16%3A52%3A41.567423%2B00%3A00&amp;duration=1387" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/swebench-docker</link>
      <itunes:duration>1387</itunes:duration>
    </item>
    <item>
      <title>“What will the IMO tell us about AI math capabilities?” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: Most discussion about AI and the IMO focuses on gold medals, but that's not the thing to pay most attention to.&lt;/p&gt;  &lt;p&gt; This year's International Mathematical Olympiad (IMO) will take place on July 15th and 16th in Sunshine Coast, Australia. It is the pinnacle of high school math competitions. Much like the Olympic Games, the stakes are national pride and personal glory.&lt;/p&gt;
&lt;p&gt; AI model developers must be gauging their own chances for pride and glory. No AI system has yet achieved a score equivalent to an IMO gold medal, much less a perfect score. Could this be the year?&lt;/p&gt;
&lt;p&gt; In this post, I’ll say what I think different results might mean for AI math capabilities. In particular, I think there are some important distinctions between results that might generate hype and results that will actually tell us something new.&lt;/p&gt;
&lt;p&gt; Here are the key background facts I have in mind.&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; The baseline is high. Google's AlphaProof, a specialized system that outputs formal math proofs, solved 4/6 problems on the 2024 IMO. On the 2025 USAMO, a contest similar to the IMO, the best general-purpose LLMs can already solve 2/6 problems. Progress means doing better [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(04:41) IMO Background&lt;/p&gt;&lt;p&gt;(07:52) AlphaProof already set a high bar, but some key abilities were missing&lt;/p&gt;&lt;p&gt;(08:35) AlphaProof solved its hardest problem in a surprisingly uninteresting way&lt;/p&gt;&lt;p&gt;(10:08) AlphaProof didn't solve problems that required more creativity&lt;/p&gt;&lt;p&gt;(11:18) Geometry won't tell us much&lt;/p&gt;&lt;p&gt;(12:05) If an AlphaProof-like system scores well, we'll have to look at the specific problems&lt;/p&gt;&lt;p&gt;(14:12) Closing Notes on AlphaProof&lt;/p&gt;&lt;p&gt;(15:18) General-purpose LLMs have more headroom&lt;/p&gt;&lt;p&gt;(16:12) Deep Think suggests we'll see something better than this&lt;/p&gt;&lt;p&gt;(18:19) All bets are off if obscure background knowledge cracks problems&lt;/p&gt;&lt;p&gt;(19:30) Geometry still probably won't tell us much&lt;/p&gt;&lt;p&gt;(20:26) Closing Notes on LLMs&lt;/p&gt;&lt;p&gt;(21:01) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 18 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 8th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/what-will-the-imo-tell-us-about-ai-math-capabilities?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/what-will-the-imo-tell-us-about-ai-math-capabilities&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-will-the-imo-tell-us-about-ai-math-capabilities/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-will-the-imo-tell-us-about-ai-math-capabilities/figure-1.png" alt="Bar graphs showing participant scores across six problems, with varying success rates." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-will-the-imo-tell-us-about-ai-math-capabilities/IMO_GU_2_v3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-will-the-imo-tell-us-about-ai-math-capabilities/IMO_GU_2_v3.png" alt="Bar graph comparing AlphaProof and human scores on 2024 IMO problems across six categories." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 08 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f1b2c26d-2db3-4665-90ca-4306ffec58e6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f1b2c26d-2db3-4665-90ca-4306ffec58e6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22What%20will%20the%20IMO%20tell%20us%20about%20AI%20math%20capabilities%3F%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhat-will-the-imo-tell-us-about-ai-math-capabilities&amp;created_at=2026-05-18T14%3A39%3A06.615956%2B00%3A00&amp;duration=1310" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/what-will-the-imo-tell-us-about-ai-math-capabilities</link>
      <itunes:duration>1310</itunes:duration>
    </item>
    <item>
      <title>“How big could an “AI Manhattan Project” get?” by Arden Berg, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: An AI Manhattan Project could accelerate compute scaling by two years.&lt;/p&gt;  &lt;p&gt; Over the last year, the possibility of an AI national project has steadily grown.&lt;/p&gt;
&lt;p&gt; In November, the US-China Economic and Security Review Commission stated that its top recommendation to Congress was to “establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence capability.” Over the last few months, the US Department of Energy has also repeatedly compared AI to the Manhattan Project and indicated that it would use its power to help the project succeed, recently tweeting that “AI is the next Manhattan Project, and THE UNITED STATES WILL WIN.”&lt;/p&gt;

&lt;p&gt; But what would a “Manhattan Project for AI” actually entail? It's not entirely clear, but we think that three distinct features capture much of the essence of what people are referring to:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; It's a project initiated by the US government&lt;/li&gt;
&lt;li&gt; Private sector AI resources (e.g. compute) are consolidated&lt;/li&gt;
&lt;li&gt; Total compute investments reach a similar fraction of US GDP as the peak of the Manhattan Project or the Apollo program&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt; In addition to these core properties, for the purposes of this analysis we focus primarily on the physical bottlenecks to this scaling, thus assuming [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:55) How much compute could a national project muster?&lt;/p&gt;&lt;p&gt;(05:44) Will there be enough power to support this?&lt;/p&gt;&lt;p&gt;(08:23) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 14 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 2nd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-1.png" alt="U.S. Department of Energy tweets: &amp;quot;AI is the next Manhattan Project, and THE UNITED STATES WILL WIN. 🇺🇸&amp;quot;." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-3.png" alt="A stacked bar chart showing projected investment costs as percentage of GDP from initial through 2027." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-big-could-an-ai-manhattan-project-get/figure-2.png" alt="Visualization comparing AI model sizes from GPT-2 to potential 2027 Manhattan Project scale model." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 02 Jul 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">c543dd3b-dc30-480c-86b4-b2f5cf8cc2f7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/c543dd3b-dc30-480c-86b4-b2f5cf8cc2f7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Arden%2520Berg%252C%2520Anson%2520Ho&amp;title=%22How%20big%20could%20an%20%E2%80%9CAI%20Manhattan%20Project%E2%80%9D%20get%3F%22%20by%20Arden%20Berg%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-big-could-an-ai-manhattan-project-get&amp;created_at=2026-05-18T14%3A39%3A07.732406%2B00%3A00&amp;duration=704" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get</link>
      <itunes:duration>704</itunes:duration>
    </item>
    <item>
      <title>“AI and explosive growth redux” by Andrei Potlogea, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: GATE model shows AI-driven growth surges more easily than expected and supports much larger investments—advocating moderate optimism.&lt;/p&gt;  &lt;p&gt; The debate around the macroeconomic effects of AI has shown no sign of convergence.&lt;/p&gt;
&lt;p&gt; On the one hand, renowned economists like Daron Acemoglu expect AI to increase US GDP by only &amp;lt;2% over ten years. On the other hand, others argue that AI could plausibly drive “explosive growth”, with GWP growth rates north of 30% per year.&lt;/p&gt;
&lt;p&gt; So who's right? To shed light on this debate, we recently released the Growth and AI Transition Endogenous (GATE) model, an integrated assessment model of AI automation designed to bridge the gap between economists and AI practitioners. But while we discussed how the model is laid out on a technical level, we’ve yet to detail how to interpret the model's predictions, and our most substantial takeaways from the model.&lt;/p&gt;
&lt;p&gt; As such, in this post we’ll explain our two biggest updates from our work on GATE. Importantly, these are qualitative updates – the model was designed to provide high-level qualitative insights, not make precise quantitative predictions:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; Significant AI-driven growth accelerations happen more easily than we thought: Skeptics of [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:15) 1. Significant growth accelerations more plausible than we thought, as Baumol effects are overrated&lt;/p&gt;&lt;p&gt;(07:02) 2. We underestimated just by how much the world could be underinvesting in AI today&lt;/p&gt;&lt;p&gt;(10:04) A blow against the skeptics, many blows against the overconfident&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 20th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/ai-and-explosive-growth-redux?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/ai-and-explosive-growth-redux&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-1.png" alt="A line graph titled &amp;quot;Growth rates by automation level&amp;quot; showing yearly growth rate versus fraction of tasks automated." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-2.png" alt="Graph showing average growth rate from 2025-2030 versus substitution parameter." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-3.png" alt="GATE A is the result of default simulations, whereas GATE B is the default but modified to account for positive externalities in AI development, investor uncertainty, and labor reallocation frictions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/ai-and-explosive-growth-redux/figure-4.png" alt="Graph showing AI investment and value generated as fractions of output over time." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 20 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fb0f9b1a-bf5b-40a9-a075-a9cbdd8ed704</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fb0f9b1a-bf5b-40a9-a075-a9cbdd8ed704.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Andrei%2520Potlogea%252C%2520Anson%2520Ho&amp;title=%22AI%20and%20explosive%20growth%20redux%22%20by%20Andrei%20Potlogea%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fai-and-explosive-growth-redux&amp;created_at=2026-05-18T20%3A13%3A52.040744%2B00%3A00&amp;duration=804" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/ai-and-explosive-growth-redux</link>
      <itunes:duration>804</itunes:duration>
    </item>
    <item>
      <title>“Inference economics of language models” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: We investigate how speed trades off against cost in language model inference. We find that inference latency scales with the square root of model size and the cube root of memory bandwidth, and other results.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; As the capabilities of AI models have expanded, and as the recent paradigm of test-time compute scaling has taken off, the demand for AI inference has grown enormously. Inference revenue at major AI companies such as OpenAI and Anthropic has been growing at a rate of 3x per year or more, even as their models continue to become smaller and cheaper compared to 2023.&lt;/p&gt;&lt;p&gt; A few years ago, the benchmark for whether a language model was fast enough was “human reading speed”: if a model could generate 10 tokens per second when responding to a user, that was good enough. Now, as models are asked to reason at length about complex problems and are placed inside elaborate agentic loops, this benchmark has become obsolete. The benefits to serving models faster for inference are greater than ever before. Despite this, there has been little work investigating how language models can be served quickly at scale and how much we can increase their [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:25) Introduction&lt;/p&gt;&lt;p&gt;(01:51) How does the model work?&lt;/p&gt;&lt;p&gt;(03:56) Some takeaways from the model&lt;/p&gt;&lt;p&gt;(07:07) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/inference-economics-of-language-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/inference-economics-of-language-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/inference-economics-of-language-models/inference-economics-of-language-models.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/inference-economics-of-language-models/inference-economics-of-language-models.png" alt="Graph showing cost per million tokens versus tokens per second for various AI models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 17 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">d3b5ae21-1f60-4945-b37c-e56394215be3</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/d3b5ae21-1f60-4945-b37c-e56394215be3.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Inference%20economics%20of%20language%20models%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Finference-economics-of-language-models&amp;created_at=2026-05-18T16%3A52%3A42.426859%2B00%3A00&amp;duration=461" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/inference-economics-of-language-models</link>
      <itunes:duration>461</itunes:duration>
    </item>
    <item>
      <title>“Do the biorisk evaluations of AI labs actually measure the risk of developing bioweapons?” by Anson Ho, Arden Berg</title>
      <description>&lt;p&gt; Subtitle: Assessing if AI labs' biorisk evaluations effectively measure models' potential to enable amateur bioweapons development.&lt;/p&gt;  &lt;p&gt; With the recent release of Claude Opus 4, Anthropic activated their AI Safety Level 3 protections. This threshold was designed to pertain to models that can significantly help individuals or groups with basic technical backgrounds create/obtain and deploy CBRN weapons, such as pandemic bioweapons.&lt;/p&gt;
&lt;p&gt; Their reasoning was as follows:&lt;/p&gt;

&lt;p&gt; “We are deploying Claude Opus 4 with our ASL-3 measures as a precautionary and provisional action. […] due to continued improvements in CBRN-related knowledge and capabilities, we have determined that clearly ruling out ASL-3 risks is not possible”.&lt;/p&gt;

&lt;p&gt; But how exactly did they come to this conclusion? And more generally, do existing AI biorisk evaluations provide strong evidence of whether LLMs can aid amateurs in developing bioweapons?&lt;/p&gt;
&lt;p&gt; To answer these questions, we analyzed the biorisk evaluations (or lack thereof) of 8 notable AI labs. Here's what we found:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; Publicly described benchmarks are common but saturate rapidly, with uncertain implications for biorisk: The most common LLM biorisk evaluations reported in the most recent model cards are publicly described benchmarks (i.e. those that have clearly described in a public [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:28) 1. Publicly described benchmarks are common but saturate rapidly, with uncertain implications for biorisk&lt;/p&gt;&lt;p&gt;(10:08) 2. We know little about most other AI biorisk evaluations&lt;/p&gt;&lt;p&gt;(10:14) Evaluations are generally light on detail, and often by design&lt;/p&gt;&lt;p&gt;(13:24) In practice, many biorisk evaluations only tell us about one model&lt;/p&gt;&lt;p&gt;(16:06) 3. Existing evaluations do not fully address the positions most skeptical of LLM-driven biorisk&lt;/p&gt;&lt;p&gt;(16:58) The need for somatic tacit knowledge&lt;/p&gt;&lt;p&gt;(18:57) The importance of infrastructure access&lt;/p&gt;&lt;p&gt;(20:09) These objections are valid but do not rule out AI biorisk concerns&lt;/p&gt;&lt;p&gt;(21:29) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 20 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 13th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-1.png" alt="Figure 1: Data on GPQA-Bio, LitQA2, PubMedQA, and MMLU-Bio were taken from Justen 2025 including human baseline data. Data on Cloning Scenarios and ProtocolQA taken from Justen 2025, the Claude Opus 4 and Claude Sonnet 4 system card, and Laurent et al. 2024 including human baseline data. Data on WMDP-Bio were taken from Justen 2025, the Gemini 2.5 Pro Preview Model Card, and Dev et al. 2025 including human baseline data. Data on FigQA and SeqQA were taken from the Claude Opus 4 and Claude Sonnet 4 system card and Laurent et al. 2024 including human baseline data. Data on WMDP-Chem were taken from Dev et al. 2025 and the Gemini 2.5 Pro Preview Model Card. Data on VCT were taken from Götting et al. 2025. WMDP-Bio and WMDP-Chem are 1,273-question and 408-question subsections of the WMDP benchmark." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-2.png" alt="Text excerpt discussing threat analysis uplift thresholds and risk levels." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons/figure-3.png" alt="Figure 3: Ticks correspond to LLM releases where the authors report doing biorisk evals, and give at least 1-2 lines of description about their methodology and results. Crosses correspond to LLM releases that do not meet this threshold." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 13 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">3bf0a899-4b36-489e-b803-e6e08c0e2025</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/3bf0a899-4b36-489e-b803-e6e08c0e2025.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Arden%2520Berg&amp;title=%22Do%20the%20biorisk%20evaluations%20of%20AI%20labs%20actually%20measure%20the%20risk%20of%20developing%20bioweapons%3F%22%20by%20Anson%20Ho%2C%20Arden%20Berg&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fdo-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons&amp;created_at=2026-05-18T20%3A14%3A12.309814%2B00%3A00&amp;duration=1550" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons</link>
      <itunes:duration>1550</itunes:duration>
    </item>
    <item>
      <title>“What skills does SWE-bench Verified evaluate?” by Florian Brand, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. While one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.&lt;/p&gt;    SWE-bench Verified Coding   &lt;p&gt; SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.&lt;/p&gt;  &lt;ul data-astro-cid-dga2kyfb=""&gt; &lt;li data-astro-cid-dga2kyfb=""&gt; 
 Size: 500 Python-only coding problems with issue descriptions &lt;/li&gt; &lt;li data-astro-cid-dga2kyfb=""&gt; 
 Data sourcing: Scraping of GitHub issues followed by human filtering &lt;/li&gt; &lt;li data-astro-cid-dga2kyfb=""&gt; 
 Scoring method: Unit tests &lt;/li&gt; &lt;li data-astro-cid-dga2kyfb=""&gt; 
 Contamination risk: High &lt;/li&gt; &lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt; Main takeaways&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; 
&lt;p&gt; SWE-bench Verified tests AI's real-world agentic coding skills, the kind required for coding tools like Cursor or Claude Code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt; 
&lt;p&gt; Most of the problems are relatively simple, taking a human engineer less than an hour to fix.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt; 
&lt;p&gt; The benchmark has a high contamination risk, and the tests might not generalize well to real-world, closed-source codebases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt; 
&lt;p&gt; The scaffold built around a model [...]&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:01) Main takeaways&lt;/p&gt;&lt;p&gt;(01:46) Introduction&lt;/p&gt;&lt;p&gt;(03:07) Anatomy of a benchmark sample&lt;/p&gt;&lt;p&gt;(04:14) How models are evaluated&lt;/p&gt;&lt;p&gt;(04:47) The error rate in SWE-bench Verified is relatively low&lt;/p&gt;&lt;p&gt;(07:34) Most tasks are simple bug fixes&lt;/p&gt;&lt;p&gt;(11:35) The low diversity of codebases limits external validity&lt;/p&gt;&lt;p&gt;(12:50) Half the benchmark tests issues from before 2020&lt;/p&gt;&lt;p&gt;(15:08) Scaffolds matter as much as models&lt;/p&gt;&lt;p&gt;(17:18) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 13th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-skills-does-swe-bench-verified-evaluate?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-skills-does-swe-bench-verified-evaluate&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-1.png" alt="Two bar charts titled 'Most issues can be fixed quickly' showing time to fix issues and code changes." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-2.png" alt="GitHub issue page showing PythonCodePrinter doesn't support Indexed operation." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-3.png" alt="Bar chart titled 'A few repositories dominate the benchmark' showing percentage distribution across Python repositories." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-4.png" alt="Bar graph titled 'The majority of issues are from the last 5 years' showing PR creation dates." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/what-skills-does-swe-bench-verified-evaluate/figure-5.png" alt="Bug tracking issue showing optimization request for Django delete method to use only required fields." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 13 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a011a7f5-30ac-4a7a-9044-a3fb602d7233</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a011a7f5-30ac-4a7a-9044-a3fb602d7233.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Florian%2520Brand%252C%2520Jean-Stanislas%2520Denain&amp;title=%22What%20skills%20does%20SWE-bench%20Verified%20evaluate%3F%22%20by%20Florian%20Brand%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-skills-does-swe-bench-verified-evaluate&amp;created_at=2026-05-18T16%3A52%3A43.261981%2B00%3A00&amp;duration=1155" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-skills-does-swe-bench-verified-evaluate</link>
      <itunes:duration>1155</itunes:duration>
    </item>
    <item>
      <title>“Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning” by Anson Ho, Jean-Stanislas Denain, Elliot Glazer</title>
      <description>&lt;p&gt; Subtitle: Examining o3-mini's math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and formal human rigor.&lt;/p&gt;  &lt;p&gt; If you’re reading this, you’ll no doubt have heard of the impressive progress that state-of-the-art language models have been able to make in solving math problems. For instance, we recently found that o4-mini outperformed the average team of mathematicians in our human baseline competition.&lt;/p&gt;
&lt;p&gt; However, these numbers alone provide limited insight into what exactly these models are or aren’t able to do, and why. How do reasoning models solve complex math problems? Do they reason similarly to human mathematicians? And where do they fall short?&lt;/p&gt;
&lt;p&gt; To answer these questions, we asked fourteen mathematicians to analyze 29 of o3-mini-high's raw, unsummarized reasoning traces on FrontierMath problems, which OpenAI shared with us. Our goal in this post is to share the main takeaways from this survey, and discuss what this means for future developments at the intersection of AI and math.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; How does o3-mini-high solve FrontierMath problems?&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt; Extreme erudition – and it's not just memorization&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Out of the 29 reasoning traces, 13 of them resulted in a correct response – but how does o3-mini-high solve these [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:23) How does o3-mini-high solve FrontierMath problems?&lt;/p&gt;&lt;p&gt;(01:28) Extreme erudition - and it's not just memorization&lt;/p&gt;&lt;p&gt;(02:57) A "vibes-based inductive reasoner"&lt;/p&gt;&lt;p&gt;(04:11) Where o3-mini-high fails&lt;/p&gt;&lt;p&gt;(04:14) Lack of precision&lt;/p&gt;&lt;p&gt;(06:15) Lack of creativity and depth of understanding&lt;/p&gt;&lt;p&gt;(07:51) Hallucinations&lt;/p&gt;&lt;p&gt;(08:57) Does o3-mini-high reason like a human mathematician?&lt;/p&gt;&lt;p&gt;(10:48) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 6th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/beyond-benchmark-scores-analysing-o3-mini-math-reasoning?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/beyond-benchmark-scores-analysing-o3-mini-math-reasoning&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-1.png" alt="Figure 1: The reviewing mathematicians generally found that o3-mini-high was decent at invoking relevant results from the mathematical literature, achieving a rating of 3/5 or higher on around two thirds of problems." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-2.png" alt="Figure 2: “Cheesing” the problem (not solving the problem as intended) was fairly common, but more often than not o3-mini-high correctly solved the problem without any cheesing at all (i.e. a score of 5). Note that this graph only pertains to reasoning traces where o3-mini-high correctly answered the problem in question." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-3.png" alt="Figure 3: Only around 18% of the cases where o3-mini-high arrived at an incorrect solution were very close to being correct – overall there was more of a spread in how correct the reasoning was." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/beyond-benchmark-scores-analysing-o3-mini-math-reasoning/figure-4.png" alt="Figure 4: Mathematician ratings of how human-like o3-mini-high reasoning is. A score of 1 corresponds to reasoning that is not human-like at all, and a score of 5 corresponds to reasoning that is indistinguishable from a human mathematician." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 06 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">0c809b5c-f6e4-4bf3-9def-8fdfce4fa2ab</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/0c809b5c-f6e4-4bf3-9def-8fdfce4fa2ab.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Jean-Stanislas%2520Denain%252C%2520Elliot%2520Glazer&amp;title=%22Beyond%20benchmark%20scores%3A%20Analyzing%20o3-mini%E2%80%99s%20mathematical%20reasoning%22%20by%20Anson%20Ho%2C%20Jean-Stanislas%20Denain%2C%20Elliot%20Glazer&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fbeyond-benchmark-scores-analysing-o3-mini-math-reasoning&amp;created_at=2026-05-18T14%3A39%3A08.856196%2B00%3A00&amp;duration=768" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/beyond-benchmark-scores-analysing-o3-mini-math-reasoning</link>
      <itunes:duration>768</itunes:duration>
    </item>
    <item>
      <title>“What is Epoch?” by Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: Our director explains Epoch AI's mission and how we decide our priorities. In short, we work on projects to understand the trajectory of AI, share this knowledge publicly, and inform important decisions about AI.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Since we started Epoch three years ago, we have engaged in hundreds of projects and achieved a wide audience. Yet, one question I often get asked is, ‘What is Epoch?’&lt;/p&gt;&lt;p&gt; In a way, this is an easy question to answer. We are a nonprofit research organization with the mission of improving society's understanding of the trajectory of AI. Simply put, we are doing what we can so that decisions about AI are informed by the best possible evidence.&lt;/p&gt;&lt;p&gt; To achieve this, we are curating data and conducting high-quality research into some of the most significant trends in AI. We share most of this work publicly, aimed at a broad audience, including AI policy experts, journalists and AI developers. Importantly, we are committed to always sharing what the data says, rather than tailoring it to fit a narrative.&lt;/p&gt;&lt;p&gt; We work on this mission because we believe that if we all collectively know more about AI, we will make better decisions on average. 
I [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(01:57) What we do&lt;/p&gt;&lt;p&gt;(02:46) We curate and analyze data on AI trends&lt;/p&gt;&lt;p&gt;(04:21) We develop benchmarks to measure advanced AI capabilities&lt;/p&gt;&lt;p&gt;(06:03) We provide independent evaluations of AI models&lt;/p&gt;&lt;p&gt;(06:54) We provide consultations and commissioned research&lt;/p&gt;&lt;p&gt;(10:30) What we are not&lt;/p&gt;&lt;p&gt;(11:04) We are not an AI development company&lt;/p&gt;&lt;p&gt;(12:00) We are not an AI policy think tank&lt;/p&gt;&lt;p&gt;(13:05) We are not a company incubator&lt;/p&gt;&lt;p&gt;(14:18) Closing words&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 5th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/what-is-epoch?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/what-is-epoch&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 05 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">cbce6194-8640-4a36-b511-fde1c4460ec7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/cbce6194-8640-4a36-b511-fde1c4460ec7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla&amp;title=%22What%20is%20Epoch%3F%22%20by%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwhat-is-epoch&amp;created_at=2026-05-18T16%3A52%3A44.27949%2B00%3A00&amp;duration=905" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/what-is-epoch</link>
      <itunes:duration>905</itunes:duration>
    </item>
    <item>
      <title>“GPQA Diamond: What’s left?” by Greg Burnham</title>
      <description>&lt;p&gt; Subtitle: Investigate GPQA Diamond benchmark's validity: uncover flawed questions, model challenges, and why it still informs AI evaluation.&lt;/p&gt;  &lt;p&gt; A specter looms whenever AI systems approach 100% on a benchmark: what if the rest of the benchmark is flawed?&lt;/p&gt;
&lt;p&gt; Recently, these concerns have been levied at GPQA Diamond, a popular benchmark consisting of graduate-level multiple-choice science questions. Scores from state-of-the-art models are clustered in a narrow band, around 83%. This led one of the creators of the benchmark to speculate that there's something wrong with the other 17%.&lt;/p&gt;
&lt;p&gt; I’ll use this post to investigate. I’ll start by looking at the questions that models consistently get wrong: is something wrong with these questions, or are they just hard for the models? To assess this, I’ll look at the scientific subdomains of these questions and then go through a small set of outliers in more detail.&lt;/p&gt;
&lt;p&gt; All in all, I think it's likely that at least 90% of the benchmark is valid: GPQA Diamond has a bit more juice left. Regardless of that conclusion, though, I also just think it's good to dig into benchmarks. This post is as much about the journey as the destination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; What [...]&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:28) What could be wrong with GPQA Diamond?&lt;/p&gt;&lt;p&gt;(02:53) Models Tend to Get the Same Questions Wrong&lt;/p&gt;&lt;p&gt;(04:16) Most Unsolved Questions Are in Organic Chemistry&lt;/p&gt;&lt;p&gt;(05:32) Deep Dive: The Most Consistently Wrong Questions&lt;/p&gt;&lt;p&gt;(06:15) Guess the Pattern&lt;/p&gt;&lt;p&gt;(08:06) Know-How Questions&lt;/p&gt;&lt;p&gt;(10:43) Computational Questions, With a Twist&lt;/p&gt;&lt;p&gt;(13:06) Silver Fluorides&lt;/p&gt;&lt;p&gt;(14:24) Reports of Its Death Are Probably Exaggerated&lt;/p&gt;&lt;p&gt;(16:28) If Not Now, Soon&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 30th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/gpqa-diamond-whats-left?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/gpqa-diamond-whats-left&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-1.png" alt="Graph showing average accuracy across GPQA Diamond questions ordered by difficulty." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-2.png" alt="Algorithm showing input-output examples: AGG maps to 115, TGCTGA to 176, asking for ACAGTGACC's value." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-3.png" alt="Multiple choice question about common sources of errors in genomics data analysis." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-4.png" alt="Multiple choice question about in silico docking studies for Xantheraquin molecule." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-5.png" alt="Physics problem about stellar composition and elemental abundance ratios with multiple choice answers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-6.png" alt="Multiple choice chemistry problem about dissolving iron hydroxide using strong acid." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/gpqa-diamond-whats-left/figure-7.png" alt="Multiple choice chemistry problem about fluorine compounds with element Y." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 30 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">5678bd92-4856-403e-8fea-2b4916933b45</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/5678bd92-4856-403e-8fea-2b4916933b45.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Greg%2520Burnham&amp;title=%22GPQA%20Diamond%3A%20What%E2%80%99s%20left%3F%22%20by%20Greg%20Burnham&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fgpqa-diamond-whats-left&amp;created_at=2026-05-18T20%3A14%3A33.996816%2B00%3A00&amp;duration=1074" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/gpqa-diamond-whats-left</link>
      <itunes:duration>1074</itunes:duration>
    </item>
    <item>
      <title>“How many AI models will exceed compute thresholds?” by Ben Cottier, David Owen</title>
      <description>&lt;p&gt; Subtitle: We project how many notable AI models will exceed training compute thresholds, with results accessible in an interactive tool. Model counts rapidly increase from 10 above 1e26 FLOP by 2026, to over 200 by 2030.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The compute used to train AI models has been a key driver of AI progress, informing many predictions of AI's future capabilities. However, the number of AI models that will surpass different compute levels has received less attention. This is relevant to compute-based AI regulation, as well as AI development and deployment more broadly. We develop a projective model that relates key inputs such as investment and the distribution of compute to the number of notable AI models: models that are state of the art, highly cited, or otherwise historically notable. The projections can be explored in a new interactive tool.&lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: Cumulative number of notable AI models by year &lt;/p&gt;&lt;p&gt; Our modeling shows that the number of notable AI models above a given compute threshold rapidly accelerates over time. 
For example, the first model in our dataset estimated to use over 10 to the 26 FLOP was Grok-3 from xAI, released [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:30) Executive summary&lt;/p&gt;&lt;p&gt;(03:41) Introduction&lt;/p&gt;&lt;p&gt;(05:55) Methodology&lt;/p&gt;&lt;p&gt;(05:58) Overview&lt;/p&gt;&lt;p&gt;(09:25) Dataset and inclusion criteria&lt;/p&gt;&lt;p&gt;(13:14) Scenarios based on AI investment and model development&lt;/p&gt;&lt;p&gt;(18:22) Investment in the largest training run&lt;/p&gt;&lt;p&gt;(19:13) Total number of models per year&lt;/p&gt;&lt;p&gt;(20:06) Number of models near the largest training run&lt;/p&gt;&lt;p&gt;(20:57) Distribution of compute over AI models&lt;/p&gt;&lt;p&gt;(23:11) Hardware price-performance&lt;/p&gt;&lt;p&gt;(25:29) Training run duration&lt;/p&gt;&lt;p&gt;(26:44) Sensitivity analysis&lt;/p&gt;&lt;p&gt;(29:05) Limitations&lt;/p&gt;&lt;p&gt;(33:57) Results&lt;/p&gt;&lt;p&gt;(36:21) Conclusion&lt;/p&gt;&lt;p&gt;(38:34) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 16 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 30th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/model-counts-compute-thresholds?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/model-counts-compute-thresholds&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/model-counts-projections.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/model-counts-projections.png" alt="Number of notable AI models above 1e26 FLOP under three scenarios of AI development" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/model-counts-compute-thresholds/model-counts-diagram-1.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/model-counts-compute-thresholds/model-counts-diagram-1.svg" alt="Figure 3: Overview of our projective model for AI model counts. The model takes six key inputs and combines them to project future model counts above different compute thresholds. The process flows from specifying the compute for the largest training run, through modeling the distribution of compute across all models, to finally sampling and counting models that exceed specific thresholds." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/investment-input-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/investment-input-chart.png" alt="Hardware acquisition cost of the largest training run each year" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/total-model-count-input-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/total-model-count-input-chart.png" alt="Total number of notable AI models each year" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/num-models-near-frontier-input-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/num-models-near-frontier-input-chart.png" alt="Number of notable AI models near the frontier each year" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/training-compute-distribution.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/training-compute-distribution.png" alt="Relative distribution of training compute" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/model-counts-compute-thresholds/model-counts-diagram-2.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/model-counts-compute-thresholds/model-counts-diagram-2.svg" alt="Figure 8: Overview of how we truncate the empirical distribution of compute runs over different training runs, to model a certain density at the right end of the distribution. 
This density is determined by another key input to the model: the number of models near the largest training run." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/hardware-price-performance-projection.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/hardware-price-performance-projection.png" alt="Projection of realized hardware price-performance" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/training-run-time.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/training-run-time.png" alt="Projection of training run time" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/sensitivity-analysis-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/sensitivity-analysis-chart.png" alt="Sensitivity of the 1e26 FLOP model count to each input" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/model-counts-projections-reproduction.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/model-counts-projections-reproduction.png" alt="Number of notable AI models above 1e26 FLOP under three scenarios of AI development" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/number-of-models-above-threshold-comparison.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/number-of-models-above-threshold-comparison.png" alt="Cumulative number of notable models above a compute threshold" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not 
show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 30 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">e36ec019-93c6-45c7-b1c3-fd201f698326</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/e36ec019-93c6-45c7-b1c3-fd201f698326.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier%252C%2520David%2520Owen&amp;title=%22How%20many%20AI%20models%20will%20exceed%20compute%20thresholds%3F%22%20by%20Ben%20Cottier%2C%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fmodel-counts-compute-thresholds&amp;created_at=2026-05-18T17%3A09%3A42.401908%2B00%3A00&amp;duration=2383" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/model-counts-compute-thresholds</link>
      <itunes:duration>2383</itunes:duration>
    </item>
    <item>
      <title>“Is AI already superhuman on FrontierMath?” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: How do humans and AIs compare on FrontierMath? We ran a competition at MIT to put this to the test.&lt;/p&gt; 
&lt;p&gt; How well do humans perform on FrontierMath?&lt;/p&gt;
&lt;p&gt; This is a benchmark that we released last year, designed to test the limits of AI's math capabilities. It contains 300 questions that range in difficulty from upper-undergraduate level, to those that even Fields Medallists find challenging.&lt;/p&gt;
&lt;p&gt; To figure out a human baseline, we organized a competition at MIT, with around forty exceptional math undergrads and subject matter experts taking part. The participants were split into eight teams of four or five people, and given 4.5 hours to solve 23 questions with internet access. They were then pitted against the current state-of-the-art AI system on FrontierMath, namely o4-mini-medium.&lt;/p&gt;
&lt;p&gt; The result? o4-mini-medium outperformed the average human team, but did worse than the combined score across all teams, where we look at the fraction of problems solved by at least one team. So AIs aren’t yet unambiguously superhuman on FrontierMath – but I think they soon will be.&lt;/p&gt;
&lt;p&gt; Figure 1: o4-mini-medium scored 22% on the FrontierMath human baseline competition, outperforming the average team (19%) but falling short of the [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:21) 1. Subject matter expertise was underrepresented&lt;/p&gt;&lt;p&gt;(03:22) 2. The competition was more designed to reflect reasoning capabilities than broad knowledge&lt;/p&gt;&lt;p&gt;(05:30) 3. The definition of "human baseline" is somewhat ambiguous&lt;/p&gt;&lt;p&gt;(08:13) 4. AIs aren't yet superhuman on FrontierMath, but they probably soon will be&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 23rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/is-ai-already-superhuman-on-frontiermath?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/is-ai-already-superhuman-on-frontiermath&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-ai-already-superhuman-on-frontiermath/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-ai-already-superhuman-on-frontiermath/figure-1.png" alt="Figure 1: o4-mini-medium scored 22% on the FrontierMath human baseline competition, outperforming the average team (19%) but falling short of the combined score across all teams (35%). Note that o4-mini-medium only managed to solve problems that at least one human team had solved. Competition results can be found in this spreadsheet." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/is-ai-already-superhuman-on-frontiermath/figure-2.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/is-ai-already-superhuman-on-frontiermath/figure-2.jpg" alt="Figure 2: Graphical representation of the topics on the full FrontierMath benchmark, spanning a wide range of domains." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 23 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">44be21b9-0915-47a8-a6f0-f01a06ca4bc9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/44be21b9-0915-47a8-a6f0-f01a06ca4bc9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22Is%20AI%20already%20superhuman%20on%20FrontierMath%3F%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fis-ai-already-superhuman-on-frontiermath&amp;created_at=2026-05-18T14%3A39%3A09.656705%2B00%3A00&amp;duration=617" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/is-ai-already-superhuman-on-frontiermath</link>
      <itunes:duration>617</itunes:duration>
    </item>
    <item>
      <title>“How fast can algorithms advance capabilities?” by Henry Josephson</title>
      <description>&lt;p&gt; Subtitle: This week's issue is a guest post by Henry Josephson, who is a research manager at UChicago's XLab and an AI governance intern at Google DeepMind.&lt;/p&gt;
&lt;p&gt; In the AI 2027 scenario, the authors predict a fast takeoff of AI systems recursively self-improving until we have superintelligence in just a few years.&lt;/p&gt;
&lt;p&gt; Could this really happen? Whether it's possible may depend on whether a software intelligence explosion (a series of rapid algorithmic advances that lead to greater AI capabilities) occurs.&lt;/p&gt;
&lt;p&gt; A key crux in the debate about the possibility of a software intelligence explosion is whether key algorithmic improvements scale from small models to larger models. If the most important algorithmic advances need a large amount of compute to demonstrate their effectiveness, then we should think that a software-only intelligence explosion is less likely, and a fast takeoff could be bottlenecked by compute constraints.&lt;/p&gt;
&lt;p&gt; In a recent preprint, my team at UChicago's XLab — Spencer Guo, Teddy Foley, Jack Sanderson, Anqi Qu, and [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:52) Are the best algorithmic improvements compute-dependent?&lt;/p&gt;&lt;p&gt;(07:49) Can Capabilities Advance With Frozen Compute? DeepSeek-V3&lt;/p&gt;&lt;p&gt;(08:55) What This Means for AI Progress&lt;/p&gt;&lt;p&gt;(12:59) Limitations&lt;/p&gt;&lt;p&gt;(14:20) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 16th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-fast-can-algorithms-advance-capabilities?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-fast-can-algorithms-advance-capabilities&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-fast-can-algorithms-advance-capabilities/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-fast-can-algorithms-advance-capabilities/figure-1.png" alt="Graph showing “Compute equivalent gain (CEG)” versus “Training compute (FLOP)” with two trend lines." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-fast-can-algorithms-advance-capabilities/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-fast-can-algorithms-advance-capabilities/figure-2.png" alt="Bar chart showing compute efficiency gains for various algorithmic innovations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 16 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4717260c-ddcc-429e-bc0b-583db7dd65b5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4717260c-ddcc-429e-bc0b-583db7dd65b5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Henry%2520Josephson&amp;title=%22How%20fast%20can%20algorithms%20advance%20capabilities%3F%22%20by%20Henry%20Josephson&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-fast-can-algorithms-advance-capabilities&amp;created_at=2026-05-18T14%3A39%3A10.611379%2B00%3A00&amp;duration=911" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-fast-can-algorithms-advance-capabilities</link>
      <itunes:duration>911</itunes:duration>
    </item>
    <item>
      <title>“How far can reasoning models scale?” by Josh You</title>
      <description>&lt;p&gt; Subtitle: Available evidence suggests that rapid growth in reasoning training can continue for a year or so.&lt;/p&gt;  &lt;p&gt; Reasoning models like OpenAI's o3 are less than a year old, but they’ve already seen rapid improvements in capabilities, and OpenAI researchers are very optimistic that this progress will continue. But it's not clear how much further the techniques used to train reasoning models can scale.&lt;/p&gt;
&lt;p&gt; After looking into the question, I think there is room to scale reasoning training further, but it's unlikely that OpenAI or other frontier AI developers can scale by many orders of magnitude.&lt;/p&gt;
&lt;p&gt; If reasoning training continues to scale at 10× every few months, in line with the jump from o1 to o3, it will reach the frontier of total training compute before long, perhaps within a year. At that point, the scaling rate will slow and converge with the overall growth rate in training compute of ~4× per year. Progress in reasoning models may slow down after this point as well.&lt;/p&gt;
&lt;p&gt; Figure 1: An illustration of a possible trajectory for reasoning compute growth, if scale-ups similar to the jump between o1 and o3 continue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; How much compute is used for frontier [...]&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:25) How much compute is used for frontier reasoning training?&lt;/p&gt;&lt;p&gt;(02:59) Scaling from o1 to o3&lt;/p&gt;&lt;p&gt;(04:17) Insights from DeepSeek-R1&lt;/p&gt;&lt;p&gt;(05:27) Insights from other reasoning models&lt;/p&gt;&lt;p&gt;(06:39) What can we conclude?&lt;/p&gt;&lt;p&gt;(09:35) What does reasoning compute scale mean for AI progress?&lt;/p&gt;&lt;p&gt;(11:24) Can reasoning actually scale?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 9th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-far-can-reasoning-models-scale?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-far-can-reasoning-models-scale&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-1.png" alt="Figure 1: An illustration of a possible trajectory for reasoning compute growth, if scale-ups similar to the jump between o1 and o3 continue." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-2.png" alt="Figure 2. Taken from OpenAI’s o3 livestream announcement (18:45). The presenters did not verbally clarify any details beyond what is shown in the graph." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-far-can-reasoning-models-scale/figure-3.png" alt="Figure 3. o1’s AIME performance vs training compute. Source: OpenAI" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 09 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">d0b46cd2-3d1c-480a-a10e-eace9aeb38a6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/d0b46cd2-3d1c-480a-a10e-eace9aeb38a6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You&amp;title=%22How%20far%20can%20reasoning%20models%20scale%3F%22%20by%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-far-can-reasoning-models-scale&amp;created_at=2026-05-18T14%3A39%3A11.545048%2B00%3A00&amp;duration=833" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-far-can-reasoning-models-scale</link>
      <itunes:duration>833</itunes:duration>
    </item>
    <item>
      <title>“Where’s my ten minute AGI?” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Why don't AIs automate more real-world tasks if they can handle 1-hour ones? Anson Ho explores key capability and context bottlenecks.&lt;/p&gt; 
&lt;p&gt; Recently, METR released a paper arguing that the length of tasks that AIs can do is doubling every 7 months.&lt;/p&gt;
&lt;p&gt; We can see this in the following graph, where the best AI system is able to do roughly hour-long tasks at a 50% success rate on average:&lt;/p&gt;
&lt;p&gt; METR's research finds that AIs are rapidly able to do longer and longer tasks, where length is measured by the time it takes for a human with requisite expertise to do the task.&lt;/p&gt;
&lt;p&gt; But there's a big problem here – if AIs are actually able to perform most tasks on 1-hour task horizons, why don’t we see more real-world task automation? For example, most emails take less than an hour to write, but crafting emails remains an important part of the lives of billions of people every day.&lt;/p&gt;
&lt;p&gt; Some of this could be due to people underusing AI systems, but in this post I want to focus on reasons that are more fundamental to the capabilities of AI systems. In particular, I think there are [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:48) 1. Time-horizon estimates are very domain-specific&lt;/p&gt;&lt;p&gt;(04:15) 2. Task reliability strongly influences task horizons&lt;/p&gt;&lt;p&gt;(07:19) 3. Real-world tasks are bundled together and hard to separate out&lt;/p&gt;&lt;p&gt;(09:56) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 9 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 2nd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/where-is-my-ten-minute-agi?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/where-is-my-ten-minute-agi&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-1.png" alt="METR’s research finds that AIs are rapidly able to do longer and longer tasks, where length is measured by the time it takes for a human with requisite expertise to do the task." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-3.png" alt="Time-horizon estimates are heavily dependent on the tasks used to make measurements. For example, using chess as the relevant task would’ve led to absurd predictions about the time horizons of AI systems in the 1990s. For modern systems it predicts decade-long timescales, which is likely false but we don’t have empirical evidence for how humans perform on chess over these durations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-2.png" alt="Time horizons were estimated by gathering data on human task completion times and model success rates, and doing curve-fitting. It does not represent the minimum of the human completion times across all tasks." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/where-is-my-ten-minute-agi/figure-4.png" alt="Ajeya Cotra tweets: “Neatly encapsulated benchmark-style tasks (RE-Bench, SWE-Bench, Cybench) rarely come up. It's not worth encapsulating real-world tasks into that format, and you can't yet easily delegate the encapsulation itself to AIs.”" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">83ef5a5b-a93f-4031-a9cf-e5f37c49b462</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/83ef5a5b-a93f-4031-a9cf-e5f37c49b462.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22Where%E2%80%99s%20my%20ten%20minute%20AGI%3F%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhere-is-my-ten-minute-agi&amp;created_at=2026-05-18T14%3A39%3A13.082383%2B00%3A00&amp;duration=775" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/where-is-my-ten-minute-agi</link>
      <itunes:duration>775</itunes:duration>
    </item>
    <item>
      <title>“The case for multi-decade AI timelines” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: In this Gradient Updates weekly issue, Ege discusses the case for multi-decade AI timelines.&lt;/p&gt;  &lt;p&gt; The date at which transformative AI capabilities will be reached is among the most discussed questions about AI. Opinions vary widely, with industry insiders typically expecting far faster progress than external observers. For instance, Dario Amodei thinks there might be only 2 to 3 years left until AI surpasses “almost all humans at almost everything”, while economists such as William Nordhaus still believe we might have more than 100 years left.&lt;/p&gt;
&lt;p&gt; Compared to most people in the world, my own median timelines of ~ 20 years until full automation of remote work would be considered quite aggressive. However, most people in the field of AI (and even many others at Epoch) have much shorter timelines than this, and timelines on the order of 1 to 10 years, as seen in the recent AI 2027 report, are often seen as a “default position” that one has to present arguments against. In this issue, I’ll elaborate on the key reasons behind my relatively bearish views. I’ll first explain why I find some common short timelines arguments unconvincing, then elaborate on how I arrive at [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:10) Trend extrapolations don't point towards short timelines&lt;/p&gt;&lt;p&gt;(08:39) A software singularity is unlikely&lt;/p&gt;&lt;p&gt;(12:15) AI agents will need a lot of compute to automate all remote work&lt;/p&gt;&lt;p&gt;(18:17) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 26th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-case-for-multi-decade-ai-timelines?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-case-for-multi-decade-ai-timelines&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-case-for-multi-decade-ai-timelines/nvidia-datacenter-revenue-projections.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-case-for-multi-decade-ai-timelines/nvidia-datacenter-revenue-projections.png" alt="Graph showing “Projections of NVIDIA datacenter revenue under different models” with three forecast lines." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 26 Apr 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">6b909edb-d585-4d01-8712-fc76917e55c9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/6b909edb-d585-4d01-8712-fc76917e55c9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22The%20case%20for%20multi-decade%20AI%20timelines%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-case-for-multi-decade-ai-timelines&amp;created_at=2026-05-18T14%3A39%3A13.757029%2B00%3A00&amp;duration=1149" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-case-for-multi-decade-ai-timelines</link>
      <itunes:duration>1149</itunes:duration>
    </item>
    <item>
      <title>“Trends in AI supercomputers” by Konstantin F. Pilz, Robi Rahman, James Sanders, Lennart Heim</title>
      <description>&lt;p&gt; Subtitle: AI supercomputers double in performance every 9 months, cost billions of dollars, and require as much power as mid-sized cities. Companies now own 80% of all AI supercomputers, while governments’ share has declined.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Frontier AI development relies on powerful AI supercomputers. To train AI models with exponentially more compute, companies have developed massive systems like xAI's Colossus, which contain up to 200,000 specialized AI chips, cost billions of dollars to build, and require hundreds of MW of power, equivalent to a medium-sized city.&lt;/p&gt;&lt;p&gt; However, public data on these systems is limited.&lt;/p&gt;&lt;p&gt; We curated a dataset of over 500 AI supercomputers (sometimes called GPU clusters or AI data centers) from 2019 to 2025 and analyzed key trends in performance, power needs, hardware cost, and ownership. We found:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Computational performance grew 2.5x per year, driven by using more and better chips in the leading AI supercomputers.&lt;/li&gt;
&lt;li&gt; Power requirements and hardware costs doubled every year. If current trends continue, the largest AI supercomputer in 2030 would cost hundreds of billions of dollars and require 9 gigawatts of power.&lt;/li&gt;
&lt;li&gt; The rapid growth in AI supercomputers coincided with a shift to private ownership. In our dataset, industry owned about [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Introduction&lt;/p&gt;&lt;p&gt;(02:10) Computational performance, energy, and cost trends&lt;/p&gt;&lt;p&gt;(04:37) Locations and public/private sector share of AI supercomputers&lt;/p&gt;&lt;p&gt;(06:31) Our dataset&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 23rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trends-in-ai-supercomputers?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trends-in-ai-supercomputers&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/supercomputers-performance-2019.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/supercomputers-performance-2019.png" alt="The performance of leading AI supercomputers has doubled every 9 months" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/supercomputers-cost-2019.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/supercomputers-cost-2019.png" alt="The hardware cost of leading AI supercomputers has doubled every year" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/supercomputers-performance-share-by-sector.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/supercomputers-performance-share-by-sector.png" alt="Companies have rapidly increased their share of AI supercomputer ownership" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/supercomputers-performance-share-by-country.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/supercomputers-performance-share-by-country.png" alt="The United States leads in total computational performance, followed by China" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 23 Apr 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">88ec7d76-3b5c-4ed4-bc36-8de04607694e</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/88ec7d76-3b5c-4ed4-bc36-8de04607694e.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Konstantin%2520F.%2520Pilz%252C%2520Robi%2520Rahman%252C%2520James%2520Sanders%252C%2520Lennart%2520Heim&amp;title=%22Trends%20in%20AI%20supercomputers%22%20by%20Konstantin%20F.%20Pilz%2C%20Robi%20Rahman%2C%20James%20Sanders%2C%20Lennart%20Heim&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrends-in-ai-supercomputers&amp;created_at=2026-05-18T17%3A09%3A46.224416%2B00%3A00&amp;duration=416" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trends-in-ai-supercomputers</link>
      <itunes:duration>416</itunes:duration>
    </item>
    <item>
      <title>“The real reason AI benchmarks haven’t reflected economic impacts” by Anson Ho, Jean-Stanislas Denain</title>
      <description>&lt;p&gt; Subtitle: The real reason that AI benchmarks haven’t reflected real-world impacts historically is that they weren’t optimized for this, not because of fundamental limitations – but this might be changing.&lt;/p&gt;  &lt;p&gt; Figure 1: This graph demonstrates rapid AI progress across key benchmarks, which have been useful indicators for driving capabilities forward. However, for most of this period, benchmark realism was not a priority. This explains why high benchmark scores often provide limited insight into AI systems’ real-world impact.&lt;/p&gt;
&lt;p&gt; Back in the prehistoric days of March 2023, OpenAI released GPT-4, and its benchmark results raised a lot of questions and speculation about the future of the legal profession.&lt;/p&gt;
&lt;p&gt; In response to these concerns, Narayanan and Kapoor wrote a blog post pointing out that this is an instance of a more general problem, where AI benchmarks fail to reflect the complexities of the real world. And if benchmarks fail at this, it can lead to misleading conclusions about the current and future impacts of AI systems.&lt;/p&gt;
&lt;p&gt; This is undoubtedly true, but it seems to only give a partial answer, because it doesn’t tell us why existing benchmarks don’t capture the complexities of the real world. For example, surely [...]&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 28th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts/figure-1.png" alt="Figure 1: This graph demonstrates rapid AI progress across key benchmarks, which have been useful indicators for driving capabilities forward. However, for most of this period, benchmark realism was not a priority. This explains why high benchmark scores often provide limited insight into AI systems’ real-world impact." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts/figure-2.png" alt="Ofir Press tweets: &amp;quot;At NeurIPS 2023 I was trying to get people to run SWE-bench but almost everyone wasn't interested, they said it was too hard. At NeurIPS 2024 everyone talked about SWE-bench. I hope that at NeurIPS 2025 no one will talk about SWE-bench because that initial version will be solved.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 28 Mar 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">3986fdcc-195b-4029-a382-416cee050e02</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/3986fdcc-195b-4029-a382-416cee050e02.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Jean-Stanislas%2520Denain&amp;title=%22The%20real%20reason%20AI%20benchmarks%20haven%E2%80%99t%20reflected%20economic%20impacts%22%20by%20Anson%20Ho%2C%20Jean-Stanislas%20Denain&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-real-reason-ai-benchmarks-havent-reflected-economic-impacts&amp;created_at=2026-05-18T14%3A41%3A19.40446%2B00%3A00&amp;duration=480" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts</link>
      <itunes:duration>480</itunes:duration>
    </item>
    <item>
      <title>“GATE: Modeling the trajectory of AI and automation” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We introduce a compute-centric model of AI automation and its economic effects, illustrating key dynamics of AI development. The model suggests large AI investments and subsequent economic growth.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The rapid progress and adoption of large language models in recent years have sparked extensive discussions about how artificial intelligence (AI) will shape the future of our economy. Central to these discussions are questions about whether AI will substantially accelerate economic growth, how much investment AI will attract, the timing and scale of these investments, and the speed at which automation will transform labor markets.&lt;/p&gt;&lt;p&gt; To advance this discussion, we introduce the Growth and AI Transition Endogenous (GATE) model. GATE brings together concepts from machine learning and economic growth theory to illustrate the key dynamics of AI development, task automation and their downstream macroeconomic effects. It draws heavily on scaling laws—empirical regularities relating compute scaling to performance for both training and inference—and semi-endogenous growth—a theory that explains economic growth as a result of R&amp;amp;D efforts that generate scientific advances. You can find a technical description of the model in our whitepaper.&lt;/p&gt;&lt;p&gt; Alongside our paper, we are releasing an interactive model. This tool lets you simulate a variety of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Introduction&lt;/p&gt;&lt;p&gt;(02:17) About GATE&lt;/p&gt;&lt;p&gt;(05:29) Preliminary insights&lt;/p&gt;&lt;p&gt;(09:16) Conclusion and next steps&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 21st, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-gate?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-gate&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-playground.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-playground.png" alt="Graph showing &amp;quot;Largest training run&amp;quot; with Physical FLOP over time, displaying AI automation scenarios from 2020 to 2045." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-overview.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-overview.png" alt="Diagram showing feedback loop between compute, automation, and production in AI development." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-investment-and-value.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-investment-and-value.png" alt="Graph showing AI investment and value generated as fraction of output over time from 2025 to 2045." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-largest-training-run.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-largest-training-run.png" alt="Graph showing physical FLOP trends in AI training runs from 2020 to 2045." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-gwp.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/announcing-gate/gate-gwp.png" alt="Graph showing Gross World Product projections under different AI automation scenarios from 2025 to 2045." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 21 Mar 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f85ae008-ab55-4448-9eed-96f4ad260568</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f85ae008-ab55-4448-9eed-96f4ad260568.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22GATE%3A%20Modeling%20the%20trajectory%20of%20AI%20and%20automation%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-gate&amp;created_at=2026-05-18T17%3A09%3A47.605284%2B00%3A00&amp;duration=633" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-gate</link>
      <itunes:duration>633</itunes:duration>
    </item>
    <item>
      <title>“Most AI value will come from broad automation, not from R&amp;D” by Ege Erdil, Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: AI's biggest impact will come from broad labor automation—not R&amp;amp;D—driving economic growth through scale, not scientific breakthroughs.&lt;/p&gt; 
&lt;p&gt; A popular view about the future impact of AI on the economy is that it will be primarily mediated through AI automation of R&amp;amp;D. In some form or another, this view has been expressed by many influential figures in the industry:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; 
&lt;p&gt; In his essay “Machines of Loving Grace”, Dario Amodei lists five ways in which AI can benefit humanity in a scenario where AI goes well. He considers biology R&amp;amp;D, neuroscience R&amp;amp;D, and economics R&amp;amp;D as three of these ways. There's no point at which he clearly argues that AI will lead to high rates of economic growth due to being broadly deployed throughout the economy as opposed to speeding up R&amp;amp;D and perhaps improving economic governance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt; 
&lt;p&gt; Demis Hassabis, CEO of DeepMind, is also bullish on R&amp;amp;D as the main channel through which AI will benefit society. In a recent interview, he provides specific mechanisms through which this could happen: AI could cure all diseases and “solve energy”. He mentions “radical abundance” as a possibility as well, but beyond the R&amp;amp;D channel doesn’t name [...]&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:44) The primary economic impact of AI will be its ability to broadly automate labor&lt;/p&gt;&lt;p&gt;(09:56) Automating AI R&amp;amp;D alone likely won't dramatically accelerate AI progress&lt;/p&gt;&lt;p&gt;(14:29) Fully automating R&amp;amp;D requires a very broad set of abilities&lt;/p&gt;&lt;p&gt;(19:11) AI takeoff will likely be diffuse and salient&lt;/p&gt;&lt;p&gt;(21:44) Key takeaways&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 21st, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/most-ai-value-will-come-from-broad-automation-not-from-r-d/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/most-ai-value-will-come-from-broad-automation-not-from-r-d/figure-1.png" alt="A stacked bar chart showing US GDP growth and labor productivity contributions from 1988-2019." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/most-ai-value-will-come-from-broad-automation-not-from-r-d/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/most-ai-value-will-come-from-broad-automation-not-from-r-d/figure-2.png" alt="A bar graph showing task requirements to perform tasks for 12 common R&amp;amp;D professions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 21 Mar 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">335b8b74-4704-4722-beac-3aaff824f13d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/335b8b74-4704-4722-beac-3aaff824f13d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Matthew%2520Barnett&amp;title=%22Most%20AI%20value%20will%20come%20from%20broad%20automation%2C%20not%20from%20R%26D%22%20by%20Ege%20Erdil%2C%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fmost-ai-value-will-come-from-broad-automation-not-from-r-d&amp;created_at=2026-05-18T14%3A41%3A22.662871%2B00%3A00&amp;duration=1469" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d</link>
      <itunes:duration>1469</itunes:duration>
    </item>
    <item>
      <title>“Train once, deploy many: AI and increasing returns” by Ege Erdil, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: AI's “train-once-deploy-many” advantage yields increasing returns: doubling compute more than doubles output by increasing models' inference efficiency and enabling more deployed inference instances.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; A significant advantage of AI models over human intelligence is the ability to train a model once and then serve arbitrarily many copies of it for inference. This ‘train-once-deploy-many’ property means we can justify spending far more resources to train a single AI model than we could ever spend training a single human (something that AI labs have recently started doing). For example, it's common for frontier models to be trained on tens of thousands of GPUs, yet each instance during inference requires only a few dozen GPUs.&lt;/p&gt;&lt;p&gt; This difference suggests that AI systems exhibit increasing returns to scale when we add more compute for training and inference. If we set aside price effects for the moment, then with twice the compute, we can double economic output simply by running twice as many copies of the models we’re using for inference. In addition, we can use the same extra compute to train models using twice the training compute, and we expect these larger models to be more efficient at converting inference compute into [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Introduction&lt;/p&gt;&lt;p&gt;(03:59) The basic argument&lt;/p&gt;&lt;p&gt;(06:02) Doesn't this assume an infinite span for the tradeoff?&lt;/p&gt;&lt;p&gt;(07:35) What does this imply about an AI-only economy?&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 7th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/train-once-deploy-many-ai-and-increasing-returns?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/train-once-deploy-many-ai-and-increasing-returns&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/compute-stock-chart.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/compute-stock-chart.png" alt="Compute stock over time" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 07 Mar 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">33011ebe-ccb7-4431-806d-8958f5463a63</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/33011ebe-ccb7-4431-806d-8958f5463a63.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Tamay%2520Besiroglu&amp;title=%22Train%20once%2C%20deploy%20many%3A%20AI%20and%20increasing%20returns%22%20by%20Ege%20Erdil%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrain-once-deploy-many-ai-and-increasing-returns&amp;created_at=2026-05-18T17%3A09%3A47.774752%2B00%3A00&amp;duration=589" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/train-once-deploy-many-ai-and-increasing-returns</link>
      <itunes:duration>589</itunes:duration>
    </item>
    <item>
      <title>“What AI can currently do is not the story” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: Forecasting AI progress requires more than extrapolating current capabilities; understanding fundamental task difficulty is key to predicting future breakthroughs.&lt;/p&gt;  &lt;p&gt; When trying to forecast future capabilities of AI systems and the economic and social impacts these capabilities will have, there are two different common methods that people use:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; 
&lt;p&gt; Look at past AI capabilities along with how fast they’ve changed and try to extrapolate that knowledge to the future.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt; 
&lt;p&gt; Use first principles reasoning based on the capabilities and resource use of the human brain, the availability of training data across different domains, how expensive it is to get reward signals on different tasks, etc. to estimate the difficulty of automating tasks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt; There are further details about how each method may be used in practice, but they represent two fundamentally different ways of forecasting AI capabilities. The first method is often preferred by economists: for instance, Robin Hanson used a variant of it in 2012 by asking AI experts how much progress towards human-level capabilities we had made over the past 20 years and extrapolated their answer to reach human-level AI timelines of a century or longer.&lt;/p&gt;
&lt;p&gt; People following this [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(05:27) The perils of extrapolation&lt;/p&gt;&lt;p&gt;(10:34) What is the alternative?&lt;/p&gt;&lt;p&gt;(15:07) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 7th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/what-ai-can-currently-do-is-not-the-story?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/what-ai-can-currently-do-is-not-the-story&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-ai-can-currently-do-is-not-the-story/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-ai-can-currently-do-is-not-the-story/figure-1.png" alt="Figure 1: A table from a 2016 Harvard Business Review article by Andrew Ng about what AI systems of the time could and couldn’t do." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-ai-can-currently-do-is-not-the-story/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-ai-can-currently-do-is-not-the-story/figure-2.png" alt="Graph showing &amp;quot;Reasoning models exceed the historical trend of math performance&amp;quot; with Mock AIME pass@1 accuracy over time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 07 Mar 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f9907414-0bbd-4a2a-a771-51c6694a1549</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f9907414-0bbd-4a2a-a771-51c6694a1549.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22What%20AI%20can%20currently%20do%20is%20not%20the%20story%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhat-ai-can-currently-do-is-not-the-story&amp;created_at=2026-05-18T14%3A41%3A23.663402%2B00%3A00&amp;duration=979" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/what-ai-can-currently-do-is-not-the-story</link>
      <itunes:duration>979</itunes:duration>
    </item>
    <item>
      <title>“The promise of reasoning models” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: AI reasoning models will achieve superhuman performance in math and coding, yet their economic applications will lag behind, limiting real-world impact.&lt;/p&gt;  &lt;p&gt; Perhaps the most significant AI development of the past year has been the rise of reasoning models—LLMs trained via reinforcement learning to solve complex problems, such as OpenAI's o1, DeepSeek-R1, and Claude 3.7 Sonnet. These models have already demonstrated remarkable success, significantly enhancing AI capabilities in mathematical problem-solving, scientific reasoning, and coding.&lt;/p&gt;
&lt;p&gt; In this article, I aim to present a clear conceptual framework for understanding the impacts reasoning models may have on the world. My core thesis is that the primary consequence of reasoning models will be the creation of AIs that are narrowly superhuman at “pure reasoning tasks”—abstract tasks with correct answers that can be cheaply verified. For example, I would guess that in the next three years, AIs will likely be developed that are capable of outperforming top human mathematicians at proving arbitrary mathematical theorems. At the same time, I predict that economically valuable AI capabilities will lag behind, with reliable computer-control agents arriving significantly later than high-quality reasoning models.&lt;/p&gt;
&lt;p&gt; I also address some broader speculation about the downstream implications of reasoning [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:48) A brief primer on reasoning models&lt;/p&gt;&lt;p&gt;(06:51) What I think reasoning models will be able to do&lt;/p&gt;&lt;p&gt;(12:14) What I suspect AI labs will struggle with in the near term&lt;/p&gt;&lt;p&gt;(15:44) Will reasoning models upset the business model of AI labs?&lt;/p&gt;&lt;p&gt;(21:31) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 28th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/the-promise-of-reasoning-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/the-promise-of-reasoning-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 28 Feb 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">856d23f0-2342-4457-b51f-690874127b7e</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/856d23f0-2342-4457-b51f-690874127b7e.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22The%20promise%20of%20reasoning%20models%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fthe-promise-of-reasoning-models&amp;created_at=2026-05-18T14%3A41%3A24.784334%2B00%3A00&amp;duration=1413" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/the-promise-of-reasoning-models</link>
      <itunes:duration>1413</itunes:duration>
    </item>
    <item>
      <title>“AI progress is about to speed up” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: AI progress is accelerating, with next-gen models surpassing GPT-4 in compute power, driving major leaps in reasoning, coding, and math capabilities.&lt;/p&gt;  &lt;p&gt; AI is a field where progress happens remarkably quickly compared to the standards of other industries. Even in the past two years, we’ve seen impressive capability gains and cost reductions over models such as GPT-4 that would have been unprecedented in almost any other domain. However, there's a general sense among observers that progress has been slower than they’ve expected since GPT-4 was released.&lt;/p&gt;
&lt;p&gt; I think this is mostly because compute growth has been slow since then, so we’ve been seeing gains from algorithmic progress and improved data quality rather than large compute scale-ups. The chart below from our article on the compute cost of training frontier models shows this clearly:&lt;/p&gt;

&lt;p&gt; The release of GPT-4 in March 2023 stands out because GPT-4 represented a 10x compute scale-up over the models we had seen before. Since then, we’ve not seen another scale-up of this magnitude: all currently available frontier models, with the exception of Grok 3, have been trained on a compute budget similar to GPT-4 or less. For instance, Dario Amodei, CEO of Anthropic [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:06) What to expect from the new models&lt;/p&gt;&lt;p&gt;(06:15) Programming&lt;/p&gt;&lt;p&gt;(07:20) Math&lt;/p&gt;&lt;p&gt;(08:07) Agents&lt;/p&gt;&lt;p&gt;(10:28) What should we make of Grok 3?&lt;/p&gt;&lt;p&gt;(13:14) Why has spending not grown faster before?&lt;/p&gt;&lt;p&gt;(16:14) Concluding thoughts&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 21st, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/ai-progress-is-about-to-speed-up?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/ai-progress-is-about-to-speed-up&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/ai-progress-is-about-to-speed-up/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/ai-progress-is-about-to-speed-up/figure-1.png" alt="Scatter plot showing training compute versus publication date for 179 large-scale AI models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 21 Feb 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">268af41d-cb9e-4be7-8294-f22a8b0aa5aa</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/268af41d-cb9e-4be7-8294-f22a8b0aa5aa.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22AI%20progress%20is%20about%20to%20speed%20up%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fai-progress-is-about-to-speed-up&amp;created_at=2026-05-18T14%3A41%3A25.526144%2B00%3A00&amp;duration=1099" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/ai-progress-is-about-to-speed-up</link>
      <itunes:duration>1099</itunes:duration>
    </item>
    <item>
      <title>“Algorithmic progress likely spurs more spending on compute, not less” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: Algorithmic progress in AI may not reduce compute spending—instead, it could drive higher investment as efficiency unlocks new opportunities.&lt;/p&gt;  &lt;p&gt; In recent weeks, there has been widespread speculation about the economic implications of algorithmic progress—improvements to machine learning methods that allow us to develop and deploy models using fewer resources. Many have suggested that algorithmic progress, as observed in DeepSeek's training of V3 and R1, will reduce demand for high-performance GPUs going forward. Their argument is that it enables AI labs to “do more with less”, and build AI products without needing as much compute.&lt;/p&gt;
&lt;p&gt; Here, I argue the opposite: rather than decreasing overall spending, algorithmic progress is likely to increase AI compute spending, both in inflation-adjusted terms and in terms of the share of GDP spent on compute. This is particularly true for forms of algorithmic progress that allow AI labs to improve frontier performance at the same time they can increase computational efficiency, which is largely true for DeepSeek's recent innovations.&lt;/p&gt;
&lt;p&gt; I approach this argument from an empirical perspective. Most directly, data from machine learning trends indicates that algorithmic progress has coincided with a rapidly increasing rate of investment in computers used for training [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:57) Introduction&lt;/p&gt;&lt;p&gt;(08:00) The empirical data&lt;/p&gt;&lt;p&gt;(15:57) The case for AI as an unusual computing product&lt;/p&gt;&lt;p&gt;(21:30) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 14th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-1.png" alt="Graph showing pretraining algorithmic progress estimates across software domains with compute doubling times." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-2.png" alt="Line graph titled “Fraction of US GDP spent on computers and peripheral equipment” showing data from 1959 to 2024." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-3.png" alt="Line graph showing share of US households using cellular phones from 1994 to 2019." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less/figure-4.png" alt="Graph showing effective compute growth from algorithmic progress versus compute scaling, relative to 2014." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 14 Feb 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">8548795d-668e-4d41-9a92-3afa4096aa7f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/8548795d-668e-4d41-9a92-3afa4096aa7f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22Algorithmic%20progress%20likely%20spurs%20more%20spending%20on%20compute%2C%20not%20less%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Falgorithmic-progress-likely-spurs-more-spending-on-compute-not-less&amp;created_at=2026-05-18T14%3A41%3A26.566255%2B00%3A00&amp;duration=1402" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/algorithmic-progress-likely-spurs-more-spending-on-compute-not-less</link>
      <itunes:duration>1402</itunes:duration>
    </item>
    <item>
      <title>“A more systematic and transparent AI benchmarking hub” by Tom Adamczewski</title>
      <description>&lt;p&gt; Subtitle: We've overhauled our AI benchmarking infrastructure to provide more transparent, systematic, and up-to-date evaluations of AI model capabilities.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Back in November, we released the Epoch AI Benchmarking Hub. This platform hosts the results of evaluations of notable AI models conducted by Epoch AI, including visualizations and additional analysis. Its goal is to shed light on what today's AI systems are capable of—and where they are headed. Data on the platform is publicly accessible under a permissive license, allowing other members of the research community to use it for their own analyses.&lt;/p&gt;&lt;p&gt; Today, we are publishing a major update of the Benchmarking Hub. Our visualizations still look very similar, but we have completely overhauled the process by which we run AI benchmarks and share results. We’ve made significant engineering investments in our infrastructure that allow us to be more transparent, systematic, and up-to-date.&lt;/p&gt;&lt;p&gt; The most noticeable changes for you as a user are:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; You have access to richer data about each evaluation and the model being evaluated&lt;/li&gt;
&lt;li&gt; The database will be much more frequently updated&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt; Key features of the AI Benchmarking Hub&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Our database fills a gap in the publicly available data about [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:23) Introduction&lt;/p&gt;&lt;p&gt;(01:26) Key features of the AI Benchmarking Hub&lt;/p&gt;&lt;p&gt;(03:38) The Epoch AI client library&lt;/p&gt;&lt;p&gt;(04:26) Evaluation platform&lt;/p&gt;&lt;p&gt;(04:54) Next steps&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 7th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/benchmarking-hub-update?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/benchmarking-hub-update&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/benchmarking-hub-update/SCR-20250206-qzws.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/benchmarking-hub-update/SCR-20250206-qzws.png" alt="The Inspect log viewer showing a sample from the evaluation of DeepSeek R1 on GPQA Diamond. To mitigate the risk of accidental leakage into LLM training corpora, bots are prevented from accessing the log viewer, so you will need to solve a CAPTCHA to access it." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/benchmarking-hub-update/SCR-20250206-qrkf.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/benchmarking-hub-update/SCR-20250206-qrkf.png" alt="Output from the library’s example script." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 07 Feb 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4c970def-9925-488e-84e5-ad4ca9b69e45</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4c970def-9925-488e-84e5-ad4ca9b69e45.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tom%2520Adamczewski&amp;title=%22A%20more%20systematic%20and%20transparent%20AI%20benchmarking%20hub%22%20by%20Tom%20Adamczewski&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fbenchmarking-hub-update&amp;created_at=2026-05-18T17%3A09%3A49.110285%2B00%3A00&amp;duration=324" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/benchmarking-hub-update</link>
      <itunes:duration>324</itunes:duration>
    </item>
    <item>
      <title>“How much energy does ChatGPT use?” by Josh You</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue explores how much energy ChatGPT uses per query, revealing it's 10x less than common estimates.&lt;/p&gt;  &lt;p&gt; Credit to Alex Erben and Ege Erdil for substantial help with research and calculations. In this issue, “we” refers to our collective judgment.&lt;/p&gt;
&lt;p&gt; A commonly-cited claim is that powering an individual ChatGPT query requires around 3 watt-hours of electricity, or 10 times as much as a Google search. This is often brought up to express concern over AI's impact on the environment, climate change, or the electric grid.&lt;/p&gt;
&lt;p&gt; However, we believe that this figure of 3 watt-hours per query is likely an overestimate. In this issue, we revisit this question using a similar methodology, but using up-to-date facts and clearer assumptions. We find that typical ChatGPT queries using GPT-4o likely consume roughly 0.3 watt-hours, which is ten times less than the older estimate. This difference comes from more efficient models and hardware compared to early 2023, and an overly pessimistic estimate of token counts in the original estimate.&lt;/p&gt;
&lt;p&gt; For context, 0.3 watt-hours is less than the amount of electricity that an LED lightbulb or a laptop consumes in a few minutes. And even for a [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:36) Estimating the energy cost of a query&lt;/p&gt;&lt;p&gt;(08:33) Why is our estimate different from others?&lt;/p&gt;&lt;p&gt;(10:33) What about other models besides GPT-4o?&lt;/p&gt;&lt;p&gt;(14:55) Training and other upstream costs&lt;/p&gt;&lt;p&gt;(18:20) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 23 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 7th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-much-energy-does-chatgpt-use/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-much-energy-does-chatgpt-use/figure-1.png" alt="Bar chart showing energy consumption per ChatGPT query compared to household electricity use." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 07 Feb 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1b23142d-8dc0-44ab-aa7a-736540049b42</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1b23142d-8dc0-44ab-aa7a-736540049b42.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Josh%2520You&amp;title=%22How%20much%20energy%20does%20ChatGPT%20use%3F%22%20by%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-much-energy-does-chatgpt-use&amp;created_at=2026-05-18T14%3A41%3A27.520206%2B00%3A00&amp;duration=1255" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use</link>
      <itunes:duration>1255</itunes:duration>
    </item>
    <item>
      <title>“What went into training DeepSeek-R1?” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue explores DeepSeek-R1's architecture, training cost, and pricing, showing how it rivals OpenAI's o1 at 30x lower cost.&lt;/p&gt;  &lt;p&gt; On January 20th, 2025, DeepSeek released their latest open-weights reasoning model, DeepSeek-R1, which is on par with OpenAI's o1 in benchmark performance. The release has generated a significant amount of controversy, most notably about the possibility that DeepSeek might have underreported or misrepresented the training cost of their model. I find this claim implausible for reasons that I will explore in this issue.&lt;/p&gt;
&lt;p&gt; Aside from the point about the model's training cost, I also want to clarify what we actually know about the model's architecture, training process, performance, and pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Architecture&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; DeepSeek R1's architecture is identical to DeepSeek v3, an earlier model that the company released in December 2024. I covered the key architectural details of this model in a Gradient Updates issue from two weeks ago, so I will only provide a brief high-level summary here.&lt;/p&gt;&lt;p&gt; Overall, the model is a very sparse mixture-of-experts, with 671 billion total parameters but only 37 billion active per token. The experts are divided into two classes: one “shared expert” which every token is always routed to [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:01) Architecture&lt;/p&gt;&lt;p&gt;(04:13) Training&lt;/p&gt;&lt;p&gt;(04:27) Pre-training&lt;/p&gt;&lt;p&gt;(10:20) RL training for R1-Zero&lt;/p&gt;&lt;p&gt;(17:20) Subsequent training for R1&lt;/p&gt;&lt;p&gt;(19:50) Performance and pricing&lt;/p&gt;&lt;p&gt;(21:29) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 31st, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-went-into-training-deepseek-r1/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-went-into-training-deepseek-r1/figure-1.png" alt="Figure 1: Results of the ablation experiments for attention mechanisms from the DeepSeek v2 paper." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/what-went-into-training-deepseek-r1/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/what-went-into-training-deepseek-r1/figure-2.png" alt="Figure 2: Table comparing DeepSeek-R1 and other representative models. From the DeepSeek R1 technical report." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 31 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">66eed446-55d1-425a-a685-82fc35a9e912</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/66eed446-55d1-425a-a685-82fc35a9e912.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22What%20went%20into%20training%20DeepSeek-R1%3F%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fwhat-went-into-training-deepseek-r1&amp;created_at=2026-05-18T14%3A57%3A35.008011%2B00%3A00&amp;duration=1402" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1</link>
      <itunes:duration>1402</itunes:duration>
    </item>
    <item>
      <title>“Announcing our expanded biology AI coverage” by Pablo Villalobos, David Atanasov</title>
      <description>&lt;p&gt; Subtitle: We've expanded our Biology AI Dataset, now covering 360+ models. Our analysis reveals rapid scaling from 2017-2021, followed by a notable slowdown in biological model development.&lt;/p&gt;  &lt;p&gt; We’re pleased to announce an expansion of our Biological Model Dataset, a component of Epoch AI's larger database of machine learning models. As the role of AI in biology continues to grow—powering advances in drug design, protein engineering, and genomics—the opportunities and governance challenges posed by biological AI models increase the importance of tracking advances in this field.&lt;/p&gt;
&lt;p&gt; Our goal with this project is to provide a comprehensive resource for researchers and policymakers. To this end, we have curated information from over 360 models in this update, prioritizing recent models at the frontier of capability, scale, or scientific impact. Alongside details on their developers, intended tasks, and training datasets, we’ve included new estimates of the training compute that went into developing them.&lt;/p&gt;
&lt;p&gt; There's a chart here. The chart title reads: Training compute of biological models &lt;/p&gt;
&lt;p&gt; There's a chart here. The chart title reads: Training dataset size of biological models &lt;/p&gt;
&lt;p&gt; Analyzing compute and data trends can help us understand how invested the field is [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 29th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-expanded-biology-ai-coverage?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-expanded-biology-ai-coverage&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/biology-ai-models-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/biology-ai-models-compute.png" alt="Training compute of biological models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/biology-ai-models-data.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/biology-ai-models-data.png" alt="Training dataset size of biological models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">10612900-6aa2-4e98-a508-cfe7077dd1b5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/10612900-6aa2-4e98-a508-cfe7077dd1b5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520David%2520Atanasov&amp;title=%22Announcing%20our%20expanded%20biology%20AI%20coverage%22%20by%20Pablo%20Villalobos%2C%20David%20Atanasov&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-expanded-biology-ai-coverage&amp;created_at=2026-05-18T17%3A09%3A50.363082%2B00%3A00&amp;duration=217" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-expanded-biology-ai-coverage</link>
      <itunes:duration>217</itunes:duration>
    </item>
    <item>
      <title>“AGI could drive wages below subsistence level” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue explores how AGI could disrupt labor markets, potentially driving wages below subsistence levels, and challenge historical economic trends.&lt;/p&gt;  &lt;p&gt; Historically, many have feared that automation would lead to mass unemployment and lower wages. Yet, despite massive improvements in automation in the last two centuries, average wages have risen, living standards have improved, and high unemployment has not become a persistent, long-term issue as many had expected. This historical pattern has led most economists to adopt the following optimistic view: automation typically creates at least as many opportunities as it destroys, and its overall impacts on wages are positive.&lt;/p&gt;
&lt;p&gt; But artificial general intelligence (AGI)—defined here as a technology that can functionally substitute for human workers in all labor tasks—may defy these historical precedents. Unlike past technologies, which typically automated specific tasks within industries, AGI has the potential to replace human labor across the entire spectrum of work, including physical tasks, and any new tasks that could be created in the future. Because of this, AGI might disrupt labor markets in an unprecedented way.&lt;/p&gt;
&lt;p&gt; In fact, there is a straightforward case for why developing AGI could drive human wages below subsistence level—the bare minimum [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:39) Why wages will plausibly fall after AGI&lt;/p&gt;&lt;p&gt;(09:37) Returns to scale&lt;/p&gt;&lt;p&gt;(14:34) Reasons for temporary optimism&lt;/p&gt;&lt;p&gt;(16:34) Long-run pessimism about wages&lt;/p&gt;&lt;p&gt;(22:37) Comparative advantage doesn't save us&lt;/p&gt;&lt;p&gt;(25:46) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 24th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/agi-could-drive-wages-below-subsistence-level?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/agi-could-drive-wages-below-subsistence-level&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/agi-could-drive-wages-below-subsistence-level/labor_output_wages_margin.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/agi-could-drive-wages-below-subsistence-level/labor_output_wages_margin.png" alt="Two graphs showing labor supply effects on output and wages in Cobb-Douglas production function." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 24 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">e41c0d21-f64a-4be7-b056-6f80c562621f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/e41c0d21-f64a-4be7-b056-6f80c562621f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22AGI%20could%20drive%20wages%20below%20subsistence%20level%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fagi-could-drive-wages-below-subsistence-level&amp;created_at=2026-05-18T14%3A57%3A13.520963%2B00%3A00&amp;duration=1698" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/agi-could-drive-wages-below-subsistence-level</link>
      <itunes:duration>1698</itunes:duration>
    </item>
    <item>
      <title>“Clarifying the creation and use of the FrontierMath benchmark” by Tamay Besiroglu, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: We clarify that OpenAI commissioned Epoch AI to produce 300 math questions for the FrontierMath benchmark. They own these and have access to the statements and solutions, except for a 50-question holdout set.&lt;/p&gt;  &lt;p&gt; FrontierMath is a benchmark we created to evaluate the mathematical capabilities of frontier AI models. We saw a need for high-quality, challenging mathematical problems that could meaningfully test the limits of these systems. This remains our core mission—to help the AI community and the public at large accurately understand and measure AI capabilities.&lt;/p&gt;
&lt;p&gt; Building high-quality evaluations at this scale requires substantial resources. After approaching several potential funders, we partnered with OpenAI, who provided both the necessary funding and technical expertise to develop the benchmark. Working with industry sponsors helps make the benchmark more impactful for the AI field.&lt;/p&gt;
&lt;p&gt; However, we recognize we have not communicated clearly enough about the relationship between FrontierMath and OpenAI, leading to questions and concerns among contributors, researchers, and the public. To address these issues, here are the facts:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; 
&lt;p&gt; OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned [...]&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 23rd, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/openai-and-frontiermath?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/openai-and-frontiermath&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 23 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1975d48e-189d-4ea8-aa98-c55c81f4d047</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1975d48e-189d-4ea8-aa98-c55c81f4d047.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Jaime%2520Sevilla&amp;title=%22Clarifying%20the%20creation%20and%20use%20of%20the%20FrontierMath%20benchmark%22%20by%20Tamay%20Besiroglu%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fopenai-and-frontiermath&amp;created_at=2026-05-18T17%3A09%3A51.094623%2B00%3A00&amp;duration=241" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/openai-and-frontiermath</link>
      <itunes:duration>241</itunes:duration>
    </item>
    <item>
      <title>“Epoch AI 2024 impact report” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: In 2024, Epoch published influential research, launched FrontierMath, expanded its AI data hub, engaged with policy and industry leaders, raised 7 million dollars, and more.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; 2024 Impact Report&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; 2024 has proven yet another impactful year for AI. The release of OpenAI's o1 established inference-time scaling as a crucial driver of progress, later culminating in the announcement of OpenAI's o3, for which OpenAI claims large advances in math, reasoning and coding benchmarks. We also saw many other major LLM releases, including highly capable models from OpenAI competitors, such as Google's Gemini 1.5 and 2.0 and Anthropic's Claude 3.5, a proliferation of GPT-4-scale models such as Mistral Large (2), GLM-4, Doubao Pro, Nemotron-4 340B and Grok 2, and Llama 3.1 405B and DeepSeek v3, the first downloadable-weight models comparable to GPT-4 in performance. Lastly, we have seen large advances in video generation, including the release of models such as Sora, ImageGen and Veo 2, as well as some early results in computer interaction through GPT-4o and Claude 3.5 Sonnet.&lt;/p&gt;&lt;p&gt; Amidst all these developments, Epoch AI's mission of informing the public about ongoing developments remains as critical as ever. Throughout the year, we have updated our data page with [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:25) 2024 Impact Report&lt;/p&gt;&lt;p&gt;(02:52) Highlights from 2024&lt;/p&gt;&lt;p&gt;(02:59) FrontierMath&lt;/p&gt;&lt;p&gt;(04:04) Can AI Scaling Continue Through 2030?&lt;/p&gt;&lt;p&gt;(05:13) Epoch AI's Data Hub&lt;/p&gt;&lt;p&gt;(06:38) Press and citations&lt;/p&gt;&lt;p&gt;(08:40) What people are saying about Epoch AI&lt;/p&gt;&lt;p&gt;(10:08) 2024 in numbers&lt;/p&gt;&lt;p&gt;(10:11) Outputs&lt;/p&gt;&lt;p&gt;(10:23) Data collection&lt;/p&gt;&lt;p&gt;(10:47) Engagement&lt;/p&gt;&lt;p&gt;(11:16) Social media&lt;/p&gt;&lt;p&gt;(11:48) Company&lt;/p&gt;&lt;p&gt;(12:09) Our plans for 2025&lt;/p&gt;&lt;p&gt;(12:17) Curating Data on AI&lt;/p&gt;&lt;p&gt;(12:56) Measuring AI capabilities&lt;/p&gt;&lt;p&gt;(13:53) Modeling the impact of AI&lt;/p&gt;&lt;p&gt;(15:08) Communications&lt;/p&gt;&lt;p&gt;(15:29) Hiring&lt;/p&gt;&lt;p&gt;(15:57) Feedback&lt;/p&gt;&lt;p&gt;(16:16) Partnerships&lt;/p&gt;&lt;p&gt;(16:45) Support our work&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/epoch-impact-report-2024?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/epoch-impact-report-2024&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/frontiermath.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/frontiermath.png" alt="FrontierMath" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/can-ai-scaling.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/can-ai-scaling.png" alt="FrontierMath" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/datahub.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2025/epoch-impact-report-2024/datahub.png" alt="FrontierMath" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/nature.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/nature.png" alt="NATURE logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/time.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/time.png" alt="TIME logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/science.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/science.png" alt="SCIENCE logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/nyt.png" target="_blank"&gt;&lt;img 
src="https://epoch.ai/assets/images/logos/nyt.png" alt="THE NEW YORK TIMES logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/uscommerce.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/uscommerce.png" alt="US DEPARTMENT OF COMMERCE logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/ukscience.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/ukscience.png" alt="UK DEPARTMENT OF SCIENCE, INNOVATION AND TECHNOLOGY logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/microsoft.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/microsoft.png" alt="MICROSOFT, SATYA NADELLA logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/logos/leopold.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/logos/leopold.png" alt="LEOPOLD ASCHENBRENNER logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 17 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">2f01d705-a1d2-420f-83d4-ef7f239db25a</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/2f01d705-a1d2-420f-83d4-ef7f239db25a.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Epoch%20AI%202024%20impact%20report%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fepoch-impact-report-2024&amp;created_at=2026-05-18T17%3A09%3A52.964839%2B00%3A00&amp;duration=1124" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/epoch-impact-report-2024</link>
      <itunes:duration>1124</itunes:duration>
    </item>
    <item>
      <title>“How has DeepSeek improved the Transformer architecture?” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue goes over the major changes that went into DeepSeek's most recent model.&lt;/p&gt;  &lt;p&gt; DeepSeek has recently released DeepSeek v3, which is currently state-of-the-art in benchmark performance among open-weight models, alongside a technical report describing in some detail the training of the model. Impressively, they’ve achieved this SOTA performance by only using 2.8 million H800 hours of training hardware time—equivalent to about 4e24 FLOP if we assume 40% MFU. This is about ten times less training compute than the similarly performing Llama 3.1 405B.&lt;/p&gt;
&lt;p&gt; In this issue, I’ll cover some of the important architectural improvements that DeepSeek highlight in their report and why we should expect them to result in better performance compared to a vanilla Transformer. The full technical report contains plenty of non-architectural details as well, and I strongly recommend reading it if you want to get a better idea of the engineering problems that have to be solved when orchestrating a moderate-sized training run.&lt;/p&gt;
&lt;p&gt; Figure 1: The DeepSeek v3 architecture with its two most important improvements: DeepSeekMoE and multi-head latent attention (MLA). Multi-token prediction is not shown. From the DeepSeek v3 technical report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Multi-head latent attention (MLA)&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Multi-head latent [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:38) Multi-head latent attention (MLA)&lt;/p&gt;&lt;p&gt;(02:12) What is the KV cache and why does it matter?&lt;/p&gt;&lt;p&gt;(05:40) Beating grouped-query attention&lt;/p&gt;&lt;p&gt;(09:01) Mixture-of-experts innovations&lt;/p&gt;&lt;p&gt;(11:46) Auxiliary-loss-free load balancing&lt;/p&gt;&lt;p&gt;(12:57) Shared experts&lt;/p&gt;&lt;p&gt;(15:16) Multi-token prediction&lt;/p&gt;&lt;p&gt;(17:46) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 17th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-1.png" alt="Figure 1: The DeepSeek v3 architecture with its two most important improvements: DeepSeekMoE and multi-head latent attention (MLA). Multi-token prediction is not shown. From the DeepSeek v3 technical report." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-2.png" alt="Figure 2: An illustration of multi-head latent attention from the DeepSeek v2 technical report." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/how-has-deepseek-improved-the-transformer-architecture/figure-3.png" alt="Figure 3: An illustration of DeepSeek v3’s multi-token prediction setup taken from its technical report." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 17 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">221b8fc5-c8e3-4621-a157-364065eb704e</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/221b8fc5-c8e3-4621-a157-364065eb704e.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22How%20has%20DeepSeek%20improved%20the%20Transformer%20architecture%3F%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fhow-has-deepseek-improved-the-transformer-architecture&amp;created_at=2026-05-18T14%3A41%3A30.491993%2B00%3A00&amp;duration=1159" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture</link>
      <itunes:duration>1159</itunes:duration>
    </item>
    <item>
      <title>“FrontierMath competition: Setting benchmarks for AI evaluation” by Tamay Besiroglu, Elliot Glazer, Caroline Falkman Olsson</title>
      <description>&lt;p&gt; Subtitle: We are hosting a competition to establish rigorous human performance baselines for FrontierMath. With a prize pool of $10,000, your participation will contribute directly to measuring AI progress in solving challenging mathematical problems.&lt;/p&gt;  &lt;p&gt; We’re launching a competition to establish rigorous human performance baselines for FrontierMath, our benchmark for evaluating AI mathematical capabilities. The results will provide crucial reference points for measuring AI progress in tackling very difficult mathematics problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Competition Overview&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Format: 4.5 hours solving novel mathematics problems alongside leading mathematicians&lt;/li&gt;
&lt;li&gt; Guaranteed payment: $150 per participant&lt;/li&gt;
&lt;li&gt; Prize pool: $10,000 distributed across top performers&lt;/li&gt;
&lt;li&gt; Recognition: Participants acknowledged in FrontierMath baseline publication&lt;/li&gt;
&lt;li&gt; Date: March 30, 2025&lt;/li&gt;
&lt;li&gt; Time: Full event: 11 AM-7 PM. Competition: 12:30 PM-5 PM.&lt;/li&gt;
&lt;li&gt; Location: Cambridge, MA&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt; Sign Up&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We’re now encouraging interested mathematicians to sign up for the event. Please express your interest in competing below.
Note that signing up does not guarantee participation, as we may need to select among applicants.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Why Participate&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; This competition offers a unique opportunity to contribute to AI progress measurement while competing for substantial prizes. Results will directly inform our understanding of AI capabilities by providing [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:51) Competition Overview&lt;/p&gt;&lt;p&gt;(01:28) Sign Up&lt;/p&gt;&lt;p&gt;(01:48) Why Participate&lt;/p&gt;&lt;p&gt;(02:06) Contact us&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 16th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/frontiermath-competition-setting-benchmarks-for-ai-evaluation?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/frontiermath-competition-setting-benchmarks-for-ai-evaluation&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 16 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">c4c3419c-d7dc-4445-88aa-e5b3aa4b2e0e</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/c4c3419c-d7dc-4445-88aa-e5b3aa4b2e0e.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Elliot%2520Glazer%252C%2520Caroline%2520Falkman%2520Olsson&amp;title=%22FrontierMath%20competition%3A%20Setting%20benchmarks%20for%20AI%20evaluation%22%20by%20Tamay%20Besiroglu%2C%20Elliot%20Glazer%2C%20Caroline%20Falkman%20Olsson&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ffrontiermath-competition-setting-benchmarks-for-ai-evaluation&amp;created_at=2026-05-18T17%3A12%3A08.390785%2B00%3A00&amp;duration=140" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/frontiermath-competition-setting-benchmarks-for-ai-evaluation</link>
      <itunes:duration>140</itunes:duration>
    </item>
    <item>
      <title>“The economic consequences of automating remote work” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue investigates the economic consequences of fully automating remote work.&lt;/p&gt;  &lt;p&gt; Recent AI progress has shown great promise in automating cognitive tasks, like those in natural language processing and vision. By contrast, progress in general-purpose robotics has lagged. While we already have access to intelligent virtual assistants at our fingertips, robots capable of fully cleaning a standard suburban home still appear years away.&lt;/p&gt;
&lt;p&gt; Given the relatively slow pace of progress in robotics, a large share of knowledge work might be automated before physical jobs are overtaken by AIs. This naturally prompts the question: what might be the economic impact if only remote work—defined as work that can be performed entirely from home using digital tools, a computer, and an internet connection—were to be automated?&lt;/p&gt;
&lt;p&gt; To investigate this question, I conduct a three-part analysis. First, I use GPT-4o to classify the tasks involved in US occupations according to the O*NET database, finding that around 34% of job tasks can be performed remotely. This contrasts with previous findings from a study by Dingel &amp;amp; Neiman, which found that 37% of US occupations—not tasks—can be performed entirely remotely.&lt;/p&gt;
&lt;p&gt; Second, I use data from the transition [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:34) What fraction of present economic work can be done remotely?&lt;/p&gt;&lt;p&gt;(06:49) How much would the economy grow if remote work were automated?&lt;/p&gt;&lt;p&gt;(07:20) A brief primer on the elasticity of substitution&lt;/p&gt;&lt;p&gt;(10:01) The production function&lt;/p&gt;&lt;p&gt;(12:50) Using the pandemic to estimate the elasticity of substitution&lt;/p&gt;&lt;p&gt;(18:32) Estimating the elasticity of substitution via proxy&lt;/p&gt;&lt;p&gt;(21:02) Putting it all together&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 10th, 2025 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/consequences-of-automating-remote-work?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/consequences-of-automating-remote-work&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/top_20_wage_bill_professions.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/top_20_wage_bill_professions.png" alt="Bar chart titled 'Share of tasks suitable for remote work among main US jobs' showing percentages across 20 professions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/yearly_data_plot.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/yearly_data_plot.png" alt="Two line graphs showing labor force and Real GDP trends from 2008 to 2022." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/pessimistic_optimistic_scenarios_v2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2025/consequences-of-automating-remote-work/pessimistic_optimistic_scenarios_v2.png" alt="Graph showing real GDP multiplier versus factor increase in remote workers under different reallocation scenarios." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 10 Jan 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">29e8bcc1-47a0-4bdc-81b2-f05b34a5e3d7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/29e8bcc1-47a0-4bdc-81b2-f05b34a5e3d7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22The%20economic%20consequences%20of%20automating%20remote%20work%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fconsequences-of-automating-remote-work&amp;created_at=2026-05-18T15%3A06%3A21.908541%2B00%3A00&amp;duration=1544" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/consequences-of-automating-remote-work</link>
      <itunes:duration>1544</itunes:duration>
    </item>
    <item>
      <title>“Moravec’s paradox and its implications” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue explains Moravec's paradox and offers a speculative picture of how hard various economic tasks are to automate based on the paradox.&lt;/p&gt;  &lt;p&gt; Since the birth of the field of artificial intelligence in the 20th century, researchers have observed that the difficulty of a task for humans at best weakly correlates with its difficulty for AI systems. For example, humans find it difficult to multiply ten-digit numbers in their heads but easy to draw boxes around each individual cat in a photograph. In contrast, for AI systems the difficulty is reversed: they could do the former task in the 1950s, and it took until the 2010s for segmentation algorithms to match human performance on the latter task.&lt;/p&gt;
&lt;p&gt; The specific observation that it's easy to build AI systems that perform formal reasoning tasks but difficult to build AI systems whose perception and motor skills are comparable to a human is called Moravec's paradox. Moravec himself offered an evolutionary explanation for the paradox: we should expect cognitive skills that have been around for longer to be more difficult to reproduce in AI systems, because evolution is likely to have applied significantly more optimization pressure on older [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:05) How does the brain work?&lt;/p&gt;&lt;p&gt;(05:05) What explains performance differences between AIs and the human brain?&lt;/p&gt;&lt;p&gt;(10:11) Which tasks will be automated next according to this picture?&lt;/p&gt;&lt;p&gt;(13:24) Concluding thoughts&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 27th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/moravec-s-paradox?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/moravec-s-paradox&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 27 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">b46f05a3-a5f5-429b-ab75-6cca69213de4</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/b46f05a3-a5f5-429b-ab75-6cca69213de4.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Moravec%E2%80%99s%20paradox%20and%20its%20implications%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fmoravec-s-paradox&amp;created_at=2026-05-18T15%3A06%3A25.100847%2B00%3A00&amp;duration=911" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/moravec-s-paradox</link>
      <itunes:duration>911</itunes:duration>
    </item>
    <item>
      <title>“How do mixture-of-experts models compare to dense models in inference?” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: This Gradient Updates issue explores how mixture-of-experts models compare to dense models in inference, focusing on costs, efficiency, and decoding dynamics.&lt;/p&gt;  &lt;p&gt; In last week's Gradient Updates issue, I discussed how we can guess that GPT-4o and Claude 3.5 Sonnet have significantly fewer parameters than GPT-4. The most common question I’ve received from readers was some version of the following:&lt;/p&gt;

&lt;p&gt; Don’t active parameters matter more for inference economics than total parameters? If so, how can you infer the total number of parameters of a model by only looking at its inference cost and speed?&lt;/p&gt;

&lt;p&gt; In response to this, I’ve decided to make this issue specifically about the question of inference with mixture-of-experts (MoE) models. The basic takeaway is that MoEs are more efficient at inference than dense models of the same total parameter count, but less efficient than dense models with the same active parameter count. A rough rule of thumb is that an 8-way sparse model has the same short-context decoding economics as a dense model half its size.&lt;/p&gt;
&lt;p&gt; For the sake of brevity, I’ll assume familiarity with basic concepts in Transformer inference, so I expect this issue to be confusing to readers without [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:45) Advantages of MoE models&lt;/p&gt;&lt;p&gt;(03:00) MoEs have fewer active parameters than dense models&lt;/p&gt;&lt;p&gt;(06:39) MoEs are shallower and wider than dense models&lt;/p&gt;&lt;p&gt;(08:14) At fixed model depth, MoE models need less network communication than dense models&lt;/p&gt;&lt;p&gt;(10:00) MoEs have smaller attention blocks than dense models&lt;/p&gt;&lt;p&gt;(11:53) Estimating the MoE inference edge&lt;/p&gt;&lt;p&gt;(15:09) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 20th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/moe-vs-dense-models-inference?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/moe-vs-dense-models-inference&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 20 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">41596d32-ccc2-4a73-8d9e-cf19b7b9aa4c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/41596d32-ccc2-4a73-8d9e-cf19b7b9aa4c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22How%20do%20mixture-of-experts%20models%20compare%20to%20dense%20models%20in%20inference%3F%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fmoe-vs-dense-models-inference&amp;created_at=2026-05-18T15%3A24%3A59.041706%2B00%3A00&amp;duration=979" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/moe-vs-dense-models-inference</link>
      <itunes:duration>979</itunes:duration>
    </item>
    <item>
      <title>“Announcing Gradient Updates: Our new weekly newsletter” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: We are announcing Gradient Updates, our new weekly newsletter focused on timely and important questions in AI.&lt;/p&gt;  &lt;p&gt; Last Friday, we released the first issue of our new weekly newsletter, Gradient Updates, led and mainly written by senior researcher Ege Erdil. Each issue will offer in-depth commentary on timely and enduring questions in AI. Rather than delivering a roundup of the week's headlines, Gradient Updates focuses on a single, carefully chosen topic each week. For instance, our inaugural issue examined the impact of U.S. export controls on Chinese AI capabilities.&lt;/p&gt;
&lt;p&gt; With Gradient Updates, we aim to share insights and explorations that are less formal than a full-length paper or technical report, but more substantial than typical industry news briefs. You won’t find the latest investments or product releases covered here; instead, expect content that grapples with broader themes—from the economic implications of vertical disintegration within the AI sector to the potential of synthetic data and test-time compute scaling to surpass current pretraining limitations.&lt;/p&gt;
&lt;p&gt; You can read our first two issues now, and subscribe to receive new issues as they’re published.&lt;/p&gt;
&lt;p&gt; Issue #1: What did US export controls mean for China's AI capabilities? In this issue [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 13th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-gradient-updates-newsletter?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-gradient-updates-newsletter&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 13 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f3bbf8ad-d9ff-49e6-a620-dca723fc5f36</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f3bbf8ad-d9ff-49e6-a620-dca723fc5f36.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Announcing%20Gradient%20Updates%3A%20Our%20new%20weekly%20newsletter%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-gradient-updates-newsletter&amp;created_at=2026-05-18T17%3A09%3A54.645778%2B00%3A00&amp;duration=125" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-gradient-updates-newsletter</link>
      <itunes:duration>125</itunes:duration>
    </item>
    <item>
      <title>“Frontier language models have become much smaller” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: In this Gradient Updates weekly issue, Ege discusses how frontier language models have unexpectedly reversed course on scaling, with current models an order of magnitude smaller than GPT-4.&lt;/p&gt;  &lt;p&gt; Between the release of the original Transformer in 2017 and the release of GPT-4, language models at the frontier of capabilities became much larger. Parameter counts were scaled up by 1000 times from 117 million to 175 billion between GPT-1 and GPT-3 in the span of two years and by another 10 times from 175 billion to 1.8 trillion between GPT-3 and GPT-4 in the span of the next two years and nine months.&lt;/p&gt;
&lt;p&gt; If the post-GPT-3 trend had continued, given that GPT-4 was released in March 2023, by now we could have expected to see models with close to 10 trillion parameters, around 4 times bigger than GPT-4. However, in 2023, the trend of frontier language models becoming bigger reversed. Far from reaching the 10 trillion parameter mark, current frontier models such as the original GPT-4o and Claude 3.5 Sonnet are probably an order of magnitude smaller than GPT-4, with 4o having around 200 billion and 3.5 Sonnet around 400 billion parameters.&lt;/p&gt;
&lt;p&gt; In this issue [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:51) How do we know this has happened?&lt;/p&gt;&lt;p&gt;(05:51) Why did this happen?&lt;/p&gt;&lt;p&gt;(09:53) Should we expect frontier models to keep getting smaller?&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 13th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/frontier-language-models-have-become-much-smaller?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/frontier-language-models-have-become-much-smaller&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/gradient-updates/2024/frontier-language-models-have-become-much-smaller/token-economics.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/gradient-updates/2024/frontier-language-models-have-become-much-smaller/token-economics.png" alt="Figure 1: Pareto frontiers for token generation speed per request versus cost per million tokens generated on scaled-down versions of GPT-4. The results are based on an internal Epoch model." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 13 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">af11e1ef-03af-4f1e-89f2-b2799e4951c6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/af11e1ef-03af-4f1e-89f2-b2799e4951c6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Frontier%20language%20models%20have%20become%20much%20smaller%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Ffrontier-language-models-have-become-much-smaller&amp;created_at=2026-05-18T15%3A06%3A27.209415%2B00%3A00&amp;duration=794" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/frontier-language-models-have-become-much-smaller</link>
      <itunes:duration>794</itunes:duration>
    </item>
    <item>
      <title>“What did US export controls mean for China’s AI capabilities?” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: Export controls on China give the US a hardware lead of around 4 years in training frontier models, but essentially no lead in serving those models to users.&lt;/p&gt;  &lt;p&gt; Four days ago, the US government announced new rules around the export of powerful chips and semiconductor manufacturing equipment to China. This recent update is part of a broader trend of increasing export restrictions, following the announcement of the first export controls by the US Bureau of Industry and Security (BIS) in 2022 and the first revision in 2023.&lt;/p&gt;
&lt;p&gt; The BIS documents in which these export controls are announced are quite long. The most recent update takes up two documents, totaling 210 pages, and previous updates weren’t any shorter. In this issue, I aim to cut through most of this noise and give you a broad picture of what impact the chip export controls have had on China's ability to train and serve powerful AI models.&lt;/p&gt;
&lt;p&gt; The high-level takeaway is that current export controls on China give the US a hardware lead of around 4 years when it comes to training frontier models, but the US has essentially no lead when it comes to serving those models [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:47) The October 2022 export controls&lt;/p&gt;&lt;p&gt;(05:43) The October 2023 update&lt;/p&gt;&lt;p&gt;(09:11) The December 2024 update&lt;/p&gt;&lt;p&gt;(10:48) What's going to happen next?&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 6th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/gradient-updates/us-export-controls-china-ai?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/gradient-updates/us-export-controls-china-ai&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 06 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4ea4e179-f9f7-466b-aaf2-9c53331b89f7</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4ea4e179-f9f7-466b-aaf2-9c53331b89f7.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22What%20did%20US%20export%20controls%20mean%20for%20China%E2%80%99s%20AI%20capabilities%3F%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fgradient-updates%2Fus-export-controls-china-ai&amp;created_at=2026-05-18T15%3A06%3A25.715017%2B00%3A00&amp;duration=768" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/gradient-updates/us-export-controls-china-ai</link>
      <itunes:duration>768</itunes:duration>
    </item>
    <item>
      <title>“What is the future of AI in mathematics? Interviews with leading mathematicians” by Anson Ho, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.&lt;/p&gt;  &lt;p&gt; Recent advances in artificial intelligence are beginning to influence the field of mathematics, prompting important questions about how mathematical research might evolve. Systems like Google DeepMind's AlphaProof have demonstrated capabilities approaching gold medal-level performance at the International Mathematical Olympiad. Professor Timothy Gowers described these abilities as “very impressive, and well beyond what I thought was state of the art.”&lt;/p&gt;
&lt;p&gt; Developments like these raise several important questions: How will the nature of mathematics research evolve over the next decade? Can mathematics research be fully automated, and when might this happen? To answer these questions, we interviewed four distinguished mathematicians about the implications of AI progress in mathematics: Fields Medalists Prof Terence Tao, Prof Timothy Gowers, and Prof Richard Borcherds, as well as IMO expert Evan Chen.&lt;/p&gt;
&lt;p&gt; In our conversations, the mathematicians highlighted key themes regarding AI's potential impact on mathematics. They discussed how AI could assist in proof development and verification, facilitate experimental approaches by exploring vast numbers of potential statements, generate novel conjectures by synthesizing information across fields, reduce barriers to entry into specialized [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:04) AI augmentation of mathematics research&lt;/p&gt;&lt;p&gt;(06:17) Challenges to AI systems achieving deep research competence in math&lt;/p&gt;&lt;p&gt;(08:31) Fully automating mathematics research&lt;/p&gt;&lt;p&gt;(11:13) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 4th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/frontiermath/tiers-1-4/expert-perspectives?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/frontiermath/tiers-1-4/expert-perspectives&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/expert-perspectives/ai-proof-assistant.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/expert-perspectives/ai-proof-assistant.png" alt="A visualization illustrating how AI can process and evaluate proof statements. On the left, a grid of text bubbles represents millions of proof statements being analyzed by AI. In the center, a neural network diagram symbolizes the AI model performing the analysis. On the right, the processed statements are marked with green checkmarks, red crosses, or blue question marks, indicating whether the statements are true, false, or uncertain." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 04 Dec 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">15be0f65-6ce9-4ea9-95b9-d8a11d818a33</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/15be0f65-6ce9-4ea9-95b9-d8a11d818a33.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Tamay%2520Besiroglu&amp;title=%22What%20is%20the%20future%20of%20AI%20in%20mathematics%3F%20Interviews%20with%20leading%20mathematicians%22%20by%20Anson%20Ho%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Ffrontiermath%2Ftiers-1-4%2Fexpert-perspectives&amp;created_at=2026-05-26T19%3A05%3A00.089055%2B00%3A00&amp;duration=730" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/frontiermath/tiers-1-4/expert-perspectives</link>
      <itunes:duration>730</itunes:duration>
    </item>
    <item>
      <title>“Introducing the distributed training interactive simulator” by Ege Erdil, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: We introduce an interactive simulation tool which can simulate distributed training runs of large language models under ideal conditions.&lt;/p&gt;  &lt;p&gt; Recently, we published the results of our investigation into data movement bottlenecks in distributed training of deep learning models, introducing a detailed model of the bandwidth and latency costs of different modes of parallelism in GPU clusters. Today, we’re also releasing an interactive simulator with a user-friendly interface implementing the same model.&lt;/p&gt;
&lt;p&gt; This post will introduce the simulation tool's features by using it to answer the following question: how large a neural network training run could we have done using the GTX 580 GPUs that were used to train AlexNet in 2012? In other words, instead of training AlexNet on two GTX 580 3GB GPUs for six days, how far could an AI lab have scaled training if they wanted to train the largest possible model using just GTX 580 3GB GPUs, disregarding cost considerations?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; How to use the tool&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Before getting into the specific experiment to answer this question, let's see how to get started with using the tool. On launch, you’ll be greeted with the following interface:&lt;/p&gt;&lt;p&gt; Figure 1: The user interface as [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:20) How to use the tool&lt;/p&gt;&lt;p&gt;(05:00) What would have been the largest feasible training run in 2012?&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 29th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-the-distributed-training-interactive-simulator?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-the-distributed-training-interactive-simulator&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/introducing-the-distributed-training-interactive-simulator/figure%201.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/introducing-the-distributed-training-interactive-simulator/figure%201.jpg" alt="Figure 1: The user interface as it would be seen upon first launching the tool." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/distributed-training-fig.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/distributed-training-fig.png" alt="FLOP Utilization Rates of Training Runs" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 29 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4e3d489f-8851-4ce5-b3f5-ef60fc34bff2</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4e3d489f-8851-4ce5-b3f5-ef60fc34bff2.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Tamay%2520Besiroglu&amp;title=%22Introducing%20the%20distributed%20training%20interactive%20simulator%22%20by%20Ege%20Erdil%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-the-distributed-training-interactive-simulator&amp;created_at=2026-05-18T17%3A22%3A41.053574%2B00%3A00&amp;duration=537" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-the-distributed-training-interactive-simulator</link>
      <itunes:duration>537</itunes:duration>
    </item>
    <item>
      <title>“Introducing Epoch AI’s AI benchmarking hub” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We are launching the AI Benchmarking Hub: a platform presenting our evaluations of leading models on challenging benchmarks, with analysis of trends in AI capabilities.&lt;/p&gt;  &lt;p&gt; Epoch AI is launching our AI Benchmarking Hub—a platform for comprehensively understanding AI capabilities.&lt;/p&gt;
&lt;p&gt; By evaluating leading AI models ourselves and carefully analyzing the results, we aim to shed light on the main trends in AI capabilities. Our clear visuals and detailed findings can help researchers, developers, and decision-makers better understand what today's AI systems can actually do—and where they’re headed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; Key Features of the AI Benchmarking Hub&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Challenging benchmarks: Our goal is to track model performance on the hardest and most informative benchmarks. For this first release, the hub features results from two benchmarks:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; GPQA Diamond: This is a higher-quality, challenging subset of the GPQA benchmark, which tests models’ ability to answer PhD-level multiple choice questions about chemistry, physics, and biology.&lt;/li&gt;
&lt;li&gt; MATH Level 5: This is a subset of the hardest questions from the MATH benchmark, a dataset of high-school level competition math problems.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt; We plan to rapidly expand our suite of benchmarks to create a thorough picture of AI progress, by adding benchmarks such as [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:05) Key Features of the AI Benchmarking Hub&lt;/p&gt;&lt;p&gt;(02:50) What's Next?&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 27th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-benchmarks-dashboard?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-benchmarks-dashboard&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/introducing-benchmarks-dashboard/benchmarks-screenshot.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/introducing-benchmarks-dashboard/benchmarks-screenshot.png" alt="Graph showing AI performance on expert-level mathematics problems, plotting FrontierMath accuracy over time by organization." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/data-insights/insight-downloadable-comparison-gpqa-simple.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/data-insights/insight-downloadable-comparison-gpqa-simple.png" alt="Models with downloadable weights currently lag behind the top-performing models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 27 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">dc614449-5d75-45e7-8e2e-f8a3ae6c6c6d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/dc614449-5d75-45e7-8e2e-f8a3ae6c6c6d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Introducing%20Epoch%20AI%E2%80%99s%20AI%20benchmarking%20hub%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-benchmarks-dashboard&amp;created_at=2026-05-18T17%3A22%3A44.52356%2B00%3A00&amp;duration=251" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-benchmarks-dashboard</link>
      <itunes:duration>251</itunes:duration>
    </item>
    <item>
      <title>“Hardware failures won’t limit AI scaling” by Alexander Erben, Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: Our analysis shows hardware failures won't limit AI training scale. GPU memory-based checkpointing enables training beyond millions of GPUs.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Computational power has become the primary driving force in the race for improved model capabilities. Leading tech companies (Google, Microsoft, Meta, and Amazon) already possess AI computing infrastructure equivalent to more than two million NVIDIA H100 GPUs, and frontier AI model training costs are projected to exceed $1 billion per model by 2027. This scale of computation brings with it an increasingly important challenge in distributed computing: hardware failures.&lt;/p&gt;&lt;p&gt; To understand why, consider that if a single H100 GPU fails on average once every 50,000 hours (about 6 years), a cluster of 100,000 GPUs will face a failure every 30 minutes, and a million-GPU cluster will see failures every 3 minutes.&lt;/p&gt;&lt;p&gt; To combat these failures, engineers resort to one crucial technique: checkpointing. Checkpointing saves training progress to resilient storage, and in the event of a failure, training can restart by loading saved progress into a working set of GPUs. This opens up an opportunity for optimization, as excessively frequent checkpointing wastes valuable computation time, while insufficient checkpointing risks losing significant progress in the event of a [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(02:43) Background and Current Practices&lt;/p&gt;&lt;p&gt;(07:30) Alternative Checkpointing Strategies Using GPU Memory&lt;/p&gt;&lt;p&gt;(11:48) Chinchilla-based Model Sizes&lt;/p&gt;&lt;p&gt;(16:02) Idle Spares, Maintenance and Catastrophic Failures&lt;/p&gt;&lt;p&gt;(18:41) Conclusion&lt;/p&gt;&lt;p&gt;(20:18) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 15 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 22nd, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/hardware-failures-wont-limit-ai-scaling?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/hardware-failures-wont-limit-ai-scaling&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/hardware-failures-wont-limit-ai-scaling/figure%201.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/hardware-failures-wont-limit-ai-scaling/figure%201.png" alt="Figure 1: Visualization of how checkpointing, failures, and recovery time lead to lost time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 22 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1be764d7-bbdb-494c-8b6c-a2ef01dbf047</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1be764d7-bbdb-494c-8b6c-a2ef01dbf047.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Alexander%2520Erben%252C%2520Ege%2520Erdil&amp;title=%22Hardware%20failures%20won%E2%80%99t%20limit%20AI%20scaling%22%20by%20Alexander%20Erben%2C%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhardware-failures-wont-limit-ai-scaling&amp;created_at=2026-05-18T17%3A22%3A45.751975%2B00%3A00&amp;duration=1259" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/hardware-failures-wont-limit-ai-scaling</link>
      <itunes:duration>1259</itunes:duration>
    </item>
    <item>
      <title>“FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI” by Tamay Besiroglu, Elliot Glazer, Caroline Falkman Olsson</title>
      <description>&lt;p&gt; Subtitle: FrontierMath: a new benchmark of expert-level math problems designed to measure AI's mathematical abilities. See how leading AI models perform against the collective mathematics community.&lt;/p&gt; &lt;p&gt; This announcement was originally published on November 8, 2024. For the latest benchmark results on FrontierMath, please see AI Benchmarking.&lt;/p&gt;  &lt;p&gt; We’re introducing FrontierMath, a benchmark of hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems. These problems span major branches of modern mathematics, from computational number theory to abstract algebraic geometry, and typically require hours or days for expert mathematicians to solve.&lt;/p&gt;
&lt;p&gt; Figure 1. While leading AI models now achieve near-perfect scores on traditional benchmarks like GSM-8k and MATH, they solve less than 2% of FrontierMath problems, revealing a substantial gap between current AI capabilities and the collective prowess of the mathematics community. MMLU scores shown are for the College Mathematics category of the benchmark.&lt;/p&gt;
&lt;p&gt; To understand and measure progress in artificial intelligence, we need carefully designed benchmarks that can assess how well AI systems engage in complex scientific reasoning. Mathematics offers a unique opportunity for this assessment: it requires extended chains of precise reasoning, with each step building exactly on what came [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:02) The FrontierMath Benchmark&lt;/p&gt;&lt;p&gt;(05:22) Current Performance on FrontierMath&lt;/p&gt;&lt;p&gt;(06:37) Our next steps&lt;/p&gt;&lt;p&gt;(07:49) Conclusion&lt;/p&gt;&lt;p&gt;(08:52) Conflict of interest statement&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 8th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/frontiermath/tiers-1-4/the-benchmark?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/frontiermath/tiers-1-4/the-benchmark&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/the-benchmark/frontiermath-vs-other-benchmarks.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/the-benchmark/frontiermath-vs-other-benchmarks.png" alt="Figure 1. While leading AI models now achieve near-perfect scores on traditional benchmarks like GSM-8k and MATH, they solve less than 2% of FrontierMath problems, revealing a substantial gap between current AI capabilities and the collective prowess of the mathematics community. MMLU scores shown are for the College Mathematics category of the benchmark." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/the-benchmark/current-models-vs-frontiermath.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/the-benchmark/current-models-vs-frontiermath.png" alt="Figure 2. Performance of leading language models on FrontierMath. All models show consistently poor performance, with even the best models solving less than 2% of problems." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 08 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">de1c8df0-13cf-46ba-8c95-4418f88c2063</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/de1c8df0-13cf-46ba-8c95-4418f88c2063.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Elliot%2520Glazer%252C%2520Caroline%2520Falkman%2520Olsson&amp;title=%22FrontierMath%3A%20A%20benchmark%20for%20evaluating%20advanced%20mathematical%20reasoning%20in%20AI%22%20by%20Tamay%20Besiroglu%2C%20Elliot%20Glazer%2C%20Caroline%20Falkman%20Olsson&amp;source_url=https%3A%2F%2Fepoch.ai%2Ffrontiermath%2Ftiers-1-4%2Fthe-benchmark&amp;created_at=2026-05-27T04%3A31%3A47.182671%2B00%3A00&amp;duration=591" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/frontiermath/tiers-1-4/the-benchmark</link>
      <itunes:duration>591</itunes:duration>
    </item>
    <item>
      <title>“How far behind are open models?” by Ben Cottier, Josh You, Natalia Martemianova, David Owen</title>
      <description>&lt;p&gt; Subtitle: We compare open and closed AI models, and study how openness has evolved. The best open model today is on par with closed models in performance and training compute, but with a lag of about one year.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Openness has long been a norm in AI research, fostering collaboration in the field. However, rapid advances in AI have sparked concerns about the potential consequences of releasing the most capable models. In addition, businesses that sell access to models like ChatGPT have a commercial incentive to keep models private.&lt;/p&gt;&lt;p&gt; Industry AI labs have responded to these developments in various ways. Some models remain unreleased, such as Google DeepMind's Chinchilla model. Alternatively, models like GPT-4o have structured access, controlling how users can interact with the model. Other models have their weights available to download with restrictions on the terms of use, such as Meta's Llama models.&lt;/p&gt;&lt;p&gt; Publishing models, code, and datasets enables innovation and external scrutiny, but is irrevocable and risks misuse if a model's safeguards are bypassed. There is an ongoing debate about whether this trade-off is acceptable, or avoidable. 
Supporters of open source AI have argued that openness benefits society, as well as the model developer, through [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Introduction&lt;/p&gt;&lt;p&gt;(02:57) Summary of findings&lt;/p&gt;&lt;p&gt;(05:39) What is an "open" AI model?&lt;/p&gt;&lt;p&gt;(08:27) Degrees of accessibility&lt;/p&gt;&lt;p&gt;(09:23) Benchmark performance&lt;/p&gt;&lt;p&gt;(10:02) Open models have lagged on benchmarks by 5 to 22 months&lt;/p&gt;&lt;p&gt;(12:10) Newer, open LLMs use less compute to match older, closed LLMs&lt;/p&gt;&lt;p&gt;(15:00) Training compute&lt;/p&gt;&lt;p&gt;(16:13) The scaling of open models lags by about 15 months&lt;/p&gt;&lt;p&gt;(18:00) Will open AI models catch up or fall further behind?&lt;/p&gt;&lt;p&gt;(23:08) Most notable AI models released between 2019 to 2023 were open&lt;/p&gt;&lt;p&gt;(27:30) Related work&lt;/p&gt;&lt;p&gt;(29:26) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 42 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 4th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/open-models-report?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/open-models-report&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/open-closed-lag.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/open-closed-lag.png" alt="Open models have lagged on benchmarks by 5 to 22 months." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/open-models-use-less-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/open-models-use-less-compute.png" alt="Some open LLMs use less compute to match closed LLMs" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/top-1-open-closed-ai-models-compute-lag.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/top-1-open-closed-ai-models-compute-lag.png" alt="The training compute of top-1 open models keeps pace, but lags by 15 months" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/top-10-open-closed-ai-models-compute-lag.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/top-10-open-closed-ai-models-compute-lag.png" alt="The training compute of top-10 open models is falling behind" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/llamas-catch-up-to-top-closed-models.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/llamas-catch-up-to-top-closed-models.png" alt="The training compute of Llama models may catch up to top closed models in 2025" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a 
href="https://epoch.ai/assets/images/charts/notable-ai-models-accesibility.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/notable-ai-models-accesibility.png" alt="Most notable AI models released from 2019 to 2023 were open" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 04 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">d65ed137-8c51-408f-a6d9-f6d50dc2b3a0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/d65ed137-8c51-408f-a6d9-f6d50dc2b3a0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier%252C%2520Josh%2520You%252C%2520Natalia%2520Martemianova%252C%2520David%2520Owen&amp;title=%22How%20far%20behind%20are%20open%20models%3F%22%20by%20Ben%20Cottier%2C%20Josh%20You%2C%20Natalia%20Martemianova%2C%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fopen-models-report&amp;created_at=2026-05-18T17%3A22%3A46.554818%2B00%3A00&amp;duration=1999" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/open-models-report</link>
      <itunes:duration>1999</itunes:duration>
    </item>
    <item>
      <title>“Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: Data movement bottlenecks limit LLM scaling beyond 2e28 FLOP, with a "latency wall" at 2e31 FLOP. We may hit these in ~3 years. Aggressive batch size scaling could potentially overcome these limits.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Over the past five years, the performance of large language models (LLMs) has improved dramatically, driven largely by rapid scaling in training compute budgets to handle larger models and training datasets. Our own estimates suggest that the training compute used by frontier AI models has grown by 4-5 times every year from 2010 to 2024. This rapid pace of scaling far outpaces Moore's law, and sustaining it has required scaling along three dimensions: First, making training runs last longer; second, increasing the number of GPUs participating in each training run; and third, utilizing more performant GPUs.&lt;/p&gt;&lt;p&gt; It's relatively easy to scale the duration that a GPU cluster is used to train a model. However, in practice training runs rarely exceed 6 months. This is because both the hardware and software used for a training run risk becoming obsolete at timescales longer than this, and no lab would want to release a model which has become outdated immediately upon release. 
This sets a practical limit [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:34) Introduction&lt;/p&gt;&lt;p&gt;(04:18) Data movement patterns during distributed training&lt;/p&gt;&lt;p&gt;(04:43) Intra-GPU data movement&lt;/p&gt;&lt;p&gt;(06:41) Inter-GPU data movement&lt;/p&gt;&lt;p&gt;(10:10) The limits to scaling&lt;/p&gt;&lt;p&gt;(11:26) Understanding the latency wall&lt;/p&gt;&lt;p&gt;(14:21) Possible ways to overcome the limits&lt;/p&gt;&lt;p&gt;(16:15) Increasing the batch size&lt;/p&gt;&lt;p&gt;(18:18) Reducing the model depth&lt;/p&gt;&lt;p&gt;(19:43) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 4 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 2nd, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/data-movement-bottlenecks-scaling-past-1e28-flop?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/data-movement-bottlenecks-scaling-past-1e28-flop&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/ml-models-nearing-gpu-limits.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/ml-models-nearing-gpu-limits.png" alt="Data movement bottlenecks constrain AI scaling" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/flop-utilization-rate.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/flop-utilization-rate.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/theorical-utilization-rate.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/theorical-utilization-rate.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 02 Nov 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">439edf67-726b-4e43-a9f6-d8375f66acc6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/439edf67-726b-4e43-a9f6-d8375f66acc6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Data%20movement%20bottlenecks%20to%20large-scale%20model%20training%3A%20Scaling%20past%201e28%20FLOP%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fdata-movement-bottlenecks-scaling-past-1e28-flop&amp;created_at=2026-05-18T18%3A03%3A20.320254%2B00%3A00&amp;duration=1295" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/data-movement-bottlenecks-scaling-past-1e28-flop</link>
      <itunes:duration>1295</itunes:duration>
    </item>
    <item>
      <title>“Introducing Epoch AI’s machine learning hardware database” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: Our new database covers hardware used to train AI models, featuring over 100 accelerators (GPUs and TPUs) across the deep learning era.&lt;/p&gt;  &lt;p&gt; Modern AI models are trained on large supercomputing clusters using specialized hardware. For the leading AI models of today, hardware spending can reach billions of dollars. To explore ML hardware trends in detail, we have added a new Machine Learning Hardware database in our data hub. This covers key trends, such as how hardware performance has improved over time, or how AI clusters have grown larger for training leading models. It also features an interactive visualization, allowing users to explore their own questions using the database.&lt;/p&gt;
&lt;p&gt; For example, we can use this data to plot how GPU performance has improved in two distinct ways: the raw number of operations per second has increased around 20% per year, but innovations such as tensor cores and reduced precision number formats have provided further improvements.&lt;/p&gt;
&lt;p&gt; There's a chart here. The chart title reads: Computational performance improves 15x when switching from FP32 to INT8 &lt;/p&gt;
&lt;p&gt; We also track properties such as hardware prices or energy consumption, allowing us to chart how leading AI chips have [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 23rd, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/introducing-epoch-ai-machine-learning-hardware-database?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/introducing-epoch-ai-machine-learning-hardware-database&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/introducing-ml-hardware-database-performance-trend.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/introducing-ml-hardware-database-performance-trend.png" alt="Computational performance improves 15x when switching from FP32 to INT8" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/introducing-ml-hardware-database-price-performance.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/introducing-ml-hardware-database-price-performance.png" alt="Price-performance of leading ML hardware" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 23 Oct 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a12e1b55-be17-4f34-a5e1-e75ec698d3e5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a12e1b55-be17-4f34-a5e1-e75ec698d3e5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Introducing%20Epoch%20AI%E2%80%99s%20machine%20learning%20hardware%20database%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fintroducing-epoch-ai-machine-learning-hardware-database&amp;created_at=2026-05-18T17%3A22%3A48.365841%2B00%3A00&amp;duration=138" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/introducing-epoch-ai-machine-learning-hardware-database</link>
      <itunes:duration>138</itunes:duration>
    </item>
    <item>
      <title>“Interviewing AI researchers on automation of AI R&amp;D” by David Owen</title>
      <description>&lt;p&gt; Subtitle: AI could accelerate AI R&amp;amp;D, especially in coding and debugging tasks. We explore AI researchers’ differing predictions on automation, and their suggestions for designing AI R&amp;amp;D evaluations.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The question of when and how AI might automate AI R&amp;amp;D is crucial for AI forecasting—if AI could automate the tasks involved in AI research, it could drastically accelerate AI progress. There is a long history of researchers considering this question in the abstract, and describing its importance for how AI will shape the future. However, AI researchers disagree substantially on timelines for automating AI R&amp;amp;D—for instance, researchers’ predictions for when all AI researcher tasks will be automated vary between years and centuries.&lt;/p&gt;&lt;p&gt; In this work, we interviewed AI researchers with three goals:&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Characterize AI R&amp;amp;D work tasks in detail, to better understand how automation might take place.&lt;/li&gt;
&lt;li&gt; Clarify the reasoning underpinning researchers’ predictions about automation, to see where and why they disagree.&lt;/li&gt;
&lt;li&gt; Collect their views on evaluations intended to measure how capable AI systems are at performing AI R&amp;amp;D, to better understand how society can track AI progress in this critical area.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt; To do this, we used qualitative interviews. We asked open-ended questions to [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:26) Introduction&lt;/p&gt;&lt;p&gt;(02:27) Summary of findings&lt;/p&gt;&lt;p&gt;(03:49) Key results&lt;/p&gt;&lt;p&gt;(03:51) Hypotheses are fast, implementation is time-consuming; both are crucial&lt;/p&gt;&lt;p&gt;(05:43) Predictions differ on automation pace, but agree that engineering will be the main driver of R&amp;amp;D automation&lt;/p&gt;&lt;p&gt;(09:10) Existing R&amp;amp;D evaluations may be a promising start&lt;/p&gt;&lt;p&gt;(11:24) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 27th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/interviewing-ai-researchers-on-automation-of-ai-rnd?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/interviewing-ai-researchers-on-automation-of-ai-rnd&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/interviewing-ai-researchers-on-automation-of-ai-rnd/workflow-figure.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/interviewing-ai-researchers-on-automation-of-ai-rnd/workflow-figure.png" alt="Figure 1: The AI R&amp;amp;D workflow based on participants’ descriptions, expanding on pre-existing descriptions in the evaluation literature. Participants offered examples for each subtask." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 27 Aug 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">329594e1-4d75-4454-a5e6-855b4f79d453</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/329594e1-4d75-4454-a5e6-855b4f79d453.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Owen&amp;title=%22Interviewing%20AI%20researchers%20on%20automation%20of%20AI%20R%26D%22%20by%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Finterviewing-ai-researchers-on-automation-of-ai-rnd&amp;created_at=2026-05-18T17%3A22%3A49.39528%2B00%3A00&amp;duration=813" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/interviewing-ai-researchers-on-automation-of-ai-rnd</link>
      <itunes:duration>813</itunes:duration>
    </item>
    <item>
      <title>“Can AI scaling continue through 2030?” by Jaime Sevilla, Tamay Besiroglu, Ben Cottier, Josh You, Edu Roldán, Pablo Villalobos, Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: We investigate the scalability of AI training runs. We identify electric power, chip manufacturing, data and latency as constraints. We conclude that 2e29 FLOP training runs will likely be feasible by 2030.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In recent years, the capabilities of AI models have significantly improved. Our research suggests that growth in computational resources accounts for a significant portion of AI performance improvements. The consistent and predictable improvements from scaling have led AI labs to aggressively expand the scale of training, with training compute expanding at a rate of approximately 4x per year.&lt;/p&gt;&lt;p&gt; To put this 4x annual growth in AI training compute into perspective, it outpaces even some of the fastest technological expansions in recent history. It surpasses the peak growth rates of mobile phone adoption (2x per year, 1980-1987), solar energy capacity installation (1.5x per year, 2001-2010), and human genome sequencing (3.3x per year, 2008-2015).&lt;/p&gt;&lt;p&gt; Here, we examine whether it is technically feasible for the current rapid pace of AI training scaling—approximately 4x per year—to continue through 2030. 
We investigate four key factors that might constrain scaling: power availability, chip manufacturing capacity, data scarcity, and the “latency wall”, a fundamental speed limit imposed by unavoidable delays [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:31) Introduction&lt;/p&gt;&lt;p&gt;(08:17) What constrains AI scaling this decade&lt;/p&gt;&lt;p&gt;(08:21) Power constraints&lt;/p&gt;&lt;p&gt;(10:37) The current trend of AI power demand&lt;/p&gt;&lt;p&gt;(15:35) Power constraints for geographically localized training runs&lt;/p&gt;&lt;p&gt;(20:37) Power constraints for geographically distributed training&lt;/p&gt;&lt;p&gt;(28:36) Feasibility of geographically distributed training&lt;/p&gt;&lt;p&gt;(34:01) Modeling energy bottlenecks&lt;/p&gt;&lt;p&gt;(35:02) Chip manufacturing capacity&lt;/p&gt;&lt;p&gt;(36:48) Current production and projections&lt;/p&gt;&lt;p&gt;(42:27) Modeling GPU production and compute availability&lt;/p&gt;&lt;p&gt;(47:45) Data scarcity&lt;/p&gt;&lt;p&gt;(50:00) Multimodality&lt;/p&gt;&lt;p&gt;(55:29) Synthetic data&lt;/p&gt;&lt;p&gt;(01:00:36) Latency wall&lt;/p&gt;&lt;p&gt;(01:03:19) Latency wall given intranode latencies&lt;/p&gt;&lt;p&gt;(01:05:40) Latency wall given latencies between nodes&lt;/p&gt;&lt;p&gt;(01:07:50) How can these latencies be reduced?&lt;/p&gt;&lt;p&gt;(01:10:01) What constraint is the most limiting?&lt;/p&gt;&lt;p&gt;(01:12:00) Will labs attempt to scale to these new heights?&lt;/p&gt;&lt;p&gt;(01:15:54) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 107 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 20th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/can-ai-scaling-continue-through-2030?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/can-ai-scaling-continue-through-2030&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/summary-slideshow.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/summary-slideshow.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/can-ai-scaling-continue-through-2030/datacenter-it-capacity.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/can-ai-scaling-continue-through-2030/datacenter-it-capacity.png" alt="Figure 2: Reported and planned total installed IT capacity of North America data centers via SemiAnalysis’ data center industry model. Important note: to find total capacity, we must multiply these figures by PUE, which ranges from 1.2x for AI datacenters to 1.5x for other datacenters." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/power-consumption-bottleneck.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/power-consumption-bottleneck.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/chip-production-bottleneck.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/chip-production-bottleneck.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/data-bottleneck.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/data-bottleneck.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/latency-bottleneck.png" 
target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/latency-bottleneck.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/bottlenecks-summary.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/bottlenecks-summary.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 20 Aug 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1cd41097-85b0-4bea-b03b-787da7f2af9d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1cd41097-85b0-4bea-b03b-787da7f2af9d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Tamay%2520Besiroglu%252C%2520Ben%2520Cottier%252C%2520Josh%2520You%252C%2520Edu%2520Rold%25C3%25A1n%252C%2520Pablo%2520Villalobos%252C%2520Ege%2520Erdil&amp;title=%22Can%20AI%20scaling%20continue%20through%202030%3F%22%20by%20Jaime%20Sevilla%2C%20Tamay%20Besiroglu%2C%20Ben%20Cottier%2C%20Josh%20You%2C%20Edu%20Rold%C3%A1n%2C%20Pablo%20Villalobos%2C%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fcan-ai-scaling-continue-through-2030&amp;created_at=2026-05-18T15%3A39%3A24.324498%2B00%3A00&amp;duration=4776" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/can-ai-scaling-continue-through-2030</link>
      <itunes:duration>4776</itunes:duration>
    </item>
    <item>
      <title>“Announcing Epoch AI’s data hub” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We are launching a hub for data and visualizations, to make our databases more accessible for users and researchers. It currently features our data on notable and large-scale AI models.&lt;/p&gt;  &lt;p&gt; Our data on the trajectory of AI has been valuable for policymakers, journalists, and other stakeholders. Our research, for example on training compute or model development costs, relies on our data. Now, we are creating a new Data on AI page as a central hub for all of this data. This includes key insights, interactive visualizations, and detailed documentation to inform users on how the data were collected.&lt;/p&gt;
&lt;p&gt; Currently, the page hosts two datasets. Our collection of notable AI models tracks key factors driving machine learning progress in over 800 historically significant AI models. The database contains information on development, training details, and more. Among other key information, we track training compute, dataset sizes, parameter counts, and training hardware. A new interactive visualization tool allows users to explore these details, examining trends in time and breakdowns by machine learning domain.&lt;/p&gt;
&lt;p&gt; Figure 1: Visualization of notable AI models and the overall trend of training compute over time.&lt;/p&gt;
&lt;p&gt; The second dataset, devoted to large-scale AI models, features [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 19th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-epoch-ai-data-hub?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-epoch-ai-data-hub&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/announcing-epoch-ai-data-hub/training-compute-visualization.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/announcing-epoch-ai-data-hub/training-compute-visualization.png" alt="Figure 1: Visualization of notable AI models and the overall trend of training compute over time." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/announcing-epoch-large-scale-models-by-domain-and-date.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/announcing-epoch-large-scale-models-by-domain-and-date.png" alt="Large-scale models by domain and publication date" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 19 Jun 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">96e3c10a-44fa-45d2-850f-62b44afa7d9c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/96e3c10a-44fa-45d2-850f-62b44afa7d9c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Announcing%20Epoch%20AI%E2%80%99s%20data%20hub%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-epoch-ai-data-hub&amp;created_at=2026-05-18T17%3A22%3A50.440856%2B00%3A00&amp;duration=152" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-epoch-ai-data-hub</link>
      <itunes:duration>152</itunes:duration>
    </item>
    <item>
      <title>“Will we run out of data? Limits of LLM scaling based on human-generated data” by Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn</title>
      <description>&lt;p&gt; Subtitle: We estimate the effective stock of quality- and repetition-adjusted human-generated public text for AI training at around 300 trillion tokens. If trends continue, language models will fully utilize this stock between 2026 and 2032, or even earlier if intensely overtrained.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Scaling has been a key factor driving progress in AI. Models are growing in parameters and being trained on increasingly enormous datasets, leading to exponential growth in training compute, and dramatic increases in performance. For example, five years and four orders of magnitude of compute separate the barely coherent GPT-2 from the powerful GPT-4.&lt;/p&gt;&lt;p&gt; So far, AI developers have not faced major limits to scaling beyond simply procuring AI chips, which are scarce but rapidly growing in supply. If chips are the only bottleneck, then AI systems are likely to continue growing exponentially in compute and expanding the frontier of capabilities. As such, a key question in forecasting AI progress is whether inputs other than raw compute could become binding constraints.&lt;/p&gt;&lt;p&gt; In particular, scaling requires growing training datasets. The most powerful AI systems to date are language models that are primarily trained on trillions of words of human-generated text from the internet. However, there is [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:36) Introduction&lt;/p&gt;&lt;p&gt;(02:03) Results&lt;/p&gt;&lt;p&gt;(04:26) Comparison with previous estimates&lt;/p&gt;&lt;p&gt;(06:17) Discussion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 6 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 6th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/figure1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/figure1.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/figure2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/figure2.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/figure3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/figure3.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/figure4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/figure4.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 06 Jun 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">47fd4f99-4ccf-476b-967f-4264c3b532ee</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/47fd4f99-4ccf-476b-967f-4264c3b532ee.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520Anson%2520Ho%252C%2520Jaime%2520Sevilla%252C%2520Tamay%2520Besiroglu%252C%2520Lennart%2520Heim%252C%2520Marius%2520Hobbhahn&amp;title=%22Will%20we%20run%20out%20of%20data%3F%20Limits%20of%20LLM%20scaling%20based%20on%20human-generated%20data%22%20by%20Pablo%20Villalobos%2C%20Anson%20Ho%2C%20Jaime%20Sevilla%2C%20Tamay%20Besiroglu%2C%20Lennart%20Heim%2C%20Marius%20Hobbhahn&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwill-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data&amp;created_at=2026-05-18T17%3A22%3A51.576068%2B00%3A00&amp;duration=523" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data</link>
      <itunes:duration>523</itunes:duration>
    </item>
    <item>
      <title>“How much does it cost to train frontier AI models?” by Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej, David Owen</title>
      <description>&lt;p&gt; Subtitle: The cost of training frontier AI models has grown by a factor of 2 to 3 per year for the past eight years, suggesting that the largest models will cost over a billion dollars by 2027.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Summary of findings&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. In our new paper, we develop a detailed cost model to address this gap, estimating training costs for up to 45 frontier models using three different approaches that account for hardware and energy expenditures, cloud rental costs, and R&amp;amp;D staff expenses, respectively. This work builds upon the cost estimates featured in the 2024 AI Index.&lt;/p&gt;&lt;p&gt; Our analysis reveals that the amortized hardware and energy cost for the final training run of frontier models has grown rapidly, at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). We also estimated a cost breakdown to develop key frontier models such as GPT-4 and Gemini Ultra, including R&amp;amp;D staff costs and compute for experiments. We found that most of the development cost is for the hardware at 47–67%, but R&amp;amp;D [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Summary of findings&lt;/p&gt;&lt;p&gt;(01:59) Key Results&lt;/p&gt;&lt;p&gt;(05:19) Implications&lt;/p&gt;&lt;p&gt;(06:17) Webinar&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 3rd, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/how-much-does-it-cost-to-train-frontier-ai-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/how-much-does-it-cost-to-train-frontier-ai-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/cost_regression_hardware-capex-energy.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/cost_regression_hardware-capex-energy.png" alt="Amortized hardware and energy cost to train frontier AI models over time" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/cost_regression_cloud.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/cost_regression_cloud.png" alt="Cloud compute cost to train frontier AI models over time" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/cost_proportions_stacked_equity.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/cost_proportions_stacked_equity.png" alt="Breakdown of costs for training and experiments" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 03 Jun 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">2a255964-feb5-4946-a7eb-aa939b1ef14c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/2a255964-feb5-4946-a7eb-aa939b1ef14c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier%252C%2520Robi%2520Rahman%252C%2520Loredana%2520Fattorini%252C%2520Nestor%2520Maslej%252C%2520David%2520Owen&amp;title=%22How%20much%20does%20it%20cost%20to%20train%20frontier%20AI%20models%3F%22%20by%20Ben%20Cottier%2C%20Robi%20Rahman%2C%20Loredana%20Fattorini%2C%20Nestor%20Maslej%2C%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhow-much-does-it-cost-to-train-frontier-ai-models&amp;created_at=2026-05-18T17%3A22%3A52.499303%2B00%3A00&amp;duration=404" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/how-much-does-it-cost-to-train-frontier-ai-models</link>
      <itunes:duration>404</itunes:duration>
    </item>
    <item>
      <title>“Training compute of frontier AI models grows by 4-5x per year” by Jaime Sevilla, Edu Roldán</title>
      <description>&lt;p&gt; Subtitle: Our expanded AI model database shows that the compute used to train recent models grew 4 to 5 times yearly from 2010 to May 2024. We find similar growth in frontier models, recent large language models, and models from leading companies.&lt;/p&gt; 

&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Over the last ten years, we have witnessed a dramatic increase in the computational resources dedicated to training state-of-the-art AI models. This strategy has been incredibly productive, translating into large gains in generality and performance. For example, we estimate that about two-thirds of the improvements in performance in language models in the last decade have been due to increases in model scale.&lt;/p&gt;&lt;p&gt; Given the central role of scaling, it is important to track how the computational resources (‘compute’) used to train models have grown in recent years. In this short article, we provide an updated view of the trends so far, having collected three times more data since our last analysis.&lt;/p&gt;&lt;p&gt; We tentatively conclude that compute growth in recent years is currently best described as increasing by a factor of 4 to 5 times per year. We find consistent growth between recent notable models, the running top 10 of models by compute, recent large language [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:31) Introduction&lt;/p&gt;&lt;p&gt;(03:06) The overall trend of training compute growth has held&lt;/p&gt;&lt;p&gt;(05:17) Frontier model growth has slowed, and now aligns with overall trends&lt;/p&gt;&lt;p&gt;(09:31) Language models caught up to the frontier around 2020&lt;/p&gt;&lt;p&gt;(11:43) Leading companies are growing their top models by 5x/year&lt;/p&gt;&lt;p&gt;(13:20) The scale of the largest models today can be retrodicted using a 4-5x/year growth rate&lt;/p&gt;&lt;p&gt;(15:49) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 16 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 28th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year/summary_figure.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year/summary_figure.svg" alt="Figure 1: Summary of the compute growth trends we found for overall notable models (top left), frontier models (top right), top language models (bottom left) and top models within leading companies (bottom right). All point to a recent trend of 4-5x/year growth." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/compute-trend-notable.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/compute-trend-notable.png" alt="Training compute of notable models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/compute-trend-frontier.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/compute-trend-frontier.png" alt="Training compute of frontier models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/compute-trend-frontier-llm.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/compute-trend-frontier-llm.png" alt="Training compute of frontier LLMs" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/compute-trend-companies.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/compute-trend-companies.png" alt="Training compute of frontier models from leading companies" 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/largest-models-retrodiction.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/largest-models-retrodiction.png" alt="Retrodicting the size of the largest models today from GPT-3" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 28 May 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">0566e957-3019-492f-8bab-f67ae434293b</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/0566e957-3019-492f-8bab-f67ae434293b.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Edu%2520Rold%25C3%25A1n&amp;title=%22Training%20compute%20of%20frontier%20AI%20models%20grows%20by%204-5x%20per%20year%22%20by%20Jaime%20Sevilla%2C%20Edu%20Rold%C3%A1n&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftraining-compute-of-frontier-ai-models-grows-by-4-5x-per-year&amp;created_at=2026-05-18T18%3A06%3A08.239144%2B00%3A00&amp;duration=1167" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year</link>
      <itunes:duration>1167</itunes:duration>
    </item>
    <item>
      <title>“Do the returns to software R&amp;D point towards a singularity?” by Tamay Besiroglu, Ege Erdil, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: The returns to R&amp;amp;D are crucial in determining the dynamics of growth and potentially the pace of AI development. Our new paper offers new empirical techniques and estimates for this crucial parameter.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Returns to R&amp;amp;D and hyperbolic technological progress&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Improvements in AI have predominantly been driven by two factors. First, advancements in hardware performance and substantial investments in larger clusters have increased the computing power for training AI models. This has resulted in improved performance given the abundance of data that we can use to train larger AI systems. Second, progress on the “software” side (training techniques, architectures, algorithm implementations, etc.) has resulted in the compute being used more efficiently (see our work on algorithmic progress). This means that AI model performance surpasses what we’d expect from merely increasing computing resources. At Epoch, we have extensively researched each of these trends.&lt;/p&gt;&lt;p&gt; The combination of the scaling of compute and improvements in training techniques has effectively increased the total budget of “effective compute,” which refers to the computational resources available for AI development when accounting for improvements on the “software” side. This increase in effective compute has been a key driver in the development of capable AI systems.&lt;/p&gt;&lt;p&gt; [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Returns to R&amp;amp;D and hyperbolic technological progress&lt;/p&gt;&lt;p&gt;(04:17) Hyperbolic growth in a model of idea production&lt;/p&gt;&lt;p&gt;(07:01) Do the returns to software R&amp;amp;D point towards a software singularity?&lt;/p&gt;&lt;p&gt;(07:36) Computer chess&lt;/p&gt;&lt;p&gt;(09:26) Other domains of software&lt;/p&gt;&lt;p&gt;(12:05) Do our estimates point to a singularity?&lt;/p&gt;&lt;p&gt;(13:39) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 3 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 17th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/do-the-returns-to-software-rnd-point-towards-a-singularity?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/do-the-returns-to-software-rnd-point-towards-a-singularity&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/output_plot_stockfish_minor.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/output_plot_stockfish_minor.png" alt="Algorithmic progress on Stockfish" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/returns_violin_plot.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/returns_violin_plot.png" alt="Returns to software R&amp;amp;D in different domains" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/stockfish_tests_monthly_ma.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/stockfish_tests_monthly_ma.png" alt="Fishtest tests per day, monthly moving average" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 17 May 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f641a209-652e-4ad3-b071-f3cdb029c3c9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f641a209-652e-4ad3-b071-f3cdb029c3c9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Ege%2520Erdil%252C%2520Anson%2520Ho&amp;title=%22Do%20the%20returns%20to%20software%20R%26D%20point%20towards%20a%20singularity%3F%22%20by%20Tamay%20Besiroglu%2C%20Ege%20Erdil%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fdo-the-returns-to-software-rnd-point-towards-a-singularity&amp;created_at=2026-05-18T18%3A06%3A11.888561%2B00%3A00&amp;duration=935" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/do-the-returns-to-software-rnd-point-towards-a-singularity</link>
      <itunes:duration>935</itunes:duration>
    </item>
    <item>
      <title>“Chinchilla scaling: A replication attempt” by Tamay Besiroglu, Ege Erdil, Matthew Barnett, Josh You</title>
      <description>&lt;p&gt; Subtitle: We replicate Hoffmann et al.'s estimation of a parametric scaling law and find issues with their estimates. Our estimates fit the data better and align with Hoffmann et al.'s other approaches.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Hoffmann et al. (2022) investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. The authors train over 400 language models and find that for compute-optimal training, the model size and number of training tokens should scale at equal rates: for every doubling of model size, the number of training tokens should also be doubled. They train a 70B model called Chinchilla, compute-optimally according to their results, on 1.4 trillion tokens for a ratio of 20 tokens per parameter. For this reason, the scaling laws they propose are often called “Chinchilla scaling laws”.&lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: Optimal ratio of training tokens to model parameters.&lt;/p&gt;&lt;p&gt; In their paper, Hoffmann et al. use three different methods to derive the optimal scaling policy. In one of their approaches, they estimate the following parametric scaling law:&lt;/p&gt;&lt;p&gt; L equals E plus the fraction with numerator [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) Summary&lt;/p&gt;&lt;p&gt;(03:59) Implications of this work&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 17th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/chinchilla-scaling-a-replication-attempt?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/chinchilla-scaling-a-replication-attempt&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%";&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/optimal-tokens-per-parameter.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/optimal-tokens-per-parameter.png" alt="Optimal ratio of training tokens to model parameters" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 17 Apr 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a7bb3122-9455-4d2f-a017-658c5910e2c3</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a7bb3122-9455-4d2f-a017-658c5910e2c3.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Ege%2520Erdil%252C%2520Matthew%2520Barnett%252C%2520Josh%2520You&amp;title=%22Chinchilla%20scaling%3A%20A%20replication%20attempt%22%20by%20Tamay%20Besiroglu%2C%20Ege%20Erdil%2C%20Matthew%20Barnett%2C%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fchinchilla-scaling-a-replication-attempt&amp;created_at=2026-05-18T18%3A06%3A12.812155%2B00%3A00&amp;duration=427" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/chinchilla-scaling-a-replication-attempt</link>
      <itunes:duration>427</itunes:duration>
    </item>
    <item>
      <title>“Tracking large-scale AI models” by Robi Rahman, David Owen, Josh You</title>
      <description>&lt;p&gt; Subtitle: We present a dataset of 81 large-scale models, from AlphaGo to Gemini, developed across 18 countries, at the leading edge of scale and capabilities.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Update&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Explore our Large-scale AI models dataset through interactive visualizations and documentation on our dedicated data page.&lt;/p&gt;
&lt;p&gt; We present a new dataset tracking AI models with training compute over 10^23 floating point operations (FLOP). This corresponds to training costs of hundreds of thousands of dollars or more. We have identified 81 such models, and another 86 models that may exceed the 10^23 FLOP threshold but don’t have confirmed training details.&lt;/p&gt;
&lt;p&gt; Our previous work has examined the crucial role of training compute in the development of modern AI, and how it drives model capabilities. Existing AI regulation explicitly acknowledges the importance of training compute: both the recent US Executive Order on AI development and the EU AI Act establish reporting requirements based on compute thresholds. Motivated by these developments, we plan to track models with training compute above 10^23 FLOP by updating this dataset on an ongoing basis. We call models above this threshold “large-scale models”.&lt;/p&gt;
&lt;p&gt; The dataset offers insight into several recent [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:05) Results&lt;/p&gt;&lt;p&gt;(03:08) There are few models at the leading edge, but the frontier advances rapidly&lt;/p&gt;&lt;p&gt;(04:38) Most large-scale AI models are language models&lt;/p&gt;&lt;p&gt;(08:01) Most large-scale models are developed by US companies&lt;/p&gt;&lt;p&gt;(09:33) Downloadable models are common, but have lower training compute&lt;/p&gt;&lt;p&gt;(11:15) Methods for finding large-scale models&lt;/p&gt;&lt;p&gt;(11:50) Benchmarks and Repositories&lt;/p&gt;&lt;p&gt;(14:12) Non-English news and websites&lt;/p&gt;&lt;p&gt;(16:05) Other sources&lt;/p&gt;&lt;p&gt;(17:07) Unconfirmed large-scale models&lt;/p&gt;&lt;p&gt;(18:25) Outcomes and limitations&lt;/p&gt;&lt;p&gt;(20:03) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 9 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 5th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/tracking-large-scale-ai-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/tracking-large-scale-ai-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-models.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-models.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-models-by-year.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-models-by-year.png" alt="A bar chart showing the number of machine learning models with training compute of at least 10^23 FLOP published in each year trending up from 2 in 2017 to over 40 in 2023." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-models-by-domain.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-models-by-domain.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-post-large-scale-models-by-domain-and-date.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-post-large-scale-models-by-domain-and-date.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-post-large-scale-models-by-country.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-post-large-scale-models-by-country.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-models-by-organization.png" target="_blank"&gt;&lt;img 
src="https://epoch.ai/assets/images/charts/large-scale-models-by-organization.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/large-scale-model-count-by-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/large-scale-model-count-by-compute.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 05 Apr 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">7bf19bb4-e7ff-41c4-a71f-a39f20ed5959</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/7bf19bb4-e7ff-41c4-a71f-a39f20ed5959.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Robi%2520Rahman%252C%2520David%2520Owen%252C%2520Josh%2520You&amp;title=%22Tracking%20large-scale%20AI%20models%22%20by%20Robi%20Rahman%2C%20David%20Owen%2C%20Josh%20You&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftracking-large-scale-ai-models&amp;created_at=2026-05-18T18%3A06%3A13.686319%2B00%3A00&amp;duration=1347" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/tracking-large-scale-ai-models</link>
      <itunes:duration>1347</itunes:duration>
    </item>
    <item>
      <title>“Optimally allocating compute between inference and training” by Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: Our analysis indicates that AI labs should spend comparable resources on training and running inference, assuming they can flexibly balance compute between these tasks to maintain model performance.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Sam Altman recently claimed that OpenAI currently generates around 100 billion tokens per day, or about 36 trillion tokens per year. Given that modern language models are trained on the order of 10 trillion tokens and tokens seen during training are around three times more expensive than tokens seen or generated during inference, a naive analysis suggests OpenAI's annual inference costs are on the same order as their annual model training costs.&lt;/p&gt;&lt;p&gt; This seems like an odd coincidence at first: why should one of these not completely dominate the other? However, there's a good reason to suppose that these two quantities should be on a similar order of magnitude: the training-inference compute tradeoff. In this post, I will briefly explain what this tradeoff is about and why the current empirical evidence about it implies we should see rough parity in how much compute is spent on training versus inference.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; The tradeoff&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In general, it's possible to get a model to perform better by one of two [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:25) Introduction&lt;/p&gt;&lt;p&gt;(01:24) The tradeoff&lt;/p&gt;&lt;p&gt;(04:03) Why we expect investment parity&lt;/p&gt;&lt;p&gt;(08:10) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 3 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 29th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/optimally-allocating-compute-between-inference-and-training?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/optimally-allocating-compute-between-inference-and-training&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/training-inference.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/training-inference.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 29 Mar 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">76a21c5a-1a26-48ec-9f8c-8c743dfa8890</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/76a21c5a-1a26-48ec-9f8c-8c743dfa8890.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil&amp;title=%22Optimally%20allocating%20compute%20between%20inference%20and%20training%22%20by%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Foptimally-allocating-compute-between-inference-and-training&amp;created_at=2026-05-18T18%3A06%3A14.844793%2B00%3A00&amp;duration=644" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/optimally-allocating-compute-between-inference-and-training</link>
      <itunes:duration>644</itunes:duration>
    </item>
    <item>
      <title>“Algorithmic progress in language models” by Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: Progress in pretrained language model performance surpasses what we’d expect from merely increasing computing resources, occurring at a pace equivalent to doubling computational power every 5 to 14 months.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Overview&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In 2012, the best language models were recurrent networks that struggled to form coherent sentences. Fast forward to today and language models like GPT-4 assist hundreds of millions of active users and are able to perform tasks across a wide range of domains.&lt;/p&gt;&lt;p&gt; Clearly, progress has been rapid—but what made this possible? One reason is that the compute used to train language models has been scaled up drastically, resulting in better performance. But that's only part of the puzzle. AI practitioners have developed better model architectures, optimizers, and other algorithmic innovations that reduce the compute required to reach certain performance levels—what we refer to as algorithmic progress.&lt;/p&gt;&lt;p&gt; Figure 1. Performance of 231 language models (measured in log perplexity) used in our work against their date and scale (measured in FLOP). Models are both becoming larger and more proficient. It's unclear to which degree the better results are driven by improvements in scale or in efficiency.&lt;/p&gt;&lt;p&gt; In our new paper, we conduct the most comprehensive analysis of algorithmic [...]&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 12th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/algorithmic-progress-in-language-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/algorithmic-progress-in-language-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/performance-and-scale-over-time.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/performance-and-scale-over-time.png" alt="Figure 1. Performance of 231 language models (measured in log perplexity) used in our work against their date and scale (measured in FLOP). Models are both becoming larger and more proficient. It’s unclear to which degree the better results are driven by improvements in scale or in efficiency." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/estimates-of-algorithmic-progress-by-domain.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/estimates-of-algorithmic-progress-by-domain.png" alt="Figure 2. Estimates of the rate of algorithmic progress across different domains. This is measured in terms of the “effective compute” – i.e. the equivalent increase in scale that would be needed to match a given model performance absent innovation." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/relative-contributions-to-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/relative-contributions-to-compute.png" alt="Figure 3: Estimates of the contributions of scaling and algorithmic innovation in terms of the raw compute that would be naively needed to achieve a state-of-the-art level of performance. 
The contribution of algorithmic progress is roughly half as much as that of compute scaling." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/estimates-of-algorithmic-progress-by-model.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/algorithmic-progress-in-language-models/estimates-of-algorithmic-progress-by-model.png" alt="Figure 4. We estimate the rate of algorithmic progress according to dozens of models. We find a wide range of values compatible with the different models we tested." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 12 Mar 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">482ec998-108b-4111-b41b-176d30e5feb9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/482ec998-108b-4111-b41b-176d30e5feb9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Tamay%2520Besiroglu%252C%2520Ege%2520Erdil%252C%2520David%2520Owen%252C%2520Robi%2520Rahman%252C%2520Zifan%2520Carl%2520Guo%252C%2520David%2520Atkinson%252C%2520Neil%2520Thompson%252C%2520Jaime%2520Sevilla&amp;title=%22Algorithmic%20progress%20in%20language%20models%22%20by%20Anson%20Ho%2C%20Tamay%20Besiroglu%2C%20Ege%20Erdil%2C%20David%20Owen%2C%20Robi%20Rahman%2C%20Zifan%20Carl%20Guo%2C%20David%20Atkinson%2C%20Neil%20Thompson%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Falgorithmic-progress-in-language-models&amp;created_at=2026-05-18T18%3A06%3A15.75663%2B00%3A00&amp;duration=398" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/algorithmic-progress-in-language-models</link>
      <itunes:duration>398</itunes:duration>
    </item>
    <item>
      <title>“Epoch AI 2023 impact report” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: In 2023, Epoch published almost 20 reports on developments in AI, added hundreds of new models to our database, had a direct impact on government policies, raised over $7 million in funds, and more.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; About Epoch AI&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Epoch AI is a multidisciplinary research institute investigating the trajectory of Artificial Intelligence (AI). We produce papers and reports on the drivers, trajectory and consequences of the development and deployment of AI.&lt;/p&gt;&lt;p&gt; In the past, we have produced cutting-edge AI forecasting work such as Compute Trends Across Three Eras of Machine Learning (Sevilla et al., 2022), Revisiting Algorithmic Progress (Besiroglu and Erdil, 2022) and Will We Run Out of ML Data? Evidence From Projecting Dataset Size Trends (Villalobos et al., 2022). We also maintain a database of notable ML models, widely regarded as the most comprehensive public database of its kind in existence.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Testimonials&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Epoch does the most thoughtful and best-researched survey work in the industry. Several times I have thought I found errors in their results, only to discover when going through their notebooks that they had it right. They are my go-to resource for field-wide trends.&lt;/p&gt;&lt;p&gt; Nat Friedman - Former CEO of GitHub&lt;/p&gt;&lt;p&gt; I feel like I [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:27) About Epoch AI&lt;/p&gt;&lt;p&gt;(01:17) Testimonials&lt;/p&gt;&lt;p&gt;(01:51) This year and future goals&lt;/p&gt;&lt;p&gt;(02:59) Summary of key findings&lt;/p&gt;&lt;p&gt;(03:02) ML trends&lt;/p&gt;&lt;p&gt;(04:54) Algorithmic improvements&lt;/p&gt;&lt;p&gt;(05:55) Economics of AI&lt;/p&gt;&lt;p&gt;(06:44) Other topics&lt;/p&gt;&lt;p&gt;(07:39) Overview of research projects&lt;/p&gt;&lt;p&gt;(08:04) Overview of non-research activities&lt;/p&gt;&lt;p&gt;(08:15) Fundraising&lt;/p&gt;&lt;p&gt;(08:53) Hiring&lt;/p&gt;&lt;p&gt;(09:10) Website&lt;/p&gt;&lt;p&gt;(09:27) Mentorship Programs&lt;/p&gt;&lt;p&gt;(09:47) Overview of impact&lt;/p&gt;&lt;p&gt;(09:50) Media and scientific communication&lt;/p&gt;&lt;p&gt;(10:17) Conferences and workshops&lt;/p&gt;&lt;p&gt;(10:52) Mentions in industry, research and scientific publications&lt;/p&gt;&lt;p&gt;(11:36) Engagement with policymakers&lt;/p&gt;&lt;p&gt;(12:17) Next year's goals and funding needs&lt;/p&gt;&lt;p&gt;(12:21) Research goals&lt;/p&gt;&lt;p&gt;(12:24) Model of AI and growth&lt;/p&gt;&lt;p&gt;(13:11) Drivers in AI&lt;/p&gt;&lt;p&gt;(13:30) Data on machine learning&lt;/p&gt;&lt;p&gt;(13:53) Organizational goals&lt;/p&gt;&lt;p&gt;(13:57) Stakeholder engagement&lt;/p&gt;&lt;p&gt;(14:19) Hiring&lt;/p&gt;&lt;p&gt;(14:43) Project management&lt;/p&gt;&lt;p&gt;(15:11) Diversifying our funding&lt;/p&gt;&lt;p&gt;(15:28) Website and brand&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 19th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/epoch-impact-report-2023?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/epoch-impact-report-2023&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/peak-computational-performance-different-precisions.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/peak-computational-performance-different-precisions.png" alt="Peak computational performance of ML hardware for different precisions" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/key-innovations-occurrence.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/key-innovations-occurrence.png" alt="Adoption frequency of key innovations in the ten largest language models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2024/epoch-impact-report-2023/pte_survey.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2024/epoch-impact-report-2023/pte_survey.svg" alt="Two scatter plots comparing compute cost versus content equivalent gain and additional runtime cost for various AI methods." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 19 Jan 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">42e855ef-5765-415c-bbf0-5c3622d66cd3</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/42e855ef-5765-415c-bbf0-5c3622d66cd3.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Epoch%20AI%202023%20impact%20report%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fepoch-impact-report-2023&amp;created_at=2026-05-18T18%3A06%3A16.93594%2B00%3A00&amp;duration=963" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/epoch-impact-report-2023</link>
      <itunes:duration>963</itunes:duration>
    </item>
    <item>
      <title>“Biological sequence models in the context of the AI directives” by Nicole Maug, Aidan O’Gara, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: The expanded Epoch database now includes biological sequence models, revealing potential regulatory gaps in the White House's Executive Order on AI and the growth of the compute used in their training.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The White House's recent Executive Order on AI addresses risks from AI applied to biology. Developers training machine learning models on primarily biological sequence data using more than 1e23 operations are required by the Executive Order to report to the federal government about their cybersecurity, red-teaming, and other risk management efforts made during the development of these models.&lt;/p&gt;&lt;p&gt; This report provides an overview of our newly curated dataset focused on biological sequence models. Our dataset contains comprehensive information on nearly a hundred biological sequence models, including the specific datasets used for their training, their intended tasks, and the availability of the model weights. Additionally, our analysis covers 30 biological sequence datasets, collectively containing billions of protein sequences. Our focus in this analysis includes:&lt;/p&gt;&lt;p&gt; The training compute trends of biological sequence models&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; xTrimoPGLM-100B (Chen et al., 2023), a 100B-parameter protein language model, exceeds the Executive Order's reporting threshold of 1e23 operations by a factor of six. Over a dozen models are within a factor [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Executive summary&lt;/p&gt;&lt;p&gt;(04:19) The AI Executive Order&lt;/p&gt;&lt;p&gt;(07:22) Models trained on biological sequence data&lt;/p&gt;&lt;p&gt;(12:56) Training compute&lt;/p&gt;&lt;p&gt;(15:39) Large language models trained on biological data&lt;/p&gt;&lt;p&gt;(17:57) Biological sequence data&lt;/p&gt;&lt;p&gt;(23:16) Trends in biological sequence data&lt;/p&gt;&lt;p&gt;(26:05) Sources of biological sequence data&lt;/p&gt;&lt;p&gt;(28:41) Discussion&lt;/p&gt;&lt;p&gt;(31:23) Acknowledgments&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 17th, 2024 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/biological-sequence-models-in-the-context-of-the-ai-directives?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/biological-sequence-models-in-the-context-of-the-ai-directives&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/bio-models-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/bio-models-compute.png" alt="Compute used to train biological sequence models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/bio-datasets.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/bio-datasets.png" alt="Number of entries in key biological sequence databases" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 17 Jan 2024 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1d417053-b8f3-4ad3-948c-9146beaf20bd</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1d417053-b8f3-4ad3-948c-9146beaf20bd.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Nicole%2520Maug%252C%2520Aidan%2520O'Gara%252C%2520Tamay%2520Besiroglu&amp;title=%22Biological%20sequence%20models%20in%20the%20context%20of%20the%20AI%20directives%22%20by%20Nicole%20Maug%2C%20Aidan%20O'Gara%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fbiological-sequence-models-in-the-context-of-the-ai-directives&amp;created_at=2026-05-18T18%3A06%3A17.859725%2B00%3A00&amp;duration=1939" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/biological-sequence-models-in-the-context-of-the-ai-directives</link>
      <itunes:duration>1939</itunes:duration>
    </item>
    <item>
      <title>“Limits to the energy efficiency of CMOS microprocessors” by Anson Ho, Ege Erdil, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: How far can the energy efficiency of CMOS microprocessors be pushed before we hit physical limits? Using a simple model, we find that there is room for a further 50 to 1000x improvement in energy efficiency.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Compared to the ENIAC in 1946, modern microprocessors can perform roughly a quadrillion times more computations for every unit of dissipated energy. This illustrates one of the most important trends in the history of computing – Koomey's Law – under which the energy efficiency of computers has increased drastically over the last eight decades.&lt;/p&gt;&lt;p&gt; But how much longer can this progress persist before hitting physical limits? The answer to this question has important implications for forecasts of future progress in hardware: if we are close to the fundamental limits, the returns to hardware R&amp;amp;D within the existing paradigm of Complementary Metal-Oxide Semiconductor (CMOS) processors may rapidly diminish in the near future.&lt;/p&gt;&lt;p&gt; In our new paper, published in the IEEE International Conference on Rebooting Computing, we propose a simple model of energy efficiency to shed light on this question. This allows us to estimate an upper bound to the energy efficiency of CMOS microprocessors, measured in Floating Point Operations per Joule [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:30) Summary&lt;/p&gt;&lt;p&gt;(04:47) Implications&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 15th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/limits-to-the-energy-efficiency-of-cmos-microprocessors?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/limits-to-the-energy-efficiency-of-cmos-microprocessors&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 15 Dec 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">78cdf95c-f56a-466c-a968-03eb6ae1d0e0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/78cdf95c-f56a-466c-a968-03eb6ae1d0e0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho%252C%2520Ege%2520Erdil%252C%2520Tamay%2520Besiroglu&amp;title=%22Limits%20to%20the%20energy%20efficiency%20of%20CMOS%20microprocessors%22%20by%20Anson%20Ho%2C%20Ege%20Erdil%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Flimits-to-the-energy-efficiency-of-cmos-microprocessors&amp;created_at=2026-05-18T18%3A06%3A18.959064%2B00%3A00&amp;duration=424" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/limits-to-the-energy-efficiency-of-cmos-microprocessors</link>
      <itunes:duration>424</itunes:duration>
    </item>
    <item>
      <title>“AI capabilities can be significantly improved without expensive retraining” by Tom Davidson, Jean-Stanislas Denain, Pablo Villalobos, Guillem Bas</title>
      <description>&lt;p&gt; Subtitle: While scaling compute for training is key to improving LLM performance, some post-training enhancements can offer gains equivalent to training with 5 to 20x more compute at less than 1% the cost.&lt;/p&gt; 
&lt;p&gt; The massive computation used to train LLMs and similar foundation models has been one of the main drivers of AI progress in recent years, which has led to the recognition of the “Bitter Lesson”: that general methods that better leverage computational power are ultimately the most effective (Sutton, 2019). The cost of training frontier models has now become so high that only a handful of actors can afford it (Epoch AI, 2023).&lt;/p&gt;
&lt;p&gt; Our study explores methods of improving performance after training that don’t rely on access to vast computing resources. We divide these enhancements into five categories, presented in Table 1.&lt;/p&gt;
&lt;p&gt; The full paper is linked from the source article. This article was a collaboration between Epoch AI, Open Philanthropy, UC Berkeley, and ORCG.&lt;/p&gt;

&lt;p&gt; Table 1 (category, description, example):&lt;/p&gt;&lt;ul&gt;&lt;li&gt; Tool use: Teaching an AI system to use new tools. Example: WebGPT, Toolformer.&lt;/li&gt;&lt;li&gt; Prompting: Changing the text-based input to the model to steer its behavior and reasoning. Example: Chain of thought.&lt;/li&gt;&lt;li&gt; Scaffolding: Programs that structure the model's reasoning and [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:30) Key results&lt;/p&gt;&lt;p&gt;(03:24) Policy implications&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 12th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/ai-capabilities-can-be-significantly-improved-without-expensive-retraining?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/ai-capabilities-can-be-significantly-improved-without-expensive-retraining&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/ai-capabilities-can-be-significantly-improved-without-expensive-retraining/example_graph.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/ai-capabilities-can-be-significantly-improved-without-expensive-retraining/example_graph.svg" alt="Toy example where the CEG is 5x. The same performance improvement can be achieved either by applying a post-training enhancement (PTE) or by scaling pre-training compute by 5x." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/ai-capabilities-can-be-significantly-improved-without-expensive-retraining/pte_survey.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/ai-capabilities-can-be-significantly-improved-without-expensive-retraining/pte_survey.svg" alt="Summary of results: the improvement produced by the surveyed techniques, quantified using the Compute-Equivalent Gain. The x axes show the associated one-time (left) and inference (right) costs." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 12 Dec 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">934ea8d1-ac6a-4752-882f-a5918e6395a8</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/934ea8d1-ac6a-4752-882f-a5918e6395a8.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tom%2520Davidson%252C%2520Jean-Stanislas%2520Denain%252C%2520Pablo%2520Villalobos%252C%2520Guillem%2520Bas&amp;title=%22AI%20capabilities%20can%20be%20significantly%20improved%20without%20expensive%20retraining%22%20by%20Tom%20Davidson%2C%20Jean-Stanislas%20Denain%2C%20Pablo%20Villalobos%2C%20Guillem%20Bas&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fai-capabilities-can-be-significantly-improved-without-expensive-retraining&amp;created_at=2026-05-18T18%3A06%3A19.862489%2B00%3A00&amp;duration=266" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/ai-capabilities-can-be-significantly-improved-without-expensive-retraining</link>
      <itunes:duration>266</itunes:duration>
    </item>
    <item>
      <title>“Who is leading in AI? An analysis of industry AI research” by Ben Cottier, Tamay Besiroglu, David Owen</title>
      <description>&lt;p&gt; Subtitle: Industry emerged as a driving force in AI, but which companies are steering the field? We compare leading AI companies on research impact, training runs, and contributions to algorithmic innovations.&lt;/p&gt; 
&lt;p&gt; The private sector's pivotal role in AI research and development is marked by its substantial resource investments and significant influence, evidenced by higher citation rates for industry-involved research articles and dominance in compute-intensive training runs. Recent analyses, such as the 2023 Stanford AI Index, highlight the varying research impacts of institutions across countries, with US tech giants leading in terms of research impact. Our study is informed by three datasets—OpenAlex for AI publications, the PCD database for AI training compute data, and a new dataset on key algorithmic innovations in large language models. We offer a comprehensive comparison of leading companies in publications, citations, unique authors, training runs, and algorithmic innovation adoption. This multi-faceted approach provides a nuanced understanding of industry's influence on AI development, contributing to policy discussions on key industry players.&lt;/p&gt;
&lt;p&gt; The full paper is linked from the source article.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Key results&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Bibliometric output and impact. Our analysis shows that Google and Microsoft lead in total publications and citations over the past 13 years, reflecting [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:31) Key results&lt;/p&gt;&lt;p&gt;(04:47) Policy implications&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 27th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/who-is-leading-in-ai-an-analysis-of-industry-ai-research?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/who-is-leading-in-ai-an-analysis-of-industry-ai-research&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/publication-count.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/publication-count.png" alt="Publication count, 2010 to 2023" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/citation-count.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/citation-count.png" alt="Citation count in three-year window after publication, 2010 to 2023" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/citations-per-author.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/citations-per-author.png" alt="Citations per author, 2010 to 2023" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/training-runs.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/training-runs.png" alt="Largest publicly announced AI training runs by company" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/key-innovations-occurrence.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/key-innovations-occurrence.png" alt="Adoption frequency of key innovations in the ten largest language models" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 27 Nov 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">efd11b8a-dfd6-4876-b639-8063143df23f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/efd11b8a-dfd6-4876-b639-8063143df23f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier%252C%2520Tamay%2520Besiroglu%252C%2520David%2520Owen&amp;title=%22Who%20is%20leading%20in%20AI%3F%20An%20analysis%20of%20industry%20AI%20research%22%20by%20Ben%20Cottier%2C%20Tamay%20Besiroglu%2C%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwho-is-leading-in-ai-an-analysis-of-industry-ai-research&amp;created_at=2026-05-18T18%3A13%3A10.823929%2B00%3A00&amp;duration=370" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/who-is-leading-in-ai-an-analysis-of-industry-ai-research</link>
      <itunes:duration>370</itunes:duration>
    </item>
    <item>
      <title>“Challenges in predicting AI automation” by David Owen, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: Economists have proposed several different approaches to predicting AI automation of economically valuable tasks. There is vast disagreement between different approaches and no clear winner.&lt;/p&gt;  &lt;p&gt; There's a chart here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The automation of tasks by AI systems has the potential to generate tremendous economic value (Erdil and Besiroglu, 2023). The prospect of capturing this value can incentivize greater investments into developing AI capabilities. Accurately predicting when tasks are likely to be automated by AI could therefore help forecast the trajectory of AI investment and AI development. Understanding the impact of automation on the economy and labor force is also important for policymakers; governments may need to implement policies to help workers transition and ensure the benefits of automation are broadly shared.&lt;/p&gt;&lt;p&gt; This review examines the literature on predicting AI automation, focusing on the economics literature on AI-driven automation of occupational tasks. We also review the nascent literature on empirical validation of these predictions, examining whether we should put more trust in some predictions than others. We hope this review will help researchers engage with this important problem. 
We also hope that clarifying the challenges faced by existing predictions will surface promising directions for future work.&lt;/p&gt;&lt;p&gt; In [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:28) Introduction&lt;/p&gt;&lt;p&gt;(04:47) How does automation happen?&lt;/p&gt;&lt;p&gt;(08:19) What do we want from automation predictions?&lt;/p&gt;&lt;p&gt;(11:15) An overview of automatability prediction methodologies&lt;/p&gt;&lt;p&gt;(13:04) Task feature analysis&lt;/p&gt;&lt;p&gt;(13:08) Background&lt;/p&gt;&lt;p&gt;(15:15) Task-focused analyses&lt;/p&gt;&lt;p&gt;(18:27) Response to generative AI and LLMs&lt;/p&gt;&lt;p&gt;(19:52) Task-patent mapping&lt;/p&gt;&lt;p&gt;(20:59) Automation forecasting surveys&lt;/p&gt;&lt;p&gt;(22:50) Overview of predictions&lt;/p&gt;&lt;p&gt;(27:20) Empirical evidence and comparison&lt;/p&gt;&lt;p&gt;(29:12) Studies of economic effects from AI automation&lt;/p&gt;&lt;p&gt;(31:49) Case studies on AI in specific applications&lt;/p&gt;&lt;p&gt;(36:02) Comparison of prediction methodologies&lt;/p&gt;&lt;p&gt;(39:48) Discussion&lt;/p&gt;&lt;p&gt;(43:39) Acknowledgments&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 24th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/challenges-in-predicting-ai-automation?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/challenges-in-predicting-ai-automation&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/autor-duckworth.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/autor-duckworth.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/challenges-in-predicting-ai-automation/categories.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/challenges-in-predicting-ai-automation/categories.png" alt="Figure 1: The academic literature on predicting task automatability falls into three categories: task feature analysis, task-patent mapping, and automation forecasting surveys. O*NET features, discussed under Task feature analysis, are from the O*NET database of occupational information, characterizing tasks in terms of required skills, abilities, and other details." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/challenges-in-predicting-ai-automation/average-automatability.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/challenges-in-predicting-ai-automation/average-automatability.png" alt="Figure 2: Average automatability measures for twelve broad occupation categories used in Acemoglu et al. (2020) and other sources. Higher scores indicate higher automatability. Occupations are ordered from highest to lowest median wage - broadly following traditional ratings of skill level. Measures have been standardized to employment-weighted z-scores at occupation level before aggregation to broad occupation categories. Data is taken from respective publications or the comparison in Acemoglu et al. (2020). 
In this figure the Acemoglu and Autor (2011) measure uses the sum of cognitive and physical net routineness scores, for simplicity." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/autor-duckworth.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/autor-duckworth.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 24 Nov 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">aaf4c73c-26c5-4117-a668-5dfaa054ad12</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/aaf4c73c-26c5-4117-a668-5dfaa054ad12.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Owen%252C%2520Tamay%2520Besiroglu&amp;title=%22Challenges%20in%20predicting%20AI%20automation%22%20by%20David%20Owen%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fchallenges-in-predicting-ai-automation&amp;created_at=2026-05-18T18%3A26%3A52.09238%2B00%3A00&amp;duration=2650" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/challenges-in-predicting-ai-automation</link>
      <itunes:duration>2650</itunes:duration>
    </item>
    <item>
      <title>“Trends in machine learning hardware” by Marius Hobbhahn, Lennart Heim, Gökçe Aydos</title>
      <description>&lt;p&gt; Subtitle: FLOP/s performance in 47 ML hardware accelerators doubled every 2.3 years. Switching from FP32 to tensor-FP16 led to a further 10x performance increase. Memory capacity and bandwidth doubled every 4 years.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; There's a chart here.&lt;/p&gt;&lt;p&gt; We study trends in GPU computational performance across different number representations, memory capacity and bandwidth, and interconnect bandwidth, using a dataset of 47 ML accelerators (GPUs and other AI chips) commonly used in ML experiments from 2010 to 2023, plus 1,948 additional GPUs from 2006 to 2021. Our main findings are:&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Lower-precision number formats like 16-bit floating point (FP16) and 8-bit integers (INT8), combined with specialized tensor core units, can provide order-of-magnitude performance improvements for machine learning workloads compared to the traditionally used 32-bit floating point (FP32). For example, we estimate, albeit from limited data, that tensor-FP16 can provide a roughly 10x speedup over FP32.&lt;/li&gt;
&lt;li&gt; Given that the overall performance of large hardware clusters for state-of-the-art ML model training and inference depends on factors beyond just computational performance, we investigate memory capacity, memory bandwidth and interconnects, and find that:
&lt;ol&gt; 
&lt;li&gt; Memory capacity is doubling every ~4 years and memory bandwidth every ~4.1 years. They have increased at [...]&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:31) Executive summary&lt;/p&gt;&lt;p&gt;(03:30) Introduction&lt;/p&gt;&lt;p&gt;(05:25) Terminology&lt;/p&gt;&lt;p&gt;(06:26) Dataset&lt;/p&gt;&lt;p&gt;(07:17) Trends of primary performance metrics&lt;/p&gt;&lt;p&gt;(07:39) Number representations&lt;/p&gt;&lt;p&gt;(09:10) Computational performance for FP32 and FP16&lt;/p&gt;&lt;p&gt;(10:20) Computational performance gains through hardware support for less precise number formats&lt;/p&gt;&lt;p&gt;(12:54) Memory capacity and bandwidth&lt;/p&gt;&lt;p&gt;(17:11) Interconnect bandwidth&lt;/p&gt;&lt;p&gt;(21:24) Computational price-performance&lt;/p&gt;&lt;p&gt;(26:08) Energy efficiency&lt;/p&gt;&lt;p&gt;(27:51) Conclusions&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 26 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 9th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trends-in-machine-learning-hardware?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trends-in-machine-learning-hardware&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/peak-computational-performance-different-precisions.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/peak-computational-performance-different-precisions.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/peak-computational-performance-fp32.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/peak-computational-performance-fp32.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/performance-ratios-fp32.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/performance-ratios-fp32.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/memory-capacity.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/memory-capacity.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/interconnect-bandwidth.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/interconnect-bandwidth.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/fp32-price-performance.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/fp32-price-performance.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a 
href="https://epoch.ai/assets/images/charts/peak-computational-price-performance-different-precisions.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/peak-computational-price-performance-different-precisions.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/fp32-energy-efficiency.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/fp32-energy-efficiency.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 09 Nov 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">07c7f0f4-38aa-4431-809a-08445610ad67</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/07c7f0f4-38aa-4431-809a-08445610ad67.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Marius%2520Hobbhahn%252C%2520Lennart%2520Heim%252C%2520G%25C3%25B6k%25C3%25A7e%2520Aydos&amp;title=%22Trends%20in%20machine%20learning%20hardware%22%20by%20Marius%20Hobbhahn%2C%20Lennart%20Heim%2C%20G%C3%B6k%C3%A7e%20Aydos&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrends-in-machine-learning-hardware&amp;created_at=2026-05-18T18%3A13%3A12.842116%2B00%3A00&amp;duration=1887" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trends-in-machine-learning-hardware</link>
      <itunes:duration>1887</itunes:duration>
    </item>
    <item>
      <title>“Announcing Epoch AI’s updated parameter, compute and data trends database” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: Our expanded database, which tracks the parameters, datasets, training compute, and other details of notable machine learning systems, now spans over 700 notable machine learning models.&lt;/p&gt; 
&lt;p&gt; Machine learning is advancing at breakneck speed, but what's driving its progress? It is widely recognized that the performance of machine learning models is closely related to the amount of training data, compute, and number of parameters in the model. At Epoch AI, we’re investigating the key inputs that enable today's AIs to reach new heights.&lt;/p&gt;
&lt;p&gt; Our recently expanded Parameter, Compute and Data Trends database traces these details for hundreds of landmark ML systems and research papers. Our database allows everyone to understand:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; How models have swelled from mere dozens of parameters in early networks to over half a trillion in systems like Minerva today.&lt;/li&gt;
&lt;li&gt; How training compute has increased by nearly 8 orders of magnitude from 2012's AlexNet to 2023's GPT-4.&lt;/li&gt;
&lt;li&gt; How datasets have grown roughly ten-million-fold, from 200 thousand words for early language models to the 1.9 trillion words used to train Flan-PaLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt; There's a chart here.&lt;/p&gt;
&lt;p&gt; Building on a model and parameter dataset we first introduced in 2021 [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          October 23rd, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-updated-pcd-database?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-updated-pcd-database&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/machine-learning-systems-by-domain-stacked.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/machine-learning-systems-by-domain-stacked.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 23 Oct 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">a18cee50-1606-4425-ba6c-326539ef93d5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/a18cee50-1606-4425-ba6c-326539ef93d5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Announcing%20Epoch%20AI%E2%80%99s%20updated%20parameter%2C%20compute%20and%20data%20trends%20database%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-updated-pcd-database&amp;created_at=2026-05-18T18%3A13%3A13.88604%2B00%3A00&amp;duration=163" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-updated-pcd-database</link>
      <itunes:duration>163</itunes:duration>
    </item>
    <item>
      <title>“Explosive growth from AI: A review of the arguments” by Ege Erdil, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: Our new article examines why we might (or might not) expect growth on the order of ten-fold the growth rates common in today's frontier economies once advanced AI systems are widely deployed.&lt;/p&gt; 
&lt;p&gt; The extent to which AI can automate economically valuable tasks is perhaps the most important measure of the capabilities of AI systems. As we have previously investigated, the rapid automation of such tasks has the potential to accelerate economic growth and technological development. We think the potential for explosive growth serves as a critical factor underlining AI's transformative impact on society. Yet questions remain about why extreme accelerations might or might not occur and, if they do, how long they could last.&lt;/p&gt;
&lt;p&gt; In our new article, available as a preprint on arXiv, we take stock of the key arguments for why we might or might not expect growth that is on the order of ten-fold the growth rates common in today's frontier economies once advanced AI systems are widely deployed. We spell out these arguments in detail and tentatively assess their force. We aim to elucidate why certain mechanisms—such as regulation, the R&amp;amp;D challenges with developing capable AI, or constraints on other inputs—could [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:22) Why we might see explosive growth from AI&lt;/p&gt;&lt;p&gt;(07:36) Why we might not see explosive growth from AI&lt;/p&gt;&lt;p&gt;(16:08) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 23rd, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/explosive-growth-from-ai-a-review-of-the-arguments?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/explosive-growth-from-ai-a-review-of-the-arguments&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/explosive-growth-from-ai-a-review-of-the-arguments/arguments-overview.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/explosive-growth-from-ai-a-review-of-the-arguments/arguments-overview.svg" alt="Diagram comparing arguments for and against AI economic impact, showing stronger and weaker points." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 23 Sep 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">afa953bf-c2b0-4329-9827-da8595dd9171</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/afa953bf-c2b0-4329-9827-da8595dd9171.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Tamay%2520Besiroglu&amp;title=%22Explosive%20growth%20from%20AI%3A%20A%20review%20of%20the%20arguments%22%20by%20Ege%20Erdil%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fexplosive-growth-from-ai-a-review-of-the-arguments&amp;created_at=2026-05-18T18%3A13%3A17.701777%2B00%3A00&amp;duration=1117" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/explosive-growth-from-ai-a-review-of-the-arguments</link>
      <itunes:duration>1117</itunes:duration>
    </item>
    <item>
      <title>“Trading off compute in training and inference” by Pablo Villalobos, David Atkinson</title>
      <description>&lt;p&gt; Subtitle: We explore several techniques that induce a tradeoff between spending more resources on training or on inference and characterize the properties of this tradeoff. We outline some implications for AI governance.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Key takeaways&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In current machine learning systems, the performance of a system is closely related to how much compute is spent during the training process. However, it is also possible to augment the capabilities of a trained model at the cost of increasing compute usage during inference or reduce compute usage during inference at the cost of lower performance. For example, models can be pruned to reduce their inference cost, or instructed to reason via chains of thought, which increases their inference cost.&lt;/p&gt;&lt;p&gt; Based on evidence from five concrete techniques (model scaling, Monte Carlo Tree Search, pruning, resampling, and chain of thought), we expect that, relative to most current models (eg: GPT-4) it is possible to:&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Increase the amount of compute per inference by 1-2 orders of magnitude (OOM), in exchange for saving ~1 OOM in training compute while maintaining performance. We expect this to be the case in most language tasks that don’t require specific factual knowledge or very concrete skills (eg: knowing how [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:26) Key takeaways&lt;/p&gt;&lt;p&gt;(03:46) Overview&lt;/p&gt;&lt;p&gt;(03:49) Introduction&lt;/p&gt;&lt;p&gt;(05:51) The tradeoff&lt;/p&gt;&lt;p&gt;(05:54) Individual techniques&lt;/p&gt;&lt;p&gt;(08:49) Combining techniques&lt;/p&gt;&lt;p&gt;(10:01) Implications&lt;/p&gt;&lt;p&gt;(11:51) Conclusions&lt;/p&gt;&lt;p&gt;(13:25) Full report&lt;/p&gt;&lt;p&gt;(13:27) Background&lt;/p&gt;&lt;p&gt;(14:26) Contributions&lt;/p&gt;&lt;p&gt;(15:56) Techniques&lt;/p&gt;&lt;p&gt;(15:59) Varying the scaling policy&lt;/p&gt;&lt;p&gt;(18:09) Monte Carlo Tree Search&lt;/p&gt;&lt;p&gt;(21:47) Pruning&lt;/p&gt;&lt;p&gt;(24:22) Repeated sampling and filtering&lt;/p&gt;&lt;p&gt;(26:05) Unlimited Trials&lt;/p&gt;&lt;p&gt;(28:36) Limited trials&lt;/p&gt;&lt;p&gt;(29:24) Chain of thought and model cascades&lt;/p&gt;&lt;p&gt;(30:53) Combining tradeoffs&lt;/p&gt;&lt;p&gt;(32:01) Modeling the tradeoff&lt;/p&gt;&lt;p&gt;(35:21) Efficiency and optimal scaling&lt;/p&gt;&lt;p&gt;(35:54) Conclusion&lt;/p&gt;&lt;p&gt;(37:15) Acknowledgements&lt;/p&gt;&lt;p&gt;(37:32) Bibliography&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 19 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 28th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trading-off-compute-in-training-and-inference?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trading-off-compute-in-training-and-inference&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/summary.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/summary.png" alt="Summary Figure: Tradeoff diagrams of the four techniques we studied in greatest depth. The solid curves indicate constant performance. The shaded region is the span of efficient exchange: the region in which it is possible to trade off the two types of compute at a marginal exchange rate better than 6 to 1. In some cases the size of the span increases with scale, in others it decreases with scale." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/training-inference-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/training-inference-compute.png" alt="Figure A: Compute required for training and running a single inference of multiple language models published since 2012. The single-inference compute is usually close to the square root of the training compute. Data from Epoch AI (2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-a.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-a.png" alt="Figure B: Illustrative tradeoff examples. Left: tradeoff from overtraining in language modeling. 
The low-inference model (black) saves one order-of-magnitude (OOM) in inference by spending 2 additional OOMs in training, relative to a Chinchilla-optimal model. Right: tradeoff from resampling in n@k code generation. The high-inference model (red circle) saves 1 OOM in training compute by spending an additional 1.5 OOM in inference compute, relative to a non-augmented model (red x). Since augmentation can be done post-training, this means that a small model (red circle) can simulate the capability of a 1 OOM larger model, after being augmented with 3 OOM of additional inference compute." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-shape.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-shape.png" alt="Figure 2: Shape of the tradeoff induced by the scaling laws from Hoffmann et al. (2022). The red dashed line corresponds to Chinchilla scaling, the blue dashed line corresponds to compute-optimal scaling, taking into account the cost of performing 1e14 inferences." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/hex-scaling.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/hex-scaling.png" alt="Figure 3: Scaling of AlphaZero agents in Hex is an S-curve in terms of both test FLOP and train FLOP (notice the concentration of points at both high and low ends of train FLOP)." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/hex-tradeoff.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/hex-tradeoff.png" alt="Figure 4: Model contour lines and empirical data for MCTS tradeoff in Hex. Elo is normalized so that perfect play corresponds to 0 points, and lower Elo corresponds to lower performance." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/mcts-span.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/mcts-span.png" alt="Figure 5: The span of the MCTS tradeoff changes with scale" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pruning-tradeoff.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pruning-tradeoff.png" alt="Figure 6: Tradeoff for pruning, using data from Rosenfeld et al. (2020). Left: scaling network depth (number of layers). Right: scaling network width (size of layers). We include the cost of pruning as part of the training compute." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/10k-and-passk.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/10k-and-passk.png" alt="Figure 7: Comparison between 10@k and pass@k in AlphaCode, for different model sizes and values of k (sample budget). Source: Li et al. (2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/scaling-code.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/scaling-code.png" alt="Figure 8: Scaling model for code generation using pass@k. The dots represent real data, while the solid lines are the model fit. Data from Li et al. (2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/code-generation-tradeoff.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/code-generation-tradeoff.png" alt="Figure 9: Tradeoff for code generation using pass@k. Data from Li et al. (2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/minerva-passk-and-nk.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/minerva-passk-and-nk.png" alt="Figure 10: pass@k and n@k performance for Minerva. Data from Lewkowycz et al. (2022)." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/code-generation-tradeoff-10k.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/code-generation-tradeoff-10k.png" alt="Figure 11: Tradeoff for code generation using 10@k. Data from Li et al. (2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tree-of-thoughts-scaling.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tree-of-thoughts-scaling.png" alt="Figure 12: Tree-of-Thoughts scaling with the number of search nodes, compared to Chain-of-Thought and standard prompting (in those cases, the number of nodes is just the number of generated samples). Data from Yao et al. (2023)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-combinations-hex.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/tradeoff-combinations-hex.png" alt="Figure 13: Combination of tradeoffs for Hex. The position and size of the crosses indicate the mean and standard deviation taken across different performance levels. Data from Jones (2021)." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pareto-loss.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pareto-loss.png" alt="Figure 14: Pareto frontiers vs contour lines of the loss" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pareto-tradeoff-a.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trading-off-compute-in-training-and-inference/pareto-tradeoff-a.png" alt="Figure 15: Pareto frontiers for tradeoff types a, b and c (left to right)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 28 Jul 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">865665a4-9152-4e2d-9f6e-765e928b8962</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/865665a4-9152-4e2d-9f6e-765e928b8962.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520David%2520Atkinson&amp;title=%22Trading%20off%20compute%20in%20training%20and%20inference%22%20by%20Pablo%20Villalobos%2C%20David%20Atkinson&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrading-off-compute-in-training-and-inference&amp;created_at=2026-05-18T18%3A13%3A18.356282%2B00%3A00&amp;duration=2274" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trading-off-compute-in-training-and-inference</link>
      <itunes:duration>2274</itunes:duration>
    </item>
    <item>
      <title>“The limited benefit of recycling foundation models” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: While reusing pretrained models often saves training costs on large training runs, it is unlikely that model recycling will result in more than a modest increase in AI capabilities.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Reusing pre-trained models often saves training costs. In the current large-scale paradigm, foundation models are routinely re-used by fine-tuning a general foundation model on specific tasks (Bommasani et al. 2022). But it is also possible to save costs by re-using an older foundation model to train a newer foundation model. Let's call this practice “foundation model recycling”, which can be distinguished from the more general phenomenon of model re-use (Jiang et al. 2023). This short report investigates the benefits and implications of recycling foundation models.&lt;/p&gt;&lt;p&gt; There appear to be two general ways of recycling foundation models. The first method is by employing a student-teacher setup with the older model as a teacher for at least some time during training. In deep reinforcement learning, this method is generally known as kickstarting, and helps cut down on training costs by providing a dense reward signal during the early stages of training (Schmitt et al. 2018). The second method is to augment the older model with a modified architecture or [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Introduction&lt;/p&gt;&lt;p&gt;(04:25) Modeling model recycling&lt;/p&gt;&lt;p&gt;(06:32) Kickstarting&lt;/p&gt;&lt;p&gt;(10:48) Model augmentation&lt;/p&gt;&lt;p&gt;(13:56) Discussion&lt;/p&gt;&lt;p&gt;(16:06) Acknowledgements&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 7th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/the-limited-benefit-of-recycling-foundation-models?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/the-limited-benefit-of-recycling-foundation-models&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_kickstarting_no_restriction.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_kickstarting_no_restriction.svg" alt="Figure 1: The benefit of kickstarting, as measured by the ratio of adjusted compute to real compute after effectively infinite iterations under various possible settings of the rate of growth of compute budgets r and the re-use efficiency alpha." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_kickstarting_as_iterations_increases.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_kickstarting_as_iterations_increases.svg" alt="Figure 2: The benefit of kickstarting after repeated variations under various possible settings of the rate of growth of compute budgets r and the re-use efficiency alpha." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/contribution_from_kickstarting_as_iterations_increase.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/contribution_from_kickstarting_as_iterations_increase.svg" alt="Figure 3: The contribution to adjusted compute from kickstarting for values of alpha equals 0.9 and a rate of increase in effective compute of 380 percent, where each year is one iteration." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-2.0-2.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-2.0-2.svg" alt="a)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-2.0-5.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-2.0-5.svg" alt="b)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-4.0-2.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-4.0-2.svg" alt="c)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-4.0-5.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/the-limited-benefit-of-recycling-foundation-models/benefit_of_augmentation-4.0-5.svg" alt="d)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 07 Jul 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">2497284b-3338-4ba2-ade7-472378bd2ede</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/2497284b-3338-4ba2-ade7-472378bd2ede.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22The%20limited%20benefit%20of%20recycling%20foundation%20models%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fthe-limited-benefit-of-recycling-foundation-models&amp;created_at=2026-05-18T18%3A13%3A19.639081%2B00%3A00&amp;duration=990" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/the-limited-benefit-of-recycling-foundation-models</link>
      <itunes:duration>990</itunes:duration>
    </item>
    <item>
      <title>“How predictable is language model benchmark performance?” by David Owen</title>
      <description>&lt;p&gt; Subtitle: We investigate large language model performance across five orders of magnitude of compute scaling, finding that compute-focused extrapolations are a promising way to forecast AI capabilities.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; We investigate large language model performance across five orders of magnitude of compute scaling in 11 recent model architectures at 36 different model sizes.&lt;/li&gt;
&lt;li&gt; We present data on performance in BIG-Bench and MMLU, covering a range of model sizes and architectures.&lt;/li&gt;
&lt;li&gt; We examine trends in performance, showing a fairly smooth relationship between overall performance and scale, consistent with an S-curve.&lt;/li&gt;
&lt;li&gt; We outline an approach for predicting benchmark performance based on compute scaling.&lt;/li&gt;
&lt;li&gt; We back-test predictability of aggregate benchmark performance using this approach, showing that performance is moderately predictable from compute scaling.&lt;/li&gt;
&lt;li&gt; We show that individual benchmark tasks are less predictable, but remain more predictable than chance or a simple per-task average baseline.&lt;/li&gt;
&lt;li&gt; We conclude that compute-based extrapolations are a promising way to forecast AI capabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt; Background&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Scaling laws allow prediction of a model's loss from model and dataset sizes. However, scaling does not directly predict a model's performance on downstream tasks - as assessed through benchmarks. To bridge this gap [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:24) Executive summary&lt;/p&gt;&lt;p&gt;(01:27) Background&lt;/p&gt;&lt;p&gt;(02:59) Results&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 9th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/how-predictable-is-language-model-benchmark-performance?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/how-predictable-is-language-model-benchmark-performance&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/loss-landscape.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/loss-landscape.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/bbh.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/bbh.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/absolute-err-overall-bbh.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/absolute-err-overall-bbh.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/task-example-human-organs-senses.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/task-example-human-organs-senses.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/mmlu.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/mmlu.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/per-task-error-vs-pt-ahead-bb.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/per-task-error-vs-pt-ahead-bb.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/task-example-english-proverbs.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/task-example-english-proverbs.png" alt="" style="max-width: 100%;" 
/&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/task-example-movie-recommendations.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/task-example-movie-recommendations.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 09 Jun 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">633a372f-ad09-4e87-827d-fef5255cf5db</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/633a372f-ad09-4e87-827d-fef5255cf5db.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Owen&amp;title=%22How%20predictable%20is%20language%20model%20benchmark%20performance%3F%22%20by%20David%20Owen&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fhow-predictable-is-language-model-benchmark-performance&amp;created_at=2026-05-18T18%3A13%3A20.646861%2B00%3A00&amp;duration=346" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/how-predictable-is-language-model-benchmark-performance</link>
      <itunes:duration>346</itunes:duration>
    </item>
    <item>
      <title>“Epoch AI and FRI mentorship program summer 2023” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We are launching the Epoch and FRI mentorship program for women, non-binary people, and transgender people of all genders to provide guidance to individuals who want to contribute to AI forecasting.&lt;/p&gt; 
&lt;p&gt; We are thrilled to announce the Epoch AI and Forecasting Research Institute (FRI) mentorship program for women, non-binary people, and trans people of all genders. This program aims to provide guidance to individuals who want to contribute to the field of AI forecasting.&lt;/p&gt;
&lt;p&gt; The program mentors are Tegan McCaslin (FRI), Molly G Hickman (FRI), Avital Morris (FRI), David Owen (Epoch AI), Ben Cottier (Epoch AI) and Pablo Villalobos (Epoch AI).&lt;/p&gt;
&lt;p&gt; Ajeya Cotra (Open Philanthropy) is a research advisor to the project. The program is coordinated by Jaime Sevilla (Epoch AI), with the support of Maria de la Lama (Epoch AI). Kathryn Mecrow-Flynn (Magnify Mentoring) is a program advisor.&lt;/p&gt;
&lt;p&gt; Throughout the program, the mentees will work in pairs under the guidance of a mentor to produce original research on Artificial Intelligence Forecasting. Each mentor will offer to guide a selection of projects within their expertise. Examples of projects on offer might include studying trends on the context size of Large Language Models, estimating the [...]&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 8th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/epoch-and-fri-mentorship-program-summer-2023?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/epoch-and-fri-mentorship-program-summer-2023&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/epoch-and-fri-mentorship-program-summer-2023/epoch-full-standard.svg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/epoch-and-fri-mentorship-program-summer-2023/epoch-full-standard.svg" alt="Epoch AI logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/epoch-and-fri-mentorship-program-summer-2023/fri-logo.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/epoch-and-fri-mentorship-program-summer-2023/fri-logo.png" alt="Forecasting Research Institute logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 08 Jun 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">f3e0c503-7f73-4183-b036-023e83c134d9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/f3e0c503-7f73-4183-b036-023e83c134d9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Epoch%20AI%20and%20FRI%20mentorship%20program%20summer%202023%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fepoch-and-fri-mentorship-program-summer-2023&amp;created_at=2026-05-18T18%3A13%3A21.547869%2B00%3A00&amp;duration=188" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/epoch-and-fri-mentorship-program-summer-2023</link>
      <itunes:duration>188</itunes:duration>
    </item>
    <item>
      <title>“Direct Approach interactive model” by David Atkinson, Matthew Barnett, Edu Roldán, Ben Cottier, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: We combine the Direct Approach framework with simple models of progress in algorithms, investment, and compute costs to produce a user-adjustable forecast of when TAI will be achieved.&lt;/p&gt; 
&lt;p&gt; Summary: This post presents an interactive model for forecasting transformative AI, by which we mean AI that, if deployed widely, would precipitate a change comparable to the industrial revolution. In addition to showcasing the results of the Direct Approach, we present a simple extrapolative model of key inputs (algorithmic progress, investment, hardware efficiency) that produces a user-adjustable forecast over the date transformative AI will be deployed. This model contains four parts:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; Compute requirements estimated using the Direct Approach framework&lt;/li&gt;
&lt;li&gt; Projected algorithmic progress&lt;/li&gt;
&lt;li&gt; Projected investment in training transformative AI models&lt;/li&gt;
&lt;li&gt; Projected compute availability and cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; These components are combined to estimate the probability that the compute needs for transformative AI will be met in a given future year. Under default parameter values calibrated on historical estimates, the simple extrapolative model assigns a high chance of the development of transformative AI by 2050.&lt;/p&gt;
&lt;p&gt; We take this to mean that current trends of algorithmic progress and compute scaling, if continued, will likely lead to [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:33) Compute requirements&lt;/p&gt;&lt;p&gt;(07:22) Algorithmic Progress&lt;/p&gt;&lt;p&gt;(09:08) Investment&lt;/p&gt;&lt;p&gt;(11:23) Compute&lt;/p&gt;&lt;p&gt;(12:40) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 7 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 31st, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/direct-approach-interactive-model?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/direct-approach-interactive-model&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/tai-timeline-density.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/tai-timeline-density.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/tai-timeline.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/tai-timeline.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/effective-flops.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/effective-flops.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/direct-approach-interactive-model/overview.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/direct-approach-interactive-model/overview.png" alt="Overview of the model’s components, and how they relate to each other." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/tai-requirements.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/tai-requirements.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/adjusted-tai-requirements.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/adjusted-tai-requirements.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/cumulative-adjusted-tai-requirements.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/cumulative-adjusted-tai-requirements.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/adjusted-scaled-tai-requirements.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/adjusted-scaled-tai-requirements.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/cumulative-adjusted-scaled-tai-requirements.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/cumulative-adjusted-scaled-tai-requirements.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/algorithmic-progress.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/algorithmic-progress.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/spending.png" target="_blank"&gt;&lt;img 
src="https://epoch.ai/assets/images/charts/spending.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/flops-per-dollar.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/flops-per-dollar.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/physical-flops.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/physical-flops.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/tai-timeline-density.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/tai-timeline-density.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/tai-timeline.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/tai-timeline.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/effective-flops.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/effective-flops.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 31 May 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">86982568-3b25-4f44-b5e7-bb96a6c45c55</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/86982568-3b25-4f44-b5e7-bb96a6c45c55.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=David%2520Atkinson%252C%2520Matthew%2520Barnett%252C%2520Edu%2520Rold%25C3%25A1n%252C%2520Ben%2520Cottier%252C%2520Tamay%2520Besiroglu&amp;title=%22Direct%20Approach%20interactive%20model%22%20by%20David%20Atkinson%2C%20Matthew%20Barnett%2C%20Edu%20Rold%C3%A1n%2C%20Ben%20Cottier%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fdirect-approach-interactive-model&amp;created_at=2026-05-18T18%3A40%3A08.02422%2B00%3A00&amp;duration=882" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/direct-approach-interactive-model</link>
      <itunes:duration>882</itunes:duration>
    </item>
    <item>
      <title>“A compute-based framework for thinking about the future of AI” by Matthew Barnett</title>
      <description>&lt;p&gt; Subtitle: AI's potential to automate labor is likely to alter the course of human history within decades, with the availability of compute being the most important factor driving rapid progress in AI capabilities.&lt;/p&gt; 
&lt;p&gt; How should we expect AI to unfold over the coming decades? In this article, I explain and defend a compute-based framework for thinking about AI automation. This framework makes the following claims, which I defend throughout the article:&lt;/p&gt;
&lt;ol&gt; 
&lt;li&gt; The most salient impact of AI will be its ability to automate labor, which is likely to trigger a productivity explosion later this century, greatly altering the course of history.&lt;/li&gt;
&lt;li&gt; The availability of useful compute is the most important factor that determines progress in AI, a trend which will likely continue into the foreseeable future.&lt;/li&gt;
&lt;li&gt; AI performance is likely to become relatively predictable on most important, general measures of performance, at least when predicting over short time horizons.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt; While none of these ideas are new, my goal is to provide a single article that articulates and defends the framework as a cohesive whole. In doing so, I present the perspective that Epoch AI researchers find most illuminating about the future of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:37) Summary&lt;/p&gt;&lt;p&gt;(02:37) Part 1: Widespread automation from AI&lt;/p&gt;&lt;p&gt;(07:24) Simple models of explosive growth&lt;/p&gt;&lt;p&gt;(11:44) Part 2: A compute-centered theory of AI automation&lt;/p&gt;&lt;p&gt;(18:11) What about algorithmic progress?&lt;/p&gt;&lt;p&gt;(21:24) Part 3: Predictability of AI performance&lt;/p&gt;&lt;p&gt;(23:08) Why predicting AI performance may be tractable&lt;/p&gt;&lt;p&gt;(25:17) Predicting performance via a theoretical model&lt;/p&gt;&lt;p&gt;(30:42) Part 4: Modeling AI timelines&lt;/p&gt;&lt;p&gt;(33:14) Against very short timelines&lt;/p&gt;&lt;p&gt;(36:50) My personal AI Timelines&lt;/p&gt;&lt;p&gt;(39:02) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          May 31st, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/a-compute-based-framework-for-thinking-about-the-future-of-ai?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/a-compute-based-framework-for-thinking-about-the-future-of-ai&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/gdp-owid.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/gdp-owid.png" alt="Line graph titled 'World GDP over the last two millennia' showing exponential economic growth." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/gdp-ai-impacts.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/gdp-ai-impacts.png" alt="Plot from AI Impacts." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/performance-owen-2023.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/performance-owen-2023.png" alt="Aggregate benchmark performance is fairly predictable from scale. Graph from Owen 2023." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/kl-divergence.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/kl-divergence.png" alt="Diagram showing statistical distributions and formulas for distinguishing model-generated versus human-generated text outputs." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/distinguishability-vs-compute.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/distinguishability-vs-compute.png" alt="Graph showing number of tokens versus compute for different K-performance slowdown rates." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/interactive-model.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/interactive-model.png" alt="An overview of the interactive model the Epoch AI team has developed." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/personal-timelines.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/personal-timelines.png" alt="Probability density graph showing three overlapping distribution curves from 2030 to 2100." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/personal-timelines-cumulative.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/a-compute-based-framework-for-thinking-about-the-future-of-ai/personal-timelines-cumulative.png" alt="Graph titled "Cumulative Probability" showing probability distribution curves over time from 2030 to 2100." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 31 May 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">40f768d8-c402-4917-a494-806fae0d92a9</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/40f768d8-c402-4917-a494-806fae0d92a9.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett&amp;title=%22A%20compute-based%20framework%20for%20thinking%20about%20the%20future%20of%20AI%22%20by%20Matthew%20Barnett&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fa-compute-based-framework-for-thinking-about-the-future-of-ai&amp;created_at=2026-05-18T18%3A35%3A01.836176%2B00%3A00&amp;duration=2402" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/a-compute-based-framework-for-thinking-about-the-future-of-ai</link>
      <itunes:duration>2402</itunes:duration>
    </item>
    <item>
      <title>“Please report your compute” by Jaime Sevilla, Anson Ho, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: Compute is essential for AI performance, but researchers often fail to report it. Adopting reporting norms would support research, enhance forecasts of AI's impacts and developments, and assist policymakers.&lt;/p&gt;  &lt;p&gt; We’ve recently published an opinion piece in the Communications of the ACM, where we ask machine learning researchers and engineers to consistently report their compute usage.&lt;/p&gt;
&lt;p&gt; If this norm were widely adopted, it would allow us to better discern to what degree improvements in AI have been driven by scale rather than novel algorithms, help forecast the emergence of novel AI capabilities, and provide a tangible lever around which external and internal regulation could be developed.&lt;/p&gt;
&lt;p&gt; To facilitate the estimation and reporting of compute usage, we have prepared an interactive calculator that you can find in our report Estimating Training Compute of Deep Learning Models.&lt;/p&gt;
&lt;p&gt; You can read the opinion piece here.&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 26th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/please-report-your-compute?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/please-report-your-compute&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Wed, 26 Apr 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">81348951-7ff7-4be5-bedf-ee271bf8b256</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/81348951-7ff7-4be5-bedf-ee271bf8b256.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Anson%2520Ho%252C%2520Tamay%2520Besiroglu&amp;title=%22Please%20report%20your%20compute%22%20by%20Jaime%20Sevilla%2C%20Anson%20Ho%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fplease-report-your-compute&amp;created_at=2026-05-18T18%3A40%3A09.317593%2B00%3A00&amp;duration=80" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/please-report-your-compute</link>
      <itunes:duration>80</itunes:duration>
    </item>
    <item>
      <title>“The Direct Approach” by Matthew Barnett, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: Empirical scaling laws can help predict the cross-entropy loss associated with training inputs, such as compute and data. However, in order to predict when AI will achieve some subjective level of performance, it is necessary to devise a way of interpreting the cross-entropy loss of a model. This blog post provides a discussion of one such theoretical method, which we call the Direct Approach.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Overview&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Empirical scaling laws can help predict the cross-entropy loss associated with training inputs, such as compute and data. However, in order to predict when AI will achieve some subjective level of performance, we need to interpret the cross-entropy loss of a model. This blog post discusses one such theoretical method, which we call the Direct Approach. The key to understanding the Direct Approach is that scaling laws can be used to forecast KL divergence of the true distribution from the model, which can in turn tell us how distinguishable the model is from the true distribution. Arguably, indistinguishability over sufficiently long sequences implies competence on the tasks implicit in the data distribution; if true, we can use scaling laws to upper bound the training compute necessary to achieve a particular level of [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:36) Overview&lt;/p&gt;&lt;p&gt;(06:44) Questions and answers&lt;/p&gt;&lt;p&gt;(15:59) Reviews&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 25th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/the-direct-approach?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/the-direct-approach&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Tue, 25 Apr 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">09b5232e-a55f-4f94-8333-39519dcda4f6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/09b5232e-a55f-4f94-8333-39519dcda4f6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Matthew%2520Barnett%252C%2520Tamay%2520Besiroglu&amp;title=%22The%20Direct%20Approach%22%20by%20Matthew%20Barnett%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fthe-direct-approach&amp;created_at=2026-05-18T18%3A40%3A12.68087%2B00%3A00&amp;duration=1004" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/the-direct-approach</link>
      <itunes:duration>1004</itunes:duration>
    </item>
    <item>
      <title>“Power laws in speedrunning and machine learning” by Ege Erdil, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: We develop a model for predicting record improvements in video game speedrunning and apply it to predicting machine learning benchmarks. This model suggests that machine learning benchmarks are not close to saturation, and that large sudden improvements are infrequent, but not ruled out.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Overview&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Anticipating the future of AI necessarily requires anticipating results in performance. Ideally, we would like to understand the dynamics of improvements. This would help us preempt capabilities and understand how plausible sudden improvements are.&lt;/p&gt;&lt;p&gt; However, this problem is notoriously difficult. Among other reasons, machine learning benchmarks use many different metrics for measuring performance. And the history of improvements in all domains is limited, spanning around a dozen improvements in the longest-running benchmarks.&lt;/p&gt;&lt;p&gt; To circumvent these problems, we follow Sevilla (2021) and study video game speedrunning. Using data from speedrun.com, we investigate a previously noted regularity in world record progressions - an astounding fit to a power law pattern.&lt;/p&gt;&lt;p&gt; Exploiting this regularity, we develop a random effects model for predicting the size of successive record improvements. We show that this model is significantly better than a baseline of predicting no improvement, and has a performance comparable to a model fit [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 21st, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/power-laws-in-speedrunning-and-machine-learning?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/power-laws-in-speedrunning-and-machine-learning&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/speedrunning.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/speedrunning.png" alt="" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 21 Apr 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">8c7d65ca-697f-41ef-88dc-e8ad44b7e0a0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/8c7d65ca-697f-41ef-88dc-e8ad44b7e0a0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Jaime%2520Sevilla&amp;title=%22Power%20laws%20in%20speedrunning%20and%20machine%20learning%22%20by%20Ege%20Erdil%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fpower-laws-in-speedrunning-and-machine-learning&amp;created_at=2026-05-18T18%3A40%3A13.639794%2B00%3A00&amp;duration=200" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/power-laws-in-speedrunning-and-machine-learning</link>
      <itunes:duration>200</itunes:duration>
    </item>
    <item>
      <title>“Announcing Epoch AI’s dashboard of key trends and figures in machine learning” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We are launching a dashboard that provides key data from our research on machine learning, aiming to serve as a valuable resource for understanding the present and future of the field.&lt;/p&gt; 
&lt;p&gt; Developments in machine learning have been happening extraordinarily fast, and as their impacts become increasingly visible, it becomes ever more important to develop a quantitative understanding of these changes. However, relevant data has thus far been scattered across multiple papers, has required expertise to gather accurately, or has been otherwise hard to obtain.&lt;/p&gt;
&lt;p&gt; Given this, Epoch AI is thrilled to announce the launch of our new dashboard, which covers key numbers and figures from our research to help understand the present and future of machine learning. This includes:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; Training compute requirements&lt;/li&gt;
&lt;li&gt; Model size, measured by the number of trainable parameters&lt;/li&gt;
&lt;li&gt; The availability and use of data for training&lt;/li&gt;
&lt;li&gt; Trends in hardware efficiency&lt;/li&gt;
&lt;li&gt; Algorithmic improvements for achieving better performance with fewer resources&lt;/li&gt;
&lt;li&gt; The growth of investment in training runs over time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; Our dashboard gathers all of this information in a single, accessible place. The numbers and figures are accompanied by further information such as confidence intervals, labels [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          April 12th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-trends-dashboard?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-trends-dashboard&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/highlights.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/highlights.png" alt="Six metric cards showing AI training trends: compute growth 4.2x yearly, training data projection 2024, 540 billion parameters, GPU price-performance 1.32x yearly, algorithmic improvements 2.5÷ yearly, training costs 3.1x yearly." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/compute-card.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/compute-card.png" alt="Training compute growth rate: 4.2 times per year since 2010." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/performance-card.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/performance-card.png" alt="Card showing GPU price-performance growth rate of 1.32x per year for FP32 precision." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/investments-card.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/investments-card.png" alt="Training costs card showing 3.1 times per year growth rate statistic." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/compute-card.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/compute-card.png" alt="Training compute growth rate: 4.2 times per year since 2010." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/algorithms-card.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/announcing-trends-dashboard/algorithms-card.png" alt="Algorithmic improvements showing 2.5 divide by year decline rate for image classification." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 12 Apr 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">4c3339ab-91fb-4c7c-b99b-b11b4c9e47b5</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/4c3339ab-91fb-4c7c-b99b-b11b4c9e47b5.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Announcing%20Epoch%20AI%E2%80%99s%20dashboard%20of%20key%20trends%20and%20figures%20in%20machine%20learning%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-trends-dashboard&amp;created_at=2026-05-18T18%3A40%3A14.504835%2B00%3A00&amp;duration=238" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-trends-dashboard</link>
      <itunes:duration>238</itunes:duration>
    </item>
    <item>
      <title>“Epoch AI 2022 impact report” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: Our impact report for 2022.&lt;/p&gt;  &lt;p&gt; Epoch AI is a research group forecasting the development of transformative Artificial Intelligence. We try to understand how progress in AI happens and what economic impacts we might see from advanced AI.&lt;/p&gt;
&lt;p&gt; We want to enable better governance during this economic transition by gathering information about the timing of new developments, studying which levers can be used to influence AI progress and making current and past trends in ML more understandable.&lt;/p&gt;
&lt;p&gt; Founded in April of 2022, Epoch AI currently has a staff of 13 people, corresponding to 9 FTEs. We have received 1.96 million dollars in funding through a grant from Open Philanthropy. We are fiscally sponsored and operationally supported by Rethink Priorities, whose Special Projects team has been a core part of our success as an organisation.&lt;/p&gt;
&lt;p&gt; Epoch AI is fundraising a total of 6.07 million dollars over 2 years, or approximately 2.64 million dollars for October 2023 to September 2024, and 3.42 million dollars for October 2024 to September 2025. A detailed budget can be found in the full report.&lt;/p&gt;
&lt;p&gt; With this funding, we expect to continue and expand our research capacity in understanding the future [...]&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 1st, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/epoch-impact-report-2022?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/epoch-impact-report-2022&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Wed, 01 Feb 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">65fe5314-7a21-4d8f-b239-50b9231a9d2c</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/65fe5314-7a21-4d8f-b239-50b9231a9d2c.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Epoch%20AI%202022%20impact%20report%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fepoch-impact-report-2022&amp;created_at=2026-05-18T18%3A40%3A15.676848%2B00%3A00&amp;duration=133" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/epoch-impact-report-2022</link>
      <itunes:duration>133</itunes:duration>
    </item>
    <item>
      <title>“Trends in the dollar training cost of machine learning systems” by Ben Cottier</title>
      <description>&lt;p&gt; Subtitle: I combine training compute and GPU price-performance data to estimate the cost of compute in US dollars for the final training run of 124 machine learning systems published between 2009 and 2022, and find that the cost has grown by approximately 0.5 orders of magnitude per year.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;  Important caveats about the results in this report&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; The cost estimates have large uncertainty bounds—the true costs could be several times larger or smaller. The cost estimates are themselves built on top of estimates (e.g. training compute estimates, GPU price-performance estimates, etc.). See the Methods section and Appendix J for discussion of the uncertainties in the respective estimates.&lt;/li&gt;
&lt;li&gt; Although the estimated growth rates in cost are more robust than any individual cost estimate, these growth rates should also be interpreted with caution—especially when extrapolated into the future.&lt;/li&gt;
&lt;li&gt; The cost estimates only cover the compute for the final training runs of ML systems—nothing more.&lt;/li&gt;
&lt;li&gt; The cost estimates are for notable publicly known ML systems according to the criteria discussed in Sevilla et al. (2022, p.16). The improvements in performance over time are irregular—this means that a 2x increase in compute budget did not always lead to the [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:33) Important caveats about the results in this report&lt;/p&gt;&lt;p&gt;(02:05) Summary&lt;/p&gt;&lt;p&gt;(06:09) Why study dollar training costs?&lt;/p&gt;&lt;p&gt;(08:04) Method&lt;/p&gt;&lt;p&gt;(08:07) Background on methods to estimate the dollar cost of training compute&lt;/p&gt;&lt;p&gt;(11:11) Estimating training cost from training compute and GPU price-performance&lt;/p&gt;&lt;p&gt;(15:11) Method 1: Using the overall GPU price-performance trend&lt;/p&gt;&lt;p&gt;(15:54) Method 2: Using the price-performance of actual hardware used to train ML systems&lt;/p&gt;&lt;p&gt;(17:21) Dataset&lt;/p&gt;&lt;p&gt;(17:33) Code&lt;/p&gt;&lt;p&gt;(17:40) Large-scale systems&lt;/p&gt;&lt;p&gt;(18:42) Results&lt;/p&gt;&lt;p&gt;(18:45) Method 1: Using the overall GPU price-performance trend for all ML systems (n=124)&lt;/p&gt;&lt;p&gt;(18:54) Growth rate of training cost for all ML systems: 0.51 OOMs/year&lt;/p&gt;&lt;p&gt;(21:21) Growth rate of training cost for large-scale ML systems: 0.2 OOMs/year&lt;/p&gt;&lt;p&gt;(24:05) Method 2: Using the price-performance of NVIDIA GPUs used to train ML systems (n=48)&lt;/p&gt;&lt;p&gt;(24:14) Growth rate of training cost for all ML systems: 0.44 OOMs/year&lt;/p&gt;&lt;p&gt;(25:55) Growth rate of training cost for large-scale ML systems: 0.2 OOMs/year&lt;/p&gt;&lt;p&gt;(27:15) Summary and comparison of all regression results&lt;/p&gt;&lt;p&gt;(28:56) Predictions of when a spending limit will be reached&lt;/p&gt;&lt;p&gt;(34:11) Recommended future work&lt;/p&gt;&lt;p&gt;(34:14) Include systems trained with Google TPUs for Method 
2&lt;/p&gt;&lt;p&gt;(34:46) Estimate more reliable bounds on cost using cloud compute prices and profit margins&lt;/p&gt;&lt;p&gt;(37:28) Investigate investment, allocation of spending, and revenue&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 95 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 31st, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trends-in-the-dollar-training-cost-of-machine-learning-systems?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trends-in-the-dollar-training-cost-of-machine-learning-systems&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-price-performance.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-price-performance.png" alt="Figure 1: estimated cost of compute in US dollars for the final training run of ML systems. The costs here are estimated based on the trend in price-performance for all GPUs in Hobbhahn &amp;amp; Besiroglu (2022) (known as “Method 1” in this report)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-price-performance.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-price-performance.png" alt="Figure 2: Estimated training compute cost of milestone ML systems using the continuous GPU price-performance trend. See this Colab notebook cell for an interactive version of the plot with ML system labels." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/large-scale-training-compute-cost.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/large-scale-training-compute-cost.png" alt="Figure 3: Estimated training compute cost of large-scale ML systems using Method 1. 
See this Colab notebook cell for an interactive version of the plot with ML system labels." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-actual-gpu.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/training-compute-cost-actual-gpu.png" alt="Figure 4: Estimated training compute cost of milestone ML systems using the peak price-performance of the actual NVIDIA GPUs used in training. See this Colab notebook cell for an interactive version of the plot with ML system labels." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/large-scale-training-actual-gpu.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/large-scale-training-actual-gpu.png" alt="Figure 5: Estimated training compute cost of large-scale ML systems using Method 2. See this Colab notebook cell for an interactive version of the plot with ML system labels." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/historical-extrapolation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/historical-extrapolation.png" alt="Figure 6: Extrapolation of training cost to 1% of current US GDP, based only on the current most expensive cost estimate (Minerva, $3.27M) and the historical growth rate found for “all systems”." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/best-guess-extrapolation.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/trends-in-the-dollar-training-cost-of-machine-learning-systems/best-guess-extrapolation.png" alt="Figure 7: Extrapolation of training cost to 1% of current US GDP, based on my best-guess parameters for the most expensive cost in 2025 and the growth rate. Note that the years I reported in the text are about 1 year later than the deterministic calculations I used in this plot—I suspect this is due to the Monte Carlo estimation method used in Guesstimate." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 31 Jan 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ba78ce4a-6c49-492a-b83f-633f4cd2e973</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ba78ce4a-6c49-492a-b83f-633f4cd2e973.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ben%2520Cottier&amp;title=%22Trends%20in%20the%20dollar%20training%20cost%20of%20machine%20learning%20systems%22%20by%20Ben%20Cottier&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrends-in-the-dollar-training-cost-of-machine-learning-systems&amp;created_at=2026-05-18T19%3A03%3A30.410141%2B00%3A00&amp;duration=2361" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trends-in-the-dollar-training-cost-of-machine-learning-systems</link>
      <itunes:duration>2361</itunes:duration>
    </item>
    <item>
      <title>“Scaling laws literature review” by Pablo Villalobos</title>
      <description>&lt;p&gt; Subtitle: I have collected a database of scaling laws for different tasks and architectures, and reviewed dozens of papers in the scaling law literature.&lt;/p&gt;  &lt;p&gt; Common shape of a scaling law, taken from Hestness et al. (2017)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Scaling laws are predictable relations between the scale of a model and performance or other useful properties.&lt;/li&gt;
&lt;li&gt; I have collected a&amp;nbsp;database of scaling laws for different tasks and architectures, and reviewed dozens of papers in the scaling law literature.&lt;/li&gt;
&lt;li&gt; My main takeaways are:
&lt;ul&gt; 
&lt;li&gt; Functional forms: a basic power law can effectively model the scaling behavior in the power-law region but not the transitions to the other two regions. For this, either the M4 estimator or the BNSL estimator introduced below seems to be the best option right now.&lt;/li&gt;
&lt;li&gt; Transfer learning: there is not a simple universal scaling law for transfer learning between arbitrary tasks. When the tasks are similar enough, upstream loss and downstream performance are closely related, but when tasks are very different, the details of the architecture and hyperparameters become very relevant.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt; See the full table of scaling laws here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The term “scaling laws” in deep [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Executive summary&lt;/p&gt;&lt;p&gt;(01:31) Introduction&lt;/p&gt;&lt;p&gt;(03:03) Overview&lt;/p&gt;&lt;p&gt;(06:13) Takeaways&lt;/p&gt;&lt;p&gt;(06:15) Upstream loss&lt;/p&gt;&lt;p&gt;(07:27) Transfer and downstream accuracy&lt;/p&gt;&lt;p&gt;(08:00) Theoretical analyses&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 26th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/scaling-laws-literature-review?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/scaling-laws-literature-review&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/scaling-laws-literature-review/scaling-law.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/scaling-laws-literature-review/scaling-law.png" alt="Common shape of a scaling law, taken from Hestness et al. (2017)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 26 Jan 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">9b339226-bd94-4545-a8de-6e6b4c9a0d8b</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/9b339226-bd94-4545-a8de-6e6b4c9a0d8b.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos&amp;title=%22Scaling%20laws%20literature%20review%22%20by%20Pablo%20Villalobos&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fscaling-laws-literature-review&amp;created_at=2026-05-18T18%3A47%3A11.201472%2B00%3A00&amp;duration=521" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/scaling-laws-literature-review</link>
      <itunes:duration>521</itunes:duration>
    </item>
    <item>
      <title>“An interactive model of AI takeoff speeds” by Jaime Sevilla, Edu Roldán</title>
      <description>&lt;p&gt; Subtitle: We have developed an interactive website showcasing a new model of AI takeoff speeds.&lt;/p&gt;  &lt;p&gt; Tom Davidson from Open Philanthropy has released “What a compute-centric framework says about AI takeoff speeds”, a report investigating how fast AI capabilities might transform the economy.&lt;/p&gt;
&lt;p&gt; Epoch AI has supported this project by coding the model and running the simulation experiments required for the investigation. As a supplement to the report, we have developed an interactive website presenting the model and some of the report's results.&lt;/p&gt;
&lt;p&gt; This website includes several sections and features, which we briefly describe below.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Playground&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In the playground, you will find an interface to enter the parameters of the Full Takeoff Model, and see how these affect the results. It includes graphs of the most important variables of the model, as well as tables summarising the results.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Reports&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In this section we show four reports:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; The Monte Carlo analysis shows a summary of 10 thousand samples of the model's parameter values.&lt;/li&gt;
&lt;li&gt; The aggressive Monte Carlo analysis is the same, but using a more aggressive distribution for the amount of 2022 FLOP required to automate all productive tasks.&lt;/li&gt;
&lt;li&gt; The parameter importance analysis [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:54) Playground&lt;/p&gt;&lt;p&gt;(01:25) Reports&lt;/p&gt;&lt;p&gt;(02:51) Description&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 24th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/interactive-model-of-takeoff-speeds?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/interactive-model-of-takeoff-speeds&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/playground.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/playground.png" alt="Graph showing compute decomposition with multiple trend lines including FLOP globally, Hardware, and Software metrics from 2022 to 2044." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/reports.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/reports.png" alt="Analysis interface showing probability distributions and quantile tables for AGI automation timelines." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/takeoff-dist.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/takeoff-dist.png" alt="Line graph titled “20-100% economic automation” showing PDF declining over years." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/timelines-dist.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/timelines-dist.png" alt="Graph titled “AI Timelines Metrics” showing three CDF curves over years 2020-2100." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/description.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2023/interactive-model-of-takeoff-speeds/description.png" alt="Diagram showing Full Takeoff Model with Investment, Automation, Production, R&amp;amp;D, and Reinvestment components." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 24 Jan 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">2702958a-070f-477f-bbaa-bc36166cd194</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/2702958a-070f-477f-bbaa-bc36166cd194.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Edu%2520Rold%25C3%25A1n&amp;title=%22An%20interactive%20model%20of%20AI%20takeoff%20speeds%22%20by%20Jaime%20Sevilla%2C%20Edu%20Rold%C3%A1n&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Finteractive-model-of-takeoff-speeds&amp;created_at=2026-05-18T18%3A40%3A18.635914%2B00%3A00&amp;duration=240" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/interactive-model-of-takeoff-speeds</link>
      <itunes:duration>240</itunes:duration>
    </item>
    <item>
      <title>“Literature review of transformative artificial intelligence timelines” by Keith Wynroe, David Atkinson, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: We summarize and compare several models and forecasts predicting when transformative AI will be developed.&lt;/p&gt;  &lt;p&gt; Previous work: Grokking “Forecasting TAI with biological anchors”, Grokking “Semi-informative priors over AI timelines”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Highlights&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; The review covers quantitative models, both outside-view and inside-view, and judgment-based forecasts by experts or teams of experts.&lt;/li&gt;
&lt;li&gt; While we do not necessarily endorse their conclusions, the inside-view model the Epoch AI team found most compelling is Ajeya Cotra's “Forecasting TAI with biological anchors”, the best-rated outside-view model was Tom Davidson's “Semi-informative priors over AI timelines”, and the best-rated judgment-based forecast was Samotsvety's AGI Timelines Forecast.&lt;/li&gt;
&lt;li&gt; The inside-view models we reviewed predicted shorter timelines (e.g. bioanchors has a median of 2052) while the outside-view models predicted longer timelines (e.g. semi-informative priors has a median over 2100). The judgment-based forecasts are skewed towards agreement with the inside-view models, and are often more aggressive (e.g. Samotsvety assigned a median of 2043).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Over the last few years, we have seen many attempts to quantitatively forecast the arrival of transformative and/or general Artificial Intelligence (TAI/AGI) using very different methodologies and assumptions. Keeping track of and assessing these models’ relative strengths can be daunting for [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:32) Highlights&lt;/p&gt;&lt;p&gt;(01:32) Introduction&lt;/p&gt;&lt;p&gt;(04:33) Results&lt;/p&gt;&lt;p&gt;(04:36) Model based forecasts&lt;/p&gt;&lt;p&gt;(05:02) Ajeya Cotra's bio anchors&lt;/p&gt;&lt;p&gt;(05:11) Semi-informative priors&lt;/p&gt;&lt;p&gt;(05:20) Insights-based model&lt;/p&gt;&lt;p&gt;(05:29) Whole Brain Emulation&lt;/p&gt;&lt;p&gt;(05:38) Phase transitions and AGI&lt;/p&gt;&lt;p&gt;(05:48) Weighted Linear Average of Probabilities&lt;/p&gt;&lt;p&gt;(05:54) Judgment based forecasts&lt;/p&gt;&lt;p&gt;(06:20) AI Impacts survey (2022)&lt;/p&gt;&lt;p&gt;(06:31) Metaculus&lt;/p&gt;&lt;p&gt;(06:39) Samotsvety report&lt;/p&gt;&lt;p&gt;(06:48) Ajeya Cotra&lt;/p&gt;&lt;p&gt;(06:57) Holden Karnofsky&lt;/p&gt;&lt;p&gt;(07:05) Weighted Geometric Average of Odds&lt;/p&gt;&lt;p&gt;(07:52) Model-based forecasts&lt;/p&gt;&lt;p&gt;(08:18) Forecasting TAI with biological anchors (inside view)&lt;/p&gt;&lt;p&gt;(10:14) Semi-informative priors over AI timelines (outside view)&lt;/p&gt;&lt;p&gt;(12:36) Insight-based AI timelines model (outside view)&lt;/p&gt;&lt;p&gt;(13:57) Whole Brain Emulation (inside view)&lt;/p&gt;&lt;p&gt;(16:13) Phase Transitions and AGI (outside view)&lt;/p&gt;&lt;p&gt;(17:25) Judgment-based forecasts&lt;/p&gt;&lt;p&gt;(17:57) AI Impacts Survey (2022)&lt;/p&gt;&lt;p&gt;(18:41) Metaculus: Date of Artificial General Intelligence as of 2022-10-11&lt;/p&gt;&lt;p&gt;(19:19) Samotsvety's AGI Timelines Forecasts&lt;/p&gt;&lt;p&gt;(20:41) Two-year update on my personal AI 
timelines&lt;/p&gt;&lt;p&gt;(21:28) Forecasting transformative AI: what's the burden of proof?&lt;/p&gt;&lt;p&gt;(22:39) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 17th, 2023 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/literature-review-of-transformative-artificial-intelligence-timelines?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/literature-review-of-transformative-artificial-intelligence-timelines&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Tue, 17 Jan 2023 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">36dcc196-207b-4634-9dd7-ca374dd8f2fe</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/36dcc196-207b-4634-9dd7-ca374dd8f2fe.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Keith%2520Wynroe%252C%2520David%2520Atkinson%252C%2520Jaime%2520Sevilla&amp;title=%22Literature%20review%20of%20transformative%20artificial%20intelligence%20timelines%22%20by%20Keith%20Wynroe%2C%20David%20Atkinson%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fliterature-review-of-transformative-artificial-intelligence-timelines&amp;created_at=2026-05-18T18%3A40%3A19.823159%2B00%3A00&amp;duration=1542" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/literature-review-of-transformative-artificial-intelligence-timelines</link>
      <itunes:duration>1542</itunes:duration>
    </item>
    <item>
      <title>“Revisiting algorithmic progress” by Ege Erdil, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: We use a dataset of over a hundred computer vision models from the last decade to investigate how better algorithms and architectures have enabled researchers to use compute and data more efficiently. We find that every 9 months, the introduction of better algorithms contributes the equivalent of a doubling of compute budgets.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Overview&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; How much progress in ML depends on algorithmic progress, scaling compute, or scaling relevant datasets is relatively poorly understood. In our paper, we make progress on this question by investigating algorithmic progress in image classification on ImageNet, perhaps the most well-known test bed for computer vision.&lt;/p&gt;&lt;p&gt; Using a dataset of a hundred computer vision models, we estimate a model, informed by neural scaling laws, that enables us to analyse the rate and nature of algorithmic advances. We use Shapley values to produce decompositions of the various drivers of progress in computer vision and estimate the relative importance of algorithms, compute, and data.&lt;/p&gt;&lt;p&gt; Our main results include:&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Every nine months, the introduction of better algorithms contributes the equivalent of a doubling of compute budgets. This is much faster than the gains from Moore's law; that said, there's uncertainty (our 95% CI spans 4 to 25 months)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt; [...]&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 12th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/revisiting-algorithmic-progress?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/revisiting-algorithmic-progress&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/revisiting-algorithmic-progress/figure2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/revisiting-algorithmic-progress/figure2.png" alt="Attribution of progress to algorithmic progress, compute scaling and data scaling between model pairs based on Shapley decomposition. “NS” indicates that there was no scaling of the relevant input between these models. Numbers may not all add up to 100 due to rounding." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/revisiting-algorithmic-progress/figure3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/revisiting-algorithmic-progress/figure3.png" alt="Shares of algorithmic progress that is compute- vs. data-augmenting." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 12 Dec 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">669f7a4a-680c-4bc5-848c-9b3c43010758</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/669f7a4a-680c-4bc5-848c-9b3c43010758.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Ege%2520Erdil%252C%2520Tamay%2520Besiroglu&amp;title=%22Revisiting%20algorithmic%20progress%22%20by%20Ege%20Erdil%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Frevisiting-algorithmic-progress&amp;created_at=2026-05-18T19%3A08%3A57.545027%2B00%3A00&amp;duration=258" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/revisiting-algorithmic-progress</link>
      <itunes:duration>258</itunes:duration>
    </item>
    <item>
      <title>“Predicting GPU performance” by Marius Hobbhahn, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: We develop a simple model that predicts progress in the performance of field-effect transistor-based GPUs under the assumption that transistors can no longer miniaturize after scaling down to roughly the size of a single silicon atom. Our model forecasts that the current paradigm of field-effect transistor-based GPUs will plateau sometime between 2027 and 2035, offering a performance of between 1e14 and 1e15 FLOP/s in FP32.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Executive summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We develop a simple model that predicts progress in the performance of field-effect transistor-based GPUs under the assumption that transistors can no longer miniaturize after scaling down to roughly the size of a single silicon atom. We construct a composite model from a performance model (a model of how GPU performance relates to the features of that GPU), and a feature model (a model of how GPU features change over time given the constraints imposed by the physical limits of miniaturization), each of which is fit on a dataset of 1948 GPUs released between 2006 and 2021. We find that almost all progress can be explained by two variables: transistor size and the number of cores. Using estimates of the physical limits informed by the relevant literature, our model predicts [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:45) Executive summary&lt;/p&gt;&lt;p&gt;(03:10) Introduction&lt;/p&gt;&lt;p&gt;(06:03) A simple model of GPU performance&lt;/p&gt;&lt;p&gt;(08:13) Feature and model selection&lt;/p&gt;&lt;p&gt;(12:05) Physical limits of transistor miniaturization&lt;/p&gt;&lt;p&gt;(14:21) Limits on the number of cores&lt;/p&gt;&lt;p&gt;(15:42) Predictions&lt;/p&gt;&lt;p&gt;(18:26) Limitations&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 4 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 1st, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/predicting-gpu-performance?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/predicting-gpu-performance&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/projected-top-perf-fet-900px.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/projected-top-perf-fet-900px.png" alt="Figure 1. Model predictions of peak theoretical performance of top GPUs, assuming that transistors can no longer miniaturize after scaling down transistors to around 0.7nm. Left: GPU performance projections; Top right: GPU performance when the limit is hit; Middle right: Our distribution over the physical limits of transistor miniaturization; Bottom right: Our distribution over the transistors per core ratio with relevant historical comparisons." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image12.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image12.png" alt="Figure 2: Overview of different approaches to modeling GPU performance. In this piece, we choose a factor model with two variables (process size and number of cores) and include the limits of miniaturization." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image17.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image17.png" alt="Figure 3: Distribution over the limits of miniaturization we use in our model. The parameters of the distribution are chosen to reflect what we think of as a hard boundary at 0.7nm and such that most probability mass is smaller than 3nm. 
We use a log-normal distribution with mu=0, sigma=0.5 and shift it by 0.5 to the right." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image27.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image27.png" alt="Figure 4: Comparison of process size and minimum transistor dimension. We find that both scale very similarly even when we account for a MOSFET vs FinFET distinction. Therefore, we conclude that we can use process size as a decent approximation for transistor size in the rest of this piece." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image25.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/image25.png" alt="Figure 5: With the information from the previous figure, we construct a distribution that reflects our best guess of what the limit of the number of transistors per core is. We think it is likely lower than what current state-of-the-art ML GPUs can achieve (since they are likely not optimized for that ratio) but a bit higher than the historical minimum. We use a log-normal distribution with parameters mu=0.3 and sigma=0.5" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/projected-top-perf-fet-1100px.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/predicting-gpu-performance/projected-top-perf-fet-1100px.png" alt="Figure 6: A distribution over projections of TOP GPU performance. The historical projection comes from a 90th-percentile regression of the dataset. 
The model predicts that the median dates for the two limits are ~2030 and ~2033 and that the limiting performance is between 1e14 and 1e15 FLOP/s." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 01 Dec 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fb1de64f-79cb-4d37-bd0f-d4baf5ef5716</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fb1de64f-79cb-4d37-bd0f-d4baf5ef5716.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Marius%2520Hobbhahn%252C%2520Tamay%2520Besiroglu&amp;title=%22Predicting%20GPU%20performance%22%20by%20Marius%20Hobbhahn%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fpredicting-gpu-performance&amp;created_at=2026-05-18T19%3A08%3A58.632908%2B00%3A00&amp;duration=1237" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/predicting-gpu-performance</link>
      <itunes:duration>1237</itunes:duration>
    </item>
    <item>
      <title>“Will we run out of ML data? Evidence from projecting dataset size trends” by Pablo Villalobos, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Based on our previous analysis of trends in dataset size, we project the growth of dataset size in the language and vision domains. We explore the limits of this trend by estimating the total stock of available unlabeled data over the next decades.&lt;/p&gt;  &lt;p&gt; Our projections predict that we will have exhausted the stock of low-quality language data by 2030 to 2050, high-quality language data before 2026, and vision data by 2030 to 2060. This might slow down ML progress.&lt;/p&gt;
&lt;p&gt; All of our conclusions rely on the unrealistic assumptions that current trends in ML data usage and production will continue and that there will be no major innovations in data efficiency. Relaxing these and other assumptions would be promising future work.&lt;/p&gt;

&lt;p&gt; There's a chart here. The chart title reads: Low-quality language data &lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: High-quality language data &lt;/p&gt;&lt;p&gt; There's a chart here. The chart title reads: Image data &lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Historical projection&lt;/th&gt;&lt;th&gt;Compute projection&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Low-quality language stock&lt;/td&gt;&lt;td&gt;2032.4&lt;br/&gt; [2028.4 ; 2039.2]&lt;/td&gt;&lt;td&gt;2040.5&lt;br/&gt; [2034.6 ; 2048.9]&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;High-quality language stock&lt;/td&gt;&lt;td&gt;2024.5&lt;br/&gt; [2023.5 ; 2025.7]&lt;/td&gt;&lt;td&gt;2024.1&lt;br/&gt; [2023.2 ; 2025.3]&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Image stock&lt;/td&gt;&lt;td&gt;2046&lt;br/&gt; [2037 ; 2062.8]&lt;/td&gt;&lt;td&gt;2038.8&lt;br/&gt; [2032 ; 2049.8]&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt; Table 1: Median and 90% [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:32) Background&lt;/p&gt;&lt;p&gt;(02:47) Results&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 2 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 10th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/will-we-run-out-of-ml-data-evidence-from-projecting-dataset?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/will-we-run-out-of-ml-data-evidence-from-projecting-dataset&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/charts/lq.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/lq.png" alt="Low-quality language data" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/hq.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/hq.png" alt="High-quality language data" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/charts/v.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/charts/v.png" alt="Image data" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 10 Nov 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ef6dce7f-1485-4eb7-aae3-1d681609d962</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ef6dce7f-1485-4eb7-aae3-1d681609d962.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520Jaime%2520Sevilla%252C%2520Lennart%2520Heim%252C%2520Tamay%2520Besiroglu%252C%2520Marius%2520Hobbhahn%252C%2520Anson%2520Ho&amp;title=%22Will%20we%20run%20out%20of%20ML%20data%3F%20Evidence%20from%20projecting%20dataset%20size%20trends%22%20by%20Pablo%20Villalobos%2C%20Jaime%20Sevilla%2C%20Lennart%20Heim%2C%20Tamay%20Besiroglu%2C%20Marius%20Hobbhahn%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fwill-we-run-out-of-ml-data-evidence-from-projecting-dataset&amp;created_at=2026-05-18T19%3A09%3A02.726368%2B00%3A00&amp;duration=279" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/will-we-run-out-of-ml-data-evidence-from-projecting-dataset</link>
      <itunes:duration>279</itunes:duration>
    </item>
    <item>
      <title>“Trends in training dataset sizes” by Pablo Villalobos, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: We collected a database of notable ML models and their training dataset sizes. We use this database to find historical growth trends in dataset size for different domains, particularly language and vision.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Key takeaways&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; We collected over 200 notable ML models and estimated their training dataset size.&lt;/li&gt;
&lt;li&gt; Vision and language datasets have historically grown at 0.1 and 0.2 orders of magnitude (OOMs) per year, respectively.&lt;/li&gt;
&lt;li&gt; There seems to be some transition around 2014-2015, after which training datasets became much bigger and (in the case of language) smaller datasets disappeared. This might be just an artefact of our small sample size.&lt;/li&gt;
&lt;li&gt; We also provide trends for games, speech, recommendation and drawing, but since our sample size is very small in these domains we would advise some level of scepticism.&lt;/li&gt;
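&lt;/ul&gt;&lt;p&gt; Figure 1: Training datasets for language (left) and vision (right).&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Domain&lt;/th&gt;&lt;th&gt;Scale (data points)&lt;/th&gt;&lt;th&gt;Yearly growth&lt;br/&gt; (OOMs/year)&lt;/th&gt;&lt;th&gt;Yearly growth&lt;br/&gt; (OOMs/year) (95% CI)&lt;/th&gt;&lt;th&gt;# systems&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Language&lt;/td&gt;&lt;td&gt;1e2 - 2e12&lt;/td&gt;&lt;td&gt;0.22&lt;/td&gt;&lt;td&gt;[0.18 ; 0.28]&lt;/td&gt;&lt;td&gt;79&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Vision&lt;/td&gt;&lt;td&gt;2e3 - 3e9&lt;/td&gt;&lt;td&gt;0.09&lt;/td&gt;&lt;td&gt;[0.08 ; 0.11]&lt;/td&gt;&lt;td&gt;55&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Speech&lt;/td&gt;&lt;td&gt;9e2 - 3e12&lt;/td&gt;&lt;td&gt;0.21&lt;/td&gt;&lt;td&gt;[0.17 ; 0.30]&lt;/td&gt;&lt;td&gt;13&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Games&lt;/td&gt;&lt;td&gt;7e5 - 4e11&lt;/td&gt;&lt;td&gt;0.09&lt;/td&gt;&lt;td&gt;[0.08 ; 0.15]&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Recommendation&lt;/td&gt;&lt;td&gt;1e8 - 1e10&lt;/td&gt;&lt;td&gt;0.05&lt;/td&gt;&lt;td&gt;[0.00 ; 0.47]&lt;/td&gt;&lt;td&gt;11&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Drawing&lt;/td&gt;&lt;td&gt;6e4 - 4e9&lt;/td&gt;&lt;td&gt;0.43&lt;/td&gt;&lt;td&gt;[0.17 ; 0.64]&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt; Table 1: Summary of trends for each domain. Scale is the maximum and minimum observed dataset size, and yearly growth is [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:26) Key takeaways&lt;/p&gt;&lt;p&gt;(01:35) Introduction&lt;/p&gt;&lt;p&gt;(02:13) Methods&lt;/p&gt;&lt;p&gt;(02:16) Measuring dataset size&lt;/p&gt;&lt;p&gt;(04:16) Our database&lt;/p&gt;&lt;p&gt;(05:20) Dataset size trends&lt;/p&gt;&lt;p&gt;(05:23) Vision&lt;/p&gt;&lt;p&gt;(06:33) Language&lt;/p&gt;&lt;p&gt;(07:38) Other domains&lt;/p&gt;&lt;p&gt;(08:22) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;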
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          September 20th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trends-in-training-dataset-sizes?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trends-in-training-dataset-sizes&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://39669.cdn.cke-cs.com/rQvD3VnunXZu34m86e5f/images/f86256c701f69ef08c5849ab82876c288ef26053205ed89f.png/w_1440" target="_blank"&gt;&lt;img src="https://39669.cdn.cke-cs.com/rQvD3VnunXZu34m86e5f/images/f86256c701f69ef08c5849ab82876c288ef26053205ed89f.png/w_1440" alt="Figure 1: Training datasets for language (left) and vision (right)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://lh4.googleusercontent.com/h-s09_jXPxpctDaKjMqc53Ru94xFXe3gh5AHSykJrr_AdBcusgDhthVdM4fhuYaKJ_t_fb1WEgOOzGBUR6_uRJFUGoP3nzk7qwb9QFgcJ-_cR8LVTuNGlsltxUv54JoK2G2APd47lI5MM3jwzxyJNfiyQUWhlYgysOCA69mRjfc7f1XO87DES5pbpA" target="_blank"&gt;&lt;img src="https://lh4.googleusercontent.com/h-s09_jXPxpctDaKjMqc53Ru94xFXe3gh5AHSykJrr_AdBcusgDhthVdM4fhuYaKJ_t_fb1WEgOOzGBUR6_uRJFUGoP3nzk7qwb9QFgcJ-_cR8LVTuNGlsltxUv54JoK2G2APd47lI5MM3jwzxyJNfiyQUWhlYgysOCA69mRjfc7f1XO87DES5pbpA" alt="Figure 2: Evolution of vision datasets. A significant number of models is concentrated near 6e4 and 1e6, which are the sizes of MNIST and ImageNet, respectively." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://lh5.googleusercontent.com/hzXQNeAUvLJqdLO77JtPwDzgUygrvv6rmp_tfV5s2kQij44Or7duj82ST_FBv7Holr_DoRdKOKMD943WV8uBjA7SnzXrs6L3CGWgNanOF371IRFV0FI0U1iKtaXsM3KjSiuuRvu04icJlOQ4UtG8KKnMXbpUuj9P9SIHdljQfd4G7d1OS88eZ-wR5A" target="_blank"&gt;&lt;img src="https://lh5.googleusercontent.com/hzXQNeAUvLJqdLO77JtPwDzgUygrvv6rmp_tfV5s2kQij44Or7duj82ST_FBv7Holr_DoRdKOKMD943WV8uBjA7SnzXrs6L3CGWgNanOF371IRFV0FI0U1iKtaXsM3KjSiuuRvu04icJlOQ4UtG8KKnMXbpUuj9P9SIHdljQfd4G7d1OS88eZ-wR5A" alt="Figure 3: Evolution of language datasets" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://39669.cdn.cke-cs.com/rQvD3VnunXZu34m86e5f/images/07922f5724e21c43ed6e9f1aab135417a9eb4ffffa7f306d.png/w_1440" target="_blank"&gt;&lt;img src="https://39669.cdn.cke-cs.com/rQvD3VnunXZu34m86e5f/images/07922f5724e21c43ed6e9f1aab135417a9eb4ffffa7f306d.png/w_1440" alt="Figure 4: Trends for Recommendation (top left), Speech (top right), Drawing (bottom left) and Games (bottom right)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 20 Sep 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">3d3c7a1b-c721-4ddf-bcce-da7dca5988d1</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/3d3c7a1b-c721-4ddf-bcce-da7dca5988d1.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520Anson%2520Ho&amp;title=%22Trends%20in%20training%20dataset%20sizes%22%20by%20Pablo%20Villalobos%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrends-in-training-dataset-sizes&amp;created_at=2026-05-18T19%3A09%3A03.048793%2B00%3A00&amp;duration=564" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trends-in-training-dataset-sizes</link>
      <itunes:duration>564</itunes:duration>
    </item>
    <item>
      <title>“The longest training run” by Jaime Sevilla, Tamay Besiroglu, Owen Dudney, Anson Ho</title>
      <description>&lt;p&gt; Subtitle: Training runs of large machine learning systems are likely to last less than 14-15 months. This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms.&lt;/p&gt;  &lt;p&gt; In short: Training runs of large machine learning systems are likely to last less than 14-15 months. This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms.&lt;/p&gt;
&lt;p&gt; Larger compute budgets and a better understanding of how to effectively use compute (through, for example, using scaling laws) are two major driving forces of progress in recent machine learning.&lt;/p&gt;
&lt;p&gt; There are many ways to increase your effective compute budget: better hardware, rising investments in AI R&amp;amp;D and improvements in algorithmic efficiency. In this article we investigate one often-overlooked but plausibly important factor: how long—in terms of wall-clock time—you are willing to train your model for.&lt;/p&gt;
&lt;p&gt; Here we explore a simple mathematical framework for estimating the optimal duration of a training run. A researcher is tasked with training a model by some deadline, and must decide when to start their training run. The researcher is faced with a key problem [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(02:52) A simple framework for training run lengths&lt;/p&gt;&lt;p&gt;(06:28) Accounting for increasing dollar-budgets&lt;/p&gt;&lt;p&gt;(07:48) Accounting for increased algorithmic efficiency&lt;/p&gt;&lt;p&gt;(09:29) Accounting for hardware swapping&lt;/p&gt;&lt;p&gt;(12:34) Accounting for stochasticity&lt;/p&gt;&lt;p&gt;(13:28) Fixed deadlines&lt;/p&gt;&lt;p&gt;(14:28) Renting hardware&lt;/p&gt;&lt;p&gt;(15:41) Conclusion&lt;/p&gt;&lt;p&gt;(18:00) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 5 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          August 17th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/the-longest-training-run?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/the-longest-training-run&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/the-longest-training-run/training-run.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/the-longest-training-run/training-run.png" alt="In blue, total amount of compute consumed by training runs starting at different years, given a deadline T equals 2030 and an investment of $1B. In brown, the hardware price-performance, assuming an initial price-performance of There's a complex formula here. in 2022 and a rate of improvement of There's a complex formula here. (see Hobbhahn and Besiroglu, 2022)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 17 Aug 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">aee509bd-b469-4d84-b1bd-d8e2568d50ad</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/aee509bd-b469-4d84-b1bd-d8e2568d50ad.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Tamay%2520Besiroglu%252C%2520Owen%2520Dudney%252C%2520Anson%2520Ho&amp;title=%22The%20longest%20training%20run%22%20by%20Jaime%20Sevilla%2C%20Tamay%20Besiroglu%2C%20Owen%20Dudney%2C%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fthe-longest-training-run&amp;created_at=2026-05-18T19%3A09%3A04.258555%2B00%3A00&amp;duration=1144" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/the-longest-training-run</link>
      <itunes:duration>1144</itunes:duration>
    </item>
    <item>
      <title>“A time-invariant version of Laplace’s rule” by Jaime Sevilla, Ege Erdil</title>
      <description>&lt;p&gt; Subtitle: We explore how to estimate the probability of an event given information of past occurrences. We explain a problem with the naive application of Laplace's rule in this context, and suggest a modification to correct it.&lt;/p&gt;  &lt;p&gt; What is the probability that the sun will rise tomorrow? What are the chances of a pandemic happening next year? What are the odds of survival of a new surgery that has been successfully executed only once?&lt;/p&gt;
&lt;p&gt; These and many other questions can be answered by appealing to a general rule: Laplace's rule of succession. This rule describes the probability of a positive outcome given information about past successes. The versatility and generality of the rule make it an invaluable tool for forecasters, who use it to estimate base rates.&lt;/p&gt;
&lt;p&gt; Laplace's rule can be stated in simple terms. If we have repeated an experiment \(T\) times, and observed \(S\) successes, we can estimate the posterior probability of obtaining a success in the next trial as \(p = \frac{S+1}{T+2}\). &lt;/p&gt;
&lt;p&gt; However, there is a fatal problem when applying the rule to observations over a time period, where the definition of what constitutes a trial is not as clear. For example [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(04:20) A refresher: Laplace's rule of succession&lt;/p&gt;&lt;p&gt;(05:36) The problem of time&lt;/p&gt;&lt;p&gt;(07:08) A problem of priors&lt;/p&gt;&lt;p&gt;(08:24) Continuous improvement&lt;/p&gt;&lt;p&gt;(09:47) The scale invariant prior&lt;/p&gt;&lt;p&gt;(13:01) Inference with the scale-invariant prior&lt;/p&gt;&lt;p&gt;(14:03) Unprecedented success&lt;/p&gt;&lt;p&gt;(16:34) Adjusting for a variable observation period&lt;/p&gt;&lt;p&gt;(18:10) Putting it all together&lt;/p&gt;&lt;p&gt;(18:41) An example: Earthquakes in Chile&lt;/p&gt;&lt;p&gt;(23:38) Conclusion&lt;/p&gt;&lt;p&gt;(25:48) Acknowledgements&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 11 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 15th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/a-time-invariant-version-of-laplace-s-rule?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/a-time-invariant-version-of-laplace-s-rule&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/a-time-invariant-version-of-laplace-s-rule/time-invariant-laplace.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/a-time-invariant-version-of-laplace-s-rule/time-invariant-laplace.png" alt="Graph showing values of the formula \((1+\frac{t}{T})^{-S}\) for various values of \(S\)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Fri, 15 Jul 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ba81a431-09da-4efe-96da-ba121a36bb20</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ba81a431-09da-4efe-96da-ba121a36bb20.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Ege%2520Erdil&amp;title=%22A%20time-invariant%20version%20of%20Laplace%E2%80%99s%20rule%22%20by%20Jaime%20Sevilla%2C%20Ege%20Erdil&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fa-time-invariant-version-of-laplace-s-rule&amp;created_at=2026-05-18T19%3A09%3A05.253212%2B00%3A00&amp;duration=1608" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/a-time-invariant-version-of-laplace-s-rule</link>
      <itunes:duration>1608</itunes:duration>
    </item>
    <item>
      <title>“Machine learning model sizes and the parameter gap” by Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, Marius Hobbhahn</title>
      <description>&lt;p&gt; Subtitle: The model size of notable machine learning systems has grown ten times faster than before since 2018. After 2020 growth has not been entirely continuous: there was a jump of one order of magnitude which persists until today. This is relevant for forecasting model size and thus AI capabilities.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Trends in model size&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In current ML systems, model size (number of parameters) is related to performance via known scaling laws. We used our dataset to analyze trends in the model size of 237 milestone machine learning systems. The systems are categorized into Language, Vision, Games and Other according to the task they solve.&lt;/p&gt;&lt;p&gt; Model size slowly increased by 7 orders of magnitude from the 1950s to around 2018. Since 2018, growth has accelerated for language models, with model size increasing by another 4 orders of magnitude in the four years from 2018 to 2022 (see Figure [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:57) Trends in model size&lt;/p&gt;&lt;p&gt;(02:23) The parameter gap&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          July 5th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/machine-learning-model-sizes-and-the-parameter-gap?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/machine-learning-model-sizes-and-the-parameter-gap&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/machine-learning-model-sizes-and-the-parameter-gap/983b0ddcc0d4551a.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/machine-learning-model-sizes-and-the-parameter-gap/983b0ddcc0d4551a.png" alt="Figure 1. Left: Transition period around 2018, assuming a single post-2018 trend. Right: the same period, assuming two separate post-2018 trends." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/machine-learning-model-sizes-and-the-parameter-gap/7554b34cf56529b1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/machine-learning-model-sizes-and-the-parameter-gap/7554b34cf56529b1.png" alt="Figure 2: Model size over time, separated by domain. Red lines highlight the parameter gap. Most systems above the gap are language or multimodal models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Tue, 05 Jul 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ad01ba7d-e81b-41da-b6fb-30e2278a8a0f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ad01ba7d-e81b-41da-b6fb-30e2278a8a0f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Pablo%2520Villalobos%252C%2520Jaime%2520Sevilla%252C%2520Tamay%2520Besiroglu%252C%2520Lennart%2520Heim%252C%2520Anson%2520Ho%252C%2520Marius%2520Hobbhahn&amp;title=%22Machine%20learning%20model%20sizes%20and%20the%20parameter%20gap%22%20by%20Pablo%20Villalobos%2C%20Jaime%20Sevilla%2C%20Tamay%20Besiroglu%2C%20Lennart%20Heim%2C%20Anson%20Ho%2C%20Marius%20Hobbhahn&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fmachine-learning-model-sizes-and-the-parameter-gap&amp;created_at=2026-05-18T19%3A09%3A06.425911%2B00%3A00&amp;duration=237" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/machine-learning-model-sizes-and-the-parameter-gap</link>
      <itunes:duration>237</itunes:duration>
    </item>
    <item>
      <title>“Trends in GPU price-performance” by Marius Hobbhahn, Tamay Besiroglu</title>
      <description>&lt;p&gt; Subtitle: Using a dataset of 470 models of graphics processing units released between 2006 and 2021, we find that the amount of floating-point operations/second per $ doubles every ~2.5 years.&lt;/p&gt;  &lt;p&gt; We would like to thank Alyssa Vance, Ashwin Acharya, Jessica Taylor and the Epoch AI team for helpful feedback and comments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Using a dataset of 470 models of graphics processing units (GPUs) released between 2006 and 2021, we find that the amount of floating-point operations/second per $ (hereafter FLOP/s per $) doubles every ~2.5 years. For top GPUs at any point in time, we find a slower rate of improvement (FLOP/s per $ doubles every 2.95 years), while for models of GPU typically used in ML research, we find a faster rate of improvement (FLOP/s per $ doubles every 2.07 years). GPU price-performance improvements have generally been slightly slower than the 2-year doubling time associated with Moore's law, much slower than what is implied by Huang's law, yet considerably faster than was generally found in prior work on trends in GPU price-performance. We aim to provide a more precise characterization of GPU price-performance trends based on more or higher-quality data, that is more robust [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:36) Executive Summary&lt;/p&gt;&lt;p&gt;(02:11) Introduction&lt;/p&gt;&lt;p&gt;(05:07) Dataset&lt;/p&gt;&lt;p&gt;(07:17) Empirical analysis&lt;/p&gt;&lt;p&gt;(07:40) Empirical trend vs. other predictions&lt;/p&gt;&lt;p&gt;(10:06) Trends across precision for floating formats&lt;/p&gt;&lt;p&gt;(11:24) Trends of GPUs used in ML&lt;/p&gt;&lt;p&gt;(13:48) Trend of top-performing GPUs&lt;/p&gt;&lt;p&gt;(14:49) All trends (table &amp;amp; figure)&lt;/p&gt;&lt;p&gt;(15:30) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 10 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 27th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/trends-in-gpu-price-performance?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/trends-in-gpu-price-performance&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image3.png" alt="Figure 1. Plots of FLOP/s and FLOP/s per dollar for our dataset and relevant trends from the existing literature" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image1.png" alt="Figure 2. Plots of FLOP/s and FLOP/s per dollar for Median Group’s and Sun et al., 2019’s datasets." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image11.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image11.png" alt="Figure 3. Plots of FLOP/s and FLOP/s per dollar for the dataset used in our analysis" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image5.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image5.png" alt="Figure 4. 
FLOP/s per dollar for our dataset and relevant trends found elsewhere" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image9.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image9.png" alt="Figure 5. FLOP/s per dollar for FP32 and FP16 performance" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image12.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image12.png" alt="Figure 6. FLOP/s per dollar for our dataset and separately for GPU models commonly used in ML research compared to relevant trends found elsewhere" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image10.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image10.png" alt="Figure 7. FLOP/s per dollar for our dataset and separately for top-performing GPUs compared to relevant trends found elsewhere" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/trends-in-gpu-price-performance/image3.png" alt="Figure 8. FLOP/s per dollar for our dataset and various subgroups compared to relevant trends found elsewhere" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 27 Jun 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">27067cc4-248a-4f93-b509-7455e366f499</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/27067cc4-248a-4f93-b509-7455e366f499.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Marius%2520Hobbhahn%252C%2520Tamay%2520Besiroglu&amp;title=%22Trends%20in%20GPU%20price-performance%22%20by%20Marius%20Hobbhahn%2C%20Tamay%20Besiroglu&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Ftrends-in-gpu-price-performance&amp;created_at=2026-05-18T19%3A09%3A07.337224%2B00%3A00&amp;duration=1006" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/trends-in-gpu-price-performance</link>
      <itunes:duration>1006</itunes:duration>
    </item>
    <item>
      <title>“Announcing Epoch AI: A research initiative investigating the road to transformative AI” by The Epoch AI Team</title>
      <description>&lt;p&gt; Subtitle: We are a new research initiative forecasting developments in AI. Come join us!&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; We are a new research initiative working on investigating trends in machine learning and forecasting the development of transformative Artificial Intelligence&lt;/li&gt;
&lt;li&gt; This work is done in close collaboration with other organizations, like Rethink Priorities and Open Philanthropy&lt;/li&gt;
&lt;li&gt; We will be hiring for 2-4 full-time roles this summer – more information here&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt; What is Epoch AI?&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Epoch AI is a new research organization that works to support AI strategy and improve forecasts around the development of transformative Artificial Intelligence – AI systems that have the potential to have an effect on society as large as that of the industrial revolution.&lt;/p&gt;&lt;p&gt; Our founding team consists of seven members – Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Pablo Villalobos, Edu Roldán, Marius Hobbhahn, and Anson Ho. Collectively, we have backgrounds in Machine Learning, Statistics, Economics, Forecasting, Physics, Computer Engineering, and Software Engineering.&lt;/p&gt;&lt;p&gt; Our work involves close collaboration with other organizations, such as Open Philanthropy and Rethink Priorities’ AI Governance and Strategy team. We are advised by Tom Davidson from Open Philanthropy and Neil Thompson. Rethink Priorities is also our fiscal sponsor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Our mission&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:21) Summary&lt;/p&gt;&lt;p&gt;(00:45) What is Epoch AI?&lt;/p&gt;&lt;p&gt;(01:55) Our mission&lt;/p&gt;&lt;p&gt;(02:29) Our research agenda&lt;/p&gt;&lt;p&gt;(03:45) Our work so far&lt;/p&gt;&lt;p&gt;(04:50) Hiring&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 23rd, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/announcing-epoch?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/announcing-epoch&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/announcing-epoch/research-agenda-sketch.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/announcing-epoch/research-agenda-sketch.png" alt="A sketch of Epoch AI’s research agenda. We plan to develop quantitative models to forecast advanced AI capabilities, and to research and extrapolate trends in machine learning." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/announcing-epoch/bioanchors-diagram.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/announcing-epoch/bioanchors-diagram.png" alt="Diagram summarizing Ajeya Cotra’s biological anchors model." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/announcing-epoch/founding-members.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/announcing-epoch/founding-members.png" alt="Six people gathered around laptops at a conference table." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/announcing-epoch/op-logo.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/announcing-epoch/op-logo.png" alt="Open Philanthropy logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/announcing-epoch/rp-logo-old.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/announcing-epoch/rp-logo-old.png" alt="Rethink Priorities logo" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 23 Jun 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">ebc5abe3-d999-4cb6-9c22-c10e93068707</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/ebc5abe3-d999-4cb6-9c22-c10e93068707.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=The%2520Epoch%2520AI%2520Team&amp;title=%22Announcing%20Epoch%20AI%3A%20A%20research%20initiative%20investigating%20the%20road%20to%20transformative%20AI%22%20by%20The%20Epoch%20AI%20Team&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fannouncing-epoch&amp;created_at=2026-05-18T19%3A09%3A08.29497%2B00%3A00&amp;duration=340" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/announcing-epoch</link>
      <itunes:duration>340</itunes:duration>
    </item>
    <item>
      <title>“Grokking “Semi-informative priors over AI timelines”” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: I give visual explanations for Tom Davidson's report, Semi-informative priors over AI timelines, and summarise the key assumptions and intuitions. &lt;/p&gt;  &lt;p&gt; Notes:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; I give visual explanations for Tom Davidson's report, Semi-informative priors over AI timelines, and summarise the key assumptions and intuitions&lt;/li&gt;
&lt;li&gt; The diagrams can be found here – you can click on some of the boxes to get linked to the part of the report that you’re interested in&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; Thanks to the Epoch AI team for feedback and support! Thanks especially to Jaime Sevilla and Tom Davidson for providing detailed feedback.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; The framework in Semi-informative priors over AI timelines assumes a model of AGI development which consists of a sequence of Bernoulli trials, i.e. it treats each calendar year as a “trial” at building AGI with constant probability p of succeeding.&lt;/p&gt;&lt;p&gt; Image source: Davidson, 2021&lt;/p&gt;&lt;p&gt; However, we don’t know what this value of p is, so we use a generalisation of Laplace's rule of succession to estimate —there's a complex formula here. See the original text—. This is done by specifying a first-trial probability, the probability of successfully building AGI in the first year of AI research, together [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:51) Executive Summary&lt;/p&gt;&lt;p&gt;(04:26) Motivation&lt;/p&gt;&lt;p&gt;(05:25) Laplace's Rule of Succession&lt;/p&gt;&lt;p&gt;(09:09) Making the priors less uninformative&lt;/p&gt;&lt;p&gt;(12:33) Semi-informative priors demystified&lt;/p&gt;&lt;p&gt;(13:05) First-trial probability&lt;/p&gt;&lt;p&gt;(14:50) Number of virtual successes&lt;/p&gt;&lt;p&gt;(15:43) Regime start time&lt;/p&gt;&lt;p&gt;(17:13) Trial definition&lt;/p&gt;&lt;p&gt;(19:43) Putting things together: Final distribution&lt;/p&gt;&lt;p&gt;(19:47) Model Extensions&lt;/p&gt;&lt;p&gt;(20:25) Final Distribution&lt;/p&gt;&lt;p&gt;(22:08) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 18 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 13th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/grokking-semi-informative-priors?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/grokking-semi-informative-priors&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/318010ae017131d2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/318010ae017131d2.png" alt="Image source: Davidson, 2021" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/5801542dc4b941f6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/5801542dc4b941f6.png" alt="Flowchart showing probability distribution model for AGI prediction by year." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/0e96023dbd8299e3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/0e96023dbd8299e3.png" alt="Flowchart showing relationships between AGI probability distributions and trial parameters across different time measurements." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/6ad4f0c977586ace.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/6ad4f0c977586ace.png" alt="Line graph showing weighted average probability of AGI by 2036: 7.5%" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/318010ae017131d2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/318010ae017131d2.png" alt="Image source: Davidson, 2021" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/5801542dc4b941f6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/5801542dc4b941f6.png" alt="Adapted from Davidson (2021)" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/85d9ac5d722caad3.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/85d9ac5d722caad3.png" alt="Flowchart showing reference classes for evaluating first-trial probability of building AGI." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/3cba76c34b918e4c.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/3cba76c34b918e4c.png" alt="Diagram showing how number of virtual successes affects AGI probability updates." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/fb1ee9154e99e21b.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/fb1ee9154e99e21b.png" alt="Flowchart showing factors affecting observed failed trials in AGI development timing." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/d850b3372d993609.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/d850b3372d993609.png" alt="Flowchart showing how R&amp;amp;D input increases are measured and their relationship to AI economic growth models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/662be296ebfe5911.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/662be296ebfe5911.png" alt="Flowchart showing relationships between AGI probability distributions and trial data across different time frameworks." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/db0063f50c2027d1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-semi-informative-priors/db0063f50c2027d1.png" alt="Line graph showing weighted average probability of AGI by 2036: 7.5%" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 13 Jun 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">dc324953-bbc4-4eab-96fc-11ecd9da0c99</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/dc324953-bbc4-4eab-96fc-11ecd9da0c99.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22Grokking%20%E2%80%9CSemi-informative%20priors%20over%20AI%20timelines%E2%80%9D%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fgrokking-semi-informative-priors&amp;created_at=2026-05-18T19%3A09%3A09.260626%2B00%3A00&amp;duration=1391" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/grokking-semi-informative-priors</link>
      <itunes:duration>1391</itunes:duration>
    </item>
    <item>
      <title>“Grokking “Forecasting TAI with biological anchors”” by Anson Ho</title>
      <description>&lt;p&gt; Subtitle: I give a visual explanation of Ajeya Cotra's draft report, Forecasting TAI with biological anchors, summarising the key assumptions, intuitions, and conclusions.&lt;/p&gt;  &lt;p&gt; Notes:&lt;/p&gt;
&lt;ul&gt; 
&lt;li&gt; I give a visual explanation of Ajeya Cotra's draft report, Forecasting TAI with biological anchors (Cotra, 2020), summarising the key assumptions, intuitions, and conclusions&lt;/li&gt;
&lt;li&gt; The diagrams can be found here – you can click on some of the boxes to get linked to the part of the report that you’re interested in&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; Thanks to Michael Aird, Ashwin Acharya, and the Epoch AI team for suggestions and feedback! Special thanks to Jaime Sevilla and Ajeya Cotra for detailed feedback.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Executive Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Ajeya Cotra's biological anchors framework attempts to forecast the development of Transformative AI (TAI) by treating compute as a key bottleneck to AI progress. This lets us focus on a concrete measure (compute, measured in FLOP) as a proxy for the question “when will TAI be developed?” Given this, we can decompose the question into two main questions:&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; 2020 training compute requirements: How much compute will we need to train TAI, using 2020 machine learning architectures and algorithms?&lt;/li&gt;
&lt;li&gt; Affordability of [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:56) Executive Summary&lt;/p&gt;&lt;p&gt;(04:18) Motivation&lt;/p&gt;&lt;p&gt;(05:23) Why focus on compute?&lt;/p&gt;&lt;p&gt;(08:19) Framework&lt;/p&gt;&lt;p&gt;(13:59) Zooming Into the Biological Anchors&lt;/p&gt;&lt;p&gt;(14:33) Evolution anchor&lt;/p&gt;&lt;p&gt;(15:37) Lifetime anchor&lt;/p&gt;&lt;p&gt;(17:35) Neural network anchors&lt;/p&gt;&lt;p&gt;(19:04) Genome anchor&lt;/p&gt;&lt;p&gt;(20:00) Affordability of compute&lt;/p&gt;&lt;p&gt;(22:12) Putting Things Together: Final distribution&lt;/p&gt;&lt;p&gt;(24:04) Conclusion&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 13 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 6th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/grokking-bioanchors?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/grokking-bioanchors&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/ebc9a4adec5a148f.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/ebc9a4adec5a148f.png" alt="Flowchart showing relationships between transformative AI model probabilities and computational requirements." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/98e6d00adea86322.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/98e6d00adea86322.png" alt="Flowchart showing training FLOP components for genome and neural network anchors." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/efa53c18d6469743.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/efa53c18d6469743.png" alt="Flowchart showing evolution anchor based on total neuron FLOP over evolution." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/4236e30b7b4906d2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/4236e30b7b4906d2.png" alt="Flowchart showing 'Lifetime anchor' based on brain compute with supporting factors and contradictions." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/ff6309123d58174e.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/ff6309123d58174e.png" alt="Image source: (For the evolutionary tree) evogeneao Tree of Life Explorer" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/96a5a3f4a62e9e4c.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/96a5a3f4a62e9e4c.png" alt="Flowchart showing neural network anchors based on brain FLOP per second and parameter scaling laws." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/f962121d61f59381.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/f962121d61f59381.png" alt="Flowchart showing genome anchor concept with FLOP calculations and training data requirements." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/e47121ce6fda1588.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/e47121ce6fda1588.png" alt="Flowchart diagram showing factors influencing affordability of compute with multiple connected boxes." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/1fbd692d136208f4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/grokking-bioanchors/1fbd692d136208f4.png" alt="Graph showing cumulative probability of affordable transformative model training by year, conditional on different AI development scenarios." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 06 Jun 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">067cd64e-2a6f-46f5-b1bf-9caaed549d4b</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/067cd64e-2a6f-46f5-b1bf-9caaed549d4b.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Anson%2520Ho&amp;title=%22Grokking%20%E2%80%9CForecasting%20TAI%20with%20biological%20anchors%E2%80%9D%22%20by%20Anson%20Ho&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fgrokking-bioanchors&amp;created_at=2026-05-18T19%3A12%3A20.882206%2B00%3A00&amp;duration=1509" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/grokking-bioanchors</link>
      <itunes:duration>1509</itunes:duration>
    </item>
    <item>
      <title>“Projecting compute trends in machine learning” by Tamay Besiroglu, Lennart Heim, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: Projecting forward 70 years' worth of trends in the amount of compute used to train machine learning models.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; Using our dataset of milestone machine learning models, and our recent analysis of compute trends in ML, we project forward 70 years’ worth of trends in the amount of compute used to train machine learning models. Our simulations account for (a) uncertainty in estimates of the growth rates in compute usage during the Deep Learning (DL)-era and Pre-DL era, and (b) uncertainty over the ‘reversion date’, i.e. the date when the current DL-era compute trend (with a ~6 month doubling time) will end and revert to the historically more common trend associated with Moore's law. Assuming a reversion date of between 8 and 18 years, and without accounting for algorithmic progress, our projections suggest that the median of Cotra 2020's biological anchors may be surpassed around August 2046 [95% CI: Jun 2039, Jul 2060]. This suggests that historical rates of compute scaling, if sustained briefly (relative to how long these trends have been around so far), could result in the emergence of transformative models.&lt;/p&gt;&lt;p&gt; Our work can be replicated using this Colab notebook.&lt;/p&gt;&lt;p&gt; Note: we present projections, not [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:21) Summary&lt;/p&gt;&lt;p&gt;(01:57) Introduction&lt;/p&gt;&lt;p&gt;(04:19) When will the current scaling trend revert back to Moore's law?&lt;/p&gt;&lt;p&gt;(07:42) Projecting ML compute trends&lt;/p&gt;&lt;p&gt;(09:15) Conclusion&lt;/p&gt;&lt;p&gt;(10:02) Details of the simulations&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 1 footnote which was omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          March 7th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/projecting-compute-trends?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/projecting-compute-trends&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/cafe0092e57715bb.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/cafe0092e57715bb.jpg" alt="Figure 1. Contrasting our work with that of Cotra 2020" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/5c252528be80c64b.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/5c252528be80c64b.jpg" alt="Fig 2. Distributions that roughly correspond to the three scenarios that come out of our replication of Carey, 2018" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/28e15ef408cc9434.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/28e15ef408cc9434.jpg" alt="Fig 3. Our best guess for a prior over reversion dates, formed by mixing the previous distributions" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/d7251f9ab5fd82f3.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/d7251f9ab5fd82f3.jpg" alt="Fig 4. 10,000 projected compute paths. Solid line represents the median projected compute at each date, and the shaded region represents 2 standard deviations around the median." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/c9758f1068a60ecd.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/projecting-compute-trends/c9758f1068a60ecd.jpg" alt="Line graph showing w(t) declining over time for three reversion dates: 2025, 2030, 2035." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 07 Mar 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">19466908-8961-4f1e-9321-5a6c882dafc0</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/19466908-8961-4f1e-9321-5a6c882dafc0.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Tamay%2520Besiroglu%252C%2520Lennart%2520Heim%252C%2520Jaime%2520Sevilla&amp;title=%22Projecting%20compute%20trends%20in%20machine%20learning%22%20by%20Tamay%20Besiroglu%2C%20Lennart%20Heim%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fprojecting-compute-trends&amp;created_at=2026-05-18T19%3A12%3A20.922858%2B00%3A00&amp;duration=805" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/projecting-compute-trends</link>
      <itunes:duration>805</itunes:duration>
    </item>
    <item>
      <title>“Compute trends across three eras of machine learning” by Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn, Pablo Villalobos</title>
      <description>&lt;p&gt; Subtitle: We’ve compiled a dataset of the training compute for over 120 machine learning models, highlighting novel trends and insights into the development of AI since 1952, and what to expect going forward. &lt;/p&gt; 
&lt;p&gt; Summary: We have collected a dataset and analysed key trends in the training compute of machine learning models since 1950. We identify three major eras of training compute - the pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. Furthermore, we find that the training compute has grown by a factor of 10 billion since 2010, with a doubling time of around 5-6 months. See our recent paper, Compute Trends Across Three Eras of Machine Learning, for more details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; It is well known that progress in machine learning (ML) is driven by three primary factors - algorithms, data, and compute. This makes intuitive sense - the development of algorithms like backpropagation transformed the way that machine learning models were trained, leading to significantly improved efficiency compared to previous optimisation techniques (Goodfellow et al., 2016; Rumelhart et al., 1986). Data has been becoming increasingly available, particularly with the advent of “big data” in recent years. At the same time, progress in [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:00) Introduction&lt;/p&gt;&lt;p&gt;(03:30) Methodology&lt;/p&gt;&lt;p&gt;(05:42) Results&lt;/p&gt;&lt;p&gt;(06:42) Compute trends are slower than previously reported&lt;/p&gt;&lt;p&gt;(07:31) Three eras of machine learning&lt;/p&gt;&lt;p&gt;(09:44) Implications and further work&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 5 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          February 16th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/compute-trends?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/compute-trends&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Wed, 16 Feb 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">1485721b-dbf0-46e8-927c-b6ce3c5f9e0d</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/1485721b-dbf0-46e8-927c-b6ce3c5f9e0d.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Lennart%2520Heim%252C%2520Anson%2520Ho%252C%2520Tamay%2520Besiroglu%252C%2520Marius%2520Hobbhahn%252C%2520Pablo%2520Villalobos&amp;title=%22Compute%20trends%20across%20three%20eras%20of%20machine%20learning%22%20by%20Jaime%20Sevilla%2C%20Lennart%20Heim%2C%20Anson%20Ho%2C%20Tamay%20Besiroglu%2C%20Marius%20Hobbhahn%2C%20Pablo%20Villalobos&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fcompute-trends&amp;created_at=2026-05-18T19%3A12%3A21.81308%2B00%3A00&amp;duration=677" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/compute-trends</link>
      <itunes:duration>677</itunes:duration>
    </item>
    <item>
      <title>“Estimating training compute of deep learning models” by Jaime Sevilla, Lennart Heim, Marius Hobbhahn, Tamay Besiroglu, Anson Ho, Pablo Villalobos</title>
      <description>&lt;p&gt; Subtitle: We describe two approaches for estimating the training compute of Deep Learning systems, by counting operations and looking at GPU time.&lt;/p&gt; 
&lt;p&gt; ML models trained on more compute have better performance and more advanced capabilities (see e.g. Kaplan et al., 2020 or Hoffmann et al., 2022). Due to this, estimating and reporting compute usage is crucial to enable accurate comparisons between ML models.&lt;/p&gt;
&lt;p&gt; Compute usage is commonly measured as the number of floating point operations (FLOP) required to train the final version of the system. To estimate this we can resort to two strategies: a) using information about the architecture and amount of training data, or b) using information about the hardware used and training time.&lt;/p&gt;
&lt;p&gt; Below we provide two calculators that illustrate these methods.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt; Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; In this article we will explain (with examples) how to estimate the amount of compute used to train an AI system. We will explain two procedures, one based on the architecture of the network and number of training batches processed; and another based [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(01:16) Introduction&lt;/p&gt;&lt;p&gt;(02:05) Method 1: Counting operations in the model&lt;/p&gt;&lt;p&gt;(04:07) The ratio of backward pass operations to forward pass operations&lt;/p&gt;&lt;p&gt;(05:03) Forward pass compute and parameter counts of common layers&lt;/p&gt;&lt;p&gt;(06:45) Example: CNN-LSTM-FCN model&lt;/p&gt;&lt;p&gt;(10:12) Example: Transformer&lt;/p&gt;&lt;p&gt;(13:26) Method 2: GPU time&lt;/p&gt;&lt;p&gt;(14:18) Estimating the number of FLOP from the GPU time&lt;/p&gt;&lt;p&gt;(15:07) Which number representation is used?&lt;/p&gt;&lt;p&gt;(19:00) Imputing GPU performance when the hardware model is not known&lt;/p&gt;&lt;p&gt;(20:36) About GPU utilization rates&lt;/p&gt;&lt;p&gt;(24:42) Example: Image GPT&lt;/p&gt;&lt;p&gt;(26:58) Conclusion&lt;/p&gt;&lt;p&gt;(28:12) Acknowledgements&lt;/p&gt;&lt;p&gt;(28:43) Bibliography&lt;/p&gt; &lt;p&gt;&lt;i&gt;The original text contained 8 footnotes which were omitted from this narration.&lt;/i&gt; &lt;/p&gt;&lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          January 20th, 2022 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/estimating-training-compute?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/estimating-training-compute&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/0498ba4b4a26776f.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/0498ba4b4a26776f.png" alt="Figure 1: Diagram of the many-to-one CNN-LSTM-FCN from the example." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/e4bf1125a4957cb1.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/e4bf1125a4957cb1.png" alt="Figure 2: Diagram of the example Transformer architecture." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/9aa2a0a25102d108.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/9aa2a0a25102d108.png" alt="Figure 3: Specification sheet of NVIDIA A100 Tensor Core GPU. The asterisk indicates the performance assuming sparsity (which is only relevant for inference)." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/48dfc7af53605ca8.jpg" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/48dfc7af53605ca8.jpg" alt="Figure 4: Typical peak performance of commonly used hardware over time." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/c96249b5ae32053e.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2022/estimating-training-compute/c96249b5ae32053e.png" alt="Figure 5: Specification sheet of NVIDIA V100 GPU." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 20 Jan 2022 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">fedba029-049b-41db-baac-289eb199e1bc</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/fedba029-049b-41db-baac-289eb199e1bc.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Lennart%2520Heim%252C%2520Marius%2520Hobbhahn%252C%2520Tamay%2520Besiroglu%252C%2520Anson%2520Ho%252C%2520Pablo%2520Villalobos&amp;title=%22Estimating%20training%20compute%20of%20deep%20learning%20models%22%20by%20Jaime%20Sevilla%2C%20Lennart%20Heim%2C%20Marius%20Hobbhahn%2C%20Tamay%20Besiroglu%2C%20Anson%20Ho%2C%20Pablo%20Villalobos&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Festimating-training-compute&amp;created_at=2026-05-18T19%3A12%3A22.839928%2B00%3A00&amp;duration=1764" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/estimating-training-compute</link>
      <itunes:duration>1764</itunes:duration>
    </item>
    <item>
      <title>“What’s the backward-forward FLOP ratio for neural networks?” by Marius Hobbhahn, Jaime Sevilla</title>
      <description>&lt;p&gt; Subtitle: Determining the backward-forward FLOP ratio for neural networks, to help calculate their total training compute.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;ol&gt; 
&lt;li&gt; Classic settings, i.e. deep networks with convolutional layers and large batch sizes, almost always have backward-forward FLOP ratios close to 2:1.&lt;/li&gt;
&lt;li&gt; Depending on the following criteria, we can encounter ratios between 1:1 and 3:1:
&lt;ol&gt; 
&lt;li&gt; Type of layer: Passes through linear layers have as many FLOP as they use to do weight updates. Convolutional layers have many more FLOP for passes than for weight updates. Therefore, in CNNs, FLOP for weight updates basically play no role.&lt;/li&gt;
&lt;li&gt; Batch size: Weights are updated after the gradients of the batch have been aggregated. Thus, FLOP for passes increase with batch size but stay constant for weight updates.&lt;/li&gt;
&lt;li&gt; Depth: The first layer has a backward-forward ratio of 1:1 while all others have 2:1. Therefore, the overall ratio is influenced by the fraction of FLOP in first vs. FLOP in other layers.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt; We assume the network is being optimized by stochastic gradient descent (w += ɑ⋅dw) and count the weight update as part of the backward pass. Other optimizers would imply different FLOP counts and could create ratios [...]&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:21) Summary&lt;/p&gt;&lt;p&gt;(01:47) Introduction&lt;/p&gt;&lt;p&gt;(02:22) Theory&lt;/p&gt;&lt;p&gt;(04:49) Empirical results&lt;/p&gt;&lt;p&gt;(06:00) Backward and forward FLOP in the first and the rest of the layers&lt;/p&gt;&lt;p&gt;(06:39) Type of layer&lt;/p&gt;&lt;p&gt;(07:44) Batch size&lt;/p&gt;&lt;p&gt;(08:15) Depth&lt;/p&gt;&lt;p&gt;(09:41) Combining all above&lt;/p&gt;&lt;p&gt;(11:05) Conclusion&lt;/p&gt;&lt;p&gt;(11:37) Acknowledgment&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          December 13th, 2021 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/backward-forward-FLOP-ratio?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/backward-forward-FLOP-ratio&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/0696f12e1baa58bd.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/0696f12e1baa58bd.png" alt="Black screen with no visible content." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/6c88508315ca7e45.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/6c88508315ca7e45.png" alt="Diagram of a neural network with input, two hidden, and output layers." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/441155e1c6a9eb72.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/441155e1c6a9eb72.png" alt="Table showing neural network operations with direction, operation type, and FLOPs columns." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/5520b38d8fd4635d.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/5520b38d8fd4635d.png" alt="Table comparing parameters and floating-point operations for fully connected and CNN layers." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/5ab73032d2e6c9c7.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/5ab73032d2e6c9c7.png" alt="Table showing neural network layers with direction, operation type, and FLOPs columns." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/cefd5499ebe5f467.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/cefd5499ebe5f467.png" alt="Table showing FLOP counts and ratios for neural network operations across different batch sizes." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/24b350f25504745f.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/24b350f25504745f.png" alt="Bar graph titled &amp;quot;FLOP backward-forward ratios&amp;quot; showing ratios across intermediate layer counts." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/2002871ef45a034c.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/2002871ef45a034c.png" alt="Backward-forward FLOP ratio in different architectures. Read the labels as architecture_batchsize." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/4b8e17a80ffe29b2.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/backward-forward-FLOP-ratio/4b8e17a80ffe29b2.png" alt="Bar chart titled &amp;quot;FLOP backward-forward ratios&amp;quot; comparing three network types across different layer configurations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 13 Dec 2021 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">94b1c153-7ca5-4f39-bcd7-b6aa03f39bb6</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/94b1c153-7ca5-4f39-bcd7-b6aa03f39bb6.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Marius%2520Hobbhahn%252C%2520Jaime%2520Sevilla&amp;title=%22What%E2%80%99s%20the%20backward-forward%20FLOP%20ratio%20for%20neural%20networks%3F%22%20by%20Marius%20Hobbhahn%2C%20Jaime%20Sevilla&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fbackward-forward-FLOP-ratio&amp;created_at=2026-05-18T19%3A12%3A23.848363%2B00%3A00&amp;duration=738" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/backward-forward-FLOP-ratio</link>
      <itunes:duration>738</itunes:duration>
    </item>
    <item>
      <title>“How to measure FLOP for neural networks empirically?” by Marius Hobbhahn</title>
      <description>&lt;p&gt; Subtitle: Computing the utilization rate for multiple Neural Network architectures.&lt;/p&gt;  &lt;p&gt; Experiments and text by Marius Hobbhahn. I would like to thank Jaime Sevilla, Jean-Stanislas Denain, Tamay Besiroglu, Lennart Heim, and Anson Ho for their feedback and support. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt; Summary&lt;/strong&gt;&lt;/p&gt;&lt;p&gt; We measure the utilization rate of a Tesla P100 GPU for training different ML models. Most architectures and methods result in a utilization rate between 0.3 and 0.75. However, two architectures result in implausibly low utilization rates below 0.04. The most probable explanation for these outliers is that the profiler does not count FLOP for inverted bottleneck layers correctly. In general, the profiler we use shows signs of both under- and overcounting, and it is possible that we made errors ourselves.&lt;/p&gt;&lt;p&gt;&lt;strong&gt; Findings&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt; 
&lt;li&gt; Counting the FLOP for a forward pass is very simple and many different packages give correct answers.&lt;/li&gt;
&lt;li&gt; Counting the FLOP for the backward pass is harder and our estimator of choice makes weird overcounting and undercounting errors.&lt;/li&gt;
&lt;li&gt; After correcting these mistakes, it is very likely that the backward/forward ratio is 2:1 (at least for our setup).&lt;/li&gt;
&lt;li&gt; After correcting for the overcounting issues, we get empirical utilization rates between 0.3 and 0.75 for [...]&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(00:29) Summary&lt;/p&gt;&lt;p&gt;(01:07) Findings&lt;/p&gt;&lt;p&gt;(01:58) Introduction&lt;/p&gt;&lt;p&gt;(02:47) Methods for counting FLOP&lt;/p&gt;&lt;p&gt;(04:40) Our experimental setup&lt;/p&gt;&lt;p&gt;(07:19) Analysis&lt;/p&gt;&lt;p&gt;(07:21) Something is fishy with profiler_nvtx&lt;/p&gt;&lt;p&gt;(09:21) Investigating profiler_nvtx further&lt;/p&gt;&lt;p&gt;(12:33) Results&lt;/p&gt;&lt;p&gt;(12:51) Comparing batch sizes&lt;/p&gt;&lt;p&gt;(13:56) Backward-forward pass ratios&lt;/p&gt;&lt;p&gt;(15:06) Utilization rates&lt;/p&gt;&lt;p&gt;(16:46) Conclusion&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          November 29th, 2021 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/measure-FLOP-empirically?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/measure-FLOP-empirically&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/1e38577879d66f91.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/1e38577879d66f91.png" alt="Estimated GPU utilization rates on different architectures, using four different estimation setups." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/17eb46b1137bf2a4.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/17eb46b1137bf2a4.png" alt="Bar graph titled &amp;quot;FLOPs estimate by different methods&amp;quot; comparing neural network models across five measurement methods." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/33ed18c3620d86c0.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/33ed18c3620d86c0.png" alt="Bar graph showing &amp;quot;forward pass (optimally all bars have the same height)&amp;quot; with FLOPs ratio to batch size 1 across neural network models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/50a36d4525b3fe6b.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/50a36d4525b3fe6b.png" alt="Bar graph showing FLOPs ratio across batch sizes for various neural network architectures." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/da729afc6b132b13.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/da729afc6b132b13.png" alt="Table showing neural network layer statistics including operations, parameters, and ratios." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/06b82e28a1758fdb.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/06b82e28a1758fdb.png" alt="Table showing Conv2d layer parameters with rows 7-22 highlighted in blue." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/56d6509413aaf232.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/56d6509413aaf232.png" alt="Table showing neural network operations with FLOPs and execution time metrics." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/642689e7295b40e6.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/642689e7295b40e6.png" alt="Bar graph showing FLOPs ratio to batch size 1 for forward pass (clean) across neural network models." 
style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/5b2fe2bac0aea7bc.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/5b2fe2bac0aea7bc.png" alt="Bar chart showing &amp;quot;backward pass (clean) (optimally all bars have the same height)&amp;quot; across different neural network models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/efbffad93c36bc2b.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/efbffad93c36bc2b.png" alt="Bar graph titled &amp;quot;Training time by batch size&amp;quot; comparing performance across different neural network models." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/4dec606d793e5404.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/4dec606d793e5404.png" alt="Bar graph titled &amp;quot;comparing backward-forward ratios (clean)&amp;quot; showing ratios across different model configurations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/1e38577879d66f91.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/measure-FLOP-empirically/1e38577879d66f91.png" alt="Bar graph titled &amp;quot;Comparing utilization rates (clean)&amp;quot; showing performance across different model configurations." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. 
Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 29 Nov 2021 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">9c8a1e19-a431-48b6-955d-99dd2943426f</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/9c8a1e19-a431-48b6-955d-99dd2943426f.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Marius%2520Hobbhahn&amp;title=%22How%20to%20measure%20FLOP%20for%20neural%20networks%20empirically%3F%22%20by%20Marius%20Hobbhahn&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fmeasure-FLOP-empirically&amp;created_at=2026-05-18T19%3A12%3A24.735748%2B00%3A00&amp;duration=1051" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/measure-FLOP-empirically</link>
      <itunes:duration>1051</itunes:duration>
    </item>
    <item>
      <title>“Parameter counts in machine learning” by Jaime Sevilla, Pablo Villalobos, Juan Felipe Cerón</title>
      <description>&lt;p&gt; Subtitle: Compiling a large dataset of machine learning models to determine changes in the parameter counts of systems since 1952.&lt;/p&gt;  &lt;p&gt; In short: we have compiled information about the date of development and trainable parameter counts of n=139 machine learning systems between 1952 and 2021. This is, as far as we know, the biggest public dataset of its kind. You can access our dataset here, and the code to produce an interactive visualization is available here.&lt;/p&gt;
&lt;p&gt; We chose to focus on parameter count because previous work indicates that it is an important variable for model performance [1], because it serves as a proxy for model complexity, and because it is usually readily available or easily estimated from descriptions of model architecture. &lt;/p&gt;
&lt;p&gt; We hope our work will help AI researchers and forecasters understand one way in which models have become more complex over time, and ground their predictions of how the field will progress in the future. In particular, we hope this will help us tease apart how much of the progress in machine learning has been due to algorithmic improvements versus increases in model complexity.&lt;/p&gt;
&lt;p&gt; It is hard to draw firm conclusions from our [...]&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Outline:&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;(03:02) Features of the dataset&lt;/p&gt;&lt;p&gt;(04:31) Caveats&lt;/p&gt;&lt;p&gt;(06:02) Insights&lt;/p&gt;&lt;p&gt;(08:35) Open questions&lt;/p&gt;&lt;p&gt;(09:47) Next steps&lt;/p&gt;&lt;p&gt;(10:37) Acknowledgements&lt;/p&gt;&lt;p&gt;(11:09) Bibliography&lt;/p&gt; &lt;p&gt;---&lt;/p&gt;
          &lt;p&gt;&lt;b&gt;First published:&lt;/b&gt;&lt;br/&gt;
          June 19th, 2021 &lt;/p&gt;
        
        &lt;p&gt;&lt;b&gt;Source:&lt;/b&gt;&lt;br/&gt;
        &lt;a href="https://epoch.ai/publications/parameter-counts?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Source+URL+in+episode+description&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;https://epoch.ai/publications/parameter-counts&lt;/a&gt; &lt;/p&gt;
        &lt;p&gt;---&lt;/p&gt;
        &lt;p&gt;Narrated by &lt;a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&amp;utm_medium=Podcast&amp;utm_content=Narrated+by+TYPE+III+AUDIO&amp;utm_term=epoch_ai&amp;utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank"&gt;TYPE III AUDIO&lt;/a&gt;.&lt;/p&gt;
       &lt;p&gt;---&lt;/p&gt;&lt;div style="max-width: 100%;"&gt;&lt;p&gt;&lt;strong&gt;Images from the article:&lt;/strong&gt;&lt;/p&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/parameter-counts/194bf4d3224e543d.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/parameter-counts/194bf4d3224e543d.png" alt="Model size of popular new machine learning systems between 1954 and 2021. Includes n=139 datapoints." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;hr style="margin-top: 24px; margin-bottom: 24px;" /&gt;&lt;a href="https://epoch.ai/assets/images/posts/2021/parameter-counts/e1df66d747aa8b7b.png" target="_blank"&gt;&lt;img src="https://epoch.ai/assets/images/posts/2021/parameter-counts/e1df66d747aa8b7b.png" alt="Model size of popular new machine learning systems between 2000 and 2021. Includes n=114 datapoints." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Apple Podcasts and Spotify do not show images in the episode description. Try &lt;a href="https://pocketcasts.com/" target="_blank" rel="noreferrer"&gt;Pocket Casts&lt;/a&gt;, or another podcast app.&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description>
      <pubDate>Sat, 19 Jun 2021 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">302a07f5-448f-4bce-8a8a-95941e7ce316</guid>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:explicit>false</itunes:explicit>
      <enclosure url="https://dl.type3.audio/episode/302a07f5-448f-4bce-8a8a-95941e7ce316.mp3?request_source=rss&amp;client_id=epoch_ai&amp;feed_id=epoch_ai&amp;type=ai_narration&amp;author=Jaime%2520Sevilla%252C%2520Pablo%2520Villalobos%252C%2520Juan%2520Felipe%2520Cer%25C3%25B3n&amp;title=%22Parameter%20counts%20in%20machine%20learning%22%20by%20Jaime%20Sevilla%2C%20Pablo%20Villalobos%2C%20Juan%20Felipe%20Cer%C3%B3n&amp;source_url=https%3A%2F%2Fepoch.ai%2Fpublications%2Fparameter-counts&amp;created_at=2026-05-18T19%3A12%3A25.714937%2B00%3A00&amp;duration=686" length="0" type="audio/mpeg"/>
      <link>https://epoch.ai/publications/parameter-counts</link>
      <itunes:duration>686</itunes:duration>
    </item>
    <itunes:category text="Technology"/>
    <itunes:category text="Business">
      <itunes:category text="Non-Profit"/>
    </itunes:category>
    <link>https://epoch.ai</link>
    <itunes:image href="https://files.type3.audio/clients/epoch/cover.jpg?v=2"/>
    <itunes:owner>
      <itunes:email>infra@epoch.ai</itunes:email>
      <itunes:name>Epoch AI</itunes:name>
    </itunes:owner>
    <atom:link href="https://feeds.type3.audio/epoch-ai.rss" rel="self" type="application/rss+xml"/>
  </channel>
</rss>