Daniel Kang

Assistant Professor of CS at UIUC. Previously at Stanford's DAWN Lab and Berkeley's Sky Lab.

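Daniel Kang · Aug 12, 01:27

The current consensus is that compute is the most important factor in frontier AI training. We think this is wrong: data is the most expensive and most important component of AI training. We collected revenue estimates for the major data-labeling companies and compared them with the marginal compute cost of training the top models of 2024. Our estimates show that data labeling costs roughly 3x the marginal training compute. 1/8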

Daniel Kang · Aug 5, 05:23

We won first place in the Benchmarks and Evaluations track at the Berkeley AgentX Summit! Congrats to the team :)

Daniel Kang · Jul 9, 2025

As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks misestimate agent competence by 1.6-100%. Why are the evaluation foundations for agentic systems fragile? See below for thread and links 1/8
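The WebArena failure above is a grader accepting an unevaluated expression ("45+8 minutes") in place of the expected duration ("63 minutes"). As a hypothetical illustration only, not WebArena's actual evaluator, a stricter check could require the answer to be a plain duration, normalize it to minutes, and compare it numerically with the reference. The names `parse_duration_minutes` and `is_correct` and the small unit table are assumptions made for this sketch.

```python
import re
from typing import Optional

# Hypothetical unit table for this sketch; a real benchmark would need more units.
_UNIT_MINUTES = {
    "minute": 1, "minutes": 1, "min": 1, "mins": 1,
    "hour": 60, "hours": 60, "hr": 60, "hrs": 60,
}

# The whole answer must be "<number> <unit>" components, optionally separated
# by "," or "and". Anything else (e.g. the "+" in "45+8 minutes") fails.
_ANSWER = re.compile(r"(?:\d+(?:\.\d+)?\s*[a-z]+\s*(?:(?:,|and)\s*)?)+")
# One "<number> <unit>" component, e.g. "1 hour" or "3 minutes".
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def parse_duration_minutes(answer: str) -> Optional[float]:
    """Return the duration in minutes, or None if the answer is not a plain duration."""
    text = answer.strip().lower()
    if not _ANSWER.fullmatch(text):
        return None
    total = 0.0
    for number, unit in _COMPONENT.findall(text):
        if unit not in _UNIT_MINUTES:
            return None  # unknown unit: refuse to guess
        total += float(number) * _UNIT_MINUTES[unit]
    return total


def is_correct(agent_answer: str, reference: str, tol: float = 1e-9) -> bool:
    """Mark correct only if both answers parse and denote the same duration."""
    got = parse_duration_minutes(agent_answer)
    want = parse_duration_minutes(reference)
    return got is not None and want is not None and abs(got - want) <= tol
```

Under this check, `is_correct("45+8 minutes", "63 minutes")` is false because the answer does not parse, while `"1 hour and 3 minutes"` is accepted. Rejecting unparseable answers outright, rather than evaluating arithmetic inside them, keeps the grader from crediting an agent for work it never finished.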

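Daniel Kang · Jul 29, 2025

I'm bad at posting things on time! (My excuse is that I'm teaching at AddisCoder again this year.) The poster session for this paper is happening now! Find @ChuxuanHu at Session 5: V-Gather, Jul 28, 2025, 18:00-19:30, and say hi :)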

Daniel Kang · Jul 29, 2025

Can AI agents assess the reproducibility of research findings? Our #ACL2025 paper shows that they fall short on REPRO-Bench, a new benchmark that evaluates agents on real-world social science reproducibility tasks drawn from 112 papers, with full PDFs, code, and data. Our highest-performing agent scores <40%! 1/6

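Daniel Kang · Jul 23, 2025

SWE-bench Verified is the gold standard for evaluating coding agents: 500 real-world issues + OpenAI's tests. Sounds airtight? Not quite. We show that passing the unit tests is not the same as matching the ground truth. In our ACL paper, we fix the flawed evaluation: 24% of agents move up or down the leaderboard! 1/7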

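Daniel Kang · Jun 26, 2025

Reinforcement learning has enabled LLMs to beat humans in programming and math competitions, and it has driven recent progress (OpenAI's o-series, Anthropic's Claude 4). Will RL achieve broad generalization the way pretraining does? Not with current techniques 🧵 1/7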

Daniel Kang · Jun 24, 2025

I'll be presenting at Poster Session 2 at SIGMOD (Wednesday 16:00, Potsdam II). Come say hi!

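Daniel Kang · Jun 24, 2025

Approximate query processing (AQP) can speed up long-running analytical queries by orders of magnitude. So why is AQP still rare in production? To address this, we built PilotDB, an online AQP middleware that requires zero changes to the DBMS, returns results with a priori error guarantees, and achieves speedups of up to 126x. 1/8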
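The thread does not describe PilotDB's algorithm, so the sketch below is a generic illustration of sampling-based AQP, not PilotDB's method. It estimates a `SUM` from a uniform random sample (drawn with replacement) and uses Hoeffding's inequality to pick the sample size before the query runs, so the error bound is fixed a priori. It assumes every value lies in a known range `[lo, hi]`; the function names `sample_size_for` and `approx_sum` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def sample_size_for(eps: float, delta: float, lo: float, hi: float) -> int:
    """Rows needed so the sample mean is within eps of the true mean w.p. >= 1 - delta.

    Hoeffding: P(|mean_hat - mean| >= eps) <= 2 * exp(-2 * n * eps**2 / (hi - lo)**2).
    """
    return math.ceil((hi - lo) ** 2 * math.log(2 / delta) / (2 * eps ** 2))


def approx_sum(values: Sequence[float], eps: float, delta: float,
               lo: float, hi: float, seed: int = 0) -> Tuple[float, float]:
    """Estimate sum(values); return (estimate, half_width).

    With probability >= 1 - delta, the true sum lies within
    estimate +/- half_width, provided every value lies in [lo, hi].
    """
    N = len(values)
    n = sample_size_for(eps, delta, lo, hi)
    if n >= N:
        # Sampling would not save work: fall back to the exact answer.
        return float(sum(values)), 0.0
    rng = random.Random(seed)
    sample = rng.choices(values, k=n)  # i.i.d. draws, as Hoeffding assumes
    mean_hat = sum(sample) / n
    # Scaling the mean bound by N turns it into a bound on the SUM.
    return N * mean_hat, N * eps
```

For example, with values in `[0, 100]`, `eps=1.0`, and `delta=0.05`, the sketch reads 18,445 rows regardless of table size, which is where the speedup on large tables comes from. Hoeffding's bound is deliberately conservative; production AQP systems typically use tighter, data-dependent bounds.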

Daniel Kang · Apr 29, 2025

@ZhanQiusi1 will present our work at the poster session on Wednesday at 11 AM and at the TrustNLP workshop on Saturday (spotlight talk)! If you see her, say hi

Daniel Kang · Mar 13, 2025

AI agents are increasingly popular (e.g., OpenAI's Operator) but can be attacked to harm users! In our NAACL 2025 Findings paper, we show that even with defenses, AI agents can still be compromised by indirect prompt injections through "adaptive attacks" 🧵 Thread and links below

Daniel Kang · Apr 20, 2025

I'll be at #ICLR2025 this year! I'm giving a talk at the Alignment workshop and leading a small-group session at the ML Safety Social. If you see me, say hi

696