DApp Store | Web3 Hub tapahtumille ja peleille

Trendaavat aiheet

Jürgen Schmidhuber

Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.

Who invented convolutional neural networks (CNNs)? 1969: Fukushima had CNN-relevant ReLUs [2]. 1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than today. 1987: Waibel applied Linnainmaa's 1970 backpropagation [3] to weight-sharing TDNNs with 1-dimensional convolutions [4]. 1988: Wei Zhang et al. applied "modern" backprop-trained 2-dimensional CNNs to character recognition [5]. All of the above was published in Japan 1979-1988. 1989: LeCun et al. applied CNNs again to character recognition (zip codes) [6,10]. 1990-93: Fukushima’s downsampling based on spatial averaging [1] was replaced by max-pooling for 1-D TDNNs (Yamaguchi et al.) [7] and 2-D CNNs (Weng et al.) [8]. 2011: Much later, my team with Dan Ciresan made max-pooling CNNs really fast on NVIDIA GPUs. In 2011, DanNet achieved the first superhuman pattern recognition result [9]. For a while, it enjoyed a monopoly: from May 2011 to Sept 2012, DanNet won every image recognition challenge it entered, 4 of them in a row. Admittedly, however, this was mostly about engineering & scaling up the basic insights from the previous millennium, profiting from much faster hardware. Some "AI experts" claim that "making CNNs work" (e.g., [5,6,9]) was as important as inventing them. But "making them work" largely depended on whether your lab was rich enough to buy the latest computers required to scale up the original work. It's the same as today. Basic research vs engineering/development - the R vs the D in R&D. REFERENCES [1] K. Fukushima (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979. [2] K. Fukushima (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322-333. This work introduced rectified linear units (ReLUs), now used in many CNNs. [3] S. Linnainmaa (1970). Master's Thesis, Univ. Helsinki, 1970. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation. (See Schmidhuber's well-known backpropagation overview: "Who Invented Backpropagation?") [4] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. Backpropagation for a weight-sharing TDNN with 1-dimensional convolutions. [5] W. Zhang, J. Tanida, K. Itoh, Y. Ichioka. Shift-invariant pattern recognition neural network and its optical architecture. Proc. Annual Conference of the Japan Society of Applied Physics, 1988. First backpropagation-trained 2-dimensional CNN, with applications to English character recognition. [6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. See also Sec. 3 of [10]. [7] K. Yamaguchi, K. Sakamoto, A. Kenji, T. Akabane, Y. Fujimoto. A Neural Network for Speaker-Independent Isolated Word Recognition. First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, Nov 1990. A 1-dimensional convolutional TDNN using Max-Pooling instead of Fukushima's Spatial Averaging [1]. [8] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, pp. 121-128. A 2-dimensional CNN whose downsampling layers use Max-Pooling (which has become very popular) instead of Fukushima's Spatial Averaging [1]. [9] In 2011, the fast and deep GPU-based CNN called DanNet (7+ layers) achieved the first superhuman performance in a computer vision contest. See overview: "2011: DanNet triggers deep CNN revolution." [10] How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023. See also the YouTube video for the Bower Award Ceremony 2021: J. Schmidhuber lauds Kunihiko Fukushima.

AGI? One day, but not yet. The only AI that works well right now is the one behind the screen [12-17]. But passing the Turing Test [9] behind a screen is easy compared to Real AI for real robots in the real world. No current AI-driven robot could be certified as a plumber [13-17]. Hence, the Turing Test isn't a good measure of intelligence (and neither is IQ). And AGI without mastery of the physical world is no AGI. That’s why I created the TUM CogBotLab for learning robots in 2004 [5], co-founded a company for AI in the physical world in 2014 [6], and had teams at TUM, IDSIA, and now KAUST work towards baby robots [4,10-11,18]. Such soft robots don't just slavishly imitate humans and they don't work by just downloading the web like LLMs/VLMs. No. Instead, they exploit the principles of Artificial Curiosity to improve their neural World Models (two terms I used back in 1990 [1-4]). These robots work with lots of sensors, but only weak actuators, such that they cannot easily harm themselves [18] when they collect useful data by devising and running their own self-invented experiments. Remarkably, since the 1970s, many have made fun of my old goal to build a self-improving AGI smarter than myself and then retire. Recently, however, many have finally started to take this seriously, and now some of them are suddenly TOO optimistic. These people are often blissfully unaware of the remaining challenges we have to solve to achieve Real AI. My 2024 TED talk [15] summarises some of that. REFERENCES (easy to find on the web): [1] J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks (NNs) for dynamic reinforcement learning and planning in non-stationary environments. TR FKI-126-90, TUM, Feb 1990, revised Nov 1990. This paper also introduced artificial curiosity and intrinsic motivation through generative adversarial networks where a generator NN is fighting a predictor NN in a minimax game. [2] J. S. A possibility for implementing curiosity and boredom in model-building neural controllers. In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991. Based on [1]. [3] J.S. AI Blog (2020). 1990: Planning & Reinforcement Learning with Recurrent World Models and Artificial Curiosity. Summarising aspects of [1][2] and lots of later papers including [7][8]. [4] J.S. AI Blog (2021): Artificial Curiosity & Creativity Since 1990. Summarising aspects of [1][2] and lots of later papers including [7][8]. [5] J.S. TU Munich CogBotLab for learning robots (2004-2009) [6] NNAISENSE, founded in 2014, for AI in the physical world [7] J.S. (2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning (RL) Controllers and Recurrent Neural World Models. arXiv 1210.0118. Sec. 5.3 describes an RL prompt engineer which learns to query its model for abstract reasoning and planning and decision making. Today this is called "chain of thought." [8] J.S. (2018). One Big Net For Everything. arXiv 1802.08864. See also patent US11853886B2 and my DeepSeek tweet: DeepSeek uses elements of the 2015 reinforcement learning prompt engineer [7] and its 2018 refinement [8] which collapses the RL machine and world model of [7] into a single net. This uses my neural net distillation procedure of 1991: a distilled chain of thought system. [9] J.S. Turing Oversold. It's not Turing's fault, though. AI Blog (2021, was #1 on Hacker News) [10] J.S. Intelligente Roboter werden vom Leben fasziniert sein. (Intelligent robots will be fascinated by life.) F.A.Z., 2015 [11] J.S. at Falling Walls: The Past, Present and Future of Artificial Intelligence. Scientific American, Observations, 2017. [12] J.S. KI ist eine Riesenchance für Deutschland. (AI is a huge chance for Germany.) F.A.Z., 2018 [13] H. Jones. J.S. Says His Life's Work Won't Lead To Dystopia. Forbes Magazine, 2023. [14] Interview with J.S. Jazzyear, Shanghai, 2024. [15] J.S. TED talk at TED AI Vienna (2024): Why 2042 will be a big year for AI. See the attached video clip. [16] J.S. Baut den KI-gesteuerten Allzweckroboter! (Build the AI-controlled all-purpose robot!) F.A.Z., 2024 [17] J.S. 1995-2025: The Decline of Germany & Japan vs US & China. Can All-Purpose Robots Fuel a Comeback? AI Blog, Jan 2025, based on [16]. [18] M. Alhakami, D. R. Ashley, J. Dunham, Y. Dai, F. Faccio, E. Feron, J. Schmidhuber. Towards an Extremely Robust Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms. Preprint arxiv 2404.08093, 2024.

Johtavat

Rankkaus

Suosikit

Ketjussa trendaava

Trendaa X:ssä

Viimeisimmät suosituimmat rahoitukset

Merkittävin