Inteligentne Systemy Autonomiczne

Inteligentne Systemy Autonomiczne
Sieci Hopfielda W oparciu o wyklad Prof. Geoffrey Hinton University of Toronto materialy z oraz Prof. Włodzisława Ducha Uniwersytet Mikołaja Kopernika Janusz A. Starzyk Wyzsza Szkola Informatyki i Zarzadzania w Rzeszowie

Sieci Hopfielda John Hopfield opracowal model pamięci autoasocjacyjnej. Założenia: Siec Hopfielda jest siecia rekurencyjna Wszystkie neurony są ze sobą połączone (fully connected network) z wagami synaps Wij. Macierz wag połączeń jest symetryczna, Wi,i=0, Wij = Wji. Symetria jest wygodna z teoretycznego punktu widzenia, pozwala wprowadzić funkcje energii; jest nierealistyczna z biologicznego punktu widzenia.

Struktura Sieci Hopfielda
Dyskretny stan neuronu - Vi = ±1 = sign (I(V)) Siec Hopfielda moze miec albo jednostki o wartosciach 1 lub -1, albo jednostki o wartosciach 1 lub 0. Binarne zasady decyzyjne w oparciu o okreslony prog powoduja zbieznosc sieci do minimum funkcji energii. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", Proc. of the National Academy of Sciences of the USA, vol. 79 no. 8 pp , April 1982.

Model Hopfielda - dynamika
Wektor potencjałów wejściowych V(0)=Vini , czyli wejście = wyjście. Dynamika (iteracje) : sieć Hopfielda osiąga stany stacjonarne = odpowiedzi sieci (wektory aktywacji elementów) na zadane pytanie Vini (autoasocjacja). Stany stacjonarne = atraktory punktowe. t - czas dyskretny (numer iteracji).

Funkcja Energii Dla sieci o symetrycznych wagach taka dynamika prowadzi do minimalizacji funkcji typu energii. Energia ukladu jest suma wielu skladowych. Kazda skladowa zalezy od wagi polaczenia i binarnych stanow dwoch neuronow : Prosta funkcja energii sumy kwadratow pozwala latwo obliczyc jak stan neuronu wplywa na calkowita energie ukladu:

Minimalizacja energii
Wybiez jeden nuron i odwroc jego stan jesli zmniejsza to funkcje energii. Znajduje minima w sieci Hopfielda Jesli dwie jednostki sa zmienione jednoczesnie to energia moze wzrosnoac. -4 -100 +5 +5

Minimalizacja energii
W teorii układów dynamicznych funkcja energii odpowiada funkcji Lapunova, w fizyce statystycznej funkcji Hamiltona, w teorii optymalizacji funkcji celu lub kosztu, w obliczeniach ewolucyjnych funkcja przystosowania etc. Zmiana energii w czasie iteracji jest 0 Jeśli Ii  0 to Vi nie może zmaleć, więc energia zmaleje; Jeśli Ii < 0 to D Vi < 0, energia również zmaleje.

Atraktory Dynamika: Ruch po hiperpowierzchni energii, zależnej od potencjałów neuronów, aż do osiągnięcia lokalnego minimum na takiej powierzchni. Jeśli Vi dyskretne to ruch po rogach hipersześcianu.

Zbieznosc do minimum Atraktory punktowe - tylko dla symetrycznych połączeń. Stany stabilne: minima lokalne E(W) odpowiadające pamiętanym wzorcom Vi - pamięć asocjacyjna. Prawdopodobieństwo aktywacji: sigmoidalne. W wysokiej T przypadkowe błądzenie, stopniowe studzenie pozwala unikać płytkich minimów lokalnych.

Uczenie Jesli uzyjemy aktywizacje 1 -1, to mozemy zapamietac wektor stanu poprzez zwiekszanie wagi miedzy dwoma jednostkami uzywajac iloczynu ich aktywizacji. Wartosci progowe traktujemy jako wagi jednostek ciagle aktywnych Dla aktywizacji 0 1 zasada jest nieco bardziej zlozona. Jeśli korelacja pomiędzy wzorcami słaba to zbieżność. Lepsze rezultaty: metoda pseudoinwersji:

Pojemność modelu Hopfielda
Odwracania macierzy V można uniknąć iteracyjną metodą rzutowania: 2N możliwych stanów sieci binarnej złożonej z N neuronów. Zbyt wiele wzorców prowadzi do chaosu i zapominania. Liczba poprawnie pamiętanych wzorców: dla procentu błędów 0.37% wynosi a= 0.138*N Około 7 neuronów/N-bitowy wzorzec lub 7 połączeń/bit.

Jak uzyc ten typ uczenia
Hopfield zaproponowal ze pamiec moga stanowic minima funkcji energii w sieci neuronowej. Binarne, progowe metody decyzyjne moga byc uzyte do “wyczyszczenia” niepelnych lub zaszumionych pamieci. Prowadzi to do pamięci adresowalnych kontekstowo (content-addressable memory – CAM) ktore moga byc odczytywane poprzez znajomosc czesci zapamietanej informacji (tak jak w Google) Jest to pamiec odporna na uszkodzenia hardware’u.

Optymalizacja Zagadnienia NP-trudne: jak zastosować sieć Hopfielda?
Przykład: najkrótsza droga pomiędzy N miastami. Macierz nia i=1,2..N, nr. miasta a - kolejność Funkcja kosztów: min. droga + 1 w wierszu + 1 w kolumnie Jak dobrać W?

Dobór wag Zagadnienia NP-trudne: jak zastosować sieć Hopfielda?
Przykład: najkrótsza droga pomiędzy N miastami. Odległość + 1 w wierszu + 1 w kolumnie N miast

Spełnianie ograniczeń
Rozwiązania mogą nie spełniać ograniczeń, obliczanie odbywa się wewnątrz hiperkostki, ma końcu osiągany jest stan poprawny. Metody optymalizacji - operacje dyskretne, zawsze poprawne. Zagadnienia wymagające spełniania ograniczeń i optymalizacji: Problem N królowych: umieścić je na szachownicy NxN tak, by się nie szachowały. Problem ustawienia skoczków, problem plecakowy ... Problem rutowania pakietów w sieciach pakietowych. Dobór funkcji kosztu, metody minimalizacji - intensywnie badane. Metody wyspecjalizowane radzą sobie lepiej ale wyrafinowane wersje metod pola średniego dają doskonałe rezultaty.

KOT Model Hopfielda i percepcja
Interpretacja sygnałów dochodzących do mózgu nie jest jednoznaczna. Interpretacja musi spełniać ograniczenia: Tylko jedna litera na danej pozycji. Obecność danej litery aktywizuje rozpoznanie słowa. Cecha na danej pozycji aktywizuje rozpoznanie litery. KOT

3 słowa K.. Ą.. .A. ..T ..P KAT KĄT KAP

Pytania?

Sieci Hopfielda Dalsza czesc materialu nie jest wymagana do egzaminu

Avoiding spurious minima by unlearning
Hopfield, Feinstein and Palmer suggested the following strategy: Let the net settle from a random initial state and then do unlearning. This will get rid of deep , spurious minima and increase memory capacity. Crick and Mitchison proposed unlearning as a model of what dreams are for. That’s why you don’t remember them (Unless you wake up during the dream) But how much unlearning should we do? And can we analyze what unlearning achieves?

Faza snu Sen może być okresem, w którym mózg prowadzi optymalizację zużycia swoich zasobów, utrwalając pewne zdarzenia/fakty i usuwając z pamięci pozostałe. W modelu CAM Hopfielda wartości ostatnio poznane są szybciej przypominane. Wzorce odpowiadające fałszywym minimom można wyeliminować pokazując antywzorce, związane z fałszywymi, płytkimi minimami. Przypadkowe błądzenie wśród zniekształconych wzorców - sen? Niektóre neurochipy do prawidłowej pracy muszą działać przez pewien czas bez żadnych sygnałów wejściowych - okres kalibracji.

An iterative storage method
Instead of trying to store vectors in one shot as Hopfield does, cycle through the training set many times. use the perceptron convergence procedure to train each unit to have the correct state given the states of all the other units in that vector. This uses the capacity of the weights efficiently.

Another computational role for Hopfield nets
Hidden units. Used to represent an interpretation of the inputs Instead of using the net to store memories, use it to construct interpretations of sensory input. The input is represented by the visible units. The interpretation is represented by the states of the hidden units. The badness of the interpretation is represented by the energy This raises two difficult issues: How do we escape from poor local minima to get good interpretations? How do we learn the weights on connections to the hidden units? Visible units. Used to represent the inputs

An example: Interpreting a line drawing
Use one “2-D line” unit for each possible line in the picture. Any particular picture will only activate a very small subset of the line units. Use one “3-D line” unit for each possible 3-D line in the scene. Each 2-D line unit could be the projection of many possible 3-D lines. Make these 3-D lines compete. Make 3-D lines support each other if they join in 3-D. Make them strongly support each other if they join at right angles. 3-D lines Join in 3-D at right angle Join in 3-D 2-D lines picture

Noisy networks find better energy minima
A Hopfield net always makes decisions that reduce the energy. This makes it impossible to escape from local minima. We can use random noise to escape from poor minima. Start with a lot of noise so its easy to cross energy barriers. Slowly reduce the noise so that the system ends up in a deep minimum. This is “simulated annealing”. A B C

Stochastic units Replace the binary threshold units by binary stochastic units that make biased random decisions. The “temperature” controls the amount of noise Decreasing all the energy gaps between configurations is equivalent to raising the noise level. temperature

The annealing trade-off
At high temperature the transition probabilities for uphill jumps are much greater. At low temperature the equilibrium probabilities of good states are much better than the equilibrium probabilities of bad ones. Energy increase

How temperature affects transition probabilities
High temperature transition probabilities A B Low temperature transition probabilities A B

Thermal equilibrium Thermal equilibrium is a difficult concept!
It does not mean that the system has settled down into the lowest energy configuration. The thing that settles down is the probability distribution over configurations. The best way to think about it is to imagine a huge ensemble of systems that all have exactly the same energy function. The probability distribution is just the fraction of the systems that are in each possible configuration. We could start with all the systems in the same configuration, or with an equal number of systems in each possible configuration. After running the systems stochastically in the right way, we eventually reach a situation where the number of systems in each configuration remains constant even though any given system keeps moving between configurations

An analogy Imagine a casino in Las Vegas that is full of card dealers (we need many more than 52! of them). We start with all the card packs in standard order and then the dealers all start shuffling their packs. After a few time steps, the king of spades still has a good chance of being next to queen of spades. The packs have not been fully randomized. After prolonged shuffling, the packs will have forgotten where they started. There will be an equal number of packs in each of the 52! possible orders. Once equilibrium has been reached, the number of packs that leave a configuration at each time step will be equal to the number that enter the configuration. The only thing wrong with this analogy is that all the configurations have equal energy, so they all end up with the same probability.

Learning multilayer networks
We want to learn models with multiple layers of non-linear features. Perceptrons: Use a layer of hand-coded, non-adaptive features followed by a layer of adaptive decision units. Needs supervision signal for each training case. Only one layer of adaptive weights. Back-propagation: Use multiple layers of adaptive features and train by backpropagating error derivatives Learning time scales poorly for deep networks. Support Vector Machines: Use a very large set of fixed features Does not learn multiple layers of features

Learning multi-layer networks (continued)
Boltzmann Machines: Nice local learning rule that works in arbitrary networks. Getting the pairwise statistics required for learning can be very slow if the hidden units are interconnected. Maximum likelihood learning requires unbiased samples from the model’s distribution. These are very hard to get. Restricted Boltzmann Machines: Exact inference is very easy when the states of the visible units are known because then the hidden states are independent. Maximum likelihood learning is still slow because of the need to get samples from the model, but contrastive divergence learning is fast and often works well. But we can only learn one layer of adaptive features!

A guarantee Can we prove that adding more layers will always help?
It would be very nice if we could learn a big model one layer at a time and guarantee that as we add each new hidden layer the model gets better. We can actually guarantee the following: There is a lower bound on the log probability of the data. Provided that the layers do not get smaller and the weights are initialized correctly (which is easy), every time we learn a new hidden layer this bound is improved (unless its already maximized). The derivation of this guarantee is quite complicated.

Realizacja sprzętowa

Równania - sprzętowo Prosta realizacja sprzętowa, elektroniczna lub optyczna. W stanie stacjonarnym wejście=wyjście. Równania na sygnały wejściowe: Ui - napięcie wejściowe i-tego wzmacniacza Vi - napięcie wyjściowe i-tego wzmacniacza C - pojemność wejściowa Ii - zewnętrzny prąd i-tego wzmacniacza

Inteligentne Systemy Autonomiczne

Podobne prezentacje

Prezentacja na temat: "Inteligentne Systemy Autonomiczne"— Zapis prezentacji:

Podobne prezentacje

О projekcie

Zwrotny adres

Wejść

Zaloguj się poprzez sieć społeczną:

Inteligentne Systemy Autonomiczne

Podobne prezentacje

Prezentacja na temat: "Inteligentne Systemy Autonomiczne"— Zapis prezentacji:

Podobne prezentacje

О projekcie

Zwrotny adres