A suitable problem is a search problem in which the value of each solution can be estimated from the results of a number of quick and simple tests; the quicker and simpler the tests, the better. SDS is very tolerant of noise, so it tends to be more suitable for real-world problems than for theoretical ones.
An intuitive example is locating the string 'hello' in a larger text. Each location in the text is a potential solution, and can be evaluated by the results of five tests, one for each letter: is there an 'h' at the location, an 'e' at the next position, and so on.
To define an SDS using this library you largely just need to define two things: a function which, given a source of randomness, returns a random hypothesis (a potential solution), and a list of functions, each of which takes a hypothesis and returns the result of a quick and simple test.
To define an SDS to perform the example task, you'd therefore need something like the following functions.
This function takes a random generator, which can be an instance of the Random class, a reference to the random module, or anything else that implements randint, and returns an integer representing a hypothesised location of the model in the larger text.
```python
larger_text = 'xxxhelxxxelloxxxxxxhelloxxxxx'
model = 'hello'

def random_hypothesis(random_obj):
    return random_obj.randint(0, len(larger_text) - len(model))
```

Of course, the larger_text variable can be much larger than the tiny string used here; by using a file object and the seek method, text files representing entire genomes have been successfully searched with SDS.
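As a minimal sketch of that idea, the microtests below read single characters from an open file instead of indexing a string. The make_file_microtest helper and the 'genome.txt' filename are hypothetical, not part of the sds library, and the sketch assumes a single-byte-per-character encoding so that seek offsets line up with character positions.

```python
import os

def make_file_microtest(handle, offset, expected_char):
    """Return a microtest that reads one character from an open file.

    Hypothetical helper for illustration; not part of the sds library.
    """
    def microtest(hyp):
        handle.seek(hyp + offset)        # jump straight to the letter under test
        return handle.read(1) == expected_char
    return microtest

# Usage sketch: one microtest per letter of the model.
# handle = open('genome.txt')
# size = os.path.getsize('genome.txt')
# microtests = [make_file_microtest(handle, i, c) for i, c in enumerate(model)]
# random_hypothesis = lambda rnd: rnd.randint(0, size - len(model))
```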
This defines microtests, a list of functions which each take a hypothesis generated by random_hypothesis and return the result of a simple test.
```python
microtests = [
    lambda hyp: larger_text[hyp] == 'h',
    lambda hyp: larger_text[hyp + 1] == 'e',
    lambda hyp: larger_text[hyp + 2] == 'l',
    lambda hyp: larger_text[hyp + 3] == 'l',
    lambda hyp: larger_text[hyp + 4] == 'o',
]
```

These functions have been defined manually, but Python provides a number of ways to define such functions automatically for more complex tasks. Even this example could have been produced by a single factory function which returns test functions like the ones defined here, as the full example at the end of this section shows.
With the random hypothesis function and the list of microtests defined, all that remains is to define a swarm. A swarm is simply a list of Agents, and Agents are simple data structures which each maintain a single hypothesis and a boolean state variable defining whether or not they are active.
```python
import sds

agent_count = 1000
swarm = sds.Agent.initialise(agent_count=agent_count)
```

After this code runs, swarm will be a list of 1000 inactive agents with their hypotheses uninitialised.
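As a quick sanity check you can inspect individual agents directly. The attribute names hypothesis and active below are assumptions based on the description above, not confirmed against the sds API, so check the library's own documentation for the exact interface.

```python
# Inspect the first agent in the freshly initialised swarm.
# NOTE: the attribute names `hypothesis` and `active` are assumed
# from the description above, not confirmed against the sds API.
first_agent = swarm[0]
print(first_agent.active)      # expected: False (all agents start inactive)
print(first_agent.hypothesis)  # expected: an uninitialised value such as None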
The SDS library implements three diffusion functions: Passive, Context-Free and Context-Sensitive. You should experiment with their different behaviours. When you need to choose a diffusion function, use one of sds.passive_diffusion, sds.context_free_diffusion, or sds.context_sensitive_diffusion. For now, stick with Passive Diffusion.
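For intuition, passive diffusion conventionally works roughly as follows: each inactive agent polls one randomly chosen agent, copies its hypothesis if that agent is active, and otherwise picks a fresh random hypothesis. The sketch below illustrates that standard behaviour; it is not the library's actual implementation, and it assumes the hypothesis/active attribute names used above.

```python
# Conceptual sketch of one passive diffusion step; illustration only,
# the real implementation is sds.passive_diffusion.
def passive_diffusion_sketch(swarm, random_hypothesis, rnd):
    for agent in swarm:
        if not agent.active:
            polled = rnd.choice(swarm)                     # poll one agent at random
            if polled.active:
                agent.hypothesis = polled.hypothesis       # adopt a promising hypothesis
            else:
                agent.hypothesis = random_hypothesis(rnd)  # explore a new location
```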
The entry point for running SDS is the run function; its main arguments are as follows:

- swarm: the list of agents
- microtests: the list of microtest functions
- random_hypothesis_function: the function which generates random hypotheses
- max_iterations: the maximum number of iterations to perform
- diffusion_function: one of the diffusion functions described above
- random: a source of randomness, such as an instance of the Random class or the random module
The run function returns its results as a collections.Counter of clusters, which details where the active agents have decided to congregate.
```python
import random

clusters = sds.run(
    swarm,
    microtests,
    random_hypothesis,
    max_iterations=100,
    diffusion_function=sds.passive_diffusion,
    random=random,
)
print(clusters.most_common(1))
```

There are other optional arguments available, but these are the main ones.
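Since the result is a standard collections.Counter, the usual Counter operations apply. For example, you can unpack the strongest cluster directly:

```python
# most_common(1) returns a list of (hypothesis, cluster_size) tuples;
# unpack the first to get the swarm's favoured location directly.
best_hypothesis, cluster_size = clusters.most_common(1)[0]
print(f"Best hypothesis: {best_hypothesis} ({cluster_size} active agents)")
```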
The combination of the make_microtest function and the microtests list comprehension in the example below is equivalent to the five manually defined functions in the section ``List of microtest functions''.
In the search_space of this example there are four locations at which four of the five letters of the word 'hello' appear in the correct sequence; these locations are marked with a | in the code. Play around with the search_space variable and see how the results change.
```python
import random

import sds

search_space = "xxhellxelloxhexhelxoxxxhxlloxxx"
#                 |   |        |       |
model = "hello"

def random_hyp(rnd):
    return rnd.randint(0, len(search_space) - len(model))

def make_microtest(offset):
    return lambda hyp: search_space[hyp + offset] == model[offset]

microtests = [make_microtest(offset) for offset in range(len(model))]

swarm = sds.Agent.initialise(agent_count=1000)

clusters = sds.run(
    swarm=swarm,
    microtests=microtests,
    random_hypothesis_function=random_hyp,
    max_iterations=300,
    diffusion_function=sds.passive_diffusion,
    random=random.Random(),
    report_iterations=10,
)

print(clusters.most_common())
```
Running this script should produce something similar to the following:
```
0 Activity: 0.184. 23: 32, 6: 29, 15: 28, 2: 22, 12: 20, 3: 14, 5: 11, 14: 7, 24: 7, 7: 6, 22: 6, 1: 2
10 Activity: 0.763. 6: 216, 2: 208, 23: 171, 15: 164, 1: 2, 3: 1, 24: 1
20 Activity: 0.766. 2: 232, 23: 200, 6: 183, 15: 148, 5: 1, 12: 1, 14: 1
30 Activity: 0.780. 2: 223, 6: 213, 23: 187, 15: 152, 3: 2, 12: 2, 22: 1
40 Activity: 0.745. 6: 185, 23: 185, 2: 184, 15: 184, 14: 2, 1: 1, 3: 1, 22: 1, 24: 1, 12: 1
50 Activity: 0.768. 15: 213, 2: 198, 6: 197, 23: 157, 12: 2, 5: 1
60 Activity: 0.772. 2: 213, 15: 205, 6: 179, 23: 169, 12: 2, 1: 1, 22: 1, 7: 1, 24: 1
70 Activity: 0.777. 15: 230, 2: 218, 6: 179, 23: 146, 1: 2, 24: 1, 12: 1
80 Activity: 0.768. 15: 240, 2: 199, 23: 178, 6: 146, 22: 1, 24: 1, 7: 1, 12: 1, 14: 1
90 Activity: 0.748. 15: 272, 2: 191, 23: 143, 6: 134, 1: 3, 3: 2, 7: 1, 12: 1, 14: 1
100 Activity: 0.780. 15: 285, 2: 199, 23: 164, 6: 124, 24: 2, 14: 2, 3: 1, 22: 1, 7: 1, 12: 1
110 Activity: 0.766. 15: 280, 2: 193, 23: 145, 6: 139, 12: 3, 1: 2, 3: 2, 5: 1, 22: 1
120 Activity: 0.757. 15: 273, 23: 169, 6: 163, 2: 146, 14: 3, 3: 1, 5: 1, 22: 1
130 Activity: 0.766. 15: 282, 23: 172, 6: 163, 2: 146, 24: 1, 7: 1, 12: 1
140 Activity: 0.724. 15: 262, 6: 159, 23: 156, 2: 144, 12: 2, 7: 1
150 Activity: 0.741. 15: 269, 23: 171, 2: 164, 6: 129, 1: 3, 12: 3, 22: 1, 14: 1
160 Activity: 0.775. 15: 260, 2: 180, 23: 169, 6: 161, 12: 3, 22: 1, 5: 1
170 Activity: 0.754. 15: 251, 2: 186, 23: 176, 6: 134, 12: 3, 1: 1, 3: 1, 22: 1, 7: 1
180 Activity: 0.750. 15: 234, 23: 186, 2: 174, 6: 154, 7: 1, 24: 1
190 Activity: 0.751. 15: 251, 23: 181, 2: 159, 6: 155, 12: 2, 1: 1, 5: 1, 24: 1
200 Activity: 0.746. 15: 241, 23: 192, 6: 175, 2: 135, 14: 2, 12: 1
210 Activity: 0.749. 15: 214, 6: 204, 23: 181, 2: 146, 1: 1, 3: 1, 24: 1, 7: 1
220 Activity: 0.763. 6: 207, 15: 196, 23: 191, 2: 164, 24: 2, 5: 1, 12: 1, 14: 1
230 Activity: 0.756. 2: 192, 6: 192, 23: 182, 15: 182, 14: 4, 12: 2, 22: 1, 7: 1
240 Activity: 0.750. 2: 222, 6: 181, 23: 173, 15: 170, 12: 2, 3: 1, 7: 1
250 Activity: 0.771. 2: 206, 23: 200, 6: 184, 15: 178, 1: 1, 22: 1, 7: 1
260 Activity: 0.753. 2: 194, 23: 192, 15: 187, 6: 174, 7: 2, 12: 2, 22: 1, 14: 1
270 Activity: 0.744. 15: 214, 23: 185, 2: 179, 6: 154, 5: 5, 1: 2, 12: 2, 24: 1, 22: 1, 14: 1
280 Activity: 0.767. 15: 214, 6: 192, 2: 176, 23: 174, 14: 3, 5: 2, 22: 2, 12: 2, 7: 1, 24: 1
290 Activity: 0.772. 2: 214, 15: 188, 6: 184, 23: 181, 1: 2, 3: 1, 5: 1, 14: 1
[(2, 195), (15, 191), (23, 190), (6, 186), (22, 2), (3, 1), (5, 1), (24, 1), (12, 1)]
```

Ignoring the activity lines for now (the activity is simply the proportion of the swarm that is active, so with 1000 agents an activity of 0.763 means 763 active agents), each tuple in the list at the bottom represents a cluster: the left number is the location of the cluster and the right number is the number of active agents at that location. The list is ordered with the largest clusters first, so (2, 195) means 195 agents are at location 2, and being the first tuple in the list, this must be the largest cluster. Note that the four largest clusters sit at locations 2, 15, 23 and 6, exactly the four partial matches marked with a | in the script.
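As a quick follow-up check (not part of the original script), you can score each of the strongest clusters against the microtests to confirm that each one matches four of the five letters:

```python
# Score the four strongest clusters by counting how many microtests pass.
for location, size in clusters.most_common(4):
    score = sum(test(location) for test in microtests)
    print(f"location {location}: {size} agents, {score}/{len(microtests)} letters match")
```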