GenoKey enables large-scale combinatorial analysis through a geometric approach to problem solving: it eliminates non-significant combinations and then runs massively parallel computations on the remaining ones. GenoKey first builds a model of the problem space, consisting of a set of variables and a set of relations specifying the constraints on these variables. Rules are not limited to the usual IF-THEN constructions but can represent any relation between given variables on finite domains. Each user-defined relation is compiled into a truth table specifying the valid combinations. A single relation can usually represent the same constraints as many IF-THEN rules, and it is easy to generate automatically from existing structured data, e.g. a relational database.
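To illustrate the difference between a relation and a set of IF-THEN rules, the sketch below compiles a small relation, given as rows the way it might be exported from a database table, into a truth table over the Cartesian product of the variable domains. It then checks that five hand-written IF-THEN rules encode exactly the same constraints. The variables, domains and rows are hypothetical, and this is a generic illustration of the idea, not GenoKey's compiler.

```python
from itertools import product

# Hypothetical model: three variables on finite domains.
DOMAINS = {
    "size": ("S", "M", "L"),
    "colour": ("red", "blue"),
    "price": ("low", "high"),
}
VARS = tuple(DOMAINS)

# One relation, given directly as its valid rows (e.g. exported from a database).
RELATION_ROWS = [
    ("S", "red", "low"),
    ("S", "blue", "low"),
    ("M", "red", "low"),
    ("M", "blue", "high"),
    ("L", "blue", "high"),
]

def compile_truth_table(rows):
    """Map every combination of domain values to True (valid) or False (invalid)."""
    valid = set(rows)
    return {combo: combo in valid for combo in product(*(DOMAINS[v] for v in VARS))}

def satisfies_rules(size, colour, price):
    """The same constraints written as IF-THEN rules (illustrative only)."""
    if size == "S" and price != "low":
        return False
    if size == "L" and colour != "blue":
        return False
    if size == "L" and price != "high":
        return False
    if size == "M" and colour == "red" and price != "low":
        return False
    if size == "M" and colour == "blue" and price != "high":
        return False
    return True

TABLE = compile_truth_table(RELATION_ROWS)
# The relation and the rules agree on all 12 combinations.
assert all(TABLE[c] == satisfies_rules(*c) for c in TABLE)
```

The relation is simply data, so it can be regenerated whenever the underlying table changes, whereas the rules have to be rewritten by hand.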
Real-Time Reasoning on Mobile Devices
GenoKey’s array-based logic approach allows the output of the analysis to be compiled into a highly compressed index (typically a few KB in size) that contains all of the solutions found by the analysis. At runtime, GenoKey’s engine only has to perform “table lookups” on the normalized nested arrays that represent the complete solution space. As a result, solutions to very complex problems can be identified in milliseconds on a low-power device such as a smartphone or tablet.
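The source does not describe GenoKey's index format. As a hedged illustration of how a set of valid combinations can be compressed while remaining answerable by simple lookups, the sketch below greedily merges rows that differ in exactly one position into Cartesian-product rows (each position holds a set of values). Merging A1 × B × C with A2 × B × C into (A1 ∪ A2) × B × C is exact, so the compressed form represents the same set. The functions `compress`, `lookup` and `expand` are hypothetical names chosen for this example.

```python
from itertools import product

def _find_mergeable(cubes):
    """Return two product rows differing in exactly one position, plus that position."""
    cube_list = list(cubes)
    for i, a in enumerate(cube_list):
        for b in cube_list[i + 1:]:
            diff = [k for k in range(len(a)) if a[k] != b[k]]
            if len(diff) == 1:
                return a, b, diff[0]
    return None

def compress(rows):
    """Greedily merge rows into Cartesian-product rows (tuples of frozensets)."""
    cubes = {tuple(frozenset([v]) for v in row) for row in rows}
    while (pair := _find_mergeable(cubes)) is not None:
        a, b, k = pair
        cubes -= {a, b}
        # Exact: the union of the two products equals the merged product.
        cubes.add(a[:k] + (a[k] | b[k],) + a[k + 1:])
    return cubes

def lookup(cubes, combo):
    """A combination is valid iff every value lies in the matching set of some row."""
    return any(all(v in s for v, s in zip(combo, cube)) for cube in cubes)

def expand(cubes):
    """Recover the full set of valid combinations (for checking only)."""
    return {combo for cube in cubes for combo in product(*cube)}

ROWS = [
    ("S", "red", "low"),
    ("S", "blue", "low"),
    ("M", "red", "low"),
    ("M", "blue", "high"),
    ("L", "blue", "high"),
]
CUBES = compress(ROWS)
```

Which rows end up merged depends on iteration order, but every result represents exactly the original set, and a lookup touches only the (fewer) compressed rows.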
In the simplest use, the input is a state vector giving the valid elements of each variable, representing user selections or other measurements from the environment, and the output is a state vector with the deduced constraints on all variables. This deduction is carried out quickly by a simple state propagation with predictable processing time and memory allocation, i.e. in real time. Moreover, the deduction is complete in the sense that all constraints on all variables are deduced. It is therefore easy to trace contradictions or invalid elements by “roll-back”.
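A minimal sketch of state vector propagation over a single relation, assuming the relation is stored as a plain list of valid rows: keep the rows consistent with the input state vector, then project them back onto each variable. Every value left in the output occurs in at least one valid combination, which is the completeness property described above, and the work is one pass over the rows. The helper `conflicting_inputs` is a hypothetical, simplified form of roll-back: it resets one input at a time to its full domain and reports which resets remove the contradiction.

```python
def propagate(rows, state):
    """Narrow a state vector (list of sets of allowed values) using one relation.

    Returns the deduced state vector; all-empty sets signal a contradiction.
    """
    consistent = [r for r in rows if all(v in s for v, s in zip(r, state))]
    return [{r[i] for r in consistent} for i in range(len(state))]

def conflicting_inputs(rows, state, domains):
    """Roll back one input at a time; return the positions whose reset restores consistency."""
    culprits = []
    for i, full in enumerate(domains):
        relaxed = list(state)
        relaxed[i] = set(full)
        if all(propagate(rows, relaxed)):
            culprits.append(i)
    return culprits

# Hypothetical relation over (size, colour, price).
DOMAINS = [("S", "M", "L"), ("red", "blue"), ("low", "high")]
ROWS = [
    ("S", "red", "low"),
    ("S", "blue", "low"),
    ("M", "red", "low"),
    ("M", "blue", "high"),
    ("L", "blue", "high"),
]

# Selecting size L forces colour blue and price high.
deduced = propagate(ROWS, [{"L"}, {"red", "blue"}, {"low", "high"}])
```

Selecting both size L and colour red leaves no valid row; `conflicting_inputs` then reports that rolling back either the size (position 0) or the colour (position 1) resolves the conflict, while rolling back the price does not.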
The simple state vector interaction between the model and the environment also makes it easy to design solutions with multiple instances of the model (sub-models or “nested models”) or with external constraints, e.g. linear constraints or procedures.
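Because each model only reads and writes a state vector, several models can be chained by passing the state back and forth until nothing changes. The sketch below does this for two hypothetical relations that share the variable `colour`. It is a generic fixpoint loop in the style of arc consistency, assumed here for illustration; the source does not say how GenoKey combines nested models, and for general networks of relations such a loop is not guaranteed to deduce every constraint.

```python
def propagate_relation(scope, rows, state):
    """Narrow the global state (dict: variable -> set) using one relation over `scope`."""
    consistent = [r for r in rows if all(v in state[x] for x, v in zip(scope, r))]
    new_state = dict(state)
    for i, x in enumerate(scope):
        new_state[x] = state[x] & {r[i] for r in consistent}
    return new_state

def propagate_all(relations, state):
    """Apply every relation repeatedly until the state vector stops changing."""
    while True:
        new_state = state
        for scope, rows in relations:
            new_state = propagate_relation(scope, rows, new_state)
        if new_state == state:  # domains only shrink, so this loop terminates
            return new_state
        state = new_state

# Two hypothetical sub-models sharing the variable "colour".
RELATIONS = [
    (("size", "colour"), [("S", "red"), ("M", "blue"), ("L", "blue")]),
    (("colour", "material"), [("red", "cotton"), ("blue", "wool")]),
]
STATE = {"size": {"S", "M", "L"}, "colour": {"red", "blue"}, "material": {"wool"}}
result = propagate_all(RELATIONS, STATE)
```

Choosing wool in the second model forces blue, which the first model then turns into size M or L.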
In the GWAS (genome-wide association study) example, GenoKey’s data mining technology works on a group of cases (e.g. patients) and a group of controls, with the entire population described by a set of common parameters, which could be SNP genotypes, clinical data or any other measurement. For example, the table below is a relation listing SNP indices (left column), the associated genotypes (middle column) and the indices of patients (right column). Each row thus describes a 4-combination of SNPs and genotypes that is found only in the given subgroup of patients.
| SNP indices | Genotypes | Patient indices |
|---|---|---|
| 548 172 500 691 | 2 0 0 | 11 96 222 327 380 419 463 476 591 599 |
| 548 253 604 626 | 2 0 0 | 96 102 327 341 380 419 463 476 591 |
| 548 370 720 791 | 2 0 1 | 96 102 327 341 380 463 471 476 591 599 |
| 548 372 535 702 | 2 0 0 | 11 96 102 222 463 471 476 591 599 |
| 548 500 503 696 | 2 0 0 | 96 222 327 380 419 463 476 591 599 |
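A relation of this shape can be built by grouping patients on their genotypes at each SNP combination. The sketch below does this for a tiny hypothetical genotype matrix; it assumes one genotype code per SNP (0, 1 or 2), and the patient indices and genotypes are invented for illustration, not taken from the study.

```python
from collections import defaultdict
from itertools import combinations

def subgroups(genotypes, k):
    """Map each (SNP combination, genotype combination) to the set of patients sharing it.

    genotypes -- dict: patient index -> list of genotype codes, one per SNP (assumed 0/1/2)
    k         -- number of SNPs per combination
    """
    n_snps = len(next(iter(genotypes.values())))
    table = defaultdict(set)
    for snps in combinations(range(n_snps), k):
        for patient, g in genotypes.items():
            table[(snps, tuple(g[s] for s in snps))].add(patient)
    return table

# Hypothetical data: four patients genotyped at four SNPs.
GENOTYPES = {
    11: [2, 0, 0, 1],
    96: [2, 0, 0, 0],
    102: [2, 1, 0, 0],
    222: [2, 0, 0, 1],
}
TABLE = subgroups(GENOTYPES, 3)
```

Each key of `TABLE` corresponds to the left and middle columns of a row above, and its value to the right column.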
The following four-step procedure is used:
- All significant (N+1)-combinations are identified from previously determined significant sub-populations of cases and controls that share common N-combinations; e.g., significant 5-combinations of SNPs and genotypes can be derived by analyzing the sub-populations that share common 4-combinations. SNP combinations are thus analyzed recursively, one layer at a time.
- All large, significant subgroups of cases with a common profile are identified, e.g., subgroups containing at least 100 cases and no controls based on 3-combinations of SNPs and genotypes.
- Many subgroups may have one or more SNPs and associated genotypes in common. Such clusters are identified by looking for symmetries or patterns in these nested arrays. A single SNP may be critical if it is found in a large number of combinations, i.e. in significant interactions with other SNPs.
- To test whether findings or observations are statistically significant, the calculations are repeated at least 1,000 times on random permutations of the entire population, i.e., with dummy “cases” and “controls”.
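The four steps above can be sketched on toy data as follows. Several details are assumptions, not GenoKey's documented method: genotypes are coded 0/1/2; a profile is "significant" if it is shared by at least `min_cases` cases and no controls (the criterion from the second step); the layer-by-layer search prunes only on the case count, since adding a SNP can shrink a subgroup but never grow it, whereas dropping profiles that still contain controls could miss extensions that exclude them; clustering is reduced to counting SNPs across significant combinations; and the permutation statistic is the number of significant combinations.

```python
import random
from collections import Counter

def extend(population, level, n_snps):
    """Step 1: grow each N-combination by one SNP, looking only at its own subgroup."""
    extended = {}
    for (snps, gts), members in level.items():
        for s in range(max(snps) + 1, n_snps):  # larger indices only: each combination once
            for i in members:
                key = (snps + (s,), gts + (population[i][s],))
                extended.setdefault(key, set()).add(i)
    return extended

def mine(population, cases, k, min_cases):
    """Steps 1-2: k-combinations shared by >= min_cases cases and no controls."""
    n_snps = len(next(iter(population.values())))
    level = {}
    for s in range(n_snps):
        for i, g in population.items():
            level.setdefault(((s,), (g[s],)), set()).add(i)
    for size in range(1, k + 1):
        # Prune: a profile with too few cases can never grow into a significant one.
        level = {key: m for key, m in level.items() if len(m & cases) >= min_cases}
        if size < k:
            level = extend(population, level, n_snps)
    return {key: m for key, m in level.items() if m <= cases}  # no controls

def snp_frequency(found):
    """Step 3: how often each SNP occurs across the significant combinations."""
    return Counter(s for snps, _ in found for s in snps)

def permutation_p_value(population, cases, k, min_cases, n_perm=1000, seed=0):
    """Step 4: compare the observed count with counts under shuffled case/control labels."""
    observed = len(mine(population, cases, k, min_cases))
    rng = random.Random(seed)
    ids = list(population)
    hits = 0
    for _ in range(n_perm):
        dummy_cases = set(rng.sample(ids, len(cases)))
        if len(mine(population, dummy_cases, k, min_cases)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy population: ids 1-3 are cases, 4-6 are controls.
POP = {
    1: (2, 0, 1, 0),
    2: (2, 0, 1, 1),
    3: (2, 0, 0, 2),
    4: (2, 1, 1, 0),
    5: (0, 0, 1, 1),
    6: (1, 2, 0, 2),
}
CASES = {1, 2, 3}
FOUND = mine(POP, CASES, k=2, min_cases=3)
```

On this toy data the only significant 2-combination is genotype 2 at SNP 0 together with genotype 0 at SNP 1, shared by all three cases and no controls. With real data the permutation loop dominates the cost, which is why the benchmark below counts it separately.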
GenoKey’s technology can perform such studies in hours or days rather than the weeks or months required by traditional methods. It scales to combinatorial analysis of hundreds of thousands of SNPs across tens or hundreds of thousands of patients, which is sufficient to make large-scale, exhaustive multivariate analysis of whole clinical populations a reality for the first time.
The benchmark data below shows the time taken for a GWAS analysis of 3-SNP combinations across 2,000 patients and 2,000 controls, each genotyped at 479,000 SNPs. The times include the time required to validate the SNP combinations by computing significant 3-SNP combinations for 1,000 random permutations of ‘patients’ and ‘controls’.
1. NVIDIA GTX 590 with 960 GPU cores at 1.2 GHz
2. Laptop computer with a dual-core 3 GHz CPU
3. Based on data provided by a major research centre running a broadly similar type 1 diabetes study
4. Extrapolation based on the number of combinations required
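Footnote 4 extrapolates from the number of combinations required. The scale of the benchmark problem can be checked with a short calculation. Counting 3³ genotype profiles per SNP triple assumes biallelic SNPs with three genotype codes; that factor, and the variable names, are assumptions for illustration.

```python
from math import comb

N_SNPS = 479_000          # SNPs per genotype, from the benchmark description
GENOTYPES_PER_SNP = 3     # assumption: biallelic SNPs coded 0/1/2
N_RUNS = 1 + 1_000        # the real case/control labelling plus 1,000 permutations

snp_triples = comb(N_SNPS, 3)                       # roughly 1.83 x 10^16 SNP triples
profiles = snp_triples * GENOTYPES_PER_SNP ** 3     # SNP triples x genotype combinations
total_profiles = profiles * N_RUNS                  # profiles examined across all runs

print(f"SNP triples:           {snp_triples:.3e}")
print(f"genotype profiles:     {profiles:.3e}")
print(f"across all 1,001 runs: {total_profiles:.3e}")
```

Numbers of this size are why eliminating non-significant combinations before any parallel computation matters more than raw hardware speed.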