Since we claim to deliver an agent you can teach to find concepts in eDiscovery and compliance, I thought I had better check the definition of the term “concept” to make sure we are using it correctly. While I didn’t expect this to lead me all the way back to Philosophy 101, with references to Kant, Locke, Mill, etc., I was pleased to find that our approach and our use of this term are fundamentally consistent with the excerpts below from Wikipedia.
John Locke’s description of a general idea corresponds to a description of a concept. According to Locke, a general idea is created by abstracting, drawing away, or removing the uncommon characteristic or characteristics from several particular ideas. The remaining common characteristic is that which is similar to all of the different individuals…
John Stuart Mill argued that general conceptions are formed through abstraction. A general conception is the common element among the many images of members of a class. “…[W]hen we form a set of phenomena into a class, that is, when we compare them with one another to ascertain in what they agree, some general conception is implied in this mental operation” (A System of Logic, Book IV, Ch. II)…
Philosopher Arthur Schopenhauer argued that concepts are “mere abstractions from what is known through intuitive perception, and they have arisen from our arbitrarily thinking away or dropping of some qualities and our retention of others.” (Parerga and Paralipomena, Vol. I, “Sketch of a History of the Ideal and the Real”). …
By contrast to the above philosophers, Immanuel Kant held that the account of the concept as an abstraction of experience is only partly correct. He called those concepts that result from abstraction “a posteriori concepts” …
A concept is a common feature or characteristic. Kant investigated the way that empirical a posteriori concepts are created.
‘The logical acts of the understanding by which concepts are generated as to their form are:
- comparison, i.e., the likening of mental images to one another in relation to the unity of consciousness;
- reflection, i.e., the going back over different mental images, how they can be comprehended in one consciousness; and finally
- abstraction or the segregation of everything else by which the mental images differ …
In order to make our mental images into concepts, one must thus be able to compare, reflect, and abstract, for these three logical operations of the understanding are essential and general conditions of generating any concept whatever. For example, I see a fir, a willow, and a linden. In firstly comparing these objects, I notice that they are different from one another in respect of trunk, branches, leaves, and the like; further, however, I reflect only on what they have in common, the trunk, the branches, the leaves themselves, and abstract from their size, shape, and so forth; thus I gain a concept of a tree.’ — Logic, §6
Our core is perfect for this task
Our core technology from ai-one processes each line of text much the way our brains do, learning the patterns of language, the “key” words, their importance, and the words most closely associated with them. NathanApp provides commands to extract those keywords and associations, along with their direction and values (strengths). While NathanApp is commonly described as finding the keywords and associations that are most important, my observation from years of using it is that it is even more proficient at filtering out the noise, giving low values to the unimportant words and associations.
This filtering is fundamental to our accuracy. Locke describes a concept as an idea “created by abstracting, drawing away, or removing the uncommon characteristic or characteristics”. This is very close to the way our solution finds a concept after processing the examples the user provides to teach the agent. The fingerprint we extract from that text is an array that represents the concept in a form we can compare with other fingerprints.
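To make the Locke-style idea concrete, here is a minimal sketch of “keep what the examples share, drop what differs, then compare fingerprints.” It is not ai-one’s or NathanApp’s algorithm. The swapped-in technique is deliberately simple: plain word counts, a set intersection across the teaching examples, and cosine similarity. The names `STOPWORDS`, `terms`, `fingerprint`, and `similarity` are hypothetical, and the small stop-word list is only a crude stand-in for the low values the core assigns to noise.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL noise list: a crude stand-in for the low values the real core
# assigns to unimportant words. Not part of any ai-one / NathanApp API.
STOPWORDS = {"a", "an", "and", "the", "has", "in", "of", "to", "is", "was", "by"}


def terms(text: str) -> Counter:
    """Lower-case word counts with the hypothetical noise words removed."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def fingerprint(examples: list[str]) -> dict[str, float]:
    """Locke-style abstraction: keep only the terms shared by every example
    (dropping the 'uncommon characteristics'), weighted by total occurrences."""
    if not examples:
        return {}
    counts = [terms(t) for t in examples]
    common = set(counts[0]).intersection(*counts[1:])
    return {t: float(sum(c[t] for c in counts)) for t in common}


def similarity(fp: dict[str, float], text: str) -> float:
    """Cosine similarity between a concept fingerprint and a document's term counts."""
    doc = terms(text)
    dot = sum(w * doc[t] for t, w in fp.items())
    norm_fp = math.sqrt(sum(w * w for w in fp.values()))
    norm_doc = math.sqrt(sum(v * v for v in doc.values()))
    if norm_fp == 0 or norm_doc == 0:
        return 0.0
    return dot / (norm_fp * norm_doc)


# Kant's fir, willow, and linden as teaching examples.
trees = fingerprint([
    "A fir has a trunk, branches and leaves shaped like needles.",
    "A willow has a trunk, branches and long narrow leaves.",
    "A linden has a trunk, branches and broad heart-shaped leaves.",
])
print(sorted(trees))  # ['branches', 'leaves', 'trunk']
print(similarity(trees, "The oak in the yard has a thick trunk and many branches."))
print(similarity(trees, "The contract was signed by counsel."))  # 0.0
```

The intersection step mirrors Kant’s “abstract from their size, shape, and so forth”: whatever differs between the fir, willow, and linden falls away, leaving trunk, branches, and leaves. A real system would weight terms and associations far more carefully than raw counts.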
Warning – I like lawyers
Just because we can create a fingerprint doesn’t mean it represents a concept; if there’s no concept in the text in the first place, the fingerprint can’t capture one. That is why I like lawyers.
While some may argue this point, lawyers create the best documents for software to understand. They have been educated to use language to communicate concepts clearly, organize those concepts into paragraphs and keep related paragraphs together in the document. This produces very precise similarity scoring in our tests, even in the absence of unique keywords.
That said, our intelligent agent can’t find a concept if it is written in a way that a human can’t see it either. The reviewer has to teach the agent with language that describes the concept the way it’s written in the discovery documents. The same concept may also be expressed very differently in formal contracts, email, text messages, or tweets. Just read your kids’ Facebook pages. You may know what they’re talking about, but I would guess it’s not written the way you would write it.
Building a WOW application for eDiscovery and compliance experts starts with the right core technology. If you’re trying to find concepts, start with a core that extracts concepts the same way your brain does. Finding concepts buried inside documents is our mission. Now that I think about it, maybe Philosophy 101 wasn’t a liberal arts waste of my parents’ money after all.
Posted by: Tom Marsh