US20080104004A1 - Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge - Google Patents

Publication number
US20080104004A1
Authority
US
United States
Prior art keywords
user
search
content
document
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/874,137
Inventor
Scott Brave
Robert Bradshaw
Jack Jia
Christopher Minson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Monetate Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/874,137
Assigned to BAYNOTE, INC. reassignment BAYNOTE, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BRADSHAW, ROBERT, BRAVE, SCOTT, JIA, JACK, MINSON, CHRISTOPHER
Publication of US20080104004A1
Assigned to Glenn Patent Group reassignment Glenn Patent Group LIEN (SEE DOCUMENT FOR DETAILS). Assignors: BAYNOTE, INC.
Assigned to BAYNOTE, INC. reassignment BAYNOTE, INC. LIEN RELEASE Assignors: Glenn Patent Group
Assigned to KIBO SOFTWARE, INC. reassignment KIBO SOFTWARE, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BAYNOTE, INC.
Assigned to Monetate, Inc. reassignment Monetate, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KIBO SOFTWARE, INC.
Legal status: Abandoned (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Definitions

  • the invention relates to electronic access to information. More particularly, the invention relates to a method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge.
  • PC or desktop search can be compared with finding stuff in your messy garage. You know you have it somewhere but just cannot find it. So locating it is the only goal. And when you do find it, you are the sole judge of whether you have indeed found the right content or document because you collected or wrote the content in the first place. You are the only expert and authority that matters.
  • At the other end of the search spectrum is Web search. There, the story is more like driving in Boston for the first time. You are not necessarily an expert on the topics you are looking for and you are learning a new subject. Sometimes, you search to find new services such as weather, travel, or shopping. With Web search, you are counting on millions of people on the Web to help you and you do not necessarily know or care who is the real expert or authority. As a result, you sometimes get bad advice or may shop in the wrong places.
  • FIG. 1 is a flow diagram showing the state of the art in enterprise search.
  • Dave is searching for particular information and retrieves 2,800 documents. Finding nothing useful in the top ten results returned, Dave calls Sam. Sam, in turn, searches and, finding nothing, e-mails marketing. Mark and Tina in marketing search and find nothing as well. Mark calls Eric, Nancy, and Ganesh and the answer is found in Ganesh's design document. Tina calls Eric, Nancy, and Ganesh again and everybody is now upset.
  • it would have been more useful for Dave if he had found Ganesh's design document in his initial search. In fact, the document may have been among the 2,800 documents located, but it was not possible for Dave to identify it as the most useful document.
  • Enterprise search tends to repeat itself quickly based on the user's role and the situations he is in.
  • A person who may have 300 roles and profiles in their personal life has a much smaller number of work roles, e.g. a half dozen at most. He might be an engineer working in the Paris office who is also a member of the cross-functional cultural committee.
  • the keywords in the enterprise searches are more like hints, even fishing bait, to documents a person is looking for. It is thought that eighty percent of people seek information they have seen before. Given the enterprise user predictability, we can safely rely on self-motivated actions and behaviors to collect unbiased feedback.
  • the Web or consumer world is very heterogeneous, while an enterprise is the opposite: homogeneous, or more precisely, segmented homogeneous, meaning that different departments or groups (sales vs. marketing vs. engineering) in a company might be different (segmented), but within a group, people are very similar or homogeneous in the way they work regardless of how different their profiles are.
  • the problem with enterprise search technology has become acute to many CIOs and business executives. In the inventors' own limited surveys of a dozen CIOs and business executives, people ranked the priority of the enterprise search problem as 9-10 out of 10.
  • the challenge of traditional full-text engines is poor relevancy. They are good for everything (all content) and good for nothing (irrelevant results) at the same time.
  • the NLP technology achieves better relevancy by focusing on one application and one domain where human language becomes more deterministic.
  • the problem with NLP is that the solution is placed in a silo and is good only within that specific application, while enterprises operate hundreds to thousands of applications. It is not possible for employees to log on to that many systems one by one to look for information. Both classes of solutions also suffer from the inability to adapt to changes once deployed. Taxonomies and structures change quickly over time in enterprises.
  • the invention addresses the above limitations of state of the art enterprise search by leveraging what should be depended on for enterprise search: one's peers and experts in and out of the company.
  • the invention provides systems that identify, extract, analyze, and use the expertise ranking to produce personalized, precise search results to the user so that they do not have to call, email, etc.
  • the inventors have discovered a set of unique approaches to enterprise search that is different from all existing IR (information retrieval) based solutions, such as Verity, Autonomy, FAST, Endeca, and Google Appliance.
  • the inventors carefully analyzed the characteristics of enterprises in contrast to the Web search environment, and applied a set of methodologies in related disciplines from technology development, academic research, and social behavior.
  • the invention provides a technique that can work standalone or embed itself in other applications via a plug-n-play interface with minimum effort. The result is a huge improvement in search usefulness, relevancy, search federation across applications, and cost savings.
  • the preferred embodiment of the invention also leverages traditional search technologies.
  • the invention provides relevant information discovery by taking a completely opposite approach to that of traditional search theories and technologies.
  • traditional content search technology and products use content as the basis for guiding searches. It employs techniques such as information retrieval (IR) algorithms, natural language processing (NLP) techniques and rules, product or structural taxonomy, or page ranking by link count.
  • Traditional data search relies on building database indexes on key words or numbers in database rows or columns. It crawls and indexes the content and data, and generates inverted full-text indexes or database indexes with word tokens or phrases, potentially assisted by taxonomy and page ranking for improving search results.
  • the search results using traditional search technology are poor, with large amounts of low-relevancy hits. For many business processes, when a search fails, users have to resort to alternative, expensive ways of acquiring information that either take a significant amount of the user's time or, worse yet, involve others to help find the information (see FIG. 1 ).
  • the invention provides a system that starts with the people in and around enterprises. After all, enterprises are made of specialists and experts possessing expertise and know-how. They conduct work and repeat their work patterns frequently on a role-by-role basis. The system detects and captures the expertise and work patterns stored in people's brains and exhibited in their daily behavior, and creates a behavioral based knowledge index. The knowledge index is then, in turn, used to produce expert-guided, personalized information. This process is transparent to the experts themselves, and therefore efficient and extremely economical to employ.
  • FIG. 1 is a flow diagram showing the current state of the art in enterprise search.
  • FIG. 2 is a flow diagram showing traditional IR-based search models.
  • FIG. 3 is a block schematic diagram showing system architecture according to the invention.
  • FIG. 4 is a flow diagram showing the capture of behavioral relevancy by an embedded application according to the invention.
  • FIG. 5 is a screen shot showing an inline user interface according to the invention.
  • FIG. 6 is a screen shot showing an inline user interface rendered using JavaScript tags according to the invention.
  • FIG. 7 is a screen shot showing a popup user interface according to the invention.
  • FIG. 8 is a block schematic diagram showing expert-guided personalized search across applications according to the invention.
  • FIG. 9 is a screen shot showing a user library according to the invention.
  • FIG. 10 is a second screen shot showing a user library according to the invention.
  • FIG. 11 is a third screen shot showing a user library according to the invention.
  • FIG. 12 is a flow diagram showing a document recommendation according to the invention.
  • FIG. 13 is a flow diagram showing an augmented search according to the invention.
  • the invention comprises a set of complementary techniques that dramatically improve enterprise search and navigation results.
  • the core of the invention is an expertise or knowledge index, also referred to as an expertise repository, that makes observations of website and web application visitors.
  • the expertise-index is designed to focus on the four key discoveries of enterprise search: Subject Authority, Work Patterns, Content Freshness, and Group Know-how.
  • the invention produces relevant, timely, cross-application, expertise-based search results.
  • traditional Information Retrieval technologies such as inverted index, NLP, or taxonomy tackle the same problem with an opposite set of attributes than what the enterprise needs: Content Population, Word Patterns, Content Existence, and Statistical Trends.
  • a further embodiment of the invention makes the novel technology work transparently within an existing enterprise application and repository environment so that no user training or adoption of new interfaces is required. It also supports all legacy full-text or NLP search technologies, such as Verity, Autonomy, Endeca, and the Google Appliance. In fact, it works on top of those technologies and uses their base result as a foundation for refinement.
  • a third embodiment of the invention comes from leveraging open source technology, such as Lucene, for building a scalable network query engine that binds all dimensions of the information source and indexes into one set of meaningful results.
  • the invention embeds itself in any existing Web applications such as www, CRM, ERP, and portals etc. via a simple change to the search results interface.
  • SOA service-oriented architecture
  • stub code so that search traffic can be inspected and re-ranked by an expertise index.
  • any web page, not just search results pages can be configured with the invention to provide active guidance to the user without requiring the user to enter a query.
  • the invention does not require users to explicitly vote, provide feedback, or utilize other mechanisms that commonly result in collaborative filtering. It relies on people doing their normal jobs selfishly and leaving a trail of evidence of what they need and prefer to get their job done.
  • the reliance on selfishness is fundamentally different and far more reliable guidance than the traditional collaborative filtering, where users are instructed to vote for other people.
  • When users are asked to give feedback, most people do not do it because they lack time or it is not a priority.
  • Both Amazon and eBay have negative experiences in using traditional collaborative filtering techniques to accomplish ranking by similarity.
  • Implicit action buttons are embedded as part of the search results to capture critical cues of user intention and preferences as the users do their job. For example, a common portal may give users “view,” “download,” “print,” and “email” buttons as the actions to reflect their intention when discovering relevant content. “View” might be a weaker indication, while the others are strong indications of preference.
  • the invention develops additional implicit observations that predict visitor intentions with strong confidence. These observations include the ability to detect think time, virtual bookmarks, virtual print and virtual email. In all cases, visitors have not performed a bookmark, print, or email against the content, but they keep the content up on the computer screen for a long time, i.e. long enough to use the content as a reference for work. These observations are cross checked among peers and experts before they truly become useful for the community.
  • buttons can be added to track clear cues of user behavior. “Save to library” indicates a strong, explicit endorsement of content given the query, while “remove” or “demote” indicates strong dislike of the content given a query and a role.
  • the library is virtual and does not physically live on a browser or even on the PC. It is the main user behavior tracking object or a journal. Again, explicit relevance ranks higher than implicit relevance, but both are managed under one per-user library object.
  • the inventive system can identify and demote results that are less relevant or irrelevant to a group of users of the same role, in a manner that is analogous to the spam email reporting scheme. For example, if three engineers remove their interest in a document, other similar engineers should not see this document highly ranked, while sales employees may still give this document a high ranking.
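The role-scoped demotion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `RoleDemotions`, the threshold of three removals (taken from the example in the text), and the penalty multiplier are all assumptions made for the sketch.

```python
from collections import defaultdict

# Assumptions for illustration only: the text's example uses three removals,
# and the penalty multiplier is a hypothetical value.
REMOVAL_THRESHOLD = 3
DEMOTION_FACTOR = 0.1


class RoleDemotions:
    """Tracks which users of each role have removed a document."""

    def __init__(self):
        # (role, doc_id) -> set of distinct user ids who removed the document
        self._removals = defaultdict(set)

    def record_removal(self, user_id, role, doc_id):
        # A set ensures repeated removals by one user count only once.
        self._removals[(role, doc_id)].add(user_id)

    def adjusted_score(self, base_score, role, doc_id):
        """Demote the score only for searchers who share the removing role."""
        removers = self._removals.get((role, doc_id), ())
        if len(removers) >= REMOVAL_THRESHOLD:
            return base_score * DEMOTION_FACTOR
        return base_score
```

In this sketch, three engineers removing a document lowers its score for other engineers, while a sales employee querying for the same document still sees the unmodified score.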
  • the data in the library can be mined or learned before the system goes live. It continues to improve itself, adjust, and adapt to the real business usage of the content and their queries.
  • the library stores user profiles and attributes, all queries, relevant content URIs, one or more indexes for all relevant content and data, query caching, content and data caching, access time, personalized ranking formula, proximity hashing, and a loading ratio control for the privacy policy considered.
  • a desktop version of My Library can be added to provide content caching, content push/update/alert, and disconnected content access.
  • One aspect of the invention concerns examining multiple personal libraries or behavioral journals.
  • When enterprises start to analyze many people's journals from peer and expert dimensions, great insights on information consumption and employee productivity emerge.
  • Peers are defined herein as a group of users with common interests, such as products, topic sets, events, job roles, locations etc.
  • Experts are defined herein as a group of visitors with different knowledge and skill sets than the person querying or browsing, but on whom the person querying or browsing depends to do his job effectively.
  • an engineer has a peer group of other engineers, and also has an expert group made up of product managers, sales people, some customers, HR staff, and office assistants. Peers and Experts can change when context is changed.
  • An engineer, John, may play roles beyond the organization he is in. He could be a cross-functional committee member, and physically work in the London office. So John is part of three contexts. Each such context is referred to herein as a Domain Expertise Network or DEN.
  • An employee may belong to several DENs. Various types of DENs are discussed in greater detail below.
  • Expertise Index System also referred to as an Expertise and Behavioral Repository: This element is key to the invention. It is a server based system with a service-oriented architecture (SOA) using Web services, XML, J2EE, and other foundational technologies.
  • Behavioral Instrumentation Also referred to as a Work Monitor, this element is responsible for implanting and recording user behavior on various business applications.
  • the application search form is one of many observation posts that the invention implements. Browser and application navigation, file upload, Web plug-in, page-tags, email server and client integration, content management, document management, records management, and collaboration systems are all common places for instrumentation.
  • the invention also goes back in time, and parses common log files, such as Web server logs, query logs, directory files, e.g. LDAP, to build and extract historical or base level expertise.
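As a rough illustration of mining historical logs, the sketch below counts successful page fetches per authenticated user from Web server lines in the Common Log Format. The choice of log format, the function name `base_level_usage`, and the rule of counting only `GET` requests with status 200 are assumptions made for this example; the text does not specify how base-level expertise is computed.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^\S+ \S+ (?P<user>\S+) \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def base_level_usage(lines):
    """Count successful GETs per (user, path) as a crude historical signal."""
    counts = Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        # Ignore anonymous requests, failures, and non-GET methods.
        if m["user"] == "-" or m["status"] != "200" or m["method"] != "GET":
            continue
        counts[(m["user"], m["path"])] += 1
    return counts
```

Counts such as these could seed the usage data before any live observations exist, with live behavior then refining them.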
  • This element is a per user object described above.
  • This element is the work relationship object, described above, that connects personal, peer, and expert associations, and that records repeated role-based enterprise work patterns.
  • NUNI Non-Uniformed Network Index
  • Contextual Mapping and Dynamic Navigation: With the help of personal, peer, and expert journals, the NUNI index can not only produce good search results, but also provide additional contextual information that the users are not directly asking for via their search queries or keywords. The contextual results can be presented back to the users in a search result sidebar, or as part of personalized, dynamic navigation. Dynamic navigation is discussed in greater detail below.
  • This element generates various reports based on the behavioral journals and NUNI index.
  • the Expertise Index System focuses on enterprise Subject authorities, Work Patterns, Content Freshness, and Group Know-how to deliver expert-guided, personalized information.
  • FIG. 3 is a block schematic diagram showing the system architecture of a preferred embodiment of the invention. A more detailed discussion of various aspects of the architecture is provided below.
  • the architecture consists of a server farm 20 , a customer enterprise 22 , and a user browser 21 .
  • the user browser is instrumented with an extension 23 and accesses both customer servers 25 at the customer enterprise 22 and the server farm 20 via a load balancer 27 . Communication with the server farm is currently effected using the HTTPS protocol. User access to the customer server is in accordance with the enterprise protocols.
  • the browser extension 23 is discussed in greater detail below.
  • the extension 23 as well as the enterprise extension 24 , are constructed such that, if the server farm 20 does not respond in a successful fashion, the extension is shut down and the enterprise and browser interact in a normal manner.
  • the features of the invention are only provided in the event that the server is active and performing its operations correctly. Therefore, failure of the server does not in any way impair operation of the enterprise for users of the enterprise.
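The fail-open behavior of the extensions can be sketched as below. The class name `FailOpenExtension`, the string-based page model, and the decision to stay disabled after the first failure are assumptions for illustration; the text says only that the extension shuts down when the server farm does not respond successfully.

```python
class FailOpenExtension:
    """Adds server-provided content to a page, or steps aside on failure."""

    def __init__(self, server_call):
        # server_call is a hypothetical callable that contacts the server farm
        # and returns extra content for the page, or raises on failure.
        self._server_call = server_call
        self.enabled = True

    def decorate(self, page):
        if not self.enabled:
            return page  # extension already shut down: page is untouched
        try:
            extras = self._server_call(page)
        except Exception:
            # Any failure disables the extension so the enterprise page
            # keeps working exactly as it did without the system.
            self.enabled = False
            return page
        return page + extras
```

The broad `except` is deliberate in this sketch: the stated goal is that no server-side failure of any kind impairs normal operation for enterprise users.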
  • an extension 24 is also provided for the enterprise which communicates with the load balancer 27 at the server farm 20 via the HTTPS protocol.
  • the enterprise also includes a helper 26 which communicates with the server farm via an agency 31 using the HTTPS protocol.
  • the agency retrieves log information from the enterprise and provides it to log analyzers 28 , which produce a result that is presented to the usage repository 29 .
  • Information is exchanged between the affinity engine 32 and the browser and enterprise via various dispatchers 30 .
  • the browser itself provides observations to the server and receives displays in response to search queries therefrom. These observations and displays are discussed in greater detail below.
  • a key feature of the invention is the affinity engine 32 which comprises a plurality of processors 33 / 34 and a configuration administration facility 35 .
  • a form of information, also referred to as wisdom, is collected in a wisdom database 36 .
  • the operation of the affinity engine is discussed in greater detail below.
  • the inventive system is typically installed as an enhancement to an existing search system based on conventional engines provided by vendors, such as Verity, Autonomy, Google, etc.
  • This content and data search system based on conventional technology is referred to as the existing search mechanism.
  • the inventive system is implemented as a wrapper for an existing search mechanism.
  • the query is handled initially by the system.
  • the system typically forwards the query to the existing search mechanism. It may also perform one or more searches or related operations against its own internal indexes and databases. Once the results from the various searches have been obtained, they are merged together into a single set of results.
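The wrapper flow described above can be sketched as a function that fans a query out to the existing search mechanism and to internal searches, then merges the hits. The function name `merged_search`, the `(doc_id, score)` result shape, and the choice to sum scores for documents found by more than one source are assumptions; the text does not state the merge formula.

```python
def merged_search(query, existing_search, internal_searches):
    """Merge hits from the existing engine and internal indexes.

    Each search callable takes a query string and returns a list of
    (doc_id, score) pairs. Scores for the same document are summed,
    which is an assumed merge rule for this sketch.
    """
    combined = {}
    for search in [existing_search, *internal_searches]:
        for doc_id, score in search(query):
            combined[doc_id] = combined.get(doc_id, 0.0) + score
    # Highest score first; ties are broken by doc_id for stable output.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Because the existing engine is just one of the callables, the same function works whether the baseline comes from a full-text engine, an NLP engine, or no baseline at all.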
  • the actual presentation of these results is at the discretion of the customer, who may either take the raw results data from the system and present them using a JSP, CGI, or similar mechanism, or else use the default search results page provided with the system, possibly customized using cascading style sheets or other similar techniques.
  • Each document in the results is generally presented along with a variety of possible actions for the user to take on the document.
  • the available actions are site-configurable, and can include, for example, “think”, “view,” “download,” “email,” or “print.”
  • the system is informed when a user selects one of these actions for a particular document. These data are then used to infer the relevance of a particular document with respect to the query that yielded it.
  • the system might infer that the document has certain actual value to the user for that query, while if the user selects a more permanent action such as “print” or “download,” the system might infer that the document is highly relevant to the user.
  • the system can detect virtual print or download to give an accurate approximation as if a physical print, download, or bookmark has happened.
  • the techniques rely on detecting activities of users on the browser for a certain amount of time, e.g. over one minute, where documents remain open for a long time, i.e. long dwell.
  • the system might infer that the results were irrelevant to the user. These data are retained and used to influence the results of future queries by the user and to generate quality metrics.
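One way to picture the inference from actions and dwell time is the scoring sketch below. The numeric weights are hypothetical; only the ordering (a "view" being weaker than "print," "download," or "email") and the one-minute dwell example come from the text. Treating a long dwell as equivalent to a strong action is this sketch's reading of the "virtual print" idea.

```python
# Hypothetical weights: the text says only that "view" is a weaker signal
# than more permanent actions such as "print" or "download".
ACTION_WEIGHTS = {"view": 1.0, "email": 3.0, "print": 3.0, "download": 3.0}
LONG_DWELL_SECONDS = 60      # "e.g. over one minute" in the text
VIRTUAL_ACTION_WEIGHT = 3.0  # assumed: long dwell counts like a strong action


def event_weight(event):
    """Weight one observed event, upgrading long-dwell events."""
    weight = ACTION_WEIGHTS.get(event["action"], 0.0)
    if event.get("dwell_seconds", 0) > LONG_DWELL_SECONDS:
        # Virtual print/bookmark: the document stayed open long enough
        # to have served as a working reference.
        weight = max(weight, VIRTUAL_ACTION_WEIGHT)
    return weight


def infer_relevance(events):
    """Sum event weights for one (query, document) pair."""
    return sum(event_weight(e) for e in events)
```

A pair with no recorded events scores zero under this sketch, which a caller could interpret as weak evidence that the result was not useful for that query.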
  • the system maintains a library of content reference and/or use for each user.
  • the library is also called the behavioral journal. This library is similar in some sense to bookmarks in a Web browser, though it is not necessarily visible to the user. Indeed, the user may not even be aware of its presence.
  • a document name and its location may be added to a user's library automatically when certain actions for a document are selected from the search results.
  • a document could also be added to a user's library explicitly with an optional “add to library” action from the search results.
  • the presence of a document reference in a user's library generally indicates that the document is of particular interest to the user. Thus, if the results of a query produce a document that also appears in the user's library, its ranking is typically improved.
  • it is possible to add a document to a user's library directly, without first encountering it in search results.
  • a document need not be indexed by, or even accessible to, the existing search mechanism.
  • because it is present in the user's library, it can still be merged into the final search results if it matches a query, and it is therefore available in the results produced by the system. Content discovered in this manner is typically quite valuable and so is usually given particular preference in the result rankings.
  • People in businesses relate to each other in a number of different ways. For example, there are relationships between peers in a group, between superiors and subordinates, or between subject matter experts and seekers. When these different kinds of relationships are modeled and observed, they reveal insights that can be used to influence and refine search results. For example, if several members of a group of peers all find a particular document to be helpful, then there is a good chance that other members of that same group would find the document helpful as well because members of a peer group typically have similar interests. Similarly, if someone is seeking information about a particular subject, then documents that a known expert in that subject found useful would probably be valuable to the seeker as well.
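The two library effects described above (boosting results that are already in the user's library, and surfacing library documents that the existing search mechanism never indexed) can be sketched together. The boost multiplier, the base score for library-only hits, and the simple all-terms title match are assumptions for illustration.

```python
LIBRARY_BOOST = 2.0       # hypothetical multiplier for library documents
LIBRARY_ONLY_SCORE = 1.0  # hypothetical base score for library-only hits


def merge_with_library(query, baseline, library):
    """Return doc ids ranked after applying the user's library.

    baseline: dict of doc_id -> score from the existing search mechanism.
    library:  dict of doc_id -> title text saved in the user's library.
    """
    terms = query.lower().split()
    results = dict(baseline)
    for doc_id, title in library.items():
        if doc_id in results:
            # Already found by the existing engine: improve its ranking.
            results[doc_id] *= LIBRARY_BOOST
        elif all(term in title.lower() for term in terms):
            # Not indexed by the existing engine, but matches the query.
            results[doc_id] = LIBRARY_ONLY_SCORE * LIBRARY_BOOST
    return sorted(results, key=lambda d: (-results[d], d))
```

In this sketch a library-only match outranks an unboosted baseline hit, reflecting the statement that such content is usually given particular preference.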
  • the system maintains one or more named relations for each user to represent these kinds of relationships between one user (the subject) and other users (the related users) in the system.
  • a relation is formally the set of users that have a particular relationship with the subject.
  • a relationship can be two-way or one-way.
  • a two-way relationship applies equally in both directions between the subject and the related user.
  • if user A has a two-way relationship with user B, then user B has the same kind of relationship with user A.
  • An example of this might be a peer relationship, which could describe two users who are in the same organizational department or who have similar job descriptions: if user A is a peer of user B, then it is also the case that user B is a peer of user A.
  • a one-way relationship is directed: if user A has a one-way relationship with user B, it is not necessarily true that user B has that same kind of relationship with user A.
  • An example of this might be a superior-subordinate relationship: if user A is a subordinate of user B, then it is not the case that user B is a subordinate of user A.
  • Because related users are users of the system, they have libraries of their own.
  • the system can search the libraries of some or all related users as part of a query and merge any hits into the results.
  • the degree to which results from a related user's library biases the baseline results can be configured both at the relationship level, e.g. experts have a larger bias than peers, and also at the user level, e.g. some peers may exert more influence than others.
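The named relations, their direction, and the two levels of bias can be modeled roughly as follows. The names `Relation` and `RelationStore`, the multiplication of relationship-level and user-level weights, and the use of the maximum when a user appears in several relations are all assumptions for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str       # e.g. "peers" or "experts"
    two_way: bool   # True if the relationship applies in both directions
    bias: float     # relationship-level weight (assumed numeric)
    members: dict = field(default_factory=dict)  # related user -> user weight


class RelationStore:
    def __init__(self):
        self._relations = {}  # (subject, relation name) -> Relation

    def _get(self, subject, name, two_way, bias):
        key = (subject, name)
        return self._relations.setdefault(key, Relation(name, two_way, bias))

    def relate(self, subject, related, name, two_way, bias, user_weight=1.0):
        self._get(subject, name, two_way, bias).members[related] = user_weight
        if two_way:
            # A two-way relationship is mirrored onto the related user.
            self._get(related, name, two_way, bias).members[subject] = user_weight

    def related_bias(self, subject):
        """Return {related user: combined bias} for the subject."""
        out = {}
        for (owner, _), rel in self._relations.items():
            if owner != subject:
                continue
            for user, weight in rel.members.items():
                out[user] = max(out.get(user, 0.0), rel.bias * weight)
        return out
```

The resulting per-user bias could then scale how strongly hits from each related user's library influence the baseline results.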
  • Peers: This is a two-way relationship intended to represent users with common interests, job roles, locations and other factors. People can belong to multiple peer groups based on different contexts. The system develops the peer groups through learning. Peer groups change and adapt according to community and business changes.
  • Experts: This relationship represents skill sets or knowledge a person possesses. The system detects experts by examining the community and the individuals who have the ability to discover and collect the most useful documents having the most impact. Expertise is relative. An expert today may become less so if the person ceases to be the connection to the most useful content.
  • Multiple peer relations or DENs allow a user to identify several different peer groups that are each relevant at different times, e.g. a departmental group for day-to-day operations, a special interest group representing a committee membership, etc.
  • Multiple expert groups allow a user to have several different sets of experts focused on different subject areas.
  • FIG. 4 is a flow diagram showing the capture of behavioral relevancy by an embedded application.
  • an information seeker is using a business application.
  • a search such as “sales preso on bonds”
  • the server performs various data mining activities and produces a result for the information seeker.
  • the invention makes observations of implicit relevance actions, such as “view,” “download,” “print,” and “e-mail.”
  • the server also makes observations with regard to explicit relevance actions, such as “save to library,” and actions similar to spam control, such as “remove.” These items are discussed in greater detail below.
  • the observations made by the system are used to determine the value of a particular document to a searcher.
  • the system accumulates information about the value of the document and then develops a usefulness measure for the document, as discussed in greater detail below.
  • FIG. 5 is a screen shot showing an inline user interface according to the invention. Because the tags used in the system are configurable and customizable, the user interface can be made to blend into an existing Web site for a particular enterprise. The example given in FIG. 5 is of a public Web site.
  • FIG. 6 is a screen shot showing an inline user interface (UI) rendered using JavaScript tags according to the invention.
  • This particular example shows the “most popular” tag, which gives a list of the most popular documents to the end user.
  • the UI is rendered using JavaScript tags.
  • Other tags, such as “next step,” “similar documents,” and “preferred” are rendered in a similar fashion.
  • FIG. 7 is a screen shot showing a pop-up user interface according to the invention.
  • this interface is rendered using JavaScript tags. This particular example shows a “next step” tag. The tag fades in and, when closed, fades out to enhance the user experience.
  • the pop-up dialogue is also configurable to blend into any existing Web page's style.
  • the system has an active interest in knowing which documents users find helpful or relevant. However, users cannot generally be relied upon to indicate explicitly to the system when a particular document is considered helpful or relevant. Instead, the system has to infer this information from actions users would take anyway on a document, with or without the system.
  • The system provides buttons or links for typical actions with each document in the search results. Because these actions are available with a single mouse click, as opposed to the multiple clicks that are typically required to perform most actions using normal browser controls, users tend to use them rather than the standard browser controls for performing these actions. Furthermore, because these buttons or links are under the control of the system, the system is able to take note of the actions a user takes with respect to a document. Thus, users are given a convenient mechanism for performing actions they would perform anyway on documents in a set of search results, and the system is able to monitor these actions.
  • each button or link representing an action has a URL associated with it.
  • Ordinarily, such a URL would refer directly to the associated document.
  • In the system, these URLs instead refer to a CGI, servlet, or similar mechanism associated with the system.
  • the URL contains information about the user, the document, and the action the user wants to perform.
  • the system logs the action and related information, and then redirects the request to either the original document, in the case of simple “view” type actions, or some other kind of Web resource to complete the requested action.
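  • The action-URL mechanism described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the endpoint `https://search.example.com/action`, the parameter names `user`, `doc`, and `action`, and the function names are all hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint standing in for the CGI or servlet that receives actions.
ACTION_ENDPOINT = "https://search.example.com/action"


def build_action_url(user_id, doc_url, action):
    """Build the URL placed behind a button or link in the search results."""
    params = {"user": user_id, "doc": doc_url, "action": action}
    return ACTION_ENDPOINT + "?" + urlencode(params)


def handle_action(url, log):
    """Log the action carried by the URL, then return the redirect target."""
    fields = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    log.append(fields)  # record who did what to which document
    if fields["action"] == "view":
        # Simple "view" actions are redirected to the original document.
        return fields["doc"]
    # Other actions go to some other resource that completes the request
    # (assumed here to live under the same hypothetical host).
    return "https://search.example.com/" + fields["action"] + "?" + urlencode({"doc": fields["doc"]})
```

  • Because the log entry is written before the redirect is returned, the action is observed even if the user abandons the target page.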
  • An optional “add to library” action is available for documents as well. As the name implies, this action adds the document to the user's library. This is a way for users to inform the system explicitly that a document is particularly useful. A user's primary motivation for using this action is to ensure that the document is considered favorably in future queries because documents in a user's library are generally given improved rankings.
  • URLs for the configured actions are provided along with the other usual data for each document in the results. It is the customer's responsibility to ensure that these URLs are used for the various actions users might take on the result documents, or else the value of the system is diminished.
  • The system uses a more generic facility to observe user behaviors against all content during search and navigation.
  • The observations are made implicitly, without user participation other than users doing their normal browsing and searching. Observations are consolidated from both search and navigation and then used to improve future search and navigation.
  • the system also benefits from knowing the original query that yielded a document on which a user takes an action. For example, if the system notices that a user issues the same query later, or if it notices several different users making the same or similar queries, it can increase the ranking of documents in the new query that were found interesting in the original query. However, because query strings can be rather cumbersome, it is not always practical to include them in the action URLs. Instead, the system maintains a database of query strings and issues a unique ID for each. This unique ID can then be included with the action URLs presented in the search results. When a user takes an action on a particular result document, the system can determine the query that produced that particular document by looking up the query ID.
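  • The query ID scheme described above can be sketched as a small in-memory registry. The class name `QueryRegistry`, the sequential integer IDs, and the whitespace and case normalization are assumptions made for illustration; the description only requires that each stored query string receive a unique ID that can be looked up later.

```python
class QueryRegistry:
    """Maps query strings to short unique IDs and back (in-memory sketch)."""

    def __init__(self):
        self._id_by_query = {}
        self._query_by_id = {}

    def id_for(self, query):
        """Return the ID for a query, issuing a new one on first sight."""
        # Assumption: queries that differ only in case or spacing share an ID.
        key = " ".join(query.lower().split())
        if key not in self._id_by_query:
            new_id = len(self._id_by_query) + 1
            self._id_by_query[key] = new_id
            self._query_by_id[new_id] = key
        return self._id_by_query[key]

    def query_for(self, query_id):
        """Recover the query that produced a result, given the ID from an action URL."""
        return self._query_by_id[query_id]
```

  • The short ID, rather than the full query string, is what would be embedded in each action URL.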
  • the system uses blended search to enhance search results.
  • In a blended search, a single query is passed to two or more separate search processors, each of which produces a set of zero or more documents, referred to as the result set, that match that query in some fashion. Depending on the configuration and circumstances, the same document may show up in one or more of the result sets from these various searches.
  • Once all of the search processors have completed the requested query, their result sets are merged together into a single result set. Rankings are assigned to individual documents in the merged result set using a configurable formula that takes into account such factors as the number and/or type of search processors that produced the document and the document's ranking within each of those individual result sets.
  • the two distinct search processors need not be distinct software entities. For example, the same search engine running against two different indexes and/or with different configuration parameters could constitute two distinct search processors. More important is that two distinct search processors should typically yield different results for the same query. One might consider that each search processor offers a different point of view for a query.
  • Each search processor can be assigned a weight that determines the degree to which it influences the rankings in the merged search results. This weight can be either a static constant, or a dynamically computed value that varies according to the query, results, or other circumstances.
  • Search processors can, but do not necessarily, run independently of each other. Some search processors can be configured to take the result set of a different search processor as input and manipulate it in some way to produce its own result set. This kind of search processor is referred to as a filter. Filters are useful for such tasks as narrowing the results from a different search processor, e.g. removing documents that are too large, too old, etc., or modifying them in some way, e.g. computing summaries or titles from document contents, adding annotations from revision logs, manipulating the ranking score, etc.
  • a search processor that does not filter the output of another search processor is referred to as an independent search processor.
  • An ordered sequence of search processors in which the first is an independent search processor and the second and subsequent search processors act as filters for the search processors preceding them is referred to as a pipeline.
  • the individual search processors that make up a pipeline are also referred to as stages.
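  • A pipeline of stages, as defined above, can be sketched with plain functions. The tiny `CORPUS`, the term-overlap scoring, and the 500 KB size limit are invented for illustration; only the shape (one independent processor followed by filters that consume the previous result set) comes from the description.

```python
# Hypothetical corpus: content locator -> (text, size in KB).
CORPUS = {
    "doc/a": ("vacation policy overview", 40),
    "doc/b": ("vacation request form", 900),
    "doc/c": ("travel expenses", 20),
}


def keyword_search(query, upstream=None):
    """Independent processor: ignores upstream input and scores by term overlap."""
    terms = query.lower().split()
    results = {}
    for locator, (text, _size) in CORPUS.items():
        hits = sum(1 for term in terms if term in text.split())
        if hits:
            results[locator] = hits / len(terms)
    return results


def drop_large(query, upstream):
    """Filter stage: narrows the upstream result set by removing large documents."""
    return {loc: score for loc, score in upstream.items() if CORPUS[loc][1] <= 500}


def run_pipeline(stages, query):
    """Run the stages in order, feeding each result set to the next stage."""
    results = None
    for stage in stages:
        results = stage(query, results)
    return results
```

  • For example, `run_pipeline([keyword_search, drop_large], "vacation policy")` keeps `doc/a` and drops the oversized `doc/b`.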
  • the result set of a blended search is formed by merging the output result sets of one or more pipelines.
  • each pipeline produces a score for each document in its result set that is used for ranking the document's relevance.
  • these scores are normalized to the same range, then multiplied by a scaling factor. If the same document appears in more than one pipeline's result set, the scores from each result set are added together to form a single score in the blended result. These accumulated scores determine the final rankings of the documents in the blended results, with the highest scores being given the best rankings.
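  • The merge step described above (normalize, scale, then sum per document) can be sketched as follows. Dividing by the maximum score is only one possible normalization and is an assumption here, as are the function names and the example scaling factors.

```python
def normalize(results):
    """Scale a pipeline's scores into the range 0..1 (assumed: divide by the maximum)."""
    if not results:
        return {}
    top = max(results.values())
    if top <= 0:
        return {doc: 0.0 for doc in results}
    return {doc: score / top for doc, score in results.items()}


def blend(pipeline_outputs):
    """Merge (result_set, scaling_factor) pairs into one ranked list.

    A document that appears in several result sets accumulates the
    scaled scores from each of them.
    """
    totals = {}
    for results, factor in pipeline_outputs:
        for doc, score in normalize(results).items():
            totals[doc] = totals.get(doc, 0.0) + factor * score
    # Highest accumulated score gets the best ranking.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

  • With a baseline result set `{"a": 8.0, "b": 4.0}` at factor 1.0 and a library result set `{"b": 0.5}` at factor 2.0, document `b` overtakes `a` because it is credited by both pipelines.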
  • the existing search mechanism that is being wrapped by the system is referred to as the baseline processor.
  • Any other search processors are referred to as ancillary processors.
  • a baseline processor is normally built on top of conventional search technologies and is therefore capable of standing alone as an adequate, though sub-optimal, document search mechanism. Amongst other things, this implies it should have access to the majority of public documents in an enterprise, have a query processor capable of handling typical requests from most business users, and that it not act as a filter stage in a pipeline.
  • Ancillary processors, on the other hand, have fewer such requirements: they may have access to only a handful of documents, they may or may not use a conventional search engine to accomplish their goals, and they may in fact participate as a filter stage in a pipeline.
  • the system can in fact be configured with two or more baseline search processors. This is sometimes referred to as federated search, in which the results of otherwise independent search engines are merged. Though this is not necessarily a goal of the system, it is a beneficial special case of its blended search technology.
  • FIG. 8 is a block schematic diagram showing expert-guided personalized search across applications according to the invention.
  • the server is shown including information about the user's library, “My Lib.”
  • the user's browser 21 is shown having a “My Lib” view.
  • the source of this view includes searching of a business application, Web searches, and other business application information. This creates a network effect so that other applications can use the server as well.
  • the user's library is a behavioral journal. It can be embedded in other applications and is, therefore, not just a new user interface or application.
  • the contents are created by user search and discovery and are generally invisible to the user.
  • The analytics of the system allow the improvement of quality and provide a bridge across silos. As discussed herein, a form of spam control is implicit in the operation of the invention.
  • the system provides dynamic personal navigation support.
  • a proximity hash, loading ratio, and privacy C policy are also implemented.
  • the invention operates in the form of a browser and desktop plug-in and includes content update and caching.
  • the information accessed in connection with the invention is pursuant to a domain expertise network, discussed in greater detail elsewhere herein, that consists of individual information, peer information, expert information, and community information.
  • FIG. 9 is a screen shot showing a user library according to the invention.
  • FIG. 10 is a second screen shot showing a user library according to the invention.
  • FIG. 11 is a third screen shot showing a user library according to the invention.
  • the system can be realized with different search processors provided in connection with the affinity engine ( FIG. 3 ), that can be combined in different ways to accomplish different goals.
  • the following discussion describes several of the more common search processors that are available.
  • This search processor is an independent baseline processor that generates its results by issuing a query to an existing Lucene index (see http://Lucene.Apache.org).
  • the result set that it produces includes a content locator and a relevance score that is a floating-point number in the range of 0.0 to 1.0.
  • This search processor is an independent ancillary processor that searches a particular user's library for documents that match a specified query.
  • a Lucene index is maintained for each user's library, so this search processor is essentially a special case of the Lucene baseline search processor running with a different scaling factor against a different index.
  • This special case of the library search processor runs against the library of the user that has invoked the original query. It normally runs with a relatively large scaling factor. Thus, documents in which the user has previously shown interest and which match the current query tend to receive elevated rankings.
  • This search processor is an independent ancillary processor that searches the libraries of the related users in a given relation. It is conceptually similar to invoking the library search processor for each of the related users then merging the results. In practice, this can be optimized in a number of different ways, for example by performing each library search in parallel, or by maintaining a separate merged index for the entire relation.
  • This search processor is a case of the relation search processor that has been specialized for one of a subject's peer relations. If the user has more than one such relation, the specific relation to be used for a given search can be determined in a number of different ways.
  • This search processor generally runs with a relatively high scaling factor, thus elevating the rankings of documents that both match the query and reside in a peer's library.
  • Some one-way relationships are transitive: if user A has a particular one-way relationship with user B, and user B has a similar one-way relationship with user C, then if the relationship is transitive it can be inferred that user A has this same one-way relationship with user C. If a given relation represents a transitive one-way relationship, then the transitive closure of that relation is the union of the members of the original relation with the members of the same relation for each of those related users. In a full closure, this process is continued recursively for each of the related users and each of their related users, etc. until the full tree of transitive relationships has been computed. In a partial closure, the recursion is limited to a particular depth.
  • the transitive relation search processor is an independent ancillary processor that searches the libraries of all users that belong to a full or partial closure of a specified one-way relation.
  • a single recursion depth can be specified for the entire relation, or a separate recursion depth can be specified for each member of the starting relation.
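  • Full and partial closures can be computed with a breadth-first walk over the relation. In this sketch the relation is a plain dictionary, a single recursion depth applies to the whole relation, and a depth of 1 is taken to mean the original relation only; these conventions and the function name are assumptions.

```python
def closure(relation, subject, max_depth=None):
    """Return the users reachable from `subject` through a one-way relation.

    relation:  dict mapping a user to the set of users directly related to them.
    max_depth: None for a full closure, or the number of levels to follow
               for a partial closure (1 = the original relation only).
    """
    found = set()
    frontier = {subject}
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for user in frontier:
            for other in relation.get(user, ()):
                # Skip the subject and users already seen, so cycles terminate.
                if other != subject and other not in found:
                    found.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
        depth += 1
    return found
```

  • The libraries of every user in the returned set would then be searched, for example by the relation search approach described earlier.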
  • This search processor is a special case of the transitive relation search processor that has been specialized for one of a subject's expert relations. If the user has more than one such relation, the specific relation to be used for a given search can be determined in a number of different ways, as outlined for the My Peers search processor.
  • this search processor runs with a high scaling factor thus causing content selected by experts to be given elevated rankings.
  • the freshness search processor is a simple ancillary filter processor that captures this difference by increasing the scores of more recent documents and decreasing the scores of older documents.
  • the degree to which a document's score is changed varies according to its age. Thus very recent documents might have their scores increased more than less recent documents, and very old documents might have their scores decreased more than middle-aged documents.
  • the thresholds and ranges for the various types of scaling are all configurable, making it possible, for example, to set up a filter that only penalizes old documents without enhancing new documents, or contrarily, to penalize new documents and enhance old ones.
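  • One way to realize such a configurable freshness filter is a piecewise-linear multiplier on the score. The thresholds (30 and 365 days), the boost and penalty limits, and the linear shape are all illustrative assumptions; the description states only that the thresholds and ranges are configurable.

```python
def freshness_factor(age_days, fresh_days=30, stale_days=365,
                     max_boost=1.5, max_penalty=0.5):
    """Return the multiplier applied to a document's score for a given age."""
    if age_days <= fresh_days:
        # Newest documents get the full boost, tapering to 1.0 at fresh_days.
        return max_boost - (max_boost - 1.0) * (age_days / fresh_days)
    if age_days >= stale_days:
        # Penalty grows with age and is capped once the age reaches 2 * stale_days.
        over = min((age_days - stale_days) / stale_days, 1.0)
        return 1.0 - (1.0 - max_penalty) * over
    return 1.0  # middle-aged documents are left unchanged


def freshness_filter(results, ages, **config):
    """Filter stage: rescale each upstream score by its document's freshness factor."""
    return {doc: score * freshness_factor(ages[doc], **config)
            for doc, score in results.items()}
```

  • Setting `max_boost=1.0` yields a filter that only penalizes old documents, and choosing `max_boost` below 1.0 with `max_penalty` above 1.0 reverses the effect.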
  • Some documents are the canonical correct answer to certain queries. For example, in organizations that must pay special attention to regulatory matters, e.g. HIPAA, SOX, etc., a query related to a particular procedure is ideally answered with the most current, official description of that procedure, possibly to the exclusion of all other documents.
  • The explicit bias search processor is an ancillary processor that recognizes certain queries or query keywords and injects a fixed set of documents in the results for those queries, each with a fixed score, usually a very high one. This is generally done without a formal search index. Typically, it is configured with a simple table that maps keywords to documents. It can be configured as either an independent processor or a filter. When it is configured as a filter, it can further be configured to either replace or supplement the input results. When the explicit bias search filter does not find a matching keyword, it leaves the input results unmodified.
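  • The table-driven behavior of the explicit bias processor, configured as a filter, can be sketched as follows. The table contents, the document locator, the mode names `"replace"` and `"add"`, and the function name are hypothetical.

```python
# Hypothetical table mapping keywords to documents with fixed scores.
BIAS_TABLE = {
    "hipaa": {"policies/hipaa-procedure-current": 1.0},
}


def explicit_bias_filter(query, upstream, table, mode="add"):
    """Inject fixed documents when the query contains a configured keyword.

    mode="replace" discards the input results in favor of the fixed set;
    mode="add" keeps the input results and adds the fixed set to them.
    """
    injected = {}
    for word in query.lower().split():
        injected.update(table.get(word, {}))
    if not injected:
        return dict(upstream)  # no matching keyword: input left unmodified
    if mode == "replace":
        return injected
    merged = dict(upstream)
    merged.update(injected)  # fixed scores override any upstream score
    return merged
```

  • No search index is consulted at any point; the lookup is a dictionary access per query word.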
  • Some search topics tend to recur regularly in any given enterprise, typically with a small number of documents in the results towards which everyone gravitates.
  • the system can detect these popular results by noticing when the same query is issued multiple times and then watching which documents are acted upon most frequently in response to these queries.
  • the popularity search processor is an ancillary filter processor that puts this knowledge to use. It detects popular queries and then increases the ranking of documents in the results that have historically been selected by previous users making the same query. In practical terms, it is similar to the explicit bias processor, except that the table of keywords to documents is generated automatically by the system from data obtained by analyzing the query and action logs.
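  • The automatically generated table can be sketched by counting repeated queries and the documents acted upon for them. The log shapes (a list of query strings and a list of query and document pairs), the repeat threshold, and the additive boost are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_popularity_table(query_log, action_log, min_issues=2):
    """Derive a query -> {document: share of actions} table from the logs.

    query_log:  list of query strings, one entry per time a query was issued.
    action_log: list of (query, document) pairs, one per action on a result.
    """
    issues = Counter(query_log)
    actions = defaultdict(Counter)
    for query, doc in action_log:
        actions[query][doc] += 1
    table = {}
    for query, count in issues.items():
        if count >= min_issues and actions[query]:
            total = sum(actions[query].values())
            table[query] = {doc: n / total for doc, n in actions[query].items()}
    return table


def popularity_filter(query, upstream, table, boost=1.0):
    """Filter stage: raise the scores of documents historically chosen for this query."""
    shares = table.get(query, {})
    return {doc: score + boost * shares.get(doc, 0.0)
            for doc, score in upstream.items()}
```

  • Queries issued only once are ignored, so a single user's one-off choice does not bias later results.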
  • Because the system watches both queries and the actions taken on the results of queries, it can monitor the quality of its results dynamically. This is then used for such purposes as return-on-investment (ROI) reports or feedback on site design.
  • A simple form of feedback on search quality can be found by comparing the query logs to the action logs. If a user query produces no corresponding actions, or perhaps only yields actions on poorly ranked documents, then the system can infer that the query produced poor results. On the other hand, a query that yields several different actions, particularly on highly ranked documents, might be considered good.
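  • This comparison can be sketched as a simple classifier over the ranks of the documents acted upon for one query. The cutoff of rank 3 for a "highly ranked" document and the two-way good/poor labeling are assumptions; a real deployment would presumably tune both.

```python
def classify_query(acted_ranks, good_rank=3):
    """Label one query as "good" or "poor" from the action log.

    acted_ranks: ranks (1 = top result) of the result documents the user
                 acted on after issuing the query; empty if no action was taken.
    """
    if not acted_ranks:
        return "poor"  # the query produced no corresponding actions
    if all(rank > good_rank for rank in acted_ranks):
        return "poor"  # actions only on poorly ranked documents
    return "good"
```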
  • Another dimension for quality feedback is to compare actions on documents that would have been found by the baseline search processor to those that would have been found only by the system, or perhaps to documents that were found more easily because of the system.
  • The system can accomplish this by taking note of which search processors contributed significantly to a document's relevance score when an action is actually taken on that document. If the only significant contributor to a document's score was the baseline search processor, then the system can infer that it did not add any particular value to that result. On the other hand, if one or more of the ancillary search processors contributed significantly to the document's score, then the system can infer that it did add value to the result.
  • The system can dynamically produce interesting and valuable ROI reports. For example, one report might compare the ratio of good-to-poor search results for queries that were enhanced by the system to the same ratio for queries that were not enhanced by the system. If a dollar cost is assigned to poor queries, then the difference in the cost of poor search results rendered by the original search system and those rendered by the system can be computed. Another report might concentrate on the amount of time the system saves its users. For example, a document that was found only by an ancillary search processor and not the baseline search processor might be assumed to save the user two hours of manual research, while a document that was pushed from a low rank to a high rank by ancillary search processors might be assumed to save the user 30 minutes of research. If a cost per hour is assigned to the user's time, then a cost savings for using the system can be computed.
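  • The attribution test and the time-savings report can be sketched as follows. The two-hour and 30-minute figures come from the example above; the 25% contribution threshold, the record layout, and the hourly rate used in the test are hypothetical.

```python
def system_added_value(contributions, baseline="baseline", threshold=0.25):
    """True if any non-baseline processor contributed significantly to the score.

    contributions: dict mapping processor name to its share of the document's
                   final relevance score (shares are assumed to sum to 1).
    """
    return any(share >= threshold
               for name, share in contributions.items()
               if name != baseline)


def hours_saved(acted_docs, hours_found_only=2.0, hours_promoted=0.5):
    """Estimate research time saved across the documents users acted on.

    acted_docs: list of dicts with boolean keys "found_by_baseline" and
                "promoted" (moved from a low rank to a high rank).
    """
    total = 0.0
    for doc in acted_docs:
        if not doc["found_by_baseline"]:
            total += hours_found_only  # would not have been found at all
        elif doc["promoted"]:
            total += hours_promoted    # found, but much further down the list
    return total
```

  • Multiplying the result of `hours_saved` by an assigned cost per hour gives the cost savings figure for the report.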
  • Because the system is designed to wrap an existing document search mechanism, it necessarily employs a number of technologies that are not intrinsically its own.
  • The following discussion describes these types of non-core technology used by the system. Those skilled in the art will appreciate that the following is only an example of a presently preferred embodiment of the invention, and that other technologies may be chosen to implement the invention.
  • The bulk of the system is implemented using version 1.5 of the Java language, and all classes are compiled using the Java compiler supplied by Sun Microsystems in version 1.5.04 of their Java Software Development Kit. It is presumed to run correctly in any JVM supporting version 1.5 of the Java language. If customers do not provide a JVM of their own, version 1.5.04 of the Sun JVM is used by default.
  • Java Servlets and Java Server Pages: The current implementation is written to version 2.4 of the Java Servlet specification and version 2.0 of the JSP specification. It should, in principle, run in any application server supporting those specifications. If customers do not provide an application server of their own, version 5.0.28 of the Apache Tomcat application server is used by default.
  • the system uses version 1.4.1 of the Lucene search engine to manage user libraries.
  • the current implementation includes support for Lucene version 1.4.1. If customers do not provide a baseline search engine of their own, a basic implementation using Lucene version 1.4.1 is provided.
  • Any conventional Web server can be used with the system to serve regular content.
  • the reference implementation of the system uses Apache 2.0.52.
  • A tool is provided that configures a target Web application with new capabilities such that the new capabilities can be demonstrated live within the Web application, even though the Web application has not been modified in any way.
  • An automated process is provided that enables the evaluation of a set of software capabilities within existing Web applications by guiding the evaluator through a series of steps and automatically provisioning the necessary infrastructure to support the evaluation.
  • the process is virtual in that it requires no changes to the target Web application and no installation of software.
  • the invention comprises a virtual environment that is implemented using proxy technology.
  • the system is used by a prospective customer to access a system Web site. This allows the prospective customer to see the “before they know” and “after they know” impact of the system against the prospective customer's live application.
  • a virtual environment is created that mimics the prospective customer's live application, without copying the live application's content.
  • the system nonetheless performs interception and augmentation in this proxy environment without physically possessing any content or interfering with the structure of the live application.
  • The invention may be used to do instrumentation without having to go physically into a customer's application environment, obtain logs, or involve the customer's IT department. Thus, the customer does not really know there is a change, but can see the impact.
  • This embodiment provides a virtual proof of concept (POC) that automates the sales process for the system.
  • the intent is to capture interest through the Website where a visitor comes in interested in a product.
  • the user accesses the system with a click-through to the POC.
  • the system automates the process of going through the POC. Once they have conducted the POC, the service is turned on, and they are now a paying customer.
  • the users are allowed to “try it,” for example. They enter their email address.
  • the system validates the email address with a first-level screening. Then, the system sends an email after they try it, and maybe a link to where they can see screens about how the system works.
  • In phase two, the system generates a demonstration room for them. This is based on some of the information they gave the system in the first step. In addition, the system now requires them to upload some information, a log file, for example, to provide the system with some historical information about how their Website has been used. The system then takes the log file and automatically generates a set of reports that explain the expected increase in value that the system can provide. The system then goes through an automated process and creates a “before and after” picture of what their site looks like before the system and then after the system.
  • This aspect of the invention concerns a method that derives a score which describes the usefulness of an electronic asset.
  • the computation of usefulness measures the actual usefulness of any electronic asset based on user behaviors with respect to the assets. Given a topic, there might be hundreds or thousands of relevant documents but only a few that are useful.
  • Usefulness measures how useful a document is for a given user, while relevancy measures keywords that match with the content.
  • Usefulness scores are computed for any electronic asset and for arbitrary user population sizes, ranging from millions to a single user.
  • the invention detects usefulness.
  • For example, if one is learning Java programming, there are hundreds of relevant Java books that can be used to learn Java. Are they all useful? No. If one wants to really learn Java, one should ask a Java guru what books to read, and they probably will recommend two or three books, instead of hundreds of Java books. Thus, relevant books comprise all these hundreds of books, while useful books are the two or three very useful ones. This usefulness is based on the knowledge of experts, community, and peers.
  • Expert, peer, and community knowledge is automatically extracted and assembled by the present invention based on observed behaviors of the user population. As user behaviors change over time, the system adapts its representation of expert, community, and peer knowledge. User behaviors can be recorded in real time (through various means of observation described elsewhere in this application) or extracted from existing log files of user behavior. On an ongoing basis, the system can continue to improve performance, based on ongoing real-time observations and getting continuing updates of the log files.
  • the updates amount to the differences in the log file, for example on a month-by-month basis. That is in addition to the information the system captures based on observations. There are certain things that are in the Web logs that the observations do not track and there are certain things that can be observed in real time that existing web logs do not track. There is more activity than is needed in runtime or user time, but it is interesting to look at these after the fact and draw generalizations from the broad sets of data that are captured in these log files.
  • This aspect of the invention concerns a method that enables the system to identify changes in the behavior of its user population, with respect to both electronic assets and members of the user population themselves, and automatically adapts its operation to self-correct for changes.
  • Self-correction enables systems to identify and adapt to changes proactively before they are obvious, while minimizing the need for administrative intervention to keep systems maintained.
  • This aspect of the invention concerns an attribute of the system, i.e. the inherent nature of the system, because it observes people's behavior. As people's behaviors change, their preferences change, and their useful content changes. The system automatically adapts to that change. Thus, the system is, by default, a self-learning system that can correct itself because, when people start to correct themselves, the system follows them.
  • The Inventive Technology is Content Type Independent or Content Agnostic
  • the preferred embodiment of the invention comprises an independent and content agnostic system because the system does not look at the content itself. This is unlike traditional search technology, which parses content, picks up key words in the content, and uses those key words to select results.
  • The invention, in contrast, is not concerned with what is in the content, but with the location of an asset and how people interact with that asset. The invention does not care what that piece of content is. It could be a text file in a simple case, but it can also be a video file, which has no text to parse and no index to be built in the sense of traditional technology.
  • The Inventive Technology Seeds the System from Web Server Logs, Search Engine Logs, Web Analytics Server Logs, and Other Log Files so that it can Generate Value From Day One of the Operation
  • Supervised guidance can be accomplished through administrators by assigning experts and peers based on their roles, reputations, expertise, etc., although it is not a necessary step. Such information can also be inferred and extracted from historical log files. Because the system is a learning system, it can derive more value over time as people use the system. This aspect of the invention concerns seeding technology that makes the system useful from day one. It may not be 100% useful, as it would be down the road, but it would give at least 50% to 80% of the value.
  • The Web server log, which is actually a recorded history of what has happened in an enterprise, is used. It does not have the fine-grained information that is ultimately needed, but it has coarse-grained information.
  • the log file provides historical information.
  • The preferred embodiment uses weeks' to months' worth of a log file, depending on the site's traffic patterns.
  • the invention provides a way to take something a user already has, i.e. the log file, and turn it into a resource that is used to seed the system. Then, over time, the system learns more because the invention is making observations by means of the extensions to the browser or the scripts that are running, as discussed herein.
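  • Seeding from a Web server log can be sketched by turning each successful request into a coarse "view" observation. The example assumes log lines in the Common Log Format written by many Web servers; the regular expression, the choice to key on the authenticated user name and fall back to the client address, and the function name are illustrative assumptions.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ (\S+) \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3}) '
)


def seed_from_log(lines):
    """Count successful requests per (visitor, document path) pair."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group(4) != "200":
            continue  # skip malformed lines and unsuccessful requests
        host, user, path = match.group(1), match.group(2), match.group(3)
        visitor = user if user != "-" else host
        views[(visitor, path)] += 1
    return views
```

  • The resulting counts are coarse, as noted above: they show that a document was requested, but not whether it was printed, saved, or found useful.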
  • the system takes advantage of not only basic logs, but also the analysis that is generated from those logs by higher order analytics which are available commercially from various companies known to those skilled in the art.
  • The Invention Federates Across Multiple Applications, Websites, and Repositories Without Crawling or Indexing the Applications
  • Federation is an attribute and a natural fallout of the core technology herein.
  • the traditional approach to searching is to have multiple indexes, each of which is linked to a different repository or different application. A search is performed against each repository with separate indexes for each repository that are not cross-searchable.
  • A federated search is automatically provided because when people use an asset in a context of an application, they do not care where they use it. They can use one particular piece of content in one sort of a silo, and the next minute can move into a different silo, e.g. start with a CRM system and then move into an ERP system. In this way, the user creates a trail, i.e. a virtual link of the various systems.
  • The inventive system can recommend information from the multiple different data sources, such that federation is automatic because the user is creating the federation. That is, the user's pattern of usage of information from and across various data sources creates the federation.
  • the invention herein does not require crawling of Web sites or applications, or indexing of the applications or the contents thereof. Further, the invention respects any security that is already in place.
  • a significant challenge in building federated search systems is that federated search systems must understand and work with the underlying security of these applications. It is difficult to do this because each application generally has its own security model. Generally, security models are not shared across different applications.
  • the federation of search while protecting security is a huge challenge.
  • the invention is unique in the sense that it does this naturally, without any specific adapters, and it guarantees that it can preserve perfectly the underlying security mechanism for that application. This is done in a very unique way. The system goes through the browser instead of implementing proprietary modules to preserve security.
  • A search engine is actually building up an index of all of the content.
  • When a search is performed, one cannot simply bring back a list of search results and then prevent somebody else from clicking on the list if they do not have access to it. So, in effect, the search engine replicates multiple security models in one index.
  • the inventors have recognized that there is no need to do this because the system has a browser where a user queries through the system. The system then accesses its database of content and, in return, provides a list of results.
  • The system does not filter out all the content at this time but, instead, filters as the results are returned.
  • the system provides technology inside the browser that checks each of these repositories in real time if this user in this session can access this content. If the answer is no, the user is prevented from reviewing the content. It is kept off the list. The user does not even know it came up.
  • The primary driver for whether the browser has access is the person who is logged into the browser at the time, based on the person's privileges in the system, which determine whether the person can see the results. If the person cannot see some of the results, the system does not show these results.
  • the system is, in real time, asking the application if a particular user, e.g. the user currently logged in, can access the content right now.
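  • The real-time check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `can_access` callback stands in for asking the owning application whether the logged-in user may open an item, and the `ACL` table, user names, and URLs are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

def filter_visible(results: Iterable[str], user: str,
                   can_access: Callable[[str, str], bool]) -> List[str]:
    """Keep only the results that the owning application says this user may open."""
    return [url for url in results if can_access(user, url)]

# Hypothetical stand-in for the per-application, real-time permission check.
ACL: Dict[str, Set[str]] = {
    "/hr/salaries.pdf": {"alice"},
    "/wiki/home": {"alice", "bob"},
}

def demo_can_access(user: str, url: str) -> bool:
    return user in ACL.get(url, set())

# Bob never sees the HR document in his result list.
visible = filter_visible(["/hr/salaries.pdf", "/wiki/home"], "bob", demo_can_access)
```

  • Because the check is delegated to a callback, the index itself does not need to replicate any application's security model in this sketch.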
  • This aspect of the invention accomplishes personalized search by knowing who a user is. When the user exhibits certain behaviors when using the system, the user is self-identifying, e.g. through cookies, logins, etc. Even if the user is an anonymous user, the system places a cookie in the user's browser. Thus, when the user is using the system he leaves a personal trail, and the system then personalizes information based on who the user is. In the system, no one predefines relations based on personalization because the system is based on the user's behavior. The user's affinity with other people creates a space, referred to as a club. Thus, a user can form his own clubs implicitly by exhibiting interest in one area. No one actually is monitoring the user. The clubs are established entirely through the user's behavior.
  • Controlled Deployment of the Invention for Risk Management and Acceptance Tests. Controlled deployment reduces the product deployment risk by controlling the number of people who can see the product features in the live running system.
  • a special cookie is set and sent to a controlled testing population. With that cookie, the users of the site can see the invention's features while the general users have no visibility of these features. This is a desired way to deploy new technology in enterprises.
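  • A minimal sketch of the cookie gate described above. The cookie name `pilot_features` and its value `on` are assumptions made for illustration; the source does not specify them.

```python
PILOT_COOKIE = "pilot_features"  # hypothetical cookie name

def parse_cookie_header(header: str) -> dict:
    """Parse a raw Cookie header such as 'a=1; b=2' into a dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies

def sees_new_features(cookie_header: str) -> bool:
    """Only the controlled testing population carries the special cookie."""
    return parse_cookie_header(cookie_header).get(PILOT_COOKIE) == "on"
```

  • General users, who never receive the cookie, fall through to the unmodified site.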
  • Augmented Search. This feature of the invention blends traditional full-text search with preference and activeness information from the global, peer, and expert populations to give users precise answers.
  • This aspect of the invention states how the index is used.
  • the invention can augment a search for a better result.
  • a search request is made to the customer's Web server and a result is obtained.
  • a search request, along with the Web server results, the generating query, and the user ID, is sent to the search server.
  • A response comes back.
  • the system sends augmented results back in search server format and the client renders the HTML.
  • Top N is a list of most useful information based on the context and driven by the usage of the community.
  • the context may be set by a user query, explicit specification of topic, or presence on a particular web page or group of pages, for example.
  • the invention also creates an insight called top ten, e.g. the top ten most important, related, useful pieces of information, given a topic or given a context.
  • the user can see information based on context-driven usage of the information by the community.
  • Top ten is a popularity result. Give the user the ten most popular links that have to do with a query term (context), or maybe no term. If there is no term, then the top ten most popular pages are returned. For all of these views, one can apply a filter, e.g. only look at the top ten that fall within the category of technology, or only look at the top ten PDF files.
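  • The Top N view described above can be sketched as a usage count with an optional context term and an optional filter. The event format, a `(query term, url)` pair, is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

UsageEvent = Tuple[str, str]  # (query term that led to the use, url); assumed format

def top_n(events: Iterable[UsageEvent], n: int = 10,
          term: Optional[str] = None,
          keep: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Most used URLs, optionally limited to one context term and one filter."""
    counts: Counter = Counter()
    for event_term, url in events:
        if term is not None and event_term != term:
            continue  # outside the requested context
        if keep is not None and not keep(url):
            continue  # rejected by the filter, e.g. "PDF files only"
        counts[url] += 1
    return [url for url, _ in counts.most_common(n)]

events = [("vpn", "/a.pdf"), ("vpn", "/a.pdf"), ("vpn", "/b.html"),
          ("laptop", "/c.pdf"), ("laptop", "/c.pdf"), ("laptop", "/c.pdf")]
```

  • With no term, the call returns the most popular pages overall; passing `keep=lambda u: u.endswith(".pdf")` restricts the view to PDF files.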
  • Predictive Navigation Provides Short-Cuts for Browsing Navigation Based on where Similar Users Start and End on an Application.
  • This aspect of the invention predicts navigation and shortcuts traditional navigation based on previous navigation by peers and experts, including where they started and where they ended. Thus, the starting point and end point are critical to predict the user's navigation, to try to shortcut the middle part of a series of navigational steps, and send the user straight to the destination without wasting time in a lot of other places.
  • Predictive navigation is also referred to as “Next Step,” and depends on which calculations or results one wants to display.
  • Predictive navigation uses the navigation trail.
  • There is a notion of a navigation trail; the system tracks the last N pages a user has been to. The system keeps a history of the query that was used to start this trail, if applicable. Thus, the user searches and a result comes up. The user may click on the result. The user may click again to go somewhere else. The user keeps clicking. The system accumulates this history of pages. It also notes that this entire history was due to a query. Based on the user's history and other observations in the past, the system tries to figure out where the user is going. The recommendations that come back are pages ahead of the user that other people have found useful, based on the history, i.e. the trail that the user has accumulated in this session. The system thus tries to match where the user has been and figure out where he is going.
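  • A navigation trail of the last N pages, together with the query that started it, can be sketched with a bounded queue. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

class NavigationTrail:
    """Remembers the last N pages of a session and the query that started the trail."""

    def __init__(self, max_pages: int = 5) -> None:
        self.query: Optional[str] = None
        self.pages: Deque[str] = deque(maxlen=max_pages)  # oldest pages fall off

    def start_from_query(self, query: str) -> None:
        self.query = query
        self.pages.clear()

    def visit(self, url: str) -> None:
        self.pages.append(url)

    def snapshot(self) -> List[str]:
        return list(self.pages)

trail = NavigationTrail(max_pages=3)
trail.start_from_query("chip design")
for page in ["/results", "/overview", "/specs", "/datasheet.pdf"]:
    trail.visit(page)
```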
  • This aspect of the invention states how the index is used. Using community insight, the invention can augment a search for a better result.
  • Zip through concerns the idea of having content preloaded on the system. As the user is going down the results, the system shows the user a preview of what each link is. Thus, instead of having to go to the page and change the page that he is viewing, the user just zips through. If he sees the content he wants, that is the one he clicks into.
  • the peers and experts are not hard drawn circles but a web with hub concentration and connections among various hubs. Information and knowledge is aggregated together and a fusion effect of wisdom is created naturally and automatically.
  • the invention essentially uses the same information to identify the user community. Who are the peers? Who are the experts? Not only does the invention identify what content a user would like to see, but it can also identify the actual people who are the user's peers. The grouping of peers is naturally formed. There are no hard boundaries on them. There are affinities among these people.
  • This aspect of the invention provides far more accurate prediction of content usefulness than traditional content voting or survey.
  • Document rating driven by usage reflects the real value of content.
  • implicit observation is very important. If you ask people to vote on content, you tend to get biased results. You also get a sample that is highly skewed because most people do not have time to vote, to survey, to do anything that is explicit. People who do vote tend to have extreme opinions. They have a lot of time on their hands. They are outspoken and opinionated. Thus, they tend to misrepresent the entire population. The sample does not match the population. They also tend to vote negative more than positive. Thus, the invention preferably does not use explicit voting. It takes account of implicit actions.
  • the user requesting a print is implicit because he's doing something else at the time he's making the request.
  • the user is not giving feedback and not being asked for feedback.
  • the invention exploits passive observation, not active feedback, although some embodiments could include active feedback, such as negative feedback.
  • This embodiment concerns a method that observes information about electronic assets, the behavior of the user population with respect to electronic assets, and the changes in assets and behavior over time to discern the relative value of the assets to the user population over time.
  • the method identifies a spectrum of values ranging from short-lived information, to mid-range knowledge, to long-lived wisdom.
  • the system provides an ongoing, content-agnostic, and adaptive institutional memory for the enterprise.
  • Computational wisdom means that the wisdom is a form of community behavior that, with regard to a set of assets, does not change over time. There are four items stated above in terms of how frequently people change opinions.
  • Content, for example, is the least reliable thing to trust because content can change.
  • Information is at a second level of trust. If information stays stable in view of people's opinions, that set of information becomes knowledge. If knowledge can go through time and continue to be used and supported, then that becomes wisdom. So, wisdom cannot change year to year. Knowledge may change from month to month, and information may change from day-to-day. Content by itself does not mean anything.
  • the Invention Provides Content Gap Analysis Through the Use of Content, Data and Application Instead of Relying on Content Owners' Speculation on What's Missing, Hot, Good, and Bad.
  • the gap analysis report provides the ability to detect gaps in the content. The usual assumption is that the content is there and users simply cannot find it. Frequently, however, there are gaps. For such cases, the system ascertains what people are actually looking for and what is missing. Someone in a traditional search or navigation situation might search for something and fail. If they do not resign themselves to failure, they may search again, or they might try to navigate. Through either of these mechanisms they might have success or failure.
  • the system addresses this problem when someone starts to exhibit search or navigation behavior. It is known precisely what they are looking for, and the content that over time starts to get surfaced in search and navigation is the content that the community, itself, has deemed useful, without regard to a developer or merchandiser, or what merchandisers thought about the content. Thus, somebody is looking for something, they think it is going to be useful, but they are not finding it. This aspect of the invention allows one to quantify how dire the need is for this information.
  • the system provides the information flow over time, and helps a company manage information logistics.
  • This aspect of the invention uses an applied information gap as a division flow over time to identify how information flows.
  • the system allows one to understand what people are requesting and what content is available that can meet those needs. In time, it is possible to see the flow of information going in and out from division to division, from location to location. It is possible to see which location presents what information, or what group, or for what number of people.
  • company or industry efficiency against information consumption can be measured on a dashboard, and white-collar worker productivity can be derived for the first time.
  • This aspect of the invention is related to the ability to identify experts and peer-enabled companies with the expertise at a global scale, which allows the system to provide a dashboard for finding out who knows what and what can be done where.
  • This method identifies firms that are purchasing keywords and banner ad space on public websites for advertising purposes.
  • the invention looks at both common and uncommon keywords, as well as the context of given banner ads, and automatically generates a list of firms who are prospects for improved lead generation through their websites.
  • the invention uses information found in the online ad itself and combines it with other public information sources to create a refined list of firms.
  • the system then back-traces to the buyers of the ads, and automatically includes that information in the candidate prospects list.
  • This aspect of the invention helps firms who wish to retain customers or increase lead generation. This is accomplished by increasing the conversion rate of sponsored ads e.g. Google and Yahoo ads.
  • the system can guide this user to the most useful information on the given website or collection of related websites. Without this capability, users who arrive at a website no longer have the benefit of these public search engines directing them to the most relevant information.
  • the invention routes these users to the most useful information by observing where the community found the most useful information, given the initial query context from the public search engine. There are two steps in this process: 1) first, the system captures the collective wisdom on where useful information lies given the query context; 2) secondly, as users come in from Google or Yahoo, the system leverages this collective wisdom to guide people to exactly the right content.
  • This aspect of the invention concerns anchor text. This is very different than Google because when Google is page-ranking they parse every single page, and how many links, and other things.
  • the invention parses links that are used by people. A link is a dead link unless it is used. So, if someone clicked on a link, then this link is useful to this person.
  • used links are cross-examined against the many uses by peers and experts, in addition to those of the individual users. The implicit validation and endorsement by peers and experts reduces noise from individual behaviors and strengthens the signal-to-noise ratio.
  • Usage of a link and the importance of the text is determined by a blended vector of user, peer, expert, query context, and time.
  • In this aspect of the invention, successful use of a link reflects how an individual user behaves. In addition to looking at a link itself, where a click by the user indicates that the link is useful, the system also analyzes how many peers similar to the user click on these links. It likewise considers how many experts, who are different from the user but on whom the user depends to do his job, also clicked on the link. Thus, there is a two-level value: individual use of content, and that of the peer group and expert group. These dimensions give the total value of the data.
  • the inventive system deploys a unique way of letting the community implicitly create context terms that describe what the content is about.
  • the context terms are learned through watching users conducting queries and finding useful results via the queries. It also includes navigation trails with the link text associated with each use.
  • the system builds its vocabulary by watching how visitors use various terms to describe content and how a site uses links to describe content. Again, more frequently used queries and links are more important and more strongly associated with content, while a link text that yields no use of content in the downstream trails has no association with the content.
  • JavaScript Tags. In this method, the page is instrumented with a piece of JavaScript. This JavaScript detects link usage and text and sends this information onward to a server.
  • the browser is instrumented with a piece of software.
  • This software detects link usage and text and sends this information to a server.
  • the access logs for a web site are analyzed via a special program—the Log Analyzer—which detects usage of links and sends this information to a server.
  • the system client comprises three general areas: the UI, the observer, and the proxy.
  • the client comprises a Web browser that entails the client UI (see FIGS. 5-7 ).
  • the client includes a JavaScript observer to make observations on usage of the Web page at the client.
  • One embodiment of the invention comprises a sidebar UI that shows the recommendations from the system engine.
  • This aspect of the invention is embodied as JavaScript tags that generate the JavaScript necessary to display the UI.
  • enterprise Web content is displayed on the page and along the side there are system generated recommendations.
  • a variant of the UI provides a popup, where the user clicks on something on a page, e.g. an icon, that calls into system code to display a popup in context.
  • the UI also comprises an API for fetching results from the system server. This is in lieu of gaining results directly from the enterprise installed search server.
  • the user types in the search term, clicks on search, and the search hits their Web servers.
  • the Web servers then go back to their search server.
  • the search server returns the result in some format, usually in XML, and then they have a presentation code on the front end to parse through the XML and present the search results.
  • the invention operates in a similar fashion, except when their Web server goes back to the search server, instead of going back directly to the search server, it goes back to a server side extension.
  • the extension then fetches the initial results from their search server, feeds that back to the system, and the system either reorders the results or, in any case, enhances the results, possibly adding more entries. This is provided back to the extension, and the extension reports back to their Web server. Their Web server continues on as it did before, parsing through the XML, reformatting, and sending it back to the client.
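  • The reorder-and-add step performed on behalf of the server-side extension can be sketched as below. How the system actually scores and merges results is not specified in this text, so the usage scores, the stable reordering, and the choice to append extra entries at the end are all assumptions.

```python
from typing import Dict, List

def augment(engine_results: List[str], usage_score: Dict[str, float],
            extra: int = 2) -> List[str]:
    """Reorder the search server's results by community usage and add missing entries."""
    # Stable sort: results without usage data keep their original engine order.
    reranked = sorted(engine_results, key=lambda url: -usage_score.get(url, 0.0))
    # Entries the engine missed but the community used, best first.
    missing = [url for url in sorted(usage_score, key=lambda u: -usage_score[u])
               if url not in engine_results]
    return reranked + missing[:extra]
```

  • The extension would then return this list to the Web server in the search server's own format, so the existing presentation code keeps working unchanged.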
  • the JavaScript observer is a piece of JavaScript code packaged as a tag that is given to the user, and the user instruments their page using this tag.
  • the JavaScript tag resides on the client page and makes observations. For example, a scroll or a dwell observation. If the end-user, for example, is reading a page, he would conform to what is defined as a “dwell.” Once a dwell occurs, i.e. once the JavaScript observer has observed a dwell, it then sends back that information to the server. The server accumulates these observations.
  • the augmented search concerns the notion of reusing the UI that the user has, instead of standing in between the presentation layer and the search server and augmenting the search from there.
  • Predictive navigation, top ten, more like this, context expansion, and zip through are all types of tags that the user can put into a page, and they all use a different algorithm in creating suggestions.
  • the recommendations that come back are pages ahead of the user that other people have found useful, based on the history, i.e. the trail that the user has accumulated in this session.
  • the system is thus trying to match up where the user has been and trying to figure out where he is going.
  • the proxy demonstrates the system UI on a user's pages where the system does not have access to the source code of those pages. These users are prospective customers who do not want to give out access to their pages.
  • the system wants real-time pages, but with our tags injected into the page to show the user what the page would look like with the system enabled. To do this, the system uses a proxy that goes and fetches the page from the URL and then, based on a configuration of rules, alters the HTML of that page, and then sends that page back to the browser.
  • the proxy sits between the browser and the target Web server, i.e. the potential customer's Web server.
  • the proxy itself has its own URL, and it just passes in the target URL as part of that URL.
  • a URL for purposes of this embodiment of the invention consists of two URLs.
  • the URL used actually points to the proxy, but embedded in the URL itself is the target, i.e. the customer URL that you want the proxy to go to.
  • the URL goes to the system first and reconstructs the URL, bringing you to the customer page. Then, the proxy makes an HTTP connection on your behalf to fetch the page for the customer site. It looks at the page and applies a set of rules to it, and then sends the page back to the user.
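  • The two-URLs-in-one scheme and a single rewrite rule can be sketched as follows. The proxy address, the `target` parameter name, and the rule that inserts a tag before `</body>` are hypothetical; the HTTP fetch itself is omitted.

```python
from urllib.parse import parse_qs, quote, urlsplit

PROXY_BASE = "https://proxy.example.com/view"  # hypothetical proxy address

def to_proxy_url(target_url: str) -> str:
    """Embed the customer URL inside the proxy's own URL."""
    return f"{PROXY_BASE}?target={quote(target_url, safe='')}"

def extract_target(proxy_url: str) -> str:
    """Recover the customer URL that the proxy should fetch."""
    return parse_qs(urlsplit(proxy_url).query)["target"][0]

def inject_tag(html: str, tag: str) -> str:
    """One example rule: insert the system tag just before the closing body tag."""
    return html.replace("</body>", tag + "</body>", 1)
```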
  • the page is first instrumented with tags.
  • the presently preferred format of the tag is in JavaScript (JS).
  • the customer incorporates a JS file on their Web server. Then they refer to it in HTML with a script tag.
  • the file sets up an object at the system.
  • a place is set up in the page where the UI is displayed.
  • the system sends back HTML to the UI.
  • the administrator on the customer side specifies a style sheet on the tag. Even though it is the same HTML, because the style sheet is different, the user gets a different color scheme and font, for example.
  • a plug-in may be provided that serves a similar purpose as the proxy, in that it also modifies the HTML and comes back with a search result.
  • the user configures the plug-in.
  • the plug-in takes the search request, performs a search for the results, sends them back to the system, which then augments the results and sends them back to the plug-in.
  • the plug-in does a swap of the HTML that is displayed.
  • the plug-in displays a modified page from the system.
  • the plug-in also makes observations. Because it has more access to the browser functions than JavaScript does, it has a better ability to capture a wider range of observations.
  • Top Ten and predictive navigation work the same way as discussed above for the UI.
  • the only difference is in the request. For example, when the user asks the system for augmentation, the system is asked for a specific calculation.
  • the JavaScript observer sits and waits and observes the user on the page. An observation is made of a user action and the observation is sent to the system, including all the information about the observation, e.g. what page it is, if there is user information.
  • Dwell is observed when the user has spent N number of seconds or minutes on a particular page.
  • The range can be selected as a matter of choice, but is typically around 30 seconds to five minutes, depending on the complexity of the document that is being inspected.
  • An excessive amount of time means the user walked away from the computer.
  • the system preferably does not capture an upper threshold because the user may be reading the document.
  • a scroll concerns a scrolling of the screen.
  • the anchor text is the hyperlinked text that got the user to the present page. It could be as simple as the user clicking on a news item that then brings up some recent news about the subject.
  • Think is a usage pattern, i.e. a combination of a dwell and a scroll, or some action, mouse movements, etc., that indicates that the user is thinking about the page. Thus, think is a period of inactivity followed by some action.
  • Mail is when a user mails or forwards the content to another user, or virtually emails it, in a fashion similar to a virtual bookmark, with the intent to mail.
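  • The dwell, scroll, and think observations described above can be sketched as a classifier over timestamped page events. The event format, the 30-second dwell threshold, and the 10-second idle gap used for "think" are assumptions; no upper threshold is applied, in line with the preference stated above.

```python
from typing import List, Tuple

PageEvent = Tuple[float, str]  # (seconds since page load, event kind); assumed format

def observe(events: List[PageEvent], time_on_page: float,
            dwell_after: float = 30.0, idle_gap: float = 10.0) -> List[str]:
    """Return the observation types that this page view qualifies for."""
    observations: List[str] = []
    if time_on_page >= dwell_after:
        observations.append("dwell")
    last_activity = 0.0
    for moment, kind in sorted(events):
        if kind == "scroll" and "scroll" not in observations:
            observations.append("scroll")
        # "Think": a period of inactivity followed by some action.
        if moment - last_activity >= idle_gap and "think" not in observations:
            observations.append("think")
        last_activity = moment
    return observations
```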
  • FIG. 12 is a flow diagram showing document recommendations according to the invention.
  • various term vectors T 1 , T 2 exist, as well as a peer/expert population.
  • a term vector is available and is compared with the term vectors of every other document such that the top N matches are selected.
  • the most popular terms for the top N documents are found and these are added into the term vector as well.
  • activeness information may be obtained for every document and the new term vector can be compared to the term vector of every other document.
  • the two can then be combined 118 and the top N can then be selected 119 .
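  • The flow of FIG. 12 can be sketched as below: match a term vector against every other document, fold the most popular terms of the top N matches back into the vector, rescore, combine with activeness, and select the top N. Cosine similarity and the multiplicative combination with normalized activeness are assumptions; the text does not state how the two are combined.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(doc_id: str, docs: Dict[str, Vector], activeness: Dict[str, float],
              n: int = 2, extra_terms: int = 2) -> List[str]:
    seed = docs[doc_id]
    others = [d for d in docs if d != doc_id]
    # Step 1: top N topical matches for the starting term vector.
    nearest = sorted(others, key=lambda d: cosine(seed, docs[d]), reverse=True)[:n]
    # Step 2: add the most popular terms of those matches to the term vector.
    pooled: Counter = Counter()
    for d in nearest:
        pooled.update(docs[d])
    expanded = dict(seed)
    for term, weight in pooled.most_common(extra_terms):
        expanded[term] = expanded.get(term, 0.0) + weight
    # Step 3: rescore with the new vector and combine with activeness (assumed product).
    peak = max(activeness.values(), default=0.0) or 1.0
    scores = {d: cosine(expanded, docs[d]) * activeness.get(d, 0.0) / peak for d in others}
    return sorted(scores, key=scores.get, reverse=True)[:n]

docs = {"a": {"chip": 1.0, "design": 1.0}, "b": {"chip": 1.0, "layout": 1.0},
        "c": {"payroll": 1.0}, "d": {"design": 1.0, "layout": 1.0}}
activeness = {"b": 10.0, "c": 50.0, "d": 2.0}
```

  • In this toy data, document "c" is the most heavily used but is off topic, so it is not among the two documents recommended for "a".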
  • the concepts of term vectors and document searching are discussed in greater detail below.
  • the invention uses a similar strategy to the way in which a search is performed inside of a document.
  • the known approach represents everything in a vector space model. This approach takes a document and makes a vector, which comprises all of the document terms in term space. This is done for every document.
  • the search is represented as another vector in term space.
  • the invention uses a vector space model, but the way it builds the vector to represent a document has nothing to do with what is in the document. It has to do with what other people have searched on when they hit this document. For example, an individual performs a search on the term “chip design” and perhaps other words. He might have ended up finding a document. It might have been low down on the list of returned results, but he might have ended up finding it. As soon as he finds it, the invention associates whatever he searched on with that document, and that contributes to the vector. There are other ways to fill out the term vector, e.g., through a user's navigation behaviors (described later), or through explicit input by the user. Thus, the invention builds representations of the document based on how other people are using it.
  • the invention gives every individual user their own term vector for a document. Accordingly, every user gets to say what their opinion is on what a particular document is about. Some people may have no opinion on certain documents. However, if somebody has ever performed a search and used the document, for example, their opinion gets registered in their vector for that document.
  • the invention allows various functions to be performed, such as “I want to match all the documents, but I don't want to look at everybody's opinion;” or “I want to look at just my peers' opinions or the experts' opinions.”
  • the invention takes several of the term vectors from different people, sums them together, gets a view of what that population thinks that document is about, and uses that result to connect people to the right documents.
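  • Per-user term vectors, and the sum over a chosen population, can be sketched as follows. The class and method names are hypothetical, and splitting the query on whitespace is a simplification.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

class TermIndex:
    """Each user holds their own term vector for every document they used."""

    def __init__(self) -> None:
        # user -> document -> Counter of search terms that led to a use
        self._vectors: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))

    def record_search_hit(self, user: str, query: str, doc: str) -> None:
        for term in query.lower().split():
            self._vectors[user][doc][term] += 1

    def population_vector(self, doc: str, users: Iterable[str]) -> Counter:
        """Sum the opinions of just the selected users (peers, experts, or everyone)."""
        total: Counter = Counter()
        for user in users:
            if user in self._vectors and doc in self._vectors[user]:
                total.update(self._vectors[user][doc])
        return total

index = TermIndex()
index.record_search_hit("ann", "chip design", "/d1")
index.record_search_hit("bob", "Chip layout", "/d1")
index.record_search_hit("cat", "payroll", "/d1")
```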
  • this aspect of the invention provides a search enhancement that relies upon the novel technique of a usage-based topic detection.
  • the term vector provides a vector space model to represent usage-determined topics.
  • the invention also comprises a vector that looks at what documents each user uses. Every user has one of these, called the activeness vector. Every time they use a particular document, the invention notes that they have used it. Every bucket in the activeness vector is a particular document or asset, and the bucket keeps an accumulation of usage. Some buckets may have zero. Some might have a huge number. Different actions on an asset by a user, e.g., reading or printing, contribute differently to the activeness vector.
  • An activeness vector for a population can be generated based on the activeness vectors of each of the users in that population. For example, for a particular population, e.g. seven users, the usage vectors are summed to determine how much a particular document is used in that population.
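  • Activeness vectors with per-action weights, and their sum over a population, can be sketched as below. The specific weights are assumptions; the text says only that different actions contribute differently.

```python
from collections import Counter
from typing import Iterable, Tuple

ACTION_WEIGHTS = {"read": 1.0, "print": 3.0, "mail": 4.0}  # assumed values

def activeness_vector(actions: Iterable[Tuple[str, str]]) -> Counter:
    """One bucket per document; each (document, action) pair adds its weight."""
    vector: Counter = Counter()
    for doc, action in actions:
        vector[doc] += ACTION_WEIGHTS.get(action, 0.0)
    return vector

def population_activeness(vectors: Iterable[Counter]) -> Counter:
    """Sum the users' vectors to see how much each document is used in the group."""
    total: Counter = Counter()
    for vector in vectors:
        total.update(vector)
    return total

ann = activeness_vector([("/d1", "read"), ("/d1", "print")])
bob = activeness_vector([("/d1", "read"), ("/d2", "mail")])
```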
  • the invention combines these two pieces of information, i.e. the term vector and the activeness vector, to help recommend documents. For example, there might be a document that matches perfectly in terms of topic but is not used very much. Even though both vectors concern usage-based information, one vector concerns an amount of usage and the other vector concerns the context of usage.
  • the invention brings these two numbers together to suggest a document.
  • the invention can also incorporate the results from an existing search engine (which amount to a topic match based on the contents of an asset) with the term and activeness vectors, if desired.
  • Each individual has his own library (collection of used assets) and for each document in an individual's library, the system has the user's term vector, which represents what they think the document is about.
  • the system also has their activeness vector which indicates how much they have used that document in any context. It is now possible to bring together any given group of users and ask for their collective opinion on a document.
  • There are also the global usage vectors which are the sum of everybody's vectors.
  • There are also usage vectors for the group of anonymous users. All users who are unknown contribute to the same usage vectors. When combining vectors to create a collective view, there can be a different weight for different people's vectors, but the sum of everybody is the global vector.
  • the invention also comprises users' peers and experts.
  • Peers. The way that the invention determines peers is similar to the way that peers are determined in collaborative filter applications, such as Amazon.com, but in this case the determination is based on document usage.
  • the invention looks at two users' document usage (activeness) vectors, e.g. one person has used this document, this document, and that document; the other person has used the same three documents; therefore, they are similar. That is, there is a document usage similarity that, in this case, is established when the invention compares the two users' activeness vectors.
  • two users that overlap on a significant set of used assets are considered to be peers regardless of other assets used only by one user or the other.
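  • The overlap test for peers can be sketched as a count of shared used assets that ignores assets used by only one of the two users. The threshold of three shared assets is an assumption standing in for "a significant set".

```python
from typing import Dict, Set

def shared_assets(a: Dict[str, float], b: Dict[str, float]) -> Set[str]:
    """Assets that both users have actually used."""
    return {doc for doc, use in a.items() if use > 0 and b.get(doc, 0) > 0}

def are_peers(a: Dict[str, float], b: Dict[str, float], min_shared: int = 3) -> bool:
    # Assets used by only one of the two users do not count against the match.
    return len(shared_assets(a, b)) >= min_shared

ann = {"d1": 2, "d2": 1, "d3": 5, "d9": 7}
bob = {"d1": 1, "d2": 4, "d3": 1}
cat = {"d1": 1, "d8": 3}
```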
  • the invention can also look at the actual terms that people use: do they search on similar terms? Similar documents? Similar search terms, or a blend? Thus, the invention considers term usage. Another consideration is the user's area of expertise. Consider, for the moment, that two people have expertise vectors, and that their expertise vector is a term vector as well. It is a term vector that, instead of representing a document, represents a person and what they know. It could be from a profile or it could be automatically detected based on what they use.
  • the system can automatically determine a user's expertise based on what assets they use and what topics they tend to spend time on. Expertise is validated.
  • the invention looks at a person's collection. The system asks the global population what it thinks those documents are about. This is not what the user said they were about when he searched for them, which is the user's term usage, but what the global population says these documents are about.
  • a picture emerges as to what that user's expertise is (which can also be represented by a term vector).
  • a user cannot self-claim what his expertise is.
  • the population of other users ultimately determines the expertise vector.
  • the system looks at the expertise vector. For example, if a user is searching on chip design, the system looks at every user's expertise vector, and finds out who is most expert on chip design. The system selects, e.g. the top 30 experts. The system then finds those 30 experts' term vectors and activeness vectors for every document, sums them together, and then performs a comparison.
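  • Deriving an expertise vector from the global population's view of a user's library, and then picking the top experts for a term, can be sketched as below. Weighting each document's global terms by the user's own usage is an assumption.

```python
from collections import Counter
from typing import Dict, List

def expertise_vector(library: Dict[str, float],
                     global_terms: Dict[str, Counter]) -> Counter:
    """Describe a user by what the global population says their documents are about."""
    vector: Counter = Counter()
    for doc, use in library.items():
        for term, weight in global_terms.get(doc, Counter()).items():
            vector[term] += use * weight  # assumed: scale by the user's usage
    return vector

def top_experts(term: str, libraries: Dict[str, Dict[str, float]],
                global_terms: Dict[str, Counter], k: int = 30) -> List[str]:
    scores = {user: expertise_vector(lib, global_terms)[term]
              for user, lib in libraries.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [user for user in ranked if scores[user] > 0][:k]

global_terms = {"/d1": Counter(chip=3, design=2), "/d2": Counter(payroll=4)}
libraries = {"ann": {"/d1": 5.0}, "bob": {"/d2": 2.0}, "cat": {"/d1": 1.0, "/d2": 1.0}}
```

  • The selected experts' term and activeness vectors would then be summed and compared, as the preceding paragraph describes.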
  • Asset Impact Factor is a measure of how useful an asset is to a particular population, such as the global population, for the given topic. Once the impact factor is computed for assets in the entire collection, every user's library can be assessed in terms of impact factor of included assets. Using this method, users with relatively many high impact assets in their collection of used assets are considered to be experts on the given topic. Such users may also be assigned an Expert Impact Factor, reflecting the impact of that user's asset collection on a given population for a given topic.
  • the user can ask what documents are similar to this one, e.g. ask for “More like this.”
  • the system can compare this document's vector to every other document's term vector.
  • the invention is looking at the term vector, which is determined by a group as to the relevance of terms to a particular space. Therefore, there is a second measure on top of the term vector, which is the measure of relevance. It is therefore possible to say that this document is relevant in this space, not just that it has these words in common.
  • the system performs navigation tracking based on usage in a user-specific way.
  • the invention can also track where people go from one document to another.
  • a user may end up going to a particular document looking for information, but may then have to click through several documents before finding the useful documents.
  • the system can recommend that a user landing on the initial document go immediately to the useful document. That is, the invention makes an association between documents that are found to be useful, even where other documents may have been encountered by users along a navigation path.
  • the invention recommends going straight to a useful document, without having a user navigate through documents that may normally intervene. This is based on usage: for each user there is a matrix representing connections between visited documents and the most useful documents arrived at from that location.
  • a collective opinion e.g., of the user's peers, experts, or the global population, in this case regarding where are the most useful places (assets) to go to from the current location.
  • the invention keeps track of the navigation patterns of the user's peers.
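  • One way to picture the navigation matrix is as a table of counts from each visited document to the useful document eventually reached. The sketch below is an assumption-laden illustration: the class name `NavigationMatrix`, the unit increment per trail, and the tie-breaking rule are all hypothetical choices, not details given in this description.

```python
from collections import defaultdict

class NavigationMatrix:
    """Counts of (visited document -> useful document reached from it)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))

    def record_trail(self, trail, useful_doc):
        # Associate every document on the trail with the useful document
        # that the user finally arrived at, skipping intervening hops.
        for doc in trail:
            if doc != useful_doc:
                self.counts[doc][useful_doc] += 1.0

    def recommend(self, current_doc, top_n=1):
        # .get avoids creating an empty row for unseen documents.
        targets = self.counts.get(current_doc, {})
        ranked = sorted(targets.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc for doc, _ in ranked[:top_n]]

nav = NavigationMatrix()
nav.record_trail(["a", "b", "c"], "d")  # a -> b -> c -> d, d was useful
nav.record_trail(["a", "x"], "d")
nav.record_trail(["a"], "e")
```

Matrices of a user's peers could be summed with peer weights before calling `recommend`, in line with the collective-opinion idea above.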
  • identified navigation patterns can also be used to provide visualizations of user activity within an asset collection on a global or population-specific basis.
  • Such visualizations or usage maps can, for example, be very informative to site designers in understanding the effectiveness of their site and understanding user interests.
  • The invention counters this using a validation technique. For example, suppose there is a fad that the system is reinforcing: people go to a particular document all at once but do not return, as with a fad movie. Everybody goes because they hear that it is good, but in fact it is terrible and they hate it. A lot of people go to the document, but nobody comes back to it.
  • the invention adjusts the usefulness of the document by looking at the percentage of people that come back, and determines if there are enough people coming back to validate it as a legitimately useful document.
  • the system encourages that attention, in a sense, because it may be something that is new and important. But, if over time, people do not start coming back, it is going to decay away quickly. Accordingly, the invention considers both the notion of newness and the notion of validation.
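  • The interplay of newness and validation might be sketched as below. The grace period, the minimum return rate, and the linear scaling are illustrative assumptions only; the description does not specify a formula.

```python
def validated_usefulness(raw_usefulness, visitors, returning_visitors,
                         age_days, min_return_rate=0.2, grace_days=7):
    """Scale usefulness down when too few visitors come back.

    All thresholds here are hypothetical, chosen only for illustration.
    """
    if visitors == 0:
        return 0.0
    if age_days <= grace_days:
        # Newness: a new document keeps its attention for a while.
        return raw_usefulness
    return_rate = returning_visitors / visitors
    if return_rate >= min_return_rate:
        # Validated: enough people come back.
        return raw_usefulness
    # Not validated: usefulness shrinks in proportion to the shortfall.
    return raw_usefulness * (return_rate / min_return_rate)
```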
  • Besides connecting a user from document to document, the invention also uses navigation to find information that identifies what a document is about. When somebody clicks on a link, goes straight to a document, and uses it, that tells the system that the user clicked on the link expecting something and then used what he found. Whatever the link text is, it is a decent reflection of what the document is about. The system therefore uses that link text, which now also contributes to the term vector. It is as if the link is a substitute for a query. In fact, if the user has clicked on, e.g. ten links on the way to this document, the system uses the text of each link, with some weighting, because not every link has the same weight. The weight depends on how close the link is to the document. If the user clicked on one link in a document, then another link in the next document, the most recent link gets the most weight, and earlier links get less weight. Thus, in this embodiment weighting is based on a form of proximity.
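  • The proximity weighting of link text can be illustrated with a short sketch. The geometric decay factor of 0.5 and the whitespace tokenization are assumptions made for the example; the description states only that more recent links get more weight.

```python
def add_link_text(term_vector, link_texts, decay=0.5):
    """Add clicked link text to a document's term vector.

    link_texts is ordered oldest to most recent. The most recent link
    (closest to the document) gets weight 1; each earlier link gets
    `decay` times the weight of the link after it.
    """
    n = len(link_texts)
    for position, text in enumerate(link_texts):
        steps_from_document = n - 1 - position
        weight = decay ** steps_from_document
        for term in text.lower().split():
            term_vector[term] = term_vector.get(term, 0.0) + weight
    return term_vector

vec = add_link_text({}, ["Energy reports", "Oil refineries"])
```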
  • Another aspect of navigation addressed by the invention concerns where a user starts with a particular document, e.g. document 1 , and the system makes various recommendations based on this starting point.
  • the system may recommend a different set of documents.
  • The invention employs an additive model which looks at the documents that the system recommends from, e.g. document 13, looks at the documents it recommends from document 1, and weights them together. In this way the system may use the user's navigation trail (encompassing one or more navigation points) to suggest the best location to go next.
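  • A minimal sketch of such an additive model follows. It assumes per-document recommendation scores are already available and, as a hypothetical choice, weights more recent trail points more heavily; the function name and decay factor are not from the source.

```python
def recommend_from_trail(trail, recommendations, recency_decay=0.5, top_n=3):
    """Combine recommendations from every point on a navigation trail.

    trail: documents visited, oldest first.
    recommendations: doc -> {candidate doc: score}.
    """
    combined = {}
    for steps_back, doc in enumerate(reversed(trail)):
        weight = recency_decay ** steps_back
        for candidate, score in recommendations.get(doc, {}).items():
            if candidate in trail:
                continue  # do not send the user back where they have been
            combined[candidate] = combined.get(candidate, 0.0) + weight * score
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:top_n]]

recs = {
    "doc1": {"doc7": 1.0, "doc9": 0.4},
    "doc13": {"doc9": 0.5, "doc20": 0.3},
}
next_docs = recommend_from_trail(["doc1", "doc13"], recs)
```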
  • The person skilled in the art will appreciate that there are many options available for processing the system's vector information.
  • One aspect of the invention concerns determining what is a successfully used document.
  • One approach looks at how much time somebody has spent on the document.
  • Document processing time is the simplest measure of successful use because the system need only look at how much time somebody is on a document. Another consideration is that seeking time is not the same thing as processing time. When seeking, the user is scrolling around and not finding what they want, i.e. they are not processing the document.
  • the system can take the time spent on the document, subtracting out the time that the user is scrolling on the document.
  • the system can also use scrolling as an indication that the user is actually on the document.
  • The system can apply a combination of scrolling and pausing and use that to get a sense of how long the user is actually processing the document. So, if somebody scrolls, then stops and sits there for 30 seconds, then starts scrolling again, the system can make a guess that the user was reading that document for 30 seconds. Any mistake in the guess tends to wash out in the aggregate because the system rarely looks at just one person's opinion, but sums this information over a group of users.
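  • The scroll-and-pause estimate can be sketched as follows. The event format and the five-second minimum pause are assumptions for illustration; the description gives only the 30-second example.

```python
def processing_seconds(events, min_pause=5.0):
    """Estimate reading time from time-ordered (timestamp_seconds, kind) events.

    Gaps between consecutive events that last at least `min_pause` are
    treated as reading; shorter gaps are treated as scrolling or seeking.
    """
    total = 0.0
    for (t_prev, _), (t_next, _) in zip(events, events[1:]):
        gap = t_next - t_prev
        if gap >= min_pause:
            total += gap
    return total

# The user scrolls, pauses for 30 seconds, scrolls again, then leaves.
events = [(0, "open"), (1, "scroll"), (2, "scroll"), (32, "scroll"), (33, "close")]
estimate = processing_seconds(events)
```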
  • the term vector is a big vector of every possible term.
  • The system increments a term's vector entry each time the term is associated with a document through user behavior. Terms that are not associated with a document get a zero. On top of this, terms that are associated through usage with many documents get less weight in the term vector because they are very common. Thus, the system looks at all the terms that are known and counts how many documents each term is associated with. The system then applies a formula to lower a term's rating based on its association with many documents. If there is a word that is in every single document in a collection, then its rating is effectively zero. Certain words, such as “the” and “and”, are standard stop words and are removed from a search before they even get to the system analytics.
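  • The description does not give the down-weighting formula. One common choice with the stated property (a term associated with every document gets a rating of zero) is an inverse-document-frequency factor, log(N / df). The sketch below uses that formula as an assumption; the function name and stop-word list are hypothetical.

```python
import math
from collections import defaultdict

def weighted_term_vectors(associations, stop_words=frozenset({"the", "and"})):
    """Build per-document term vectors from (term, doc_id) usage events."""
    counts = defaultdict(lambda: defaultdict(int))
    for term, doc in associations:
        if term not in stop_words:      # stop words never reach the analytics
            counts[doc][term] += 1      # increment on each association
    n_docs = len(counts)
    doc_freq = defaultdict(int)
    for terms in counts.values():
        for term in terms:
            doc_freq[term] += 1
    # Assumed formula: count * log(N / df); zero when a term is in every document.
    return {doc: {term: c * math.log(n_docs / doc_freq[term])
                  for term, c in terms.items()}
            for doc, terms in counts.items()}

usage = [("oil", "d1"), ("oil", "d1"), ("report", "d1"),
         ("report", "d2"), ("java", "d2"), ("the", "d2")]
vectors = weighted_term_vectors(usage)
```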
  • the system commences with the creation of the initial term vector based on a data structure.
  • A term doc matrix is the collection of a user's term vectors for each document known to the system.
  • every user has a term doc matrix that represents for each document, what that user thinks the document is about. For example, one person thinks a document is about oil and about refineries, but not about something else.
  • the system services a particular population that is already selected, which can comprise for example peers or experts, e.g. the person's top 30 peers.
  • the system knows of these 30 users, and each of these users has a weight based on how much of a peer they are to a current user.
  • The peer with the greatest weight is the top peer, the peer with the next greatest weight is the next best peer, and so on.
  • the invention looks at all of the term doc matrices of these peers and adds them together given the weightings based on their peer scores, and produces therefrom a single term doc matrix which is this population's opinion on every document in the system.
  • the system takes this matrix and calculates a cosine between a term vector representing a query or current context and each of the rows in the matrix, which represent term vectors for each document.
  • the result of the cosine calculation represents how closely a document matches the context according to the user's peers.
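  • The peer-weighted combination and cosine scoring described above can be sketched as below, assuming each term doc matrix is a nested dictionary (document, then term, then weight). The function names and sample data are hypothetical.

```python
import math

def combine_matrices(peer_matrices, peer_weights):
    """Sum peers' term doc matrices, each scaled by that peer's weight."""
    combined = {}
    for peer, matrix in peer_matrices.items():
        w = peer_weights.get(peer, 0.0)
        for doc, vec in matrix.items():
            row = combined.setdefault(doc, {})
            for term, value in vec.items():
                row[term] = row.get(term, 0.0) + w * value
    return combined

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def score_documents(query_vec, population_matrix):
    """Cosine between the query (or context) vector and each document row."""
    return {doc: cosine(query_vec, row) for doc, row in population_matrix.items()}

peer_matrices = {
    "ann": {"d1": {"oil": 1.0}, "d2": {"java": 1.0}},
    "bob": {"d1": {"refinery": 1.0}},
}
population = combine_matrices(peer_matrices, {"ann": 1.0, "bob": 0.5})
scores = score_documents({"oil": 1.0}, population)
```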
  • the system selects the top documents.
  • One way to do this is to select the top ten documents and then sum them together to get a single vector which says, in the aggregate, that these documents are about a certain topic. Then, the system takes out those search terms that are already used and looks at where the other peaks are. Those are additional terms that the system either wants to suggest or automatically enter to the core. Now there is a new term vector, and the system goes through the same process. The system can then match this new term vector with every other document, get a new set of scores, and select the top ten of those documents. The system has a matrix that has everybody's opinion on what these documents are about, so it can compare the new term vector against that matrix and get a new set of scores for all of the documents. The system thus obtains, for each document, a score indicating how related the document is to the topic, along with another score indicating how useful the document is.
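  • The term-suggestion step (sum the top documents, remove the terms already searched, and look for the remaining peaks) might look like the following sketch. The function name, the number of suggested terms, and the sample vectors are assumptions.

```python
def expand_query(query_terms, ranked_docs, term_vectors, top_docs=10, new_terms=2):
    """Suggest extra terms from the summed vectors of the top-ranked documents."""
    summed = {}
    for doc in ranked_docs[:top_docs]:
        for term, weight in term_vectors[doc].items():
            summed[term] = summed.get(term, 0.0) + weight
    # Take out the search terms already used, then pick the highest peaks.
    candidates = [(w, t) for t, w in summed.items() if t not in query_terms]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in candidates[:new_terms]]

term_vectors = {
    "d1": {"oil": 3.0, "refinery": 2.0, "price": 1.0},
    "d2": {"oil": 1.0, "refinery": 1.0, "pipeline": 2.0},
}
suggested = expand_query({"oil"}, ["d1", "d2"], term_vectors)
```

The suggested terms can then be added to the query vector and the documents rescored, repeating the process described above.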
  • Every user has a document vector, or activeness vector, which identifies which documents that user has used.
  • Each user also has an associated weight based on the peer population. Given these two numbers for every document, i.e. how well it matches and how popular it is, the system combines the two to produce a score. One way to combine them is to calculate a weighted sum. Another way is to take the numbers as is, but anything that is below a threshold is removed.
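  • Both ways of combining the match score and the popularity (activeness) score can be sketched briefly. The 0.7 match weight and the threshold values are hypothetical; the description leaves them open.

```python
def weighted_sum(match, activeness, match_weight=0.7):
    """Combine the two per-document scores into one by a weighted sum."""
    docs = set(match) | set(activeness)
    return {d: match_weight * match.get(d, 0.0)
               + (1.0 - match_weight) * activeness.get(d, 0.0)
            for d in docs}

def threshold_filter(match, activeness, min_match=0.3, min_activeness=0.1):
    """Keep both numbers as they are, but drop documents below a threshold."""
    return {d: (match[d], activeness.get(d, 0.0))
            for d in match
            if match[d] >= min_match and activeness.get(d, 0.0) >= min_activeness}

match = {"d1": 0.9, "d2": 0.2}
activeness = {"d1": 0.5, "d2": 0.8}
combined = weighted_sum(match, activeness)
kept = threshold_filter(match, activeness)
```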
  • The system looks at such actions as the user navigating in a certain way, staying a certain amount of time on a document, and where the links lead.
  • the system collects that information to learn about the usefulness of a document.
  • the user starts by doing a search on a topic, e.g. oil.
  • the system responds by recommending certain documents.
  • The user clicks on one and prints it.
  • the system adds a number for that document connected with those words. If somebody else does a search and the term doc is involved, the matrix indicates that the document is relevant for a certain purpose according to a certain person.
  • the term match is good, and the document is recommended.
  • If the user is not a close peer, or the document does not have a good term match, or the document is not very active, then it is not recommended.
  • Another approach involves separate vectors related to peers, experts, and the global population, which are then combined with different weightings, e.g. the experts get the greatest weighting, the weighting could be profile-based, or it could be based on user responses to a series of questions.
  • Every user has a term doc matrix that captures what the user thinks every document is about, and they have an activeness vector that expresses how much the user has used these documents.
  • This activeness vector is not only used through search. It could be used through navigation and is built up based on search terms or link terms. Determining peers and experts proceeds as follows:
  • the global population has a term doc matrix which represents what the global population's opinion is about every document. This is essentially a sum of every single user's equally weighted opinions on the document. This is a global opinion.
  • For each user, look at which documents that user has used and how much the user has used them. This step involves determining the expertise of this user. For example, this user has used document one at a weight of, e.g. four, so when the system goes into document one it determines what the global population thinks this document is about. Take that, multiply it by four, and add it to the user's expertise vector.
  • The system does that for every document in this user's collection. The things the user has used the most get the most weight in terms of their expertise. Each time, the system adds what the global population thinks about the documents that the user has used. Thus, expertise is a measure of what expertise a user's collection represents. The system does not know what a user's actual expertise is. It could be somebody who has done an excellent job of collecting all the right documents on this topic, but if he has done that, in a sense, he serves the purpose of an expert. That is, if the user has all the good documents on that topic, then that collection is an expert collection.
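  • The expertise computation above (multiply the global opinion of each used document by the user's usage weight and accumulate) can be sketched as follows. The dictionary layout and sample values are assumptions for illustration.

```python
def expertise_vector(user_activeness, global_matrix):
    """Accumulate the global term vector of each used document,
    scaled by how much this user has used that document."""
    expertise = {}
    for doc, usage_weight in user_activeness.items():
        for term, value in global_matrix.get(doc, {}).items():
            expertise[term] = expertise.get(term, 0.0) + usage_weight * value
    return expertise

# Hypothetical global opinion about two documents.
global_matrix = {
    "doc1": {"oil": 1.0, "refinery": 0.5},
    "doc2": {"java": 2.0},
}
# The user has used document one at a weight of four.
expertise = expertise_vector({"doc1": 4.0, "doc2": 1.0}, global_matrix)
```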
  • the amount of weighting to give the popularity of documents is an issue. An amount of weight is given to how used a document was by this user and an amount of weight is given to how popular the document was in the population.
  • The system combines these numbers and recalculates expertise, every night for example. Thus, the system recalculates everybody's area of expertise because it might change on some basis, e.g. daily or monthly.
  • the system goes through and calculates everybody's area of expertise, and then if it is desired to figure out who the experts are, given a particular query, the system takes that query vector and compares this query vector to every user's expertise vector. Then, the system can produce the top N of experts, and that is the expert population.
  • the system does not have a query and has a document but the user wants to know who the experts are.
  • the system can use the document itself to determine who the experts are.
  • the document itself has a vector, and the system can compare the vector of this document to the expertise vector of everybody and, given the topic of this document, determine who the experts are on this topic represented by this document.
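  • Selecting the expert population works the same way whether the topic vector comes from a query or from a document. A minimal sketch, assuming cosine similarity as the comparison and hypothetical user names:

```python
import math

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def top_experts(topic_vec, expertise_by_user, n=3):
    """Return the top N users whose expertise vectors best match the topic.

    topic_vec may be a query vector or a document's term vector.
    """
    scored = [(cosine(topic_vec, vec), user)
              for user, vec in expertise_by_user.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for score, user in scored[:n] if score > 0.0]

expertise_by_user = {
    "ann": {"oil": 4.0, "refinery": 2.0},
    "bob": {"java": 5.0},
    "cy": {"oil": 1.0, "java": 1.0},
}
by_query = top_experts({"oil": 1.0}, expertise_by_user, n=2)
by_document = top_experts({"java": 1.0}, expertise_by_user, n=2)
```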
  • Peers
  • Every user has a term doc matrix and an activeness vector. There are three things that the system can look at and combine to determine peer-hood. One is to compare what the peer value is for, e.g. two users. The system makes this determination for everybody, but for now focus on two users. Look at one user's activeness vector and another user's activeness vector, and look at how similar they are. Two people that use similar documents a similar amount are similar, looking in the same places and at relatively proportional amounts. In this case, there is a similarity metric between one user's activeness vector and another user's activeness vector. Another way of determining peers is to look at what topics they are interested in.
  • the system compares interest vectors.
  • the system can compare the computed expertise vectors of each user to determine peers. Alternatively, the system could employ combinations of these approaches. Because the user has a particular subject in mind and the user has a certain number, the peers are weighted according to how closely they match that particular person. Some peers may have a closer number to the user's number. Some are going to have one that is smaller or larger, depending on what notation the system is using. In the end, there is a number that indicates how much like this person the user is.
  • The user might want a peer group of 30. Number 30 in the group then might have a smaller weight, number 1 in the group might have a greater weight, and everybody in between has a weight in between.
  • the system could also have a threshold that does not create any peers less than the threshold.
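  • Peer selection by activeness similarity, with a group size and a threshold, can be sketched as follows. Cosine similarity stands in for the unspecified similarity metric, and the names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def peer_group(user, activeness_by_user, size=30, min_score=0.0):
    """Return up to `size` (peer, weight) pairs, best peer first.

    Users whose similarity does not exceed `min_score` are not peers.
    """
    me = activeness_by_user[user]
    scored = [(cosine(me, vec), other)
              for other, vec in activeness_by_user.items() if other != user]
    scored = [(s, o) for s, o in scored if s > min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scored[:size]]

activeness_by_user = {
    "dave": {"d1": 4.0, "d2": 1.0},
    "sam": {"d1": 2.0, "d2": 1.0},
    "tina": {"d3": 5.0},
}
peers = peer_group("dave", activeness_by_user)
```

The same function could be applied to interest vectors or expertise vectors, or the resulting scores could be combined, as the text suggests.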
  • FIG. 12 is a flow diagram showing an augmented search according to the invention.
  • a search request is made by a client of customer libraries.
  • the search is sent to the search server and the extension makes a request for augmented information, for example from Google.
  • The augmented results are returned to the server, and the results received are added to the server information, which is then sent back to the search server in search server format.
  • the customer then receives the rendered HTML of the search.
  • recency bias is accomplished through decaying past usage patterns based on a time decay function.
  • Activeness vectors, for example, might decay at a rate of 0.01% per day, such that activity that occurred in the past has less of an influence on the activeness vector than more recent usage.
  • term vectors can be set to decay at a specified rate, such that term to asset associations are biased toward more recent usage patterns. In this way, the usefulness of assets can be computed in a time-sensitive manner.
  • Assets that were useful in the past are not necessarily useful in the present. All information stored within the system, including peer scores and expertise scores, can be set to time decay in a similar fashion. Regarding activeness vectors, assets that are very new or newly rediscovered may need a boost above and beyond recency bias to enable their discovery by the population prior to strong usage patterns having had an opportunity to emerge. Thus, very new assets, defined as those assets whose very recent activity makes up a large proportion of their total activity over all time, may be given an additional newness bias. It is also possible for an administrator to assign a newness bias explicitly to certain assets or a collection of assets. This newness bias makes very new assets appear more active than they are in reality for a short period of time. It is also possible to identify periodic usage of assets and give activeness biases to assets as they reemerge at specific times of year, for example.
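  • A daily decay of 0.01% corresponds to multiplying stored values by 0.9999 per day. The sketch below shows that decay together with a simple newness bias; the exponential form, the 80% recent-activity share, and the 1.5 boost are illustrative assumptions.

```python
def decay(vector, days, daily_rate=0.0001):
    """Apply time decay; 0.0001 per day is the 0.01% example rate."""
    factor = (1.0 - daily_rate) ** days
    return {key: value * factor for key, value in vector.items()}

def newness_bias(recent_activity, total_activity, min_share=0.8, boost=1.5):
    """Boost assets whose very recent activity dominates their total activity.

    min_share and boost are hypothetical values.
    """
    if total_activity <= 0:
        return 1.0
    share = recent_activity / total_activity
    return boost if share >= min_share else 1.0

activeness = {"d1": 100.0}
after_one_day = decay(activeness, 1)
```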
  • This aspect of the invention relates to relationships amongst terms and amongst terms and phrases that the system infers based on captured usage data.
  • a term affinity (similarity) matrix can be constructed that relates terms and phrases to one another. Terms and phrases with high affinities for one another are considered to be aspects of a single topic and may even be synonyms for one another.
  • the term affinity matrix can be constructed based on the frequency of co-occurrence of terms in users' queries or used links, or by the frequency of co-occurrence of terms in assets' term vectors, for example. This matrix, in combination with linguistic characteristics of the terms and phrases themselves, can be used to identify synonyms, acronyms, and atomic phrases automatically.
  • Atomic phrases are ordered sets of two or more words whose frequent occurrence together indicates that they should be considered a single multi-word phrase rather than multiple independent words.
  • the term affinity matrix in combination with navigational usage patterns and assets' term vectors can even be used to detect terms and phrases that are sub-topics of other terms and phrases. Because all such identified relationships between terms/phrases and automatic detection of synonyms, acronyms, and atomic phrases are based on usage by a community of users, identified relationships are inherently tailored to a specific community.
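  • A term affinity matrix built from co-occurrence in users' queries can be sketched as below. Raw pair counts are used as the affinity for simplicity; any normalization, and the use of link text or asset term vectors as additional sources, is left out of this hypothetical example.

```python
from collections import defaultdict
from itertools import combinations

def term_affinity(queries):
    """Count how often each pair of terms occurs in the same query.

    Keys are alphabetically ordered (term_a, term_b) pairs.
    """
    affinity = defaultdict(float)
    for query in queries:
        terms = sorted(set(query.lower().split()))
        for a, b in combinations(terms, 2):
            affinity[(a, b)] += 1.0
    return dict(affinity)

affinity = term_affinity([
    "oil refinery",
    "oil refinery capacity",
    "java tutorial",
])
```

Pairs with high counts, such as ("oil", "refinery") here, are candidates for atomic phrases or closely related terms within the community whose queries were captured.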
  • the invention is also useful in marketing lead generation for business Web sites, sales and channel partner extranets, customer support sites, vertical healthcare applications such as physician portals and patient research sites; vertical government applications such as citizen portals; and financial services and insurance vertical applications such as agent and advisor portals.

Abstract

The invention comprises a set of complementary techniques that dramatically improve enterprise search and navigation results. The core of the invention is an expertise or knowledge index, called UseRank, that tracks the behavior of website visitors. The expertise index is designed to focus on the four key discoveries of enterprise attributes: Subject Authority, Work Patterns, Content Freshness, and Group Know-how. The invention produces useful, timely, cross-application, expertise-based search and navigation results. In contrast, traditional Information Retrieval technologies such as inverted index, NLP, or taxonomy tackle the same problem with a set of attributes opposite to what the enterprise needs: Content Population, Word Patterns, Content Existence, and Statistical Trends. Overall, the invention encompasses Baynote Search, an enhancement over existing IR searches; Baynote Guide, a set of community-driven navigations; and Baynote Insights, aggregated views of visitor interests, trends, and content gaps.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This is a continuation of U.S. provisional patent application Ser. No. 11/319,928, filed Dec. 27, 2005 which claims priority of U.S. provisional application Ser. No. 60/640,872 filed Dec. 29, 2004, all of which are herein incorporated in their entirety by this reference thereto.
  • BACKGROUND OF THE INVENTION
  • 1. Technical Field
  • The invention relates to electronic access to information. More particularly, the invention relates to a method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge.
  • 2. Description of the Prior Art
  • For years enterprises have struggled with ineffective search techniques. Compared to what is available on the public Web via services such as Google and Yahoo, there remains a dearth of highly relevant search solutions for content within the enterprise. With dozens to hundreds of independent application and repository information silos in each company, finding critical business information costs business hundreds of billions of dollars each year [source: A. T. Kearney]. Today, CIOs and business executives are revisiting enterprise search as one of the top business/IT challenges for the next few years.
  • Enterprise search needs are poorly served. Various search technologies have been developed to attack the challenge of searching the Web, searching individual users' computers (PC/desktop), and searching internal business documents (enterprise). Each of these approaches is unique, but none provides an adequate solution for the enterprise.
  • PC (Desktop) Search
  • PC or desktop search can be compared with finding stuff in your messy garage. You know you have it somewhere but just cannot find it. So to locate is the only goal. And when you do find it, you are the sole judge to decide if you have indeed found the right content or document because you collected or wrote the content in the first place. You are the only expert and authority that matters.
  • Traditional PC search from Microsoft is based on parsing a file at the time of search. It is slow and can only find things in isolated places, such as file folders or email directories. The latest PC search introduces inverted index technology from Google, soon to be available also from Yahoo, Ask Jeeves, and Microsoft. These products start to solve the speed and silo problems so that users can find information across personal file systems, Outlook or other email systems, calendars, and other desktop environments.
  • Web Search
  • At the other end of the search spectrum is Web search. There, the story is more like driving in Boston for the first time. You are not necessarily an expert on the topics you are looking for, and you are learning a new subject. Sometimes, you search to find new services such as weather, travel, or shopping. With Web search, you are counting on millions of people on the Web to help you, and you do not necessarily know or care who is the real expert or authority. As a result, you sometimes get bad advice or may shop in the wrong places.
  • Web search before Google relied only on technologies such as inverted indexes, natural language processing (NLP), and database indexes. These were adequate, but not as good as they could be if they counted the number of links that point at a page. As more sites link to your page, your page becomes more important, simply because the webmasters behind those sites have gone to the trouble of adding those extra links to your page. Hence, the birth of page-ranking[tm] and the success of Google's business.
  • Enterprise Search
  • The enterprise, however, does not behave like the PC or the Web environment. Imagine you are looking for books to learn Java programming. You know your ultimate goal, but there are hundreds of books about Java. Which one should you read? It has to be exactly right. So a discovery process finds the right reference content, or information known by other experts in the company. The ultimate judge of good search results for an enterprise extends beyond just yourself. These arbiters of good results could be your peers or the experts that you depend on to do your job.
  • An example of the problem with enterprise search is shown in FIG. 1, which is a flow diagram showing the state of the art in enterprise search. In the example of FIG. 1, Dave is searching for particular information and retrieves 2,800 documents. Dave finds no useful result in the top ten results returned, so Dave calls Sam. Sam, in turn, searches and, finding nothing, e-mails marketing. Mark and Tina in marketing search and find nothing as well. Mark calls Eric, Nancy, and Ganesh and the answer is found in Ganesh's design document. Tina calls Eric, Nancy, and Ganesh again and everybody is now upset. Clearly, it would have been more useful for Dave if he had found Ganesh's design document in his initial search. In fact, the document may have been there among the 2,800 documents located, but it was not possible for Dave to identify the most useful document.
  • Traditional enterprise search technology uses inverted index, NLP, and database index approaches (see FIG. 2). The major problem is that the current engine throws hundreds to thousands of search results per query back to the user. Anything that looks like Java or programming is all mixed together for you to see. Much like email spam, search engines spam the user with numerous, out-of-date, irrelevant, unofficial, siloed, contradictory, and unauthorized results. Users give up quickly and resort to much more expensive ways to get the information, including calling, emailing, chatting, or worse, starting to recreate, make up, or give up on the information that already exists.
  • Enterprise Search Exhibits a Unique Set of Characteristics
  • By comparing the key issues in enterprise search with those of Web or PC search, it can be concluded that enterprise search is unique and in direct contrast to Web search. In fact, what works for Web search does not and will not work for enterprise search, and vice versa. Five key attributes are considered in this regard: search guide, user behavior, freshness and credibility of the content, user homogeneity, and privacy concerns.
  • Primary Guide
  • On the Web, for example, Google's success has depended on page ranking as the primary guide. While page ranking has been effective in providing some sanity on the Web, the same effect will not happen for enterprise content search. Firstly, enterprise content lacks the large number of links needed to provide the page ranking guiding effect, nor are there incentives for enterprises to create these links on a sustainable basis. Secondly, the real goal of page ranking is to find the traces of human effort that indicate subject authority indirectly, because it is next to impossible to find the real experts in the vast universe of the Web. In an enterprise, you should not need to guess indirectly who the experts might be. You know who the trusted experts are, you hire them, and they work day in and day out in the company as specialists in their domain areas. Enterprise search should rely on them as subject authorities for relevant guidance and ranking.
  • User Behavior
  • User behavior is completely different between the enterprise and the Web. We as individuals on the Web have more faces than we might know. We could be men, fathers, sons, husbands, brothers, golfers, travelers, rock musicians, investors, and hundreds of other profiles all at the same time. When we search on the Web, the search tends to be one-off and all over the place. Also, the keywords we type in tend to be the search goals themselves. When we type in “weather,” we are looking for weather information. User feedback on the Web is not reliable because only a very small group of loud users have the time to give feedback, and they therefore skew the search results with their non-representative bias.
  • Enterprise search, however, tends to repeat itself quickly based on the user's role and the situations he is in. When one sales person is looking for some sales collateral, other sales people responsible for the same products in the same region are very likely in need of the same information. Equally important is the fact that a person who may have 300 roles and profiles in his personal life has a much smaller number of work roles, e.g. a half dozen at most. He might be an engineer working in the Paris office while also being a member of the cross-functional cultural committee. It is also important to note that the keywords in enterprise searches are more like hints, even fishing bait, for the documents a person is looking for. It is thought that eighty percent of people seek information they have seen before. Given this predictability of enterprise users, we can safely rely on self-motivated actions and behaviors to collect unbiased feedback.
  • Freshness and Credibility
  • Web search rewards or ranks older content higher. The longer the content has been sitting there, the more likely it will be found because it has time for others to discover and link to this piece of content.
  • Enterprises want to behave differently. Fresh content reflects new business situations and, therefore, must be ranked higher so that more people see it. By responding to fresh content quickly, business agility is assured. A piece of content that is one week old may be better than one that is a year old, except that it is not good at all if today's content is available and shows something different than the one week old content. Enterprise search users do not want good enough content, they require the search result to be exactly right.
  • Homogeneity
  • The Web or consumer world is very heterogeneous, while an enterprise is the opposite: homogeneous, or more precisely, segmented homogeneous. This means that different departments or groups (sales vs. marketing vs. engineering) in a company might be different (segmented), but within a group, people are very similar or homogeneous in the way they work, regardless of how different their profiles are.
  • The implication of this split in attributes is profound. In a large heterogeneous world with millions of people involved, statistics is the only known technique for understanding what people like, want, etc. Web search correctly relies on statistics to find not-so-precise information for its users. The enterprise again is different. With small sample populations and homogeneous groups, statistics do not work. To understand these groups, you need to know their likes and dislikes. No predictions, just awareness.
  • With this understanding of enterprise characteristics, it is seen that enterprise search needs to focus on subject authorities, repeated role-based work patterns, fresh and official content, and group know-how (a group's collective knowledge and expertise to do a job). Re-examining traditional IR-based (information retrieval) search, we realize that it focuses on the opposite. It relies on the whole content population (crawl and index it) instead of subject authorities, word or linguistic patterns instead of work patterns, older existing content instead of fresh or official content, statistical trends to predict instead of group similarity to know. There is thus a need for techniques that focus on the correct key characteristics of the enterprises.
  • The problem with enterprise search technology has become acute to many CIOs and business executives. In the inventors' own limited surveys of a dozen CIOs and business executives, people ranked the priority of the enterprise search problem as 9-10 out of 10. The challenge of traditional full-text engines is poor relevancy. They are good for everything (all content) and good for nothing (irrelevant results) at the same time. NLP technology achieves better relevancy by focusing on one application and one domain where human language becomes more deterministic. The problem with NLP is that the solution is placed in a silo and is good only within that specific application, while enterprises operate hundreds to thousands of applications. It is not possible for employees to log on to this many systems one by one to look for information. Both classes of solutions also suffer from an inability to adapt to changes once deployed. Taxonomies and structures change quickly over time in enterprises.
  • Current search software also suffers from a traditional enterprise model with an inherited, expensive product architecture, design, and marketing and sales model. A typical enterprise search deployment costs $500K to several million dollars after considering software licenses, services, training, and other related costs.
  • It would therefore be advantageous to transform how enterprise search technologies are bought and deployed with an improvement on cost and quality of search.
  • SUMMARY OF THE INVENTION
  • The invention addresses the above limitations of state of the art enterprise search by leveraging what should be depended on for enterprise search: one's peers and experts in and out of the company. The invention provides systems that identify, extract, analyze, and use the expertise ranking to produce personalized, precise search results to the user so that they do not have to call, email, etc.
  • The inventors have discovered a set of unique approaches to enterprise search that is different from all existing IR (information retrieval) based solutions, such as Verity, Autonomy, FAST, Endeca, and Google Appliance. The inventors carefully analyzed the characteristics of enterprises in contrast to the Web search environment, and applied a set of methodologies in related disciplines from technology development, academic research, and social behavior. The invention provides a technique that can work standalone or embed itself in other applications via a plug-n-play interface with minimum effort. The result is a huge improvement in search usefulness, relevancy, search federation across applications, and cost savings. The preferred embodiment of the invention also leverages traditional search technologies.
  • The invention provides relevant information discovery by taking a completely opposite approach to that of traditional search theories and technologies. As discussed above, traditional content search technology and products use content as the basis for guiding searches. They employ techniques such as information retrieval (IR) algorithms, natural language processing (NLP) techniques and rules, product or structural taxonomy, or page ranking by link count. Traditional data search relies on building database indexes on key words or numbers in database rows or columns. Such systems crawl and index the content and data, and generate inverted full-text indexes or database indexes with word tokens or phrases, potentially assisted by taxonomy and page ranking for improving search results. The search results using traditional search technology are poor, with large amounts of low-relevancy hits. For many business processes, when a search fails, users have to resort to alternative, expensive ways of acquiring information that either take a significant amount of time for the user or, worse yet, involve others to help find the information (see FIG. 1).
  • Instead of using content as the starting point for information discovery, the invention provides a system that starts with the people in and around enterprises. After all, enterprises are made of specialists and experts possessing expertise and know-how. They conduct work and repeat their work patterns frequently on a role-by-role basis. The system detects and captures the expertise and work patterns stored in people's brains and exhibited in their daily behavior, and creates a behavior-based knowledge index. The knowledge index is then, in turn, used to produce expert-guided, personalized information. This process is transparent to the experts themselves, and therefore efficient and extremely economical to employ.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow diagram showing the current state of the art in enterprise search;
  • FIG. 2 is a flow diagram showing traditional IR-based search models;
  • FIG. 3 is a block schematic diagram showing system architecture according to the invention;
  • FIG. 4 is a flow diagram showing the capture of behavioral relevancy by an embedded application according to the invention;
  • FIG. 5 is a screen shot showing an inline user interface according to the invention;
  • FIG. 6 is a screen shot showing an inline user interface rendered using JavaScript tags according to the invention;
  • FIG. 7 is a screen shot showing a popup user interface according to the invention;
  • FIG. 8 is a block schematic diagram showing expert-guided personalized search across applications according to the invention;
  • FIG. 9 is a screen shot showing a user library according to the invention;
  • FIG. 10 is a second screen shot showing a user library according to the invention;
  • FIG. 11 is a third screen shot showing a user library according to the invention;
  • FIG. 12 is a flow diagram showing a document recommendation according to the invention; and
  • FIG. 13 is a flow diagram showing an augmented search according to the invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Using Expertise and Behavior for Information Discovery
  • The invention comprises a set of complementary techniques that dramatically improve enterprise search and navigation results. The core of the invention is an expertise or knowledge index, also referred to as an expertise repository, that makes observations of website and web application visitors. The expertise-index is designed to focus on the four key discoveries of enterprise search: Subject Authority, Work Patterns, Content Freshness, and Group Know-how. The invention produces relevant, timely, cross-application, expertise-based search results. In contrast, traditional Information Retrieval technologies such as inverted index, NLP, or taxonomy tackle the same problem with an opposite set of attributes than what the enterprise needs: Content Population, Word Patterns, Content Existence, and Statistical Trends.
  • A further embodiment of the invention makes the novel technology work transparently within an existing enterprise application and repository environment so that no user training or adoption of new interfaces is required. It also supports all legacy full-text or NLP search technologies, such as Verity, Autonomy, Endeca, and the Google Appliance. In fact, it works on top of those technologies and uses their base result as a foundation for refinement.
  • A third embodiment of the invention comes from leveraging open source technology, such as Lucene, for building a scalable network query engine that binds all dimensions of the information source and indexes into one set of meaningful results.
  • Embeds in Application UI to Capture Behavioral Relevancy
  • The invention embeds itself in any existing Web applications such as www, CRM, ERP, and portals etc. via a simple change to the search results interface. For non-Web applications, similar work can be done by inserting SOA (service-oriented architecture) stub code so that search traffic can be inspected and re-ranked by an expertise index. Further, any web page, not just search results pages, can be configured with the invention to provide active guidance to the user without requiring the user to enter a query.
  • Reliance on Self-Interest
  • The invention does not require users to explicitly vote, provide feedback, or utilize other mechanisms that are commonly used in collaborative filtering. It relies on people doing their normal jobs selfishly and leaving a trail of evidence of what they need and prefer to get their job done. The reliance on selfishness is fundamentally different from, and provides far more reliable guidance than, traditional collaborative filtering, where users are instructed to vote for the benefit of other people. When users are asked to give feedback, most people do not do it because they lack time or it is not a priority. When forced, people check boxes quickly without thinking and, therefore, mislead people who use the data. There is a small group of people who do like to fill out surveys and give feedback, but often they are the vocal, critical, and least representative samples of the user population. Both Amazon and eBay have had negative experiences in using traditional collaborative filtering techniques to accomplish ranking by similarity.
  • Implicit Relevance Actions
  • The invention allows application-by-application configuration of user action tracking. Implicit action buttons (discussed below) are embedded as part of the search results to capture critical cues of user intention and preferences as the users do their job. For example, a common portal may give users “view,” “download,” “print,” and “email” buttons as the actions to reflect their intention when discovering relevant content. “View” might be a weaker indication, while the others are strong indications of preference. The invention develops additional implicit observations that predict visitor intentions with strong confidence. These observations include the ability to detect think time, virtual bookmarks, virtual print and virtual email. In all cases, visitors have not performed a bookmark, print, or email against the content, but they keep the content up on the computer screen for a long time, i.e. long enough to use the content as a reference for work. These observations are cross checked among peers and experts before they truly become useful for the community.
  • Explicit Relevance Actions
  • At least two additional explicit buttons can be added to track clear cues of user behavior. “Save to library” indicates a strong, explicit endorsement of content given the query, while “remove” or “demote” indicates strong dislike of the content given a query and a role. The library is virtual and does not physically live on a browser or even on the PC. It is the main user behavior tracking object or a journal. Again, explicit relevance ranks higher than implicit relevance, but both are managed under one per-user library object.
  • Search Spam Control
  • People are familiar with email spam. Current search exhibits behavior similar to spam: thousands, if not tens of thousands, of results are returned in response to a simple query, and many of the results are totally irrelevant to what the users are looking for. Through the use of the “remove” action, and after considering the role an employee is playing, the inventive system can identify and demote results that are less relevant or irrelevant to a group of users of the same role, in a manner that is analogous to the spam email reporting scheme. For example, if three engineers remove a document, other similar engineers should not see this document highly ranked, while sales employees may still give this document a high ranking.
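  • The role-scoped “remove” behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `RemovalTracker`, `record_removal`, and `rerank`, the threshold of three removals, and the 0.5 penalty factor are all hypothetical.

```python
from collections import defaultdict


class RemovalTracker:
    """Counts "remove" actions per (role, document) and demotes accordingly.

    Hypothetical sketch: threshold and penalty are illustrative assumptions.
    """

    def __init__(self, threshold=3, penalty=0.5):
        self.threshold = threshold
        self.penalty = penalty
        # (role, doc_id) -> set of user ids who removed the document
        self._removals = defaultdict(set)

    def record_removal(self, user_id, role, doc_id):
        self._removals[(role, doc_id)].add(user_id)

    def rerank(self, role, scored_docs):
        """scored_docs is a list of (doc_id, score); returns a re-sorted list."""
        adjusted = []
        for doc_id, score in scored_docs:
            removers = self._removals.get((role, doc_id), ())
            if len(removers) >= self.threshold:
                score *= self.penalty  # demote only for this role
            adjusted.append((doc_id, score))
        return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


tracker = RemovalTracker()
for user in ("eng1", "eng2", "eng3"):
    tracker.record_removal(user, "engineer", "doc-a")

baseline = [("doc-a", 0.9), ("doc-b", 0.6)]
engineer_view = tracker.rerank("engineer", baseline)  # doc-a is demoted
sales_view = tracker.rerank("sales", baseline)        # ranking unchanged
```

  • Keying removals by role rather than globally is what lets the same document stay highly ranked for sales employees while dropping for engineers.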
  • Expert-Guided, Personalized Search Across Applications
  • Consider the invention in the context of the entire enterprise. Although user actions are done through business application interfaces on their PCs, a behavioral journal called “my library” is stored and maintained in the system server. Neither the PC nor the application involved needs to be concerned about the journal.
  • My Library/Behavioral Journal
  • This is a per-user object. It is generally invisible to the users unless power users or applications want to use it directly. The data in the library can be mined or learned before the system goes live. It continues to improve itself, adjust, and adapt to the real business usage of the content and their queries. The library stores user profiles and attributes, all queries, relevant content URIs, one or more indexes for all relevant content and data, query caching, content and data caching, access time, personalized ranking formula, proximity hashing, and a loading ratio control for the privacy policy considered.
  • A desktop version of My Library can be added to provide content caching, content push/update/alert, and disconnected content access.
  • Domain Expertise Networks
  • One aspect of the invention concerns examining multiple personal libraries or behavioral journals. When enterprises start to analyze many people's journals from peer and expert dimensions, great insights on information consumption and employee productivity emerge.
  • Peers are defined herein as a group of users with common interests, such as products, topic sets, events, job roles, locations etc.
  • Experts are defined herein as a group of visitors whose knowledge and skill sets differ from those of the person querying or browsing, but on whom that person depends to do his job effectively.
  • For example, an engineer has a peer group of other engineers, and also has an expert group made up of product managers, sales people, some customers, HR staff, and office assistants. Peers and Experts can change when the context changes. An engineer, John, may play roles beyond the organization he is in. He could be a cross-functional committee member, and physically work in the London office. So John is part of three contexts. Each such context is referred to herein as a Domain Expertise Network or DEN. An employee may belong to several DENs. Various types of DENs are discussed in greater detail below.
  • Architecture
  • Expertise Index System, also referred to as an Expertise and Behavioral Repository: This element is key to the invention. It is a server-based system with a service-oriented architecture (SOA) using Web services, XML, J2EE, and other foundational technologies.
  • The following components are key parts of the invention:
  • Behavioral Instrumentation: Also referred to as a Work Monitor, this element is responsible for implanting and recording user behavior on various business applications. The application search form is one of many observation posts that the invention implements. Browser and application navigation, file upload, Web plug-in, page-tags, email server and client integration, content management, document management, records management, and collaboration systems are all common places for instrumentation. The invention also goes back in time, and parses common log files, such as Web server logs, query logs, directory files, e.g. LDAP, to build and extract historical or base level expertise.
  • Real-time Behavioral Journals: This element is a per user object described above.
  • Domain Expertise Networks: This element is the work relationship object, described above, that connects personal, peer, and expert associations, and that records repeated role-based enterprise work patterns.
  • Non-Uniformed Network Index (NUNI): This element provides the most relevant, timely and authoritative search results. This is discussed in detail below.
  • Contextual Mapping and Dynamic Navigation: With the help of personal, peer, and expert journals, the NUNI index can not only produce good search results, but also provide additional contextual information that the users are not directly asking via their search queries or keywords. The contextual results can be presented back to the users in a search result sidebar, or as part of personalized, dynamic navigation. Dynamic navigation is discussed in greater detail below.
  • Productivity Reports and Solutions: This element generates various reports based on the behavioral journals and NUNI index.
  • In summary, the Expertise Index System focuses on enterprise Subject Authorities, Work Patterns, Content Freshness, and Group Know-how to deliver expert-guided, personalized information.
  • Technical Overview
  • FIG. 3 is a block schematic diagram showing the system architecture of a preferred embodiment of the invention. A more detailed discussion of various aspects of the architecture is provided below. The architecture consists of a server farm 20, a customer enterprise 22, and a user browser 21. The user browser is instrumented with an extension 23 and accesses both customer servers 25 at the customer enterprise 22 and the server farm 20 via a load balancer 27. Communication with the server farm is currently effected using the HTTPS protocol. User access to the customer server is in accordance with the enterprise protocols. The browser extension 23 is discussed in greater detail below.
  • Of note in connection with the invention is the provision of a failsafe. The extension 23, as well as the enterprise extension 24, are constructed such that, if the server farm 20 does not respond in a successful fashion, the extension is shut down and the enterprise and browser interact in a normal manner. The features of the invention are only provided in the event that the server is active and performing its operations correctly. Therefore, failure of the server does not in any way impair operation of the enterprise for users of the enterprise.
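  • The failsafe behavior can be illustrated with a short sketch. It assumes a hypothetical `enhance` callable standing in for the round trip to the server farm; the class name `FailsafeExtension` and the choice to stay disabled after the first failure are illustrative assumptions, because the text does not say whether or when the extension is re-enabled.

```python
class FailsafeExtension:
    """Wraps a call to the server farm so that failures never block the user.

    Hypothetical sketch: `enhance` stands in for the request to the server
    farm and is not an API named in the source text.
    """

    def __init__(self, enhance):
        self._enhance = enhance
        self.enabled = True

    def process(self, query, baseline_results):
        if not self.enabled:
            # Extension is shut down: behave as if it were not installed.
            return baseline_results
        try:
            return self._enhance(query, baseline_results)
        except Exception:
            # Server did not respond successfully: shut down the extension
            # and fall back to the normal enterprise behavior.
            self.enabled = False
            return baseline_results


def unreachable_server(query, results):
    raise ConnectionError("server farm unreachable")


extension = FailsafeExtension(unreachable_server)
normal_results = ["doc-1", "doc-2"]
shown = extension.process("sales preso on bonds", normal_results)
```

  • In this sketch the user still sees `normal_results` unchanged, which mirrors the statement that failure of the server does not impair operation of the enterprise.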
  • As discussed above, an extension 24 is also provided for the enterprise which communicates with the load balancer 27 at the server farm 20 via the HTTPS protocol.
  • The enterprise also includes a helper 26 which communicates with the server farm via an agency 31 using the HTTPS protocol. The agency retrieves log information from the enterprise and provides it to log analyzers 28, which produce a result that is presented to the usage repository 29. Information is exchanged between the affinity engine 32 and the browser and enterprise via various dispatchers 30. The browser itself provides observations to the server and receives displays in response to search queries therefrom. These observations and displays are discussed in greater detail below.
  • A key feature of the invention is the affinity engine 32 which comprises a plurality of processors 33/34 and a configuration administration facility 35. During operation of the invention, a form of information, also referred to as wisdom, is collected in a wisdom database 36. The operation of the affinity engine is discussed in greater detail below.
  • The inventive system is typically installed as an enhancement to an existing search system based on conventional engines provided by vendors, such as Verity, Autonomy, Google, etc. This content and data search system based on conventional technology is referred to as the existing search mechanism.
  • The inventive system is implemented as a wrapper for an existing search mechanism. When a user issues a search query, the query is handled initially by the system. The system, in turn, typically forwards the query to the existing search mechanism. It may also perform one or more searches or related operations against its own internal indexes and databases. Once the results from the various searches have been obtained, they are merged together into a single set of results. The actual presentation of these results is at the discretion of the customer, who may either take the raw results data from the system and present them using a JSP, CGI, or similar mechanism, or else use the default search results page provided with the system, possibly customized using cascading style sheets or other similar techniques.
  • Each document in the results is generally presented along with a variety of possible actions for the user to take on the document. The available actions are site-configurable, and can include, for example, “think,” “view,” “download,” “email,” or “print.” The system is informed when a user selects one of these actions for a particular document. These data are then used to infer the relevance of a particular document with respect to the query that yielded it. Thus, if a user selects the “view” action for a document, the system might infer that the document has certain actual value to the user for that query, while if the user selects a more permanent action such as “print” or “download,” the system might infer that the document is highly relevant to the user. The system can also detect a virtual print or download, which approximates a physical print, download, or bookmark. The technique relies on detecting user activity in the browser over a certain amount of time, e.g. over one minute, during which a document remains open, i.e. a long dwell. On the other hand, if a user does not perform any action at all against the results from a query, the system might infer that the results were irrelevant to the user. These data are retained and used to influence the results of future queries by the user and to generate quality metrics.
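  • One way to picture this inference is the following sketch. The text states only that “view” is a weaker signal than “print” or “download” and that a long dwell, e.g. over one minute, can be treated as a virtual print or download; the numeric weights, the name `infer_relevance`, and the rule of taking the strongest observed signal are assumptions made for illustration.

```python
# Hypothetical weights: only their relative order is suggested by the text.
ACTION_WEIGHTS = {"view": 0.3, "email": 0.8, "print": 1.0, "download": 1.0}

# "e.g. over one minute" in the text; the exact threshold is configurable.
LONG_DWELL_SECONDS = 60


def infer_relevance(actions, dwell_seconds=0.0):
    """Estimate the relevance, in [0, 1], of one document for one query.

    actions: iterable of action names the user took on the document.
    dwell_seconds: how long the document stayed open in the browser.
    """
    # Use the strongest explicit signal; no action at all yields 0.0.
    score = max((ACTION_WEIGHTS.get(a, 0.0) for a in actions), default=0.0)
    if dwell_seconds > LONG_DWELL_SECONDS:
        # A long dwell is treated as a "virtual" print or download.
        score = max(score, ACTION_WEIGHTS["print"])
    return score


no_action = infer_relevance([])                       # irrelevant
viewed = infer_relevance(["view"])                    # some value
long_dwell = infer_relevance(["view"], dwell_seconds=90)  # virtual print
```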
  • Libraries
  • The system maintains a library of content reference and/or use for each user. The library is also called the behavioral journal. This library is similar in some sense to bookmarks in a Web browser, though it is not necessarily visible to the user. Indeed, the user may not even be aware of its presence. Depending on how the system has been configured, a document name and its location may be added to a user's library automatically when certain actions for a document are selected from the search results. A document could also be added to a user's library explicitly with an optional “add to library” action from the search results. The presence of a document reference in a user's library generally indicates that the document is of particular interest to the user. Thus, if the results of a query produce a document that also appears in the user's library, its ranking is typically improved.
  • In some configurations, it is possible to add a document to a user's library directly, without first encountering it in search results. Such a document need not be indexed by, or even accessible to, the existing search mechanism. However, because it is present in the user's library, it can still be merged into the final search results if it matches a query, and it is therefore available in the results produced by the system. Content discovered in this manner is typically quite valuable and so is usually given particular preference in the result rankings.
  • Relations
  • People in businesses relate to each other in a number of different ways. For example, there are relationships between peers in a group, between superiors and subordinates, or between subject matter experts and seekers. When these different kinds of relationships are modeled and observed, they reveal insights that can be used to influence and refine search results. For example, if several members of a group of peers all find a particular document to be helpful, then there is good chance that other members of that same group would find the document helpful as well because members of a peer group typically have similar interests. Similarly, if someone is seeking information about a particular subject, then documents that a known expert in that subject found useful would probably be valuable to the seeker as well.
  • The system maintains one or more named relations for each user to represent these kinds of relationships between one user (the subject) and other users (the related users) in the system. A relation is formally the set of users that have a particular relationship with the subject. A relationship can be two-way or one-way. A two-way relationship applies equally in both directions between the subject and the related user. Thus, if user A has a two-way relationship with user B, then user B has the same kind of relationship with user A. An example of this might be a peer relationship, which could describe two users who are in the same organizational department or who have similar job descriptions: if user A is a peer of user B, then it is also the case that user B is a peer of user A. On the other hand, a one-way relationship is directed: if user A has a one-way relationship with user B, it is not necessarily true that user B has that same kind of relationship with user A. An example of this might be a superior-subordinate relationship: if user A is a subordinate of user B, then it is not the case that user B is a subordinate of user A.
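  • The distinction between two-way and one-way relationships can be made concrete with a small data structure. The sketch below is illustrative only; the class name `Relations`, its methods, and the relation names used in the example are assumptions.

```python
from collections import defaultdict


class Relations:
    """Named relations: for each subject, the set of related users.

    Hypothetical sketch of the model described in the text.
    """

    def __init__(self):
        # relation name -> subject -> set of related users
        self._related = defaultdict(lambda: defaultdict(set))

    def add(self, name, subject, related, two_way=False):
        self._related[name][subject].add(related)
        if two_way:
            # A two-way relationship applies equally in both directions.
            self._related[name][related].add(subject)

    def related_users(self, name, subject):
        return set(self._related.get(name, {}).get(subject, set()))


relations = Relations()
relations.add("peers", "user_a", "user_b", two_way=True)  # symmetric
relations.add("superiors", "user_a", "user_b")            # directed

peers_of_b = relations.related_users("peers", "user_b")
superiors_of_b = relations.related_users("superiors", "user_b")
```

  • Here `peers_of_b` contains `user_a`, because the peer relationship is two-way, while `superiors_of_b` is empty, because user B being a superior of user A does not make user A a superior of user B.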
  • Because related users are users of the system, they have libraries of their own. Depending on the configuration, the system can search the libraries of some or all related users as part of a query and merge any hits into the results. The degree to which results from a related user's library biases the baseline results can be configured both at the relationship level, e.g. experts have a larger bias than peers, and also at the user level, e.g. some peers may exert more influence than others.
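  • The two levels of configurable bias can be sketched as a product of a relationship-level weight and a user-level weight. The additive boost, the function name `apply_library_bias`, and every numeric value below are assumptions; the text says only that both levels are configurable.

```python
def apply_library_bias(baseline, library_hits, relation_weights, user_weights):
    """Bias baseline scores with hits from related users' libraries.

    baseline: dict mapping doc id -> baseline score.
    library_hits: list of (relation, related_user, doc_id) tuples.
    relation_weights: bias per relationship, e.g. experts above peers.
    user_weights: optional per-user multiplier (defaults to 1.0).
    """
    scores = dict(baseline)
    for relation, related_user, doc_id in library_hits:
        bias = relation_weights.get(relation, 0.0) * user_weights.get(related_user, 1.0)
        scores[doc_id] = scores.get(doc_id, 0.0) + bias
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = apply_library_bias(
    baseline={"d1": 0.5, "d2": 0.4},
    library_hits=[("experts", "carol", "d2"), ("peers", "bob", "d1")],
    relation_weights={"experts": 0.4, "peers": 0.2},  # experts bias more
    user_weights={"bob": 0.5},                        # bob counts for less
)
```

  • With these illustrative numbers, the expert's endorsement lifts `d2` above `d1` even though `d1` had the higher baseline score.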
  • In the default configuration, the system maintains two different relations for each user:
  • Peers: This is a two-way relationship intended to represent users with common interests, job roles, locations, and other factors. People can belong to multiple peer groups based on different contexts. The system develops the peer groups through learning. Peer groups change and adapt according to community and business changes.
  • Experts: This relationship represents skill sets or knowledge a person possesses. The system detects experts by examining the community for individuals who are able to discover and collect the most useful documents, i.e. those having the most impact. Expertise is relative. An expert today may become less so if the person ceases to be the connection to the most useful content.
  • Although this typical configuration only provides a single peers relation and a single experts relation for each user, an advanced configuration might supply two or more of each, with each peer and expert pair called a Domain Expertise Network (DEN). Multiple peers relations or DENs allow a user to identify several different peer groups that are each relevant at different times, e.g. a departmental group for day-to-day operations, a special interest group representing a committee membership, etc. Multiple experts groups allow a user to have several different sets of experts focused on different subject areas.
  • Monitoring User Activity
  • FIG. 4 is a flow diagram showing the capture of behavioral relevancy by an embedded application. In FIG. 4, an information seeker is using a business application. When the seeker performs a search, such as “sales preso on bonds,” the server performs various data mining activities and produces a result for the information seeker. In the process of doing so, the invention makes observations of implicit relevance actions, such as “view,” “download,” “print,” and “e-mail.” The server also makes observations with regard to explicit relevance actions, such as “save to library,” and actions similar to spam control, such as “remove.” These items are discussed in greater detail below. The observations made by the system are used to determine the value of a particular document to a searcher. The system accumulates information about the value of the document and then develops a usefulness measure for the document, as discussed in greater detail below.
  • FIG. 5 is a screen shot showing an inline user interface according to the invention. Because the tags used in the system are configurable and customizable, the user interface can be made to blend into an existing Web site for a particular enterprise. The example given in FIG. 5 is of a public Web site.
  • FIG. 6 is a screen shot showing an inline user interface (UI) rendered using JavaScript tags according to the invention. This particular example shows the “most popular” tag, which gives a list of the most popular documents to the end user. The UI is rendered using JavaScript tags. Other tags, such as “next step,” “similar documents,” and “preferred” are rendered in a similar fashion.
  • FIG. 7 is a screen shot showing a pop-up user interface according to the invention. As with the inline user interface, this interface is rendered using JavaScript tags. This particular example shows a “next step” tag. This tag fades in and, when closed, fades out to enhance the user experience. As with the inline tags, the pop-up dialogue is also configurable to blend into any existing Web page's style.
  • Action Monitoring/Observations
  • The system has an active interest in knowing which documents users find helpful or relevant. However, users cannot generally be relied upon to indicate explicitly to the system when a particular document is considered helpful or relevant. Instead, the system has to infer this information from actions users would take anyway on a document, with or without the system.
  • One way to do this is to present to the user one or more convenient buttons or links for typical actions with each document in the search results. Because these actions are available with a single mouse click, as opposed to the multiple clicks that are typically required to perform most actions using normal browser controls, users tend to use them rather than the standard browser controls for performing these actions. Furthermore, because these buttons or links are under the control of the system, the system is able to take note of the actions a user takes with respect to a document. Thus, users are given a convenient mechanism for performing actions they would perform anyway on documents in a set of search results, and the system is able to monitor these actions.
  • In practical terms, intercepting these user actions is straightforward. As is normally the case with HTML, each button or link representing an action has a URL associated with it. Normally, such a URL would refer directly to the associated document. However, with the system these URLs instead refer to a CGI, servlet, or similar mechanism associated with the system. The URL contains information about the user, the document, and the action the user wants to perform. The system logs the action and related information, and then redirects the request to either the original document, in the case of simple “view” type actions, or some other kind of Web resource to complete the requested action.
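  • The interception described above can be sketched with the Python standard library. The tracking endpoint `https://tracker.example.com`, the parameter names, and the function names are hypothetical; only the overall flow (encode user, document, and action in the URL, log the action, then redirect) comes from the text.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint standing in for the system's CGI or servlet.
TRACKER_BASE = "https://tracker.example.com/action"


def build_action_url(user_id, doc_url, action):
    """URL placed behind an action button or link in the search results."""
    params = {"user": user_id, "doc": doc_url, "action": action}
    return TRACKER_BASE + "?" + urlencode(params)


def handle_action_request(url, action_log):
    """Log the action, then return the URL to which the request is redirected."""
    params = parse_qs(urlparse(url).query)
    record = {key: params[key][0] for key in ("user", "doc", "action")}
    action_log.append(record)
    if record["action"] == "view":
        # Simple "view" actions are redirected to the original document.
        return record["doc"]
    # Other actions go to some other resource that completes the action
    # (hypothetical path).
    return ("https://tracker.example.com/do/" + record["action"]
            + "?" + urlencode({"doc": record["doc"]}))


log = []
document = "https://intranet.example.com/specs/a.pdf?v=2"
target = handle_action_request(build_action_url("u1", document, "view"), log)
```

  • After this call, `log` holds one record naming the user, the document, and the “view” action, and `target` is the original document URL.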
  • Most content search systems make the title of a document an active link to the document when presenting search results. The system uses this standard convention as well, except that the active link is treated as a “view” action and monitored in the same manner as the other actions described above.
  • An optional “add to library” action is available for documents as well. As the name implies, this action adds the document to the user's library. This is a way for users to inform the system explicitly that a document is particularly useful. A user's primary motivation for using this action is to ensure that the document is considered favorably in future queries because documents in a user's library are generally given improved rankings.
  • When a customer elects to take responsibility for presenting search results manually, URLs for the configured actions are provided along with the other usual data for each document in the results. It is the customer's responsibility to ensure that these URLs are used for the various actions users might take on the result documents, or else the value of the system is diminished.
  • General Implicit Observations of Search and Navigation
  • The system uses a more generic facility to observe user behaviors against all content during search and navigation. The observations are made implicitly, without user participation other than normal browsing and searching. Observations are consolidated from both search and navigation and then used to improve future search and navigation.
  • Query Monitoring
  • The system also benefits from knowing the original query that yielded a document on which a user takes an action. For example, if the system notices that a user issues the same query later, or if it notices several different users making the same or similar queries, it can increase the ranking of documents in the new query that were found interesting in the original query. However, because query strings can be rather cumbersome, it is not always practical to include them in the action URLs. Instead, the system maintains a database of query strings and issues a unique ID for each. This unique ID can then be included with the action URLs presented in the search results. When a user takes an action on a particular result document, the system can determine the query that produced that particular document by looking up the query ID.
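  • The query ID scheme can be sketched as a small registry. The class name `QueryRegistry`, the use of sequential integers as IDs, and the whitespace and case normalization are assumptions; the text requires only that each stored query string receive a unique ID that can later be looked up.

```python
class QueryRegistry:
    """Maps query strings to short unique IDs and back.

    Hypothetical sketch: normalization is an added assumption so that
    trivially different spellings of a query share one ID.
    """

    def __init__(self):
        self._id_by_query = {}
        self._query_by_id = {}

    def id_for(self, query):
        normalized = " ".join(query.lower().split())
        if normalized not in self._id_by_query:
            query_id = len(self._id_by_query) + 1
            self._id_by_query[normalized] = query_id
            self._query_by_id[query_id] = normalized
        return self._id_by_query[normalized]

    def query_for(self, query_id):
        return self._query_by_id[query_id]


registry = QueryRegistry()
qid = registry.id_for("Sales preso on bonds")
# The short ID, not the full query string, is embedded in each action URL.
original = registry.query_for(qid)
```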
  • Blended Search
  • The system uses blended search to enhance search results. In a blended search, a single query is passed to two or more separate search processors, each of which produces a set of zero or more documents, referred to as the result set, that match that query in some fashion. Depending on the configuration and circumstances, the same document may show up in one or more of the result sets from these various searches. Once all of the search processors have completed the requested query, their result sets are merged together into a single result set. Rankings are assigned to individual documents in the merged result set using a configurable formula that takes into account such factors as the number and/or type of search processors that produced the document and the document's ranking within each of those individual result sets.
  • The two distinct search processors need not be distinct software entities. For example, the same search engine running against two different indexes and/or with different configuration parameters could constitute two distinct search processors. More important is that two distinct search processors should typically yield different results for the same query. One might consider that each search processor offers a different point of view for a query.
  • Each search processor can be assigned a weight that determines the degree to which it influences the rankings in the merged search results. This weight can be either a static constant, or a dynamically computed value that varies according to the query, results, or other circumstances.
  • Search processors can, but do not necessarily, run independently of each other. Some search processors can be configured to take the result set of a different search processor as input and manipulate it in some way to produce its own result set. This kind of search processor is referred to as a filter. Filters are useful for such tasks as narrowing the results from a different search processor, e.g. removing documents that are too large, too old, etc., or modifying them in some way, e.g. computing summaries or titles from document contents, adding annotations from revision logs, manipulating the ranking score, etc. A search processor that does not filter the output of another search processor is referred to as an independent search processor. An ordered sequence of search processors in which the first is an independent search processor and the second and subsequent search processors act as filters for the search processors preceding them is referred to as a pipeline. The individual search processors that make up a pipeline are also referred to as stages.
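  • A pipeline of one independent search processor followed by filter stages can be sketched as below. The function names (`run_pipeline`, `toy_search`, `drop_low_scores`, `halve_scores`), the canned results, and the representation of a result set as a dictionary of document to score are assumptions made for illustration only.

```python
def run_pipeline(stages, query):
    """Run an independent processor followed by zero or more filter stages.

    stages[0] takes a query and returns {doc: score}; every later stage
    takes (query, results) and returns a new {doc: score}.
    """
    independent, *filters = stages
    results = independent(query)
    for stage in filters:
        results = stage(query, results)
    return results


def toy_search(query):
    # Hypothetical independent processor with canned results.
    return {"a.html": 0.9, "b.html": 0.4, "c.html": 0.1}


def drop_low_scores(query, results, threshold=0.2):
    # Filter that narrows the results of the preceding stage.
    return {doc: s for doc, s in results.items() if s >= threshold}


def halve_scores(query, results):
    # Filter that manipulates the ranking score.
    return {doc: s / 2 for doc, s in results.items()}
```

  • For example, `run_pipeline([toy_search, drop_low_scores, halve_scores], "q")` runs the three stages serially, each filter consuming the output of the stage before it.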
  • The result set of a blended search is formed by merging the output result sets of one or more pipelines. As a rule, each pipeline produces a score for each document in its result set that is used for ranking the document's relevance. When the results of two or more independent search processors are blended, these scores are normalized to the same range, then multiplied by a scaling factor. If the same document appears in more than one pipeline's result set, the scores from each result set are added together to form a single score in the blended result. These accumulated scores determine the final rankings of the documents in the blended results, with the highest scores being given the best rankings.
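  • The merge step just described (normalize, scale, sum, rank) can be sketched as follows. The text does not say how scores are normalized, so dividing by the largest score in each result set is an assumption, as are the names `normalize` and `blend`.

```python
from collections import defaultdict


def normalize(results):
    """Scale scores into 0.0-1.0 by dividing by the largest score (an assumed scheme)."""
    top = max(results.values(), default=0.0)
    if top <= 0:
        return {doc: 0.0 for doc in results}
    return {doc: score / top for doc, score in results.items()}


def blend(pipeline_outputs):
    """Merge (results, scaling_factor) pairs into one ranked list of (doc, score)."""
    totals = defaultdict(float)
    for results, scale in pipeline_outputs:
        for doc, score in normalize(results).items():
            # A document found by several pipelines accumulates all of its scaled scores.
            totals[doc] += score * scale
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

  • With a baseline result set `{"a": 10.0, "b": 5.0}` at scaling factor 1.0 and a library result set `{"b": 2.0, "c": 1.0}` at scaling factor 2.0, document `b` accumulates 0.5 + 2.0 = 2.5 and receives the best ranking.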
  • As a practical matter, separate pipelines can be run in parallel for efficiency. On a conceptual level, the various stages of a single pipeline are run serially, though in actual practice some parallelism can still be achieved when stages are able to produce portions of their result sets incrementally. The composition of the pipelines that are used in a blended search and the manner in which they are run, e.g. serial vs. parallel, is configured by the administrator and/or manipulated dynamically by the end user.
  • In a typical configuration, the existing search mechanism that is being wrapped by the system is referred to as the baseline processor. Any other search processors are referred to as ancillary processors. A baseline processor is normally built on top of conventional search technologies and is therefore capable of standing alone as an adequate, though sub-optimal, document search mechanism. Amongst other things, this implies it should have access to the majority of public documents in an enterprise, have a query processor capable of handling typical requests from most business users, and that it not act as a filter stage in a pipeline. Ancillary processors, on the other hand, have fewer such requirements: they may have access to only a handful of documents, they may or may not use a conventional search engine to accomplish their goals, and they may in fact participate as a filter stage in a pipeline.
  • Note that the system can in fact be configured with two or more baseline search processors. This is sometimes referred to as federated search, in which the results of otherwise independent search engines are merged. Though this is not necessarily a goal of the system, it is a beneficial special case of its blended search technology.
  • FIG. 8 is a block schematic diagram showing expert-guided personalized search across applications according to the invention. In FIG. 8, the server is shown including information about the user's library, “My Lib.” The user's browser 21 is shown having a “My Lib” view. The sources of this view include searches of a business application, Web searches, and other business application information. This creates a network effect so that other applications can use the server as well. The user's library is a behavioral journal. It can be embedded in other applications and is, therefore, not just a new user interface or application. The contents are created by user search and discovery and are generally invisible to the user. The analytics of the system allow the improvement of quality and bridge silos. As discussed herein, there is a form of spam control implicit in operation of the invention. The system provides dynamic personal navigation support. A proximity hash, loading ratio, and privacy policy are also implemented. The invention operates in the form of a browser and desktop plug-in and includes content update and caching. The information accessed in connection with the invention is pursuant to a domain expertise network, discussed in greater detail elsewhere herein, that consists of individual information, peer information, expert information, and community information.
  • FIG. 9 is a screen shot showing a user library according to the invention;
  • FIG. 10 is a second screen shot showing a user library according to the invention; and FIG. 11 is a third screen shot showing a user library according to the invention.
  • Sample Search Processors
  • The system can be realized with different search processors provided in connection with the affinity engine (FIG. 3), that can be combined in different ways to accomplish different goals. The following discussion describes several of the more common search processors that are available.
  • Lucene Baseline Search
  • This search processor is an independent baseline processor that generates its results by issuing a query to an existing Lucene index (see http://Lucene.Apache.org). The result set that it produces includes a content locator and a relevance score that is a floating-point number in the range of 0.0 to 1.0.
  • Library Search
  • This search processor is an independent ancillary processor that searches a particular user's library for documents that match a specified query. In a typical implementation, a Lucene index is maintained for each user's library, so this search processor is essentially a special case of the Lucene baseline search processor running with a different scaling factor against a different index.
  • My Library (My Lib)
  • This special case of the library search processor runs against the library of the user that has invoked the original query. It normally runs with a relatively large scaling factor. Thus, documents in which the user has previously shown interest and which match the current query tend to receive elevated rankings.
  • Relation Search
  • This search processor is an independent ancillary processor that searches the libraries of the related users in a given relation. It is conceptually similar to invoking the library search processor for each of the related users then merging the results. In practice, this can be optimized in a number of different ways, for example by performing each library search in parallel, or by maintaining a separate merged index for the entire relation.
  • My Peers
  • This search processor is a case of the relation search processor that has been specialized for one of a subject's peer relations. If the user has more than one such relation, the specific relation to be used for a given search can be determined in a number of different ways.
  • For example:
      • It can be set by an explicit action on the part of the user, e.g. the user might indicate that work is currently being done in the context of a particular peer group;
      • It can be set implicitly by the current search context, e.g. the actual search form used to launch the query might select a specific peer group;
      • It can be computed, e.g. by analyzing the query itself.
  • The theory behind this search processor is that a user's peers tend to have similar interests to the user, so if a document was particularly interesting to a peer, i.e. the document is in the peer's library, then it probably is interesting to the user as well. This search processor generally runs with a relatively high scaling factor, thus elevating the rankings of documents that both match the query and reside in a peer's library.
  • Transitive Relation Search
  • Many, but not all, one-way relationships are transitive: if user A has a particular one-way relationship with user B, and user B has a similar one-way relationship with user C, then if the relationship is transitive it can be inferred that user A has this same one-way relationship with user C. If a given relation represents a transitive one-way relationship, then the transitive closure of that relation is the union of the members of the original relation with the members of the same relation for each of those related users. In a full closure, this process is continued recursively for each of the related users and each of their related users, etc. until the full tree of transitive relationships has been computed. In a partial closure, the recursion is limited to a particular depth.
  • The transitive relation search processor is an independent ancillary processor that searches the libraries of all users that belong to a full or partial closure of a specified one-way relation. A single recursion depth can be specified for the entire relation, or a separate recursion depth can be specified for each member of the starting relation. Once the closure itself has been computed, the transitive relation search processor is very similar to the normal relation search processor, conceptually performing a library search on each user in the closure and merging the results. For this reason it can be optimized in the same manner as the regular relation search.
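  • Computing a full or partial closure of a one-way relation can be sketched as a breadth-first traversal with an optional depth limit. The function name `closure` and the representation of a relation as a dictionary of user to set of related users are assumptions for this example; only a single recursion depth for the whole relation is shown.

```python
def closure(relation, start, max_depth=None):
    """Return users reachable from `start` through a one-way relation.

    relation maps each user to the set of users they relate to directly.
    max_depth=None gives the full closure; an integer gives a partial one
    (depth 1 is just the direct members of the starting relation).
    """
    reached = set()
    frontier = {start}
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for user in frontier:
            for related in relation.get(user, ()):
                # Skip the starting user and anyone already seen, so cycles terminate.
                if related != start and related not in reached:
                    reached.add(related)
                    next_frontier.add(related)
        frontier = next_frontier
        depth += 1
    return reached
```

  • The libraries of the users returned by `closure` would then be searched and merged in the same way as for the regular relation search.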
  • My Experts
  • This search processor is a special case of the transitive relation search processor that has been specialized for one of a subject's expert relations. If the user has more than one such relation, the specific relation to be used for a given search can be determined in a number of different ways, as outlined for the My Peers search processor.
  • The theory behind this search processor is that if a user can identify experts in a particular subject, then documents that those experts find interesting, i.e. that appear in the experts' libraries, are presumably of interest to the user as well when conducting a search in that subject. Furthermore, it is assumed that expert relationships are transitive. Thus, if user A considers user B to be an expert on some topic, and user B considers user C to be an expert on the same topic, then user A would consider user C to be an expert on that topic as well, even though user A does not necessarily know user C.
  • As with the My Peers processor, this search processor runs with a high scaling factor thus causing content selected by experts to be given elevated rankings.
  • Freshness
  • One important contrast between content searches in an enterprise compared to more general Web search is the importance of freshness or recency in the results. On the Web, somewhat older data are generally considered more valuable because they have had a chance to be evaluated and vetted by users around the world. In an enterprise, however, the opposite is true: most users have already seen the older data, so when a search is performed newer data are usually more useful.
  • The freshness search processor is a simple ancillary filter processor that captures this difference by increasing the scores of more recent documents and decreasing the scores of older documents. The degree to which a document's score is changed varies according to its age. Thus very recent documents might have their scores increased more than less recent documents, and very old documents might have their scores decreased more than middle-aged documents. The thresholds and ranges for the various types of scaling are all configurable, making it possible, for example, to set up a filter that only penalizes old documents without enhancing new documents, or contrarily, to penalize new documents and enhance old ones.
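  • One way such an age-based filter might look is sketched below. The linear ramps, the threshold values, and the names `freshness_factor` and `freshness_filter` are all illustrative assumptions; the text states only that the thresholds and ranges are configurable.

```python
def freshness_factor(age_days, fresh_days=30, stale_days=365,
                     max_boost=1.5, max_penalty=0.5):
    """Return a multiplier for a document score based on its age.

    Documents newer than fresh_days are boosted (up to max_boost at age 0),
    documents older than stale_days are penalized (down toward max_penalty),
    and documents in between are left unchanged. All thresholds are
    illustrative defaults, not values from the source.
    """
    if age_days < fresh_days:
        return 1.0 + (max_boost - 1.0) * (1.0 - age_days / fresh_days)
    if age_days > stale_days:
        # Penalty deepens linearly until the document is twice stale_days old.
        excess = min((age_days - stale_days) / stale_days, 1.0)
        return 1.0 - (1.0 - max_penalty) * excess
    return 1.0


def freshness_filter(results, ages, **settings):
    """Filter stage: rescale each score in {doc: score} using ages {doc: days}."""
    return {doc: score * freshness_factor(ages[doc], **settings)
            for doc, score in results.items()}
```

  • Setting `max_boost=1.0` yields a filter that only penalizes old documents without enhancing new ones, which is one of the configurations mentioned above.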
  • Explicit Bias
  • Some documents are the canonical correct answer to certain queries. For example, in organizations that must pay special attention to regulatory matters, e.g. HIPAA, SOX, etc., a query related to a particular procedure is ideally answered with the most current, official description of that procedure, possibly to the exclusion of all other documents.
  • The explicit bias search processor is an ancillary processor that recognizes certain queries or query keywords and injects a fixed set of documents into the results for those queries, each with a fixed score, usually a very high one. This is generally done without a formal search index. Typically, it is configured with a simple table that maps keywords to documents. It can be configured as either an independent processor or a filter. When it is configured as a filter, it can further be configured to either replace or supplement the input results. When the explicit bias search filter does not find a matching keyword, it leaves the input results unmodified.
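  • A table-driven explicit bias filter can be sketched as follows. The function name `explicit_bias_filter`, the `mode` parameter, whitespace keyword matching, and the example document paths are assumptions for illustration; the sketch covers the filter configuration, either adding to or replacing the input results.

```python
def explicit_bias_filter(query, results, bias_table, mode="supplement"):
    """Inject fixed (doc, score) entries for recognized query keywords.

    bias_table maps a lowercase keyword to a list of (doc, score) pairs.
    mode="supplement" adds the entries to the input results;
    mode="replace" discards the input results when a keyword matches.
    """
    injected = {}
    for word in query.lower().split():
        for doc, score in bias_table.get(word, ()):
            injected[doc] = max(score, injected.get(doc, score))
    if not injected:
        return dict(results)          # no keyword matched: pass through
    if mode == "replace":
        return injected
    merged = dict(results)
    merged.update(injected)           # fixed scores override computed ones
    return merged
```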
  • Popularity
  • Some search topics tend to recur regularly in any given enterprise, typically with a small number of documents in the results towards which everyone gravitates. The system can detect these popular results by noticing when the same query is issued multiple times and then watching which documents are acted upon most frequently in response to these queries.
  • The popularity search processor is an ancillary filter processor that puts this knowledge to use. It detects popular queries and then increases the ranking of documents in the results that have historically been selected by previous users making the same query. In practical terms, it is similar to the explicit bias processor, except that the table of keywords to documents is generated automatically by the system from data obtained by analyzing the query and action logs.
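  • Generating the query-to-documents table from logs, and applying it as a filter, might look like the sketch below. The log format (a list of query and document pairs), the threshold `min_actions`, the additive boost, and both function names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_popularity_table(action_log, min_actions=2):
    """Derive {query: {doc: action_count}} from (query, doc) action records.

    Only documents acted on at least min_actions times for a query are kept.
    """
    counts = defaultdict(Counter)
    for query, doc in action_log:
        counts[query.strip().lower()][doc] += 1
    return {q: {d: n for d, n in docs.items() if n >= min_actions}
            for q, docs in counts.items()
            if any(n >= min_actions for n in docs.values())}


def popularity_filter(query, results, table, boost_per_action=0.1):
    """Filter stage: raise scores of documents historically chosen for this query."""
    popular = table.get(query.strip().lower(), {})
    return {doc: score + boost_per_action * popular.get(doc, 0)
            for doc, score in results.items()}
```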
  • Quality Metrics
  • Because the system watches both queries and the actions taken on the results of queries, it can monitor the quality of its results dynamically. This is then used for such purposes as return-on-investment (ROI) reports or feedback on site design.
  • A simple form of feedback on search quality can be found by comparing the query logs to the action logs. If a user query produces no corresponding actions, or perhaps only yields actions on poorly ranked documents, then the system can infer that the query produced poor results. On the other hand, a query that yields several different actions, particularly on highly ranked documents, might be considered good.
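  • The comparison of query logs to action logs can be illustrated with a simple classifier. The function name `rate_query`, the cut-off `top_n`, and the two-way good/poor labeling are assumptions; the text does not define a specific threshold.

```python
def rate_query(result_ranks, acted_docs, top_n=3):
    """Classify one query as 'good' or 'poor' from its action log.

    result_ranks maps each returned doc to its 1-based rank;
    acted_docs lists the docs the user acted on after the query.
    A query is 'good' only if some action landed within the top_n results.
    """
    acted_ranks = [result_ranks[doc] for doc in acted_docs if doc in result_ranks]
    if not acted_ranks:
        return "poor"                       # no corresponding actions
    return "good" if min(acted_ranks) <= top_n else "poor"
```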
  • Another dimension for quality feedback is to compare actions on documents that would have been found by the baseline search processor to those that would have been found only by the system, or perhaps to documents that were found more easily because of the system. The system can accomplish this by taking note of which search processors contributed significantly to a document's relevance score when an action is actually taken on that document. If the only significant contributor to a document's score was the baseline search processor, then the system can infer that it did not add any particular value to that result. On the other hand, if one or more of the ancillary search processors contributed significantly to the document's score, then the system can infer that it did add value to the result.
  • By combining these two metrics, the system can dynamically produce interesting and valuable ROI reports. For example, one report might compare the ratio of good-to-poor search results for queries that were enhanced by the system to the same ratio for queries that were not enhanced by the system. If a dollar cost is assigned to poor queries, then the difference in the cost of poor search results rendered by the original search system and those rendered by the system can be computed. Another report might concentrate on the amount of time the system saves its users. For example, a document that was found only by an ancillary search processor and not the baseline search processor might be assumed to save the user two hours of manual research, while a document that was pushed from a low rank to a high rank by ancillary search processors might be assumed to save the user 30 minutes of research. If a cost per hour is assigned to the user's time, then a cost savings for using the system can be computed.
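  • The time-savings report reduces to simple arithmetic, sketched below. The outcome labels and the function name `time_savings` are hypothetical; the two-hour and 30-minute figures are the example values given above, and the hourly cost is supplied by whoever runs the report.

```python
def time_savings(actions, hourly_cost, found_only_hours=2.0, promoted_hours=0.5):
    """Estimate cost savings from a list of per-action outcome labels.

    Each entry in actions is 'found_only' (document surfaced only by an
    ancillary processor), 'promoted' (moved from a low to a high rank), or
    'baseline' (no value added). The hour figures are the source's examples.
    """
    hours_per_outcome = {"found_only": found_only_hours,
                         "promoted": promoted_hours,
                         "baseline": 0.0}
    hours = sum(hours_per_outcome[a] for a in actions)
    return hours, hours * hourly_cost
```

  • For instance, one document found only by an ancillary processor plus two promoted documents gives 2 + 0.5 + 0.5 = 3 hours saved; at an assumed cost of 60 per hour that is a saving of 180.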
  • Non-Core Technology
  • Because the system is designed to wrap an existing document search mechanism, it necessarily employs a number of technologies that are not intrinsically its own. The following discussion describes these types of non-core technology used by the system. Those skilled in the art will appreciate that the following is only an example of a presently preferred embodiment of the invention, and that other technologies may be chosen to implement the invention.
  • Language
  • The bulk of the system is implemented using version 1.5 of the Java language, and all classes are compiled using the Java compiler supplied by Sun Microsystems in version 1.5.04 of their Java Software Development Kit. It is presumed to run correctly in any JVM supporting version 1.5 of the Java language. If customers do not provide a JVM of their own, version 1.5.04 of the Sun JVM is used by default.
  • Application Server
  • Much of the core functionality of the system is implemented using Java servlets and Java Server Pages (JSP). The current implementation is written to version 2.4 of the Java Servlet specification and version 2.0 of the JSP specification. It should, in principle, run in any application server supporting those specifications. If customers do not provide an application server of their own, version 5.0.28 of the Apache Tomcat application server is used by default.
  • Search Engine
  • The system uses version 1.4.1 of the Lucene search engine to manage user libraries. The current implementation includes support for Lucene version 1.4.1. If customers do not provide a baseline search engine of their own, a basic implementation using Lucene version 1.4.1 is provided.
  • Web Server
  • Any conventional Web server can be used with the system to serve regular content. The reference implementation of the system uses Apache 2.0.52.
  • Tool for Virtual Configuration of Web Applications
  • This aspect of the invention concerns a tool that configures a target Web application with new capabilities, such that the new capabilities can be demonstrated live within the Web application, even though the Web application has not been modified in any way.
  • Method for Virtual Proof of Concept
  • An automated process is provided that enables the evaluation of a set of software capabilities within existing Web applications by guiding the evaluator through a series of steps and automatically provisioning the necessary infrastructure to support the evaluation. The process is virtual in that it requires no changes to the target Web application and no installation of software.
  • One aspect of the invention concerns a virtual Web sales tool. In this embodiment, the invention comprises a virtual environment that is implemented using proxy technology. The system is used by a prospective customer to access a system Web site. This allows the prospective customer to see the “before they know” and “after they know” impact of the system against the prospective customer's live application. A virtual environment is created that mimics the prospective customer's live application, without copying the live application's content. The system nonetheless performs interception and augmentation in this proxy environment without physically possessing any content or interfering with the structure of the live application. Thus, when a prospective customer comes into this virtual environment, they feel as though they are actually in their live application. One benefit of this embodiment is that the invention may be used to do instrumentation without having to go physically into a customer's application environment, obtain logs, or involve the customer's IT department. Thus, the customer does not really know there is a change, but can see the impact.
  • In this embodiment, it is possible to go through a process that requires no installation of software in the traditional sense and that allows a customer coming to the Website to have the same kind of experience that they would have with a traditional software provider, except that the invention allows one to do it all online. This approach is virtual in the sense that there is nothing the customer has to do except interact with a Web browser to take advantage of the service. This embodiment provides a virtual proof of concept (POC) that automates the sales process for the system. The intent is to capture interest through the Website where a visitor comes in interested in a product. The user accesses the system with a click-through to the POC. Then, the system automates the process of going through the POC. Once they have conducted the POC, the service is turned on, and they are now a paying customer.
  • To capture the user's interest, through the Website, the users are allowed to “try it,” for example. They enter their email address. The system validates the email address with a first-level screening. Then, the system sends an email after they try it, and maybe a link to where they can see screens about how the system works.
  • In phase two, the system generates a demonstration room for them. This is based on some of the information they gave the system in the first step. In addition, the system now requires them to upload some information, a log file, for example, to provide the system with some historical information about how their Website has been used. The system then takes the log file and automatically generates a set of reports that explain the expected increase in value that the system can provide. The system then goes through an automated process and creates a “before and after” picture of what their site looks like before and after the system is applied.
  • There is a certain amount of backend provisioning to do to make that work. Once they actually commit, in phase two, the system explains to them that they need to upload their log files. The system can then provide them with a report that they can print out and use that to build internal momentum around the system. The system then allows them to use the system in a POC fashion for some period of time, and then converts them to a real customer. The period of time could be 30 days, it could be 90 days, for example. Thus, this aspect of the invention takes a prospect through a methodical process online that requires very little human intervention to allow them to experience the value of the system, without having to interact directly with a salesperson to feel any pressure, and without having to send a salesperson to their site.
  • Method of Algorithm of Usefulness
  • This aspect of the invention concerns a method that derives a score which describes the usefulness of an electronic asset. In contrast to well-known relevancy algorithms, which help to find documents relevant to a query context, the computation of usefulness measures the actual usefulness of any electronic asset based on user behaviors with respect to the assets. Given a topic, there might be hundreds or thousands of relevant documents but only a few that are useful. Usefulness measures how useful a document is for a given user, while relevancy measures how well keywords match the content. Usefulness scores are computed for any electronic asset and for arbitrary user population sizes, ranging from millions to a single user.
  • Thus in contrast to traditional search technology, which is focused on relevancy detection, the invention detects usefulness. With regard to relevancy, for example, if one is learning Java programming, there are hundreds of relevant Java books that can be used to learn Java. Are they all useful? No. If one wants to really learn Java, one should ask a Java guru what books to read, and they probably will recommend two or three books, instead of hundreds of Java books. Thus, relevant books comprise all these hundreds of books, while useful books are the two or three very useful ones. This usefulness is based on the knowledge of experts, community, and peers.
  • Expert, peer, and community knowledge is automatically extracted and assembled by the present invention based on observed behaviors of the user population. As user behaviors change over time, the system adapts its representation of expert, community, and peer knowledge. User behaviors can be recorded in real time (through various means of observation described elsewhere in this application) or extracted from existing log files of user behavior. On an ongoing basis, the system can continue to improve performance, based on ongoing real-time observations and getting continuing updates of the log files.
  • The updates amount to the differences in the log file, for example on a month-by-month basis. That is in addition to the information the system captures based on observations. There are certain things that are in the Web logs that the observations do not track and there are certain things that can be observed in real time that existing web logs do not track. There is more activity than is needed in runtime or user time, but it is interesting to look at these after the fact and draw generalizations from the broad sets of data that are captured in these log files.
  • Method for the Self-Learning and Adapting of Systems
  • This aspect of the invention concerns a method that enables the system to identify changes in the behavior of its user population, with respect to both electronic assets and members of the user population themselves, and automatically adapts its operation to self-correct for changes. Self-correction enables systems to identify and adapt to changes proactively before they are obvious, while minimizing the need for administrative intervention to keep systems maintained.
  • Thus, this aspect of the invention concerns an attribute of the system, i.e. the inherent nature of the system, because it observes people's behavior. As people's behaviors change, their preferences change, and their useful content changes. The system automatically adapts to that change. Thus, the system is, by default, a self-learning system that can correct itself because, when people start to correct themselves, the system follows them.
  • The Inventive Technology is Content Type Independent or Content Agnostic
  • The system works against any content/information types, such as audio/video files, data types, such as RDBMS data, and application nodes, such as a “buy” button. Thus, the preferred embodiment of the invention comprises an independent and content agnostic system because the system does not look at the content itself. This is unlike traditional search technology, which parses content, picks up key words in the content, and uses those key words to select results. The invention, in contrast, is not concerned with what is in the content, but about the location of an asset and how people interact with that asset. The invention does not care what that piece of content is. It could be a text file in a simple case, but it can also be a video file, which has no text to parse and no index to be built in the sense of traditional technology.
  • The Inventive Technology Seeds the System from Web Server Logs, Search Engine Logs, Web Analytics Server Logs, and Other Log Files so that it can Generate Value From Day One of the Operation.
  • Supervised guidance can be accomplished through administrators by assigning experts and peers based on their roles, reputations, expertise, etc., although it is not a necessary step. Such information can also be inferred and extracted from historical log files. Because the system is a learning system, it can derive more value over time as people use the system. This aspect of the invention concerns seeding technology that makes the system useful from day one. It may not be 100% useful, as it would be down the road, but it would give at least 50% to 80% of the value. In this embodiment, the Web server log, which is actually a recorded history of what has happened in an enterprise, is used. It does not have the fine-grained information that is ultimately needed, but it has coarse-grained information. The log file provides historical information. The preferred embodiment uses weeks' to months' worth of a log file, depending on the site's traffic patterns. Thus, the invention provides a way to take something a user already has, i.e. the log file, and turn it into a resource that is used to seed the system. Then, over time, the system learns more because the invention is making observations by means of the extensions to the browser or the scripts that are running, as discussed herein. The system takes advantage of not only basic logs, but also the analysis that is generated from those logs by higher order analytics which are available commercially from various companies known to those skilled in the art.
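  • Extracting coarse-grained usage information from a Web server log can be sketched as below. The sketch assumes the log is in the widely used Common Log Format and treats each successful GET request as one observation keyed by client and path; the text does not specify a log format or what is extracted, so the regular expression, the name `seed_from_log`, and the choice of key are all illustrative.

```python
import re
from collections import Counter

# Assumed Common Log Format: host ident user [date] "METHOD path protocol" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def seed_from_log(lines):
    """Count successful GET requests per (client, path) as coarse usage seeds."""
    seeds = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue                      # skip lines in an unexpected format
        client, method, path, status = match.groups()
        if method == "GET" and status.startswith("2"):
            seeds[(client, path)] += 1
    return seeds
```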
  • The Invention Federates Across Multiple Applications, Websites, and Repositories Without Crawling or Indexing the Applications.
  • The federation is accomplished through users' actual usage of those applications. Federation is an attribute and a natural fallout of the core technology herein. The traditional approach to searching is to have multiple indexes, each of which is linked to a different repository or different application. A search is performed against each repository with separate indexes for each repository that are not cross-searchable.
  • In the inventive system, a federated search is automatically provided because when people use an asset in the context of an application, they do not care where they use it. They can use one particular piece of content in one silo, and the next minute move into a different silo, e.g. start with a CRM system and then move into an ERP system. In this way, the user creates a trail, i.e. a virtual link between the various systems. When the same query is issued again, the inventive system can recommend information from the multiple different data sources, such that federation is automatic because the user is creating the federation. That is, the user's pattern of usage of information from and across various data sources creates the federation.
  • The invention herein does not require crawling of Web sites or applications, or indexing of the applications or the contents thereof. Further, the invention respects any security that is already in place. A significant challenge in building federated search systems is that federated search systems must understand and work with the underlying security of these applications. It is difficult to do this because each application generally has its own security model. Generally, security models are not shared across different applications. The federation of search while protecting security is a huge challenge. The invention is unique in the sense that it does this naturally, without any specific adapters, and it guarantees that it can preserve perfectly the underlying security mechanism for that application. This is done in a very unique way. The system goes through the browser instead of implementing proprietary modules to preserve security.
  • Traditionally, to solve the federation problem, there would be some sort of search application that ties into each of the applications, and that comprises a specialized security model, conceptually. The problem is the search engine is actually building up an index of all of the content. When a search is performed, one cannot simply bring back a list of search results and then prevent somebody else from clicking on the list if they do not have access to it. So, in effect, the search engine replicates multiple security models in one index. The inventors have recognized that there is no need to do this because the system has a browser where a user queries through the system. The system then accesses its database of content and, in return, provides a list of results. The system does not filter out all the content at this time but, instead, filters as the results are returned. The system provides technology inside the browser that checks with each of these repositories in real time whether this user in this session can access this content. If the answer is no, the user is prevented from reviewing the content. It is kept off the list. The user does not even know it came up. The primary driver for whether the browser has access is the person who is logged into the browser at the time, based on the person's privileges in the system, which determine whether the person can see the results. If the person cannot see some of the results, the system does not show these results. Thus, the system is, in real time, asking the application if a particular user, e.g. the user currently logged in, can access the content right now. If this is true, the user is allowed to see the document. The system, in real time, every time, asks the application if the user has sufficient privileges. Further, it does not matter what mechanism is used because the person can have different access rights depending on whether they are group-based, identity-based, or profile-based.
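  • The real-time filtering described above can be sketched as follows. In the described system the check runs inside the browser against the live applications; this Python fragment only stands in for that logic. The AccessCheck callable, the repository names, and the choice to hide results from repositories with no registered check are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: each repository exposes a callable that answers
# "can this session access this URL right now?"
AccessCheck = Callable[[str, str], bool]


def filter_results(
    results: List[Tuple[str, str]],
    session_id: str,
    checkers: Dict[str, AccessCheck],
) -> List[Tuple[str, str]]:
    """Keep only the (repository, url) results the session may access."""
    visible = []
    for repo, url in results:
        check = checkers.get(repo)
        # Assumption: a repository with no registered check is treated as denied.
        if check is not None and check(session_id, url):
            visible.append((repo, url))
    return visible


# Stub checks standing in for the applications' own security models.
checkers = {
    "crm": lambda session, url: session == "alice",
    "erp": lambda session, url: True,
}
results = [("crm", "/accounts/42"), ("erp", "/orders/7"), ("wiki", "/page")]
alice_view = filter_results(results, "alice", checkers)
bob_view = filter_results(results, "bob", checkers)
```

  • Because each application answers for itself at display time, no copy of any security model needs to be kept in the index.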
  • Personalization, Driven Completely by Usages of Individuals, Peer Groups and Expert Groups at Non-Predefined Levels Depending on Contexts Such as Query Terms, Navigation Patterns.
  • This aspect of the invention accomplishes personalized search by knowing who a user is. When the user exhibits certain behaviors when using the system, the user is self-identifying, e.g. through cookies, logins, etc. Even if the user is an anonymous user, the system places a cookie in the user's browser. Thus, when the user is using the system he leaves a personal trail, and the system then personalizes information based on who the user is. In the system, no one predefines relations based on personalization because the system is based on the user's behavior. The user's affinity with other people creates a space, referred to as a club. Thus, a user can form his own clubs implicitly by exhibiting interest in one area. No one actually is monitoring the user. The clubs are established all through the user's behavior.
  • Controlled Deployment of the Invention for Risk Management and Acceptance Tests. The system reduces the product deployment risk by controlling the number of people who can see the product features in the live running system. A special cookie is set and sent to a controlled testing population. With that cookie, the users of the site can see the invention's features while the general users have no visibility of these features. This is a desired way to deploy new technology in enterprises.
  • Augmented Search. This feature of the invention blends traditional full-text search with preference and activeness information from the global, peer, and expert populations to give users precise answers. This aspect of the invention states how the index is used. Using community insight, the invention can augment a search for a better result. In the augmented search, a search request is made to the customer's Web server and a result is obtained. Then, a request for search, along with the Web server results, the generating query, and the user id, is sent to the search server. A response comes back. Then, the system sends augmented results back in search server format and the client renders the HTML.
  • Top N is a list of most useful information based on the context and driven by the usage of the community. The context may be set by a user query, explicit specification of topic, or presence on a particular web page or group of pages, for example. The invention also creates an insight called top ten, e.g. the top ten most important, related, useful pieces of information, given a topic or given a context. The user can see information based on context-driven usage of the information by the community. Top ten is a popularity result. Give the user the ten most popular links that have to do with a query term (context), or maybe no term. If there is no term, then the top ten most popular pages are returned. For all of these views, one can apply a filter, e.g. only look at the top ten that fall within the category of technology, or only look at the top ten PDF files.
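  • A minimal sketch of a Top N calculation, assuming usage is available as (context, document) events, is shown below. The function name top_n, the event layout, and the use of a plain usage count as the popularity measure are illustrative assumptions.

```python
from collections import Counter


def top_n(usage_events, n=10, context=None, keep=None):
    """Return the n most used documents, optionally restricted to one
    context (e.g. a query term) and to documents accepted by a filter."""
    counts = Counter()
    for event_context, doc in usage_events:
        if context is not None and event_context != context:
            continue
        if keep is not None and not keep(doc):
            continue
        counts[doc] += 1
    return [doc for doc, _ in counts.most_common(n)]


events = [
    ("chip design", "a.pdf"),
    ("chip design", "a.pdf"),
    ("chip design", "b.html"),
    ("pricing", "c.html"),
    ("pricing", "c.html"),
    ("pricing", "c.html"),
]
overall = top_n(events, n=2)                               # no term: most popular overall
for_query = top_n(events, context="chip design")           # popularity within one context
pdf_only = top_n(events, keep=lambda d: d.endswith(".pdf"))  # filtered view
```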
  • More-Like-This expands the content in similarity based on content use and community profile. If the user likes a piece of content, the system observes that and there are other pieces of content that can be shown to the user based on the communities' interest.
  • More like this is a concept that applies when a user is reading this page now, but wants to find another page that is very similar. More like this is based on what the community says is more like this, meaning based on the usage pattern. Thus, the community sees that this page is similar to the page a user is reading.
  • Predictive Navigation Provides Short-Cuts for Browsing Navigation Based on where Similar Users Start and End on an Application.
  • In this embodiment, if people with the user's profile come to a particular node in the application, then the user is highly likely to go to another, related place. This aspect of the invention predicts navigation and shortcuts traditional navigation based on previous navigation by peers and experts, including where they started and where they ended. Thus, the starting point and end point are critical to predict the user's navigation, to try to shortcut the middle part of a series of navigational steps, and send the user straight to the destination without wasting time in a lot of other places.
  • Predictive navigation is also referred to as “Next Step,” and depends on which calculations or results one wants to display. Predictive navigation uses the navigation trail. There is a notion of a navigation trail; the system tracks the last N pages a user has been to. The system keeps a history of the query that was used to start this trail, if applicable. Thus, the user searches and a result comes up. The user may click on the result. The user may click again to go somewhere else. The user keeps clicking. The system accumulates this history of pages. It also notes that this entire history was due to a query. The system tries to, based on the user's history and other observations in the past, figure out where the user is going. The recommendations that come back are pages ahead of the user that other people have found useful, based on the history, i.e. the trail that the user has accumulated in this session. The system thus tries to match where the user has been and figure out where he is going.
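  • One way to picture the trail matching is the sketch below, which assumes past sessions are stored as (query, list of pages) pairs. The function next_steps and its voting rule (count every page that other users reached after the current page) are hypothetical simplifications, not the calculation of the disclosure.

```python
from collections import Counter


def next_steps(past_trails, current_trail, query=None, n=3):
    """Suggest pages that lie ahead of the user's current page in past trails."""
    current = current_trail[-1]
    already_seen = set(current_trail)
    votes = Counter()
    for past_query, pages in past_trails:
        if query is not None and past_query != query:
            continue  # only trails started by the same query
        if current not in pages:
            continue
        position = pages.index(current)
        for page in pages[position + 1:]:
            if page not in already_seen:
                votes[page] += 1
    return [page for page, _ in votes.most_common(n)]


past = [
    ("vpn", ["/home", "/it", "/it/vpn", "/it/vpn/setup"]),
    ("vpn", ["/home", "/it", "/it/vpn/setup"]),
    ("vpn", ["/home", "/hr"]),
]
suggestions = next_steps(past, ["/home", "/it"], query="vpn")
```

  • In this example the destination page collects the most votes, so the user can be sent past the intermediate step.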
  • This aspect of the invention states how the index is used. Using community insight, the invention can augment a search for a better result.
  • Zip through concerns the idea of having content preloaded on the system. As the user is going down the results, the system shows the user a preview of what that link is. Thus, instead of having to go to the page and changing the page that the user is viewing, the user just zips through. If he sees the content he wants that is the one he clicks into.
  • Dynamical and Adaptive Identification of Peers and Experts in the User Community.
  • The peers and experts are not hard drawn circles but a web with hub concentration and connections among various hubs. Information and knowledge is aggregated together and a fusion effect of wisdom is created naturally and automatically.
  • The invention essentially uses the same information to identify the user community. Who are the peers? Who are the experts? Not only does the invention identify what content a user would like to see, but it can also identify the actual people who are the user's peers. The grouping of peers is naturally formed. There are no hard boundaries on them. There are affinities among these people.
  • Implicit Voting that is Unbiased and Critical.
  • This aspect of the invention provides far more accurate prediction of content usefulness than traditional content voting or survey. Document rating driven by usage reflects the real value of content. For a system such as the herein disclosed system to work reliably with high confidence, implicit observation is very important. If you ask people to vote on content, you tend to get biased results. You also get a sample that is highly skewed because most people do not have time to vote, to survey, to do anything that is explicit. People who do vote tend to have extreme opinions. They have a lot of time on their hands. They are outspoken and opinionated. Thus, they tend to misrepresent the entire population. The sample does not match the population. They also tend to vote negative more than positive. Thus, the invention preferably does not use explicit voting. It takes account of implicit actions. The user requesting a print is implicit because he is doing something else at the time he is making the request. The user is not giving feedback and not being asked for feedback. The invention exploits passive observation, not active feedback. Some embodiments could, however, include active feedback, such as negative feedback.
  • Method for the Computation of Wisdom (Relative Value of Assets to a User Community Over Time)
  • This embodiment concerns a method that observes information about electronic assets, the behavior of the user population with respect to electronic assets, and the changes in assets and behavior over time to discern the relative value of the assets to the user population over time. The method identifies a spectrum of values ranging from short-lived information, to mid-range knowledge, to long-lived wisdom. Ultimately, the system provides an on-going, content agnostic, and adaptive institutional memory for the enterprises.
  • Computational wisdom means that the wisdom is a form of community behavior that, with regard to a set of assets, does not change over time. There are four items stated above in terms of how frequently people change opinions. Content, for example, is the least reliable thing to trust because content can change. Information is at a second level of trust. If information stays stable in view of people's opinions, that set of information becomes knowledge. If knowledge can go through time and continue to be used and supported, then that becomes wisdom. So, wisdom cannot change year to year. Knowledge may change from month to month, and information may change from day-to-day. Content by itself does not mean anything. Thus, if content does not become less useful over time, but it becomes constant as time goes by and usefulness remains constant, then it passes from content to the stage of information, then to the stage of knowledge. And over a certain period of time if the usefulness remains high or continues to increase, it becomes wisdom. Whereas, if there is fluctuation in the usefulness over time, then the change shows that maybe it is not really wisdom, but just current information that is interesting.
  • The Invention Provides Content Gap Analysis Through the Use of Content, Data and Application Instead of Relying on Content Owners' Speculation on What's Missing, Hot, Good, and Bad.
  • One does not know what content is missing, or what people are looking for. Also, it is not known which of the content that could be produced is the content that people need to consume. Because it is known from the system what the trends are, what people are asking for, the question of whether they are being satisfied with some content can be answered. That is, it is known where the gaps are. A lot of people are requesting this thing and are not finding anything useful. There is a gap.
  • The gap analysis report provides the ability to detect gaps in the content. The assumption is the content is there, they just cannot find it. Frequently there are gaps. For such cases, the system ascertains what people are actually looking for and what is missing. Someone in a traditional search or navigation situation might search for something, and then either they fail, or if they do not resign themselves to failure, they may search again, or they might potentially try to navigate. Through either of these mechanisms they might have success or failure. The system addresses this problem when someone starts to exhibit search or navigation behavior. It is known precisely what they are looking for, and the content that over time starts to get surfaced in search and navigation is the content that the community, itself, has deemed useful, without regard to a developer or merchandiser, or what merchandisers thought about the content. Thus, somebody is looking for something, they think it is going to be useful, but they are not finding it. This aspect of the invention allows one to quantify how dire the need is for this information.
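  • As a hedged illustration of how such a need might be quantified, the sketch below assumes each search session is recorded as a (query, found_useful) pair, where found_useful reflects the implicit observations described elsewhere herein. The function content_gaps, the min_requests cutoff, and the ranking by number of unsatisfied requests are assumptions made for this example.

```python
from collections import defaultdict


def content_gaps(search_sessions, min_requests=2):
    """Return (query, requests, failures) for queries that people keep
    asking but that do not lead to useful content, worst gaps first."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for query, found_useful in search_sessions:
        totals[query] += 1
        if not found_useful:
            failures[query] += 1
    gaps = [
        (query, totals[query], failures[query])
        for query in totals
        if totals[query] >= min_requests and failures[query] > 0
    ]
    gaps.sort(key=lambda gap: gap[2], reverse=True)
    return gaps


sessions = [
    ("vpn setup", False),
    ("vpn setup", False),
    ("vpn setup", True),
    ("vpn setup", False),
    ("expense form", True),
    ("expense form", True),
    ("badge", False),
]
report = content_gaps(sessions)
```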
  • Applying the Information Gaps at a Division, Company or Industry Level.
  • The system provides the information flow over time, and helps a company manage information logistics. This aspect of the invention uses an applied information gap as a division flow over time to identify how information flows. The system allows one to understand what people are requesting and what content is available that can meet those needs. In time, it is possible to see the flow of information going in and out from division to division, from location to location. It is possible to see which location presents what information, or what group, or for what number of people.
  • The Ability to Identify Experts and Peers to Enable Companies to Locate their Expertise at a Global Scale.
  • A dashboard for company or industry efficiency against information consumption can be measured, and white-collar workers' productivity can be derived for the first time. This aspect of the invention is related to the ability to identify experts and peers, enabling companies to locate their expertise at a global scale, which allows the system to provide a dashboard for finding out who knows what and what can be done where.
  • Method for Building Automatic Marketing List Through Sponsored Advertising
  • This method identifies firms that are purchasing keywords and banner ad space on public websites for advertising purposes. The invention looks at both common and uncommon keywords, as well as the context of given banner ads, and automatically generates a list of firms who are prospects for improved lead generation through their websites. The invention uses information found in the online ad itself and combines it with other public information sources to create a refined list of firms. The system then back-traces to the buyers of the ads, and automatically includes that information in the candidate prospects list.
  • Method for Improving Sponsored Advertising Conversion Rates
  • This aspect of the invention helps firms who wish to retain customers or increase lead generation. This is accomplished by increasing the conversion rate of sponsored ads, e.g. Google and Yahoo ads.
  • Based on context and where a user came from, e.g. from Google with a given search term, the system can guide this user to the most useful information on the given website or collection of related websites. Without this capability, users who arrive at a website no longer have the benefit of these public search engines directing them to the most relevant information.
  • The invention routes these users to the most useful information by observing where the community found the most useful information, given the initial query context from the public search engine. There are two steps in this process: 1) first, the system captures the collective wisdom on where useful information lies given the query context; 2) secondly, as users come in from Google or Yahoo, the system leverages this collective wisdom to guide people to exactly the right content.
  • Engineering Features
  • Usage-Driven Link Analysis
  • If a link is clicked by a user in a particular site, the text within that link is then captured to augment future searches and navigation within that site. The text is noted only if the navigation trail leads to a successful document discovery, i.e. the user finds the document useful implicitly.
  • This aspect of the invention, usage-driven link analysis, concerns anchor text. This is very different from Google because, when Google is page-ranking, it parses every single page, counting how many links there are, among other things. The invention parses only links that are used by people. A link is a dead link unless it is used. So, if someone clicked on a link, then this link is useful to this person. Furthermore, used links are cross-examined by many uses of peers and experts, in addition to that of the individual users. The peer and expert implicit validation and endorsement reduces noise from individual behaviors and strengthens the signal-to-noise ratio.
  • The Successful Use of a Link is Determined by Capturing and Analyzing Individual User Behaviors, Peer Group Behaviors and Expert Group Behaviors, with Respect to that Link.
  • Usage of a link and the importance of the text is determined by a blended vector of user, peer, expert, query context, and time. This aspect of the invention, successful use of link, determines how an individual user behaves. In addition to looking at a link itself, where if a user clicks on it, it is useful, the system also does additional analysis on how many other peers similar to the user click on these links, and, for example, on how many experts, who are different from the user but on whom the user depends to do his job, also clicked on the link. Thus, there is a two-level value: individual use of content, and that of the peer group and expert group. These dimensions give the total value of the data.
  • Implicit Building of Context Terms
  • It has been historically challenging to create metadata for content. The inventive system deploys a unique way of letting the community implicitly create context terms that describe what the content is about. The context terms are learned through watching users conducting queries and finding useful results via the queries. It also includes navigation trails with the link text associated with each use. The system builds its vocabulary by watching how visitors use various terms to describe content and how a site uses links to describe content. Again, more used queries and links are more important and associated with content, while a link text that yields no use of a content in the downstream trails has no association to the content.
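  • The learning of context terms can be sketched as below, assuming each navigation trail is recorded with its starting query, the anchor texts clicked along the way, and the document that was implicitly found useful (or None if the trail was unsuccessful). The record layout and the function build_context_terms are hypothetical.

```python
from collections import Counter, defaultdict


def build_context_terms(trails):
    """Associate query words and clicked link text with the document a
    trail ended on, but only when that document was found useful."""
    terms = defaultdict(Counter)
    for trail in trails:
        doc = trail["useful_doc"]
        if doc is None:
            continue  # unsuccessful trails contribute no association
        words = trail["query"].lower().split()
        for anchor_text in trail["anchor_texts"]:
            words.extend(anchor_text.lower().split())
        terms[doc].update(words)
    return dict(terms)


trails = [
    {"query": "chip design", "anchor_texts": ["Design Guides"], "useful_doc": "/docs/asic.pdf"},
    {"query": "chip layout", "anchor_texts": [], "useful_doc": "/docs/asic.pdf"},
    {"query": "cafeteria menu", "anchor_texts": ["Lunch"], "useful_doc": None},
]
context_terms = build_context_terms(trails)
```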
  • Capturing Such Information is Done Via One of Three Methods
  • JavaScript Tags. In this method, the page is instrumented with a piece of JavaScript. This JavaScript detects link usage and text and sends this information onward to a server.
  • Browser Add-on. In this method, the browser is instrumented with a piece of software. This software detects link usage and text and sends this information to a server.
  • Log Analyzer. In this method, the access logs for a web site are analyzed via a special program—the Log Analyzer—which detects usage of links and sends this information to a server.
  • All the information captured above is referred to as observations. The analysis of observations captured above takes place in the server.
  • Client
  • The system client comprises three general areas: the UI, the observer, and the proxy.
  • The client comprises a Web browser that entails the client UI (see FIGS. 5-7). The client includes a JavaScript observer to make observations on usage of the Web page at the client. One embodiment of the invention comprises a sidebar UI that shows the recommendations from the system engine. This aspect of the invention is embodied as JavaScript tags that generate the JavaScript necessary to display the UI. In this embodiment, enterprise Web content is displayed on the page and along the side there are system generated recommendations. A variant of the UI provides a popup, where the user clicks on something on a page, e.g. an icon, that calls into system code to display a popup in context.
  • The UI also comprises an API for fetching results from the system server. This is in lieu of gaining results directly from the enterprise installed search server. On the typical enterprise site the user types in the search term, clicks on search, and the search hits their Web servers. The Web servers then go back to their search server. The search server returns the result in some format, usually in XML, and then they have a presentation code on the front end to parse through the XML and present the search results. The invention operates in a similar fashion, except when their Web server goes back to the search server, instead of going back directly to the search server, it goes back to a server side extension. The extension then fetches the initial results from their search server, feeds that back to the system, and the system either reorders the results or, in any case, enhances the results, possibly adding more entries. This is provided back to the extension, and the extension reports back to their Web server. Their Web server continues on as it did before, parsing through the XML, reformatting, and sending it back to the client.
  • The JavaScript observer is a piece of JavaScript code packaged as a tag that is given to the user, and the user instruments their page using this tag. The JavaScript tag resides on the client page and makes observations. For example, a scroll or a dwell observation. If the end-user, for example, is reading a page, he would conform to what is defined as a “dwell.” Once a dwell occurs, i.e. once the JavaScript observer has observed a dwell, it then sends back that information to the server. The server accumulates these observations.
  • Each of these observations corresponds to a particular calculation on the back end, e.g. at the affinity engine. The augmented search concerns the notion of reusing the UI that the user has, instead of standing in between the presentation layer and the search server and augmenting the search from there. Predictive navigation, top ten, more like this, context expansion, and zip through are all types of tags that the user can put into a page, and they all use a different algorithm in creating suggestions.
  • The recommendations that come back are pages ahead of the user that other people have found useful, based on the history, i.e. the trail that the user has accumulated in this session. The system is thus trying to match up where the user has been and trying to figure out where he is going.
  • Observations
  • There are eight directional observations coming from the browser extension, which is a script that is watching the user's sessions and collecting information.
  • The proxy demonstrates the system UI on a user's pages where the system does not have access to the source code of those pages. These users are prospective customers who do not want to give out access to their pages. The system wants real-time pages, but with our tags injected into the page to show the user what the page would look like with the system enabled. To do this, the system uses a proxy that goes and fetches the page from the URL and then, based on a configuration of rules, alters the HTML of that page, and then sends that page back to the browser. The proxy sits between the browser and the target Web server, i.e. the potential customer's Web server. The proxy itself has its own URL, and it just passes in the target URL as part of that URL. A URL for purposes of this embodiment of the invention consists of two URLs. The URL used actually points to the proxy, but embedded in the URL itself is the target, i.e. the customer URL that you want the proxy to go to. The URL goes to the system first and reconstructs the URL, bringing you to the customer page. Then, the proxy makes an HTTP connection on your behalf to fetch the page for the customer site. It looks at the page and applies a set of rules to it, and then sends the page back to the user.
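  • The two-URL scheme and the rule-based rewriting can be illustrated with the sketch below. The proxy address, the target query parameter, and the single rule of inserting a script tag before the closing body tag are assumptions made for this example; the fetching of the page itself is omitted.

```python
from urllib.parse import parse_qs, urlencode, urlparse

PROXY_BASE = "https://proxy.example.com/fetch"  # hypothetical proxy address


def to_proxy_url(target_url):
    """Embed the customer (target) URL inside a URL that points at the proxy."""
    return PROXY_BASE + "?" + urlencode({"target": target_url})


def extract_target(proxy_url):
    """Recover the embedded target URL on the proxy side."""
    return parse_qs(urlparse(proxy_url).query)["target"][0]


def inject_tag(html, tag):
    """Assumed rule: insert the tag just before the closing body tag
    (case-sensitive match), or append it if no such tag is present."""
    position = html.rfind("</body>")
    if position == -1:
        return html + tag
    return html[:position] + tag + html[position:]


proxied = to_proxy_url("http://customer.example.com/page?id=7")
target = extract_target(proxied)
page = inject_tag("<html><body><p>Hi</p></body></html>",
                  '<script src="/observer.js"></script>')
```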
  • With regard to the client browser, the page is first instrumented with tags. The presently preferred format of the tag is in JavaScript (JS). The customer incorporates a JS file on their Web server. Then they refer to it in HTML with a script tag. Once the JS file is loaded, the file sets up an object at the system. Then, a place is set up in the page where the UI is displayed. On the system side, the system sends back HTML to the UI. The administrator on the customer side specifies a style sheet on the tag. Even though it is the same HTML, because the style sheet is different, the user gets a different color scheme and font, for example.
  • A plug-in may be provided that serves a similar purpose as the proxy, in that it also modifies the HTML and comes back with a search result. However, unlike with the proxy, the user configures the plug-in. When a search of the URL is performed on the user's internal site, the plug-in takes the search request, performs a search for the results, sends them back to the system, which then augments the results and sends them back to the plug-in. The plug-in does a swap of the HTML that is displayed. Thus, instead of displaying the HTML of that URL, the plug-in displays a modified page from the system. The plug-in also makes observations. Because it has more access to the browser functions than JavaScript does, it has a better ability to capture a wider range of observations.
  • The variant UIs work the same way. For example, Top Ten and predictive navigation work the same way as discussed above for the UI. The only difference is in the request. For example, when the user asks the system for augmentation, the system is asked for a specific calculation.
  • The JavaScript observer sits and waits and observes the user on the page. An observation is made of a user action and the observation is sent to the system, including all the information about the observation, e.g. what page it is, if there is user information.
  • Dwell is observed when the user has spent N number of seconds or minutes on a particular page.
  • The range can be selected as a matter of choice, but is typically around 30 seconds to five minutes, depending on the complexity of the document that is being inspected. An excessive amount of time may mean that the user walked away from the computer. The system preferably does not capture an upper threshold, however, because the user may be reading the document.
  • There is a virtual bookmark/virtual print feature where the user reads the document, and finds it useful, but cannot remember everything the document has said, so he leaves the window up somewhere behind his other windows. The user then goes and does other tasks, and when he needs to refer to the document, he pops it back up; the user can tab back to the document. Thus, even though the user did not bookmark the document, because the information is probably useful only for the next few hours or day or so, it is useful for the user to keep the document up in a window. In such case, the user has not explicitly bookmarked a document, but he has left it open for a reason. If one looks at a typical computer at any given time, the things that are open on the computer are not open because the user was done with them and just did not close them. They tend to be open because they are a virtual bookmark. Thus, things that are left open for a very long time, e.g. two minutes, five minutes, ten minutes, are considered to be virtual bookmarks.
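  • A minimal sketch of how open time might be turned into observations follows. The thresholds of 30 seconds for a dwell and ten minutes for a virtual bookmark are picked from the ranges mentioned above purely for illustration, and the function classify_open_time is hypothetical. Consistent with the preference stated above, no upper threshold is applied to the dwell.

```python
def classify_open_time(seconds_open,
                       dwell_threshold_s=30.0,
                       bookmark_threshold_s=600.0):
    """Map the time a page has been open to zero or more observations."""
    observations = []
    if seconds_open >= dwell_threshold_s:
        observations.append("dwell")  # no upper bound: the user may be reading
    if seconds_open >= bookmark_threshold_s:
        observations.append("virtual_bookmark")  # left open for a long time
    return observations


quick_glance = classify_open_time(10)
reading = classify_open_time(45)
kept_open = classify_open_time(900)
```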
  • A scroll concerns a scrolling of the screen.
  • The anchor text is the hyperlinked text that got the user to the present page. It could be as simple as the user clicking on a news item that then brings up some recent news about the subject.
  • Think is a usage pattern, i.e. a combination of a dwell and a scroll, or some action, mouse movements, etc., that indicates that the user is thinking about the page. Thus, think is a period of inactivity followed by some action. Mail is when a user mails or forwards the content to another user, or virtually emails it in a similar fashion of virtual bookmark with the intent to mail.
  • Affinity Engine and Wisdom of the Community
  • FIG. 12 is a flow diagram showing document recommendations according to the invention. At the beginning of a search 110 various term vectors T1, T2 exist, as well as a peer/expert population. A term vector is available and is compared with the term vectors of every other document such that the top N matches are selected. The most popular terms for the top N documents are found and these are added into the term vector as well. At this point, activeness information may be obtained for every document and the new term vector can be compared to the term vector of every other document. The two can then be combined 118 and the top N can then be selected 119. The concepts of term vectors and document searching are discussed in greater detail below.
  • The invention uses a similar strategy to the way in which a search is performed inside of a document. The known approach represents everything in a vector space model. This approach takes a document and makes a vector, which comprises all of the document terms in term space. This is done for every document. The search is represented as another vector in term space.
  • In this regard, the invention uses a vector space model, but the way it builds the vector to represent a document has nothing to do with what is in the document. It has to do with what other people have searched on and have hit this document. For example, an individual performs a search on the term “chip design” and perhaps other words. He might have ended up finding a document. It might have been low down on the list of returned results, but he might have ended up finding it. As soon as he does, and he finds it, the invention then associates whatever he searched on with that document, and that contributes to the vector. There are other ways to fill out the term vector, e.g., through a user's navigation behaviors (described later), or through explicit input by the user. Thus, the invention builds representations of the document based on how other people are using it.
  • Instead of having single-term vectors for a document, which is what happens in the search space, the invention gives every individual user their own term vector for a document. Accordingly, every user gets to say what their opinion is on what a particular document is about. Some people may have no opinion on certain documents. However, if somebody has ever performed a search and used the document, for example, their opinion gets registered in their vector for that document. Knowing this, the invention allows various functions to be performed, such as “I want to match all the documents, but I don't want to look at everybody's opinion;” or “I want to look at just my peers' opinions or the experts' opinions.” In this case, the invention takes several of the term vectors from different people, sums them together, gets a view of what that population thinks that document is about, and uses that result to connect people to the right documents.
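  • The population view described above can be sketched as follows. This is a minimal illustration only, assuming each user's term vector for a document is stored as a term-to-count mapping; the function name, the user names, and the counts are hypothetical and do not come from the specification.

```python
from collections import Counter

def population_term_vector(user_vectors, members):
    """Sum the per-user term vectors for one document over a chosen population."""
    total = Counter()
    for user in members:
        # Users with no opinion on the document contribute nothing.
        total.update(user_vectors.get(user, {}))
    return total

# Hypothetical per-user term vectors for a single document.
doc_vectors = {
    "alice": {"chip": 2, "design": 1},
    "bob": {"chip": 1, "layout": 3},
    "carol": {"oil": 4},
}

# Restrict the view to a peer population of two users.
peers_view = population_term_vector(doc_vectors, ["alice", "bob"])
```

  • Passing a different member list (peers, experts, or all users) changes whose opinions define what the document is about.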
  • Thus, this aspect of the invention provides a search enhancement that relies upon the novel technique of a usage-based topic detection. The term vector provides a vector space model to represent usage-determined topics.
  • Activeness. In addition to the term vector, the invention also comprises a vector that looks at what documents each user uses. Every user has one of these, called the activeness vector. Every time they use a particular document, the invention notes that they have used it. Every bucket in the activeness vector is a particular document or asset, and the bucket keeps an accumulation of usage. Some buckets may have zero. Some might have a huge number. Different actions on an asset by a user, e.g., reading or printing, contribute differently to the activeness vector. An activeness vector for a population can be generated based on the activeness vectors of each of the users in that population. For example, for a particular population, e.g. seven users, the usage vectors are summed to determine how much a particular document is used in that population. The invention combines these two pieces of information, i.e. the term vector and the activeness vector, to help recommend documents. For example, there might be a document that matches perfectly in terms of topic but is not used very much. Even though both vectors concern usage-based information, one vector concerns an amount of usage and the other vector concerns the context of usage. The invention brings these two numbers together to suggest a document. The invention can also incorporate the results from an existing search engine (which amounts to topic match based on the contents of an asset) with the term and activeness vectors if we want.
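  • A minimal sketch of accumulating an activeness vector is given below. The action weights are assumptions chosen for illustration; the specification states only that different actions contribute differently, not what the weights are.

```python
from collections import defaultdict

# Hypothetical action weights (assumed values, not taken from the specification).
ACTION_WEIGHTS = {"read": 1.0, "print": 2.0}

def record_action(activeness, doc_id, action):
    """Add one user action on a document to that user's activeness vector."""
    activeness[doc_id] += ACTION_WEIGHTS.get(action, 0.0)

def population_activeness(vectors):
    """Sum the activeness vectors of every user in a population."""
    total = defaultdict(float)
    for vec in vectors:
        for doc_id, amount in vec.items():
            total[doc_id] += amount
    return total

user_a = defaultdict(float)
record_action(user_a, "d1", "read")
record_action(user_a, "d1", "print")
user_b = {"d1": 1.0, "d2": 2.0}
group = population_activeness([user_a, user_b])
```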
  • Each individual has his own library (collection of used assets) and for each document in an individual's library, the system has the user's term vector, which represents what they think the document is about. The system also has their activeness vector which indicates how much they have used that document in any context. It is now possible to bring together any given group of users and ask for their collective opinion on a document. There are also the global usage vectors, which are the sum of everybody's vectors. There are also special usage vectors, for the group of anonymous users. All unknown users contribute to the same usage vectors. When combining vectors to create a collective view, there can be a different weight for different people's vectors, but the sum of everybody is the global vector. The invention also comprises users' peers and experts.
  • Peers. The way that the invention determines peers is similar to the way that peers are determined in collaborative filter applications, such as Amazon.com, but in this case the determination is based on document usage. The invention looks at two users' document usage (activeness) vectors, e.g. one person has used this document, this document, and that document; the other person has used the same three documents, therefore they're similar. That is, there is a document usage similarity that, in this case, is established when the invention compares the two users' activeness vectors. In the preferred embodiment, two users that overlap on a significant set of used assets are considered to be peers regardless of other assets used only by one user or the other. That is, the fact that one user uses a subset of what the other uses means they share a common interest or role. In this regard, the invention can also look at the actual terms that people use: do they search on similar terms? Similar documents? Similar search terms, or a blend? Thus, the invention considers term usage. Another consideration is the user's area of expertise. Consider, for the moment, that two people have expertise vectors, and that their expertise vector is a term vector as well. It is a term vector that, instead of representing a document, represents a person and what they know. It could be from a profile or it could be automatically detected based on what they use.
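  • One way to express the overlap idea is sketched below. The overlap coefficient used here is an assumption, chosen because it gives a full score when one user's assets are a subset of the other's; the specification does not name a particular similarity formula.

```python
def overlap_similarity(a, b):
    """Overlap coefficient between the sets of assets two users have used.

    a and b are activeness vectors (asset id -> accumulated usage).
    Assets used by only one of the two users do not lower the score
    when the smaller collection is fully contained in the larger one.
    """
    used_a = {doc for doc, amount in a.items() if amount > 0}
    used_b = {doc for doc, amount in b.items() if amount > 0}
    if not used_a or not used_b:
        return 0.0
    return len(used_a & used_b) / min(len(used_a), len(used_b))
```

  • A measure that also compares the amounts of usage, such as a cosine between the two activeness vectors, would be an equally plausible reading.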
  • Expertise. What a given user knows the most about is his/her expertise. The system can automatically determine a user's expertise based on what assets they use and what topics they tend to spend time on. Expertise is validated. The invention looks at a person's collection. We ask the global population what they think those documents are about, not what the user said they were about when he searched for them, which is the user's term usage, but what the global population says these documents are about. When looking at the combination of global term vectors associated with the assets in a given user's collection, a picture emerges as to what that user's expertise is (which can also be represented by a term vector). A user cannot self-claim what his expertise is. The population of other users ultimately determines the expertise vector. To identify an expert, the system looks at the expertise vector. For example, if a user is searching on chip design, the system looks at every user's expertise vector, and finds out who is most expert on chip design. The system selects, e.g. the top 30 experts. The system then finds those 30 experts' term vectors and activeness vectors for every document, sums them together, and then performs a comparison.
  • An alternative and complementary approach to the determination of user expertise first identifies those documents in the collection that have high impact factor for the given topic of interest. Asset Impact Factor is a measure of how useful an asset is to a particular population, such as the global population, for the given topic. Once the impact factor is computed for assets in the entire collection, every user's library can be assessed in terms of impact factor of included assets. Using this method, users with relatively many high impact assets in their collection of used assets are considered to be experts on the given topic. Such users may also be assigned an Expert Impact Factor, reflecting the impact of that user's asset collection on a given population for a given topic.
  • If the user is on a particular document, using a term vector for the document based on either the user's global population, peer population, or expert population, the user can ask what documents are similar to this e.g. ask for “More like this.” The system can compare this document's vector to every other document's term vector. In this case, the invention is looking at the term vector, which is determined by a group as to the relevance of terms to a particular space. Therefore, there is a second measure on top of the term vector, which is the measure of relevance. It is therefore possible to say that this document is relevant in this space, not just that it has these words in common.
  • When a search is performed using the term vector and the activeness vector, the user gets the most useful documents returned to him. The system can also say that now that these documents were found, the vectors are known, and it is possible to go off and find additional documents that might be of interest and suggest those. This is a way to expand the list of search results. The context in which this happens most often is when the user is navigating around, and has found a certain page. The user opens the documents that are closest to this one. The user clicks on the document that he likes, and then says that this is close, so show me more like this. What gets returned may, in fact, have been in that original search list, but there might be some new things as well.
  • The system performs navigation tracking based on usage in a user-specific way. Thus, the invention can also track where people go from one document to another. A user may end up going to a particular document looking for information, but may then have to click through several documents before finding the useful documents. If one or more users follow this same pattern, the system can recommend that a user landing on the initial document go immediately to the useful document. That is, the invention makes an association between documents that are found to be useful, even where other documents may have been encountered by users along a navigation path. Thus, the invention recommends going straight to a useful document, without having a user navigate through documents that may normally intervene. This is based on usage: for each user there is a matrix representing connections between visited documents and the most useful documents arrived at from that location. As in other aspects of the system, we can combine user opinions to get a collective opinion, e.g., of the user's peers, experts, or the global population, in this case regarding where are the most useful places (assets) to go to from the current location. For every user, for example, we can take each of the peers' matrices and add them all together, and then come up with a recommendation. In this way, the invention keeps track of the navigation patterns of the user's peers. In addition to providing recommendations, identified navigation patterns can also be used to provide visualizations of user activity within an asset collection on a global or population-specific basis. Such visualizations or usage maps can, for example, be very informative to site designers in understanding the effectiveness of their site and understanding user interests.
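  • The combination of peers' navigation matrices can be sketched as below. The nested-dictionary layout (visited document -> useful destination -> count) and the peer weights are assumptions made for illustration.

```python
from collections import Counter

def recommend_next(matrices, weights, current_doc, top_n=3):
    """Add the peers' navigation matrices together and rank destinations.

    matrices: user -> {visited doc -> {useful destination -> count}}
    weights:  user -> peer weight (hypothetical values)
    """
    combined = Counter()
    for user, matrix in matrices.items():
        w = weights.get(user, 0.0)
        for target, count in matrix.get(current_doc, {}).items():
            combined[target] += w * count
    return [doc for doc, score in combined.most_common(top_n) if score > 0]

peer_matrices = {
    "p1": {"home": {"faq": 2, "spec": 1}},
    "p2": {"home": {"spec": 4}},
}
peer_weights = {"p1": 1.0, "p2": 0.5}
suggestions = recommend_next(peer_matrices, peer_weights, "home")
```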
  • One concern with a system such as that disclosed herein is that as the system starts recommending documents, it tends to reinforce their usefulness. The invention counters this using a validation technique. For example, the system may be reinforcing a fad, where people go to a particular document all at once but are not going to return, e.g. a fad movie. Everybody goes because they hear that it is good, but in fact, it is terrible and they hate it. A lot of people go to the document, but no people come back to it. The invention adjusts the usefulness of the document by looking at the percentage of people that come back, and determines if there are enough people coming back to validate it as a legitimately useful document. As for making a determination, in one embodiment if something is new and it does get a lot of attention in the beginning, the system encourages that attention, in a sense, because it may be something that is new and important. But, if over time, people do not start coming back, it is going to decay away quickly. Accordingly, the invention considers both the notion of newness and the notion of validation.
  • Besides connecting a user from document to document, the invention also uses navigation to find information that identifies what a document is about. When somebody clicks on a link and goes straight to a document and uses it, that tells the system that the user clicked on this link thinking he was getting something and then he used it. Whatever the link text is, it is a decent reflection about what this document is. The system then uses that link text and now that link text also contributes to the term vector. It is as if the link is a substitute for a query and, in fact, if the user has clicked on, e.g. ten links going all the way through this document, the system uses the link text, with some weighting because not every link has the same weight. It depends on how close the link is to the document. If the user clicked on one word in a document and clicked on another link, then another link in the next document, the most recent link gets the most weight, and earlier links get less weight. Thus, in this embodiment weighting is based on a form of proximity.
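  • The proximity weighting of link text might look like the sketch below. The geometric decay factor of 0.5 per step is an assumption; the specification says only that the most recent link gets the most weight and earlier links get less.

```python
def link_text_contributions(link_trail, decay=0.5):
    """Weight the terms of each clicked link by its proximity to the used document.

    link_trail lists link texts in click order; the last entry is the link
    that led directly to the document. decay is a hypothetical parameter.
    """
    contributions = {}
    weight = 1.0
    for text in reversed(link_trail):
        for term in text.lower().split():
            contributions[term] = contributions.get(term, 0.0) + weight
        weight *= decay
    return contributions

trail = ["Products", "Chip Design", "Design Guide"]
term_weights = link_text_contributions(trail)
```

  • The resulting weights would then be added into the user's term vector for the document, in the same way as query terms.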
  • Another aspect of navigation addressed by the invention concerns where a user starts with a particular document, e.g. document 1, and the system makes various recommendations based on this starting point. In the event the user was first on a different document, e.g. document 13, than on document 1, the system may recommend a different set of documents. In this case, the invention employs an additive model which looks at the documents that the system recommends from, e.g. document 13 and looks at the documents it recommends from document 1, and the system weights them together. In this way the system may use the user's navigation trail (encompassing one or more navigation points) to suggest the best location to go next. The person skilled in the art will appreciate that there are many options available for processing the system's vector information.
  • One aspect of the invention concerns determining what is a successfully used document. One approach looks at how much time has somebody spent on the document. There are two concepts related to this: One is the idea of document processing time. What we would like to be able to infer is how much time somebody is actually spending looking at and reading this document. This is in contrast to not reading the document at all, where it just might be on the screen, but the user is not looking at it; or the user is seeking for something, searching, but not finding it. Document processing time is the simplest measure of successful use because the system only need look at how much time somebody's on a document. Another consideration is that seeking time is not the same thing as processing time. In this case, the user is scrolling around and not finding what they want, i.e. they are not processing it. The system can take the time spent on the document, subtracting out the time that the user is scrolling on the document. The system can also use scrolling as an indication that the user is actually on the document. The system can apply a combination of scrolling and pausing and use that to get a sense of how long the user is actually processing the document. So, if somebody scrolls, then they stop, and they sit there for 30 seconds, then they start scrolling again, the system can make a guess that the user was reading that document for 30 seconds. Any mistake in the guess is an aggregate because the system rarely looks at just one person's opinion, but is summing this information over a group of users.
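  • The scroll-and-pause heuristic can be sketched as follows. The pause thresholds are assumed values for illustration; the specification describes the idea (for example, a 30 second pause between scrolls counted as 30 seconds of reading) but gives no thresholds.

```python
def estimate_processing_time(event_times, min_pause=5.0, max_pause=300.0):
    """Estimate how long a user actually processed a document.

    event_times: list of sorted timestamps in seconds for the open, scroll,
    and close events on one document. min_pause and max_pause are
    hypothetical thresholds.
    """
    total = 0.0
    for earlier, later in zip(event_times, event_times[1:]):
        gap = later - earlier
        if gap >= min_pause:
            # A long gap between events is treated as reading time; it is
            # capped in case the user left the document on the screen.
            total += min(gap, max_pause)
        # Short gaps are treated as continuous scrolling (seeking) and skipped.
    return total
```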
  • The term vector is a big vector of every possible term. In the presently preferred embodiment, the system increments a term's vector entry each time the term is associated with a document through user behavior. Terms that are not associated with a document get a zero. On top of this, terms that are associated through usage with many documents get less of a weight in the term vector because they are very common. Thus, the system looks at all the terms that are known and looks at how many documents those terms are associated with. Terms that are associated with many documents have a count for the number of documents that the term is associated with. The system applies a formula to lower the term's rating based on its association with many documents. If there is a word that is in every single document in a collection, then its rating is equivalently a zero. Certain words, such as “the” and “and” are standard stop words and are removed from a search before they even get to the system analytics.
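  • The down-weighting of common terms can be illustrated with a logarithmic inverse document frequency. This formula is an assumption: the specification does not state which formula is applied, only that a term associated with every document ends up with a rating equivalent to zero, which this choice satisfies.

```python
import math

def weight_term_vector(raw_counts, doc_frequency, num_docs):
    """Scale usage counts so that widely associated terms count for less.

    raw_counts:    term -> usage count for one document
    doc_frequency: term -> number of documents the term is associated with
    num_docs:      total number of documents in the collection
    """
    weighted = {}
    for term, count in raw_counts.items():
        df = doc_frequency.get(term, 0)
        if df == 0:
            continue  # term not known to the collection statistics
        # log(num_docs / df) is zero when the term is tied to every document.
        weighted[term] = count * math.log(num_docs / df)
    return weighted

weights = weight_term_vector({"oil": 3, "report": 2}, {"oil": 1, "report": 4}, 4)
```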
  • Thus, the system commences with the creation of the initial term vector based on a data structure. For every user in the system, there is a matrix of documents and associated terms referred to as a term doc matrix. This is the collection of a user's term vectors for each document known to the system. In other words, every user has a term doc matrix that represents for each document, what that user thinks the document is about. For example, one person thinks a document is about oil and about refineries, but not about something else. The system services a particular population that is already selected, which can comprise for example peers or experts, e.g. the person's top 30 peers. To perform a comparison, the system knows of these 30 users, and each of these users has a weight based on how much of a peer they are to a current user. The peer with the greatest weight is the top peer, the peer with the next greatest weight is the next best peer, and so on. The invention looks at all of the term doc matrices of these peers and adds them together given the weightings based on their peer scores, and produces therefrom a single term doc matrix which is this population's opinion on every document in the system. The system then takes this matrix and calculates a cosine between a term vector representing a query or current context and each of the rows in the matrix, which represent term vectors for each document. The result of the cosine calculation represents how closely a document matches the context according to the user's peers.
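  • The weighted combination and cosine comparison described above can be sketched as follows, assuming sparse term doc matrices stored as nested dictionaries (document -> term -> value). The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict

def combine_matrices(peer_matrices, peer_weights):
    """Add the peers' term doc matrices together, scaled by peer weight."""
    combined = defaultdict(lambda: defaultdict(float))
    for peer, matrix in peer_matrices.items():
        w = peer_weights.get(peer, 0.0)
        for doc, terms in matrix.items():
            for term, value in terms.items():
                combined[doc][term] += w * value
    return combined

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_documents(query, peer_matrices, peer_weights):
    """Score every document against the query according to the population."""
    combined = combine_matrices(peer_matrices, peer_weights)
    return {doc: cosine(query, terms) for doc, terms in combined.items()}

peers = {
    "p1": {"d1": {"oil": 1.0}, "d2": {"chip": 1.0}},
    "p2": {"d1": {"refinery": 1.0}},
}
scores = score_documents({"oil": 1.0}, peers, {"p1": 1.0, "p2": 1.0})
```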
  • Once the system has determined all of the term vectors in the population and assigned numbers to each document, the system then selects the top documents. One way to do this is to select the top ten documents and then sum them together to get a single vector which says in the aggregate that these documents are about a certain topic. Then, the system takes out those search terms that are already used and looks at where the other peaks are. Those are additional terms that the system either wants to suggest or automatically enter to the core. Now there is a new term vector, and the system goes through the same process. The system then can match this new term vector with every other document, get a new set of scores, and select the top ten of those documents. The system has a matrix that has everybody's opinion on what these documents are about. The system can now compare this new term vector and get a new set of scores for all of them. The system also gets a single vector for each document which indicates how related this document is to the topic, together with another score, which is how useful this document is.
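  • The term expansion step can be sketched as below: sum the term vectors of the top documents, remove the terms already in the search, and take the remaining peaks. The number of new terms to add is a hypothetical parameter.

```python
from collections import Counter

def expand_query(query_terms, top_doc_vectors, num_new_terms=2):
    """Add the strongest terms of the top documents that the query lacks."""
    summed = Counter()
    for vec in top_doc_vectors:
        summed.update(vec)
    # Take out the search terms that are already used.
    for term in query_terms:
        summed.pop(term, None)
    new_terms = [term for term, _ in summed.most_common(num_new_terms)]
    return list(query_terms) + new_terms

top_docs = [
    {"oil": 5, "refinery": 3, "price": 1},
    {"oil": 2, "refinery": 2, "pipeline": 4},
]
expanded = expand_query(["oil"], top_docs)
```

  • The expanded term list can then be matched against every document again to obtain the new set of scores.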
  • Consider an extremely popular document, a very unpopular document, a somewhat popular document, and a document that has never been used. Every user has a document vector or their activeness vector, which identifies what documents have been used. Each user also has an associated weight based on the peer population. Given these two numbers for every document, i.e. how well it matches and how popular it is, the system combines the two to produce a score. One way to combine them is to calculate a weighted sum. Another way is to take the numbers as is, but anything that is below a threshold is removed. In the first approach, when the system adds them together the ordering of the documents is changed; in the second approach, the ordering is preserved as the ordering of relevance, but the system is removing things that do not meet a certain threshold of accuracy. In another approach, instead of a straight combination, the system applies a transform which tends to favor more balanced numbers than it favors extremes. A straight linear combination favors extremes but, for example, a square root combination can produce a more balanced result. Once the system has combined the vectors, there is a ranking, and the system can recommend, e.g. the top 10 documents. Once the documents are returned to the user, the next part is watching what the user does with the documents. For example, the system looks at such actions as whether the user is navigating in a certain way, how long the user stays on a document, and where the links are going. The system collects that information to learn about the usefulness of a document. The user starts by doing a search on a topic, e.g. oil. The system responds by recommending certain documents. The user clicks on one and prints it. In the user's term doc matrix, the system adds a number for that document connected with those words. If somebody else does a search and the term doc is involved, the matrix indicates that the document is relevant for a certain purpose according to a certain person. Thus, if the person is a close peer and the document is a relevant document, the term match is good, and the document is recommended. If the user is not a close peer or the document does not have a good term match, or the document is not very active, then it is not recommended.
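  • The three ways of combining the match score and the popularity score can be sketched as below. This assumes both scores are scaled to the range 0 to 1; the mixing weight alpha and the choice to apply the threshold to the activeness score are assumptions for illustration.

```python
import math

def weighted_sum(relevance, activeness, alpha=0.5):
    """Straight linear combination of the two scores."""
    return alpha * relevance + (1 - alpha) * activeness

def sqrt_combination(relevance, activeness, alpha=0.5):
    """Square root transform that favors balanced scores over extremes."""
    return alpha * math.sqrt(relevance) + (1 - alpha) * math.sqrt(activeness)

def threshold_filter(docs, min_activeness):
    """Keep the relevance ordering but drop documents below the threshold.

    docs: list of (doc_id, relevance, activeness) tuples.
    """
    kept = [d for d in docs if d[2] >= min_activeness]
    return [d[0] for d in sorted(kept, key=lambda d: d[1], reverse=True)]

candidates = [("a", 0.9, 0.1), ("b", 0.5, 0.8), ("c", 0.7, 0.6)]
filtered = threshold_filter(candidates, 0.5)
```

  • With equal weights, the linear sum cannot tell a document scoring (1.0, 0.0) from one scoring (0.5, 0.5), whereas the square root combination ranks the balanced document higher.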
  • As discussed above, there are two vectors, which express usage-based relevance and activeness. There is also a third vector that could be produced when the two vectors are combined with search engine results. Thus, these three things are combined into this vector. There are many ways they can be combined to determine how much to weight the search results from the search engine versus how much to weight the other results. One approach is to obtain a list of IR results and, from that list, remove everything whose accurateness is below a certain threshold. This produces an augmented search, where the search is augmented by removing the less accurate results.
  • Another approach involves separate vectors related to peers, experts, and the global population, which are then combined with different weightings, e.g. the experts get the greatest weighting, the weighting could be profile-based, or it could be based on user responses to a series of questions.
  • As discussed above, every user has a term doc matrix that captures what the user thinks every document is about, and they have an activeness vector that expresses how much the user has used these documents. This activeness vector is not only used through search. It could be used through navigation and is built up based on search terms or link terms. The determination of peers and experts proceeds as follows:
  • For a given user, build a picture of what that user's expertise is. Validate that expertise by the global population or by the appropriate peer group. In the first case, the global population has a term doc matrix which represents what the global population's opinion is about every document. This is essentially a sum of every single user's equally weighted opinions on the document. This is a global opinion. For each user, look at what documents that user has used, and that they have used them a certain amount. This step involves determining the expertise of this user. For example, this user has used document one at a weight of, e.g. four, so when the system goes into document one it determines what the global population thinks this document is about. Take that, multiply it by four, and add it to the user's expertise vector.
  • If the global population thinks that is an important document and if the user has used it a lot, the user has more expertise. The system does that for every document in this user's collection. The things the user has used the most get the most weight in terms of their expertise. Each time the system adds what the global population thinks about the documents that the user has used. Thus, expertise is a measure of what expertise a user's collection represents. The system does not know what a user's actual expertise is. It could be somebody who has done an excellent job of collecting all the right documents on this topic, but if he's done that, in a sense, he serves the purpose as an expert. That is, if the user has all the good documents on that topic, therefore that collection is an expert collection. The amount of weighting to give the popularity of documents is an issue. An amount of weight is given to how used a document was by this user and an amount of weight is given to how popular the document was in the population. The system combines these numbers and recalculates expertise, every night for example. Thus, the system recalculates everybody's area of expertise because it might change on some basis, e.g. daily or monthly. The system goes through and calculates everybody's area of expertise, and then if it is desired to figure out who the experts are, given a particular query, the system takes that query vector and compares this query vector to every user's expertise vector. Then, the system can produce the top N experts, and that is the expert population. Another case occurs where the system does not have a query and has a document but the user wants to know who the experts are. In this case, the system can use the document itself to determine who the experts are. Thus, the document itself has a vector, and the system can compare the vector of this document to the expertise vector of everybody and, given the topic of this document, determine who the experts are on this topic represented by this document.
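  • The expertise calculation described above (the global opinion on each used document, multiplied by how much the user used it, then summed) can be sketched as follows. The dictionary layout and sample numbers are illustrative assumptions.

```python
def expertise_vector(user_activeness, global_term_doc):
    """Build a user's expertise vector from the global view of their collection.

    user_activeness: document -> how much this user has used it
    global_term_doc: document -> {term -> global population's weight}
    """
    expertise = {}
    for doc, usage in user_activeness.items():
        for term, value in global_term_doc.get(doc, {}).items():
            # Global opinion on the document, multiplied by the user's usage.
            expertise[term] = expertise.get(term, 0.0) + usage * value
    return expertise

global_view = {"d1": {"chip": 2.0, "design": 1.0}, "d2": {"chip": 1.0}}
expert = expertise_vector({"d1": 4, "d2": 1}, global_view)
```

  • Ranking users by the similarity between a query vector (or a document's term vector) and each such expertise vector then yields the top N experts.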
  • Peers. Every user has a term doc matrix and an activeness vector. There are three things that the system can look at and combine to determine peer-hood. One is to compare what the peer value is for, e.g. two users. The system makes this determination for everybody, but for now focus on two users. Look at one user's activeness vector and another user's activeness vector, and look at how similar they are. Two people that use similar documents a similar amount are similar, looking in the same places and at relatively proportional amounts. In this case, there is a similarity metric between one user's activeness vector and another user's activeness vector. Another way of determining peers is to look at what topics they are interested in. To do that, sum their term doc matrix, which gives a sense of what topics they have searched on and used in the past. The sum represents what this person is interested in, and is referred to as an interest vector. The system compares interest vectors. Thirdly, the system can compare the computed expertise vectors of each user to determine peers. Alternatively, the system could employ combinations of these approaches. Because the user has a particular subject in mind and the user has a certain number, the peers are weighted according to how closely they match that particular person. Some peers may have a closer number to the user's number. Some are going to have one that is smaller or larger, depending on what notation the system is using. In the end, there is a number that indicates how much like this person the user is. The user might want a peer group of 30, then number 30 in the group might have a smaller weight, and number 1 in the group might have a greater weight, and everybody in between has a weight in between. The system could also have a threshold that does not create any peers less than the threshold.
  • FIG. 12 is a flow diagram showing an augmented search according to the invention. In an augmented search, a search request is made by a client of customer libraries. The search is sent to the search server and the extension makes a request for augmented information, for example from Google. The augmented results are returned to the server and the results received are added to the server information which are then sent back to the search server in search server format. The customer then receives the rendered HTML of the search.
  • Time-Based Usefulness
  • As mentioned elsewhere in this application, every aspect of the system adapts and evolves over time as new observations are made, new log files are processed, and new patterns emerge. One aspect of this adaptation involves giving precedence to usage patterns that occur more recently over those that occur in the past. In the preferred implementation of the system, recency bias is accomplished through decaying past usage patterns based on a time decay function. Activeness vectors, for example, might decay at a rate of 0.01% per day, such that activity that occurred in the past has less of an influence on the activeness vector than more recent usage. Similarly, term vectors can be set to decay at a specified rate, such that term to asset associations are biased toward more recent usage patterns. In this way, the usefulness of assets can be computed in a time-sensitive manner.
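  • A minimal sketch of the time decay is given below, using the 0.01% per day rate mentioned above as the default. Compounding the decay daily is an assumption; the specification refers only to a time decay function.

```python
def decay_activeness(activeness, days_elapsed, daily_rate=0.0001):
    """Shrink past usage so that older activity has less influence.

    daily_rate of 0.0001 corresponds to 0.01% per day; compounding the
    decay once per day is an assumed form of the decay function.
    """
    factor = (1.0 - daily_rate) ** days_elapsed
    return {doc: amount * factor for doc, amount in activeness.items()}

aged = decay_activeness({"d1": 10.0}, 365)
```

  • New usage would be added to the decayed vector at full weight, which gives the recency bias.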
  • Assets that were useful in the past are not necessarily useful in the present. All information stored within the system, including peer scores and expertise scores, can be set to time decay in a similar fashion. Regarding activeness vectors, assets that are very new or newly rediscovered may need a boost above and beyond recency bias to enable their discovery by the population prior to strong usage patterns having had an opportunity to emerge. Thus, very new assets, defined as those assets whose very recent activity makes up a large proportion of their total activity over all time, may be given an additional newness bias. It is also possible for an administrator to assign a newness bias explicitly to certain assets or a collection of assets. This newness bias makes very new assets appear more active than they are in reality for a short period of time. It is also possible to identify periodic usage of assets and give activeness biases to assets as they reemerge at specific times of year, for example.
  • Usage-Based Evaluation of Terms and Phrases
  • This aspect of the invention relates to relationships amongst terms and amongst terms and phrases that the system infers based on captured usage data. First, a term affinity (similarity) matrix can be constructed that relates terms and phrases to one another. Terms and phrases with high affinities for one another are considered to be aspects of a single topic and may even be synonyms for one another. The term affinity matrix can be constructed based on the frequency of co-occurrence of terms in users' queries or used links, or by the frequency of co-occurrence of terms in assets' term vectors, for example. This matrix, in combination with linguistic characteristics of the terms and phrases themselves, can be used to identify synonyms, acronyms, and atomic phrases automatically. Atomic phrases are ordered sets of two or more words whose frequent occurrence together indicate that they should be considered a single multi-word phrase rather than multiple independent words. The term affinity matrix in combination with navigational usage patterns and assets' term vectors can even be used to detect terms and phrases that are sub-topics of other terms and phrases. Because all such identified relationships between terms/phrases and automatic detection of synonyms, acronyms, and atomic phrases are based on usage by a community of users, identified relationships are inherently tailored to a specific community.
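  • One of the construction options named above, co-occurrence of terms in users' queries, can be sketched as follows. Storing the affinity matrix as counts over unordered term pairs is an assumption made for illustration, and the sample queries are hypothetical.

```python
from collections import Counter
from itertools import combinations

def term_affinity(queries):
    """Count how often two terms appear together in the same query.

    Returns a Counter keyed by alphabetically ordered (term, term) pairs,
    a sparse form of a symmetric affinity matrix.
    """
    pair_counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

affinity = term_affinity(["chip design", "chip design rules", "oil refinery"])
```

  • Pairs with high counts are candidates for the same topic; detecting synonyms, acronyms, or atomic phrases would need the further linguistic and ordering information that the text mentions.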
  • Target Applications for the Invention
  • In addition to the applications discussed above, the invention is also useful in marketing lead generation for business Web sites; sales and channel partner extranets; customer support sites; vertical healthcare applications such as physician portals and patient research sites; vertical government applications such as citizen portals; and financial services and insurance vertical applications such as agent and advisor portals.
  • Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.

Claims (25)

1. A computer implemented method for determining knowledge and/or expertise, comprising the steps of:
identifying a target community (or subgroup) of users;
observing usage patterns of Web documents and other online assets by said target community (or subgroup) of users; and
based upon said observed usage patterns, detecting which of said Web documents and other online assets are most used and useful to said target community (or subgroup) of users.
2. The method of claim 1, further comprising the step of:
based upon said observed usage patterns, automatically determining the topic of said Web documents and other online assets.
3. The method of claim 1, further comprising the steps of:
analyzing a broad range of behaviors exhibited by users while interacting with a Web page or piece of content, said behaviors comprising any of mouse movement, dwell time, scrolling, link usage, searching, repeat visits, and combinations of said actions, as well as absence of said actions; and
determining through said analyzing usefulness of content, contexts associated with said usefulness, and similarities between users.
4. The method of claim 1, further comprising the step of:
transparently executing said method within an existing enterprise application and repository environment based upon a conventional engine provided by a third party;
wherein no user training or adoption of new interfaces is required.
5. The method of claim 1, further comprising the step of:
developing implicit observations that predict user intentions with strong confidence, said observations comprising any of think time, virtual bookmarks, virtual print, and virtual email;
wherein in all cases, users have not necessarily performed a bookmark, print, or email against Web documents.
6. The method of claim 1, further comprising the step of:
providing contextual information derived from said observed usage patterns that a user is not directly asking for via a search query or keyword.
7. The method of claim 1, further comprising the step of:
implementing said method as a wrapper for an existing search mechanism;
wherein when a user issues a search query, said query is handled initially by a dedicated system;
wherein said dedicated system, in turn, forwards said query to said existing search mechanism;
wherein said dedicated system optionally performs one or more searches or related operations against its own internal indexes and databases; and
wherein once results from the searches have been obtained, they are merged together into a single set of results.
8. The method of claim 1, further comprising the steps of:
inferring that results are irrelevant to the user if a user does not perform any action at all against results from a query; and
retaining said inference and using it to influence results of future queries by said user.
9. The method of claim 1, further comprising the step of:
detecting experts by examining a community and individuals who have the ability to discover and collect the most useful Web documents having the most impact.
10. The method of claim 1, further comprising the step of:
providing an inline user interface (UI) rendered using tags comprising a result set derived from a selected view on said observed usage patterns and distilled wisdom from said patterns, including any of a “most popular,” “next step,” “similar documents,” and “preferred” tag.
11. The method of claim 1, further comprising the step of:
providing an ordered sequence of search processors in which a first search processor comprises an independent search processor and the second and subsequent search processors act as filters for search processors preceding them.
12. The method of claim 1, further comprising the step of:
providing an explicit bias search processor for recognizing certain queries or query keywords and injecting a fixed set of documents into results for those queries, each with a fixed score.
13. The method of claim 1, further comprising the step of:
providing a popularity search processor comprising an ancillary filter for detecting popular queries and increasing a ranking of documents in results that have historically been selected and used by previous users making the same query.
14. The method of claim 1, further comprising the step of:
dynamically producing a report comparing a ratio of good-to-poor search results for queries that were enhanced using said method to a same ratio for queries that were not enhanced by using said method.
15. The method of claim 1, wherein said method is implemented in an independent and content agnostic system that looks at a location of a Web document and how users interact with that Web document, rather than content of the Web document itself.
16. The method of claim 1, further comprising the steps of:
creating a federation across multiple applications, Websites, and repositories based upon a user's pattern of usage of information from and across said multiple applications, Websites, and repositories; and
automatically performing a federated search across said multiple applications, Websites, and repositories, wherein when a query is searched again, information from said multiple applications, Websites, and repositories is recommended.
17. The method of claim 1, further comprising the step of:
performing an augmented search by blending traditional full-text search with preference and activeness information from global, peer, and expert populations.
18. The method of claim 1, further comprising the step of:
returning the n most popular links that have to do with a query term to a user in response to a user query;
wherein if there is no term, then the top n most popular links are returned.
19. The method of claim 18, further comprising the step of:
limiting the set of most popular links and associated assets to those meeting constraints of one or more configured filters.
20. The method of claim 1, further comprising the steps of:
preloading content; and
showing a user a preview of what content is associated with a link.
21. The method of claim 20, further comprising the steps of:
temporarily replacing current content being displayed with previewed content for as long as the preview is active; and
seamlessly returning to the original content view when the preview is deactivated.
22. The method of claim 1, further comprising the step of:
identifying a spectrum of values ranging from short-lived information, to mid-range knowledge, to long-lived wisdom.
23. The method of claim 1, further comprising the step of:
observing usage patterns in a content-agnostic way;
wherein usage patterns can be detected for any content type, including non-HTML content.
24. The method of claim 1, further comprising the steps of:
performing a content gap analysis of use of content, data, and applications by ascertaining what users are actually looking for and what is missing when the user exhibits search or navigation behavior; and
providing a gap analysis report for detected gaps in said content.
25. The method of claim 1, further comprising the steps of:
performing an analysis of use of content over time including what pieces of content are most useful to a community of users at a given time and how that usefulness trends over time; and
providing a content report for visualizing said usage patterns.
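The ordered sequence of search processors recited in claims 11 to 13 can be sketched in Python as follows. This is only an illustration under stated assumptions: the processor names, the in-memory dictionaries standing in for an index, for pinned documents, and for past selections, the example URLs, and the 0.25 weighting per past selection are all hypothetical and are not taken from the specification.

```python
from typing import Callable, Dict, List

Results = Dict[str, float]                      # document location -> score
Processor = Callable[[str, Results], Results]   # (query, previous results) -> results


def full_text_search(query: str, _previous: Results) -> Results:
    """First processor: independent search (stand-in for an existing engine)."""
    index = {"vpn setup": {"/docs/vpn": 0.6, "/docs/wifi": 0.4}}
    return dict(index.get(query, {}))


def explicit_bias(query: str, results: Results) -> Results:
    """Filter: inject a fixed set of documents, each with a fixed score."""
    pinned = {"vpn setup": {"/docs/vpn-quickstart": 1.0}}
    merged = dict(results)
    merged.update(pinned.get(query, {}))
    return merged


def popularity(query: str, results: Results) -> Results:
    """Filter: raise documents that previous users selected for the same query."""
    past_selections = {"vpn setup": {"/docs/wifi": 3}}
    clicks = past_selections.get(query, {})
    return {doc: score * (1.0 + 0.25 * clicks.get(doc, 0))
            for doc, score in results.items()}


def run_pipeline(query: str, processors: List[Processor]) -> List[str]:
    """Run processors in order; each one filters the output of those before it."""
    results: Results = {}
    for processor in processors:
        results = processor(query, results)
    return sorted(results, key=results.get, reverse=True)


ranking = run_pipeline("vpn setup", [full_text_search, explicit_bias, popularity])
```

In this sketch the pinned document ranks first, and the historically selected document moves ahead of a result that scored higher on full-text matching alone.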
US11/874,137 2004-12-29 2007-10-17 Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge Abandoned US20080104004A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/874,137 US20080104004A1 (en) 2004-12-29 2007-10-17 Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US64087204P 2004-12-29 2004-12-29
US11/319,928 US7698270B2 (en) 2004-12-29 2005-12-27 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US11/874,137 US20080104004A1 (en) 2004-12-29 2007-10-17 Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/319,928 Continuation US7698270B2 (en) 2004-12-29 2005-12-27 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge

Publications (1)

Publication Number Publication Date
US20080104004A1 (en) 2008-05-01

Family

ID=36615505

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/319,928 Expired - Fee Related US7698270B2 (en) 2004-12-29 2005-12-27 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US11/351,143 Expired - Fee Related US7702690B2 (en) 2004-12-29 2006-02-08 Method and apparatus for suggesting/disambiguation query terms based upon usage patterns observed
US11/874,157 Expired - Fee Related US8601023B2 (en) 2004-12-29 2007-10-17 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US11/874,137 Abandoned US20080104004A1 (en) 2004-12-29 2007-10-17 Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge

Family Applications Before (3)

Application Number Title Priority Date Filing Date
US11/319,928 Expired - Fee Related US7698270B2 (en) 2004-12-29 2005-12-27 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US11/351,143 Expired - Fee Related US7702690B2 (en) 2004-12-29 2006-02-08 Method and apparatus for suggesting/disambiguation query terms based upon usage patterns observed
US11/874,157 Expired - Fee Related US8601023B2 (en) 2004-12-29 2007-10-17 Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge

Country Status (4)

Country Link
US (4) US7698270B2 (en)
EP (2) EP1839210A4 (en)
CN (1) CN101137980B (en)
WO (1) WO2006071931A2 (en)

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060200556A1 (en) * 2004-12-29 2006-09-07 Scott Brave Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20070150470A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining peer groups based upon observed usage patterns
US20080306935A1 (en) * 2007-06-11 2008-12-11 Microsoft Corporation Using joint communication and search data
US20090037355A1 (en) * 2004-12-29 2009-02-05 Scott Brave Method and Apparatus for Context-Based Content Recommendation
US20090150390A1 (en) * 2007-12-11 2009-06-11 Atsuhisa Morimoto Data retrieving apparatus, data retrieving method and recording medium
US20090150380A1 (en) * 2007-12-06 2009-06-11 Industrial Technology Research Institute System and method for processing social relation oriented service
US20090248661A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Identifying relevant information sources from user activity
WO2009145914A1 (en) * 2008-05-31 2009-12-03 Searchme, Inc. Systems and methods for building, displaying, and sharing albums having links to documents
US20090313325A1 (en) * 2008-06-17 2009-12-17 Mobile Tribe Llc Distributed Technique for Cascaded Data Aggregation in Parallel Fashion
US20090319397A1 (en) * 2008-06-19 2009-12-24 D-Link Systems, Inc. Virtual experience
US20100145937A1 (en) * 2008-12-10 2010-06-10 Gartner, Inc. Interactive peer directory
US20100169148A1 (en) * 2008-12-31 2010-07-01 International Business Machines Corporation Interaction solutions for customer support
US20100228743A1 (en) * 2009-03-03 2010-09-09 Microsoft Corporation Domain-based ranking in document search
US20100228777A1 (en) * 2009-02-20 2010-09-09 Microsoft Corporation Identifying a Discussion Topic Based on User Interest Information
US20100262610A1 (en) * 2009-04-09 2010-10-14 International Business Machines Corporation Identifying Subject Matter Experts
US20100262612A1 (en) * 2009-04-09 2010-10-14 Microsoft Corporation Re-ranking top search results
US20100274790A1 (en) * 2009-04-22 2010-10-28 Palo Alto Research Center Incorporated System And Method For Implicit Tagging Of Documents Using Search Query Data
US20100312771A1 (en) * 2005-04-25 2010-12-09 Microsoft Corporation Associating Information With An Electronic Document
US20110015664A1 (en) * 2009-07-17 2011-01-20 Boston Scientific Scimed, Inc. Nucleation of Drug Delivery Balloons to Provide Improved Crystal Size and Density
US20110160645A1 (en) * 2009-12-31 2011-06-30 Boston Scientific Scimed, Inc. Cryo Activated Drug Delivery and Cutting Balloons
US20110179002A1 (en) * 2010-01-19 2011-07-21 Dell Products L.P. System and Method for a Vector-Space Search Engine
US20110196340A1 (en) * 1997-08-13 2011-08-11 Boston Scientific Scimed, Inc. Loading and release of water-insoluble drugs
US20110213761A1 (en) * 2010-03-01 2011-09-01 Microsoft Corporation Searchable web site discovery and recommendation
US20110225192A1 (en) * 2010-03-11 2011-09-15 Imig Scott K Auto-detection of historical search context
US20110307541A1 (en) * 2010-06-10 2011-12-15 Microsoft Corporation Server load balancing and draining in enhanced communication systems
US20120317075A1 (en) * 2011-06-13 2012-12-13 Suresh Pasumarthi Synchronizing primary and secondary repositories
US20130024939A1 (en) * 2011-07-19 2013-01-24 Gerrity Daniel A Conditional security response using taint vector monitoring
US20130132433A1 (en) * 2011-11-22 2013-05-23 Yahoo! Inc. Method and system for categorizing web-search queries in semantically coherent topics
US20130185294A1 (en) * 2011-03-03 2013-07-18 Nec Corporation Recommender system, recommendation method, and program
US8597720B2 (en) 2007-01-21 2013-12-03 Hemoteq Ag Medical product for treating stenosis of body passages and for preventing threatening restenosis
US8661034B2 (en) 2010-02-03 2014-02-25 Gartner, Inc. Bimodal recommendation engine for recommending items and peers
US8669360B2 (en) 2011-08-05 2014-03-11 Boston Scientific Scimed, Inc. Methods of converting amorphous drug substance into crystalline form
US8751559B2 (en) 2008-09-16 2014-06-10 Microsoft Corporation Balanced routing of questions to experts
US8889211B2 (en) 2010-09-02 2014-11-18 Boston Scientific Scimed, Inc. Coating process for drug delivery balloons using heat-induced rewrap memory
US8918391B2 (en) 2009-12-02 2014-12-23 Gartner, Inc. Interactive peer directory with question router
US8930714B2 (en) 2011-07-19 2015-01-06 Elwha Llc Encrypted memory
US8955111B2 (en) 2011-09-24 2015-02-10 Elwha Llc Instruction set adapted for security risk monitoring
US9056152B2 (en) 2011-08-25 2015-06-16 Boston Scientific Scimed, Inc. Medical device with crystalline drug coating
US9098608B2 (en) 2011-10-28 2015-08-04 Elwha Llc Processor configured to allocate resources using an entitlement vector
US9170843B2 (en) 2011-09-24 2015-10-27 Elwha Llc Data handling apparatus adapted for scheduling operations according to resource allocation based on entitlement
US9183279B2 (en) 2011-09-22 2015-11-10 International Business Machines Corporation Semantic questioning mechanism to enable analysis of information architectures
US9192697B2 (en) 2007-07-03 2015-11-24 Hemoteq Ag Balloon catheter for treating stenosis of body passages and for preventing threatening restenosis
US9251185B2 (en) 2010-12-15 2016-02-02 Girish Kumar Classifying results of search queries
US9298918B2 (en) 2011-11-30 2016-03-29 Elwha Llc Taint injection and tracking
US9443085B2 (en) 2011-07-19 2016-09-13 Elwha Llc Intrusion detection using taint accumulation
US9465657B2 (en) 2011-07-19 2016-10-11 Elwha Llc Entitlement vector for library usage in managing resource allocation and scheduling based on usage and priority
US9471373B2 (en) 2011-09-24 2016-10-18 Elwha Llc Entitlement vector for library usage in managing resource allocation and scheduling based on usage and priority
US9477672B2 (en) 2009-12-02 2016-10-25 Gartner, Inc. Implicit profile for use with recommendation engine and/or question router
US9558034B2 (en) 2011-07-19 2017-01-31 Elwha Llc Entitlement vector for managing resource allocation
US9575903B2 (en) 2011-08-04 2017-02-21 Elwha Llc Security perimeter
US9785987B2 (en) 2010-04-22 2017-10-10 Microsoft Technology Licensing, Llc User interface for information presentation system
US9798873B2 (en) 2011-08-04 2017-10-24 Elwha Llc Processor operable to ensure code integrity
US9836765B2 (en) 2014-05-19 2017-12-05 Kibo Software, Inc. System and method for context-aware recommendation through user activity change detection
US10102278B2 (en) 2010-02-03 2018-10-16 Gartner, Inc. Methods and systems for modifying a user profile for a recommendation algorithm and making recommendations based on user interactions with items
US10369256B2 (en) 2009-07-10 2019-08-06 Boston Scientific Scimed, Inc. Use of nanocrystals for drug delivery from a balloon
US10600011B2 (en) 2013-03-05 2020-03-24 Gartner, Inc. Methods and systems for improving engagement with a recommendation engine that recommends items, peers, and services
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information

Families Citing this family (278)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100122312A1 (en) * 2008-11-07 2010-05-13 Novell, Inc. Predictive service systems
US20090234718A1 (en) * 2000-09-05 2009-09-17 Novell, Inc. Predictive service systems using emotion detection
US7505964B2 (en) 2003-09-12 2009-03-17 Google Inc. Methods and systems for improving a search ranking using related queries
US7958115B2 (en) * 2004-07-29 2011-06-07 Yahoo! Inc. Search systems and methods using in-line contextual queries
US7409402B1 (en) * 2005-09-20 2008-08-05 Yahoo! Inc. Systems and methods for presenting advertising content based on publisher-selected labels
US7962465B2 (en) * 2006-10-19 2011-06-14 Yahoo! Inc. Contextual syndication platform
US7421441B1 (en) * 2005-09-20 2008-09-02 Yahoo! Inc. Systems and methods for presenting information based on publisher-selected labels
US11259059B2 (en) 2004-07-30 2022-02-22 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9584868B2 (en) 2004-07-30 2017-02-28 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US7590997B2 (en) 2004-07-30 2009-09-15 Broadband Itv, Inc. System and method for managing, converting and displaying video content on a video-on-demand platform, including ads used for drill-down navigation and consumer-generated classified ads
US9635429B2 (en) 2004-07-30 2017-04-25 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US7631336B2 (en) 2004-07-30 2009-12-08 Broadband Itv, Inc. Method for converting, navigating and displaying video content uploaded from the internet to a digital TV video-on-demand platform
US8369655B2 (en) * 2006-07-31 2013-02-05 Ricoh Co., Ltd. Mixed media reality recognition using multiple specialized indexes
US8838591B2 (en) 2005-08-23 2014-09-16 Ricoh Co., Ltd. Embedding hot spots in electronic documents
US8868555B2 (en) * 2006-07-31 2014-10-21 Ricoh Co., Ltd. Computation of a recongnizability score (quality predictor) for image retrieval
US9384619B2 (en) 2006-07-31 2016-07-05 Ricoh Co., Ltd. Searching media content for objects specified using identifiers
US8825682B2 (en) * 2006-07-31 2014-09-02 Ricoh Co., Ltd. Architecture for mixed media reality retrieval of locations and registration of images
US8184155B2 (en) * 2007-07-11 2012-05-22 Ricoh Co. Ltd. Recognition and tracking using invisible junctions
US7702673B2 (en) 2004-10-01 2010-04-20 Ricoh Co., Ltd. System and methods for creation and use of a mixed media environment
US7970171B2 (en) * 2007-01-18 2011-06-28 Ricoh Co., Ltd. Synthetic image and video generation from ground truth data
US8332401B2 (en) 2004-10-01 2012-12-11 Ricoh Co., Ltd Method and system for position-based image matching in a mixed media environment
US8195659B2 (en) 2005-08-23 2012-06-05 Ricoh Co. Ltd. Integration and use of mixed media documents
US9373029B2 (en) * 2007-07-11 2016-06-21 Ricoh Co., Ltd. Invisible junction feature recognition for document security or annotation
US8005831B2 (en) 2005-08-23 2011-08-23 Ricoh Co., Ltd. System and methods for creation and use of a mixed media environment with geographic location information
US9171202B2 (en) 2005-08-23 2015-10-27 Ricoh Co., Ltd. Data organization and access for mixed media document system
US8156427B2 (en) 2005-08-23 2012-04-10 Ricoh Co. Ltd. User interface for mixed media reality
US7991778B2 (en) 2005-08-23 2011-08-02 Ricoh Co., Ltd. Triggering actions with captured input in a mixed media environment
US8335789B2 (en) 2004-10-01 2012-12-18 Ricoh Co., Ltd. Method and system for document fingerprint matching in a mixed media environment
US8086038B2 (en) 2007-07-11 2011-12-27 Ricoh Co., Ltd. Invisible junction features for patch recognition
US8176054B2 (en) 2007-07-12 2012-05-08 Ricoh Co. Ltd Retrieving electronic documents by converting them to synthetic text
US8156115B1 (en) 2007-07-11 2012-04-10 Ricoh Co. Ltd. Document-based networking with mixed media reality
US8144921B2 (en) * 2007-07-11 2012-03-27 Ricoh Co., Ltd. Information retrieval using invisible junctions and geometric constraints
US9530050B1 (en) 2007-07-11 2016-12-27 Ricoh Co., Ltd. Document annotation sharing
US8600989B2 (en) 2004-10-01 2013-12-03 Ricoh Co., Ltd. Method and system for image matching in a mixed media environment
US8385589B2 (en) 2008-05-15 2013-02-26 Berna Erol Web-based content detection in images, extraction and recognition
US8521737B2 (en) 2004-10-01 2013-08-27 Ricoh Co., Ltd. Method and system for multi-tier image matching in a mixed media environment
US8949287B2 (en) 2005-08-23 2015-02-03 Ricoh Co., Ltd. Embedding hot spots in imaged documents
US8276088B2 (en) * 2007-07-11 2012-09-25 Ricoh Co., Ltd. User interface for three-dimensional navigation
US8156116B2 (en) 2006-07-31 2012-04-10 Ricoh Co., Ltd Dynamic presentation of targeted information in a mixed media reality recognition system
US9405751B2 (en) 2005-08-23 2016-08-02 Ricoh Co., Ltd. Database for mixed media document system
US7812986B2 (en) * 2005-08-23 2010-10-12 Ricoh Co. Ltd. System and methods for use of voice mail and email in a mixed media environment
US8510283B2 (en) 2006-07-31 2013-08-13 Ricoh Co., Ltd. Automatic adaption of an image recognition system to image capture devices
US8856108B2 (en) 2006-07-31 2014-10-07 Ricoh Co., Ltd. Combining results of image retrieval processes
US8683031B2 (en) * 2004-10-29 2014-03-25 Trustwave Holdings, Inc. Methods and systems for scanning and monitoring content on a network
KR100952391B1 (en) * 2005-04-14 2010-04-14 에스케이커뮤니케이션즈 주식회사 System and method for evaluating contents on the internet network and computer readable medium processing the method
US8606781B2 (en) * 2005-04-29 2013-12-10 Palo Alto Research Center Incorporated Systems and methods for personalized search
US8438142B2 (en) 2005-05-04 2013-05-07 Google Inc. Suggesting and refining user input based on original user input
JP2006344118A (en) * 2005-06-10 2006-12-21 Fuji Xerox Co Ltd Using state notifying system
US8086605B2 (en) * 2005-06-28 2011-12-27 Yahoo! Inc. Search engine with augmented relevance ranking by community participation
CA2614440C (en) 2005-07-07 2016-06-21 Sermo, Inc. Method and apparatus for conducting an information brokering service
US8234145B2 (en) * 2005-07-12 2012-07-31 International Business Machines Corporation Automatic computation of validation metrics for global logistics processes
DE102005035903A1 (en) * 2005-07-28 2007-02-08 X-Aitment Gmbh Generic AI architecture for a multi-agent system
US7725485B1 (en) * 2005-08-01 2010-05-25 Google Inc. Generating query suggestions using contextual information
US7774335B1 (en) * 2005-08-23 2010-08-10 Amazon Technologies, Inc. Method and system for determining interest levels of online content navigation paths
US8719255B1 (en) 2005-08-23 2014-05-06 Amazon Technologies, Inc. Method and system for determining interest levels of online content based on rates of change of content access
US20070088633A1 (en) * 2005-10-19 2007-04-19 Mod Systems Method and system for optimal or near-optimal selection of content for broadcast in a commercial environment
US20070088659A1 (en) * 2005-10-19 2007-04-19 Mod Systems Distribution of selected digitally-encoded content to a storage device, user device, or other distribution target with concurrent rendering of selected content
JP4495669B2 (en) * 2005-12-06 2010-07-07 株式会社日立製作所 Business process design support method and system based on role relationship modeling
US20070143364A1 (en) * 2005-12-21 2007-06-21 Chen Lang S Techniques to manage contact information
US8117196B2 (en) 2006-01-23 2012-02-14 Chacha Search, Inc. Search tool providing optional use of human search guides
US8065286B2 (en) 2006-01-23 2011-11-22 Chacha Search, Inc. Scalable search system using human searchers
US7689559B2 (en) * 2006-02-08 2010-03-30 Telenor Asa Document similarity scoring and ranking method, device and computer program product
US7853630B2 (en) 2006-03-06 2010-12-14 Aggregate Knowledge System and method for the dynamic generation of correlation scores between arbitrary objects
US7788358B2 (en) * 2006-03-06 2010-08-31 Aggregate Knowledge Using cross-site relationships to generate recommendations
JP5057546B2 (en) * 2006-03-24 2012-10-24 キヤノン株式会社 Document search apparatus and document search method
US20070239722A1 (en) * 2006-03-30 2007-10-11 Phillips Mark E Distributed user-profile data structure
US20080046872A1 (en) * 2006-05-03 2008-02-21 Cooper Greg J Compiler using interactive design markup language
US7624104B2 (en) * 2006-06-22 2009-11-24 Yahoo! Inc. User-sensitive pagerank
US8214482B2 (en) * 2006-06-27 2012-07-03 Nosadia Pass Nv, Limited Liability Company Remote log repository with access policy
US8301753B1 (en) 2006-06-27 2012-10-30 Nosadia Pass Nv, Limited Liability Company Endpoint activity logging
US7668954B1 (en) 2006-06-27 2010-02-23 Stephen Waller Melvin Unique identifier validation
US7822762B2 (en) * 2006-06-28 2010-10-26 Microsoft Corporation Entity-specific search model
EP2046410B1 (en) * 2006-07-03 2019-04-10 Hemoteq AG Manufacture, method, and use of active substance-releasing medical products for permanently keeping blood vessels open
US20080016052A1 (en) * 2006-07-14 2008-01-17 Bea Systems, Inc. Using Connections Between Users and Documents to Rank Documents in an Enterprise Search System
US7873641B2 (en) * 2006-07-14 2011-01-18 Bea Systems, Inc. Using tags in an enterprise search system
US20080016061A1 (en) * 2006-07-14 2008-01-17 Bea Systems, Inc. Using a Core Data Structure to Calculate Document Ranks
US8073263B2 (en) 2006-07-31 2011-12-06 Ricoh Co., Ltd. Multi-classifier selection and monitoring for MMR-based image recognition
US8201076B2 (en) * 2006-07-31 2012-06-12 Ricoh Co., Ltd. Capturing symbolic information from documents upon printing
US9063952B2 (en) 2006-07-31 2015-06-23 Ricoh Co., Ltd. Mixed media reality recognition with image tracking
US9176984B2 (en) * 2006-07-31 2015-11-03 Ricoh Co., Ltd Mixed media reality retrieval of differentially-weighted links
US9020966B2 (en) 2006-07-31 2015-04-28 Ricoh Co., Ltd. Client device for interacting with a mixed media reality recognition system
US8676810B2 (en) * 2006-07-31 2014-03-18 Ricoh Co., Ltd. Multiple index mixed media reality recognition using unequal priority indexes
US8489987B2 (en) * 2006-07-31 2013-07-16 Ricoh Co., Ltd. Monitoring and analyzing creation and usage of visual content using image and hotspot interaction
CN101127101A (en) * 2006-08-18 2008-02-20 鸿富锦精密工业(深圳)有限公司 Label information supervision system and method
US20080140642A1 (en) * 2006-10-10 2008-06-12 Bill Messing Automated user activity associated data collection and reporting for content/metadata selection and propagation service
WO2008046104A2 (en) * 2006-10-13 2008-04-17 Collexis Holding, Inc. Methods and systems for knowledge discovery
US9110975B1 (en) 2006-11-02 2015-08-18 Google Inc. Search result inputs using variant generalized queries
US8661029B1 (en) 2006-11-02 2014-02-25 Google Inc. Modifying search result ranking based on implicit user feedback
WO2008061290A1 (en) * 2006-11-20 2008-05-29 Funnelback Pty Ltd Annotation index system and method
US7676457B2 (en) * 2006-11-29 2010-03-09 Red Hat, Inc. Automatic index based query optimization
US8099429B2 (en) * 2006-12-11 2012-01-17 Microsoft Corporation Relational linking among resoures
US8140566B2 (en) * 2006-12-12 2012-03-20 Yahoo! Inc. Open framework for integrating, associating, and interacting with content objects including automatic feed creation
US20090234814A1 (en) * 2006-12-12 2009-09-17 Marco Boerries Configuring a search engine results page with environment-specific information
US20090240564A1 (en) * 2006-12-12 2009-09-24 Marco Boerries Open framework for integrating, associating, and interacting with content objects including advertisement and content personalization
WO2008074150A1 (en) * 2006-12-20 2008-06-26 Ma, Gary Manchoir Method and apparatus for scoring electronic documents
US8156135B2 (en) * 2006-12-22 2012-04-10 Yahoo! Inc. Method and system for progressive disclosure of search results
US9298721B2 (en) * 2007-02-28 2016-03-29 Qualcomm Incorporated Prioritized search results based on monitored data
US20080222155A1 (en) * 2007-03-08 2008-09-11 Phillips Mark E Method and apparatus for partial publication and inventory maintenance of media objects in a region
US8938463B1 (en) 2007-03-12 2015-01-20 Google Inc. Modifying search result ranking based on implicit user feedback and a model of presentation bias
US8694374B1 (en) 2007-03-14 2014-04-08 Google Inc. Detecting click spam
US8161040B2 (en) * 2007-04-30 2012-04-17 Piffany, Inc. Criteria-specific authority ranking
US9092510B1 (en) 2007-04-30 2015-07-28 Google Inc. Modifying search result ranking based on a temporal element of user feedback
US8775603B2 (en) 2007-05-04 2014-07-08 Sitespect, Inc. Method and system for testing variations of website content
US20080281807A1 (en) * 2007-05-11 2008-11-13 Siemens Aktiengesellschaft Search engine
US8359309B1 (en) 2007-05-23 2013-01-22 Google Inc. Modifying search result ranking based on corpus search statistics
US7873635B2 (en) * 2007-05-31 2011-01-18 Microsoft Corporation Search ranger system and double-funnel model for search spam analyses and browser protection
US8667117B2 (en) * 2007-05-31 2014-03-04 Microsoft Corporation Search ranger system and double-funnel model for search spam analyses and browser protection
US9430577B2 (en) * 2007-05-31 2016-08-30 Microsoft Technology Licensing, Llc Search ranger system and double-funnel model for search spam analyses and browser protection
US7996400B2 (en) * 2007-06-23 2011-08-09 Microsoft Corporation Identification and use of web searcher expertise
US11570521B2 (en) 2007-06-26 2023-01-31 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US20090037412A1 (en) * 2007-07-02 2009-02-05 Kristina Butvydas Bard Qualitative search engine based on factors of consumer trust specification
US20090012833A1 (en) * 2007-07-02 2009-01-08 Cisco Technology, Inc. Search engine for most helpful employees
US8538940B2 (en) * 2007-07-26 2013-09-17 International Business Machines Corporation Identification of shared resources
US20090037431A1 (en) 2007-07-30 2009-02-05 Paul Martino System and method for maintaining metadata correctness
US20090043749A1 (en) * 2007-08-06 2009-02-12 Garg Priyank S Extracting query intent from query logs
US7966341B2 (en) * 2007-08-06 2011-06-21 Yahoo! Inc. Estimating the date relevance of a query from query logs
US8694511B1 (en) 2007-08-20 2014-04-08 Google Inc. Modifying search result ranking based on populations
US8190475B1 (en) * 2007-09-05 2012-05-29 Google Inc. Visitor profile modeling
US8146099B2 (en) * 2007-09-27 2012-03-27 Microsoft Corporation Service-oriented pipeline based architecture
US8032714B2 (en) 2007-09-28 2011-10-04 Aggregate Knowledge Inc. Methods and systems for caching data using behavioral event correlations
US8171029B2 (en) * 2007-10-05 2012-05-01 Fujitsu Limited Automatic generation of ontologies using word affinities
US8332439B2 (en) * 2007-10-05 2012-12-11 Fujitsu Limited Automatically generating a hierarchy of terms
US8909655B1 (en) 2007-10-11 2014-12-09 Google Inc. Time based ranking
US20090106312A1 (en) * 2007-10-22 2009-04-23 Franklin Charles Breslau User function feedback method and system
JP4998214B2 (en) * 2007-11-02 2012-08-15 ソニー株式会社 Information presentation system, information signal processing apparatus, information signal processing method, and recording medium
US8839088B1 (en) 2007-11-02 2014-09-16 Google Inc. Determining an aspect value, such as for estimating a characteristic of online entity
US8019748B1 (en) 2007-11-14 2011-09-13 Google Inc. Web search refinement
KR101060487B1 (en) * 2007-11-19 2011-08-30 서울대학교산학협력단 Apparatus and method for content recommendation using tag cloud
US10083420B2 (en) 2007-11-21 2018-09-25 Sermo, Inc Community moderated information
US8099430B2 (en) * 2008-12-18 2012-01-17 International Business Machines Corporation Computer method and apparatus of information management and navigation
US10269024B2 (en) * 2008-02-08 2019-04-23 Outbrain Inc. Systems and methods for identifying and measuring trends in consumer content demand within vertically associated websites and related content
US20090222321A1 (en) * 2008-02-28 2009-09-03 Microsoft Corporation Prediction of future popularity of query terms
US8150869B2 (en) * 2008-03-17 2012-04-03 Microsoft Corporation Combined web browsing and searching
US8762364B2 (en) * 2008-03-18 2014-06-24 Yahoo! Inc. Personalizing sponsored search advertising layout using user behavior history
US8589395B2 (en) * 2008-04-15 2013-11-19 Yahoo! Inc. System and method for trail identification with search results
US8051080B2 (en) * 2008-04-16 2011-11-01 Yahoo! Inc. Contextual ranking of keywords using click data
WO2009137408A2 (en) * 2008-05-05 2009-11-12 Thomson Reuters Global Resources Systems and methods for integrating user-generated content with proprietary content in a database
US8055673B2 (en) * 2008-06-05 2011-11-08 Yahoo! Inc. Friendly search and socially augmented search query assistance layer
US8082278B2 (en) * 2008-06-13 2011-12-20 Microsoft Corporation Generating query suggestions from semantic relationships in content
US8918383B2 (en) * 2008-07-09 2014-12-23 International Business Machines Corporation Vector space lightweight directory access protocol data search
US10002324B1 (en) * 2008-10-31 2018-06-19 Emergent Systems Corporation Method for determining data quality in a knowledge management system
US10380634B2 (en) * 2008-11-22 2019-08-13 Callidus Software, Inc. Intent inference of website visitors and sales leads package generation
US8396865B1 (en) 2008-12-10 2013-03-12 Google Inc. Sharing search engine relevance data between corpora
US20100153432A1 (en) * 2008-12-11 2010-06-17 Sap Ag Object based modeling for software application query generation
US8386475B2 (en) 2008-12-30 2013-02-26 Novell, Inc. Attribution analysis and correlation
US8301622B2 (en) * 2008-12-30 2012-10-30 Novell, Inc. Identity analysis and correlation
EP2394228A4 (en) * 2009-03-10 2013-01-23 Ebrary Inc Method and apparatus for real time text analysis and text navigation
US9009146B1 (en) 2009-04-08 2015-04-14 Google Inc. Ranking search results based on similar queries
US20100268661A1 (en) * 2009-04-20 2010-10-21 4-Tell, Inc Recommendation Systems
US10275818B2 (en) 2009-04-20 2019-04-30 4-Tell, Inc. Next generation improvements in recommendation systems
US10269021B2 (en) 2009-04-20 2019-04-23 4-Tell, Inc. More improvements in recommendation systems
US20100299342A1 (en) * 2009-05-22 2010-11-25 Nbc Universal, Inc. System and method for modification in computerized searching
US8385660B2 (en) 2009-06-24 2013-02-26 Ricoh Co., Ltd. Mixed media reality indexing and retrieval for repeated content
US8661050B2 (en) * 2009-07-10 2014-02-25 Microsoft Corporation Hybrid recommendation system
US8447760B1 (en) 2009-07-20 2013-05-21 Google Inc. Generating a related set of documents for an initial set of documents
US8755815B2 (en) 2010-08-31 2014-06-17 Qualcomm Incorporated Use of wireless access point ID for position determination
US8395547B2 (en) 2009-08-27 2013-03-12 Hewlett-Packard Development Company, L.P. Location tracking for mobile computing device
US8498974B1 (en) 2009-08-31 2013-07-30 Google Inc. Refining search results
US9015148B2 (en) * 2009-09-21 2015-04-21 Microsoft Corporation Suggesting related search queries during web browsing
US20140089246A1 (en) * 2009-09-23 2014-03-27 Edwin Adriaansen Methods and systems for knowledge discovery
US8972391B1 (en) 2009-10-02 2015-03-03 Google Inc. Recent interest based relevance scoring
US8671089B2 (en) 2009-10-06 2014-03-11 Brightedge Technologies, Inc. Correlating web page visits and conversions with external references
US9659265B2 (en) 2009-10-12 2017-05-23 Oracle International Corporation Methods and systems for collecting and analyzing enterprise activities
US9251157B2 (en) * 2009-10-12 2016-02-02 Oracle International Corporation Enterprise node rank engine
US8874555B1 (en) 2009-11-20 2014-10-28 Google Inc. Modifying scoring data based on historical changes
US20110173570A1 (en) * 2010-01-13 2011-07-14 Microsoft Corporation Data feeds with peripherally presented interesting content
WO2011095923A1 (en) * 2010-02-03 2011-08-11 Syed Yasin Self-learning methods for automatically generating a summary of a document, knowledge extraction and contextual mapping
US8615514B1 (en) 2010-02-03 2013-12-24 Google Inc. Evaluating website properties by partitioning user feedback
US8924379B1 (en) 2010-03-05 2014-12-30 Google Inc. Temporal-based score adjustments
US8959093B1 (en) 2010-03-15 2015-02-17 Google Inc. Ranking search results based on anchors
WO2011119186A1 (en) * 2010-03-23 2011-09-29 Google Inc. Conversion path performance measures and reports
US8838587B1 (en) 2010-04-19 2014-09-16 Google Inc. Propagating query classifications
US8650173B2 (en) 2010-06-23 2014-02-11 Microsoft Corporation Placement of search results using user intent
US9623119B1 (en) 2010-06-29 2017-04-18 Google Inc. Accentuating search results
US8832083B1 (en) 2010-07-23 2014-09-09 Google Inc. Combining user feedback
US9015244B2 (en) * 2010-08-20 2015-04-21 Bitvore Corp. Bulletin board data mapping and presentation
US8521774B1 (en) 2010-08-20 2013-08-27 Google Inc. Dynamically generating pre-aggregated datasets
US8577915B2 (en) * 2010-09-10 2013-11-05 Veveo, Inc. Method of and system for conducting personalized federated search and presentation of results therefrom
WO2012050948A1 (en) 2010-09-29 2012-04-19 Hewlett-Packard Development Company, L.P. Location tracking for mobile computing device
US20120084279A1 (en) * 2010-09-30 2012-04-05 Microsoft Corporation Search detail display using search result context
US9779168B2 (en) 2010-10-04 2017-10-03 Excalibur Ip, Llc Contextual quick-picks
US8661067B2 (en) * 2010-10-13 2014-02-25 International Business Machines Corporation Predictive migrate and recall
US10026058B2 (en) * 2010-10-29 2018-07-17 Microsoft Technology Licensing, Llc Enterprise resource planning oriented context-aware environment
CA2719790A1 (en) 2010-11-05 2011-01-19 Ibm Canada Limited - Ibm Canada Limitee Expertise identification using interaction metrics
US9002867B1 (en) 2010-12-30 2015-04-07 Google Inc. Modifying ranking data based on document changes
US8468164B1 (en) * 2011-03-09 2013-06-18 Amazon Technologies, Inc. Personalized recommendations based on related users
US8452797B1 (en) * 2011-03-09 2013-05-28 Amazon Technologies, Inc. Personalized recommendations based on item usage
US9679316B2 (en) * 2011-06-06 2017-06-13 Paypal, Inc. Selecting diverse product titles to display on a website
US9058331B2 (en) 2011-07-27 2015-06-16 Ricoh Co., Ltd. Generating a conversation in a social network based on visual search results
CN102982035B (en) * 2011-09-05 2015-10-07 腾讯科技(深圳)有限公司 A kind of search ordering method of community users and system
WO2013039933A1 (en) * 2011-09-13 2013-03-21 Monk Akarshala Design Private Limited Role-based history in a modular learning system
US8195799B1 (en) 2011-10-26 2012-06-05 SHTC Holdings LLC Smart test article optimizer
US8849803B2 (en) 2011-10-31 2014-09-30 International Business Machines Corporation Data collection for usage based insurance
DE102012100470A1 (en) * 2012-01-20 2013-07-25 Nektoon Ag Method of compiling documents
US9098540B2 (en) * 2012-03-12 2015-08-04 Oracle International Corporation System and method for providing a governance model for use with an enterprise crawl and search framework environment
US8762324B2 (en) * 2012-03-23 2014-06-24 Sap Ag Multi-dimensional query expansion employing semantics and usage statistics
US10303754B1 (en) 2012-05-30 2019-05-28 Callidus Software, Inc. Creation and display of dynamic content component
FR2994358B1 (en) * 2012-08-01 2015-06-19 Netwave SYSTEM FOR PROCESSING CONNECTION DATA TO A PLATFORM OF AN INTERNET SITE
US10261938B1 (en) 2012-08-31 2019-04-16 Amazon Technologies, Inc. Content preloading using predictive models
US10628503B2 (en) * 2012-11-27 2020-04-21 Gubagoo, Inc. Systems and methods for online web site lead generation service
US9881011B2 (en) * 2012-11-29 2018-01-30 Ricoh Company, Ltd. System and method for generating user profiles for human resources
US20140156623A1 (en) * 2012-12-05 2014-06-05 Google Inc. Generating and displaying tasks
US9576053B2 (en) * 2012-12-31 2017-02-21 Charles J. Reed Method and system for ranking content of objects for search results
US20140229488A1 (en) * 2013-02-11 2014-08-14 Telefonaktiebolaget L M Ericsson (Publ) Apparatus, Method, and Computer Program Product For Ranking Data Objects
US10331686B2 (en) * 2013-03-14 2019-06-25 Microsoft Corporation Conducting search sessions utilizing navigation patterns
US10157390B2 (en) 2013-03-15 2018-12-18 Commerce Signals, Inc. Methods and systems for a virtual marketplace or exchange for distributed signals
US9002837B2 (en) * 2013-03-15 2015-04-07 Ipar, Llc Systems and methods for providing expert thread search results
US10771247B2 (en) 2013-03-15 2020-09-08 Commerce Signals, Inc. Key pair platform and system to manage federated trust networks in distributed advertising
US10803512B2 (en) 2013-03-15 2020-10-13 Commerce Signals, Inc. Graphical user interface for object discovery and mapping in open systems
US11222346B2 (en) 2013-03-15 2022-01-11 Commerce Signals, Inc. Method and systems for distributed signals for use with advertising
US20140303806A1 (en) * 2013-04-04 2014-10-09 GM Global Technology Operations LLC Apparatus and methods for providing tailored information to vehicle users based on vehicle community input
US9183499B1 (en) 2013-04-19 2015-11-10 Google Inc. Evaluating quality based on neighbor features
WO2014186392A2 (en) * 2013-05-14 2014-11-20 Google Inc. Summarizing a photo album
US10430418B2 (en) 2013-05-29 2019-10-01 Microsoft Technology Licensing, Llc Context-based actions from a source application
US11263221B2 (en) 2013-05-29 2022-03-01 Microsoft Technology Licensing, Llc Search result contexts for application launch
US9563847B2 (en) * 2013-06-05 2017-02-07 MultiModel Research, LLC Apparatus and method for building and using inference engines based on representations of data that preserve relationships between objects
US10242091B2 (en) * 2013-08-08 2019-03-26 Systamedic, Inc. Method of knowledge extraction through data mining
US20150046419A1 (en) * 2013-08-12 2015-02-12 Vidmind Ltd. Method of sorting search results by recommendation engine
US10223401B2 (en) 2013-08-15 2019-03-05 International Business Machines Corporation Incrementally retrieving data for objects to provide a desired level of detail
US10185748B1 (en) * 2013-08-22 2019-01-22 Evernote Corporation Combining natural language and keyword search queries for personal content collections
US11238056B2 (en) 2013-10-28 2022-02-01 Microsoft Technology Licensing, Llc Enhancing search results with social labels
US9973950B2 (en) * 2013-10-31 2018-05-15 Telefonaktiebolaget Lm Ericsson (Publ) Technique for data traffic analysis
US9542440B2 (en) 2013-11-04 2017-01-10 Microsoft Technology Licensing, Llc Enterprise graph search based on object and actor relationships
CN104090890B (en) * 2013-12-12 2016-05-04 深圳市腾讯计算机系统有限公司 Keyword similarity acquisition methods, device and server
US11645289B2 (en) 2014-02-04 2023-05-09 Microsoft Technology Licensing, Llc Ranking enterprise graph queries
US9870432B2 (en) 2014-02-24 2018-01-16 Microsoft Technology Licensing, Llc Persisted enterprise graph queries
US11657060B2 (en) 2014-02-27 2023-05-23 Microsoft Technology Licensing, Llc Utilizing interactivity signals to generate relationships and promote content
US10757201B2 (en) 2014-03-01 2020-08-25 Microsoft Technology Licensing, Llc Document and content feed
US10255563B2 (en) 2014-03-03 2019-04-09 Microsoft Technology Licensing, Llc Aggregating enterprise graph content around user-generated topics
US10169457B2 (en) 2014-03-03 2019-01-01 Microsoft Technology Licensing, Llc Displaying and posting aggregated social activity on a piece of enterprise content
US10394827B2 (en) 2014-03-03 2019-08-27 Microsoft Technology Licensing, Llc Discovering enterprise content based on implicit and explicit signals
US9430477B2 (en) * 2014-05-12 2016-08-30 International Business Machines Corporation Predicting knowledge gaps of media consumers
US20160171419A1 (en) * 2014-05-19 2016-06-16 Degang ZHANG Assistance service facilitation
US9563912B2 (en) * 2014-08-15 2017-02-07 Microsoft Technology Licensing, Llc Auto recognition of acquirable entities
US10061826B2 (en) 2014-09-05 2018-08-28 Microsoft Technology Licensing, Llc. Distant content discovery
US9251013B1 (en) 2014-09-30 2016-02-02 Bertram Capital Management, Llc Social log file collaboration and annotation
US9852132B2 (en) * 2014-11-25 2017-12-26 Chegg, Inc. Building a topical learning model in a content management system
US10552493B2 (en) * 2015-02-04 2020-02-04 International Business Machines Corporation Gauging credibility of digital content items
US10157178B2 (en) * 2015-02-06 2018-12-18 International Business Machines Corporation Identifying categories within textual data
US9936031B2 (en) * 2015-03-31 2018-04-03 International Business Machines Corporation Generation of content recommendations
CN104791233B (en) * 2015-04-30 2018-04-17 西安交通大学 Based on the reciprocating compressor method for diagnosing faults for improving the solution of ball vector machine closure ball
US10127230B2 (en) * 2015-05-01 2018-11-13 Microsoft Technology Licensing, Llc Dynamic content suggestion in sparse traffic environment
US10509834B2 (en) 2015-06-05 2019-12-17 Apple Inc. Federated search results scoring
US10755032B2 (en) 2015-06-05 2020-08-25 Apple Inc. Indexing web pages with deep links
US10592572B2 (en) 2015-06-05 2020-03-17 Apple Inc. Application view index and search
US10621189B2 (en) 2015-06-05 2020-04-14 Apple Inc. In-application history search
US10509833B2 (en) 2015-06-05 2019-12-17 Apple Inc. Proximity search scoring
US11356451B2 (en) * 2015-06-12 2022-06-07 Miblok/Sheryldene-Anne Valente Cube-based user interface for online community
US11803918B2 (en) 2015-07-07 2023-10-31 Oracle International Corporation System and method for identifying experts on arbitrary topics in an enterprise social network
US10142272B2 (en) 2015-11-17 2018-11-27 International Business Machines Corporation Presenting browser content based on an online community knowledge
KR20170082361A (en) 2016-01-06 2017-07-14 삼성전자주식회사 Display apparatus and control method thereof
CN110378731B (en) * 2016-04-29 2021-04-20 腾讯科技(深圳)有限公司 Method, device, server and storage medium for acquiring user portrait
US10769535B2 (en) 2016-05-13 2020-09-08 Cognitive Scale, Inc. Ingestion pipeline for universal cognitive graph
CN106289369B (en) * 2016-07-17 2018-10-12 合肥赑歌数据科技有限公司 Environmental monitoring ability statistical system based on J2EE
US20180052864A1 (en) * 2016-08-16 2018-02-22 International Business Machines Corporation Facilitating the sharing of relevant content
US10839136B2 (en) * 2016-09-07 2020-11-17 Box, Inc. Generation of collateral object representations in collaboration environments
US10394832B2 (en) * 2016-10-24 2019-08-27 Google Llc Ranking search results documents
US10762140B2 (en) 2016-11-02 2020-09-01 Microsoft Technology Licensing, Llc Identifying content in a content management system relevant to content of a published electronic document
US10885026B2 (en) 2017-07-29 2021-01-05 Splunk Inc. Translating a natural language request to a domain-specific language request using templates
US11120344B2 (en) 2017-07-29 2021-09-14 Splunk Inc. Suggesting follow-up queries based on a follow-up recommendation machine learning model
US10565196B2 (en) 2017-07-29 2020-02-18 Splunk Inc. Determining a user-specific approach for disambiguation based on an interaction recommendation machine learning model
US10713269B2 (en) 2017-07-29 2020-07-14 Splunk Inc. Determining a presentation format for search results based on a presentation recommendation machine learning model
US11170016B2 (en) * 2017-07-29 2021-11-09 Splunk Inc. Navigating hierarchical components based on an expansion recommendation machine learning model
CA3081609C (en) * 2017-11-07 2023-12-05 Thomson Reuters Enterprise Centre Gmbh System and methods for concept aware searching
US10430466B2 (en) 2018-01-25 2019-10-01 International Business Machines Corporation Streamlining support dialogues via transitive relationships between different dialogues
US10885137B2 (en) 2018-03-28 2021-01-05 International Business Machines Corporation Identifying micro-editing experts within an appropriate network
US10831749B2 (en) 2018-05-03 2020-11-10 International Business Machines Corporation Expert discovery using user query navigation paths
US11126630B2 (en) * 2018-05-07 2021-09-21 Salesforce.Com, Inc. Ranking partial search query results based on implicit user interactions
US11636376B2 (en) 2018-06-03 2023-04-25 International Business Machines Corporation Active learning for concept disambiguation
RU2731658C2 (en) 2018-06-21 2020-09-07 Общество С Ограниченной Ответственностью "Яндекс" Method and system of selection for ranking search results using machine learning algorithm
RU2720905C2 (en) 2018-09-17 2020-05-14 Общество С Ограниченной Ответственностью "Яндекс" Method and system for expanding search queries in order to rank search results
RU2733481C2 (en) 2018-12-13 2020-10-01 Общество С Ограниченной Ответственностью "Яндекс" Method and system for generating feature for ranging document
CN109783608B (en) * 2018-12-20 2021-01-05 出门问问信息科技有限公司 Target hypothesis determination method and device, readable storage medium and electronic equipment
US11017048B2 (en) 2018-12-21 2021-05-25 Box, Inc. Synchronized content replication
RU2744029C1 (en) 2018-12-29 2021-03-02 Общество С Ограниченной Ответственностью "Яндекс" System and method of forming training set for machine learning algorithm
US11269872B1 (en) * 2019-07-31 2022-03-08 Splunk Inc. Intent-based natural language processing system
US20220027413A1 (en) * 2020-07-21 2022-01-27 Rivian Ip Holdings, Llc Inline search query refinement for navigation destination entry
CN115408196B (en) * 2022-10-31 2023-03-24 国网四川省电力公司电力科学研究院 High fault tolerance power grid fault diagnosis method and system

Citations (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867799A (en) * 1996-04-04 1999-02-02 Lang; Andrew K. Information system and method for filtering a massive flow of information entities to meet user information classification needs
US5890149A (en) * 1996-06-20 1999-03-30 Wisdomware, Inc. Organization training, coaching and indexing system
US5924105A (en) * 1997-01-27 1999-07-13 Michigan State University Method and product for determining salient features for use in information searching
US6016475A (en) * 1996-10-08 2000-01-18 The Regents Of The University Of Minnesota System, method, and article of manufacture for generating implicit ratings based on receiver operating curves
US6041311A (en) * 1995-06-30 2000-03-21 Microsoft Corporation Method and apparatus for item recommendation using automated collaborative filtering
US6049777A (en) * 1995-06-30 2000-04-11 Microsoft Corporation Computer-implemented collaborative filtering based method for recommending an item to a user
US6070133A (en) * 1997-07-21 2000-05-30 Battelle Memorial Institute Information retrieval system utilizing wavelet transform
US6112186A (en) * 1995-06-30 2000-08-29 Microsoft Corporation Distributed system for facilitating exchange of user information and opinion using automated collaborative filtering
US6240407B1 (en) * 1998-04-29 2001-05-29 International Business Machines Corp. Method and apparatus for creating an index in a database system
US20010018698A1 (en) * 1997-09-08 2001-08-30 Kanji Uchino Forum/message board
US6334124B1 (en) * 1997-10-06 2001-12-25 Ventro Corporation Techniques for improving index searches in a client-server environment
US20020078091A1 (en) * 2000-07-25 2002-06-20 Sonny Vu Automatic summarization of a document
US6418448B1 (en) * 1999-12-06 2002-07-09 Shyam Sundar Sarkar Method and apparatus for processing markup language specifications for data and metadata used inside multiple related internet documents to navigate, query and manipulate information from a plurality of object relational databases over the web
US20020103789A1 (en) * 2001-01-26 2002-08-01 Turnbull Donald R. Interface and system for providing persistent contextual relevance for commerce activities in a networked environment
US20020103698A1 (en) * 2000-10-31 2002-08-01 Christian Cantrell System and method for enabling user control of online advertising campaigns
US20020116421A1 (en) * 2001-02-17 2002-08-22 Fox Harold L. Method and system for page-like display, formating and processing of computer generated information on networked computers
US6493703B1 (en) * 1999-05-11 2002-12-10 Prophet Financial Systems System and method for implementing intelligent online community message board
US20020199009A1 (en) * 2001-06-22 2002-12-26 Willner Barry E. Method and apparatus for facilitating the providing of content
US6502091B1 (en) * 2000-02-23 2002-12-31 Hewlett-Packard Company Apparatus and method for discovering context groups and document categories by mining usage logs
US20030004996A1 (en) * 2001-06-29 2003-01-02 International Business Machines Corporation Method and system for spatial information retrieval for hyperlinked documents
US20030005053A1 (en) * 2001-06-29 2003-01-02 International Business Machines Corporation Method and system for collaborative web research
US6687696B2 (en) * 2000-07-26 2004-02-03 Recommind Inc. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models
US20040039630A1 (en) * 2002-08-12 2004-02-26 Begole James M.A. Method and system for inferring and applying coordination patterns from individual work and communication activity
US20040088323A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for evaluating information aggregates by visualizing associated categories
US20040088312A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining community overlap
US20040088276A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for analyzing usage patterns in information aggregates
US20040088322A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining connections between information aggregates
US20040088315A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining membership of information aggregates
US20040088325A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for building social networks based on activity around shared virtual objects
US20040117222A1 (en) * 2002-12-14 2004-06-17 International Business Machines Corporation System and method for evaluating information aggregates by generation of knowledge capital
US20040230572A1 (en) * 2001-06-22 2004-11-18 Nosa Omoigui System and method for semantic knowledge retrieval, management, capture, sharing, discovery, delivery and presentation
US20040263639A1 (en) * 2003-06-26 2004-12-30 Vladimir Sadovsky System and method for intelligent image acquisition
US6839680B1 (en) * 1999-09-30 2005-01-04 Fujitsu Limited Internet profiling
US6842877B2 (en) * 1998-12-18 2005-01-11 Tangis Corporation Contextual responses based on automated learning techniques
US20050060312A1 (en) * 2003-09-16 2005-03-17 Michael Curtiss Systems and methods for improving the ranking of news articles
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US6938035B2 (en) * 2001-10-03 2005-08-30 International Business Machines Corporation Reduce database monitor workload by employing predictive query threshold
US6943877B2 (en) * 2003-06-30 2005-09-13 Emhart Glass S.A. Container inspection machine
US7031961B2 (en) * 1999-05-05 2006-04-18 Google, Inc. System and method for searching and recommending objects from a categorically organized information repository
US7093012B2 (en) * 2000-09-14 2006-08-15 Overture Services, Inc. System and method for enhancing crawling by extracting requests for webpages in an information flow
US7092936B1 (en) * 2001-08-22 2006-08-15 Oracle International Corporation System and method for search and recommendation based on usage mining
US20060200556A1 (en) * 2004-12-29 2006-09-07 Scott Brave Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20060259344A1 (en) * 2002-08-19 2006-11-16 Choicestream, A Delaware Corporation Statistical personalized recommendation system
US7162473B2 (en) * 2003-06-26 2007-01-09 Microsoft Corporation Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users
US20070043609A1 (en) * 2005-07-18 2007-02-22 Razi Imam Automated systems for defining, implementing and tracking the sales cycle
US20070150515A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining usefulness of a digital asset
US20070150646A1 (en) * 2005-12-28 2007-06-28 Chi-Weon Yoon Semiconductor memory device using pipelined-buffer programming and related method
US20070255735A1 (en) * 1999-08-03 2007-11-01 Taylor David C User-context-based search engine
US7343365B2 (en) * 2002-02-20 2008-03-11 Microsoft Corporation Computer system architecture for automatic context associations
US20090037355A1 (en) * 2004-12-29 2009-02-05 Scott Brave Method and Apparatus for Context-Based Content Recommendation

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1096038A (en) 1993-05-29 1994-12-07 田玉文 Porcelain-like decorative material
US6128613A (en) * 1997-06-26 2000-10-03 The Chinese University Of Hong Kong Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words
US5928105A (en) * 1998-06-26 1999-07-27 General Motors Corporation Planet carrier assembly with stationary washer members
US6510406B1 (en) * 1999-03-23 2003-01-21 Mathsoft, Inc. Inverse inference engine for high performance web search
US6711585B1 (en) * 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
AU5934900A (en) * 1999-07-16 2001-02-05 Agentarts, Inc. Methods and system for generating automated alternative content recommendations
US7181438B1 (en) * 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
US8352331B2 (en) * 2000-05-03 2013-01-08 Yahoo! Inc. Relationship discovery engine
AU2001274818A1 (en) 2000-05-25 2001-12-11 Symbionautics Corporation Simulating human intelligence in computers using natural language dialog
JP5105456B2 (en) 2000-05-30 2012-12-26 株式会社ホットリンク Distributed monitoring system that provides knowledge services
US20010049671A1 (en) * 2000-06-05 2001-12-06 Joerg Werner B. e-Stract: a process for knowledge-based retrieval of electronic information
US7062488B1 (en) 2000-08-30 2006-06-13 Richard Reisman Task/domain segmentation in applying feedback to command control
KR20000072482A (en) * 2000-09-06 2000-12-05 이재학 Internet searching system to be easy by user and method thereof
ATE321422T1 (en) * 2001-01-09 2006-04-15 Metabyte Networks Inc SYSTEM, METHOD AND SOFTWARE FOR PROVIDING TARGETED ADVERTISING THROUGH USER PROFILE DATA STRUCTURE BASED ON USER PREFERENCES
WO2002084590A1 (en) * 2001-04-11 2002-10-24 Applied Minds, Inc. Knowledge web
US7203909B1 (en) * 2002-04-04 2007-04-10 Microsoft Corporation System and methods for constructing personalized context-sensitive portal pages or views by analyzing patterns of users' information access activities
US8001567B2 (en) * 2002-05-02 2011-08-16 Microsoft Corporation Media planner
AU2003239385A1 (en) * 2002-05-10 2003-11-11 Richard R. Reisman Method and apparatus for browsing using multiple coordinated device
BRPI0407451A (en) 2003-02-14 2006-02-07 Nervana Inc System and method for knowledge retrieval, semantics, management and presentation
WO2004097664A2 (en) * 2003-05-01 2004-11-11 Axonwave Software Inc. A method and system for concept generation and management
US8589373B2 (en) * 2003-09-14 2013-11-19 Yaron Mayer System and method for improved searching on the internet or similar networks and especially improved MetaNews and/or improved automatically generated newspapers
US7401068B2 (en) * 2003-09-30 2008-07-15 International Business Machines Corporation Method, system, and storage medium for providing web-based electronic research and presentation functions via a document creation application
US20050209983A1 (en) * 2004-03-18 2005-09-22 Macpherson Deborah L Context driven topologies
US7260568B2 (en) * 2004-04-15 2007-08-21 Microsoft Corporation Verifying relevance between keywords and web site contents
EP1612731B1 (en) * 2004-06-11 2008-08-13 Saab Ab Computer modeling of physical scenes
JP2008507792A (en) * 2004-07-26 2008-03-13 パンセン インフォマティクス インコーポレイテッド A search engine that uses the background situation placed on the network
WO2006011819A1 (en) * 2004-07-30 2006-02-02 Eurekster, Inc. Adaptive search engine

Patent Citations (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6041311A (en) * 1995-06-30 2000-03-21 Microsoft Corporation Method and apparatus for item recommendation using automated collaborative filtering
US6049777A (en) * 1995-06-30 2000-04-11 Microsoft Corporation Computer-implemented collaborative filtering based method for recommending an item to a user
US6112186A (en) * 1995-06-30 2000-08-29 Microsoft Corporation Distributed system for facilitating exchange of user information and opinion using automated collaborative filtering
US5867799A (en) * 1996-04-04 1999-02-02 Lang; Andrew K. Information system and method for filtering a massive flow of information entities to meet user information classification needs
US5983214A (en) * 1996-04-04 1999-11-09 Lycos, Inc. System and method employing individual user content-based data and user collaborative feedback data to evaluate the content of an information entity in a large information communication network
US5890149A (en) * 1996-06-20 1999-03-30 Wisdomware, Inc. Organization training, coaching and indexing system
US6016475A (en) * 1996-10-08 2000-01-18 The Regents Of The University Of Minnesota System, method, and article of manufacture for generating implicit ratings based on receiver operating curves
US5924105A (en) * 1997-01-27 1999-07-13 Michigan State University Method and product for determining salient features for use in information searching
US6070133A (en) * 1997-07-21 2000-05-30 Battelle Memorial Institute Information retrieval system utilizing wavelet transform
US20010018698A1 (en) * 1997-09-08 2001-08-30 Kanji Uchino Forum/message board
US6334124B1 (en) * 1997-10-06 2001-12-25 Ventro Corporation Techniques for improving index searches in a client-server environment
US6240407B1 (en) * 1998-04-29 2001-05-29 International Business Machines Corp. Method and apparatus for creating an index in a database system
US6842877B2 (en) * 1998-12-18 2005-01-11 Tangis Corporation Contextual responses based on automated learning techniques
US7031961B2 (en) * 1999-05-05 2006-04-18 Google, Inc. System and method for searching and recommending objects from a categorically organized information repository
US6493703B1 (en) * 1999-05-11 2002-12-10 Prophet Financial Systems System and method for implementing intelligent online community message board
US20070255735A1 (en) * 1999-08-03 2007-11-01 Taylor David C User-context-based search engine
US6839680B1 (en) * 1999-09-30 2005-01-04 Fujitsu Limited Internet profiling
US6418448B1 (en) * 1999-12-06 2002-07-09 Shyam Sundar Sarkar Method and apparatus for processing markup language specifications for data and metadata used inside multiple related internet documents to navigate, query and manipulate information from a plurality of object relational databases over the web
US6502091B1 (en) * 2000-02-23 2002-12-31 Hewlett-Packard Company Apparatus and method for discovering context groups and document categories by mining usage logs
US20020078091A1 (en) * 2000-07-25 2002-06-20 Sonny Vu Automatic summarization of a document
US6687696B2 (en) * 2000-07-26 2004-02-03 Recommind Inc. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models
US7093012B2 (en) * 2000-09-14 2006-08-15 Overture Services, Inc. System and method for enhancing crawling by extracting requests for webpages in an information flow
US20020103698A1 (en) * 2000-10-31 2002-08-01 Christian Cantrell System and method for enabling user control of online advertising campaigns
US20020103789A1 (en) * 2001-01-26 2002-08-01 Turnbull Donald R. Interface and system for providing persistent contextual relevance for commerce activities in a networked environment
US20020116421A1 (en) * 2001-02-17 2002-08-22 Fox Harold L. Method and system for page-like display, formating and processing of computer generated information on networked computers
US20020199009A1 (en) * 2001-06-22 2002-12-26 Willner Barry E. Method and apparatus for facilitating the providing of content
US20040230572A1 (en) * 2001-06-22 2004-11-18 Nosa Omoigui System and method for semantic knowledge retrieval, management, capture, sharing, discovery, delivery and presentation
US20030005053A1 (en) * 2001-06-29 2003-01-02 International Business Machines Corporation Method and system for collaborative web research
US20030004996A1 (en) * 2001-06-29 2003-01-02 International Business Machines Corporation Method and system for spatial information retrieval for hyperlinked documents
US20050165805A1 (en) * 2001-06-29 2005-07-28 International Business Machines Corporation Method and system for spatial information retrieval for hyperlinked documents
US7092936B1 (en) * 2001-08-22 2006-08-15 Oracle International Corporation System and method for search and recommendation based on usage mining
US6938035B2 (en) * 2001-10-03 2005-08-30 International Business Machines Corporation Reduce database monitor workload by employing predictive query threshold
US7343365B2 (en) * 2002-02-20 2008-03-11 Microsoft Corporation Computer system architecture for automatic context associations
US20040039630A1 (en) * 2002-08-12 2004-02-26 Begole James M.A. Method and system for inferring and applying coordination patterns from individual work and communication activity
US20060259344A1 (en) * 2002-08-19 2006-11-16 Choicestream, A Delaware Corporation Statistical personalized recommendation system
US20040088322A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining connections between information aggregates
US20040088315A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining membership of information aggregates
US20040088325A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for building social networks based on activity around shared virtual objects
US20040088276A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for analyzing usage patterns in information aggregates
US20040088312A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for determining community overlap
US20040088323A1 (en) * 2002-10-31 2004-05-06 International Business Machines Corporation System and method for evaluating information aggregates by visualizing associated categories
US20040117222A1 (en) * 2002-12-14 2004-06-17 International Business Machines Corporation System and method for evaluating information aggregates by generation of knowledge capital
US7162473B2 (en) * 2003-06-26 2007-01-09 Microsoft Corporation Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users
US20040263639A1 (en) * 2003-06-26 2004-12-30 Vladimir Sadovsky System and method for intelligent image acquisition
US6943877B2 (en) * 2003-06-30 2005-09-13 Emhart Glass S.A. Container inspection machine
US20050060312A1 (en) * 2003-09-16 2005-03-17 Michael Curtiss Systems and methods for improving the ranking of news articles
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US20060200556A1 (en) * 2004-12-29 2006-09-07 Scott Brave Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20070150466A1 (en) * 2004-12-29 2007-06-28 Scott Brave Method and apparatus for suggesting/disambiguation query terms based upon usage patterns observed
US20080040314A1 (en) * 2004-12-29 2008-02-14 Scott Brave Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge
US20090037355A1 (en) * 2004-12-29 2009-02-05 Scott Brave Method and Apparatus for Context-Based Content Recommendation
US20070043609A1 (en) * 2005-07-18 2007-02-22 Razi Imam Automated systems for defining, implementing and tracking the sales cycle
US20070150515A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining usefulness of a digital asset
US20070150465A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining expertise based upon observed usage patterns
US20070150470A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining peer groups based upon observed usage patterns
US20070150646A1 (en) * 2005-12-28 2007-06-28 Chi-Weon Yoon Semiconductor memory device using pipelined-buffer programming and related method

Cited By (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110196340A1 (en) * 1997-08-13 2011-08-11 Boston Scientific Scimed, Inc. Loading and release of water-insoluble drugs
US7698270B2 (en) 2004-12-29 2010-04-13 Baynote, Inc. Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20070150466A1 (en) * 2004-12-29 2007-06-28 Scott Brave Method and apparatus for suggesting/disambiguation query terms based upon usage patterns observed
US8601023B2 (en) 2004-12-29 2013-12-03 Baynote, Inc. Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20080040314A1 (en) * 2004-12-29 2008-02-14 Scott Brave Method and Apparatus for Identifying, Extracting, Capturing, and Leveraging Expertise and Knowledge
US20060200556A1 (en) * 2004-12-29 2006-09-07 Scott Brave Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US20090037355A1 (en) * 2004-12-29 2009-02-05 Scott Brave Method and Apparatus for Context-Based Content Recommendation
US7702690B2 (en) 2004-12-29 2010-04-20 Baynote, Inc. Method and apparatus for suggesting/disambiguation query terms based upon usage patterns observed
US20100312771A1 (en) * 2005-04-25 2010-12-09 Microsoft Corporation Associating Information With An Electronic Document
US7693836B2 (en) 2005-12-27 2010-04-06 Baynote, Inc. Method and apparatus for determining peer groups based upon observed usage patterns
US20070150470A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining peer groups based upon observed usage patterns
US20070150515A1 (en) * 2005-12-27 2007-06-28 Scott Brave Method and apparatus for determining usefulness of a digital asset
US7856446B2 (en) 2005-12-27 2010-12-21 Baynote, Inc. Method and apparatus for determining usefulness of a digital asset
US8597720B2 (en) 2007-01-21 2013-12-03 Hemoteq Ag Medical product for treating stenosis of body passages and for preventing threatening restenosis
US20080306935A1 (en) * 2007-06-11 2008-12-11 Microsoft Corporation Using joint communication and search data
US8150868B2 (en) 2007-06-11 2012-04-03 Microsoft Corporation Using joint communication and search data
US9192697B2 (en) 2007-07-03 2015-11-24 Hemoteq Ag Balloon catheter for treating stenosis of body passages and for preventing threatening restenosis
US20090150380A1 (en) * 2007-12-06 2009-06-11 Industrial Technology Research Institute System and method for processing social relation oriented service
US20090150390A1 (en) * 2007-12-11 2009-06-11 Atsuhisa Morimoto Data retrieving apparatus, data retrieving method and recording medium
US20090248661A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Identifying relevant information sources from user activity
WO2009145914A1 (en) * 2008-05-31 2009-12-03 Searchme, Inc. Systems and methods for building, displaying, and sharing albums having links to documents
US20090313325A1 (en) * 2008-06-17 2009-12-17 Mobile Tribe Llc Distributed Technique for Cascaded Data Aggregation in Parallel Fashion
WO2009155293A1 (en) * 2008-06-17 2009-12-23 Mobile Tribe Llc Distributed technique for cascaded data aggregation in parallel fashion
US20090319397A1 (en) * 2008-06-19 2009-12-24 D-Link Systems, Inc. Virtual experience
US8751559B2 (en) 2008-09-16 2014-06-10 Microsoft Corporation Balanced routing of questions to experts
US9773043B2 (en) 2008-12-10 2017-09-26 Gartner, Inc. Implicit profile for use with recommendation engine and/or question router
US10817518B2 (en) 2008-12-10 2020-10-27 Gartner, Inc. Implicit profile for use with recommendation engine and/or question router
US20100145937A1 (en) * 2008-12-10 2010-06-10 Gartner, Inc. Interactive peer directory
US8244674B2 (en) * 2008-12-10 2012-08-14 Gartner, Inc. Interactive peer directory
US20100169148A1 (en) * 2008-12-31 2010-07-01 International Business Machines Corporation Interaction solutions for customer support
US9195739B2 (en) 2009-02-20 2015-11-24 Microsoft Technology Licensing, Llc Identifying a discussion topic based on user interest information
US20100228777A1 (en) * 2009-02-20 2010-09-09 Microsoft Corporation Identifying a Discussion Topic Based on User Interest Information
US20100228743A1 (en) * 2009-03-03 2010-09-09 Microsoft Corporation Domain-based ranking in document search
US9836538B2 (en) 2009-03-03 2017-12-05 Microsoft Technology Licensing, Llc Domain-based ranking in document search
US20100262612A1 (en) * 2009-04-09 2010-10-14 Microsoft Corporation Re-ranking top search results
US8661030B2 (en) * 2009-04-09 2014-02-25 Microsoft Corporation Re-ranking top search results
US20100262610A1 (en) * 2009-04-09 2010-10-14 International Business Machines Corporation Identifying Subject Matter Experts
US20100274790A1 (en) * 2009-04-22 2010-10-28 Palo Alto Research Center Incorporated System And Method For Implicit Tagging Of Documents Using Search Query Data
US11278648B2 (en) 2009-07-10 2022-03-22 Boston Scientific Scimed, Inc. Use of nanocrystals for drug delivery from a balloon
US10369256B2 (en) 2009-07-10 2019-08-06 Boston Scientific Scimed, Inc. Use of nanocrystals for drug delivery from a balloon
US20110015664A1 (en) * 2009-07-17 2011-01-20 Boston Scientific Scimed, Inc. Nucleation of Drug Delivery Balloons to Provide Improved Crystal Size and Density
US10080821B2 (en) 2009-07-17 2018-09-25 Boston Scientific Scimed, Inc. Nucleation of drug delivery balloons to provide improved crystal size and density
US9477672B2 (en) 2009-12-02 2016-10-25 Gartner, Inc. Implicit profile for use with recommendation engine and/or question router
US8918391B2 (en) 2009-12-02 2014-12-23 Gartner, Inc. Interactive peer directory with question router
US20110160645A1 (en) * 2009-12-31 2011-06-30 Boston Scientific Scimed, Inc. Cryo Activated Drug Delivery and Cutting Balloons
US20110179002A1 (en) * 2010-01-19 2011-07-21 Dell Products L.P. System and Method for a Vector-Space Search Engine
US8661034B2 (en) 2010-02-03 2014-02-25 Gartner, Inc. Bimodal recommendation engine for recommending items and peers
US10102278B2 (en) 2010-02-03 2018-10-16 Gartner, Inc. Methods and systems for modifying a user profile for a recommendation algorithm and making recommendations based on user interactions with items
US8650172B2 (en) * 2010-03-01 2014-02-11 Microsoft Corporation Searchable web site discovery and recommendation
US20110213761A1 (en) * 2010-03-01 2011-09-01 Microsoft Corporation Searchable web site discovery and recommendation
US8972397B2 (en) 2010-03-11 2015-03-03 Microsoft Corporation Auto-detection of historical search context
US20110225192A1 (en) * 2010-03-11 2011-09-15 Imig Scott K Auto-detection of historical search context
US9785987B2 (en) 2010-04-22 2017-10-10 Microsoft Technology Licensing, Llc User interface for information presentation system
US20110307541A1 (en) * 2010-06-10 2011-12-15 Microsoft Corporation Server load balancing and draining in enhanced communication systems
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information
US8889211B2 (en) 2010-09-02 2014-11-18 Boston Scientific Scimed, Inc. Coating process for drug delivery balloons using heat-induced rewrap memory
US9251185B2 (en) 2010-12-15 2016-02-02 Girish Kumar Classifying results of search queries
US20130185294A1 (en) * 2011-03-03 2013-07-18 Nec Corporation Recommender system, recommendation method, and program
US9569499B2 (en) * 2011-03-03 2017-02-14 Nec Corporation Method and apparatus for recommending content on the internet by evaluating users having similar preference tendencies
US20120317075A1 (en) * 2011-06-13 2012-12-13 Suresh Pasumarthi Synchronizing primary and secondary repositories
US8862543B2 (en) * 2011-06-13 2014-10-14 Business Objects Software Limited Synchronizing primary and secondary repositories
US20130024939A1 (en) * 2011-07-19 2013-01-24 Gerrity Daniel A Conditional security response using taint vector monitoring
US9465657B2 (en) 2011-07-19 2016-10-11 Elwha Llc Entitlement vector for library usage in managing resource allocation and scheduling based on usage and priority
US9460290B2 (en) * 2011-07-19 2016-10-04 Elwha Llc Conditional security response using taint vector monitoring
US9558034B2 (en) 2011-07-19 2017-01-31 Elwha Llc Entitlement vector for managing resource allocation
US9443085B2 (en) 2011-07-19 2016-09-13 Elwha Llc Intrusion detection using taint accumulation
US8930714B2 (en) 2011-07-19 2015-01-06 Elwha Llc Encrypted memory
US9798873B2 (en) 2011-08-04 2017-10-24 Elwha Llc Processor operable to ensure code integrity
US9575903B2 (en) 2011-08-04 2017-02-21 Elwha Llc Security perimeter
US8669360B2 (en) 2011-08-05 2014-03-11 Boston Scientific Scimed, Inc. Methods of converting amorphous drug substance into crystalline form
US9056152B2 (en) 2011-08-25 2015-06-16 Boston Scientific Scimed, Inc. Medical device with crystalline drug coating
US9183279B2 (en) 2011-09-22 2015-11-10 International Business Machines Corporation Semantic questioning mechanism to enable analysis of information architectures
US9471373B2 (en) 2011-09-24 2016-10-18 Elwha Llc Entitlement vector for library usage in managing resource allocation and scheduling based on usage and priority
US9170843B2 (en) 2011-09-24 2015-10-27 Elwha Llc Data handling apparatus adapted for scheduling operations according to resource allocation based on entitlement
US8955111B2 (en) 2011-09-24 2015-02-10 Elwha Llc Instruction set adapted for security risk monitoring
US9098608B2 (en) 2011-10-28 2015-08-04 Elwha Llc Processor configured to allocate resources using an entitlement vector
US20130132433A1 (en) * 2011-11-22 2013-05-23 Yahoo! Inc. Method and system for categorizing web-search queries in semantically coherent topics
US9298918B2 (en) 2011-11-30 2016-03-29 Elwha Llc Taint injection and tracking
US10600011B2 (en) 2013-03-05 2020-03-24 Gartner, Inc. Methods and systems for improving engagement with a recommendation engine that recommends items, peers, and services
US9836765B2 (en) 2014-05-19 2017-12-05 Kibo Software, Inc. System and method for context-aware recommendation through user activity change detection

Also Published As

Publication number Publication date
US7702690B2 (en) 2010-04-20
WO2006071931A3 (en) 2007-07-19
US20060200556A1 (en) 2006-09-07
US20080040314A1 (en) 2008-02-14
US8601023B2 (en) 2013-12-03
EP1839210A2 (en) 2007-10-03
CN101137980B (en) 2016-01-20
EP1839210A4 (en) 2010-01-06
US20070150466A1 (en) 2007-06-28
WO2006071931A2 (en) 2006-07-06
CN101137980A (en) 2008-03-05
EP2581839A2 (en) 2013-04-17
US7698270B2 (en) 2010-04-13

Similar Documents

Publication Publication Date Title
US8601023B2 (en) Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US7546295B2 (en) Method and apparatus for determining expertise based upon observed usage patterns
Salehi et al. Personalized recommendation of learning material using sequential pattern mining and attribute based collaborative filtering
US20180225712A1 (en) Systems and methods for targeted advertising
US8661050B2 (en) Hybrid recommendation system
US10268641B1 (en) Search result ranking based on trust
KR101284875B1 (en) Systems and methods for analyzing a user's web history
US20080306934A1 (en) Using link structure for suggesting related queries
US20170011112A1 (en) Entity page generation and entity related searching
Chung et al. Business stakeholder analyzer: An experiment of classifying stakeholders on the Web
Gasparetti Modeling user interests from web browsing activities
Kundu et al. Formulation of a hybrid expertise retrieval system in community question answering services
Herder Forward, back and home again: analyzing user behavior on the web
Serdyukov et al. Being omnipresent to be almighty: The importance of the global web evidence for organizational expert finding
Senthilkumar et al. Collaborative search engine for enhancing personalized user search based on domain knowledge
Vijaya et al. Metasearch engine: a technology for information extraction in knowledge computing
Serdyukov Search for expertise: going beyond direct evidence
Hu et al. A personalised search approach for web service recommendation
Mali et al. Implementation of multiuser personal web crawler
Berntsen What is in a recommendation? The case of the bX Article Recommender
Rivera Organic Search Engine Optimization for Museum Websites in 2023: Strategies for Improved Online Visibility and Access
Kacem Personalized information retrieval based on time-sensitive user profile
Geiger et al. Current State of Personalized Task Recommendation
Ament-Gjevick Using Web Analytics—Archival Websites
Tachau Analysis of Three Personalized Search Tools in Relation to Information Search: iGoogle, LeapTag, and Yahoo! MyWeb

Legal Events

Date Code Title Description
AS Assignment

Owner name: BAYNOTE, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BRAVE, SCOTT;BRADSHAW, ROBERT;JIA, JACK;AND OTHERS;REEL/FRAME:020714/0054

Effective date: 20071012

AS Assignment

Owner name: GLENN PATENT GROUP, CALIFORNIA

Free format text: LIEN;ASSIGNOR:BAYNOTE, INC.;REEL/FRAME:022835/0293

Effective date: 20090617

AS Assignment

Owner name: BAYNOTE, INC., CALIFORNIA

Free format text: LIEN RELEASE;ASSIGNOR:GLENN PATENT GROUP;REEL/FRAME:022846/0020

Effective date: 20090618

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: KIBO SOFTWARE, INC., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BAYNOTE, INC.;REEL/FRAME:041525/0572

Effective date: 20170308

AS Assignment

Owner name: MONETATE, INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KIBO SOFTWARE, INC.;REEL/FRAME:061632/0605

Effective date: 20221027