JSTOR’s Text Analyzer: Moving Search Forward

Perhaps one of the most difficult parts of performing a search is coming up with a list of keywords or search terms that will produce the most relevant results. It is hard to know which search terms to use when you probably don't know much about the topic in the first place. Using the wrong terminology or keywords is one of the primary causes of poor or irrelevant results.

Now there is a tool that introduces an entirely new way to search: by uploading a document.

JSTOR is a highly respected scholarly research database, widely used by higher education institutions, that allows students to search and retrieve full-text articles spanning 75 disciplines. Business and Economics topic areas include Business, Developmental Studies, Economics, Finance, Labor & Employment, Management & Organizational Behavior, and Marketing & Advertising. (It's important to note that these are scholarly publications, not news or trade publications.)

JSTOR Labs has recently introduced JSTOR Text Analyzer. I like Berkeley librarian Cody Hennesy's description of it as "a reverse search engine."

How it works:

  1. Upload a document with text in it. This can be anything: a paper you’re writing, an outline of a work in progress, an article you just downloaded, even a picture of a page from a book. JSTOR does not store or share the text.
  2. The tool analyzes the text within the document to find key topics and terms used, and then uses the ones it deems most important — the “prioritized terms” — to find similar content in JSTOR.
  3. Review the results.
  4. Refine the results by adding or removing prioritized terms, or by changing how much weight each one carries.

You can upload or point to many kinds of text documents. If the file type you're using isn't supported, just copy and paste any amount of text into the search form to analyze it.

I have experimented with the tool several times by uploading various blog posts and even parts of my book. The results, if not always 100% correct, are always interesting. As stated on JSTOR's website, "Text Analyzer is still in beta and is, frankly, a machine. It's not perfect." Let me give you a couple of examples.

When I uploaded a recent post about procurement and innovation and the use of patent and business start-up data, the result is shown above. (No adjustments had yet been made.) What I find interesting is that my post was viewed by a "fresh set of eyes," outside the context of my blog, and still disruptive innovation, patents, patent searches, start-up firms, and procurement were deemed prioritized terms. One term that did not fit, but was humorously included, is Cephalopods, which was picked up because of a quote I used from Octopus Intelligence. I also have no idea why food service was included. JSTOR's result list appears on the right, giving you an idea of the articles retrieved with the unaltered first set of terms.

For another post, in which I refer often to TPD (Third Party Data Providers), the results included many terms associated with "Tissue Procurement," which threw the focus of the article off a bit. Even so, the tool still revealed valuable search terms that I would not have thought to associate with the article.

The fact that the results are still a little rough and not quite 100% correct should not deter you from using this tool. If a recommended topic isn't what you're looking for, JSTOR recommends removing it from the prioritized term list. If you're still not seeing what you want, add a few terms that are more on point. Also, keep in mind that my entries were not lengthy; the more text your document contains, the better.

This is an exciting addition to the growing number of tools being developed to help researchers find the most on-target articles and documents, using text analysis and machine learning to produce highly relevant results. That, in my opinion, is the future direction of search.

