“Check ’Em All!”: The Mathematics of Online Newspapers
Jill Ann Hurst • President, Hurst Associates, Ltd.
The smart-aleck response would be, “Go to a large structure outside the Harvard Book Co-Op in Cambridge, Massachusetts. You’ll find the walls covered with papers from everywhere on the planet.” Alas, travel budgets being what they are, you’ll have to do the next best thing: Go online. You might want to hit the Web, but searching individual newspaper Web sites, one at a time, downloading individual articles one after another, and enduring the World Wide Wait would make answering a “grab everything” request a mission impossible. Besides that, many — maybe most — newspaper Web sites do not provide lengthy archives. They may only load today’s paper or a week’s worth. Even for the archived sites, the sequence of searching would consume too much time.
So if you want to search all the papers available online, you need to go to all of the big commercial services: Dialog, LEXIS-NEXIS, and Dow Jones Interactive’s Publication Library (DJI). One search on all three ought to give you everything you can find online. Ah, but there’s a catch. How can you search the papers, cover every paper available, but eliminate the substantial duplication and the search charges that come with it? The services are not equal; each has exclusives.
Searching ALL, and I Mean
With the caveat that nobody actually puts everything online, finding everybody’s view on the McCain-Bush race, the failed Mars probe, or Austrian right-wing politicians depends greatly on where you search. Which service do you start with? Where should you go next for the rest of the papers? How can you prevent duplicating search results in the second and third services? Do you want the latest news or information years old? Using two resources published by Hermograph Press — Net.Journal Directory and DialogWeb/FT — we have compiled a list of which service has what newspapers in full text. We have also created search strategies designed to simplify the task of searching for every full-text newspaper in these three services.
By the way, there are minor differences between the treatment of newspapers on the classic and Web versions of all the services. For example, three newspapers appear on DialogClassic but not DialogWeb: The Times-Picayune (New Orleans), The Plain Dealer (Cleveland), and The Oregonian (Portland). For the sake of this exercise (actually, more like a Tae Bo session), we will confine ourselves to the full range of periodicals on the classic versions with little fear of leading you wrong.
We must now distinguish between “full text” and “collections.” For the former, we want a file that comes directly from the publisher of the paper. On Dialog, one finds The Los Angeles Times in its own file (File 630); on LEXIS-NEXIS, as LAT in the NEWS library; on Dow Jones Interactive, you can search the Los Angeles Times individually or as part of a larger newspaper grouping (e.g., Top 50 U.S. Newspapers). But Dow Jones carries many more newspapers in its Publications Library than just the top 50. We define “collections” as files like Bell & Howell’s ProQuest Newsstand and Business Dateline, Gale Group’s Marketing & Advertising Reference Service (MARS) and Accounting & Tax Database, or sources such as the Gannett News Service, Responsive Database Services’ Business & Industry, and Dialog’s World Reporter. All these sources may contain full-text newspaper articles, but only for very selected coverage. They make no claims to comprehensive coverage, so we will not cover them in this article.
Getting all newspapers, regardless of how many files and what kinds of files are on a service, is pretty easy, as long as you don’t mind wading through a big gooey mess. Oh, sure, all the newspapers on Dialog are in PAPERS and PAPERSNU. On LEXIS-NEXIS, you can select the NEWS Library. The Dow Jones Interactive Publication Library (hereinafter abbreviated DJI) has only one collection, of Top 50 papers; the rest you must select individually. Searching newspapers in large merged files is easy at the search stage, but a mess when you have to clean up the multiple search results, because many newspapers appear in two or all three of the services. So how about doing the search on one service and then going to the others and NOT duplicating the effort? Unfortunately, that is also a very messy option.
In Table 1 we list the newspapers found on each service. We’ve chosen only those newspapers listed as coming direct from the publisher in full text, not just selections from the newspapers based on an editor’s judgement call (such as the ill-defined “selected full text”).
For Dialog, 88 newspaper titles appear in the PAPERS and PAPERSNU OneSearch categories.
For DJI, we used the data that differentiates between abstract, full-text, and selected full-text resources. We pulled most titles out of their Top 50 Newspapers list and Regional News. This meant we also had to remove periodicals such as local business and industry journals from the listings and restrict coverage to general media newspapers. The final result for DJI was 106 titles.
For LEXIS-NEXIS we found some discrepancies in our listings depending on what resource we used to get the list. The online Source Locator used as a source for Net.Journal has dated information. Listings (one line per periodical) indicate the source of the data if it comes from a collection, but not the file name. We found listings obtained during an actual search that did not match the Source Locator listings exactly. The most accurate listings came from the PDF files downloaded from an insignificant link at the bottom of the Locator page. After much screaming and hollering, we wrestled up a list of 130 newspapers on LEXIS-NEXIS.
In some cases, more than one newspaper title appears in a file. For example, the Madison (WI) Capital Times and Wisconsin State Journal are separate files on LEXIS-NEXIS, but combined on the other services. For this experiment we consider them as one title.
Finally, it is rare that the services agree on how to label the newspapers. Is it the Daily News of Los Angeles, or the Los Angeles Daily News? How about Daily News (Los Angeles)? We chose whatever was convenient for us in making the table.
Only in a couple of outstanding exceptions did we allow other newspaper-like publications into this list. One could not justify leaving out The Wall Street Journal. And some small, sharply focused papers (Deseret News, for example) would not otherwise have been considered, but they met our criterion for inclusion (direct from the publisher), so there you have ’em.
There is a grand total of 169 periodicals in which to check that John McCain story or any other “check ’em all” request. For the statistically interested, just under one-third (54 titles) appear on all three services. About 40 percent are unique to just one service. Dialog and LEXIS-NEXIS’ unique offerings represent about one-quarter of their holdings. The DJI library has fewer unique titles,
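The overlap figures can be cross-checked with a little inclusion-exclusion arithmetic. The sketch below uses only the counts quoted in this article (88 Dialog titles, 106 DJI titles, 130 LEXIS-NEXIS titles, 169 distinct titles, 54 on all three). The variable names are ours, and the result assumes that all of those counts were tallied the same way.

```python
# Counts quoted in the article; variable names are illustrative.
holdings = {"Dialog": 88, "DJI": 106, "LEXIS-NEXIS": 130}
distinct_titles = 169   # grand total of different newspapers
on_all_three = 54       # titles carried by every service

# Every title is counted once per service that carries it, so
#   sum(holdings) = 1*exactly_one + 2*exactly_two + 3*on_all_three
#   distinct      =   exactly_one +   exactly_two +   on_all_three
# Subtracting the second line from the first isolates exactly_two.
extra_listings = sum(holdings.values()) - distinct_titles
exactly_two = extra_listings - 2 * on_all_three
exactly_one = distinct_titles - on_all_three - exactly_two

share_all_three = on_all_three / distinct_titles
share_unique = exactly_one / distinct_titles

print(f"on exactly two services: {exactly_two}")
print(f"on exactly one service:  {exactly_one} ({share_unique:.0%})")
print(f"on all three services:   {on_all_three} ({share_all_three:.0%})")
```

Under those assumptions, 68 titles sit on a single service (about 40 percent of the 169) and 47 sit on exactly two, which is where the duplicate search results come from.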
Searching, Searching ... Where
Since comprehensive coverage of newspaper archives online mandates using multiple systems, we assume that most searchers will start with the system they use most often or feel most comfortable with. We begin each example searching as many newspapers as possible in the system selected first, thereby maximizing the use of the preferred system. However, the least painful approach would be to search Dialog first, because Dialog has two OneSearch categories that cover all of its full-text newspapers, while excluding the partial collection files. LEXIS-NEXIS and Dow Jones do not have file groupings that cover all of their full-text newspapers while leaving out the collection files, though LEXIS-NEXIS comes much closer than Dow Jones. Unfortunately, LEXIS-NEXIS’ documentation does not carry consistent listings of the full-text newspapers in its system (as mentioned above), so the safest way to ensure that you reach all those newspapers is to create your own file grouping. This, of course, takes time to do. Dow Jones, in our experience, was the slowest to work with, because of the time it takes to create a custom group of files in that service.
Beginning with Dialog
In Dialog, search using the OneSearch Categories PAPERS (58 papers) and PAPERSNU (29 papers) (see Table 2). Both categories cover newspapers in the U.S. Because Dialog only allows you to work with 60 files at a time, you must search each grouping separately (see note 1).
Note that several newspapers are no longer being updated (e.g., The Houston Post). Rather than expending extra energy to remove these, keep them in your search since they will do no harm.
In DialogWeb, you can use the OneSearch category when doing a Command Search. The Command Search allows you to use most of the commands found in DialogClassic. In the Guided Search facility, the U.S. newspapers are categorized into four geographic regions, with a fifth covering those from the rest of the world. This leaves searchers looking for complete coverage of all newspapers conducting multiple searches, typing in requests over and over, region by region.
Once you have completed your search in Dialog, you should search next in LEXIS-NEXIS. We are using LEXIS Universe via the Internet.
Select Search from the menu on the left side of the screen. On the Search menu, select Advance Search.
We will search only those newspapers that appear in LEXIS-NEXIS, but not in DialogClassic.
Select Source Directory.
You must select each source separately, using the menu available. It will take time to do. Using the Find command can speed up the process a little. Select the newspapers, not the special sections (e.g., obituaries), that you see listed.
You will select 68 newspapers. The Hartford Courant cannot be combined with other sources in LEXIS-NEXIS due to license restrictions, so you will have to search it separately (bringing the count to 69 newspapers being searched in LEXIS-NEXIS).
Note: If you save your source list for future use, you cannot add to that source list or modify it. So you must get it right the first time.
You may add this personalized list to the online Source List for future use in Advanced Search. (Do it now! Save yourself work later!)
Finally, run a search in Dow Jones Interactive.
Once signed on, select Publication Library.
Select Change Publications. Remove all publications and publication groups listed in the window on the right. Begin selecting in the window on the left the unique 13 publications on Dow Jones that you need to search. (As with LEXIS-NEXIS, this will take time, so be patient.)
Once completed, select Save List to save this custom list. You will be prompted to name the list (up to 25 characters, including spaces). Once you have done this, this self-created source grouping will be available to you whenever you search the Publications Library (see note 2).
You can now run your search on these 13 publications.
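The walk-through above boils down to set subtraction: on each service after the first, search only the titles that no earlier service has already covered. Here is a minimal sketch of that idea. The function name `plan_searches` and the toy holdings are ours, purely for illustration; they are not the real Table 1 data.

```python
def plan_searches(holdings, order):
    """Return, per service in `order`, the titles not already covered earlier."""
    covered = set()
    plan = {}
    for service in order:
        new_titles = holdings[service] - covered
        plan[service] = sorted(new_titles)
        covered |= new_titles
    return plan


# Toy holdings for illustration only; NOT the real Table 1 data.
holdings = {
    "Dialog": {"Paper A", "Paper B", "Paper C"},
    "LEXIS-NEXIS": {"Paper A", "Paper B", "Paper D", "Paper E"},
    "DJI": {"Paper A", "Paper D", "Paper F"},
}

plan = plan_searches(holdings, ["Dialog", "LEXIS-NEXIS", "DJI"])
for service, titles in plan.items():
    print(service, len(titles), titles)
```

Changing the order of the services changes how many titles land on each one, but every title is still searched exactly once.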
Beginning with LEXIS-NEXIS
LEXIS-NEXIS contains 130 full-text, active U.S. newspapers, 32 of which appear only in LEXIS-NEXIS (including all the back files for The New York Times beyond 90 days), not in Dialog or Dow Jones. To run the search, we are using LEXIS Universe via the Internet.
As we did in the first example, select Search, then select Advance Search. Select Source Directory. You must select each source separately, using the menu available. It will take time. Use the Find command to speed up the process a little. Select the newspapers, not the special sections (e.g., obituaries), that you see listed.
You will select 129 newspapers. The Hartford Courant cannot be combined with other sources in LEXIS-NEXIS due to license restrictions, so you will have to search it separately, bringing our total to 130 newspapers.
Once your list is created, you may add this personalized list to the online Source List for future use in Advanced Search. (Again, do it NOW!)
Next search Dow Jones Interactive’s unique 13 publications.
This will not include the Dialog titles. Why not search the newspapers found on Dow Jones Interactive that also appear on Dialog? Because, frankly, it takes more time to search them in DJI than in Dialog. If you prefer to search more than the “uniques” in DJI, the technique described here will still work; just adjust the number of newspapers being searched.
Once signed on, select Publication Library. Select Change Publications. Remove all publications and publication groups listed in the window on the right. Begin selecting in the window on the left the 13 publications that you need to search.
Once completed, select Save List to save this custom list. You will be prompted to name the list (up to 25 characters, including spaces). Once you have done this, you can use this self-created source grouping whenever you search the Publications Library.
You can now run your search on these 13 newspapers.
Finally, search DialogClassic for the remaining 29 full-text, active U.S. newspapers.
In DialogClassic, use the Begin (B) command to create an ad hoc OneSearch category of the 29 newspapers unique to the system. The Begin command allows you to create an ad hoc OneSearch category quickly, so you will spend less time here than you did in LEXIS-NEXIS and Dow Jones. You may save this personalized list of Dialog databases for future use. To do so, use the ALIAS command. The ALIAS command allows you to create a shortcut, a single term that stands for a much longer command. In this case, you could create a one-word command (e.g., JPAPERS) to Begin a group of databases. For example:
ALIAS JPAPERS B 576, 708, 744, 644, 684, 487, 645, 488, 489, 643,
536, 478, 721, 539, 702, 633, 486, 634, 701, 720, 788, 490, 723, 642.
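If you keep your file numbers in a list, a few lines of script can assemble the Begin string for you and split it whenever a group would exceed the 60-file ceiling mentioned earlier. This is only a text-formatting sketch: the helper name `begin_commands` is ours, and it does nothing more than build the strings you would then type or paste into Dialog.

```python
MAX_FILES_PER_BEGIN = 60  # Dialog limit cited earlier in the article


def begin_commands(file_numbers, limit=MAX_FILES_PER_BEGIN):
    """Format de-duplicated file numbers as one Begin command per batch."""
    unique = list(dict.fromkeys(file_numbers))  # drop repeats, keep order
    batches = [unique[i:i + limit] for i in range(0, len(unique), limit)]
    return ["B " + ", ".join(str(n) for n in batch) for batch in batches]


# File numbers copied from the ALIAS example above.
jpapers = [576, 708, 744, 644, 684, 487, 645, 488, 489, 643,
           536, 478, 721, 539, 702, 633, 486, 634, 701, 720,
           788, 490, 723, 642]

for command in begin_commands(jpapers):
    print(command)
```

With the file numbers shown, everything fits in a single Begin command; a longer list would come back as several commands to run one after another.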
Beginning with Dow Jones Interactive
Dow Jones Interactive (DJI) contains 106 full-text, active U.S. newspapers, 13 of which appear only in Dow Jones, not in LEXIS-NEXIS or Dialog. DJI in its Publications Library presents a variety of publication groupings, but does not present a grouping that only contains these active full-text newspapers. Therefore you will need to create your own list.
In the Publications Library, view the sources by title and select “Change Publications” to create your own source list. DJI allows you to create a source list that contains up to 50 publication groups or individual publications. So covering all their active, full-text newspapers will require you to create three source lists (no matter their geographic location).
Once in the Change Publications menu, remove all publications and publication groups listed in the window on the right. Begin selecting in the window on the left publications that you need to search (this will take time, so be patient).
Once you have a list created, click “Save List” towards the bottom of the screen. You will be prompted to name the list (up to 25 characters, including spaces). Do this two more times to include all the papers. Once you have done this, these self-created source groupings will be available to you whenever you search the Publications Library.
Once you have created your custom publication lists, you can run your search. Since you can only search one publications list at a time, you will need to run three separate searches.
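The three-list requirement follows directly from the limits just described: 106 titles at no more than 50 per list. The sketch below splits a title list into batches and checks the 25-character name limit. The placeholder titles and the "US Papers" naming scheme are our own assumptions, not anything DJI prescribes.

```python
import math

MAX_PUBS_PER_LIST = 50   # DJI source-list limit stated above
MAX_NAME_LENGTH = 25     # list-name limit, spaces included


def split_into_lists(titles, prefix="US Papers"):
    """Split titles into named batches that fit DJI's stated limits."""
    batches = {}
    for start in range(0, len(titles), MAX_PUBS_PER_LIST):
        name = f"{prefix} {start // MAX_PUBS_PER_LIST + 1}"
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"list name too long: {name!r}")
        batches[name] = titles[start:start + MAX_PUBS_PER_LIST]
    return batches


# Placeholder titles standing in for the 106 DJI newspapers.
titles = [f"Newspaper {n}" for n in range(1, 107)]
lists = split_into_lists(titles)

print(math.ceil(len(titles) / MAX_PUBS_PER_LIST), "lists needed")
for name, batch in lists.items():
    print(name, len(batch))
```

The last list holds only the six leftover titles, but it still costs you a separate search.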
Next go to LEXIS-NEXIS Universe for 31 newspapers.
Select Search, then select Advance Search. Select Source Directory. You must select each source separately, using the menu available. It will take time. Using the Find command can speed up the process a little. Select the newspapers, not the special sections (e.g., obituaries) that you see listed.
You will select 31 newspapers (not including those available in Dialog). You will have already searched the Hartford Courant in Dow Jones and so will not have to search it separately here.
Once your list is created, you may add it to the Source List for future use in Advanced Search.
After you have run your search here, go to DialogClassic to search the remaining 32 full-text, active newspapers.
Use the Begin (B) command to create an ad hoc OneSearch category of these 32 newspapers. Unlike working in LEXIS-NEXIS and Dow Jones Interactive, you will be able to do this quickly. Run your search.
Caveats, Time Limits, and
Generally, all full-text files include current issues (minus a short time lag to upload the latest issues). The files differ in their start dates, however. Check out Table 3 below.
If your search focuses on current material (and that includes most of the 1990s), you are generally OK, no matter where you start your tri-service searching. But if you do deep background archival research, you may have to search on a service you don’t normally use, because your favorite online service doesn’t go back far enough. That, of course, will alter the above search strategies somewhat. Or, in desperation to reach any and every bit of newspaper coverage, you may reach for Business Dateline and other collections to ensure absolutely complete coverage that may even include newspapers not used in this exercise.
Results were similar when we looked at non-U.S. newspapers, only in smaller quantities.
For example, if you want Australian news, you are pretty much limited to whatever DJI has. For nearby New Zealand, there is file 755 on Dialog, a file on LEXIS-NEXIS, and the material on DJI, all of which seem to cover the same newspapers and generally the same date ranges (mostly 1997 or 1998 to the present).
Except for the two main London papers, you basically have to go with the “collection” file content for the U.K. Bell & Howell’s ProQuest Newsstand and Dialog’s World Reporter cover the bulk of the rest on Dialog. Dow Jones and LEXIS-NEXIS go pretty much toe to toe in coverage. Except for the Times, Independent and Manchester Guardian, most only go back to 1997 or 1998.
A major file that we would classify as a “collection,” CANADIAN NEWSPAPERS, covers Canada in full text on Dialog. Most of the other collection files actually stopped coverage in or around 1998.
newspapers can be found, if you are lucky, in World Reporter or ProQuest
Newsstand on Dialog. Dow Jones has made a special effort at tapping into
papers from smaller nations such as Egypt, Italy, and Jamaica. Nevertheless,
the pickings are a lot slimmer than what you’d find in that Harvard newspaper
is publisher at Hermograph Press and editor of Net.Journal Directory.
Jill Ann Hurst is president of Hurst Associates, Ltd. and author of Hermograph
1. In DialIndex, you can use the category PAPERSUS to search all U.S. newspapers as one group.
2. DJI allows you to create up to 25 publication lists.
Table 1: System Coverage of Full-Text U.S. Newspapers
Key Guide: Bold means title found only on one service. Italics means title found on all three services.
Table 2: OneSearch Categories PAPERS
SYSTEM:OS - DIALOG OneSearch
File 146:Washington Post Online 1983-2000/Mar 04
(c) 2000 Washington Post
File 471:New York Times Fulltext-90 Day 2000/Mar 04
(c) 2000 The New York Times
File 489:The News-Sentinel 1991-2000/Mar 03
(c) 2000 Ft. Wayne Newspapers, Inc
File 490:Tallahassee Democrat 1993- 2000/Mar 03
(c) 2000 Tallahassee Democrat
File 492:Arizona Repub/Phoenix Gaz 1986-2000/Feb 06
(c) 2000 Phoenix Newspapers
File 494:St LouisPost-Dispatch 1988-2000/Mar 04
(c) 2000 St Louis Post-Dispatch
File 496:The Sacramento Bee 1988-1999/Jan 10
(c) 2000 Sacramento Bee
*File 496: This file will not be updated until further notice.
The database is current to Jan. 10, 2000.
Set Items Description
? SAVE TEMP
Set Items Description
Data used in this article, especially in Table 1, came from the following publications:
Net.Journal Directory is a semi-annual compilation of periodical archives on Web services such as DialogWeb, Dow Jones’ Publication Library (found on its Dow Jones Interactive, Wall Street Journal Interactive Edition, and other Factiva sites), LEXIS-NEXIS, various middling-sized Web services, and hundreds of freestanding Web sites. It lists not only the coverage dates but also the formats of the articles (text, text with graphics, PDF, RealPage, etc.) and cost per article. It only lists bona fide article archives, no advertising “brochures.”
Net.Journal Directory will be a Web service in the second quarter of 2000. This online version will be customizable to individual subscribers and available to library networks. Web-only users will pay a fee, but subscribers to the hard-copy editions will pay no charge for access.
DialogWeb/FT goes into great detail on the full-text periodical holdings of DialogClassic and DialogWeb, by year and file number and variations on the names. It also includes an illustrated tutorial on searching DialogWeb. Readers can compare file contents year by year, locating duplicate, alternative, and missing coverages and prices for articles.
OUR FAVORITE TRIO
Editor, Searcher Magazine
the commercial search services now offer flat-fee or subscription arrangements,
particularly to major clients. Almost all the subscription prices are negotiable
on a whole range of factors, including price — access to content, number
of users allowed, etc. The prices quoted in this article for LEXIS-NEXIS
(and other commercial services) reflect the retail price only. However,
even those who choose subscription options have to consider the official
price, since they will undoubtedly see it rise again when renewal time
rolls around in the form of vendor assessments of the value received by
the subscriber under the existing contract.
Nothing new about that. What does distinguish LEXIS-NEXIS from the other traditionals is the opportunity it offers for substantial savings.
“Savings! LEXIS-NEXIS!?,” I can hear some of you saying in shocked incredulity. Yes, you read me right. Admittedly, the cost savings depend on applying specific search techniques, which may not suit all searches, but the opportunity arises consistently enough to make it worth a professional searcher’s time to learn those techniques. Experienced LEXIS-NEXIS searchers have known about the technique for years, but “last gasp” Dow Jones Interactive/Factiva or Dialog searchers should probably take another look at LEXIS-NEXIS.
Basically, LEXIS-NEXIS’ standard transactional pricing model charges by the search and then adds a fixed, fairly high fee for every item printed or downloaded. In this article, the databases discussed cost $35 to search and $2.75 to print or download. The trick lies in the fact that, unlike the other commercial services, you don’t actually have to print or download results to get them on LEXIS-NEXIS. If you still use their proprietary dial-up software, you can conduct a search, turn on the “Session Record” capture function, and just page through all the results in whatever format you choose until you’ve gotten them all. All you will pay is the cost of the search, in this example, $35. Only if you order the system to print or download the items as a set will you pay the per-item fees.
Now this only works really well if you can put your search requests into one or two large statements that run against all the pertinent data sources. If your search request requires lots of back-and-forth interaction where the strategy evolves out of browsing results, or if LEXIS-NEXIS’ groupings do not tie together the sources you need, then this approach will not work. However, if you do know what you want to ask and the “library” carries all the material, try it out. By the way, you can still save the money and group results into sets convenient to the interest of one or more clients. LEXIS-NEXIS has a Focus command that lets you move through a set of results looking for specific terms, sort of a postsearch subsearch. So you could do one giant search statement for five topics and then display five sets of downloaded results.
What’s the catch? Carpal tunnel syndrome and a forefinger one digit shorter on one hand than the other. Oh yes, and the endless time you will spend plunking for pages. Sigh. On the other hand, if you charge by the hour, as information brokers often do, you’ll put a lot more money in your own pocket than in the vendor’s. Now, if the client wants it right away or you — like all of us — “have better things to do with your time than sit around here all day...,” well, then, LEXIS-NEXIS is not for you.
Suffice it to say, however, that the other day one searcher of my acquaintance got 128 trade press journal articles for her client for which she paid LEXIS-NEXIS $53 and the client paid her $97.50. Total price to the client? $150.50. Using Dow Jones Interactive’s standard $2.95 price for the same kind of material from the same general suppliers, those 128 articles would have cost $377.60. So the client still saved $227.10. On the other hand, the searcher may never play the piano again (“Next Page...Next Page...Next Page...Next...”), but — what the heck! — now she can afford a boom box.
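The arithmetic in that anecdote is easy to re-run with your own numbers. The sketch below uses the figures quoted above. The final line, which prices the same job at LEXIS-NEXIS print/download rates, assumes a single $35 search, something the anecdote does not actually state.

```python
from decimal import Decimal

articles = 128
paid_to_lexis_nexis = Decimal("53.00")   # searcher's vendor bill
paid_to_searcher = Decimal("97.50")      # searcher's fee to the client
dji_price_per_article = Decimal("2.95")  # DJI standard per-article price

client_total = paid_to_lexis_nexis + paid_to_searcher
dji_total = articles * dji_price_per_article
client_savings = dji_total - client_total

# For comparison: the same job at LEXIS-NEXIS retail print/download rates,
# assuming a single $35 search (an assumption; the anecdote does not say).
ln_retail_total = Decimal("35.00") + articles * Decimal("2.75")

print(f"client pays:        ${client_total}")
print(f"DJI equivalent:     ${dji_total}")
print(f"client saves:       ${client_savings}")
print(f"LN print/download:  ${ln_retail_total}")
```

Decimal is used instead of floating point so the dollar amounts come out to the cent.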
So why haven’t
you seen LEXIS-NEXIS trumpeting this technique in the press and lording
it over their traditional competitors? Frankly, I don’t know, but I have
my suspicions. Could it be that they don’t want the database producers
to find out “how low they can go,” as the poet sayeth? Well, the cat’s
out of the bag now. “Hello, Dayton!! Send me a form.”