ATdt *********
CONNECT 1200

Login: XXXXX
Password:

Please wait....

Welcome to the Special Intelligence Government NETwork....

[256/879/SIGNET001/1H6C6L] : P.S.T.N. Access logged, last used 05/FEB/1990 13:20 GMT.

> dir
? unknown command 'DIR'
> ls
? unknown command 'LS'
> help
HELP knows about :
CRIS CCN STATUS IVAN CEDRIC PNC CAFS FTR CDIIIU CODA CODIN1 COP
> help ftr
. .
FTR [ Free Text Retrieval System]
[SIGNET/help/0065458inf/ftr]

FTR makes searching for random items of information much faster than previous methods. A search taking half an hour is a very long time indeed by computer standards, and the use of such time on a large system such as SIGNET is also costly.

In a conventional database, searching can be made more efficient by indexing more than one field of each record. FTR takes an alternative approach. Suppose, for example, that we want to search unstructured data like the text in this file. Free text retrieval can do that for us. A newspaper article - or thousands of other potential data sources like it - could be fed into a database without predefining any structure or context for the data concerned, and every substantive word of every record would be indexed. So every occurrence of any data item - whether it be in a newspaper report, a criminal records file, a report from an informant or the electoral register - can rapidly be located.

The characteristic of free text storage is that there is no need to define in advance what data will be entered, or any structure within which that data will appear. Because every significant word in the SIGNET FTR database (other than common words like 'the', 'of' or 'for') is indexed unless the user chooses otherwise, a lot of extra space is required.
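Indexing every significant word of every record, while skipping common words, is what is usually called an inverted index. A minimal Python sketch of the idea, assuming records are plain strings numbered by position and using a small hypothetical stop-word list (the help text itself only names 'the', 'of' and 'for'); the function names are illustrative, not SIGNET's:

```python
import re
from collections import defaultdict

# Hypothetical stop-word list: the help text only names 'the', 'of' and 'for'.
STOP_WORDS = {"the", "of", "for", "a", "an", "and", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records: list[str]) -> dict[str, set[int]]:
    """Map every significant word to the set of record numbers containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for record_id, text in enumerate(records):
        for word in tokenize(text):
            if word not in STOP_WORDS:  # common words are not indexed
                index[word].add(record_id)
    return dict(index)


records = [
    "Young was seen talking to the vicar of St Mary's.",
    "Report of a stolen car for sale in Leeds.",
    "Informant says Sandy owes money to a man called Young.",
]
index = build_index(records)
print(sorted(index["young"]))  # [0, 2]
print("the" in index)          # False: stop words are not indexed
```

Each lookup is then a single dictionary access instead of a scan of every record; the price is the storage the index itself occupies, which is the cost the help text goes on to describe.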
Instead of, say, one 5 gigabyte disc store, we should probably need three for the same amount of basic data, because of the space the indexes take up. The SIGNET computer's processor also has to be larger since, as well as answering the terminal operators' enquiries, it would have to maintain the many indexes, keeping them up to date as new data was entered, deleted, amended or moved around the storage system. For this reason, the extra expense of operating an FTR system can only be met by organisations - such as SIGNET - who expect many of their enquiries of the database to be of the unstructured, unpredictable kind.

Another aspect of FTR is the ability to provide a dictionary, thesaurus or 'concordance' of equivalent or similar terms or phrases. Different people entering data into the system may use different terms or descriptions for the same attribute: for example, describing eye colour variously as 'blue-grey', 'grey' or 'blue-green', or light brown hair as 'fair' or just 'brown'. Such a dictionary system will also make allowance for phonetically equivalent or near-equivalent names, for example by treating Smythe, Smith, Smiths and Schmitt as the same when searching the database. The SIGNET computer uses a particularly extensive system of this kind, called Soundex, when searching its criminal names or 'marked persons' indexes.

When making an enquiry of the SIGNET FTR database, the usual practice is to specify various words, names or attributes, and the ways in which they might occur together. The separate paragraphs of this text file form some of the many records in the SIGNET database, which uses FTR.

An Operative arrives with news that a reliable informant has phoned to say that a man called Young and, of all people, a vicar or a priest, whose name is unknown, plan to murder a man known as Sandy.
Typed on the VDU screen, the enquiry could look something like this :

FIND : Young + [vicar,priest] + Sandy

This is an instruction to the FTR software to look for any record which contains the name Young, refers to a vicar or a priest, and mentions someone called Sandy. There is no point in looking at everybody called Young - there would be too many. But someone who is called Young and who is associated with a priest or a vicar and with a man called Sandy might be a very good bet indeed. The SIGNET FTR system should search and reply within twelve seconds.

Other FTR systems which can be accessed via SIGNET :

STATUS - Met. Special Branch & 'C Department'.
IVAN - Home Office (immigration service).
CEDRIC - Customs and Excise.

Also, of course, PNC, the Police National Computer.

ADDENDUM :

The power of computers to handle and analyse large quantities of personal data was - until recently - constrained by technical limitations on the absorption of information. Printed information, such as a magazine article, was not 'machine-readable', which meant that a human operator had to type it into the computer's memory store. Database operators can now feed a magazine, newspaper or ordinary typed report page by page into a scanner; the computer 'reads' the page using optical character recognition (OCR), and no further typing is needed.

End. . .

> logout
OK
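The FIND enquiry in the session above joins terms with '+' (every term must appear) and lists alternatives in square brackets (any one will do). A minimal Python sketch of evaluating such a query against a word index, assuming exactly that reading of the syntax; `parse_query`, `find`, the sample records and the `marked_persons` list are hypothetical. The name matching uses standard American Soundex. Note that standard Soundex codes Smith, Smythe and Schmitt all as S530 but Smiths as S532, so a system that also merges Smiths, as the help text describes, would need extra rules beyond plain Soundex.

```python
import re
from collections import defaultdict

# Standard American Soundex digit groups.
SOUNDEX_CODES = {ch: digit
                 for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                      ("l", "4"), ("mn", "5"), ("r", "6"))
                 for ch in group}


def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")  # vowels and y have no code
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def build_index(records: list[str]) -> dict[str, set[int]]:
    """Inverted index: each word maps to the ids of the records containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for record_id, text in enumerate(records):
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(record_id)
    return dict(index)


def parse_query(query: str) -> list[set[str]]:
    """Turn 'FIND : Young + [vicar,priest] + Sandy' into groups of alternatives."""
    query = re.sub(r"^\s*FIND\s*:\s*", "", query, flags=re.IGNORECASE)
    groups = []
    for part in query.split("+"):
        # A bracketed part lists alternatives; a bare word is a group of one.
        words = part.strip().strip("[]").split(",")
        groups.append({w.strip().lower() for w in words if w.strip()})
    return groups


def find(query: str, index: dict[str, set[int]]) -> list[int]:
    """Ids of records matching every '+' group (AND) via any alternative (OR)."""
    result = None
    for group in parse_query(query):
        matches = set().union(*(index.get(word, set()) for word in group))
        result = matches if result is None else result & matches
    return sorted(result or set())


records = [
    "Young was seen with a priest near Sandy's flat.",
    "A vicar reported a stolen bicycle.",
    "Informant: Young and a vicar have threatened Sandy.",
    "Sandy Smythe was interviewed about the fire.",
]
index = build_index(records)
print(find("FIND : Young + [vicar,priest] + Sandy", index))  # [0, 2]

# Hypothetical 'marked persons' names, matched phonetically against Smith.
marked_persons = ["Smythe", "Schmitt", "Smiths", "Young"]
print([n for n in marked_persons if soundex(n) == soundex("Smith")])  # ['Smythe', 'Schmitt']
```

Because each '+' group is answered from the index and the resulting sets are intersected, the work depends on how many records contain each word rather than on the size of the whole database. Matching here is on exact words, so 'vicars' would not satisfy [vicar,priest] unless a thesaurus or stemming step added it.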