Project Description

Introduction · Running Time of the Project · Design Criteria · Text Encoding · Linguistic Annotation of Texts · Storage and Documentation of Texts · Corpus Retrieving Software · List of Text Distributors

Introduction

The Corpus of Professional English (CPE) is a major PERC research project, currently underway, that will on completion consist of a 100-million-word computerized database of English used by professionals in science, engineering, technology and other fields. The CPE will be used for research as well as for the development of educational resources, such as specialized dictionaries, handbooks, language tests, and other materials useful to working professionals and professionals-in-training.

When the corpus is complete, portions of it will be made available to researchers, who will be able to retrieve various kinds of linguistic information via the Internet with our original search software. The software is programmed so that users cannot extract complete sample texts, which might infringe copyright law. General researchers will be charged a minimal fee for access to the CPE to cover the running costs of the online search system.

The publishers in the consortium clearly recognize the dangers inherent in electro-copying and are as concerned as you are that the CPE should not allow the abuse of copyrighted texts. A text sample, by being included in the CPE, loses none of the protection afforded by copyright law. The End User license strictly controls the use of the CPE and the text samples it contains. Reproduction of individual original text samples by any means is explicitly forbidden. None of the text samples in their original form will be incorporated into any product. Quotations from text samples will be strictly limited by the fair dealing provisions of copyright law.
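The restriction described above, under which a search returns linguistic information but never a complete sample text, can be illustrated with a keyword-in-context (KWIC) search that exposes only a fixed number of words on either side of each hit. The following is a minimal sketch, not the CPE search software itself: the function name `kwic`, the tokenization rule, and the `window` and `max_hits` limits are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, window=5, max_hits=20):
    """Return (left, keyword, right) tuples with at most `window` words per side.

    Hypothetical helper: it illustrates bounded-context retrieval only.
    """
    # Simple word tokenizer; keeps internal hyphens and apostrophes.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        hits.append((left, tok, right))
        if len(hits) >= max_hits:  # cap the number of lines returned
            break
    return hits

# Invented sample text, used only to demonstrate the function.
sample = ("The enzyme was purified by affinity chromatography. "
          "Activity of the enzyme was measured at 37 degrees.")

for left, kw, right in kwic(sample, "enzyme", window=3):
    print(f"{left:>20} [{kw}] {right}")
```

Because each result carries only a few words of context and the number of results is capped, a user sees how a word is used without being able to reassemble the source text from a single query. A real service would need further safeguards, for example against stitching together overlapping results.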
Typical uses of the CPE will include research on professional English and the development of educational materials, such as specialized dictionaries and other educational resources that require accurate information about word meanings and usage, collocations, and other relevant linguistic data.

Running Time of the Project

December 2001 to December 2003 (1st phase, for science and technology texts)

Design Criteria (1st phase)

A) Monolingual
B) Professional writing (academic standard of texts)
C) Synchronic (1995-2001)
D) Regional variety (AmE/ BrE/ etc.)
E) Sample (50,000 words per text/ full text/ etc.)
F) Selection criteria
   domains: science and technology including life science (based upon "Journal Citation Report")
   media: academic journals, trade magazines, textbooks, web pages, etc.

Text Encoding

The following information will be indicated by the mark-up:

1. Boundaries and parts of speech
2. Sentence structure identified by a POS tagger
3. Paragraphs, sections, headings and similar features in written texts
4. Meta-textual information about the source or encoding of individual texts

The XML format will be adopted for text encoding.

Linguistic Annotation of Texts

The grammatical tagging of the text will be done in collaboration with Lancaster University (UCREL).

Storage and Documentation of Texts

Detailed descriptive information will be added to each text in the form of a header: the author's name, title, publication year, journal title, etc.

Corpus Retrieving Software

Web-based multi-functional search software developed by the Shogakukan Multimedia Department will be used. The same software is also to be adopted for the online BNC search service and the online CobuildDirect search service, which Shogakukan administers for Japanese users with authorization from the BNC and HarperCollins.
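The header and structural mark-up described under Text Encoding and Storage and Documentation of Texts might look something like the sketch below, which builds a small XML document with Python's standard `xml.etree.ElementTree` module. The project description does not specify a schema, so the element names (`text`, `header`, `body`, `section`, `head`, `p`) and the function `encode_text` are hypothetical, and the bibliographic values are invented placeholders.

```python
import xml.etree.ElementTree as ET

def encode_text(meta, sections):
    """Build an XML tree with a descriptive header and structural mark-up.

    All element names here are assumptions, not the CPE schema.
    """
    root = ET.Element("text")

    # Header: descriptive information about the source text.
    header = ET.SubElement(root, "header")
    for field in ("author", "title", "year", "journal"):
        ET.SubElement(header, field).text = str(meta[field])

    # Body: sections, headings and paragraphs.
    body = ET.SubElement(root, "body")
    for heading, paragraphs in sections:
        sec = ET.SubElement(body, "section")
        ET.SubElement(sec, "head").text = heading
        for para in paragraphs:
            ET.SubElement(sec, "p").text = para
    return root

# Placeholder metadata and content, invented for this example.
meta = {
    "author": "A. Author",
    "title": "An Example Article",
    "year": 2001,
    "journal": "Example Journal",
}
sections = [
    ("Introduction", ["First paragraph.", "Second paragraph."]),
]

doc = encode_text(meta, sections)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the header separate from the body means a search tool can filter texts by publication year or journal without reading the text itself. Part-of-speech tags, which the project assigns in a separate annotation step, are left out of this sketch.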
List of Text Distributors as of September 30, 2004

Acta Biochimica Polonica/ Adis International Ltd./ Agricultural Economics Society/ A K Peters, Ltd./ American Association of Avian Pathologists/ American Association of Neurological Surgeons/ American Astronautical Society/ American College of Allergy, Asthma & Immunology/ American College of Veterinary Pathologists/ American Meteorological Society/ American Physiological Society/ American Public Health Association/ American Society for Biochemistry and Molecular Biology/ American Society for Pharmacology and Experimental Therapeutics/ Annals Publishing Company/ Bailliere Tindall/ CAB International HQ, UK/ Canadian Medical Association/ Canadian Meteorological and Oceanographic Society/ Cell Press/ CIC Edizioni Internazionali s.r.l./ CRC Press UK, Parthenon Publishing/ CRYO Letters/ CSIRO Publishing/ Editio Cantor Verlag/ Editions Scientifiques et Medicales Elsevier/ Elsevier Science/ Elsevier Science Academic Press/ Elsevier Science BV/ Elsevier Science Churchill Livingstone/ Elsevier Science Inc./ Elsevier Science Ireland Ltd./ Elsevier Science London/ Elsevier Science Ltd./ Entomological Society of America/ Fund for the Replacement of Animals in Medical Experiments/ Georg Thieme Verlag.KG/ Harcourt Health Sciences/ Heron Publishing/ Hodder Arnold/ IEEE/ Institut Mittag-Leffler/ Institute of Mathematics, Polish Academy of Science/ International Statistical Institute/ International Union of Pure and Applied Chemistry/ IP Publishing Ltd/ Laser Institute of America/ Medscape/ Mineralogical Society of America/ National Research Council of Canada/ New Zealand Veterinary Association Inc./ Pharmacotherapy/ Project HOPE-The People-to-People Health Foundation, Inc./ Pulp and Paper Technical Association of Canada/ Psychology Press Ltd./ Society for Applied Spectroscopy/ Society for Neuroscience/ Society for Range Management/ Society of Glass Technology/ The American Ceramic Society/ The American Society of Plant Biologists/
The Biochemical Society, Portland Press/ The Company of Biologists Ltd./ The Current Science Association/ The European Respiratory Society/ The Geological Society/ The Histochemical Society/ The Institution of Chemical Engineers/ The Journal of Bone & Joint Surgery/ The Lancet/ The Mayo Foundation for Medical Education and Research/ The Medical Letter, Inc./ The Mineralogical Society/ The National Association of Corrosion Engineers International/ The National Marine Fisheries Service/ The Rockefeller University Press/ The Royal College of Psychiatrists/ The Royal Society/ The Royal Society of Chemistry/ The Royal Society of New Zealand/ The Wildlife Society/ Urban & Fischer Verlag GmbH & Co.KG/ VSP/ W.B. Saunders