MIA: Volunteers: Transcription

How to Transcribe


This document describes a four-step procedure for transcribing documents: scanning, HTML markup, proofreading, and publishing. The intention of this document is to give potential volunteers an idea of the time and effort it takes to publish a document online, and to help transcribers plan a transcription project. These are simply the areas you need to cover; split them up however you'd like and take them in an order that works for you. Keep in mind -- this is a time-consuming process. The more thought and planning you give it, the more time you save! :)

Step 1: Scanning
  Labour Time: 10-30 seconds per page
  Requirements: Scanner (optional: with document feeder) & OCR software
A 500 page book, for example, can take around 4 hours to complete at the slower rate. Pace yourself! :) These times do not include the OCR itself, but that can run without you doing the work -- tell the OCR program to recognise the scanned materials while you pet your cat, have a cup of joe, or go out to your local union meeting. By the time you get back, depending on the size of the document and the speed of your computer, it will be done. If you are scanning a large book, note that some OCR programs allow you to train them. This is highly recommended -- do this for a few chapters, make corrections, and you will save yourself a lot of time on proofreading.

Optional -- a document feeder: A document feeder on a scanner can make this process somewhat faster, but you do run the risk of jams. Scanning a 500 page book on a fast scanner with a document feeder can be as quick as a 2 hour process! To accomplish this, follow these steps:
A. Rip the cover off the book.
B. Books are bound in different ways; typically, a book is bound in small packets. Thus, in a 500 page book, you may see 10 little packets. One by one, gently separate these from the glue that holds all the packets together.
C. Next, use scissors or a cutting board with each separate packet to separate the pages from each other.
D. When you take these to the scanner, put in no more than one packet at a time. Jams, or the feeder taking two pages instead of one, are fairly common, so shuffling through the papers to give them some wear will help keep this from happening. As frustrating as jams may be -- this process ends up being A LOT faster!

Step 2: HTML
  Labour Time: 1 minute per 4 pages
  Requirements: Text editor or HTML authoring program.
Please read the detailed instructions on HTML work. Here you'll place special tags on works quoted, emphasis (bold & italic), various headings (H1, H2, etc.), tables, and footnotes. The conversion to HTML on a 500 page document takes around 4 hours if you are doing it by hand. If you use the Perl script we have, the conversion itself takes about 5 minutes, plus maybe 30 minutes of touching up. Some helpful tips: use the find and replace feature as intelligently as you can, and learn things like regular expressions, which can be a huge time saver! It is healthy too -- save your hands from all that typing! :)
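As a rough illustration of the regular expression tip, here is a minimal Python sketch. It is not the archive's Perl script, and the conventions it relies on are assumptions made for the example: paragraphs are separated by blank lines, _underscores_ mark italics, and *asterisks* mark bold. The function name text_to_html is hypothetical, and the em/strong tags are one possible choice; check the detailed HTML instructions for the tags the archive actually wants.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turn plain OCR text into simple HTML paragraphs (illustrative only)."""
    # Assumption: a blank line separates one paragraph from the next.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        # Join the hard-wrapped lines of a paragraph into one line.
        joined = " ".join(para.split())
        # Escape &, < and > first so the tags added below are the only markup.
        body = html.escape(joined, quote=False)
        # Assumed markers: _word_ for italics, *word* for bold.
        body = re.sub(r"_(.+?)_", r"<em>\1</em>", body)
        body = re.sub(r"\*(.+?)\*", r"<strong>\1</strong>", body)
        out.append(f"<p>{body}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Value is _socially_ necessary\nlabour time.\n\nProfit & *rent*."
    print(text_to_html(sample))
```

The same patterns work in the find and replace dialog of most text editors that support regular expressions, so you can try them out on a single chapter before running anything over the whole book.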

Step 3: Proofread
  Labour Time: 1-5 minutes per page
  Requirements: Spell checker, patience, patience, patience.
This is the most time-consuming step. A lot of it depends on the quality of the scan, the OCR, the condition of the book, the typeset of the book, and the content of the book. The most difficult scenario is when you are working with a very old book in poor condition (so the scanner picks up all kinds of marks -- which you can help to avoid by changing the threshold of the scanner!), the subject is economics, so it is full of figures, and the typeset is old -- so ones are printed as the letter I, and so on. Proofing a book like this could easily consume over 100 hours of work. Don't do this without help! :) If you are in such a situation, set yourself a time limit that is acceptable, work up to that limit, then move on. In the best case scenario, with a great quality book, when you did a good job scanning at a good threshold and with training, and the content is fairly uniform -- proofreading can go as quickly as an 8 hour day for a 500 page book.
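For the old-typeset problem described above, a small script can point you at likely trouble spots before you read line by line. The following Python sketch is only one possible approach, and its rule is an assumption: it flags any token made up entirely of digits and the look-alike letters I, l and O, and containing at least one of each kind. The function names are hypothetical.

```python
import re

TOKEN = re.compile(r"[0-9A-Za-z]+")
LOOKALIKES = set("IlO")  # letters commonly confused with the digits 1 and 0


def suspicious_tokens(line: str) -> list[str]:
    """Return tokens that look like numbers containing I, l or O."""
    hits = []
    for tok in TOKEN.findall(line):
        has_digit = any(c.isdigit() for c in tok)
        has_lookalike = any(c in LOOKALIKES for c in tok)
        numeric_shape = all(c.isdigit() or c in LOOKALIKES for c in tok)
        if has_digit and has_lookalike and numeric_shape:
            hits.append(tok)
    return hits


def flag_lines(text: str) -> list[tuple[int, list[str]]]:
    """Return (line number, suspicious tokens) for each line with a hit."""
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        hits = suspicious_tokens(line)
        if hits:
            report.append((number, hits))
    return report


if __name__ == "__main__":
    sample = "In I867 wages rose to l0 shillings,\nor 1O0 pence, from 1848."
    for number, hits in flag_lines(sample):
        print(number, hits)
```

A check like this will not catch a lone "I" standing in for the figure 1, since that is indistinguishable from the pronoun, so it narrows the search rather than replacing a careful read.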

Step 4: Publish & link
Finally, you get to write up a notice for the what's new page, and send a link to the archive maintainer so they can link to the book. It doesn't hurt to look over your document or book once again to make sure the HTML and everything else looks good. After an hour or two, take a look at our Link reports and HTML errors page, an automated system that will let you know if you've made any mistakes.
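To help with planning, the per-page labour times quoted for Steps 1 to 3 can be turned into a rough project estimate. This Python sketch simply multiplies those rates by a page count; the names RATES_SECONDS_PER_PAGE and estimate_hours are hypothetical, and the results are only as good as the rough figures above. Note that the per-page HTML rate works out to about 2 hours for 500 pages, less than the 4 hours quoted in Step 2 for hand conversion, so treat every number here as a guide rather than a promise.

```python
# Labour times from Steps 1-3, as (fastest, slowest) seconds per page.
RATES_SECONDS_PER_PAGE = {
    "scanning": (10, 30),       # 10-30 seconds per page
    "html": (15, 15),           # 1 minute per 4 pages = 15 seconds per page
    "proofreading": (60, 300),  # 1-5 minutes per page
}


def estimate_hours(pages: int) -> dict[str, tuple[float, float]]:
    """Return (best case, worst case) hours for each step."""
    return {
        step: (pages * fast / 3600, pages * slow / 3600)
        for step, (fast, slow) in RATES_SECONDS_PER_PAGE.items()
    }


if __name__ == "__main__":
    for step, (best, worst) in estimate_hours(500).items():
        print(f"{step}: {best:.1f} to {worst:.1f} hours")
```

For a 500 page book this gives roughly 1.4 to 4.2 hours of scanning and 8.3 to 41.7 hours of proofreading, which matches the examples given in the steps above.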


Contact the Marxists Internet Archive Admin Committee for further information