Speech Recognition Technology

Back-End Speech Recognition Implementation Best Practices

Whitepaper & Toolkit

As healthcare moves toward its vision of a national health information network, the need for digital methods of dictation and transcription grows, and back-end speech recognition technology (SRT) continues to gain momentum in replacing traditional transcription. In this process, the role of the healthcare documentation specialist (HDS) has changed to that of SRT editor. An effective SRT editor needs the medical knowledge, training, and eye for detail that HDSs already possess, along with enhanced critical thinking skills, focus, and concentration to ensure a quality outcome. While variability in technologies and in originators’ dictation habits affects the accuracy of voice-to-text conversion, it is the SRT editor’s ability to weigh what the eyes see against what the ears hear and the brain knows that will produce the desired results.

Back-end speech recognition (BESR) has been recognized by the medical community as “the technology” for creating healthcare documentation. BESR has been promoted as a technology tool to increase productivity, improve turnaround, decrease costs, and provide a pleasing new production environment for the entire healthcare documentation team.

Speech Recognition Technology

Overview, Editing, Compensation

Speech recognition technology (SRT), also known as automated speech recognition (ASR), continuous speech recognition (CSR), or voice recognition (VR), refers to computer software systems that convert the spoken word to text. This technology is becoming increasingly prevalent in the healthcare field, as it is being marketed to institutions and physicians as a way to increase productivity and lower costs. Many medical transcriptionists (MTs) already use SRT in their jobs, and that trend will continue in the future.

There are two basic categories of SRT: front-end and back-end.

Front-End SRT

Front-end SRT includes consumer applications such as Dragon NaturallySpeaking by ScanSoft and ViaVoice by IBM. Users of front-end SRT dictate into a PC microphone, and the spoken words are converted to text in a word processing application in real time. This effectively eliminates the need for a transcriptionist. For front-end SRT to be as accurate as possible, the user must immediately correct the errors made by the software so that the program will “learn” the nuances of the user’s speech patterns. Relatively few physicians use front-end SRT at this time, as using it effectively takes considerably longer than simply dictating.

Back-End SRT

The category of SRT used by large institutions and clinics is back-end SRT. With this method, the actual speech-to-text conversion takes place after the speaker has dictated, rather than concurrently. The dictation is recorded in digital form at the time of dictation, and the digital voice files are then processed by a powerful computer running SRT software and converted to a draft text document. A human speech recognition editor must then listen to the voice file while proofreading the draft document, because even the most sophisticated SRT applications are not nearly accurate enough to eliminate the need for human review.

Speech Recognition Editing

Performing SRT editing successfully requires a somewhat more specialized skillset than that involved in traditional, manual transcription. There is a different eye/ear/brain coordination dynamic at work in SRT editing compared to transcribing, which often makes it more challenging to identify errors in an SRT-draft document. It is more common in SRT editing for the brain to be “tricked” into thinking that the eye has seen what the ear has heard, due to the lack of the tactile component of the fingers manipulating the keyboard to make the text characters appear on the screen. It is a common misconception that SRT editing is somehow easier than manual transcription or requires less knowledge and skill; this is most certainly not the case. It is true that the physical demands are somewhat decreased with SRT editing, as it does not require as much keyboarding, but the mental demands of SRT editing are greater, requiring more intense focus and concentration.

Compensation for SRT Editors

The methods and rates of compensation for SRT editors vary greatly, as is true for medical transcriptionists in general. Some employers pay an hourly rate, with or without benefits; others pay on production, by the line or character or some other unit of measure, again with or without benefits. Compensation for SRT editors working on production is often less per unit than for transcribing, with the assumption being that editing SRT is faster than manual transcription and therefore editors can process a greater number of lines, characters, etc., in the same amount of time. While this may be true to one degree or another, it is not unusual to find that the increase in productivity with SRT does not make up for the lower compensation rate, with the end result being that editing pays less than manual transcription. This is clearly inequitable, given that SRT editing is in some ways even more demanding than manual transcription.

MTs should evaluate SRT editing positions very carefully before signing a long-term contract, since it may not be known until after the fact whether the rate of compensation for an editing position is adequate. For an editing position that pays on production, the rate per unit itself is not the key factor; it is the level of productivity you achieve that will determine your actual compensation for a given period of time. If the compensation for SRT editing is half the rate for manual transcription, an MT must be able to process at least twice as many lines, characters, etc., in a given period of time in order to receive equivalent compensation. Whether this is a realistic goal depends a great deal on the accuracy of the speech recognition application, the user-friendliness of the transcription software, and whether that software allows the use of macros, text expanders, spell checkers, and similar aids.
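The break-even reasoning above can be worked through with a few lines of arithmetic. The following is a minimal sketch only: the function names are hypothetical, and the per-line rates and line counts are made-up illustrative figures, not actual industry pay data.

```python
def hourly_earnings(rate_per_line: float, lines_per_hour: float) -> float:
    """Earnings per hour for production-based pay (rate x output)."""
    return rate_per_line * lines_per_hour


def break_even_lines_per_hour(
    transcription_rate: float,
    transcription_lines_per_hour: float,
    editing_rate: float,
) -> float:
    """Lines per hour an editor must process to match transcription pay."""
    if editing_rate <= 0:
        raise ValueError("editing_rate must be positive")
    target = hourly_earnings(transcription_rate, transcription_lines_per_hour)
    return target / editing_rate


if __name__ == "__main__":
    # Illustrative assumptions: editing pays half the transcription rate.
    transcription_rate = 0.08   # hypothetical pay per line, transcribing
    editing_rate = 0.04         # hypothetical pay per line, editing
    current_output = 150        # hypothetical lines per hour when transcribing

    needed = break_even_lines_per_hour(
        transcription_rate, current_output, editing_rate
    )
    print(f"Transcribing earns {hourly_earnings(transcription_rate, current_output):.2f} per hour")
    print(f"Editing must reach {needed:.0f} lines per hour to match")
```

With the editing rate set to half the transcription rate, the required output is double the current output, which matches the rule of thumb in the text. Substituting the rates from an actual offer and your own measured productivity shows how much faster editing would need to be before the position pays as well as transcription.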