Transcription of Genetic Code

The process of copying genetic information from one strand of the DNA into RNA is called transcription. Here too, the principle of complementarity governs the process, except that adenine pairs with uracil instead of thymine. However, unlike replication, in which the total DNA of an organism is duplicated once the process sets in, transcription copies only a segment of DNA, and only one of its strands, into RNA. This necessitates defining the boundaries that demarcate the region and the strand of DNA to be transcribed.

There is a simple answer to why both strands are not copied during transcription. First, if both strands acted as templates, they would code for RNA molecules with different sequences, and if these in turn coded for proteins, the amino acid sequences of the proteins would be different. Hence, one segment of DNA would code for two different proteins, which would complicate the machinery of genetic information transfer. Second, the two RNA molecules, if produced simultaneously, would be complementary to each other and would form a double-stranded RNA. This would prevent translation of the RNA into protein.

Transcription Unit

A transcription unit in DNA is defined primarily by three regions in the DNA:

A Promoter
The Structural gene
A Terminator

Since DNA-dependent RNA polymerase catalyses polymerisation in only one direction, 5'→3', the strand with 3'→5' polarity acts as the template and is called the template strand. The other strand, which has 5'→3' polarity and the same sequence as the RNA (except for thymine in place of uracil), is displaced during transcription. Strangely, this strand, which does not code for anything, is called the coding strand. All reference points in defining a transcription unit are made with respect to the coding strand. To illustrate, a hypothetical sequence from a transcription unit is represented below:

3'-ATGCATGCATGCATGCATGCATGC-5' Template Strand
5'-TACGTACGTACGTACGTACGTACG-3' Coding Strand
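
The relationship between the two strands and the transcript can be checked with a short Python sketch. The names `PAIRING` and `transcribe` are illustrative choices, and the sequences are the hypothetical ones shown above. The template is read in the orientation printed (3'→5'), so the RNA comes out written 5'→3'.

```python
# Base-pairing rule during transcription: DNA template base -> RNA base.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') for a template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)

template = "ATGCATGCATGCATGCATGCATGC"  # template strand, 3'->5'
coding = "TACGTACGTACGTACGTACGTACG"    # coding strand, 5'->3'

rna = transcribe(template)
print(rna)
# The RNA carries the coding-strand sequence with U in place of T.
print(rna == coding.replace("T", "U"))  # True
```

This is why the coding strand, although it is not copied, serves as the reference: its sequence can be read directly as the sequence of the RNA.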

The promoter and terminator flank the structural gene in a transcription unit. The promoter is located towards the 5'-end (upstream) of the structural gene, with reference to the polarity of the coding strand. The terminator is located towards the 3'-end (downstream) of the coding strand and usually defines the end of transcription. Additional regulatory sequences may be present further upstream or downstream of the promoter.

Transcription Unit and the Gene

A gene is defined as the functional unit of inheritance. Though there is no ambiguity that genes are located on the DNA, it is difficult to define a gene literally in terms of DNA sequence. A DNA sequence coding for a tRNA or rRNA molecule also defines a gene. However, if a cistron is defined as a segment of DNA coding for a polypeptide, the structural gene in a transcription unit can be described as monocistronic (mostly in eukaryotes) or polycistronic (mostly in prokaryotes). In eukaryotes, the monocistronic structural genes have interrupted coding sequences; that is, the genes in eukaryotes are split.

Exons: The coding sequences, or expressed sequences, are defined as exons. Exons are those sequences that appear in mature or processed RNA. The exons are interrupted by introns. Introns, or intervening sequences, do not appear in mature or processed RNA. The split-gene arrangement further complicates the definition of a gene in terms of a DNA segment.

Inheritance of a character is also affected by the promoter and regulatory sequences of a structural gene. Hence, the regulatory sequences are sometimes loosely defined as regulatory genes, even though these sequences do not code for any RNA or protein.

Types of RNA and the Process of Transcription

In bacteria, there are three major types of RNAs: mRNA (messenger RNA), tRNA (transfer RNA), and rRNA (ribosomal RNA). All three RNAs are needed to synthesise a protein in a cell.

The mRNA provides the template.
The tRNA brings amino acids and reads the genetic code.
The rRNAs play structural and catalytic roles during translation.

A single DNA-dependent RNA polymerase catalyses transcription of all types of RNA in bacteria. RNA polymerase binds to the promoter and initiates transcription (initiation). It uses nucleoside triphosphates as substrates and polymerises them in a template-dependent fashion, following the rule of complementarity. It also facilitates opening of the helix and continues elongation. Only a short stretch of RNA remains bound to the enzyme. Once the polymerase reaches the terminator region, the nascent RNA falls off, and so does the RNA polymerase. This results in termination of transcription.

On its own, the RNA polymerase is capable only of catalysing elongation. It associates transiently with an initiation factor (σ) and a termination factor (ρ) to initiate and terminate transcription, respectively. Association with these factors alters the specificity of the RNA polymerase to either initiate or terminate.

In bacteria, the mRNA does not require any processing to become active, and transcription and translation take place in the same compartment (there is no separation of cytosol and nucleus in bacteria). Translation can therefore often begin well before the mRNA is fully transcribed. Consequently, transcription and translation can be coupled in bacteria.

In eukaryotes, there are two additional complexities.

First, there are at least three RNA polymerases in the nucleus (in addition to the RNA polymerase found in the organelles), with a clear-cut division of labour:

The RNA polymerase I transcribes rRNAs (28S, 18S, and 5.8S).
The RNA polymerase III is responsible for transcription of tRNA, 5S rRNA, and snRNAs (small nuclear RNAs).
The RNA polymerase II transcribes precursor of mRNA, the heterogeneous nuclear RNA (hnRNA).

The second complexity is that the primary transcripts contain both exons and introns and are non-functional. Hence, they are subjected to a process called splicing, in which the introns are removed and the exons are joined in a defined order. hnRNA undergoes two additional processing steps, called capping and tailing. In capping, an unusual nucleotide (methyl guanosine triphosphate) is added to the 5'-end of the hnRNA. In tailing, adenylate residues (200-300) are added at the 3'-end in a template-independent manner. It is the fully processed hnRNA, now called mRNA, that is transported out of the nucleus for translation.
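
The three processing steps can be pictured with a toy Python model. Everything in it is an assumption made for illustration: the function name `process_hnrna`, the made-up transcript, the exon coordinates, and the text label `cap-` standing in for the 5' cap. It treats the RNA as a plain string and says nothing about how the cell recognises intron boundaries.

```python
def process_hnrna(hnrna, exons, tail_length=200):
    """Splice, cap and tail a primary transcript (toy string model).

    exons: list of (start, end) index pairs, end exclusive, in 5'->3' order.
    """
    # Splicing: drop the introns and join the exons in their defined order.
    spliced = "".join(hnrna[start:end] for start, end in exons)
    # Capping: "cap-" is only a text label for the nucleotide added at the 5' end.
    # Tailing: adenylate residues are appended at the 3' end without a template.
    return "cap-" + spliced + "A" * tail_length

# Hypothetical transcript: exons in upper case, introns in lower case.
hnrna = "AUGGCC" + "guaagu" + "UUUGGA" + "gucag" + "CCCUAA"
exons = [(0, 6), (12, 18), (23, 29)]

mrna = process_hnrna(hnrna, exons, tail_length=10)
print(mrna)
```

A short tail of 10 residues is used only to keep the printed result readable; the default of 200 reflects the lower end of the range given above.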

Genetic Code

George Gamow argued that since there are only 4 bases, if they have to code for 20 amino acids, then the code should be made up of a combination of bases. A two-base code would give only 4² = 16 combinations, too few for 20 amino acids. He therefore suggested that, in order to code for all 20 amino acids, the code should be made up of three nucleotides. With 4 possible bases at each of 3 positions, this gives 4³ = 4 × 4 × 4 = 64 codons, many more than the 20 required.
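
The counting argument can be reproduced by enumerating every possible code word of a given length. This is a minimal sketch; the function name `code_words` is illustrative, and the bases are written in their RNA form (U, C, A, G).

```python
from itertools import product

BASES = "UCAG"

def code_words(length):
    """List every possible sequence of the given length over the 4 bases."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]

for length in (1, 2, 3):
    count = len(code_words(length))
    # A code is sufficient only if it offers at least 20 distinct words.
    print(length, count, count >= 20)
```

Only the triplet code reaches at least 20 distinct words, which is the smallest word length that can specify all the amino acids.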

Based on the work of many scientists, a checkerboard for the genetic code was prepared, which is given in the following table.

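Rows give the first base, columns the second base, and the last column the third base of each codon.
	U	C	A	G	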
U	UUU Phe	UCU Ser	UAU Tyr	UGU Cys	U
	UUC Phe	UCC Ser	UAC Tyr	UGC Cys	C
	UUA Leu	UCA Ser	UAA Stop	UGA Stop	A
	UUG Leu	UCG Ser	UAG Stop	UGG Trp	G
C	CUU Leu	CCU Pro	CAU His	CGU Arg	U
	CUC Leu	CCC Pro	CAC His	CGC Arg	C
	CUA Leu	CCA Pro	CAA Gln	CGA Arg	A
	CUG Leu	CCG Pro	CAG Gln	CGG Arg	G
A	AUU Ile	ACU Thr	AAU Asn	AGU Ser	U
	AUC Ile	ACC Thr	AAC Asn	AGC Ser	C
	AUA Ile	ACA Thr	AAA Lys	AGA Arg	A
	AUG Met	ACG Thr	AAG Lys	AGG Arg	G
G	GUU Val	GCU Ala	GAU Asp	GGU Gly	U
	GUC Val	GCC Ala	GAC Asp	GGC Gly	C
	GUA Val	GCA Ala	GAA Glu	GGA Gly	A
	GUG Val	GCG Ala	GAG Glu	GGG Gly	G
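
For looking up codons programmatically, the checkerboard can be loaded into a dictionary. The sketch below is one possible way to do it: it pastes the codon and amino acid pairs of the table as text and extracts them with a regular expression. The names `TABLE_TEXT` and `CODON_TABLE` are illustrative, and the standard three-letter abbreviations (Gln for glutamine, Ile for isoleucine) are used.

```python
import re

# The codon / amino acid pairs of the checkerboard, one table row per line.
TABLE_TEXT = """
UUU Phe UCU Ser UAU Tyr UGU Cys
UUC Phe UCC Ser UAC Tyr UGC Cys
UUA Leu UCA Ser UAA Stop UGA Stop
UUG Leu UCG Ser UAG Stop UGG Trp
CUU Leu CCU Pro CAU His CGU Arg
CUC Leu CCC Pro CAC His CGC Arg
CUA Leu CCA Pro CAA Gln CGA Arg
CUG Leu CCG Pro CAG Gln CGG Arg
AUU Ile ACU Thr AAU Asn AGU Ser
AUC Ile ACC Thr AAC Asn AGC Ser
AUA Ile ACA Thr AAA Lys AGA Arg
AUG Met ACG Thr AAG Lys AGG Arg
GUU Val GCU Ala GAU Asp GGU Gly
GUC Val GCC Ala GAC Asp GGC Gly
GUA Val GCA Ala GAA Glu GGA Gly
GUG Val GCG Ala GAG Glu GGG Gly
"""

# Each match is (codon, amino acid); dict() turns the pairs into a lookup table.
CODON_TABLE = dict(re.findall(r"([UCAG]{3}) (\w+)", TABLE_TEXT))

print(len(CODON_TABLE))    # 64
print(CODON_TABLE["UUU"])  # Phe
print(CODON_TABLE["UGG"])  # Trp
```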

The salient features of genetic code are as follows:

The codon is a triplet. 61 codons code for amino acids; the remaining 3 codons do not code for any amino acid and hence function as stop codons.
One codon codes for only one amino acid; hence the code is unambiguous and specific.
Some amino acids are coded by more than one codon; hence the code is degenerate.
Codons are read in the mRNA in a contiguous fashion. There is no punctuation.
The code is nearly universal: for example, from bacteria to humans, UUU codes for phenylalanine (Phe). Some exceptions to this rule have been found in mitochondrial codons and in some protozoans.
AUG has dual functions. It codes for methionine (Met), and it also acts as the initiator codon.
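
Several of these features can be checked in a few lines of Python. The sketch below rebuilds the codon table in compact form (amino acids listed in the order first base, second base, third base, each running U, C, A, G) and then reads a made-up mRNA. The function name `translate` and the example sequence are assumptions for illustration, and scanning for the first AUG is a simplification of how a start site is actually chosen.

```python
from collections import Counter
from itertools import product

AMINO = ("Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
         "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
         "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
         "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly").split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

def translate(mrna):
    """Read codons contiguously from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    peptide = []
    if start == -1:
        return peptide  # no initiator codon found
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

usage = Counter(AMINO)  # number of codons per amino acid (degeneracy)
print(64 - usage["Stop"], usage["Stop"])  # 61 3
print(usage["Leu"], usage["Met"])         # 6 1
print(translate("GGAUGUUUGCCAAAUAGCC"))   # ['Met', 'Phe', 'Ala', 'Lys']
```

Because the dictionary maps each codon to exactly one entry, the code is unambiguous, while the counts show that it is degenerate: leucine has six codons and methionine only one.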

Mutations and Genetic Code

The effect of point mutations that insert or delete a base in a structural gene can be better understood with the following simple example.

Consider a statement made up of the following words, each having three letters like a codon in the genetic code.

RAM HAS RED CAP

If we insert a letter B in between HAS and RED and rearrange the statement, it would read as follows:

RAM HAS BRE DCA P

Similarly, if we now insert two letters at the same place, say BI, it would read:

RAM HAS BIR EDC AP

Now, if we insert three letters together, say BIG, the statement would read:

RAM HAS BIG RED CAP

The conclusion from the above exercise is obvious. Insertion or deletion of one or two bases changes the reading frame from the point of insertion or deletion; such mutations are referred to as frameshift insertion or deletion mutations. Insertion or deletion of three bases, or a multiple of three, inserts or deletes one or more codons, and hence one or more amino acids, while the reading frame remains unaltered from that point onwards. This forms the genetic basis of the proof that the codon is a triplet and is read in a contiguous manner.
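
The same exercise can be reproduced in Python by removing the spaces, inserting letters, and regrouping into triplets. The function names `read_in_triplets` and `insert_and_read` are illustrative; position 6 is the point between HAS and RED once the spaces are removed.

```python
def read_in_triplets(letters):
    """Group a string into consecutive three-letter words, like codons."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))

def insert_and_read(sentence, position, inserted):
    """Insert letters at a position, then re-read the statement in triplets."""
    letters = sentence.replace(" ", "")
    mutated = letters[:position] + inserted + letters[position:]
    return read_in_triplets(mutated)

sentence = "RAM HAS RED CAP"
for inserted in ("B", "BI", "BIG"):
    print(insert_and_read(sentence, 6, inserted))
# RAM HAS BRE DCA P
# RAM HAS BIR EDC AP
# RAM HAS BIG RED CAP
```

Only the three-letter insertion leaves the words after the insertion point readable, which mirrors the preserved reading frame.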

tRNA: the Adapter Molecule

tRNA reads the codon and also binds to a specific amino acid. It has an anticodon loop with bases complementary to the codon, and an amino acid acceptor end to which the amino acid binds. tRNAs are specific for each amino acid. For initiation, there is another specific tRNA, referred to as the initiator tRNA. There are no tRNAs for stop codons.
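
The complementarity between codon and anticodon can be sketched as follows. The function name `anticodon` is illustrative, and the sketch assumes strict complementary pairing at all three positions. Because the two RNA strands pair in antiparallel orientation, the anticodon written 5'→3' is the reverse of the base-by-base complement.

```python
# RNA base pairing between codon and anticodon.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the anticodon (written 5'->3') for a codon written 5'->3'."""
    # Complement each base, reading the codon backwards to keep 5'->3' order.
    return "".join(PAIR[base] for base in reversed(codon))

print(anticodon("AUG"))  # CAU, which is 3'-UAC-5' when aligned under the codon
print(anticodon("UUU"))  # AAA
```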