An assembly (or assembler) language, often abbreviated asm, is a low-level programming language in which there is a very strong (but often not one-to-one) correspondence between the language and the machine code instructions of the architecture. Each assembly language is specific to a particular computer architecture. In contrast, most high-level programming languages are generally portable across multiple architectures but require interpreting or compiling. Assembly language may also be called symbolic machine code.
Assembly language is converted into executable machine code by a utility program referred to as an assembler. The conversion process is referred to as assembly, or assembling the source code. Assembly time is the computational step in which an assembler is run.
Assembly language uses a mnemonic to represent each low-level machine instruction or opcode, and typically also each architectural register, flag, etc. Many operations require one or more operands in order to form a complete instruction. Most assemblers accept expressions of numbers and named constants, as well as registers and labels, as operands. The programmer is thus freed from tedious repetitive calculations. Depending on the architecture, these elements may also be combined for specific instructions or addressing modes using offsets or other data as well as fixed addresses. Many assemblers offer additional mechanisms to facilitate program development, to control the assembly process, and to aid debugging.
Terminology
- A macro assembler includes a macroinstruction facility so that (parameterized) assembly language text can be represented by a name, and that name can be used to insert the expanded text into other code.
- A cross assembler (see also cross compiler) is an assembler that runs on a computer or operating system (the host system) of a different type from the system on which the generated code is to run (the target system). Cross-assembling facilitates the development of programs for systems that do not have the resources to support software development, such as embedded systems. In such cases, the generated object code must be transferred to the target system, either through read-only memory (ROM, EPROM, etc.) or over a data link, using an exact bit-by-bit copy of the object code or a text-based representation of that code such as Motorola S-record or Intel HEX.
- A high-level assembler is a program that provides language abstractions more often associated with high-level languages, such as advanced control structures (IF/THEN/ELSE, CASE, etc.) and high-level abstract data types, including structures/records, unions, classes, and sets.
- A microassembler is a program that helps prepare a microprogram, called firmware, to control the low-level operation of a computer.
- A meta-assembler is a term used in some circles for "a program that accepts the syntactic and semantic description of an assembly language, and generates an assembler for that language."
Key concepts
Assembler
An assembler program creates object code by translating combinations of mnemonics and syntax for operations and addressing modes into their numerical equivalents. This representation typically includes an operation code ("opcode") as well as other control bits and data. The assembler also evaluates constant expressions and resolves symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for textual substitution, for example to generate common short sequences of instructions inline instead of as called subroutines.
Some assemblers can also perform simple kinds of instruction-set-specific optimization. One concrete example is the ubiquitous x86 assemblers from various vendors. Most of them can perform jump-instruction replacement (long jumps replaced by short or relative jumps) in any number of passes, on request. Others can even perform simple rearrangement or insertion of instructions, such as some assemblers for RISC architectures that help optimize instruction scheduling to exploit the CPU pipeline as efficiently as possible.
Like early programming languages such as Fortran, Algol, Cobol and Lisp, assemblers have been available since the 1950s and the first generations of text-based computer interfaces. However, assemblers came first because they are far simpler to write than compilers for high-level languages. This is because each mnemonic, along with the addressing modes and operands of an instruction, translates rather directly into the numeric representation of that particular instruction, without much context or analysis. There have also been several classes of translators and semi-automatic code generators with properties similar to both assembly and high-level languages, with speedcode as one of the better known examples.
There may be several assemblers with different syntax for a particular CPU or instruction set architecture. For example, an instruction to add memory data to a register in an x86-family processor might be written add eax,[ebx] in the original Intel syntax, whereas it would be written addl (%ebx),%eax in the AT&T syntax used by the GNU Assembler. Despite their different appearance, different syntactic forms generally generate the same numeric machine code (see further below). A single assembler may also have different modes to support variations in syntactic form as well as their exact semantic interpretation (such as FASM syntax, TASM syntax, ideal mode etc., in the special case of x86 assembly programming).
Number of passes
There are two types of assemblers, based on how many passes through the source are needed (how many times the assembler reads the source) to produce the object file.
- One-pass assemblers go through the source code once. Any symbol used before it is defined requires "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined) telling the linker or the loader to "go back" and overwrite the placeholder that was left where the as yet undefined symbol was used.
- Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.
In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of subsequent symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler makes a pessimistic estimate when it first encounters the operation, and if necessary pads it with one or more "no-operation" instructions in a later pass or in the errata. In an assembler with peephole optimization, addresses may be recalculated between passes to allow pessimistic code to be replaced with code tailored to the exact distance from the target.
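The recalculation between passes can be sketched in Python. This is a minimal illustration rather than the algorithm of any particular assembler: the program representation, the names `layout` and `relax`, and the sizes (2 bytes for a short jump, 5 bytes for a long one, as in 32-bit x86) are assumptions made for the example. Every jump starts at the pessimistic long size and is shrunk once its target is known to be within the signed 8-bit range.

```python
# Sketch of jump-size relaxation: start pessimistic (every jump long),
# then shrink jumps whose target turns out to be close enough.
# Assumed sizes: short jump = 2 bytes, long jump = 5 bytes.
SHORT, LONG = 2, 5

def layout(program, sizes):
    """Return (label -> address, start address of each item)."""
    labels, starts, addr = {}, [], 0
    for i, (kind, arg) in enumerate(program):
        starts.append(addr)
        if kind == "label":
            labels[arg] = addr
        elif kind == "jmp":
            addr += sizes[i]
        else:                       # ("code", n): n bytes of other instructions
            addr += arg
    return labels, starts

def relax(program):
    """Choose a size for every jump; returns {item index: size in bytes}."""
    sizes = {i: LONG for i, (kind, _) in enumerate(program) if kind == "jmp"}
    changed = True
    while changed:                  # repeat until no jump can shrink further
        changed = False
        labels, starts = layout(program, sizes)
        for i, size in sizes.items():
            if size == SHORT:
                continue
            target, start = labels[program[i][1]], starts[i]
            if target > start:      # forward: shrinking this jump moves the target too
                disp = target - (start + size)
            else:                   # backward: measure from the end of a short jump
                disp = target - (start + SHORT)
            if -128 <= disp <= 127:
                sizes[i] = SHORT
                changed = True
    return sizes

program = [("jmp", "x"), ("jmp", "y"), ("label", "y"), ("code", 123), ("label", "x")]
print(relax(program))
```

In this sample the first jump only fits the short form after the second jump has been shrunk, which is why more than one pass is needed. Shrinking a jump never increases the distance of another jump, so the loop can only move sizes downward and always terminates.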
The original reason for the use of one-pass assemblers was speed of assembly: often a second pass would require rewinding and rereading the program source on tape, or rereading a deck of cards or punched paper tape. Later computers with much larger memories (especially disk storage) had the space to perform all necessary processing without such rereading. The advantage of the multi-pass assembler is that the absence of errata makes the linking process (or the program load, if the assembler directly produces executable code) faster.
Example: in the following code snippet a one-pass assembler would be able to determine the address of the backward reference BKWD when assembling statement S2, but would not be able to determine the address of the forward reference FWD when assembling the branch statement S1; indeed, FWD may be undefined. A two-pass assembler would determine both addresses in pass 1, so they would be known when generating code in pass 2.
S1    B     FWD
      ...
FWD   EQU   *
      ...
BKWD  EQU   *
      ...
S2    B     BKWD
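The two passes over this snippet can be sketched in Python. The sketch makes several assumptions that are not in the original: every instruction occupies 4 bytes, the elided "..." lines are each replaced by a single NOP, and the names `SOURCE`, `pass1` and `pass2` are illustrative only.

```python
# Toy two-pass assembly of the S1/FWD/BKWD example. Assumptions: every
# instruction is 4 bytes and "EQU *" binds a label to the current address.
SOURCE = [
    ("S1",   "B",   "FWD"),
    (None,   "NOP", None),     # stands in for the elided "..." lines
    ("FWD",  "EQU", "*"),
    (None,   "NOP", None),
    ("BKWD", "EQU", "*"),
    (None,   "NOP", None),
    ("S2",   "B",   "BKWD"),
]
INSTRUCTION_SIZE = 4

def pass1(source):
    """Build the symbol table: label -> address."""
    symbols, location = {}, 0
    for label, op, _ in source:
        if label is not None:
            symbols[label] = location
        if op != "EQU":              # EQU emits no code
            location += INSTRUCTION_SIZE
    return symbols

def pass2(source, symbols):
    """Generate (address, op, resolved target) using the finished table."""
    output, location = [], 0
    for _, op, operand in source:
        if op == "EQU":
            continue
        target = symbols[operand] if operand is not None else None
        output.append((location, op, target))
        location += INSTRUCTION_SIZE
    return output

symbols = pass1(SOURCE)
listing = pass2(SOURCE, symbols)
for address, op, target in listing:
    print(address, op, target)
```

Because the symbol table is complete before any code is generated, the branch at S1 resolves FWD in the same way as the branch at S2 resolves BKWD, and no errata are needed.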
High-level assemblers
More sophisticated high-level assemblers provide language abstractions such as:
- High-level procedure and function declarations
- Advanced control structures (IF/THEN/ELSE, SWITCH)
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing (although this has been available in ordinary assemblers since the late 1950s for the IBM 700 series and since the 1960s for IBM/360, among other machines)
- Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance
See Language design below for more details.
Assembly language
A program written in assembly language consists of a series of mnemonic processor instructions and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a list of data, arguments or parameters. These are translated by an assembler into machine language instructions that can be loaded into memory and executed.
For example, the instruction below tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for the register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001.
10110000 01100001
This binary computer code can be made more human-readable by expressing it in hexadecimal as follows.
B0 61
Here, B0 means 'Move a copy of the following value into AL', and 61 is the hexadecimal representation of the value 01100001, which is 97 in decimal. Assembly language for the 8086 family provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code above can be written in assembly language as MOV AL, 61h, followed if required by an explanatory comment after a semicolon. This is much easier to read and to remember.
In some assembly languages the same mnemonic, such as MOV, may be used for a family of related instructions for loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers. Other assemblers may use separate opcode mnemonics such as L for "move memory to register", ST for "move register to memory", LR for "move register to register", MVI for "move immediate operand to memory", etc.
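The x86 opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. In assembly language, for example, MOV AL, 1h, MOV CL, 2h and MOV DL, 3h load the immediate values 1, 2 and 3 into AL, CL and DL respectively.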
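The encoding described above (the bits 10110 followed by a 3-bit register identifier, then the immediate byte) can be sketched in Python. The function name `encode_mov_r8_imm8` and the table `REG8` are illustrative; the identifiers for AL, CL and DL come from the text, and the remaining five follow the standard x86 8-bit register numbering.

```python
# Encode "MOV r8, imm8": opcode bits 10110 followed by a 3-bit register id.
REG8 = {"AL": 0b000, "CL": 0b001, "DL": 0b010, "BL": 0b011,
        "AH": 0b100, "CH": 0b101, "DH": 0b110, "BH": 0b111}

def encode_mov_r8_imm8(register, value):
    """Return the two machine-code bytes for MOV <register>, <value>."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    opcode = (0b10110 << 3) | REG8[register]
    return bytes([opcode, value])

print(encode_mov_r8_imm8("AL", 0x61).hex(" ").upper())   # B0 61
```

This is the kind of table lookup and bit packing that an assembler performs on the programmer's behalf for each mnemonic.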
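The syntax of MOV can also be more complex: for example, MOV EAX, [EBX] moves the 4 bytes in memory at the address contained in EBX into EAX, and MOV [ESI+EAX], CL moves the contents of CL into the byte at address ESI+EAX.
In each case, the MOV mnemonic is translated directly by the assembler into an opcode in the ranges 88-8C, 8E, A0-A3, B0-BF, C6 or C7, and the programmer does not have to know or remember which.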
Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by a disassembler. Unlike high-level languages, there is a one-to-one correspondence between many simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) that expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.
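The "branch if greater or equal" expansion can be sketched as follows. The sketch is modelled on the MIPS `bge` pseudoinstruction, which assemblers for that architecture expand into `slt` and `beq` using the scratch register `$at`; the function name `expand` and the string-based representation are assumptions made for illustration.

```python
# Expand a "bge" pseudoinstruction into two real instructions, in the
# style of MIPS assemblers. "$at" is the assembler's scratch register.
def expand(line):
    mnemonic, _, rest = line.partition(" ")
    operands = [part.strip() for part in rest.split(",")]
    if mnemonic == "bge":                      # branch if rs >= rt
        rs, rt, label = operands
        return [f"slt $at, {rs}, {rt}",        # $at = 1 if rs < rt, else 0
                f"beq $at, $zero, {label}"]    # branch when rs is not less than rt
    return [line]                              # real instructions pass through

for out in expand("bge $t0, $t1, done"):
    print(out)
```

One source statement therefore becomes two machine instructions, which is why the correspondence between assembly statements and machine code is not always one-to-one.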
Each computer architecture has its own machine language. Computers differ in the number and type of operations they support, in the sizes and numbers of registers, and in the representation of data in storage. While most general-purpose computers can carry out essentially the same functions, the ways they do so differ; the corresponding assembly languages reflect these differences.
Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.
Language design
Basic elements
There is a large degree of diversity in the way the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of 3 types of instruction statements that are used to define program operations:
- Opcode mnemonics
- Data definitions
- Assembly directives
Opcode mnemonics and extended mnemonics
Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (a value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand; for example, the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP ("NO OPeration", do nothing for one step) for BC with a mask of 0.
Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode that encodes the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.
Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.
Mnemonics are arbitrary symbols; in 1985 the IEEE published Standard 694 for a uniform set of mnemonics to be used by all assemblers. The standard has since been withdrawn.
Data directives
There are instructions used to define data elements to hold data and variables. They define the type of data, its length and its alignment. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these as pseudo-ops.
Assembly directives
Assembly directives, also called pseudo-opcodes, pseudo-operations or pseudo-ops, are commands given to an assembler "directing it to perform operations other than assembling instructions." Directives affect how the assembler operates and "may affect the object code, the symbol table, the listing file, and the values of internal assembler parameters." Sometimes the term pseudo-opcode is reserved for directives that generate object code, such as those that generate data.
The names of pseudo-ops often start with a dot to distinguish them from machine instructions. Pseudo-ops can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled in different ways, perhaps for different applications. Alternatively, a pseudo-op can be used to manipulate the presentation of a program to make it easier to read and maintain. Another common use of pseudo-ops is to reserve storage areas for run-time data and optionally initialize their contents to known values.
Symbolic assemblers let programmers associate arbitrary names (labels or symbols) with memory locations and various constants. Usually, every constant and variable is given a name so that instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any call to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).
Some assemblers, such as NASM, provide flexible symbol management, letting programmers manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or to the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.
Assembly languages, like most other computer languages, allow comments to be added to program source code that will be ignored during assembly. Judicious commenting is essential in assembly language programs, as the meaning and purpose of a sequence of binary machine instructions can be difficult to determine. The "raw" (uncommented) assembly language generated by compilers or disassemblers is quite difficult to read when changes must be made.
Macros
Many assemblers support predefined macros, and others support programmer-defined (and repeatedly re-definable) macros involving sequences of text lines in which variables and constants are embedded. This sequence of text lines may include opcodes or directives. Once a macro has been defined, its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, then processes them as if they existed in the source code file (including, in some assemblers, expansion of any macros existing in the replacement text). Macros in this sense date to IBM autocoders of the 1950s.
In assembly language, the term "macro" represents a more comprehensive concept than it does in some other contexts, such as in the C programming language, where the #define directive is typically used to create short single-line macros. Assembler macro instructions, like macros in PL/I and some other languages, can be lengthy "programs" by themselves, executed by interpretation by the assembler during assembly.
Since macros can have 'short' names but expand to several or indeed many lines of code, they can be used to make assembly language programs appear far shorter, requiring fewer lines of source code, as with higher-level languages. They can also be used to add higher levels of structure to assembly programs, and optionally to introduce embedded debugging code via parameters and other similar features.
Macro assemblers often allow macros to take parameters. Some assemblers include quite sophisticated macro languages, incorporating such high-level language elements as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro might generate numerous assembly language instructions or data definitions, based on the macro arguments. This could be used to generate record-style data structures or "unrolled" loops, for example, or could generate entire algorithms based on complex parameters. For instance, a "sort" macro could accept the specification of a complex sort key and generate code crafted for that specific key, without the run-time tests that would be required for a general procedure interpreting the specification. An organization using assembly language that has been heavily extended with such a macro suite can be considered to be working in a higher-level language, since such programmers are not working with a computer's lowest-level conceptual elements. Underlining this point, macros were used to implement an early virtual machine in SNOBOL4 (1967), which was written in the SNOBOL Implementation Language (SIL), an assembly language for a virtual machine, which was then targeted to physical machines by transpiling to a native assembler via a macro assembler. This allowed a high degree of portability for the time.
Macros were used to customize large-scale software systems for specific customers in the mainframe era, and were also used by customer personnel to satisfy their employers' needs by making specific versions of manufacturer operating systems. This was done, for example, by systems programmers working with IBM's Conversational Monitor System / Virtual Machine (VM/CMS) and with IBM's "real time transaction processing" add-ons, Customer Information Control System (CICS) and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large computer reservation systems (CRS) and credit card systems today.
It is also possible to use solely the macro processing abilities of an assembler to generate code written in completely different languages, for example, to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly-time operators instructing the assembler to generate arbitrary code. IBM OS/360 uses macros to perform system generation. The user specifies options by coding a series of assembler macros. Assembling these macros generates a job stream to build the system, including job control language and utility control statements.
This is because, as was realized in the 1960s, the concept of "macro processing" is independent of the concept of "assembly", the former being, in modern terms, closer to word processing or text processing than to generating object code. The concept of macro processing appeared, and still appears, in the C programming language, which supports "preprocessor instructions" to set variables and make conditional tests on their values. Note that unlike certain earlier macro processors inside assemblers, the C preprocessor is not Turing-complete because it lacks the ability either to loop or to "go to", the latter allowing programs to loop.
Despite the power of macro processing, it fell into disuse in many high-level languages (major exceptions being C, C++ and PL/I) while remaining a perennial for assemblers.
Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous class of resulting bugs was the use of a parameter that was itself an expression, rather than a simple name, when the macro writer expected a name. In the macro:
foo: macro a
load a*b
the intention was that the caller would provide the name of a variable, and the "global" variable or constant b would be used to multiply "a". If foo is called with the parameter a-c, the macro expansion load a-c*b occurs. To avoid any possible ambiguity, users of macro processors can parenthesize formal parameters inside macro definitions, or callers can parenthesize the input parameters.
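The substitution bug can be reproduced with a few lines of Python that perform purely textual replacement. The helper `expand_macro` and the sample values for a, b and c are hypothetical; `eval` is used here only to show how each expanded expression would be evaluated under ordinary operator precedence.

```python
# Textual macro substitution: the parameter name is replaced by the argument text.
import re

def expand_macro(body, param, argument):
    # \b keeps "a" from matching inside longer names such as "load".
    return re.sub(rf"\b{re.escape(param)}\b", lambda _: argument, body)

env = {"a": 7, "b": 10, "c": 2}                 # sample values, chosen arbitrarily

naive = expand_macro("a*b", "a", "a-c")         # body as written in the macro
safe = expand_macro("(a)*b", "a", "a-c")        # formal parameter parenthesized

print(naive, "=", eval(naive, {}, env))         # a-c*b = -13
print(safe, "=", eval(safe, {}, env))           # (a-c)*b = 50
```

The first expansion multiplies only c by b, which is not what the macro writer intended; parenthesizing the formal parameter in the definition restores the intended grouping.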
Support for structured programming
Some assemblers have incorporated structured programming elements to encode execution flow. The earliest example of this approach was the Concept-14 macro set, originally proposed by Dr. Harlan Mills (March 1970) and implemented by Marvin Kessler at IBM's Federal Systems Division, which extended the S/360 macro assembler with IF/ELSE/ENDIF and similar control flow blocks. This was a way to reduce or eliminate the use of GOTO operations in assembly code, one of the main factors causing spaghetti code in assembly language. This approach was widely accepted in the early '80s (the latter days of large-scale assembly language use).
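How IF/ELSE/ENDIF blocks can be reduced to labels and branches is sketched below in Python. This is not the Concept-14 implementation: the branch mnemonics BF ("branch if false") and B, the generated label names, and the function `lower` are all hypothetical, and the input is assumed to be a list of non-empty, well-nested lines.

```python
# Lower IF/ELSE/ENDIF lines to labels and branches. The mnemonics BF and B
# and the ELSEn/ENDn label scheme are invented for this sketch.
def lower(lines):
    out, open_blocks, counter = [], [], 0
    for line in lines:
        words = line.split(maxsplit=1)
        keyword = words[0]
        if keyword == "IF":
            counter += 1
            open_blocks.append([counter, False])        # [block id, seen ELSE?]
            out.append(f"    BF   {words[1]}, ELSE{counter}")
        elif keyword == "ELSE":
            block = open_blocks[-1]
            block[1] = True
            out.append(f"    B    END{block[0]}")       # skip the ELSE part
            out.append(f"ELSE{block[0]}:")
        elif keyword == "ENDIF":
            block_id, seen_else = open_blocks.pop()
            # Without an ELSE, the false branch lands directly at the end.
            out.append(f"END{block_id}:" if seen_else else f"ELSE{block_id}:")
        else:
            out.append(f"    {line.strip()}")           # ordinary instruction
    return out

for text in lower(["IF EQ", "LOAD X", "ELSE", "LOAD Y", "ENDIF", "STORE Z"]):
    print(text)
```

The stack of open blocks is what lets nested IF blocks receive distinct labels, so the programmer never writes a GOTO or invents a label by hand.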
A curious design was A-natural, a "stream-oriented" assembler for 8080/Z80 processors from Whitesmiths Ltd. (developers of the Unix-like Idris operating system, and of what was reported to be the first commercial C compiler). The language was classified as an assembler because it worked with raw machine elements such as opcodes, registers, and memory references; but it incorporated an expression syntax to indicate execution order. Parentheses and other special symbols, along with block-oriented structured programming constructs, controlled the sequence of the generated instructions. A-natural was built as the object language of a C compiler, rather than for hand-coding, but its logical syntax won some fans.
There has been little apparent demand for more sophisticated assemblers since the decline of large-scale assembly language development. In spite of that, they are still being developed and applied in cases where resource constraints or peculiarities in the target system's architecture prevent the effective use of higher-level languages.
Assemblers with a strong macro engine allow structured programming via macros, such as the switch macro provided with the Masm32 package.
Use of assembly language
Historical Perspective
Assembly languages, and the use of the word assembly, date to the introduction of the stored-program computer. The first assembly language was developed in 1947 by Kathleen Booth for the ARC2 at Birkbeck, University of London, following work with John von Neumann and Herman Goldstine at the Institute for Advanced Study. The Electronic Delay Storage Automatic Calculator (EDSAC) had an assembler called initial orders, featuring one-letter mnemonics, in 1949. SOAP (Symbolic Optimal Assembly Program) was an assembly language for the IBM 650 computer written by Stan Poley in 1955.
Assembly languages eliminated much of the error-prone, tedious, and time-consuming first-generation programming needed with the earliest computers, freeing programmers from tedium such as remembering numeric codes and calculating addresses. They were once widely used for all sorts of programming. However, by the 1980s (1990s on microcomputers), their use had largely been supplanted by higher-level languages, in the search for improved programming productivity. Today assembly language is still used for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.
Historically, numerous programs have been written entirely in assembly language. The Burroughs MCP (1961) was the first operating system that was not developed entirely in assembly language; it was written in Executive Systems Problem Oriented Language (ESPOL), an Algol dialect. Many commercial applications were written in assembly language as well, including a large amount of the IBM mainframe software written by large corporations. COBOL, FORTRAN and some PL/I eventually displaced much of this work, although a number of large organizations retained assembly-language application infrastructures well into the 1990s.
Most early microcomputers relied on hand-coded assembly language, including most operating systems and large applications. This was because these systems had severe resource constraints, imposed idiosyncratic memory and display architectures, and provided limited, buggy system services. Perhaps more important was the lack of first-class high-level language compilers suitable for microcomputer use. A psychological factor may also have played a role: the first generation of microcomputer programmers retained a hobbyist, "wires and pliers" attitude.
In a more commercial context, the biggest reasons for using assembly language were minimal bloat (size), minimal overhead, greater speed, and reliability.
Typical examples of large assembly language programs from this time are the IBM PC DOS operating systems, the Turbo Pascal compiler and early applications such as the spreadsheet program Lotus 1-2-3. According to some industry insiders, assembly language was the best computer language to use to get the best performance out of the Sega Saturn, a console that was notoriously challenging to develop and program games for. The 1993 arcade game NBA Jam is another example.
Assembly language was long the primary development language for many popular home computers of the 1980s and 1990s (such as the MSX, Sinclair ZX Spectrum, Commodore 64, Commodore Amiga, and Atari ST). This was in large part because the interpreted BASIC dialects on these systems offered insufficient execution speed, as well as insufficient facilities to take full advantage of the available hardware. Some systems even had an integrated development environment (IDE) with highly advanced debugging and macro facilities. Some compilers available for the Radio Shack TRS-80 and its successors could combine inline assembly source with high-level program statements. Upon compilation, a built-in assembler produced inline machine code.
Current use
There have always been debates over the usefulness and performance of assembly language relative to high-level languages. Assembly language has specific niche uses where it is important; see below. As of July 2017, the TIOBE index of programming language popularity ranked assembly language at 11, ahead of Visual Basic, for example. Assembler can be used to optimize for speed or for size. In the case of speed optimization, modern optimizing compilers are claimed to render high-level languages into code that can run as fast as hand-written assembly, although counter-examples can be found. The complexity of modern processors and memory sub-systems makes effective optimization increasingly difficult for compilers, as well as for assembly programmers. Moreover, increasing processor performance has meant that most CPUs sit idle most of the time, with delays caused by predictable bottlenecks such as cache misses, I/O operations and paging. This has made raw code execution speed a non-issue for many programmers.
There are some situations in which developers might choose to use assembly language:
- A stand-alone executable of compact size is required that must run without recourse to the run-time components or libraries associated with a high-level language; this is perhaps the most common situation. Examples include firmware for telephones, automobile fuel and ignition systems, air-conditioning control systems, security systems, and sensors.
- Code that must interact directly with the hardware, for example in device drivers and interrupt handlers.
- In an embedded processor or DSP, high-repetition interrupts require the shortest number of cycles per interrupt, such as an interrupt that occurs 1000 or 10000 times a second.
- Programs that need to use processor-specific instructions not implemented in a compiler. A common example is the bitwise rotation instruction at the core of many encryption algorithms, as well as querying the parity of a byte or the 4-bit carry of an addition.
- Programs that create vectorized functions for programs in higher-level languages such as C. In the higher-level language this is sometimes aided by compiler intrinsic functions which map directly to SIMD mnemonics, but nevertheless result in a one-to-one assembly conversion specific to the given vector processor.
- Programs requiring extreme optimization, for example an inner loop in a processor-intensive algorithm. Game programmers take advantage of hardware features in systems, enabling games to run faster. Large scientific simulations also require highly optimized algorithms, e.g. linear algebra with BLAS or the discrete cosine transform (e.g. the SIMD assembly version from x264).
- Situations where no high-level language exists, on a new or specialized processor, for example.
- Programs that need precise timing, such as:
  - real-time programs such as simulations, flight navigation systems, and medical equipment. For example, in a fly-by-wire system, telemetry must be interpreted and acted upon within strict time constraints. Such systems must eliminate sources of unpredictable delay, which may be created by (some) interpreted languages, automatic garbage collection, paging operations, or preemptive multitasking. However, some higher-level languages incorporate run-time components and operating system interfaces that can introduce such delays. Choosing assembly or lower-level languages for such systems gives programmers greater visibility and control over processing details.
  - cryptographic algorithms that must always take strictly the same time to execute, preventing timing attacks.
- Modifying and extending legacy code written for IBM mainframe computers.
- Situations where complete control over the environment is required, in extremely high-security situations where nothing can be taken for granted.
- Computer viruses, bootloaders, certain device drivers, or other items very close to the hardware or low-level operating system.
- Instruction set simulators for monitoring, tracing and debugging, where additional overhead is kept to a minimum.
- Reverse-engineering and modifying program files such as:
  - existing binaries that may or may not have originally been written in a high-level language, for example when trying to recreate programs for which source code is not available or has been lost, or when cracking the copy protection of proprietary software.
  - video games (also termed ROM hacking), which is possible via several methods. The most widely employed method is altering program code at the assembly language level.
- Self-modifying code, to which assembly language lends itself well.
- Games and other software for graphing calculators.
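As an illustration of the bitwise rotation mentioned in the list above, the following Python sketch shows the shift-and-mask form that has to be written in a language without a rotate operator. The function name `rotl32` and the fixed 32-bit width are assumptions for the example; on x86 the same operation is a single ROL instruction, which is one reason such code is sometimes written in assembly.

```python
# 32-bit rotate-left written with shifts and masks, the form needed in a
# high-level language that has no rotate operator.
MASK32 = 0xFFFFFFFF

def rotl32(value, count):
    """Rotate the low 32 bits of value left by count bit positions."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

print(hex(rotl32(0x12345678, 8)))   # 0x34567812
```

Bits shifted out at the top re-enter at the bottom, so no information is lost, unlike an ordinary shift.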
Source of the article : Wikipedia