
Tomorrow's Cryptography: Parallel Computation via Multiple Processors, Vector Processing, and Multi-Cored Chips

Eric C. Seidel; advisor: Joseph N. Gregg, PhD
December 30, 2002

Abstract. This paper summarizes my research during my independent study on cryptography in the fall term of 2002. Here I state the growing need for better cryptography, introduce consumer hardware architectures of the near future, and identify the growing discrepancy between the hardware on which current cryptographic standards were designed and the hardware the future consumer will be using. I then note the need for a new 'modern' cryptography based on the presence of parallel processing capabilities in forthcoming consumer machines, and the lack of support for such capabilities in some current and all legacy crypto algorithms.

I list approaches used in past research to parallelize cryptographic algorithms. I then summarize various current algorithms and potential implementation changes to ready them for tomorrow's machines. I conclude with some brief discussion of newer cryptographic algorithms, particularly AES and the AES finalists, and how they will fare on future machines.

Contents

1 The future of crypto
2 Parallel crypto of today
3 The importance of data-level changes
4 Making data-level changes
4.1 Hashing Algorithms
4.1.1 MD5 - Message Digest 5
4.1.2 SHA-1, Secure Hash Algorithm - Revision 1
4.1.3 RIPEMD-160
4.1.4 Tiger
4.2 Block Cyphers (Secret Key Cryptography)
4.2.1 DES - Data Encryption Standard
4.2.2 3DES - 'Triple-DES'
4.2.3 Serpent
4.2.4 Twofish
4.2.5 Rijndael - the Advanced Encryption Standard
4.2.6 RC6
4.3 Public-Key Cryptography
4.3.1 RSA - Prime Factorization
5 Final Thoughts

1 The future of crypto

From bank accounts to medical records, personal emails, and more, more and more sensitive data is stored digitally.

With the continued growth of the Internet, more and more of this data resides in places which themselves may not be secure from intruders, and much of this data is transferred daily from place to place across inherently insecure connections. To solve these problems of digital data security, we have cryptography. Historically, however, most cryptography has been used by governments, larger businesses, and computer geeks, not by the average consumer. But needs are shifting, and consumers are increasingly using encrypted email, encrypted file systems, and soon smart cards: the most drastic change to consumer crypto. Today, the forms of crypto the consumer most commonly encounters are the DES (Data Encryption Standard) encryption used between the customer's local ATM and his/her bank, and the SSL/TLS (Secure Sockets Layer/Transport Layer Security) protocol securing credit card transactions made over the Internet. Now and in the future, the consumer will be using his/her home computer, and eventually his/her personal smart card, for more data security.

Both the load on the consumer's PC and the load on the central servers of the Internet, in terms of cryptographic computations, will increase. This growth will demand a new level of cryptography: a consumer-level cryptography, one designed to work on modern architectures, and one which will be fast, cheap, secure, and transparent.

In addition to the stress that explosive usage places on our cryptographic world, when one looks at the computers in use today and those which will be in use in the future, and then compares them both to the computers for which our current cryptographic standards were designed, one finds great discrepancies. These discrepancies will cause future computing stress due to inefficient cryptographic implementations on future hardware. Prior to the solicitation for the Advanced Encryption Standard (AES), the cryptographic world was based on a single 32-, 16-, or even 8-bit processor. But increasingly the computer world of today, and most definitely that of tomorrow, is not one of the 32-bit desktop, but rather one of multi-cored chips (note 3), multiple-processor machines, and larger 64- or even 128-bit processors, many with a Vector Processing Unit (VPU) (note 4). This is a large change in the machinery of the consumer, and cryptography must be made ready for this change.

It is important to note that parallel-ready cryptographic algorithms have a vast number of uses, ranging from high-end, high-load servers down to the average consumer's desktop, and many systems in between.

Currently, high-end servers inefficiently perform crypto using per-process or per-connection parallelism (note 5), with algorithms not designed to run efficiently on their hardware. Given algorithms designed (or modified) to run on newer hardware, the same servers could serve many more clients, or serve a higher percentage of clients using encryption with the same efficiency.

Note 3. An Intel(TM) technology. By placing multiple processor cores on the same piece of silicon, Intel can drastically reduce the cost of having more than one processor: it saves on the amount of silicon used and on all the additional architecture (buses, memory, caches, etc.) associated with a completely separate processor. Multi-cored chips based on Itanium (Intel's new 64-bit RISC processor) are scheduled to ship by 2005.

Note 4. In contrast to scalar processing, a Vector Processing Unit (VPU) works on 'vectors' of data: it performs the same operation (add, multiply, AND, OR, etc.) over a set of processor words, just as would be performed on a single processor word, except that multiple words are operated on in a single cycle. This method of parallelization is commonly referred to as Single Instruction, Multiple Data (SIMD) computing.

Note 5. Described later in greater detail: each processor receives a single connection or process, for which that processor executes all instructions, including encryption and decryption for the connection.

Making crypto ready for tomorrow's architectures also allows faster cryptography at the consumer's end. Already we are seeing interesting fast applications of AES, such as Apple Computer's encrypted disk image technology: an encrypted virtual disk that can be read nearly as fast as an unencrypted disk, thanks to an efficient AES implementation on Apple's hardware (note 6). Such technologies will become the norm, not the exception. With fast enough cryptographic algorithms, all data written out (to storage media, networks, etc.) could be encrypted. Currently, the number of simultaneous secure connections the average consumer has open at one time is small.

But that number will increase with things such as encrypted chat, secure video streaming, secure email, VPNs, and secure connections to handheld computers, cell phones, and other devices. Until cryptographic algorithms are ready to be performed at great speed on the computers of the future, many of the applications listed here will remain very slow and difficult, if not impossible, on today's systems with current crypto algorithms.

This leaves us with an interesting problem. We have a world increasingly in need of greater crypto speed, one which is at the same time radically changing its hardware architecture, and yet one which is still using 28-year-old crypto algorithms. We are entering a world in which parallel-ready software is a must, and it is thus time that our cryptography software be brought into the 21st century. Some, perhaps much, of this shift to better, more flexible crypto has already begun with the influx of new algorithms via the AES solicitation.

But recognizing that not all systems will be as quick to change, and that new cryptographic algorithms often take years, if not decades, to be accepted, it is important to explore what changes and amendments, if any, we can make to existing cryptographic standards to bring them into the future.

Note 6. I personally watched a technology demonstration in which a high data-rate movie was played directly from such an encrypted disk.

2 Parallel crypto of today

A number of approaches have already been taken to adapt our aging cryptographic algorithms to newer hardware, and it is those that I will first look at here [13]. I will classify methods of parallelization by the 'level' of the computer system at which parallelism is applied. I will be referring specifically to network cryptography, but analogous examples exist in single-user (local) cryptography. The levels of classification I will use are: per-connection (per-process), per-packet (per-file), and intra-packet (data-level) parallelism [14].

A per-connection parallelism method is one whereby each connection to the server, or from the client, is given its own thread or process that runs exclusively on one processor. This is the most common method of parallelization, and requires no modification to the existing algorithm and often no modification to the existing server software. By running multiple instances of the server process, one can often achieve this level of parallelism.

This method is limited in its ability to speed up a single connection, and offers no benefit to high-speed, single-connection cryptography (e.g., writing to or reading from local media). This level of parallelism also does not address the question of running older algorithms on newer processors: by simply running one algorithm instance per processor, it makes no attempt to make the old algorithm run any more efficiently on each newer processor than it did on the previous architecture. Yet it is the form of parallelism most often found in cryptography today, and the one most often used as the selling point for multi-processor machines that run cryptographic applications. Because the per-connection method makes no attempt to fully utilize modern architectures, it will not be discussed further in this paper.

Per-packet parallelism is a method in which connections disperse their packet-processing load over multiple processors, with each packet treated individually. One example would be a group of processors (or threads) handling the actual logic for all connections, then placing each prepared packet in a buffer, where it is handled (encrypted) by a group of processors.
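The per-packet design just described can be sketched in a few lines. This is a minimal Python illustration, not an implementation from this study: worker threads take whole packets from a shared pool and encrypt each one independently, so any packet may be handled by any worker. The cipher is a hypothetical stand-in (a SHA-256-based keystream keyed by a per-packet sequence number), used only because Python's standard library has no block cipher; a real system would use a vetted cipher such as AES in a mode that never reuses a key/nonce pair. The names `keystream`, `encrypt_packet`, and `encrypt_packets` are illustrative.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def keystream(key: bytes, seq: int, length: int) -> bytes:
    """Hypothetical toy keystream: SHA-256(key || seq || counter), block by block.
    Illustration only; this is NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = key + seq.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_packet(key: bytes, packet: tuple[int, bytes]) -> tuple[int, bytes]:
    """Encrypt one packet on its own (XOR with the keystream is its own inverse)."""
    seq, payload = packet
    ks = keystream(key, seq, len(payload))
    return seq, bytes(p ^ k for p, k in zip(payload, ks))


def encrypt_packets(key: bytes, packets: list[tuple[int, bytes]],
                    workers: int = 4) -> list[tuple[int, bytes]]:
    """Per-packet parallelism: packets share no state, so any worker may take any packet.
    Executor.map yields results in input order, whichever worker finishes first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: encrypt_packet(key, p), packets))


# Usage: encrypt a batch, then apply the same operation again to decrypt it.
demo_key = bytes(32)  # all-zero demo key, for illustration only
demo_packets = [(seq, b"payload %d" % seq) for seq in range(8)]
assert encrypt_packets(demo_key, encrypt_packets(demo_key, demo_packets)) == demo_packets
```

Because each packet's keystream depends only on the key and that packet's own sequence number, the result is the same no matter which worker handles it. In CPython the global interpreter lock limits real speedup for this pure-Python XOR loop; the sketch shows the structure of per-packet parallelism, not its throughput.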

In this design a single processor might handle packets from various connections, or a single connection might use packets encrypted by various processors. This is similar to the design of specialized cryptographic hardware, where the cryptography portion of an application is offloaded to one or more specialized processors. Many current algorithms lend themselves well to this kind of parallelization, but surprisingly I found no software implementations of this per-packet parallelism.

Intra-packet parallelization is the most difficult type of parallelism to introduce post facto, as it depends heavily on algorithm design. This type of parallel processing has historically received less attention than the other methods, but it is a main focus of this paper. An example of this method would be a block cypher such as DES running in ECB (Electronic Code Book) mode, whereby each block of the message is computed independently of the others and could therefore be computed in parallel. This level of parallelization requires changes to the implementation of the cryptographic algorithm itself, and no longer depends on the flexibility of the hardware or operating system on which it is run. The various methods by which this type of parallelism can be achieved are discussed below.

3 The importance of data-level changes

Moving beyond the per-process and per-connection models, down to the data level, allows us to fully exploit some of the growing technologies on the market today.

At least oneCPU - Motorola's G4 - already ships with a VPU (the AltiVec Engine), making vectorprocessing power available to the consumer. Intel has promised to begin shipping a multi-cored version of its new Itanium processor by the year 2005 and we will undoubtedly seeother parallel architectures continue to enter the consumer market. In order to fully exploitthese technologies, we can no longer depend on the flexibility of operating systems, or theseeming unending megahertz climb, but must reconsider our cryptographic algorithms toutilize these future architectures.I list here two of many important reasons which support a reassessment of cryptographyat this level:1. Many cryptographic algorithms are processor inefficient on modern hardware. (Oneexample is DES, which at most times uses only 4 or 6 bits of any processor register,when running on a 32-bit processor, that's only 12% - 16% efficiency! When runningon a 64-bit processor we see an efficiency of half that - ignoring any potential RISC7optimizations.2. Cryptographic algorithms in general make no accommodations for parallelization, thus7Reduced Instruction Set Computing - allows a processor to perform several small (4,8, or 16 bit) oper-ations in parallel in a single clock cycle.

This is in contrast to Complex Instruction Set Computing used oncurrent 32-bit Intel processors which only allows a single large (32-bit) operation per clock cycle. Instruc-tions which on a complex instruction set processor exist as single instruction, often require more than oneinstruction on a RISC processor. Those same sets of instructions on the RISC processor can often still beperformed over a single cycle in parallel. If the set of RISC instructions are dependent on one another, thanthey can be interleaved with other small (! 64 bit) operations in the pipeline.8neglecting possible gains on long term computations under multi-processor environ-ments.These reasons and more lead cryptographers to seek changes to our legacy algorithms at thedata level. It is how we go about making these changes that I will now explore.4 Making data-level changesParallelization at the data-level can allow algorithm speedup in the following ways:1.

By performing the same calculation on a larger amount of data. Performing the samecalculation on large amounts of data concurrently is the most common technique dis-cussed in this paper and is the technique used by Vector Processing Units and SIMDarchitectures. Multi-cored chips and true multiple processor architectures also can usethis type of parallelism. Utilizing the advantages in this type of computing is impor-tant for cryptography because it is these SIMD or VPU architectures which are themost common form of parallelism available on modern computers.2. By performing two distinct parts of a single algorithm at once. This is only possible intrue multiple processor environments, allowing multiple individual processors to handleseparate parts of an algorithm.

A common technique of this type is pipelining: sendingdata from one processor to the next down an assembly chain of sorts. Allowing thecomputation of n sequential steps of the algorithm in parallel over a single clock cycleon n processors. An example of this is to let each processor do a single cryptographicround on data passed to it from a high data-rate network stream. If each processor9is able to complete a single round of the cypher in time t, we can add n rounds ofencryption to our final cyphertext by adding n processors [4]. By doubling the numberof processors we can in effect double the security of the data stream with no effecton data-rate. Other techniques of this type often require specific algorithm designmodifications and introduce processor scheduling concerns, and therefore remain lesscommon.3.

By making a single complex calculation faster (e.g. BigNum exponentiation) - dis-tributing it's load over multiple processors. This is actually a layer below algorithmdesign, and depends on the implementations of the library from which the algorithm'simplementation draws. This is useful in areas of cryptography where math intensiveoperations are performed over large data sets. A good example of such a area is PublicKey Cryptography. Here math speed gains can be exploited from any VPU or set ofprocessors as long as one has the knowledge and/or the vendor supplied math librariesto take advantage of the parallel processing power.There are a couple common techniques and pitfalls in applying parallel processing tovarious cryptographic algorithms which I will mention here:. Hardware in Software - Sometimes when moving from a system designed for a singlesmaller processor to an architecture including larger processors (or parallelism of anyform) it is useful to take a step backwards before proceeding forwards.

  Such was the work of Eli Biham, who noticed that speed gains could be achieved for DES by implementing the hardware (logic gate) version of DES in software running on 1-bit or larger processors. Biham noticed that by viewing a larger-than-1-bit processor as an array of 1-bit SIMD processors, and processing the algorithm according to the logic gate implementation, substantial speed could be gained. This approach is commonly referred to as the 'BitSlice' implementation and is described in greater detail below in reference to DES. BitSlice ideas can also be applied to other algorithms.
- SIMD on any processor: Another technique when moving from a smaller processor (or single processor) to a larger (or multiple) processor(s) is to view the larger processor as an array of SIMD processors, each the size of the original smaller processor.

This allowspacket-level (file-level) parallelization of an algorithm, by computing two or more in-stances of the same algorithm at the same time across multiple packets or files. Thisimplementation is only efficient under certain algorithmic design constraints and failsunder such circumstances as when value based lookups are necessary. In cases whereparts of the vector must be treated differently based on their 32-bit (or smaller) value,the implementation fails. Using lookups as an example, there are workarounds, butthose workarounds often loose much efficiency. Lookups could also be translated intomuch larger entire-vector based lookups, but the tables required for such would beenormous. This implementation also runs into similar difficulties with other opera-tions such as rotations, seeing as that rotations performed on larger words than theoriginal intended will require significant intra-word adjustments8 , 9 . This method of8On an n-bit processor emulating r q -bit processors where r .

q = n. This can be accomplished in atmost 5 operations, assuming rotation over n bits costs 1 operation and the mask words M and M -1 arepre-computed. Reduce the rotation to its smallest equivalent right rotation e, ROTATE the larger n bitword e places, AND the result R with a 'mask' word M consisting of r sets of e 1's followed by q - e 0's. ,ROTATE the result R n - e places to form R , AND R with the inverse mask M -1 to form U , OR U andR for your final result.9Without full knowledge of the bitwise implementation of PLUS and MULT operations, I must make this'SIMD on any processor' assertion with reservation. Without more exact knowledge I am unable to statethe cost of performing r simultaneous q bit MULT or PLUS operations on an n bit processor.

It may be11SIMD on any processor is very effective, but only in special cases and depends heavilyon the processor on which it's implemented. The problem of chaining - Many cryptographic algorithms, in order to achieve in-creased security, or simply by there fundamental design constraints (e.g. hashing),involve chaining of information from one cypherblock to the next, introducing 'recur-10sive dependancy' into the algorithm. This dependancy makes applying block levelparallelism to the algorithm impossible and will be seen in many algorithms.With these methods and my further comments to techniques and pitfalls in mind, I thenmade a systematic review of various algorithm types attempting to apply these principles toeach. The following are the results of my review:4.1 Hashing AlgorithmsHashing algorithms take in a large block of data (normally a file, or a network packet) andcompute a unique 'hash' value, much shorter than the original data. This 'hash' can bethen passed around with (or separate from) the original data, and be used to verify theintegrity of the data set. Hashing functions are often used in conjunction with Public KeyCryptography to produced 'signed hashes' - short secure representations of the larger data.The hash is first computed, and then 'signed' to prevent a man-in-the-middle from simplere-computing a hash for the altered data.

Signing only the hashes saves both parties frommuch greater than I have assumed here. RISC processor may also have no problems with these.10Lacking a better word, I will refer to the round-to-round and block-to-block dependancy of variousfunctions in various algorithms as 'recursive dependancy.' Hinting to the dependancy introduced by applyingthe function to the same (or parts of the same) data in a recursive fashion.12the enormous expense of signing, or verifying a signature over a large block of data, but stilloffers similar integrity and authenticity verification, due to the uniqueness of the hash.The speedup of hash functions is important to allow greater speed and security for thosewishing to hash and sign each packet, or when hashing extremely large chunks of data, orsets of chunks of data (e.g. storing and verifying hashes for all executables on a publicserver). Hash functions are already in general quite fast, but many do not support modernarchitectures well, and thus waste many unnecessary CPU cycles. Making hashing evenfaster on modern processors would open the doors to potentially more new uses, and makecurrent uses more convenient.4.1.1 MD5 - Message Digest 5MD5 is a 128-bit hash function operating on 32-bit words. MD5 was designed by RonaldR.

Rivest and is the successor to MD4 (also designed by Rivest). MD4 however is no longerconsidered useful for security after several successful collision attacks in years past and somewonder if MD5 isn't reaching the end of it's useful days[1]. MD5 involves a sequence of11XORs and 32-bit addition on 32-bit blocks of data producing a series of four 32-bitchaining variables. These chaining variables are carried through the entire hashing process,and compose the final 128-bit signature. MD5 like nearly all hash functions is based offof these chaining variables which introduce recursive dependancy and prevent any directparallel implementation.11I refer to the following common binary operations throughout:10 10 10AND 1 1 0 OR 1 1 1 XOR 1 0 1000 010 01013MD5 could potentially allow the SIMD implementation described in the previous section.This would allow packet-level (file-level) parallel computation on a single 64-bit or largerprocessor (grouping 32-bit data blocks as vectors of data blocks and performing the sameMD5 calculations based on those vectors). Computing MD5 over multiple buffers in parallelon an SIMD architecture supporting 32-bit adds, could theoretically cost exactly the sameas running a single buffer on the same n .

32 bit chip, and would thus offer an n fold speedupover a normal implementation on that same chip. MD5 allows for possible SIMD architecturepacket-level optimizations, but shows no promise for other parallelization techniques.4.1.2 SHA-1, Secure Hash Algorithm - Revision 1SHA-1 is 32-bit dependent 160-bit hash function operating on 32-bit words. SHA-1 wasdesigned by the NSA as the successor to SHA-0 which was replaced due to an undisclosedvulnerability resulting in a collision of the hash function at under 280 blocks12 . SHA-1 is quitepopular (used commonly in SSL and distributed on nearly every *nix13 distribution) and isregarded as very secure. SHA-1 provides a slightly more complex non-linear function f , anda larger sized hash than MD5.
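The chaining structure that MD5 and SHA-1 share is what confines them to packet-level parallelism. The sketch below makes both halves of that argument concrete using a hypothetical toy compression step (`compress`: 32-bit XOR, addition, and rotation in the spirit of MD5, but not MD5 or SHA-1 and not secure). Within one message, every step needs the previous chaining state, so the loop cannot be split; across messages, the states are independent, so several buffers can be advanced in lockstep, one word per step, the way the lanes of a SIMD unit would be in the multi-buffer approach suggested above for MD5. Python lists stand in for vector registers; all names and constants are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF
# Hypothetical initial chaining values, chosen only for illustration.
IV = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)


def rotl32(x: int, s: int) -> int:
    """32-bit rotate left (0 < s < 32)."""
    return ((x << s) | (x >> (32 - s))) & MASK32


def compress(state: tuple, word: int) -> tuple:
    """Hypothetical toy step, NOT MD5 or SHA-1: mixes one 32-bit word into a
    four-word chaining state with XOR, 32-bit addition and a rotation."""
    a, b, c, d = state
    a = rotl32((a + (b ^ c ^ d) + word) & MASK32, 7)
    return (d, a, b, c)


def chained_hash(words: list[int]) -> tuple:
    """Sequential by construction: step i needs the state produced by step i - 1."""
    state = IV
    for w in words:
        state = compress(state, w)
    return state


def lockstep_hash(buffers: list[list[int]]) -> list[tuple]:
    """Advance every buffer by one word per step, like the lanes of a SIMD unit.
    Assumes equal-length buffers (a simplification for this sketch)."""
    if len({len(b) for b in buffers}) > 1:
        raise ValueError("this sketch requires equal-length buffers")
    states = [IV] * len(buffers)
    for i in range(len(buffers[0]) if buffers else 0):
        # One "vector operation": the same step applied to every lane;
        # no lane ever reads another lane's state.
        states = [compress(s, buf[i]) for s, buf in zip(states, buffers)]
    return states


# Usage: lockstep processing of independent messages matches one-at-a-time hashing.
messages = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
assert lockstep_hash(messages) == [chained_hash(m) for m in messages]
```

Real multi-buffer implementations must also handle buffers of different lengths, for example by padding or by retiring finished lanes; this sketch simply rejects unequal lengths.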

SHA-1 relies on five 32-bit chaining variables, introducing a recursive dependency. This recursive dependency again preempts any attempt to process a group of blocks from a single packet in parallel. SHA-1, like MD5, can be implemented in an SIMD fashion on 64-bit or larger chips to achieve packet-level parallelism, but I know of no 64-bit native implementation or other multiprocessor or larger-processor optimizations. SHA-1 was designed for software implementations on 32-bit big-endian machines and is regarded as the fastest of the commonly used hash functions. SHA-1, like MD5, allows possible packet-level optimizations on SIMD architectures, but shows no promise for other parallelization techniques.

Note 12. To the best of my knowledge, this vulnerability has still not been publicly disclosed. However, SHA-0 is no longer commonly in use.

Note 13. *nix denotes any of a variety of UNIX(TM)-like operating systems, including the BSDs, Solaris, Linux, and most recently Mac OS X.

4.1.3 RIPEMD-160

RIPEMD-160 is a 160-bit hash function operating on 32-bit words.

RIPEMD-160 was designed by the RIPE consortium as a more secure replacement for RIPEMD (a 128-bit hash function with similarities to MD4). From RIPEMD to RIPEMD-160, the number of 32-bit chaining variables increases from four to five, and the number of rounds for each block from three to five [10]. Like nearly all hash functions, RIPEMD-160 has block-to-block recursive dependency (provided by the chaining variables), preventing any attempt at computing blocks in parallel. RIPEMD and RIPEMD-160 both, however, have some intrinsic parallelism: each block is processed along two independent parallel lines. These two sets of 32-bit operations could theoretically be done in parallel on a 64-bit processor by viewing it as a 2 x 32-bit SIMD processor. Processors and VPUs larger than 32 or 64 bits could also use the packet-level parallelism suggested for the other hash algorithms, applying the same algorithm to two or more blocks at once on a 128-bit or larger processor or VPU. I found no mention of the number of gates required to implement RIPEMD in hardware; however, depending on the number of gates RIPEMD-160 requires, it should also be possible to implement an efficient BitSlice version.
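The 2 x 32-bit view of a 64-bit processor mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not RIPEMD-160 itself: Python integers masked to 64 bits stand in for a 64-bit register, and the helper names (`pack`, `add_lanes`, `rotr_lanes`) are hypothetical. Lane-wise addition clears the top bit of each lane before adding, so no carry can cross into the neighbouring lane, then restores those top bits with XOR. Lane-wise rotation follows the mask-and-rotate idea from the 'SIMD on any processor' discussion: rotate the whole 64-bit word, keep the bits that landed in the correct lane, and rotate the spilled bits by the lane width (32 here) into the lane where they belong.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF
TOP_BITS = 0x8000000080000000  # the most significant bit of each 32-bit lane


def pack(hi: int, lo: int) -> int:
    """Place two 32-bit values into one 64-bit word (high lane, low lane)."""
    return ((hi & LOW32) << 32) | (lo & LOW32)


def unpack(x: int) -> tuple[int, int]:
    """Split a 64-bit word back into its (high lane, low lane) values."""
    return (x >> 32) & LOW32, x & LOW32


def rotr64(x: int, s: int) -> int:
    """Full-width 64-bit rotate right: the one rotation the processor provides."""
    s %= 64
    return ((x >> s) | (x << (64 - s))) & MASK64


def add_lanes(x: int, y: int) -> int:
    """Two independent (a + b) mod 2**32 additions with one 64-bit add.
    Summing only the low 31 bits of each lane keeps carries inside the lane;
    the lanes' top bits are then fixed up with XOR."""
    partial = (x & ~TOP_BITS) + (y & ~TOP_BITS)
    return (partial ^ ((x ^ y) & TOP_BITS)) & MASK64


def rotr_lanes(x: int, e: int) -> int:
    """Rotate each 32-bit lane right by e using two 64-bit rotates and masks."""
    e %= 32
    if e == 0:
        return x
    top = ((1 << e) - 1) << (32 - e)  # top e bits of one lane
    m = (top << 32) | top             # ...replicated into both lanes
    r = rotr64(x, e)                  # correct except for the top e bits of each lane
    spilled = r & m                   # those bits came from the other lane
    # Keep the correctly placed bits; move the spilled bits 32 places into their own lane.
    return (r & ~m & MASK64) | rotr64(spilled, 32)


# Usage: the high lane wraps around to 0 without disturbing the low lane.
a, b = pack(0xFFFFFFFF, 1), pack(1, 2)
print([hex(v) for v in unpack(add_lanes(a, b))])  # -> ['0x0', '0x3']
```

Both helpers cost a small, fixed number of 64-bit operations per pair of 32-bit results, which is the trade the text anticipates; whether that beats two plain 32-bit operations depends on the processor.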

RIPEMD-160, like SHA-1 and MD5, shows potential for packet-level SIMD optimizations, but due to its minimal intra-round parallelism it should also be implementable more efficiently as-is on 64-bit RISC architectures.

4.1.4 Tiger

Tiger is a 192-bit hash function operating on 64-bit words. Tiger was designed by Ross Anderson and Eli Biham to work efficiently on 16-, 32-, and 64-bit processors. Tiger performs eight parallel 8-bit lookups for each round using 64-bit S-boxes, which take 8-bit lookup values, thus affecting every bit of the final word with each lookup. The results of th ...
